The Oxford Nanopore Technologies (ONT) NCM 2022 masterclass I watched tonight was “How to basecall your data and detect methylation” with Jessica Anderson, a field application scientist with ONT. The recording is available. Anderson described how Nanopore sequencing works: as DNA/RNA strands pass through the pore, changes in current are detected. Anderson explained that basecalling accuracy has improved over the years thanks to improvements in both the chemistry and the software. Raw read accuracy is expressed as a percentage: 99% means 99 bases out of 100 in a read were called accurately. Anderson explained that the new chemistry offers a high capture rate and the possibility of duplex calling to improve accuracy.
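To make the accuracy percentages concrete for myself, here is a minimal Python sketch. It is my own illustration, not something shown in the masterclass. It assumes accuracy is given as a fraction between 0 and 1 and uses the standard Phred definition, Q = -10 · log10(error rate). The function names are hypothetical.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (a fraction, e.g. 0.99) to a Phred Q score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy: float, read_length: int) -> float:
    """Expected number of miscalled bases in a read of the given length."""
    return (1.0 - accuracy) * read_length


if __name__ == "__main__":
    # 99% raw read accuracy corresponds to roughly Q20.
    print(round(accuracy_to_phred(0.99), 1))
    # At 99% accuracy, a 10 kb read is expected to carry about 100 errors.
    print(round(expected_errors(0.99, 10_000)))
```

Seen this way, each additional "nine" of accuracy adds about 10 to the Q score, which is why small percentage gains matter so much for long reads.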
Basecalling was defined as the computational process of converting raw electrical signals into nucleotide sequence. The input is FAST5 or POD5 files, and the output is FASTQ files. Additional features such as barcoding/demultiplexing and alignment can be incorporated into basecalling. Anderson described FAST5 files as “a type of HDF5 designed to contain all information needed for analyzing Nanopore sequencing data and tracking it back to its source.” POD5 (“pore open data”) was developed to reduce raw data file size. This improves read and write performance and lowers the computational resources needed. Whether FAST5 or POD5 files are used, the output will still be FASTQ files. Anderson stressed that it is important to think about what read lengths you want to keep and how much coverage you need. Longer reads are preferred for methylation analysis, according to Anderson.
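Since read length and coverage came up, here is a small Python sketch of how one might filter basecalled reads by length and estimate coverage. This is my own illustration under stated assumptions, not an ONT tool: it assumes plain four-line FASTQ records (no wrapped sequences), and the function names, the length cutoff, and the genome size are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ text."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None:
            return  # end of input
        if not header:
            continue  # skip blank lines between records
        seq = next(stripped, None)
        next(stripped, None)  # the '+' separator line
        qual = next(stripped, None)
        if seq is None or qual is None:
            raise ValueError("truncated FASTQ record")
        yield header, seq, qual


def filter_by_length(records: Iterable[FastqRecord],
                     min_length: int) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence has at least min_length bases."""
    return (rec for rec in records if len(rec[1]) >= min_length)


def estimate_coverage(records: Iterable[FastqRecord], genome_size: int) -> float:
    """Approximate coverage as total sequenced bases divided by genome size."""
    total_bases = sum(len(seq) for _, seq, _ in records)
    return total_bases / genome_size


# Tiny made-up example: two reads, only the first passes the length filter.
example = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
           "@read2", "ACG", "+", "III"]
kept = list(filter_by_length(parse_fastq(example), min_length=5))
```

With a real file, `parse_fastq(open("reads.fastq"))` would stream the records (the file name here is a placeholder), and the kept reads could then be passed to `estimate_coverage` together with the expected genome size.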
The analysis platforms available from ONT have been made accessible to those of us used to graphical user interfaces. The MinKNOW software controls the sequencing device and has basecalling, QC, analysis, and reporting functions. Anderson explained that basecalling in MinKNOW uses AI to transform the “squiggle” data into sequence. When choosing a basecalling accuracy model, it is important to consider the computational resources available. One can also perform basecalling after a run. Anderson explained that Remora is available in MinKNOW as a lightweight option for modified-base calling. Model training is needed but is simplified with Remora. Anderson mentioned that there is a MinKNOW app! I downloaded it and will set up a host to connect to.
Anderson explained that the EPI2ME agent software is available as an analysis solution with several workflows. After starting MinKNOW, you can start an EPI2ME cloud-based workflow. EPI2ME Labs can be set up locally or on a sequencing device and configured for custom workflows. Several workflows are unique to EPI2ME Labs. Anderson ended by stressing the importance of product updates. I would like to try modified basecalling with bacterial organisms. It seems that the methods and algorithms continue to improve!
