Continuing with the Human genome sequencing and analysis Nanopore Learning course, I watched the session entitled “MinKNOW: Live basecalling and output folder structure.” Marta Verdugo, a member of the Technical Services Team at Oxford Nanopore Technologies, introduced the different data types and explained how MinKNOW processes raw signal into reads. Each read corresponds to the signal from a single DNA or RNA molecule. Basecalling starts by detecting changes in ionic current, which basecalling models then convert into read sequences. Live basecalling in MinKNOW may ‘fall behind’ the sequencing run, depending on the GPU and CPU settings and the basecalling model chosen. Basecalling uses the flip-flop model to deconvolute the signal and produce basecalls. Fast basecalling is less computationally intensive than the higher-accuracy models. Fast5 and now Pod5 files store the raw signal, so they can be reprocessed with new basecalling models. Fastq files are plain text files in which each read occupies four lines: a header, the sequence, a separator line beginning with a plus sign, and the quality scores. Disk space requirements depend on the number of reads and the Gbases of data produced. The table shared was useful: it suggested that ~20 Gbases of data take up over 380 Gbytes of disk space. Storage has become a consideration as we continue sequencing!
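To make the four-line Fastq layout concrete for myself, I put together a minimal Python sketch that reads records from a text stream. This is not how MinKNOW or any Oxford Nanopore tool does it. The names `FastqRecord`, `parse_fastq`, and `mean_phred` are my own hypothetical helpers, the example read is made up, and I am assuming unwrapped four-line records with the common Phred+33 quality encoding.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, bases, and per-base quality characters."""
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream, assuming exactly four lines per read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")

        if not header.startswith("@"):
            raise ValueError(f"header line must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"third line must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")

        # The read identifier is the first whitespace-delimited token
        # after '@'; anything after it is free-form description.
        fields = header[1:].split()
        read_id = fields[0] if fields else ""
        yield FastqRecord(read_id, sequence, quality)


def mean_phred(quality: str) -> float:
    """Plain arithmetic mean of the scores, assuming Phred+33 encoding."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = io.StringIO("@read1 ch=1\nACGT\n+\nIIII\n")
    for record in parse_fastq(example):
        print(record.read_id, len(record.sequence),
              f"{mean_phred(record.quality):.1f}")
    # prints: read1 4 40.0
```

For real run output, which may be compressed, the same function should work on a handle opened in text mode with `gzip.open(path, "rt")`. Note that `mean_phred` is a simple average of the scores, which is fine for a quick look but is not the same as averaging the underlying error probabilities.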
