Challenges in Storing Large-Scale Genomic Data

Tonight, I watched the question-and-answer session of the Oxford Nanopore Technologies (ONT) webinar on “Unlocking comprehensive genome analyses for large-scale projects.” The panelists were first asked about the role of methylation in large-scale genome sequencing projects. They spoke about investigating methylation in rare diseases, which is emerging as an area of interest. Another question was about the follow-up studies deCODE is pursuing; they mentioned using the methylation data to learn about the effects of the environment.

Panelists were then asked about the storage requirements for large projects. Interestingly, storing raw files is a challenge even at the Sanger Institute: the different output files, including POD5 and FASTQ, are essential, yet storage and access resources are not limitless (a rough tally like the sketch at the end of this post gives a sense of the scale). deCODE spoke about the importance of phasing reads, which is key for their analyses, including methylation.

On the incorporation of long-read technologies, Cancer 2.0 and the England rare disease genomics programs use ONT, in part because of its small footprint, as well as PacBio devices; Cancer 2.0, for example, uses a variety of approaches. deCODE explained that they are using AI algorithms, and panelists noted that the data are so rich that automated exploration is required. They also spoke about the need to keep consenting participants informed.

The last question was about data equity and access to long-read sequencing technologies. I appreciate how the panelists noted that while ONT has increased access, disparities in availability remain.
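To make the storage point above a little more concrete, here is a minimal sketch that tallies how much disk the raw signal (POD5) and basecalled (FASTQ) outputs of a run consume. The directory path and file layout are hypothetical assumptions, not anything described in the webinar; it only uses the Python standard library.

```python
from pathlib import Path

# Hypothetical layout: a single sequencing run directory containing raw
# signal files (.pod5) and basecalled reads (.fastq.gz). Adjust the path
# and patterns to match your own output structure.
RUN_DIR = Path("/data/runs/run_001")

def total_bytes(directory: Path, pattern: str) -> int:
    """Sum the on-disk size of every file matching a glob pattern."""
    return sum(f.stat().st_size for f in directory.rglob(pattern))

pod5_bytes = total_bytes(RUN_DIR, "*.pod5")
fastq_bytes = total_bytes(RUN_DIR, "*.fastq.gz")

gib = 1024 ** 3
print(f"POD5 (raw signal): {pod5_bytes / gib:,.1f} GiB")
print(f"FASTQ (basecalls): {fastq_bytes / gib:,.1f} GiB")
if fastq_bytes:
    # Raw signal is typically much larger than the basecalls derived
    # from it, which is why keeping POD5 long-term is the hard part.
    print(f"Raw-to-basecall ratio: {pod5_bytes / fastq_bytes:.1f}x")
```

Multiply numbers like these across thousands of genomes and it is easy to see why even large institutes have to decide which raw files they can afford to keep.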

How do large-scale genomic analyses cope with data storage and access? AI-generated image.