Understanding Long-Read Phasing Methods in Genomics

Nikhita Damaraju from the University of Washington in Seattle presented at the Nanopore Community Meeting in Boston. The session’s title was “Evaluating the quality of long-read phasing methods in clinically relevant genes.” Damaraju began by noting that variants are inherited from both parents, one copy of the genome from each, and defined phasing as the process of assigning variants to their parental copy. Knowing which parental copy carries a deletion, for example, may have implications for treatment. Two established approaches are trio-based and population-based phasing. Trio-based phasing uses sequence information from the parents, which is not always feasible to obtain. Population-based phasing relies on variants that are common across individuals.

Damaraju explained that the team evaluated phasing in Online Mendelian Inheritance in Man (OMIM) genes using Oxford Nanopore Technologies R9, Q27, and PacBio data, with Genome in a Bottle consortium data as the truth set. Using CHM13 as the reference genome helped reduce the phasing error rate. Next, the team used 100 samples from the 1000 Genomes ONT Sequencing Consortium. Variants in this dataset were called with Clair3 and phased with WhatsHap. Some genes showed larger variability in quality scores than others, while haplotagging quality across the 100 samples appeared balanced. The team also created a publicly available Shiny dashboard to visualize the data.

Damaraju concluded that read-based phasing with long reads can overcome the need for parental information and population reference panels. Using the CHM13 reference reduced phasing errors in OMIM genes compared to GRCh38 across sequencing technologies. The dashboard can provide visual comparisons and serve as a source for updates in phasing information.
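To make the idea of a phasing error rate concrete, here is a minimal Python sketch of one common metric, the switch error rate, computed against a truth set. It is not the method used in the talk. The function name `switch_error_rate`, the dictionary layout, and the toy positions are all hypothetical, and the sketch assumes biallelic heterozygous sites that sit in a single phase block.

```python
from typing import Dict, Tuple

# Phased genotype at one site: (allele on haplotype 1, allele on haplotype 2).
# Assumption: biallelic sites only, so heterozygous calls are (0, 1) or (1, 0).
Genotype = Tuple[int, int]


def switch_error_rate(truth: Dict[int, Genotype], test: Dict[int, Genotype]) -> float:
    """Fraction of adjacent heterozygous site pairs whose relative phase
    in `test` disagrees with `truth`. Assumes one phase block."""
    # Only sites that are heterozygous in both call sets carry phase information.
    shared = sorted(
        pos
        for pos in truth
        if pos in test
        and truth[pos][0] != truth[pos][1]
        and test[pos][0] != test[pos][1]
    )
    if len(shared) < 2:
        return 0.0

    # True where the test orientation matches the truth orientation.
    same = [truth[pos] == test[pos] for pos in shared]

    # A switch occurs wherever the agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(same, same[1:]) if a != b)
    return switches / (len(shared) - 1)


# Hypothetical toy data: position -> phased genotype.
truth_calls = {100: (0, 1), 200: (1, 0), 300: (0, 1), 400: (0, 1)}
test_calls = {100: (0, 1), 200: (1, 0), 300: (1, 0), 400: (1, 0)}

rate = switch_error_rate(truth_calls, test_calls)
print(f"switch error rate: {rate:.3f}")
```

In the toy data the test phasing agrees with the truth at the first two sites and is flipped at the last two, which counts as one switch among three adjacent pairs. A call set whose haplotypes are swapped at every site scores zero, because the haplotype labels themselves are arbitrary and only the relative phase between neighbouring variants matters.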

How do we evaluate phasing information using long-read sequencing data? AI-generated image.