Stereo Duplex Basecalling and Sampling Updates

Katherine Lawrence, a Machine Learning Bioinformatician with Oxford Nanopore Technologies, presented at London Calling 2023. They started with simplex basecalling and the switch in sampling rate from 4 kHz to 5 kHz, which means more measurements per second. Lawrence described how this intermediate sampling rate provides the benefit of extra measurements without adding too much noise. The POD5 file format helps accommodate the additional signal data that comes with the higher sampling rate. They presented read accuracy histograms from sequencing a human genome reference sample, in which both the high accuracy (HAC) and super accuracy (SUP) models showed increased accuracy.

In duplex sequencing, signals from both strands of a DNA molecule are used to reach Q30+ read accuracies. The two strands are sequenced one after the other; the software then recognizes the complementary strand and uses the squiggle signals from both. Stereo duplex basecalling uses the basecaller to reverse complement the second strand. Duplex basecalling accuracy increases for both the HAC and SUP models and is independent of read length. Applications of duplex data include telomere-to-telomere (T2T) sequencing and assembly.

Stereo duplex has been implemented in Dorado, which takes the identified read pairs and sends them to a stereo basecaller. According to Lawrence, Dorado runs these steps in parallel.

I am excited to try duplex basecalling with these updates. I think this will be very helpful for genome sequencing!
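To make a few of the numbers from the talk concrete, here is a minimal Python sketch of three bits of arithmetic behind it: what a Q30 score means as an error rate (using the standard Phred scale, Q = -10 log10(p)), how a reverse complement maps the second strand back onto the first, and how a higher sampling rate gives more samples per base. This is only an illustration, not Dorado's implementation. The function names are hypothetical, and the translocation speed of 400 bases per second is an assumption chosen just for the example.

```python
import math

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Complement every base, then reverse the string so it reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert an error probability p to a Phred score: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def samples_per_base(sampling_rate_hz: float, bases_per_second: float) -> float:
    """Average number of signal samples recorded per base.

    bases_per_second is an ASSUMED translocation speed used for illustration.
    """
    return sampling_rate_hz / bases_per_second


if __name__ == "__main__":
    # Q30 corresponds to roughly 1 error in 1,000 bases (99.9% accuracy).
    print("Q30 error probability:", phred_to_error(30))
    # The second strand's read maps back onto the first via reverse complement.
    print("Reverse complement of AACGTG:", reverse_complement("AACGTG"))
    # Assumed speed of 400 bases/s: compare 4 kHz and 5 kHz sampling.
    for rate_hz in (4000, 5000):
        print(f"{rate_hz} Hz:", samples_per_base(rate_hz, 400), "samples per base")
```

Under the assumed speed, moving from 4 kHz to 5 kHz raises the average from 10 to 12.5 samples per base, which is one way to picture "more measurements per second".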

How can stereo duplex in Dorado benefit read accuracies? Photo by Pixabay on Pexels.com