Katherine Lawrence, a Machine Learning Bioinformatician with Oxford Nanopore, spoke at the Nanopore Community Meeting 2022 about “Advances in duplex basecalling.” Lawrence defined duplex sequencing as a setup in which both ends of the DNA molecule carry adapters with motor proteins, producing two reads from the same molecule. Duplex basecalling incorporates information from both reads’ signals and improves accuracy, reaching a median accuracy of 99.9% (Q30). In duplex sequencing, one strand, the template strand, is sequenced first, and the other strand, the complement, follows it through the same pore. The two reads of a duplex pair have similar lengths and complementary bases. Having two measurements of the same molecule helps reduce random errors. Lawrence also noted that the “reverse complement sequence gives orthogonal information.” As an example, Lawrence focused on a region in which there was a mismatch between the template and complement strands; in this case, the error could be resolved with a reference. Pair decoding is the approach that was used previously. Lawrence mentioned that both signals are run through the decoder independently. This method offers high accuracy but is computationally intensive. The simple approach is base-space duplex: one sequence is reverse complemented and both are analyzed. Stereo duplex uses a similar approach: both signals are basecalled, one is reverse complemented, and afterwards a stereo basecaller accepts input data from both signals and the combination. Duplex provides an increase from Q20 to Q30. Lawrence explained that stereo duplex is available in Dorado. First you run fast basecalling in Dorado, and then you generate a pairs file using duplex_tools. In the future, there will be a dorado duplex command that takes a model and uses POD5 files. I am curious: have we been missing opportunities for duplex basecalling, with reads that could be basecalled using the new models and Dorado? How can we incorporate this into our workflow?
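To make the base-space idea and the Q-score arithmetic concrete for myself, here is a minimal Python sketch. It is not how Dorado or duplex_tools work internally: the function names are my own, and I assume the two reads are already the same length once the complement is reverse complemented, whereas a real pipeline would align them first. When the strands disagree, this toy version simply keeps the base with the higher quality score.

```python
import math

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to per-base accuracy (Q30 is about 0.999)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy to a Phred quality score."""
    return -10.0 * math.log10(1.0 - accuracy)


def base_space_consensus(template, template_quals, complement, complement_quals):
    """Toy base-space duplex consensus (hypothetical helper, not a library API).

    Assumes the reverse-complemented complement read lines up with the
    template position by position, with no insertions or deletions.
    Returns the consensus sequence and the list of mismatch positions.
    """
    comp_rc = reverse_complement(complement)
    comp_quals_rev = list(complement_quals)[::-1]  # qualities follow the reversed read
    if len(comp_rc) != len(template):
        raise ValueError("this sketch requires reads of equal length")

    consensus = []
    mismatches = []
    for i, (t, tq, c, cq) in enumerate(
        zip(template, template_quals, comp_rc, comp_quals_rev)
    ):
        if t == c:
            consensus.append(t)
        else:
            # Strands disagree: keep the base with the higher quality score.
            mismatches.append(i)
            consensus.append(t if tq >= cq else c)
    return "".join(consensus), mismatches


# Template read with a low-quality error at position 2 (true base is G).
template_read = "ACTTAC"
template_q = [30, 30, 5, 30, 30, 30]
# Complement read as sequenced, i.e. the reverse complement of ACGTAC.
complement_read = "GTACGT"
complement_q = [30, 30, 30, 30, 30, 30]

consensus_seq, mismatch_positions = base_space_consensus(
    template_read, template_q, complement_read, complement_q
)
print(consensus_seq, mismatch_positions)
```

In Phred terms, Q20 corresponds to 99% per-base accuracy and Q30 to 99.9%, so the jump from Q20 to Q30 means roughly ten times fewer errors.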
