ATCC Insights: Base-Calling in Genomics

David Yarmosh, a bioinformatician at ATCC, presented at the Nanopore Community Meeting in Boston. The session’s title was intriguing: “How good is good enough?” They noted that ATCC was founded in 1925 and now has nearly 5,000 genomes. Yarmosh explained that the team produces about 1,000 genome assemblies per year and has run hybrid Illumina and Nanopore sequencing projects. Dorado's basecalling relies on a training dataset, which becomes an issue given the breadth of the ATCC library, since the collection spans many organisms that may not be represented in that training data. To test this, Yarmosh and team took 8 viruses, 26 fungi, and 239 bacteria, sequenced them on Nanopore and Illumina, and basecalled the Nanopore reads with Dorado and Guppy three times. Dorado was faster on this set, though overall the findings suggest that Dorado and Guppy perform similarly in speed and in the resulting contigs. This is interesting! Yarmosh emphasized that Dorado does offer methylation detection features that set it apart. ATCC's dataset and ongoing sequencing project are expansive and can help benchmark and improve genome sequencing and assembly tools.

How did the ATCC leverage its enormous dataset to test base calling? AI-generated image.