Tonight, I watched Miten Jain’s London Calling 2019 presentation on “Generating high-quality reference human genomes using PromethION nanopore sequencing.” Jain is from the University of California, Santa Cruz, where the team has developed a framework to produce reference-quality human genomes in a week. Jain spoke about the need to sequence many human genomes that can serve as references. For this, selecting samples that maximize diversity is important. Jain selected ten individuals. The bottlenecks are cost and speed: current approaches are not very scalable. Jain’s solution is to use Nanopore 100 kb+ sequencing together with scalable assembly and polishing. They performed long-read sequencing with the PromethION for eleven genomes in nine days with >60X coverage. The aim was to obtain long reads. They used the Circulomics Short Read Eliminator (SRE) to obtain about 7X enrichment of longer fragments above 10 kb. The lowest read N50 was about 30 kb, and the highest was above 50 kb. Coverage was within the range they wanted: ~50-85X. Next, they developed an assembler called Shasta, which was created with the Chan Zuckerberg Initiative. The assembler performed as well as other assemblers while producing fewer misassemblies. They are completing assemblies in less than six hours and for $70 on Amazon Web Services. The median contig NG50 is 23 Mb. Jain also explained that they created a two-step process for polishing assemblies: first, they use MarginPolish, which is a graph-based alignment polisher, and then HELEN, a deep neural network-based consensus sequence polishing algorithm. Jain shared results indicating that Shasta assembly followed by polishing improves the calling of homopolymers. The next steps are faster base calling and haplotype phasing. On Jain’s methods slides, I noticed that they used Puregene extraction and earlier versions of the Ligation Sequencing Kit (LSK). The polishers MarginPolish and HELEN were described as “scalable versions of Racon and Medaka.” Also, when combined with Hi-C data, the assemblies produced chromosome-level scaffolds.
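
Since the talk leans on N50 and NG50 figures, here is a minimal Python sketch of how those two statistics are commonly computed. It is not taken from Jain's slides or from Shasta. The function name `nx50` and the toy contig lengths are hypothetical and chosen only for illustration. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total bases. NG50 uses half of an assumed genome size as the target instead of half of the assembly size.

```python
from typing import Optional, Sequence


def nx50(lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 of `lengths`, or NG50 when `genome_size` is given.

    Hypothetical helper for illustration only. Returns None when the
    sequences do not add up to half of the target size (NG50 undefined).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    # N50 targets half the assembled bases; NG50 targets half the genome size.
    target = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    # Walk from the longest sequence down until the target is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Toy contig lengths (made-up numbers, not data from the talk).
contigs = [80, 70, 50, 40, 30, 20]
print(nx50(contigs))                    # N50 of the toy assembly
print(nx50(contigs, genome_size=400))   # NG50 against an assumed genome size of 400
```

The same function applies to read lengths (read N50) and to contig lengths (contig N50 or NG50). NG50 is often preferred when comparing assemblies of the same genome, because the target does not shift with how much sequence each assembler happened to output.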
