Assembling Redwoods, Giant Sequoia, and a Fungus

“Sequencing and assembling the mega-genomes of mega-trees: the giant sequoia and coast redwood genomes” was the title of the session presented by Steven Salzberg from Johns Hopkins University. I was curious why this session was classified as “Metagenomics.” Salzberg works on several different organisms. The project’s first results were released in April 2019, describing the genome assemblies of the giant sequoia and coast redwood.

The giant sequoia is the largest tree species on Earth and is found only in California. The tree they sequenced is endangered, and its location is not being released. This tree was about 1,360 years old and 96.3 meters tall (315.9 ft), the tallest known sequoia in the world. The genome size is 8.5 gigabases. Sequencing finished in November 2017, and the assembly finished in 2018. Then, the “Dovetail” assembly continued.

The coast redwood is the tallest tree species and is highly endangered. Its genome is hexaploid, with 27 gigabases. All the Nanopore sequencing was performed at Johns Hopkins. The recipe was a combination of Illumina and Nanopore reads. The goal is to obtain as much data as possible from a single seed. The seed is “peeled” to reveal the megagametophyte.

The assembler used is MaSuRCA, which can take Illumina reads and merge them into super reads before aligning them to Nanopore reads. The tiled super reads have much better quality. For the giant sequoia, the team used 13 flow cells. The third step was to scaffold using Dovetail and Hi-C. Salzberg also shared data from the walnut genome, assembled into 16 contigs corresponding to the chromosomes. He explained that for the giant sequoia, the assembly/scaffolding with Hi-C resulted in 11 big contigs and thousands of small ones. They were able to find telomeres and centromeres. The biggest contig was separated into two chromosomes after the team identified an erroneously assembled segment.

The redwood Illumina data held a surprise: a fungal genome of about 40 megabases. This is the metagenomics part!
The assembly had a maximum memory usage of 2 TB, and error correction required 330,000 CPU hours… the wall clock time was 5-6 months! Redwood is a hexaploid, and Salzberg thinks there should be three subgenomes. What a challenging set of genomes to sequence and assemble!
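
To get a feel for those numbers, here is a rough back-of-the-envelope sketch in Python. It is my own arithmetic, not something shown in the talk, and it rests on assumptions: that the 330,000 CPU hours were spread evenly over the 5-6 months of wall clock time (the wall clock figure may well cover more than error correction), that a month is 30 days, and that the 27 gigabases split evenly across the three subgenomes Salzberg expects. The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumptions (mine, not from the talk): 30-day months, CPU hours spread
# evenly over the wall clock time, and equally sized subgenomes.

HOURS_PER_MONTH = 30 * 24  # assumed 30-day month


def average_cores(cpu_hours: float, wall_months: float) -> float:
    """Average number of cores kept busy for the whole wall clock period."""
    return cpu_hours / (wall_months * HOURS_PER_MONTH)


def per_subgenome_gb(genome_gb: float, subgenomes: int) -> float:
    """Size of each subgenome if the total were split evenly."""
    return genome_gb / subgenomes


if __name__ == "__main__":
    cpu_hours = 330_000  # error correction, as quoted in the talk
    for months in (5, 6):
        cores = average_cores(cpu_hours, months)
        print(f"{months} months of wall clock: about {cores:.0f} cores on average")

    redwood_gb = 27.0   # coast redwood genome size from the talk
    sequoia_gb = 8.5    # giant sequoia genome size from the talk
    sub_gb = per_subgenome_gb(redwood_gb, 3)
    print(f"Each redwood subgenome: about {sub_gb:.1f} Gb "
          f"(giant sequoia: {sequoia_gb} Gb)")
```

Under these assumptions, each redwood subgenome comes out close to the size of the whole giant sequoia genome, which would fit the idea of three subgenomes, though an even split is only a guess.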

bottom up view of redwood trees
How do you assemble the genomes of giant trees? Photo by Nagihan Yilmaz on Pexels.com