Liftoff and Novel Gene Annotation

Alaina Shumate from Johns Hopkins University spoke at the Nanopore Community Meeting 2021. Shumate, a graduate student at the time, presented “The annotation of novel genes in a complete human genome.” They noted that scientists “finished” the human genome in 2003, but some regions remained incomplete. In 2021, the Telomere-to-Telomere (T2T) Consortium completed the human genome. Shumate explained that long-read sequencing helped identify and assemble duplications. Novel genes were found, and Shumate and their lab saw the need for a tool that maps gene annotations from one assembly to another. Liftoff was “designed to address challenges specific to lifting over gene annotations.” It uses Minimap2 to align complete gene sequences, including introns, but only the exon coordinates are lifted. Shumate explained how acyclic graphs are used to produce alignments for the exons. On its own, Liftoff cannot find entirely novel genes. Thus, Shumate worked with collaborators to adapt the Comparative Annotation Toolkit (CAT) and Liftoff, which found eight entirely novel genes. Together, Liftoff and CAT identified 1,956 novel genes, three of which have paralogs known to be clinically relevant. Shumate emphasized that annotation depends on high-quality assemblies, which high-throughput long-read sequencing helps make possible.
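To make the "align the whole gene, lift only the exons" idea concrete, here is a minimal Python sketch. It is not Liftoff's actual code and skips the graph-based step Shumate described. It assumes the gene alignment has already been reduced to gap-free blocks, and the names `Block`, `lift_position`, and `lift_exon`, along with the toy coordinates, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical toy structure: one gap-free aligned segment,
# (ref_start, target_start, length), 0-based half-open coordinates.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map one reference position to the target, or None if it is unaligned."""
    for ref_start, tgt_start, length in blocks:
        if ref_start <= pos < ref_start + length:
            return tgt_start + (pos - ref_start)
    return None


def lift_exon(exon: Tuple[int, int], blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Lift a half-open exon interval by mapping its first and last base."""
    start, end = exon
    new_start = lift_position(start, blocks)
    new_last = lift_position(end - 1, blocks)
    if new_start is None or new_last is None:
        return None  # exon boundary falls outside the aligned blocks
    return (new_start, new_last + 1)


# Toy gene spanning reference positions 100-400. The alignment covers the
# whole gene, intron included; the target carries a 5-base insertion
# inside the intron, so the second block is shifted by 5.
blocks = [(100, 1000, 150), (250, 1155, 150)]
exons = [(100, 200), (300, 400)]

lifted = [lift_exon(e, blocks) for e in exons]
print(lifted)  # [(1000, 1100), (1205, 1305)]
```

In this toy case the intron alignment absorbs the insertion, so both exons keep their length in the target while the second one shifts by five bases. That is one way to see why aligning the full gene, rather than each exon separately, can help place exons consistently.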

How can Liftoff and other bioinformatic tools help identify novel genes or correct assembly/annotation errors? Photo by Julian Jagtenberg on Pexels.com