Mark T.W. Ebbert from the Mayo Clinic presented at London Calling 2019 on “Long-read sequencing technologies resolve most ‘dark’ and ‘camouflaged’ gene regions.” Dark and camouflaged regions? Ebbert explained that a region can be dark because few or no reads cover it (“dark by depth”) or because the reads that do align there have low mapping quality (“dark by MAPQ”). Most dark-by-MAPQ genes are “camouflaged,” and the camouflage arises from genomic duplications: aligners cannot confidently place reads that come from duplicated regions, so they assign them to one of the copies at random with a mapping quality of zero. According to Ebbert, approximately 6054 gene bodies are at least partially dark in human genome build 38, and 527 are 100% dark. The research team found that 76 genes with at least 25% dark coding sequence (CDS) are associated with HGMD mutations. Ebbert’s favorite gene, CR1, is a top Alzheimer’s disease gene. It was identified as 26% dark, and the team found that its binding domain is dark. Long-read technologies are able to resolve most of the 2855 dark CDS regions. 10X Genomics and PacBio approaches decreased “darkness,” but not as much as ONT. I did not know about camouflaged regions. This session was easy to follow and full of helpful visuals. I wonder how often bacterial genes are camouflaged?
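
To make the two definitions concrete for myself, here is a minimal Python sketch of how a single position could be labeled as dark by depth or dark by MAPQ, and how a "percent dark" figure like the 26% quoted for CR1 could be computed. The function names and the thresholds (`max_dark_depth`, `low_mapq`, `low_mapq_fraction`) are my own placeholders for illustration, not values given in the talk, and the input is simply a list of mapping qualities for the reads covering each position.

```python
from typing import List, Sequence


def classify_position(
    mapqs: Sequence[int],
    max_dark_depth: int = 5,          # assumed cutoff, not from the talk
    low_mapq: int = 10,               # assumed cutoff, not from the talk
    low_mapq_fraction: float = 0.9,   # assumed cutoff, not from the talk
) -> str:
    """Label one position from the MAPQ values of the reads covering it."""
    depth = len(mapqs)
    # Too few reads cover the position: dark by depth.
    if depth <= max_dark_depth:
        return "dark_by_depth"
    # Enough reads, but nearly all of them were placed ambiguously.
    low = sum(1 for q in mapqs if q < low_mapq)
    if low / depth >= low_mapq_fraction:
        return "dark_by_mapq"
    return "normal"


def percent_dark(per_position_mapqs: List[Sequence[int]]) -> float:
    """Percentage of positions in a region that are dark by either definition."""
    if not per_position_mapqs:
        return 0.0
    dark = sum(
        1 for mapqs in per_position_mapqs
        if classify_position(mapqs) != "normal"
    )
    return 100.0 * dark / len(per_position_mapqs)


if __name__ == "__main__":
    # Hypothetical toy region of four positions.
    region = [[], [0] * 20, [60] * 20, [60] * 20]
    print([classify_position(p) for p in region])
    print(percent_dark(region))
```

In a real analysis the per-position mapping qualities would come from an alignment file rather than hand-written lists, and the cutoffs would need to match whatever the study actually used.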
