SNP and Indel Detection Tools

Tonight I watched the next video in the Human Genome Sequencing and Analysis course, which is part of Nanopore Learning. The lesson is an “Introduction to SNP and Indel Detection,” in which Anthony Doran, a Technical Services Scientist from ONT, discussed bioinformatics tools for SNP and indel detection. Doran outlined the overall process: map (align) reads to a reference sequence, then detect variants after applying an optional filter. The recommended level of coverage depends on the genome and organism. Doran explained that a reference sequence in FASTA format is needed; this may be a complete genome, a transcriptome, or a de novo assembly. The next step is mapping your reads against that reference, which produces an alignment in SAM/BAM format. Reads aligned to a reference genome can then be used for variant calling. The Variant Call Format (VCF) is the standard for storing sequence variation information. Each VCF record includes the coordinates of the variant and several columns with additional information. Doran recommended learning more about SNP and indel detection by reviewing the EPI2ME Labs workflows. A representative workflow uses minimap2 for alignment, DeepVariant or Medaka for SNV calling, and WhatsHap for phasing to generate a VCF file. I had not thought about the process of variant calling and the tools it requires at this level of detail. It is helpful to learn about the VCF format and the alignment step.
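To make the VCF format a little more concrete, here is a minimal Python sketch (not from the lesson) that reads VCF-style records and labels each one as a SNP, insertion, or deletion by comparing the lengths of the REF and ALT alleles. The first eight column names are the standard fixed VCF fields; the example records, the `Variant` class, and the `classify` and `parse_vcf` helpers are hypothetical and invented purely for illustration. Real files are better handled with a dedicated VCF library.

```python
from dataclasses import dataclass

# The eight fixed, tab-separated columns that begin every VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    kind: str


def classify(ref: str, alt: str) -> str:
    """Label a REF/ALT pair by comparing allele lengths (a simplification)."""
    # Symbolic or missing alleles such as <DEL>, * or . are not classified here.
    if alt.startswith("<") or alt in ("*", "."):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


def parse_vcf(text: str) -> list[Variant]:
    """Parse VCF text, skipping header lines that start with '#'."""
    variants = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(VCF_COLUMNS, line.split("\t")))
        # ALT may list several comma-separated alleles at one site.
        for alt in record["ALT"].split(","):
            variants.append(
                Variant(
                    chrom=record["CHROM"],
                    pos=int(record["POS"]),
                    ref=record["REF"],
                    alt=alt,
                    kind=classify(record["REF"], alt),
                )
            )
    return variants


# Invented example records, not real variant calls.
EXAMPLE_VCF = "\n".join(
    [
        "##fileformat=VCFv4.2",
        "#" + "\t".join(VCF_COLUMNS),
        "\t".join(["chr1", "12345", ".", "A", "G", "50", "PASS", "."]),
        "\t".join(["chr1", "22222", ".", "T", "TCA", "42", "PASS", "."]),
        "\t".join(["chr2", "33333", ".", "GAT", "G", "38", "PASS", "."]),
    ]
)

for variant in parse_vcf(EXAMPLE_VCF):
    print(variant.chrom, variant.pos, variant.ref, variant.alt, variant.kind)
```

The length comparison is only a rough rule of thumb; it ignores genotype columns, symbolic alleles, and normalization, all of which matter in a real analysis.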

illustration of green DNA and pink roses
What tools are required to identify SNPs and indels? Photo by Google DeepMind on Pexels.com