Matt Attreed presented at the Nanopore Community Meeting 2022 on “How to generate assemblies and call variants.” The session was a Masterclass, and Attreed started by describing the resources on the Nanopore website, including a page dedicated to accuracy information. The session covered different workflows that process FASTQ files into SAM, BAM, and other formats. Adjusting the quality threshold can be important depending on the number of reads that passed quality control and the requirements of the application. Analyses begin with MinKNOW, which can perform some analyses itself, such as basecalling, alignment to a reference, and barcoding (demultiplexing). EPI2ME Labs has workflows that are available on GitHub and can be run from the command line. EPI2ME is a web-based analysis platform. This reminded me that EPI2ME Labs has a workflow for bacterial genome assembly that is very useful. EPI2ME Labs requires Docker. The workflows can be configured through menus. Attreed explained that sequence assembly produces a FASTA file representing the consensus sequence. Reference sequences can be provided, and can be retrieved from Ensembl, UCSC, and RefSeq. Attreed explained that the assembly process joins reads into contigs based on overlaps and can then build scaffolds. Pore-C and ultra-long reads can be used for assemblies with a large N50. De novo assembly, which does not use a reference, benefits from long reads and higher coverage. A polishing step may also be needed. After basecalling, FASTQ files can be assembled with Flye and polished with Medaka. This is the approach used in the bacterial genome assembly workflow in EPI2ME Labs. Just today I received information from Technical Support on using sample sheets to perform multiple assemblies for a series of barcoded bacterial genomes. Metagenomic assembly can be used to reveal metabolic networks and identify antimicrobial resistance. Attreed recommended using metaFlye for these assemblies, with polishing by Medaka. Long-read data can also be useful for identifying variants.
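
To make the "large N50" idea above more concrete, here is a minimal Python sketch of how N50 could be computed from the contigs in an assembly FASTA file. It is not taken from the Masterclass or from any EPI2ME Labs workflow. The function names `contig_lengths` and `n50` and the toy sequences are my own assumptions for illustration. N50 is taken here in its usual sense: the contig length at which contigs of that length or longer contain at least half of the assembled bases.

```python
from typing import Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached; keeps the return type explicit


# Toy assembly invented for illustration (hypothetical contigs, not real data).
toy_fasta = [
    ">contig_1",
    "ACGTACGT",
    "ACGT",
    ">contig_2",
    "ACGTAC",
    ">contig_3",
    "AC",
]

lengths = contig_lengths(toy_fasta)
print(lengths, n50(lengths))
```

For a real assembly, the lines would come from the FASTA file itself, for example `contig_lengths(open(path))` with `path` pointing at the assembler's output. A few very long contigs push N50 up quickly, which is why ultra-long reads help.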
To generate an alignment, Attreed noted, reads are filtered and aligned to a reference sequence to produce a BAM file describing the alignment. The Variant Call Format (VCF) is a text-based, tab-delimited format with specific field headers. To identify variants in human samples, Attreed noted that Oxford Nanopore Technologies recommends using the Human Variation Workflow from EPI2ME Labs. To call and phase single-nucleotide variants (SNVs), the recommended workflow aligns reads with minimap2, calls SNVs with Clair3, and then phases them with WhatsHap to produce a VCF file. Attreed explained that deletions, duplications, inversions, and other structural variants can be identified with long reads. After quality control and filtering, reads are aligned with minimap2, and variants are called using Sniffles2. After the variants are filtered, a VCF file is produced. Transcriptome workflows can be used with cDNA or native RNA to identify isoforms, measure gene expression, and even detect fusion genes. The wf-transcriptomes workflow in EPI2ME Labs can perform de novo assembly. Attreed noted that on PromethION platforms, alternative splicing can be identified at single-cell and spatial resolution! EPI2ME Labs has a workflow that uses 10x Genomics data. The output is a rich series of gene expression tables. This Masterclass was useful to revisit, as we received the GridION today and will try to produce transcriptomes this summer.
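
Since both the SNV and the structural variant workflows end in a VCF file, here is a minimal Python sketch of reading the tab-delimited records and keeping only those that passed filtering. It is not part of any EPI2ME Labs workflow. The helper names and the toy records are assumptions made up for illustration, and it only reads the eight fixed VCF columns. It also assumes structural variant records carry an `SVTYPE` key in the INFO column, which is a reserved key in the VCF specification. Real output files contain more INFO keys and per-sample columns, so a dedicated VCF library is likely a better choice for real analyses.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator

# The eight fixed, tab-delimited columns defined by the VCF specification.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO string such as 'SVTYPE=DEL;SVLEN=-300' into a dict."""
    parsed: Dict[str, str] = {}
    if info == ".":  # '.' means the field is empty
        return parsed
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        parsed[key] = value  # flag entries without '=' map to an empty string
    return parsed


def read_variants(vcf_lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per variant record, skipping header lines."""
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header row
        fields = line.split("\t")
        record: Dict[str, Any] = dict(zip(VCF_COLUMNS, fields[:8]))
        record["POS"] = int(record["POS"])
        record["INFO"] = parse_info(record["INFO"])
        yield record


def count_by_type(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count records by their SVTYPE value, if they have one."""
    counts: Counter = Counter()
    for record in records:
        counts[record["INFO"].get("SVTYPE", "no_SVTYPE")] += 1
    return counts


# Toy records invented for illustration (hypothetical, not real calls).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10500\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-300;END=10800",
    "chr1\t20000\tsv2\tN\t<INV>\t55\tPASS\tSVTYPE=INV;END=21000",
    "chr2\t5000\tsv3\tN\t<DEL>\t20\tLowQual\tSVTYPE=DEL;SVLEN=-50;END=5050",
    "chr2\t7000\t.\tA\tG\t40\tPASS\t.",
]

passed = [r for r in read_variants(toy_vcf) if r["FILTER"] == "PASS"]
print(count_by_type(passed))
```

The same loop works on a real file by passing an open file handle instead of the toy list, as long as the file is uncompressed text.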
