A Guide to Structural Variation Analysis

Today I watched the Knowledge Exchange session from Nanopore Learning, “A beginner’s guide to structural variation analysis: from discovery to basic annotation.” Steven Rudd, a bioinformatics expert at Oxford Nanopore Technologies, shared a couple of slides and then walked through the structural variation tutorial. A structural variant (SV), Rudd said, is typically considered “a genomic insertion, deletion, duplication or other complex rearrangement of >50 bp in length.” Long reads are long enough to span structural variants, which makes it possible to detect and analyze them.
ONT offers tutorials for learning applied bioinformatics. Rudd explained that they adhere to best bioinformatics practices: literate programming, using R and Markdown to produce reports, and logging the versions of the software and reports. This information helps ensure reproducibility. The tutorials ONT has developed are available in their GitHub repository. They often use Conda to manage software versions, and Snakemake, Rudd explained, is software that manages workflows.
The basic QC tutorial uses the data produced during a sequencing run and basecalling. You can open the tutorial and type the commands to produce the graphs and reports. Rudd shared the “Executive Summary,” which captures the essence of the run. Graphs depict how much data was produced and how the flow cell performed.
The structural variation pipeline tutorial has a vignette that explains the commands. In a Linux console, you can type the commands, run the Snakemake workflow, and then render the report. Rudd demonstrated the commands and the installation of Miniconda, and explained how the software and Conda environment were prepared. Rudd then activated the environment they had set up and pasted the command. The demo used the Genome in a Bottle dataset. Rudd reviewed the report and the structural variants identified. A karyogram was produced indicating the SV type and chromosome location.
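To make the >50 bp definition and the karyogram summary more concrete for myself, here is a minimal Python sketch. It is not the code from the ONT tutorial: the variant records, the function names, and the `MIN_SV_LENGTH` constant are all made up for illustration. It only shows the idea of keeping variants longer than 50 bp and counting them by chromosome and type.

```python
from collections import Counter

# Threshold from the definition quoted in the session: SVs are >50 bp.
MIN_SV_LENGTH = 50

# Hypothetical records (chromosome, variant type, length in bp); not real data.
variants = [
    ("chr1", "DEL", -320),  # deletions are often reported with a negative length
    ("chr1", "INS", 75),
    ("chr1", "DEL", -12),   # too short to count as an SV
    ("chr2", "DUP", 1500),
    ("chr2", "INS", 48),    # too short to count as an SV
    ("chr2", "DEL", -51),
]


def is_structural_variant(length_bp):
    """Return True if the variant is longer than the 50 bp threshold."""
    return abs(length_bp) > MIN_SV_LENGTH


def tally_svs(records):
    """Count SVs per (chromosome, type), the kind of summary a karyogram draws."""
    return Counter(
        (chrom, sv_type)
        for chrom, sv_type, length in records
        if is_structural_variant(length)
    )


counts = tally_svs(variants)
for (chrom, sv_type), n in sorted(counts.items()):
    print(chrom, sv_type, n)
```

In a real analysis these values would come from the variant calls the workflow writes out, not from a hand-typed list.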
The report also identifies SVs that are associated with coding sequences, and spreadsheet files are generated with this information. The report compares the calls to a “truth” set to provide metrics such as precision, recall, and F1 score. Rudd then showed how the data can be visualized with IGB and also with Ribbon, which runs in the web browser to explore structural variants. Rudd also shared that more tutorials would be created. This Knowledge Exchange was very helpful, as I keep learning about the importance of structural variant analysis.
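For reference, the precision, recall, and F1 score mentioned in the report follow the standard definitions. The sketch below uses made-up counts, not numbers from the demo, and the function name `sv_metrics` is my own. It assumes the SV calls have already been matched against the truth set, so that each call is a true positive or a false positive and each missed truth variant is a false negative.

```python
def sv_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from counts of matched and unmatched calls."""
    called = true_positives + false_positives      # everything the caller reported
    expected = true_positives + false_negatives    # everything in the truth set
    precision = true_positives / called if called else 0.0
    recall = true_positives / expected if expected else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = sv_metrics(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# prints: precision=0.90 recall=0.75 F1=0.82
```

The hard part in practice is deciding when a called SV "matches" a truth SV, since breakpoints and lengths rarely agree exactly. This sketch skips that step.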

What does the ONT structural variation guide provide? Photo by Laker on Pexels.com