Long-reads Tools in KBase

I ran out of Nanopore videos on their YouTube channel! So I started watching a KBase webinar on long-read tools. Benjamin Allen from KBase introduced the session. This webinar will help me prepare for BIT 295 and isolate sequencing next semester. Allen shared KBase resources, including the webinar recordings on the YouTube channel and the LISA Workshop tutorials. The long-read workshop has a tutorial workbook that I should download and print! As an example, Allen used the Luteolibacter sp. strain Populi that they isolated with a summer research intern! The result was a Genome Announcement! Allen presented the paper and a static narrative that I plan on reviewing. The narrative makes the data accessible to the broader public. Ellen Dow from KBase Education Org noted that there is now a collection of narratives from educators.
The narrative starts with scientific background on the organism and then imports the long reads from sequencing the isolate. In the example, they imported Nanopore reads. There are options for PacBio now too. Oxford Nanopore Technologies reads are imported by specifying that they are non-interleaved reads. The resulting reads object includes a summary. Allen also showed the Compute Simple Read Library Stats app to obtain a report on the reads. A question in the chat asked whether you could run FastQC on Nanopore data. The answer was yes, though the quality plots may not be that useful. LongQC is not available on KBase… yet! This session was recorded on December 4, 2024, and I hope this changes.
The first step is to filter reads by size. Filtlong is used to remove short reads that complicate the assembly. Allen set the threshold to 1000 bp, removing anything below that length. A user can also adjust a “keep percent” or remove duplicates. Allen specified that 90% of reads be kept. A genome size can also be set, which will optimize coverage. After Filtlong, the filtered reads are used as input to the Flye assembler.
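To make the Filtlong step more concrete for myself, here is a minimal Python sketch of the two settings mentioned above: a minimum length and a “keep percent.” This is only an illustration and not Filtlong's actual algorithm. As far as I understand, Filtlong scores reads on quality as well as length and measures the keep percent in bases; this sketch ranks reads by length alone, and the function name `filter_reads` and the toy reads are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def filter_reads(reads: List[Read],
                 min_length: int = 1000,
                 keep_percent: float = 90.0) -> List[Read]:
    """Simplified length-based read filter (assumption: length is the only score)."""
    # Step 1: remove anything shorter than the minimum length.
    long_enough = [r for r in reads if len(r[1]) >= min_length]

    # Step 2: keep the longest reads until roughly keep_percent
    # of the remaining bases has been retained.
    total_bases = sum(len(seq) for _, seq in long_enough)
    target_bases = total_bases * keep_percent / 100.0

    kept: List[Read] = []
    kept_bases = 0
    for name, seq in sorted(long_enough, key=lambda r: len(r[1]), reverse=True):
        if kept_bases >= target_bases:
            break
        kept.append((name, seq))
        kept_bases += len(seq)
    return kept


# Toy data: four made-up reads of 5000, 3000, 2000, and 500 bases.
toy_reads = [("a", "A" * 5000), ("b", "A" * 3000),
             ("c", "A" * 2000), ("d", "A" * 500)]
kept_names = [name for name, _ in filter_reads(toy_reads, 1000, 80.0)]
print(kept_names)  # ['a', 'b']
```

In the toy run, read "d" is dropped by the 1000 bp threshold, and read "c" is dropped because the two longest reads already cover 80% of the remaining bases.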
At the time of recording, Flye was the only long-read-only assembler available in KBase. Flye has options for PacBio and ONT read types. The basecalling parameters are set as well. Flye also has a metagenome or “uneven coverage” mode. The QUAST report indicated a single contig, and the report can be expanded for more detail.
Allen also shared a hybrid assembly with long and short reads. If you have a long-read assembly and short reads from the same sample, Polypolish can use the short reads to identify potential errors in the long-read assembly. This approach is one I would like to try in class. Unicycler is also a hybrid assembler available in KBase. HybridSPAdes is also available.
Prokka can be used for annotation. Allen suggested DRAM, an annotation tool that uses multiple reference libraries to assign annotations. DRAM also produces functional gene heat plots. There is also a DRAM for viral annotation. After annotation, SpeciesTree can be used to compare the isolate to nearby neighbors. Compute ANI with FastANI can then be run against those nearby neighbors. Allen then used the GTDB app to build taxonomies and a phylogenetic tree.
During the question session, Allen shared that you can visualize a genome with Circos. Jorg is also a beta app to improve and circularize single-genome assemblies and MAGs. In the paper Allen and the high school intern published, they compared Flye with another assembler (Trycycler) run outside of KBase. Allen noted that there is a webinar on the use of DRAM, which I will watch next!
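The single-contig QUAST result made me want to see how basic assembly statistics are computed. QUAST reports values such as the number of contigs, total length, and N50. Below is a small Python sketch of those three numbers. It is not QUAST's code, and the function name `assembly_stats` and the example contigs are hypothetical.

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, int]:
    """Return contig count, total length, largest contig, and N50."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)

    # N50: the length of the contig at which the running sum of
    # contig lengths (longest first) reaches half of the total length.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_length": total,
        "largest_contig": lengths[0] if lengths else 0,
        "n50": n50,
    }


# A fragmented toy assembly: contigs of 40, 30, 20, and 10 bases.
fragmented = ["A" * 40, "A" * 30, "A" * 20, "A" * 10]
print(assembly_stats(fragmented)["n50"])  # 30

# A single-contig toy assembly: N50 equals the contig length.
single = ["A" * 100]
print(assembly_stats(single)["n50"])  # 100
```

For a single-contig assembly like the one in the webinar, the contig count is 1 and N50 is simply the length of that contig.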

How can KBase help assemble and analyze bacterial genomes?