Tonight, I watched the Microbial Annotation Workshop on the KBase YouTube channel. The session was posted on July 9, 2024, and was titled “Implementation of OMEGGA through new KBase model building and gapfilling apps.” The KBase narrative used in the workshop is publicly available; its static version is called “GSP 2024: Model building and gapfilling using Probabilistic Annotation, MS2, and OMEGGA.” The narrative begins with the assembly of a Rhodococcus genome. The genome is annotated with RAST, and the annotations include SEED functional roles. They also annotated it with other tools, including DRAM, which assigns KEGG and EC numbers, and another annotation step adds PDB metadata. The annotations from these several tools can then be compared and analyzed.
Next, Bill Nelson from the Pacific Northwest National Laboratory presented a session entitled “Snekmer: Degenerate k-mer analysis for functional classification of protein sequences.” Nelson spoke about “the annotation problem”: “many bacterial proteins cannot be assigned a specific biochemical function.” His team developed Snekmer, which represents protein sequences with a reduced character set “by leveraging property similarities between amino acids” and takes advantage of the computational efficiency of k-mer profiling. Nelson explained that they are essentially grouping amino acids into classes, for example hydrophobic or hydrophilic. Snekmer Learn takes a training set of protein sequences and builds k-mer profiles using that reduced “alphabet.” Snekmer Apply then uses those trained profiles and a cosine similarity matrix to classify unknown proteins. The tool has been implemented in KBase as the Snekmer Apply app. The input is a protein sequence object or a genome object, and three ontologies are available: Pfam, PANTHER, and TIGRFAMs. Nelson noted that runs are relatively quick. The app returns an annotated genome object and a table.
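To make the reduced-alphabet idea more concrete, here is a minimal Python sketch. It assumes a hypothetical two-letter alphabet (hydrophobic residues become "h", everything else "p") and made-up toy sequences. It is not Snekmer's code, and Snekmer's real alphabets and scoring differ; the names learn_profiles and apply_profiles are hypothetical stand-ins for the Learn and Apply steps, and choosing the family with the highest cosine similarity is a simplification.

```python
from collections import Counter
from math import sqrt

# Hypothetical two-letter alphabet for illustration only:
# hydrophobic residues map to "h", all other residues map to "p".
HYDROPHOBIC = set("AVLIMFWC")


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq.upper())


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in the reduced sequence."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def learn_profiles(training):
    """Hypothetical 'Learn' step: pool k-mer counts for each family."""
    profiles = {}
    for family, seqs in training.items():
        total = Counter()
        for s in seqs:
            total.update(kmer_profile(s))
        profiles[family] = total
    return profiles


def apply_profiles(profiles, query):
    """Hypothetical 'Apply' step: return the most similar family and its score."""
    q = kmer_profile(query)
    scores = {fam: cosine_similarity(q, prof) for fam, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up toy training data: one mostly hydrophobic family, one hydrophilic family.
toy_training = {
    "family_A": ["LLVVAAIILL", "LVAIVLLAVI"],
    "family_B": ["KDESKRNDEQ", "SKDNEQRKTD"],
}
print(apply_profiles(learn_profiles(toy_training), "VLLIAVLAIV"))  # prints ('family_A', 1.0)
```

Collapsing twenty amino acids into a few classes makes the k-mer space much smaller, so proteins with similar physicochemical patterns can share k-mers even when their exact residues differ.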
Nelson mentioned that the Snekmer annotations do not yet integrate with KBase metabolic modeling, though the team is working on that integration.
Patrik D’haeseleer and Jeff Kimbrel from Argonne National Laboratory presented on “Probabilistic Annotation and Ensemble Metabolic Modeling in KBase.” They noted that 30-50% of genes still lack any annotation, and for others, different annotation tools disagree. They developed a set of KBase apps that import third-party annotations and merge them, with probabilities, for metabolic modeling. The apps are still in development and will bring probabilistic/ensemble modeling to KBase. Users can run tools that assign probabilities to the annotations from different sources and then import those data into KBase. Kimbrel shared output from the Compare Metabolic Models app and showed how the Merge Metabolic Annotations app can then be used to build a model. Kimbrel noted that they are working on customization options for this app.
Hyun-Seob Song from the University of Nebraska-Lincoln presented on “Integration of Phenotype and Multi-omics Data for Metabolic Network Reconstruction in KBase with the OMEGGA Tool.” The OMics-Enabled Global GApfilling (OMEGGA) algorithm was developed to increase model accuracy, and the team implemented it through new KBase model building and gapfilling apps. The two apps are called MS2 – Build Prokaryotic Metabolic Models with OMEGGA and MS2 – Improved Gapfill Metabolic Models with OMEGGA. The team has validated the approach with E. coli and with some non-model organisms.
Jeremy Jacobson from the Pacific Northwest National Laboratory then presented on “Building models, gapfilling, and omics integration.” Jacobson walked through the KBase narrative steps that use growth data to build a model. Before gapfilling, the model predicted no growth for the Rhodococcus isolate; gapfilling added dozens of reactions. Janaka Edirisinghe from Argonne National Laboratory then shared outputs and results, showing how adding expression data and experimental evidence helps fill gaps.
The resulting maps highlight the gapfilled reactions in green and let users look up information about each reaction. Several semesters ago, I used KBase for metabolic modeling in a half-semester course we called “Metabolic Modeling.” I love how the new tools let users do some of the things students requested!
