Tonight I watched another KBase Science Session: Data integration to support (or refute) predictions. Bill Nelson, from the Pacific Northwest National Laboratory (PNNL), presented a session titled “Integrating data to predict functions for gaps in metabolic models.” The work was part of two PNNL SFA projects: a soil microbiome project and a persistence control SFA. Both projects are collecting multi-omic data, and both are running into gaps in metabolic modeling. Nelson explained that they have genes with unknown function and nutrient/metabolite transformations for which the biochemistry is unknown. He also explained that current gap-filling of metabolic models assumes missing required reactions are present and attempts to lower the function classification thresholds. Instead, Nelson and the team wanted to apply machine learning models and improvements. The first tool Nelson shared was MetaPathPredict, which helps predict complete KEGG modules from incomplete KEGG ortholog data. It was trained on 24,000 complete genomes and 293 KEGG modules. In evaluation, MetaPathPredict had high accuracy even with incomplete information. The second tool was OMEGGA: OMics-Enabled Global GApfilling. The tool is now available in KBase! OMEGGA performs global gapfilling, integrating transcriptomic, proteomic, and metabolomic data where available to improve the models. The last tool Nelson shared was Snekmer, a rapid annotation tool. Snekmer predicts the function of proteins by recoding protein sequences into smaller alphabets and taking protein families into account. This tool has been published and is available in KBase. Nelson concluded that a user can combine these tools by using Snekmer for annotation and then MetaPathPredict and OMEGGA to integrate data and predict functions. One question for Nelson was how to incorporate information from growth data (for example, Biolog plates) for conditions without growth.
The response was that some of the tools can remove elements of models to prevent false positives.
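
To make the "smaller alphabets" idea behind Snekmer concrete for myself, here is a minimal Python sketch of recoding a protein sequence into a reduced alphabet and then counting short subsequences (k-mers) as features. This is not Snekmer's code or API. The six-group alphabet, the function names `recode` and `kmer_counts`, and the choice of k-mer counting as the feature step are all my own assumptions for illustration.

```python
from collections import Counter

# Hypothetical 6-letter reduced alphabet: amino acids grouped by rough
# physicochemical class. Illustrative only, not Snekmer's actual scheme.
REDUCED_GROUPS = {
    "H": "AVLIMC",  # hydrophobic / aliphatic
    "A": "FWYH",    # aromatic
    "P": "STNQ",    # polar, uncharged
    "B": "KR",      # basic (positively charged)
    "N": "DE",      # acidic (negatively charged)
    "S": "GP",      # special backbone conformation
}

# Invert the grouping into a per-residue lookup table.
RECODE = {aa: code for code, members in REDUCED_GROUPS.items() for aa in members}


def recode(sequence: str) -> str:
    """Map each residue to its reduced-alphabet letter ('X' if unknown)."""
    return "".join(RECODE.get(aa, "X") for aa in sequence.upper())


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in the recoded sequence."""
    reduced = recode(sequence)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


if __name__ == "__main__":
    print(recode("MKVLA"))          # HBHHH
    print(kmer_counts("MKVLA", 3))  # three k-mers: HBH, BHH, HHH
```

The appeal of a smaller alphabet is that sequences that differ only by similar residues collapse to the same string, so the feature space shrinks (6^3 = 216 possible 3-mers here instead of 20^3 = 8000). How Snekmer goes from such features to protein-family models is not something the talk summary above covers.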
