Continuing with the KBase Science Session: Data integration to support (or refute) predictions, tonight I watched Chris Henry from Argonne National Laboratory present on “Predicting Protein function using structure and sequence similarity in KBase.” Henry and team built a pipeline in KBase to analyze structure and sequence similarity data. Henry noted that KBase has a four-step functional discovery pipeline.
The first step of the project was to gather hypotheses for protein function from diverse algorithms. This step accumulates annotations, and the tool they developed compares annotation algorithms in a table. The team also added a PDB annotation app in collaboration with PDB. This tool provides the closest PDB IDs and related data. The team then evaluated the annotation sources and methods. Interestingly, RAST had slightly higher performance, but DRAM provided more annotations. Henry also spoke about using AI and large language models (LLMs) for protein/DNA analyses.
For step 2, the group wants to integrate experimental data to identify evidence-supported functions. I was excited that they are evaluating this with PMI and ENIGMA collaborations using phenotypic data. The team is able to build a model and then use phenotypic data to improve gapfilling of metabolic models. This work resulted in the MS2 app. The PickAxe tool can predict novel pathways based on similarities.
Step 3 uses structure-based methods to test candidate genes. For this, they are importing structure and docking scores. Step 4 explores function in an evolutionary context using trees, sequence similarity networks (SSNs), and gene neighborhood networks. This process allows researchers to build and characterize protein families. Henry also spoke about integrating fungal data to address pathway gaps in fungal modeling. Tools that use structure to predict function were applied successfully to fungal genes.
For step 5, experimental validation, the team is developing automated lab workflows for knock-out and complementation validation in a self-driven lab. The approach uses Ellen Neidle’s synthetic biology system. This ten-minute session was packed with new resources and collaborations. I would love to try some of this work with Delftia spp. and in courses!
