How LLMs Boost Scientific Research with KBase


Continuing with the KBase Science Session: Data integration to support (or refute) predictions, I watched the session by Paramvir Dehal and team. The title of the session was “Leveraging LLMs to Synthesize and Develop New Questions.” They talked about a KBase Research Assistant whose goal is to accelerate science by helping with tasks such as planning analyses, adjusting parameters, and summarizing results. The assistant could take a user’s goal, help design a workflow, and summarize the results. As a model, they used a genome white paper. The team tried to follow the KBase principles of addressing a genomics research gap while protecting and sharing data.
The team started with a chatbot for user analysis. They drew on all the KBase documentation and tutorials and the KBase Educators Handbook! The chatbot could then guide users through analyzing their data. The research team used several LLMs, including GPT-4. The MRA template for isolates created by Zack Crocket was used for training. This included the MRAs that KBase Education users put together!
The user journey starts with stating a goal, then continues with approving the steps suggested by the bot, running the narrative apps, summarizing results, recommending next steps, and doing the writing. On the agent side, there are several types of agents: analyst, narrative, job, workspace, and writer agents. As an example, the analyst can suggest trimming if adapter contamination is detected. Interestingly, the agents can “fight” with each other.
The agents are trained on information and need to use tools. They are using retrieval-augmented generation (RAG): “a technique for enhancing the accuracy and reliability of LLM models with facts fetched from external sources like documentation.” Another AI component is a Knowledge Graph (KG) populated from the KBase App Catalog. When the agent identifies a step, the KG helps select the tool.
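To make the RAG and KG ideas above more concrete for myself, here is a minimal Python sketch. It is not the team's implementation: the documentation snippets, the app names, and the functions `retrieve`, `build_prompt`, and `select_apps` are all hypothetical placeholders, and the retrieval is a simple word-overlap score rather than whatever the KBase Research Assistant actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical documentation snippets (not real KBase text).
DOCS = {
    "trimming": "Trim adapter sequences from reads when adapter contamination is detected.",
    "assembly": "Assemble trimmed reads into contigs for an isolate genome.",
    "annotation": "Annotate an assembled genome to predict genes and functions.",
}

# Hypothetical knowledge graph: analysis step -> candidate apps (placeholder names).
APP_GRAPH = {
    "trimming": ["example_trim_app"],
    "assembly": ["example_assembly_app"],
    "annotation": ["example_annotation_app"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs=DOCS, k=1):
    """Return the names of the k snippets most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs,
        key=lambda name: cosine(q, Counter(tokenize(docs[name]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query):
    """Prepend the retrieved snippets to the question (the 'augmented' part of RAG)."""
    context = "\n".join(DOCS[name] for name in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


def select_apps(step, graph=APP_GRAPH):
    """Look up which apps the graph links to an identified step."""
    return graph.get(step, [])


if __name__ == "__main__":
    question = "My reads show adapter contamination"
    step = retrieve(question)[0]
    print(build_prompt(question))
    print("Suggested apps:", select_apps(step))
```

In this toy version, the question about adapter contamination retrieves the trimming snippet, and the graph lookup then maps that step to a candidate app. A real system would send the augmented prompt to an LLM and use a much richer graph.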
The research team is now working out guardrails and performing user testing. One audience member asked whether the research team had reached out to journals to consider potential publication challenges. This is an intriguing question! Another audience question was about which datasets to use for training. I am excited about the potential of AI agents. I hope I can be a tester!

How can AI be used to improve KBase analyses? AI-generated image.