Unlocking Knowledge Extraction with KBase

I didn’t know KBase could be used for knowledge extraction from literature! Tonight I watched the session by Shinjae Yoo from Brookhaven National Laboratory titled “Knowledge extraction from literature.” It was part of the KBase Science Session “Data integration to support (or refute) predictions,” which I started watching yesterday. The study focused on synthetic biology, and the team wanted to accelerate the sharing of tools. For this, an automated approach is highly desirable. Yoo used machine learning approaches to “automate harvesting of synthetic biology knowledge from the literature.”

Support from previous funding helped the research team improve table and figure data extraction. Extracting data from figures is more challenging than from tables, which can be detected and interpreted with optical character recognition (OCR). For figures, large language models can be used to extract information from charts. For protein-protein interactions, the team developed tools to mine information from databases. For a related project, Yoo identified organism hosts and genetic data using large language model (LLM) evaluation. Accuracy improved from 70% to 94% when more contextual information was provided. Yoo also used keywords to automatically recognize biological entities and genetic tools in bioRxiv articles (71K).

KBase is prototyping a chatbot interface that lets a user ask questions, and the session ended with a short demo of it. Yoo emphasized that the information logged can be used to further improve the application. This could be very useful in courses!
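To make the keyword idea concrete for myself, here is a minimal Python sketch of keyword-based entity tagging. It is not Yoo's pipeline: the `KEYWORDS` lists, the `tag_entities` function, and the example abstract are all hypothetical and only illustrate how whole-word, case-insensitive matching against a vocabulary might work.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; the talk did not share the actual vocabulary.
KEYWORDS = {
    "organism": ["Escherichia coli", "Pseudomonas putida", "Saccharomyces cerevisiae"],
    "genetic_tool": ["CRISPR", "plasmid", "promoter"],
}


def tag_entities(text, keywords=KEYWORDS):
    """Return {category: sorted keywords found in text}.

    Matching is case-insensitive and restricted to whole words,
    so "plasmids" does not count as a hit for "plasmid".
    """
    found = defaultdict(set)
    for category, terms in keywords.items():
        for term in terms:
            # re.escape keeps any special characters in a term literal.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found[category].add(term)
    return {category: sorted(terms) for category, terms in found.items()}


# Made-up abstract sentence, used only to exercise the function.
abstract = "We built a CRISPR plasmid toolkit for Pseudomonas putida."
print(tag_entities(abstract))
```

Exact matching like this misses plurals, abbreviations, and synonyms. That limitation may be part of why extra context and LLM evaluation mattered in the work Yoo described, though that is my guess and not something stated in the session.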

How can a chatbot improve data extraction in KBase? AI-generated image.