Unlocking Knowledge Extraction with KBase

I didn’t know KBase could be used for knowledge extraction from literature! Tonight I watched the session by Shinjae Yoo from Brookhaven National Laboratory titled “Knowledge extraction from literature.” This was part of the KBase Science Session: Data integration to support (or refute) predictions, which I started watching yesterday. The primary focus of this study was synthetic biology, and the team wanted to accelerate the sharing of tools. For this, an automated approach is highly desirable. Yoo used machine learning approaches to “automate harvesting of synthetic biology knowledge from the literature.” Support from previous funding helped the research team improve table and figure data extraction. Figure extraction is more challenging than table extraction: tables can be detected and interpreted with optical character recognition (OCR), while for figures, large language models (LLMs) can be used to extract information from charts. For protein-protein interactions, they developed tools to mine information from databases. For a related project, Yoo used LLM evaluation to identify organism hosts and genetic data. Accuracy improved from 70% to 94% when more contextual information was provided. Yoo automatically recognized biological entities and genetic tools in bioRxiv articles (71K) using keywords. KBase is prototyping a chatbot interface that allows a user to ask questions. The session ended with a short demo of the chatbot interface. Yoo emphasized that the logged information can be used to further improve the application. This could be very useful in courses!

How can a chatbot improve data extraction in KBase? AI-generated image.

Credits

Website images were purchased from and edited in Canva.com. Blog post images are from the WordPress free image library powered by Pexels. Gallery images used were taken or created by Carlos C. Goller or otherwise attribution is stated. Blog posts represent my reflections and reference relevant sources of information, including conferences, podcasts, books, and workshops when applicable. I strive for proper attribution of sources and accessibility of content. I am still early in the journey. I appreciate feedback!
