OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei, Da Zheng, Haofen Wang, Huajun Chen

2024-12-31

Summary

This paper introduces OneKE, a schema-guided knowledge extraction system that uses multiple agents and a configurable knowledge base to extract structured knowledge from sources such as web pages and raw PDF books.

What's the problem?

Extracting useful information from different types of data, such as web pages and documents, can be complicated because the data often comes in many formats and structures. Existing methods may not handle this variety well, making it hard to get accurate information quickly.

What's the solution?

To solve this problem, the authors created OneKE, which uses a dockerized environment and multiple specialized agents to extract knowledge effectively. These agents work together to handle different tasks, like analyzing data formats and correcting errors. OneKE also has a configurable knowledge base that helps it adapt to various extraction scenarios, improving its overall performance. The system can process different types of texts, such as HTML and PDF, making it versatile for many applications.
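
The division of labor described above can be illustrated with a small Python sketch. It is not OneKE's code: the names `Schema`, `CaseBase`, `format_agent`, `extraction_agent`, and `correction_agent` are hypothetical, and regular expressions stand in for the LLM calls a real agent would make. It only shows the general shape of a schema-guided pipeline in which one step normalizes the input format, one fills the schema, and one applies corrections stored in a knowledge base.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Schema:
    """Hypothetical schema: field name -> regex whose first group is the value."""
    fields: Dict[str, str]


@dataclass
class CaseBase:
    """Hypothetical stand-in for a configurable knowledge base of corrections."""
    corrections: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def fix(self, name: str, value: str) -> str:
        # Return the stored correction for (field, value), or the value unchanged.
        return self.corrections.get((name, value), value)


def format_agent(raw: str) -> str:
    """Normalize the input: drop HTML tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", text).strip()


def extraction_agent(text: str, schema: Schema) -> Dict[str, Optional[str]]:
    """Fill each schema field (assumption: a real agent would prompt an LLM here)."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in schema.fields.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


def correction_agent(record: Dict[str, Optional[str]], cases: CaseBase) -> Dict[str, Optional[str]]:
    """Apply known corrections from the case base to the extracted values."""
    return {k: (cases.fix(k, v) if v is not None else None) for k, v in record.items()}


def run_pipeline(raw: str, schema: Schema, cases: CaseBase) -> Dict[str, Optional[str]]:
    """Chain the three steps: normalize, extract, correct."""
    return correction_agent(extraction_agent(format_agent(raw), schema), cases)


if __name__ == "__main__":
    schema = Schema(fields={"system": r"introduce (\w+)", "year": r"\b(20\d{2})\b"})
    cases = CaseBase(corrections={("system", "Oneke"): "OneKE"})
    html = "<p>We  introduce <b>Oneke</b>,\n released in 2024.</p>"
    print(run_pipeline(html, schema, cases))
```

In this sketch, changing the schema or adding a correction to the case base changes the output without touching the agent functions, which is the kind of adaptability the summary attributes to a configurable knowledge base.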

Why does it matter?

This research is important because it provides a powerful tool for gathering and organizing information from diverse sources. By improving how knowledge is extracted, OneKE can be used in many fields, such as science and news reporting, helping users find accurate information more efficiently. Additionally, the open-sourcing of the code allows others to build upon this work and further advance knowledge extraction technologies.

Abstract

We introduce OneKE, a dockerized schema-guided knowledge extraction system that can extract knowledge from the Web and raw PDF books and supports various domains (science, news, etc.). Specifically, we design OneKE with multiple agents and a configurable knowledge base. Different agents perform their respective roles, enabling support for various extraction scenarios. The configurable knowledge base facilitates schema configuration and error case debugging and correction, further improving performance. Empirical evaluations on benchmark datasets demonstrate OneKE's efficacy, while case studies further elucidate its adaptability to diverse tasks across multiple domains, highlighting its potential for broad applications. We have open-sourced the code at https://github.com/zjunlp/OneKE and released a video at http://oneke.openkg.cn/demo.mp4.
