Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev
2025-12-02
Summary
This paper introduces Wikontic, a new method for building knowledge graphs from text that aims to improve how large language models use structured information.
What's the problem?
Large language models are getting better, but they often struggle to reliably use factual knowledge. While knowledge graphs can help provide this knowledge, current systems usually just use them to find relevant text snippets, rather than truly leveraging the organized structure of the knowledge graph itself. Existing methods for creating these knowledge graphs can be inefficient and produce graphs with inconsistencies or redundancies.
What's the solution?
Wikontic tackles this by creating knowledge graphs in a few key steps. First, it extracts candidate facts (triplets) from text, including qualifiers that add context. Then, it checks these facts against Wikidata, a large existing knowledge base, to make sure the entity types and the relationships between them are consistent with its ontology. Finally, it normalizes entities so that different names referring to the same thing are merged, reducing duplication. The result is a smaller, more accurate, and well-organized knowledge graph.
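The three stages described above can be sketched as a toy pipeline. Everything here is illustrative: the entity names, the hand-written type and alias tables, and the helper functions are assumptions for the sketch, not the paper's actual implementation (which uses an LLM for extraction and real Wikidata constraints).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    obj: str


# Stage 1: candidate triplets (in Wikontic these would be extracted
# from text by an LLM; here they are hard-coded for illustration).
candidates = [
    Triplet("Marie Curie", "educated_at", "University of Paris"),
    Triplet("Marie Skłodowska-Curie", "award_received", "Nobel Prize in Physics"),
    Triplet("University of Paris", "award_received", "Marie Curie"),  # type-invalid
]

# Stage 2: toy ontology constraints in the spirit of Wikidata's
# subject/object type restrictions on properties.
ENTITY_TYPES = {
    "Marie Curie": "human",
    "Marie Skłodowska-Curie": "human",
    "University of Paris": "organization",
    "Nobel Prize in Physics": "award",
}
RELATION_CONSTRAINTS = {  # relation -> (required subject type, required object type)
    "educated_at": ("human", "organization"),
    "award_received": ("human", "award"),
}


def satisfies_ontology(t: Triplet) -> bool:
    """Keep a triplet only if its entity types match the relation's constraints."""
    subj_type, obj_type = RELATION_CONSTRAINTS[t.relation]
    return (ENTITY_TYPES.get(t.subject) == subj_type
            and ENTITY_TYPES.get(t.obj) == obj_type)


# Stage 3: entity normalization -- map aliases to one canonical name
# so the same real-world entity becomes a single graph node.
ALIASES = {"Marie Skłodowska-Curie": "Marie Curie"}


def normalize(t: Triplet) -> Triplet:
    return Triplet(ALIASES.get(t.subject, t.subject),
                   t.relation,
                   ALIASES.get(t.obj, t.obj))


# Filter then deduplicate: the invalid triplet is dropped and the
# alias collapses onto the canonical entity, leaving two clean edges.
kg = {normalize(t) for t in candidates if satisfies_ontology(t)}
```

The ordering matters: validating before normalizing keeps obviously ill-typed extractions from ever reaching the graph, and deduplicating last ensures merged aliases do not reintroduce duplicate edges.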
Why it matters?
Wikontic is important because it shows that you can build high-quality knowledge graphs efficiently, and that these graphs can directly improve a language model’s ability to answer questions and retain information *without* needing to search through lots of extra text. It outperforms other methods in terms of both accuracy and speed of construction, offering a practical way to integrate structured knowledge into LLMs.
Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3× fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.