Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
Qika Lin, Tianzhe Zhao, Kai He, Zhen Peng, Fangzhi Xu, Ling Huang, Jingying Ma, Mengling Feng
2025-02-03

Summary
This paper introduces a new way to combine Knowledge Graphs (KGs) with Large Language Models (LLMs) using a method called Self-supervised Quantized Representation (SSQR). It's like creating a special code that helps these two different types of systems understand each other better.
What's the problem?
Knowledge Graphs and Large Language Models are like two smart friends who speak different languages. KGs store information in a structured way, like a complex web of facts, while LLMs understand and generate human-like text. The problem is that it's hard to get these two to work together smoothly because they process information so differently.
What's the solution?
The researchers came up with a clever two-step plan. First, they created a method called SSQR that takes all the complex information in a Knowledge Graph and turns it into a short code for each entity. This code is like a secret language that both the KG and the LLM can understand. Then, they taught the LLM how to use these codes by giving it special instructions. This way, the LLM can easily use the knowledge from the KG without getting confused by its complex structure.
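To make the "short code" idea concrete, here is a minimal sketch (in PyTorch, not the authors' actual SSQR implementation) of how continuous entity features can be snapped to a handful of discrete codes via a learned codebook. All names, sizes, and the random stand-in features are assumptions for illustration only.

```python
import torch

num_entities, embed_dim = 1000, 256   # hypothetical KG size and embedding width
codebook_size, code_len = 512, 16     # 16 discrete codes per entity, as the abstract reports

# Stand-in for the entity features a quantized-representation model would learn from the KG.
entity_features = torch.randn(num_entities, code_len, embed_dim)
codebook = torch.randn(codebook_size, embed_dim)   # learnable code vectors

# Each of the 16 positions is snapped to its nearest codebook entry, yielding a
# short integer "sentence" per entity that can later be mapped to new LLM tokens.
distances = torch.cdist(entity_features.reshape(-1, embed_dim), codebook)
codes = distances.argmin(dim=-1).reshape(num_entities, code_len)

print(codes[0])   # the 16-code sequence assigned to entity 0
```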
Why it matters?
This matters because it's like giving super smart AI systems a universal translator. By helping Knowledge Graphs and Large Language Models work together better, we can create AI that's both knowledgeable and good at understanding human language. This could lead to more accurate and helpful AI assistants, better search engines, and smarter decision-making systems in fields like healthcare or finance. Plus, it does this while using less computer power, which is great for making AI more efficient and accessible.
Abstract
Due to the natural gap between Knowledge Graph (KG) structures and natural language, the effective integration of holistic structural information of KGs with Large Language Models (LLMs) has emerged as a significant challenge. To this end, we propose a two-stage framework to learn and apply quantized codes for each entity, aiming for the seamless integration of KGs with LLMs. Firstly, a self-supervised quantized representation (SSQR) method is proposed to compress both KG structural and semantic knowledge into discrete codes (i.e., tokens) that align with the format of language sentences. We further design KG instruction-following data by viewing these learned codes as features to directly input to LLMs, thereby achieving seamless integration. The experimental results demonstrate that SSQR outperforms existing unsupervised quantized methods, producing more distinguishable codes. Furthermore, the fine-tuned LLaMA2 and LLaMA3.1 also achieve superior performance on KG link prediction and triple classification tasks, utilizing only 16 tokens per entity instead of the thousands required by conventional prompting methods.
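As a rough illustration of the instruction-following data described in the abstract, the sketch below shows how an entity's 16 learned codes might be rendered as special tokens and packed into a link-prediction training record. The prompt template, token format (e.g., "<kg_17>"), and function names are assumptions for illustration, not the paper's exact design.

```python
def entity_to_code_tokens(codes):
    """Render an entity's 16 quantized codes as LLM-ready special tokens."""
    return " ".join(f"<kg_{c}>" for c in codes)

def build_link_prediction_example(head_codes, relation, tail_codes):
    """One instruction-tuning record: head entity and relation in the prompt,
    the tail entity's code tokens as the expected answer."""
    return {
        "instruction": (
            f"Head entity: {entity_to_code_tokens(head_codes)}\n"
            f"Relation: {relation}\n"
            "Predict the tail entity."
        ),
        "output": entity_to_code_tokens(tail_codes),
    }

example = build_link_prediction_example(
    head_codes=[17, 503, 42, 8, 91, 7, 256, 33, 12, 400, 5, 77, 301, 19, 64, 2],
    relation="located_in",
    tail_codes=[9, 88, 41, 6, 73, 220, 15, 310, 27, 54, 130, 3, 67, 91, 18, 44],
)
print(example["instruction"])
print(example["output"])
```

With codes inlined this way, each entity costs only 16 extra tokens in the prompt, which is where the efficiency claim in the abstract comes from.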