Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering
Bolei He, Xinran He, Run Shao, Shanfu Shu, Xianwei Xue, Mingquan Cheng, Haifeng Li, Zhenhua Ling
2025-08-27
Summary
This paper focuses on improving how large language models, which excel at answering general questions, perform on specialized topics like medicine or law.
What's the problem?
Models can be improved either by giving them access to extra information at answer time (like a retrieved database) or by retraining them on domain-specific data, but both approaches have drawbacks. Retrieved information can be noisy and lead the model to make things up, while retraining is expensive and doesn't transfer to other domains. The core issue is that specialized knowledge follows a long-tail distribution, so the model fails to fully use what it *already* knows. Moreover, learning should happen in stages: understanding basic concepts before tackling complex reasoning.
What's the solution?
The researchers developed a system called Select2Know (S2K) that's designed to be efficient and effective. It lets the model decide when to rely on its internal knowledge versus look up external information, and fine-tunes selectively on only what the model lacks. It also uses a carefully designed pipeline to create training data that builds reasoning skills, and incorporates a reinforcement learning technique called GRPO (Group Relative Policy Optimization) to further boost those skills. Essentially, it's a smart way to teach the model to learn and apply specialized knowledge without massive retraining costs.
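The self-selection idea can be sketched as a simple confidence-gated router: answer from internal knowledge first, and fall back to retrieval only when the model's own answer is uncertain. This is a minimal illustrative sketch, not the paper's implementation; `generate_with_confidence`, `retrieve`, and the threshold are hypothetical stand-ins.

```python
# Illustrative sketch of internal-vs-external knowledge selection.
# All names here are hypothetical stand-ins, not the paper's API.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Answer:
    text: str
    confidence: float      # e.g. mean token probability mapped to [0, 1]
    used_retrieval: bool   # whether external documents were consulted


def answer_question(
    question: str,
    generate_with_confidence: Callable[[str], Tuple[str, float]],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.7,  # assumed cutoff; the real criterion may differ
) -> Answer:
    """Prefer the model's internal knowledge; consult retrieved documents
    only when the internal answer falls below the confidence threshold."""
    text, conf = generate_with_confidence(question)
    if conf >= threshold:
        # Internal knowledge suffices: no retrieval, no added noise.
        return Answer(text, conf, used_retrieval=False)

    # Low confidence: augment the prompt with retrieved evidence and retry.
    docs = retrieve(question)
    prompt = "\n".join(docs) + "\n\nQuestion: " + question
    text, conf = generate_with_confidence(prompt)
    return Answer(text, conf, used_retrieval=True)
```

With stub functions, a high-confidence generator answers directly while a low-confidence one triggers the retrieval path, which is the routing behavior the framework's self-selection strategy is built around.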
Why it matters?
This work is important because it offers a cheaper and more flexible way to make large language models experts in specific fields. S2K achieves performance comparable to models that have been fully retrained for those fields, but at a significantly lower cost, making specialized AI more accessible and adaptable.
Abstract
Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate GRPO to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs with significantly lower cost.