Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
Jihao Zhao, Chunlai Zhou, Biao Qin
2025-05-07
Summary
This paper introduces a way to help small language models, like those used in chatbots or virtual assistants, avoid presenting made-up information as fact, a failure known as hallucination. The researchers propose a metric called AttenHScore that estimates in real time whether the model's answer is likely to be wrong, so the system only calls on extra resources or checks when it is really necessary.
What's the problem?
The problem is that small language models sometimes make mistakes and give answers that sound right but are actually made up. It's hard to catch these mistakes quickly and efficiently, especially without constantly retraining the model or using a lot of extra computing power.
What's the solution?
To solve this, the researchers created AttenHScore, a metric that detects, while a model is answering a question, whether it may be hallucinating. The system then invokes more advanced checks or interfaces only when the answer looks unreliable, instead of doing so every time. This improves accuracy while saving resources.
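The score-then-escalate idea can be sketched as follows. Note this is only an illustration of the invocation pattern, not the paper's method: the real AttenHScore is computed from the model's internals, whereas `toy_h_score` below is a hypothetical proxy based on token probabilities, and the model functions and threshold are stand-ins.

```python
import math

def toy_h_score(token_probs):
    """Hypothetical stand-in for AttenHScore: score the answer by its most
    uncertain token (max negative log-probability). Higher = more suspect."""
    return max(-math.log(p) for p in token_probs)

def adaptive_answer(question, small_model, large_model, threshold=1.5):
    """Invoke the costlier large-model interface only when the small
    model's answer looks unreliable (score above threshold)."""
    answer, token_probs = small_model(question)
    if toy_h_score(token_probs) > threshold:
        # Suspected hallucination: escalate to the larger model.
        return large_model(question)
    # Confident answer: keep the cheap small-model response.
    return answer
```

In use, `small_model` would return its generated text together with per-token probabilities, and `large_model` would be the expensive API called only on escalation.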
Why it matters?
This matters because it helps make AI systems more trustworthy and efficient, especially when they're used in real-time situations like customer service or online help. By catching mistakes as they happen and only using extra checks when needed, the system can stay fast and accurate without wasting time or energy.
Abstract
A new metric called AttenHScore is proposed to dynamically detect hallucinations in small language models, enhancing real-time performance and reducing the need for additional training.