From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
2024-07-10

Summary
This paper examines how large language models (LLMs) behave when they are uncertain about what to generate. It categorizes the kinds of failure models fall back on under uncertainty, such as repeating themselves, producing vague or nonsensical text, and fabricating information, and refers to these as fallback behaviors.
What's the problem?
The main problem is that when LLMs lack the information needed to give a reliable answer, they still produce something, and that something is often undesirable: repeated phrases, vague or incoherent text, or confidently stated but fabricated facts (hallucinations). Understanding how and why these fallback behaviors occur is crucial for improving the reliability of these models.
What's the solution?
To address this, the authors run experiments on models from the same family that differ in the amount of pretraining data, parameter count, or whether they were instruction-tuned, and analyze how fallback behaviors change along each axis. They categorize the behaviors into three types: sequence repetitions (looping over the same text), degenerate text (vague or incoherent responses), and hallucinations (fabricated information). Their experiments reveal a consistent ordering: as a model becomes more advanced (trained on more tokens, given more parameters, or instruction-tuned), its dominant fallback shifts from sequence repetitions to degenerate text and finally to hallucinations. They also show that common decoding techniques, such as random sampling, can reduce one failure mode (sequence repetitions) while increasing another that is harder to detect (hallucinations), so the choice of decoding strategy involves a trade-off, illustrated in the sketch below.
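As a concrete illustration of that decoding trade-off, here is a minimal sketch (not the authors' evaluation code) that toggles between greedy decoding and random sampling with Hugging Face transformers and flags crude repetition loops in the output. The model name "gpt2", the prompt, and the n-gram thresholds are illustrative assumptions, not choices from the paper.

```python
# Minimal sketch: compare greedy decoding with random sampling and flag
# sequence-repetition loops. "gpt2" is a small stand-in model, not the
# models studied in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

def has_sequence_repetition(token_ids, n=5, min_repeats=3):
    """Return True if some n-gram of token ids occurs at least `min_repeats`
    times, a rough proxy for the 'sequence repetition' fallback."""
    counts = {}
    for i in range(len(token_ids) - n + 1):
        gram = tuple(token_ids[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
        if counts[gram] >= min_repeats:
            return True
    return False

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "The capital of the fictional country of Zorlandia is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: deterministic, more prone to repetition loops.
greedy = model.generate(**inputs, max_new_tokens=60, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
# Random sampling: fewer loops, but the text may drift into fabricated claims.
sampled = model.generate(**inputs, max_new_tokens=60, do_sample=True,
                         temperature=0.9, top_p=0.95,
                         pad_token_id=tokenizer.eos_token_id)

for name, out in [("greedy", greedy), ("sampled", sampled)]:
    ids = out[0].tolist()
    print(name, "| repetition loop detected:", has_sequence_repetition(ids))
    print(tokenizer.decode(ids, skip_special_tokens=True), "\n")
```

The repetition check is only a rough proxy: degenerate text and hallucinations cannot be caught by surface statistics alone and require comparison against a reference or fact source, which is why the paper characterizes hallucinations as the harder-to-detect fallback.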
Why it matters?
This research is important because it helps us understand the weaknesses of LLMs when they face uncertainty. By identifying and categorizing these fallback behaviors, researchers can work on improving AI systems to make them more reliable and trustworthy. This is essential for applications where accurate information is critical, such as in education, healthcare, and customer service.
Abstract
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors, across all these axes: the more advanced an LLM is (i.e., trained on more tokens, has more parameters, or instruction-tuned), its fallback behavior shifts from sequence repetitions, to degenerate text, and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, might alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.