Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation

Damien Sileo

2024-07-19

Summary

This paper examines a failure mode of large language models (LLMs) when they are asked to recommend items missing from long lists. It introduces the concept of 'attention overflow': when the input list grows too long, the model loses track of its contents and starts suggesting items that are already in the list rather than genuinely missing ones.

What's the problem?

When LLMs are given long lists of items (roughly 100 or more for mid-2024 flagship models), they struggle to recommend the items that are actually missing. Instead of suggesting new items, they often repeat ones already in the list. This issue, called attention overflow, arises because preventing repetition requires attending to all the listed items at once, which becomes harder as the list grows.
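The paper's synthetic probe is easy to picture: a shuffled range of integers with one number removed, where the model must name the missing one. Here is a minimal sketch of such a probe, assuming a hypothetical `complete(prompt)` function that stands in for any LLM API (this is an illustration, not the authors' code):

```python
import random

def make_missing_number_task(n: int, seed: int = 0):
    """Build a shuffled list of 0..n-1 with one number removed.

    Returns the shuffled list and the missing number (ground truth).
    """
    rng = random.Random(seed)
    numbers = list(range(n))
    missing = rng.choice(numbers)
    numbers.remove(missing)
    rng.shuffle(numbers)
    return numbers, missing

# 150 items puts us past the ~100-item threshold reported for
# mid-2024 flagship LLMs.
shown, missing = make_missing_number_task(150)
prompt = f"One number between 0 and 149 is missing from this list. Which one? {shown}"
# answer = complete(prompt)            # hypothetical LLM call
# overflowed = int(answer) in shown    # repeating a shown item = attention overflow
```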

What's the solution?

The authors evaluated the problem on both synthetic tasks (like finding missing numbers in a shuffled range of integers) and realistic movie recommendation scenarios. They found that iterative loops, in which the model is re-queried and repeated suggestions are filtered out, can reduce repetition, but the cost of such loops grows with the repetition rate, limiting the models' ability to derive genuinely new suggestions from lengthy inputs. They suggest that better strategies are needed to help LLMs handle long lists without losing track of what has already been mentioned.
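The paper does not spell out its loop implementation; one plausible sketch of a repetition-filtering loop, again assuming the hypothetical `complete(prompt)` function from above, looks like this. Each repeated suggestion costs an extra model call, which is why the cost of the loop grows with the repetition rate:

```python
def recommend_novel(complete, prompt: str, shown: set, max_tries: int = 5):
    """Re-query the model until it suggests an item not already in `shown`.

    Every rejected (repeated) suggestion costs one extra model call, so the
    total cost grows with the model's repetition rate.
    """
    calls = 0
    for _ in range(max_tries):
        suggestion = complete(prompt).strip()
        calls += 1
        if suggestion not in shown:
            return suggestion, calls  # a genuinely novel item
        # Repetition detected: explicitly exclude it and re-query.
        prompt += f"\nDo not suggest {suggestion!r}; it is already in the list."
    return None, calls  # gave up: the model kept repeating input items
```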

Why it matters?

Understanding and addressing attention overflow is crucial for improving how LLMs make recommendations. By finding ways to enhance their performance with long inputs, we can create smarter AI systems that provide better suggestions in various applications, such as personalized recommendations in streaming services or online shopping.

Abstract

Large language models (LLMs) can suggest missing elements from items listed in a prompt, which can be used for list completion or recommendations based on users' history. However, their performance degrades when presented with too many items, as they start to suggest items already included in the input list. This occurs at around 100 items for mid-2024 flagship LLMs. We evaluate this phenomenon on both synthetic problems (e.g., finding missing numbers in a given range of shuffled integers) and realistic movie recommendation scenarios. We refer to this issue as attention overflow, as preventing repetition requires attending to all items simultaneously. Although iterative loops can mitigate this problem, their costs increase with the repetition rate, affecting the language models' ability to derive novelty from lengthy inputs.