Retrospective Learning from Interactions
Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi
2024-10-18

Summary
This paper introduces ReSpect, a method that lets large language models (LLMs) learn from past interactions with users by decoding implicit feedback signals, improving their task performance over time.
What's the problem?
When users interact with LLMs, they often give implicit feedback when the model doesn't respond as expected. For example, a user might rephrase their question or express frustration. However, LLMs typically don't learn from these interactions because they have no mechanism to capture and act on this feedback, which limits their ability to improve and adapt to user needs.
What's the solution?
To solve this problem, the authors introduced ReSpect, a method that enables LLMs to learn from past interactions by retrospectively decoding the implicit feedback users left behind. After deployment, the model revisits its past conversations, reads each user's follow-up turn to judge whether its earlier response succeeded, and uses those judgments as a training signal. Across thousands of human interactions, this process raised the model's task completion rate from 31% to 82%, all without any external annotation.
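To make the retrospection idea concrete, here is a minimal sketch of such a loop. It is an illustration under assumptions, not the paper's implementation: `call_llm`, `finetune`, and `FEEDBACK_PROMPT` are hypothetical stand-ins, and it uses a simple "train on positively-signaled turns" policy, whereas the paper also studies other ways of using the decoded feedback.

```python
# Hypothetical sketch of retrospective learning from implicit feedback.
# `call_llm` and `finetune` are stand-in callables, not real APIs.

FEEDBACK_PROMPT = """You are reviewing one turn of a past conversation.
The assistant produced this response to the user's instruction:

Instruction: {instruction}
Response: {response}

The user's next message was: {next_user_turn}

Did the user's next message signal that the response was satisfactory?
Answer with exactly one word: POSITIVE or NEGATIVE."""


def retrospect(interactions, call_llm):
    """Decode implicit feedback from logged interaction turns.

    `interactions` is a list of (instruction, response, next_user_turn)
    triples from past conversations. The same LLM that acted in the
    conversation is reused as its own feedback annotator: recognizing
    a rephrased request or frustration is easier than the task itself,
    so this can work even when the model failed the task.
    """
    positives = []
    for instruction, response, next_user_turn in interactions:
        verdict = call_llm(FEEDBACK_PROMPT.format(
            instruction=instruction,
            response=response,
            next_user_turn=next_user_turn,
        ))
        if verdict.strip().upper().startswith("POSITIVE"):
            positives.append((instruction, response))
    return positives


def learning_round(interactions, call_llm, finetune):
    """One round of the deploy -> retrospect -> retrain cycle:
    keep turns the feedback decoder judged successful and train
    on them (hypothetical supervised fine-tuning call)."""
    finetune(retrospect(interactions, call_llm))
```

In the deployment described by the paper, this deploy-retrospect-retrain cycle repeats over multiple rounds, which is what gradually drives the improvement in task completion reported above.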
Why it matters?
This research is important because it makes AI more responsive and adaptable in its interactions with people. By letting LLMs learn directly from their conversations with users, ReSpect can lead to better user experiences in applications like virtual assistants, customer service bots, and educational tools, helping AI become more useful and better aligned with human expectations.
Abstract
Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the LLM to identify them even if it fails on the actual task. This creates an avenue for continually learning from interactions without additional annotations. We introduce ReSpect, a method to learn from such signals in past interactions via retrospection. We deploy ReSpect in a new multimodal interaction scenario, where humans instruct an LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves task completion rate from 31% to 82%, all without any external annotation.