In-context learning and Occam's razor

Eric Elmoznino, Tom Marty, Tejas Kasetty, Leo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, Guillaume Lajoie

2024-10-22

Summary

This paper draws a formal connection between in-context learning, the emergent ability of sequence models such as Transformers to learn at inference time from the observations in their context, and Occam's razor, the principle that the simplest model explaining the data generalizes best.

What's the problem?

The goal of machine learning is generalization, and in practice the models that generalize best are simple models that nonetheless explain the training data: this is Occam's razor. Despite this, most current approaches only minimize the training error, at best promoting simplicity indirectly through regularization or architecture design. What has been missing is a principled account of whether and why in-context learners should prefer simple solutions.

What's the solution?

The authors show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, in which each observation in a sequence is encoded using a model fit only on the observations that came before it. Minimizing this loss therefore amounts to jointly minimizing both the training error and the complexity of the model implicitly learned from context, giving in-context learning a built-in form of Occam's razor. A toy sketch of prequential coding follows.
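As a toy illustration (our own construction, not code from the paper), the sketch below computes a prequential code length for a binary stream: each symbol costs -log2 p(symbol) bits under a Laplace-smoothed Bernoulli model fit only on the prefix seen so far, so the total is exactly a sum of next-token log losses.

import math

def prequential_code_length(bits):
    """Prequential (predict-then-update) code length, in bits, of a 0/1 stream."""
    ones, total_bits = 0, 0.0
    for t, b in enumerate(bits):
        p_one = (ones + 1) / (t + 2)       # Laplace rule of succession, fit on the prefix
        p = p_one if b == 1 else 1.0 - p_one
        total_bits += -math.log2(p)        # cost of encoding this symbol
        ones += b                          # only now does the model see the symbol
    return total_bits

print(prequential_code_length([1, 1, 0, 1, 1, 1, 0, 1, 1, 1]))  # ~8.95 bits vs. 10 for a uniform code

A model family that fits the stream quickly yields a short code, while an over-complex family pays for its extra capacity through early prediction errors; this trade-off is the simplicity bias the paper identifies in in-context learners.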

Why it matters?

This result gives a normative account of in-context learning: a sequence model trained with next-token prediction is implicitly rewarded for finding simple explanations of its context, which is exactly the property Occam's razor associates with good generalization. The theory and supporting experiments also expose shortcomings of current in-context learning methods and suggest concrete ways to improve them.

Abstract

The goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees for generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
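In notation of our own choosing (the paper may use different symbols), the equivalence stated in the abstract can be written as a single identity: the prequential code length of a dataset D = {(x_t, y_t)} for t = 1..T under a learner p_theta equals its summed next-token prediction loss,

\mathcal{L}_{\text{preq}}(\mathcal{D}) = \sum_{t=1}^{T} -\log p_\theta\big(y_t \mid x_t, \mathcal{D}_{<t}\big), \qquad \mathcal{D}_{<t} = \{(x_i, y_i)\}_{i<t},

and by the standard minimum description length decomposition this quantity is close to the training error of the final in-context model plus the number of bits needed to specify that model. Minimizing the next-token loss therefore minimizes fit and complexity jointly.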