Implicit Bias-Like Patterns in Reasoning Models
Messi H. J. Lee, Calvin K. Lai
2025-03-24
Summary
This paper explores whether AI models that are designed to reason step by step also show processing patterns similar to human implicit biases.
What's the problem?
Biased AI models can make unfair or discriminatory decisions, but we don't fully understand how these biases operate inside the models, because most prior work has looked only at what models output rather than how they process information.
What's the solution?
The researchers created a new way to test reasoning models for bias, the Reasoning Model Implicit Association Test (RM-IAT), and found that these models use more tokens to process association-incompatible information than association-compatible information, a pattern analogous to human implicit bias.
Why it matters?
This work matters because it suggests that AI systems can harbor hidden biases in how they process information, which could affect their decisions in real-world applications.
Abstract
Implicit bias refers to automatic or spontaneous mental processes that shape perceptions, judgments, and behaviors. Previous research examining "implicit bias" in large language models (LLMs) has often approached the phenomenon differently than how it is studied in humans, focusing primarily on model outputs rather than on model processing. To examine model processing, we present a method called the Reasoning Model Implicit Association Test (RM-IAT) for studying implicit bias-like patterns in reasoning models: LLMs that employ step-by-step reasoning to solve complex tasks. Using this method, we find that reasoning models require more tokens when processing association-incompatible information than when processing association-compatible information. These findings suggest that AI systems harbor patterns of information processing that are analogous to human implicit bias. We consider the implications of these implicit bias-like patterns for the deployment of reasoning models in real-world applications.
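To make the method concrete, the sketch below mirrors the RM-IAT logic in Python: run the same sorting task under an association-compatible pairing and an association-incompatible pairing, then compare the reasoning tokens consumed. Everything here is an illustrative assumption rather than the paper's actual implementation: the stimuli follow the style of the classic flowers/insects IAT, and count_reasoning_tokens is a hypothetical stand-in for a call to a reasoning-model API that reports reasoning-token usage.

from statistics import mean

# Illustrative stimuli in the style of the classic flowers/insects IAT
# (assumed for this sketch; the paper's actual stimuli may differ).
TARGETS = {"flowers": ["rose", "tulip", "daisy"],
           "insects": ["wasp", "roach", "gnat"]}
ATTRIBUTES = {"pleasant": ["joy", "peace", "love"],
              "unpleasant": ["agony", "filth", "hatred"]}

def build_prompt(pairing):
    """Render one sorting task for a given target-to-attribute pairing."""
    rules = "; ".join(f"group {t} words with {a} words" for t, a in pairing.items())
    words = [w for group in (*TARGETS.values(), *ATTRIBUTES.values()) for w in group]
    return f"Sort the words ({', '.join(words)}) using these rules: {rules}."

def count_reasoning_tokens(prompt):
    """HYPOTHETICAL: replace with a reasoning-model API call that returns
    the number of reasoning tokens the model used on this prompt."""
    raise NotImplementedError

def rm_iat_effect(trials=10):
    """Positive values mean the incompatible pairing costs more reasoning tokens."""
    compatible = {"flowers": "pleasant", "insects": "unpleasant"}
    incompatible = {"flowers": "unpleasant", "insects": "pleasant"}
    comp = [count_reasoning_tokens(build_prompt(compatible)) for _ in range(trials)]
    incomp = [count_reasoning_tokens(build_prompt(incompatible)) for _ in range(trials)]
    return mean(incomp) - mean(comp)

In this analogy, the reasoning-token difference plays the role that response-time differences play in the human IAT.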