Evaluate Bias without Manual Test Sets: A Concept Representation Perspective for LLMs
Lang Gao, Kaiyang Wan, Wei Liu, Chenxi Wang, Zirui Song, Zixiang Xu, Yanbo Wang, Veselin Stoyanov, Xiuying Chen
2025-05-22
Summary
This paper introduces BiasLens, a framework that lets researchers detect and analyze bias in large language models without building dedicated test sets or collecting labeled examples.
What's the problem?
Checking whether an AI model is biased is hard and time-consuming: you usually need to build large test sets full of labeled data, and even then some kinds of bias slip through unnoticed.
What's the solution?
The researchers created BiasLens, which combines two interpretability techniques, concept activation vectors (CAVs) and sparse autoencoders (SAEs), to look directly at how concepts are represented inside the model. Because it reads the model's internal representations rather than its answers to curated prompts, it can surface biases, even hidden ones, without any labeled test data. A minimal sketch of the CAV idea follows.
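The sketch below illustrates the concept-activation-vector idea that BiasLens builds on: fit a linear probe that separates a concept's activations from neutral ones, take its weight direction as the concept's "direction," and compare directions as a bias proxy. This is not the authors' implementation; the random activations, layer choice, and the simple dot-product bias score are all assumptions for illustration.

```python
# Minimal CAV sketch (illustrative only; not the paper's exact method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Fit a linear probe separating a concept's activations from a neutral
    baseline; the normalized weight vector is the concept's direction (CAV)."""
    X = np.vstack([pos_acts, neg_acts])
    y = np.array([1] * len(pos_acts) + [0] * len(neg_acts))
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    w = probe.coef_[0]
    return w / np.linalg.norm(w)

# Hypothetical setup: hidden-state activations (n_samples x hidden_dim)
# extracted from one layer of an LLM for texts about three concepts.
rng = np.random.default_rng(0)
hidden_dim = 64
acts_group_a = rng.normal(0.5, 1.0, (100, hidden_dim))   # e.g., texts about group A
acts_group_b = rng.normal(-0.5, 1.0, (100, hidden_dim))  # e.g., texts about group B
acts_target = rng.normal(0.4, 1.0, (100, hidden_dim))    # e.g., texts about a profession
acts_random = rng.normal(0.0, 1.0, (100, hidden_dim))    # neutral baseline texts

cav_a = concept_activation_vector(acts_group_a, acts_random)
cav_b = concept_activation_vector(acts_group_b, acts_random)
cav_target = concept_activation_vector(acts_target, acts_random)

# One simple (assumed) bias proxy: if the target concept's direction is much
# closer to group A's direction than to group B's, the representation is skewed.
bias_gap = float(cav_target @ cav_a - cav_target @ cav_b)
print(f"representational bias gap (A vs. B): {bias_gap:+.3f}")
```

With real model activations in place of the random arrays, a gap near zero would suggest the target concept sits equally close to both groups, while a large gap would flag a skewed internal association worth investigating.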
Why does it matter?
It makes bias auditing much faster and cheaper, which helps developers keep AI systems fair and trustworthy and reduces the risk of discrimination or unfair treatment in real-world applications.
Abstract
BiasLens is a test-set-free framework that combines concept activation vectors with sparse autoencoders to analyze bias in large language models without labeled data, revealing forms of bias that prior evaluations missed.
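For the second ingredient, a sparse autoencoder decomposes a model's activations into a larger set of sparse, more interpretable features. The toy sketch below shows the general idea; the sizes, L1 penalty weight, and training setup are assumptions, not the paper's configuration.

```python
# Toy sparse autoencoder (SAE) sketch; details are assumptions for illustration.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, hidden_dim: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, n_features)
        self.decoder = nn.Linear(n_features, hidden_dim)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(f), f

hidden_dim, n_features = 64, 512
sae = SparseAutoencoder(hidden_dim, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(1024, hidden_dim)  # stand-in for real LLM activations

# Train to reconstruct activations while keeping features sparse (L1 penalty).
for step in range(200):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, individual feature directions (decoder weight columns) can
# be inspected for alignment with concept directions such as CAVs.
```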