Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu
2025-02-04

Summary
This paper examines a problem called preference leakage, which arises when the AI models used to judge other AI systems are biased because they are too closely related to the models being evaluated. This bias can make evaluations unfair and unreliable.
What's the problem?
When one AI model is used to generate synthetic training data and another is used to evaluate the resulting models, the judge may favor the student models whose training data came from a generator it is related to. This relatedness can take several forms: the generator and judge may be the same model, belong to the same model family, or have a parent-child (inheritance) relationship, as illustrated in the sketch below. The resulting bias, called preference leakage, is hard to detect and contaminates the evaluation process, making it less trustworthy.
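To make the three relatedness types concrete, here is a minimal Python sketch of how one might label the relationship between a data-generator model and a judge model. The model names, lineage map, and family map are purely illustrative assumptions, not data or code from the paper.

```python
from enum import Enum

class Relatedness(Enum):
    SAME_MODEL = "same model"
    INHERITANCE = "inheritance (parent/child)"
    SAME_FAMILY = "same model family"
    NONE = "unrelated"

# Illustrative lineage and family metadata (assumed, not from the paper).
PARENT = {"gpt-4o-mini": "gpt-4o"}
FAMILY = {"gpt-4o": "gpt", "gpt-4o-mini": "gpt", "gemini-1.5-pro": "gemini"}

def relatedness(generator: str, judge: str) -> Relatedness:
    """Classify how a data-generator LLM relates to a judge LLM."""
    if generator == judge:
        return Relatedness.SAME_MODEL
    if PARENT.get(generator) == judge or PARENT.get(judge) == generator:
        return Relatedness.INHERITANCE
    if FAMILY.get(generator) is not None and FAMILY.get(generator) == FAMILY.get(judge):
        return Relatedness.SAME_FAMILY
    return Relatedness.NONE

print(relatedness("gpt-4o-mini", "gpt-4o"))  # Relatedness.INHERITANCE
```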
What's the solution?
The researchers studied preference leakage by defining three types of relatedness between the data-generating LLM and the judge LLM that can cause this bias. They ran experiments across multiple benchmarks and model setups to measure how much bias arises in each situation, and introduced a preference leakage score to quantify it, finding that the effect is widespread across scenarios. Their analysis also showed that preference leakage is harder to detect than other known biases in LLM-as-a-judge systems.
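The summary does not spell out the paper's preference leakage score, so the following is only a hypothetical Python sketch of one way such a bias could be quantified: compare how often a judge picks its related student model against how often an unrelated judge picks that same student. The function names and the toy judgment lists are assumptions for illustration, not the paper's exact formula.

```python
def win_rate(judgments: list[str], student: str) -> float:
    """Fraction of pairwise judgments that the given student model wins."""
    return sum(1 for winner in judgments if winner == student) / len(judgments)

def preference_leakage_score(related_judge_judgments: list[str],
                             unrelated_judge_judgments: list[str],
                             student: str) -> float:
    """Positive values suggest the related judge favors its own student
    more than an unrelated judge does (illustrative metric only)."""
    return (win_rate(related_judge_judgments, student)
            - win_rate(unrelated_judge_judgments, student))

# Toy usage with made-up judgments; "A" is the student trained on data
# from the generator related to judge 1.
judge1 = ["A", "A", "A", "B", "A"]   # judge related to student A's data generator
judge2 = ["A", "B", "B", "B", "A"]   # unrelated judge
print(preference_leakage_score(judge1, judge2, "A"))  # about 0.4
```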
Why it matters?
This research matters because it highlights a hidden problem in how AI systems are evaluated and trained. If evaluations are biased by preference leakage, the results can be misleading and progress toward better AI systems can be slowed. By understanding this issue, researchers can design more reliable evaluation methods and build future AI systems that are more accurate and trustworthy.
Abstract
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development. While their combination significantly enhances the efficiency of model training and evaluation, little attention has been given to the potential contamination brought by this new model development paradigm. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators. To study this issue, we first define three common relatednesses between data generator LLM and judge LLM: being the same model, having an inheritance relationship, and belonging to the same model family. Through extensive experiments, we empirically confirm the bias of judges towards their related student models caused by preference leakage across multiple LLM baselines and benchmarks. Further analysis suggests that preference leakage is a pervasive issue that is harder to detect compared to previously identified biases in LLM-as-a-judge scenarios. All of these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge. We release all codes and data at: https://github.com/David-Li0406/Preference-Leakage.