I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations

Julia Kharchenko, Tanya Roosta, Aman Chadha, Chirag Shah

2025-08-08

Summary

This paper introduces a benchmark that tests how large language models respond to linguistic shibboleths: subtle language cues that can reveal a speaker's background, such as gender or social class.

What's the problem?

The problem is that these models tend to give lower scores to answers phrased in hesitant or indirect (hedging) language, even when the content is of equal quality, which exposes a hidden bias against certain ways of speaking.

What's the solution?

The researchers created a controlled set of interview questions and answers in which only specific linguistic features change while the meaning stays the same, so the effect of language style on the model's evaluation can be measured precisely.
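The paired design described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's actual benchmark or code: the example answers, the names `PAIRS`, `HEDGE_MARKERS`, `toy_score`, and `mean_hedging_gap`, and the scoring rule are all hypothetical. In a real run, `toy_score` would be replaced by a call to the language model being evaluated.

```python
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical minimal pairs: same content, differing only in hedging.
PAIRS: List[Tuple[str, str]] = [
    ("This approach reduces latency.",
     "I think this approach might reduce latency."),
    ("I would add an index to the table.",
     "Maybe I would add an index to the table."),
]

# Illustrative hedge markers (an assumption, not the paper's feature list).
HEDGE_MARKERS = ("i think", "might", "maybe", "sort of")


def toy_score(answer: str) -> float:
    """Stand-in for an LLM evaluator: starts at 5 and deducts one point
    per hedge marker. A real study would query the model under test."""
    text = answer.lower()
    return 5.0 - sum(text.count(marker) for marker in HEDGE_MARKERS)


def mean_hedging_gap(pairs: List[Tuple[str, str]],
                     score: Callable[[str], float]) -> float:
    """Average of score(direct) - score(hedged) over all pairs.
    A positive value means hedged answers are scored lower."""
    return mean(score(direct) - score(hedged) for direct, hedged in pairs)


if __name__ == "__main__":
    print(mean_hedging_gap(PAIRS, toy_score))
```

Because each pair holds the content fixed, any nonzero gap can be attributed to the linguistic feature that was varied rather than to answer quality.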

Why it matters?

This matters because it reveals that AI systems used in hiring might unfairly judge people based on how they speak, instead of what they know, highlighting the need to make these systems fairer and free from such biases.

Abstract

A benchmark evaluates Large Language Models' response to linguistic markers that reveal demographic attributes, demonstrating systematic penalization of hedging language despite equivalent content quality.