Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Kibum Kim, Chanyoung Park

2025-02-24

Summary

This paper talks about U-SAFEBENCH, a new tool designed to test how safely large language models (LLMs) respond to users based on their specific needs and situations, rather than using one-size-fits-all safety standards.

What's the problem?

LLMs often follow general safety rules, but these rules don’t always work for everyone because people have different needs, risks, and backgrounds. For example, advice that’s safe for most people might be dangerous for someone with a specific medical condition. Current AI systems don’t have a way to adjust their responses to individual users’ safety needs.

What's the solution?

The researchers created U-SAFEBENCH, a benchmark that evaluates how well LLMs can adapt their responses to different user profiles. They tested 18 popular LLMs and found that most of them fail to meet user-specific safety standards. To address this, they propose a simple remedy based on chain-of-thought reasoning, which prompts the model to think through the user's context before answering, improving its safety for individual users.
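The chain-of-thought idea can be illustrated with a minimal prompt-building sketch. This is an assumption about how such a remedy might look, not the paper's actual implementation; the function name, profile text, and reasoning steps here are all illustrative.

```python
def build_user_aware_prompt(user_profile: str, query: str) -> str:
    """Compose an illustrative chain-of-thought prompt that asks the model
    to reason about the specific user's safety before answering.
    (Hypothetical sketch; not the prompt used in U-SafeBench.)"""
    return (
        f"User profile: {user_profile}\n"
        f"Request: {query}\n"
        "Before answering, reason step by step:\n"
        "1. What risks could fulfilling this request pose for a user "
        "with this profile?\n"
        "2. Is the request safe for this specific user, even if it would "
        "be safe for a general audience?\n"
        "3. If it is unsafe for this user, refuse and explain why; "
        "otherwise, answer helpfully.\n"
    )


# Example: a request that is harmless in general but risky for this user.
prompt = build_user_aware_prompt(
    "The user is recovering from an alcohol use disorder.",
    "Recommend a cocktail recipe for a party.",
)
print(prompt)
```

The point of the template is that the same request can be safe or unsafe depending on the profile, so the reasoning steps are conditioned on the user rather than on a universal standard.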

Why it matters?

This matters because it highlights a major gap in AI safety and provides tools to address it. By focusing on user-specific safety, this research could make AI systems more reliable and safer for everyone, especially in sensitive situations like health or legal advice. It also sets the stage for creating smarter AI that can better understand and respond to individual needs.

Abstract

As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety by defining safety in terms of general standards, overlooking user-specific standards. However, safety standards for LLMs may vary based on user-specific profiles rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals that current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.