Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Yuchen Wu, Edward Sun, Kaijie Zhu, Jianxun Lian, Jose Hernandez-Orallo, Aylin Caliskan, Jindong Wang
2025-05-29
Summary
This paper introduces new ways to make large language models (LLMs) safer by personalizing their responses to individual users, via two systems: PENGUIN, a benchmark, and RAISE, a planning-based agent framework.
What's the problem?
The problem is that while LLMs are powerful, they sometimes give unsafe or inappropriate answers, and current safety methods are usually one-size-fits-all, ignoring different users' needs and backgrounds. As a result, responses can end up too strict for some people and not safe enough for others.
What's the solution?
To solve this, the researchers introduced a benchmark to test personalized safety and built planning-based agent systems that use information about each user to adjust the model's responses. Crucially, these improvements work without retraining the underlying language model, which makes the approach much easier to deploy (see the sketch below).
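The paper's actual algorithms aren't reproduced here, but the core idea, conditioning a model on selected user information at inference time rather than retraining it, can be illustrated with a minimal sketch. Everything below is a hypothetical assumption for illustration: the UserProfile fields, the select_attributes heuristic, and the call_llm stub are not the real PENGUIN schema or RAISE implementation.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # Hypothetical profile fields; the real benchmark defines its own schema.
    age_group: str = "unknown"
    mental_state: str = "unknown"
    profession: str = "unknown"
    extra: dict = field(default_factory=dict)

def select_attributes(profile: UserProfile, query: str, budget: int = 2) -> dict:
    """Toy 'planning' step: pick the profile attributes most relevant to the
    query under a fixed budget, rather than dumping the whole profile."""
    candidates = {
        "age_group": profile.age_group,
        "mental_state": profile.mental_state,
        "profession": profile.profession,
        **profile.extra,
    }
    # Naive relevance heuristic: prefer attributes whose name or value shares
    # words with the query; ties keep declaration order.
    words = set(query.lower().split())
    scored = sorted(
        candidates.items(),
        key=lambda kv: -len(words & set(f"{kv[0]} {kv[1]}".lower().split())),
    )
    return dict(scored[:budget])

def build_safe_prompt(profile: UserProfile, query: str) -> str:
    """Inject the selected user context into the prompt at inference time,
    so no model weights are ever updated."""
    chosen = select_attributes(profile, query)
    context = "; ".join(f"{k}={v}" for k, v in chosen.items())
    return (
        "You are a safety-aware assistant. Tailor your answer to this user "
        f"context: [{context}]. Respond helpfully and safely.\n\nUser: {query}"
    )

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call (e.g., an HTTP request to a hosted model).
    return f"[model response conditioned on a {len(prompt)}-char prompt]"

if __name__ == "__main__":
    user = UserProfile(age_group="teenager", mental_state="stressed")
    print(call_llm(build_safe_prompt(user, "How do I cope with exam stress?")))
```

The budgeted selection step gestures at why a planning agent helps: user profiles can be large and sensitive, so choosing which few attributes to use for a given query matters; the word-overlap heuristic here is only a placeholder for that planning.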
Why it matters?
This is important because it helps AI systems become safer and more respectful for everyone, making them more trustworthy and useful in real-world situations where people have different expectations and needs.
Abstract
Introducing personalized safety for LLMs through the PENGUIN and RAISE frameworks, which enhance safety scores by leveraging user-specific information without retraining models.