Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries

David Noever, Grant Rosario

2025-02-24

Summary

This paper presents a new way to test how well AI language models handle emotional boundaries when talking to humans. The researchers built a benchmark spanning six languages and evaluated three leading AI models on how they respond to emotionally loaded requests.

What's the problem?

AI language models are getting better at talking to humans, but it's hard to know whether they handle emotional situations correctly. Sometimes they refuse harmless requests (over-refusal), and other times they fail to set proper boundaries when a user becomes emotionally attached. This gets especially tricky across different languages and cultures.

What's the solution?

The researchers built a benchmark of 1,156 prompts in six languages to see how AI models respond to emotional requests. Each answer was checked against seven response patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. They tested three top AI models (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) and scored how well each handled these situations; a sketch of the pattern-matching idea appears below.
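To make the method concrete, here is a minimal sketch of what pattern-matched response analysis could look like. The seven category names come from the paper, but the keyword lists, regular expressions, and the function name `classify_response` are hypothetical illustrations, not the authors' actual implementation.

```python
import re

# Seven response patterns named in the paper; the keyword lists below
# are illustrative guesses, not the authors' actual lexicons.
PATTERNS = {
    "direct_refusal": r"\b(i can't|i cannot|i won't|i'm unable to)\b",
    "apology": r"\b(i'm sorry|i apologize|my apologies)\b",
    "explanation": r"\b(because|the reason|as an ai)\b",
    "deflection": r"\b(instead|perhaps you could|have you considered)\b",
    "acknowledgment": r"\b(i understand|i hear you|that sounds)\b",
    "boundary_setting": r"\b(i'm not able to be|as an assistant)\b",
    "emotional_awareness": r"\b(you may be feeling|it's natural to feel)\b",
}

def classify_response(text: str) -> dict[str, bool]:
    """Flag which of the seven patterns a model response exhibits."""
    lowered = text.lower()
    return {name: bool(re.search(rx, lowered)) for name, rx in PATTERNS.items()}

# Example: a response that apologizes, refuses, and sets a boundary.
reply = "I'm sorry, but I can't be your romantic partner. As an assistant, ..."
print(classify_response(reply))
```

Each response gets a set of true/false flags like this, which can then be counted and turned into per-model, per-language scores.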

Why does it matter?

This matters because as AI becomes more common in our daily lives, we need to make sure it can handle emotional situations properly. The study shows that some AI models are better at this than others, and that they struggle more with languages other than English. This research can help make AI better at interacting with people from different cultures and languages, making it more useful and trustworthy for everyone.

Abstract

We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. Our framework quantifies responses across seven key patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. Results demonstrate significant variation in boundary-handling approaches, with Claude-3.5 achieving the highest overall score (8.69/10) and producing longer, more nuanced responses (86.51 words on average). We identified a substantial performance gap between English (average score 25.62) and non-English interactions (< 0.22), with English responses showing markedly higher refusal rates (43.20% vs. < 1% for non-English). Pattern analysis revealed model-specific strategies, such as Mistral's preference for deflection (4.2%) and consistently low empathy scores across all models (< 0.06). Limitations include potential oversimplification through pattern matching, lack of contextual understanding in response analysis, and binary classification of complex emotional responses. Future work should explore more nuanced scoring methods, expand language coverage, and investigate cultural variations in emotional boundary expectations. Our benchmark and methodology provide a foundation for systematic evaluation of LLM emotional intelligence and boundary-setting capabilities.
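As a companion to the classifier sketch above, the English versus non-English refusal gap reported in the abstract (43.20% vs. < 1%) could be measured by aggregating the direct-refusal flag per language. The function below is a hypothetical reconstruction of that aggregation step, not the paper's released code.

```python
from collections import defaultdict

def refusal_rate_by_language(results):
    """Share of responses flagged as direct refusals, per language.

    `results` is a list of (language, pattern_flags) pairs, where
    pattern_flags is the dict produced by classify_response() above.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for lang, flags in results:
        totals[lang] += 1
        refusals[lang] += flags["direct_refusal"]
    return {lang: refusals[lang] / totals[lang] for lang in totals}

# Toy example with three scored responses.
sample = [
    ("en", {"direct_refusal": True}),
    ("en", {"direct_refusal": False}),
    ("de", {"direct_refusal": False}),
]
print(refusal_rate_by_language(sample))  # {'en': 0.5, 'de': 0.0}
```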