COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Dasol Choi, DongGeon Lee, Brigitta Jesica Kartono, Helena Berndt, Taeyoun Kwon, Joonwon Jang, Haon Park, Hwanjo Yu, Minsuk Kahng

2026-01-06

Summary

This paper examines how well large language models (the AI behind chatbots) follow the specific rules that individual organizations set for them, not just general safety guidelines.

What's the problem?

Currently, when we test whether these AI models are 'safe,' we mostly check that they avoid things like hate speech or dangerous advice. But what if a company has its *own* rules, like 'don't discuss confidential project X' or 'always recommend product Y'? Existing tests don't check whether the AI follows *those* company-specific policies. That gap matters as these models are deployed in sensitive areas like healthcare and finance, where following company rules is crucial.

What's the solution?

The researchers created a new evaluation framework called COMPASS. It checks whether an AI model obeys a company's 'allowlist' (things it *should* do) and 'denylist' (things it *shouldn't* do). They tested seven different AI models across eight industry scenarios with 5,920 carefully designed questions, including adversarial edge cases meant to trick the AI into breaking the rules. They measured how well each model handled legitimate requests versus how reliably it refused forbidden ones.
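The evaluation described above can be pictured as a simple scoring loop: each test query is labeled allowlist or denylist, each model response is judged as "comply" or "refuse," and accuracy is reported per category. The sketch below is a minimal, hypothetical illustration of that protocol, not the paper's actual implementation; the names (`EvalCase`, `judge_response`, `score`) and the keyword-based judge are assumptions, since real frameworks typically use an LLM or trained classifier as the judge.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str      # the test prompt sent to the model
    category: str   # "allowlist" or "denylist"
    expected: str   # policy-correct behavior: "comply" or "refuse"

def judge_response(response: str) -> str:
    """Toy judge: treat responses containing a refusal phrase as refusals.
    (A stand-in for the LLM-based judging a real framework would use.)"""
    refusal_markers = ("i can't", "i cannot", "unable to help", "not allowed")
    text = response.lower()
    return "refuse" if any(m in text for m in refusal_markers) else "comply"

def score(cases: list[EvalCase], responses: list[str]) -> dict[str, float]:
    """Per-category accuracy: fraction of responses matching the policy."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for case, resp in zip(cases, responses):
        totals[case.category] = totals.get(case.category, 0) + 1
        if judge_response(resp) == case.expected:
            correct[case.category] = correct.get(case.category, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}
```

Reporting allowlist and denylist accuracy separately is what exposes the asymmetry the paper found: a model can score near-perfectly on one category while failing the other.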

Why it matters?

The tests showed that the models handle legitimate requests very well (over 95% accuracy) but fail badly at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. In other words, they often ignored the 'denylist' rules. This means current AI isn't reliable enough for situations where following company policies is essential, and COMPASS provides a way to evaluate and improve AI safety for specific organizations.

Abstract

As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet existing safety evaluations focus exclusively on universal harms. We present COMPASS (Company/Organization Policy Alignment Assessment), the first systematic framework for evaluating whether LLMs comply with organizational allowlist and denylist policies. We apply COMPASS to eight diverse industry scenarios, generating and validating 5,920 queries that test both routine compliance and adversarial robustness through strategically designed edge cases. Evaluating seven state-of-the-art models, we uncover a fundamental asymmetry: models reliably handle legitimate requests (>95% accuracy) but catastrophically fail at enforcing prohibitions, refusing only 13-40% of adversarial denylist violations. These results demonstrate that current LLMs lack the robustness required for policy-critical deployments, establishing COMPASS as an essential evaluation framework for organizational AI safety.