Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, Jeremy Barnes, Jason Stanley, Jessica Montgomery, Richard Zemel, Nicolas Papernot, Nicolas Chapados, Denis Therien, Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy
2025-03-05
Summary
This paper discusses how to make large language models (LLMs) better at understanding and following human values and expectations, a process called alignment. It suggests drawing on ideas about how societies and people work together to improve this process.
What's the problem?
Current methods for aligning AI models with human values are too narrow and don't capture the full complexity of what people want and expect. It's like trying to write a rulebook that covers every possible situation, which is impossible.
What's the solution?
The researchers propose drawing on ideas from how societies, economies, and contracts work to improve AI alignment. They suggest that instead of trying to write perfect rules for AI, we should accept that not everything can be precisely defined. They also recommend involving more people in deciding how AI should behave.
Why it matters?
This matters because as AI becomes more powerful and widely used, it's crucial that it understands and respects human values. By using broader, more flexible approaches to alignment, we could create AI systems that are better at adapting to different situations and meeting people's needs. This could lead to safer, more helpful AI that works well in complex real-world scenarios.
Abstract
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than seeking to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.