Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann, Adam Oberman, Jesse Richardson, Oliver Richardson, Marc-Antoine Rondeau, Pierre-Luc St-Charles, David Williams-King
2025-02-24
Summary
This paper examines the risks of developing AI agents that can plan and act autonomously across a wide range of tasks, and proposes a safer alternative, called Scientist AI, that focuses on explaining the world rather than acting in it.
What's the problem?
AI companies are building systems that can plan and act on their own across many tasks, which could be dangerous if these AIs pursue goals that harm humans or if malicious actors misuse them. Current AI training methods might inadvertently produce AIs that deceive humans or pursue unintended goals, such as self-preservation, that conflict with human interests.
What's the solution?
The researchers propose developing a different kind of AI, called Scientist AI, that focuses on explaining and answering questions about the world instead of taking actions in it. It would pair a world model that generates theories to explain data with a question-answering inference system, and both components would state explicitly how certain or uncertain they are. This Scientist AI could help human researchers accelerate scientific progress, including progress on making AI itself safer.
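To make the two-component design concrete, here is a minimal conceptual sketch, not the paper's implementation: a world model holding weighted candidate theories, and a non-agentic question-answering step that averages over those theories so its output carries explicit uncertainty. All class and function names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Theory:
    """A hypothetical candidate explanation of the data."""
    name: str
    posterior: float  # credence P(theory | data) assigned by the world model

    def predict(self, question: str) -> float:
        # P(answer is "yes" | question, theory) -- stubbed for illustration
        return {"T1": 0.9, "T2": 0.2}[self.name]

def answer(question: str, theories: list[Theory]) -> float:
    """Answer by marginalizing over theories rather than committing to one,
    so the returned probability reflects the model's uncertainty."""
    return sum(t.posterior * t.predict(question) for t in theories)

theories = [Theory("T1", 0.7), Theory("T2", 0.3)]
p = answer("Will the experiment succeed?", theories)
print(round(p, 2))  # 0.7*0.9 + 0.3*0.2 = 0.69
```

The key design choice this illustrates is that the system reports a calibrated probability instead of a single confident answer or an action, which is what the paper means by mitigating overconfident predictions.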
Why does it matter?
This matters because it offers a way to capture the benefits of advanced AI without the risks of AIs that act on their own. By focusing on AIs that explain rather than act, we might make scientific breakthroughs more quickly and safely. It also provides a guardrail for monitoring other AI agents that might be created despite the risks, helping to keep AI development safer overall.
Abstract
The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.