
R²AI: Towards Resistant and Resilient AI in an Evolving World

Youbang Sun, Xiang Wang, Jie Fu, Chaochao Lu, Bowen Zhou

2025-09-09


Summary

This paper discusses the growing problem of keeping artificial intelligence safe as it becomes more powerful, arguing that current approaches aren't enough to handle the complex risks involved.

What's the problem?

Currently, there are two main ways people try to make AI safe. One is to build powerful AI and *then* try to add safety features, which is like patching holes in a ship while it's already sailing – it's reactive and can easily be bypassed. The other is to try and build AI that's inherently safe from the start, but this struggles to anticipate all the unexpected things a truly intelligent AI might do in the real world. Basically, we're falling behind in safety as AI gets smarter, and existing methods aren't prepared for the future.

What's the solution?

The paper proposes a new approach called 'safe-by-coevolution,' inspired by how our immune system works. Instead of trying to create perfectly safe AI upfront, they suggest building AI systems where safety and capability constantly challenge and improve each other. They introduce a framework called R²AI, which pairs two safety models – a fast one that reacts quickly to known threats and a slow one that reasons more deliberately about new risks – with a simulated testing environment (a 'safety wind tunnel') that probes for weaknesses before they appear in the real world. This creates a continuous loop of improvement, making the AI both resistant to known threats and resilient to new, unforeseen risks.
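The loop described above can be sketched in code. Note that this is a minimal illustrative sketch, not the paper's implementation: the paper is a position paper and specifies only the roles (fast and slow safe models, an adversarial "wind tunnel," and a feedback loop), so every name and mechanic below is a hypothetical stand-in.

```python
import random

random.seed(0)

def wind_tunnel(known_threats, n_novel=2):
    """Adversarial simulation: replay known threats plus novel variants.
    Stand-in for the paper's 'safety wind tunnel'."""
    novel = [f"novel-{random.randint(0, 999)}" for _ in range(n_novel)]
    return known_threats + novel

class FastSafeModel:
    """Fast path: blocks threats it has already learned (resistance)."""
    def __init__(self):
        self.blocklist = set()
    def resist(self, threat):
        return threat in self.blocklist

class SlowSafeModel:
    """Slow path: analyzes failures and distills new defenses (resilience)."""
    def analyze(self, failures):
        # Stand-in for deliberate analysis and verification of each failure.
        return set(failures)

fast, slow = FastSafeModel(), SlowSafeModel()
known_threats = ["prompt-injection", "jailbreak"]

for round_idx in range(3):
    attacks = wind_tunnel(known_threats)
    # Failures are attacks the fast model could not resist.
    failures = [a for a in attacks if not fast.resist(a)]
    # Feedback loop: the slow model turns failures into new fast defenses,
    # so safety coevolves with each round of adversarial pressure.
    fast.blocklist |= slow.analyze(failures)
    known_threats = list(fast.blocklist)
    print(f"round {round_idx}: {len(failures)} failures, "
          f"{len(fast.blocklist)} defenses learned")
```

Each round, the wind tunnel surfaces novel attacks, the fast model's gaps become training signal for the slow model, and the defenses grow – a toy version of the continuous resistance/resilience loop the paper advocates.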

Why it matters?

This research is important because as AI gets closer to reaching human-level intelligence (and potentially surpassing it), the risks become much greater. This framework offers a way to proactively manage those risks, not just react to them, and provides a path towards building AI that remains safe even as it becomes incredibly powerful. It’s about ensuring AI benefits humanity in the long run, rather than posing an existential threat.

Abstract

In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose safe-by-coevolution as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI (Resistant and Resilient AI) as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates fast and slow safe models, adversarial simulation and verification through a safety wind tunnel, and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.