VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfister, Long T. Le

2025-10-08

Summary

This paper introduces VeriGuard, a new system designed to make AI agents, specifically those powered by large language models, much safer and more reliable when used in important areas like healthcare.

What's the problem?

When we start using AI to make decisions in sensitive fields, there's a real risk that the AI could do something unintended, break privacy rules, or even be tricked into doing harmful things. Current methods aren't good enough at *guaranteeing* that an AI will always behave safely and as expected, leaving room for errors with serious consequences.

What's the solution?

VeriGuard works in two stages. In the first, offline stage, it pins down exactly what the AI *should* do, turns that into a set of rules for safe behavior, and then thoroughly tests and mathematically proves that the agent's policy follows those rules. In the second, online stage, while the AI is actually running, VeriGuard checks each action the AI wants to take against those pre-verified rules, blocking anything that doesn't fit. This two-part process keeps the AI on track.
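To make the two-stage idea concrete, here is a minimal sketch of the online half: a runtime monitor that checks each proposed action against rules assumed to have been verified offline. All names (`RuntimeMonitor`, `no_external_data_sharing`, the action dictionary format) are illustrative inventions, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "policy" here is a predicate over proposed actions. In VeriGuard's
# scheme, each policy would have been tested and formally verified
# offline before being handed to the monitor.
Policy = Callable[[Dict], bool]

def no_external_data_sharing(action: Dict) -> bool:
    """Example rule: never send sensitive data to an external destination."""
    return not (action.get("type") == "send"
                and action.get("destination") == "external")

@dataclass
class RuntimeMonitor:
    """Online stage: validate each proposed agent action before execution."""
    policies: List[Policy]

    def approve(self, action: Dict) -> bool:
        # An action may execute only if every pre-verified policy allows it.
        return all(policy(action) for policy in self.policies)

monitor = RuntimeMonitor(policies=[no_external_data_sharing])
print(monitor.approve({"type": "read", "record": "r1"}))             # True
print(monitor.approve({"type": "send", "destination": "external"}))  # False
```

The key design point the paper emphasizes is this separation: the expensive verification happens once, offline, so the per-action online check stays lightweight (a simple lookup against approved rules) rather than re-proving safety on every step.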

Why it matters?

This research is important because it provides a way to build more trustworthy AI systems. By formally verifying that an AI agent is safe *before* and *during* its operation, we can have more confidence in using these powerful tools in areas where mistakes could be dangerous or damaging, like medical diagnosis or financial decisions.

Abstract

The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised by adversarial attacks. Mitigating these dangers necessitates a mechanism to formally guarantee that an agent's actions adhere to predefined safety constraints, a challenge that existing systems do not fully address. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents through a dual-stage architecture designed for robust and verifiable correctness. The initial offline stage involves a comprehensive validation process. It begins by clarifying user intent to establish precise safety specifications. VeriGuard then synthesizes a behavioral policy and subjects it to both testing and formal verification to prove its compliance with these specifications. This iterative process refines the policy until it is deemed correct. Subsequently, the second stage provides online action monitoring, where VeriGuard operates as a runtime monitor to validate each proposed agent action against the pre-verified policy before execution. This separation of the exhaustive offline validation from the lightweight online monitoring allows formal guarantees to be practically applied, providing a robust safeguard that substantially improves the trustworthiness of LLM agents.