Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao Yu, Xinchao Wang
2026-04-28
Summary
This paper is a comprehensive overview of the safety concerns that arise with Vision-Language-Action (VLA) models, which are AI systems that can understand both images and language to perform actions in the real world.
What's the problem?
As VLA models become more capable, they introduce safety risks that text-only AI systems don't face. Because VLAs physically interact with the world, mistakes can have irreversible real-world consequences. These systems can be attacked in several ways: through manipulated images, deceptive language, or by tampering with the data the AI learns from. Existing research on these risks is scattered across different fields, making it hard to get a complete picture of the threats and how to address them. And because these systems must react in real time, defenses have very little latency budget to work with.
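To make the manipulated-image threat concrete, below is a minimal, hedged sketch of a targeted one-step adversarial perturbation (FGSM-style) against a hypothetical VLA policy. The `policy(image, instruction)` interface, the differentiable continuous action output, and the perturbation budget are all illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(policy, image, instruction, target_action, eps=2 / 255):
    """One signed-gradient step that nudges a (hypothetical, differentiable)
    VLA policy toward an attacker-chosen action. Real VLA interfaces vary."""
    image = image.clone().detach().requires_grad_(True)
    action = policy(image, instruction)       # assumed continuous action tensor
    loss = F.mse_loss(action, target_action)  # distance to attacker's target
    loss.backward()
    # Step *against* the gradient to reduce the loss, within a small budget
    # so the change stays visually imperceptible.
    adv = image - eps * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()
```

The point of the sketch is the asymmetry it illustrates: a pixel-level change far below human perception can still steer a physical action.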
What's the solution?
The paper organizes the safety challenges of VLA models along two timing axes: *when* an attack can happen (during training or at inference time, while the system is running) and *when* defenses can be applied (also at training time or at runtime). It then examines existing research through four lenses: how VLAs can be attacked, how to defend against those attacks, how to evaluate the safety of these systems, and the safety issues that arise when deploying them in real-world settings. It covers threats such as poisoning the training data with misleading examples or subtly perturbing input images to cause errors, and links each threat to the defenses that can counter it.
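As a rough illustration of this two-axis organization, the sketch below places a few of the surveyed threat and defense categories on a training-time vs. inference-time grid. The category lists and the matching rule are simplified assumptions made for illustration.

```python
from enum import Enum

class Stage(Enum):
    TRAINING = "training-time"
    INFERENCE = "inference-time"

# Illustrative (not exhaustive) placement of threats and defenses on the
# survey's two timing axes; names follow the paper's terminology.
ATTACKS = {
    "data poisoning": Stage.TRAINING,
    "backdoor injection": Stage.TRAINING,
    "adversarial patch": Stage.INFERENCE,
    "semantic jailbreak": Stage.INFERENCE,
    "freezing attack": Stage.INFERENCE,
}
DEFENSES = {
    "dataset filtering": Stage.TRAINING,
    "adversarial training": Stage.TRAINING,
    "input anomaly detection": Stage.INFERENCE,
    "runtime action monitoring": Stage.INFERENCE,
}

def mitigations_for(attack: str) -> list[str]:
    """Defenses usable at the attack's own stage, plus runtime defenses,
    which can still catch training-time compromises at inference."""
    stage = ATTACKS[attack]
    return [name for name, s in DEFENSES.items()
            if s == stage or s == Stage.INFERENCE]
```

For example, `mitigations_for("data poisoning")` returns the training-time defenses as well as the runtime monitors, reflecting the paper's point that each class of threat should be linked to the stage at which it can be mitigated.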
Why it matters?
This work matters because VLA models are becoming increasingly common in robots and other systems that act in the physical world. Understanding and addressing their vulnerabilities is crucial to preventing accidents and misuse and to ensuring these technologies are deployed responsibly. The paper identifies key areas where more research is needed, such as certified guarantees that the AI will behave safely even in unexpected situations and defenses fast enough to run in real time.
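As one hedged example of what a real-time defense could look like, here is a minimal "action shield" that sits between the policy and the actuators and runs in constant time. Every limit, threshold, and function name is hypothetical; the survey discusses runtime defenses in far more depth.

```python
import numpy as np

JOINT_VEL_LIMIT = 1.0  # rad/s; assumed hardware bound, not from the paper
MAX_STEP_NORM = 0.5    # max allowed jump between consecutive commands

def shield(action: np.ndarray, prev_action: np.ndarray) -> np.ndarray:
    """Cheap, constant-latency check between a VLA policy and the robot:
    clamp commands to velocity limits and reject abrupt jumps, falling
    back to the previous (already vetted) command when a check fails."""
    action = np.clip(action, -JOINT_VEL_LIMIT, JOINT_VEL_LIMIT)
    if np.linalg.norm(action - prev_action) > MAX_STEP_NORM:
        return prev_action  # conservative fallback: hold the last safe command
    return action
```

A filter this simple cannot certify safety over long-horizon trajectories, which is exactly the kind of open problem the paper highlights.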
Abstract
Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. inference-time), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight key open problems, including certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.