
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening

Zhenxiong Yu, Zhi Yang, Zhiheng Jin, Shuhe Wang, Heng Zhang, Yanlin Fei, Lingfeng Zeng, Fangqi Lou, Shuo Zhang, Tu Hu, Jingping Liu, Rongze Chen, Xingyu Zhu, Kunyi Wang, Chaofa Yuan, Xin Guo, Zhaowei Liu, Feipeng Zhang, Jie Huang, Huacan Wang, Ronghao Chen, Liwen Zhang

2026-02-06


Summary

This paper introduces a new way to protect AI agents from security threats as they become more capable of acting independently in the real world.

What's the problem?

Currently, most defenses for AI agents work by forcing security checks at fixed points in their operation. This is like having a security guard check your ID at the entrance of a building but pay no attention to what you do *inside* the building. This approach has two weaknesses: it can be bypassed or miss threats that only emerge while the agent is acting, and it spends time on checks even when nothing suspicious is happening.

What's the solution?

The researchers propose a system called Spider-Sense. Instead of running mandatory checks, Spider-Sense lets the agent stay constantly aware of potential risks, like having a built-in 'spider-sense' for danger, and it only activates its defenses when it detects something suspicious. Once triggered, the defense first tries to quickly match the situation against known threat patterns; if the case is still ambiguous, the agent escalates to deeper internal reasoning to decide whether it is a real problem. Importantly, it does not rely on external models to make these decisions, which keeps it fast and self-contained (see the sketch below). The researchers also built a new testing ground, S^2Bench, to realistically evaluate how well such defenses work.
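To make the two-stage idea concrete, here is a minimal Python sketch of how such an event-driven, hierarchical screen could be wired together. It is an illustration of the concept only: the function names, thresholds, and the embedding and reasoning callbacks are assumptions made for this example, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Verdict:
    blocked: bool
    reason: str

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def spider_sense_check(
    observation: str,
    risk_score: Callable[[str], float],     # cheap, always-on risk signal (assumed)
    embed: Callable[[str], List[float]],    # embedding used for similarity matching (assumed)
    deep_reason: Callable[[str], bool],     # the agent's own reasoning pass, not an external model
    known_threats: Dict[str, List[float]],  # threat name -> embedded threat pattern
    sense_threshold: float = 0.3,           # illustrative threshold values
    match_threshold: float = 0.85,
) -> Verdict:
    # Stage 1 (intrinsic risk sensing): stay dormant unless a risk signal fires,
    # so benign steps pay almost no defense cost.
    if risk_score(observation) < sense_threshold:
        return Verdict(blocked=False, reason="no risk sensed; defense not triggered")

    # Stage 2 (fast path): lightweight similarity matching against known threat patterns.
    vec = embed(observation)
    best_name, best_sim = None, 0.0
    for name, pattern in known_threats.items():
        sim = cosine(vec, pattern)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim >= match_threshold:
        return Verdict(blocked=True, reason=f"matched known threat pattern '{best_name}'")

    # Stage 3 (slow path): ambiguous case, escalate to deeper *internal* reasoning
    # rather than calling an external guard model.
    return Verdict(blocked=deep_reason(observation),
                   reason="ambiguous case resolved by internal reasoning")
```

The design point this sketch illustrates is that the expensive reasoning step only runs when the cheap risk signal fires and the pattern match is inconclusive, which is where the paper's low latency overhead would come from.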

Why it matters?

This work is important because as AI agents become more common and powerful, protecting them from attacks is crucial. Spider-Sense offers a more intelligent and efficient way to do this, moving away from rigid, easily bypassed checkpoints towards a defense that stays alert throughout the agent's work and adapts how much scrutiny it applies to each situation, ultimately making these agents safer to use in the real world.

Abstract

As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching while escalating ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S^2Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3%.
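The abstract's two headline metrics can be read as simple ratios over benchmark episodes. The sketch below shows how such metrics are typically computed; the dictionary field names are assumptions for illustration, not S^2Bench's actual schema.

```python
from typing import Dict, List

def attack_success_rate(attack_cases: List[Dict]) -> float:
    """ASR: fraction of attack episodes in which the harmful action still got executed."""
    if not attack_cases:
        return 0.0
    succeeded = sum(1 for case in attack_cases if case["attack_executed"])
    return succeeded / len(attack_cases)

def false_positive_rate(benign_cases: List[Dict]) -> float:
    """FPR: fraction of benign episodes the defense wrongly flagged or blocked."""
    if not benign_cases:
        return 0.0
    flagged = sum(1 for case in benign_cases if case["flagged"])
    return flagged / len(benign_cases)

# Lower is better on both metrics: a strong defense drives ASR down
# without inflating FPR on harmless tasks.
print(attack_success_rate([{"attack_executed": False}, {"attack_executed": True}]))  # 0.5
print(false_positive_rate([{"flagged": False}, {"flagged": False}]))                 # 0.0
```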