A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang

2026-02-18

A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

Summary

This paper investigates the safety and security of Clawdbot, which is an AI agent you can run on your own computer that can use various tools and access the internet to complete tasks.

What's the problem?

Because Clawdbot is so versatile and can do a lot of different things, it's important to make sure it doesn't do anything harmful or unintended. The problem is that if you give it vague instructions, or try to trick it, it could potentially make mistakes that lead to real-world consequences because of the tools it has access to. Existing safety tests weren't designed for an AI like Clawdbot, so there was a need to specifically test its vulnerabilities.

What's the solution?

The researchers created a series of tests, borrowing from existing AI safety benchmarks and adding their own, to see how Clawdbot responds to different situations. They carefully tracked everything Clawdbot did – every message, every action it took, and the results of those actions. They then used both an automated AI judge and real people to evaluate whether Clawdbot's behavior was safe. They focused on 34 specific scenarios and analyzed where Clawdbot tended to fail, looking for patterns in those failures.

Why it matters?

This research is important because as AI agents like Clawdbot become more powerful and capable, it's crucial to understand and address their potential risks. By identifying the specific ways Clawdbot can be steered towards unsafe actions, developers can improve its safety features and prevent it from being misused. This helps build trust in AI and ensures it's used responsibly.

Abstract

Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which raises heightened safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We supplemented the overall results with representative case studies and summarized the commonalities of these cases, analyzing the security vulnerabilities and typical failure modes that Clawdbot is prone to trigger in practice.

View Paper