Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
Gabriel Chua
2025-07-29
Summary
This paper introduces CIRCLE, a simple benchmark for evaluating the security risks of large language models (LLMs) that can execute code directly through a code interpreter.
What's the problem?
When LLMs execute code, they can be tricked into generating programs that exhaust computing resources such as CPU, memory, or disk space, which can crash host systems or open security holes. These interpreter-specific risks are distinct from, and often more severe than, the usual text-based safety problems.
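To make the danger concrete, here is an illustrative sketch (not from the paper) of why a one-line request can exhaust memory. Rather than actually allocating anything large, it estimates the footprint of a big list from a small sample:

```python
import sys

# Illustration of why seemingly short code can exhaust memory: estimate the
# footprint of `[0] * n` from a small sample instead of allocating it.
sample = [0] * 1000
bytes_per_slot = (sys.getsizeof(sample) - sys.getsizeof([])) / 1000

# A one-liner like `x = [0] * 10**10` would need on the order of:
estimated_gb = bytes_per_slot * 10**10 / 1024**3
print(f"roughly {estimated_gb:.0f} GB")  # far beyond most machines' RAM
```

On a 64-bit CPython build each list slot is a pointer of about 8 bytes, so the estimate lands in the tens of gigabytes; code this short is enough to take down an unprotected interpreter host.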
What's the solution?
CIRCLE addresses this with a benchmark of test prompts specifically designed to probe how LLMs handle requests that attempt to overload resources. For each prompt, it checks whether the model refuses the dangerous request, generates a safe alternative, or produces code that carries out the risky action. This reveals where models are vulnerable.
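The three outcomes described above (refuse, generate safe code, or comply with the risky request) can be sketched as a simple grader. This is a hypothetical illustration, not the paper's actual grading logic, and the keyword patterns here are toy heuristics:

```python
import re

# Illustrative keyword heuristics for the three outcome categories the
# benchmark distinguishes. A real grader would be far more robust
# (e.g. executing the code in a sandbox or using an LLM judge).
REFUSAL_PATTERNS = [r"\bI can('|no)t\b", r"\bnot able to help\b", r"\brefuse\b"]
RISKY_PATTERNS = [r"while\s+True\s*:", r"os\.fork\(", r"\*\s*10\s*\*\*\s*1[0-9]"]

def grade(response: str) -> str:
    """Classify a model response as 'refusal', 'unsafe', or 'safe'."""
    if any(re.search(p, response, re.IGNORECASE) for p in REFUSAL_PATTERNS):
        return "refusal"
    if any(re.search(p, response) for p in RISKY_PATTERNS):
        return "unsafe"
    return "safe"
```

For example, `grade("I can't help with that")` returns `"refusal"`, while a response containing `while True:` is flagged `"unsafe"`.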
Why does it matter?
As more AI systems execute code on real machines, understanding their security weaknesses is crucial to preventing attacks and outages. CIRCLE gives developers a clear, repeatable way to test and improve these security protections.
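One standard mitigation for interpreter hosts, separate from the model-level defenses the benchmark measures, is to run generated code in a child process with hard resource caps. A minimal Unix-only sketch (the helper name `run_limited` and its defaults are assumptions, not from the paper):

```python
import resource
import subprocess
import sys

def run_limited(code: str, cpu_seconds: int = 2,
                mem_bytes: int = 512 * 1024**2) -> subprocess.CompletedProcess:
    """Execute untrusted Python code in a child process with hard caps.

    Unix-only: relies on the `resource` module and `preexec_fn`.
    """
    def set_limits():
        # Kill the child if it burns more CPU time than allowed.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the address space so memory bombs fail fast with MemoryError.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=set_limits,
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 5,  # wall-clock backstop for sleeps/IO stalls
    )
```

Under this wrapper, benign code runs normally, while an attempt like `bytearray(10**9)` hits the address-space cap and the child exits with an error instead of exhausting the host.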
Abstract
CIRCLE evaluates interpreter-specific cybersecurity risks in large language models by testing their responses to prompts that target resource exhaustion, revealing significant vulnerabilities and the need for dedicated mitigation tools.