
Infecting Generative AI With Viruses

David Noever, Forrest McKee

2025-01-13

Summary

This paper describes a new way to test the security of AI systems that work with both text and images, like ChatGPT or Google Gemini, by hiding a harmless antivirus test file inside pictures.

What's the problem?

AI systems that can understand both text and images (called Vision-Large Language Models or VLMs) are becoming more common, but we don't know how safe they are when handling files that might contain viruses or other harmful content.

What's the solution?

The researchers created a clever test using a harmless file called EICAR, a standard string normally used to check whether antivirus software is working. They hid this file inside regular image files and then uploaded the images to different AI systems. They tried various tricks to conceal the file, like putting it in the image's metadata (the hidden descriptive data inside a picture) or encoding it in different ways. Then they checked whether the AI systems could find the hidden file or be made to extract and run it.
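The paper does not include its embedding scripts, so the following is only a minimal Python sketch of the general idea under stated assumptions: a placeholder standing in for the EICAR test string (the real 68-byte string is published at eicar.org) is base64-encoded and tucked into an ordinary JPEG, here by appending it after the end-of-image marker rather than writing it into EXIF metadata, and then recovered with a few lines of file manipulation of the sort an LLM's code sandbox could be asked to run. The Pillow dependency, file names, and payload are illustrative, not taken from the paper.

```python
import base64
from PIL import Image  # Pillow, assumed available, only used to create a small carrier JPEG

# Placeholder standing in for the harmless 68-byte EICAR antivirus test string
# (the real string is published at eicar.org; it is a test signature, not malware).
EICAR = b"EICAR-TEST-STRING-PLACEHOLDER"

# 1. Create a small, ordinary-looking carrier image.
Image.new("RGB", (64, 64), "white").save("carrier.jpg", "JPEG")

# 2. Hide the payload: base64-encode it and append it after the JPEG end-of-image
#    marker (FF D9). Image viewers ignore trailing bytes, so the file still displays
#    as a normal picture.
with open("carrier.jpg", "ab") as f:
    f.write(base64.b64encode(EICAR))

# 3. Recover the payload the way a short script inside an LLM workspace might:
#    split the file on the end-of-image marker and decode whatever follows.
#    (Inside the compressed JPEG stream, 0xFF bytes are byte-stuffed, so the first
#    FF D9 sequence is the real end-of-image marker.)
data = open("carrier.jpg", "rb").read()
hidden = data.split(b"\xff\xd9", 1)[1]
print(base64.b64decode(hidden))  # prints the embedded test string
```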

Why it matters?

This research matters because it shows that current AI systems might not be very good at detecting hidden, potentially harmful content in images. This could be a big security risk as more people use these AI tools. By finding these weaknesses now, the researchers are helping make future AI systems safer and more secure. It's especially important as AI is being used more and more in businesses and everyday life, where accidentally running a virus could cause serious problems.

Abstract

This study demonstrates a novel approach to testing the security boundaries of Vision-Large Language Models (VLM/LLM) using the EICAR test file embedded within JPEG images. We successfully executed four distinct protocols across multiple LLM platforms, including OpenAI GPT-4o, Microsoft Copilot, Google Gemini 1.5 Pro, and Anthropic Claude 3.5 Sonnet. The experiments validated that a modified JPEG containing the EICAR signature could be uploaded, manipulated, and potentially executed within LLM virtual workspaces. Key findings include: 1) consistent ability to mask the EICAR string in image metadata without detection, 2) successful extraction of the test file using Python-based manipulation within LLM environments, and 3) demonstration of multiple obfuscation techniques including base64 encoding and string reversal. This research extends Microsoft Research's "Penetration Testing Rules of Engagement" framework to evaluate cloud-based generative AI and LLM security boundaries, particularly focusing on file handling and execution capabilities within containerized environments.
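The abstract names two of the obfuscation layers that were tested, base64 encoding and string reversal. As a rough illustration of why such layers are trivial for a code-capable LLM workspace to undo, here is a short Python sketch; the placeholder payload and the exact ordering of the layers are assumptions for illustration, not details taken from the paper.

```python
import base64

# Hypothetical placeholder for the EICAR test string (see eicar.org for the real one).
payload = b"EICAR-TEST-STRING-PLACEHOLDER"

# Apply the two obfuscation layers named in the abstract:
# base64 encoding followed by byte-order reversal.
obfuscated = base64.b64encode(payload)[::-1]

# Undoing the layers in reverse order reconstructs the original signature,
# which is exactly the kind of step a sandboxed Python tool call can perform.
recovered = base64.b64decode(obfuscated[::-1])
assert recovered == payload
print(recovered.decode())
```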