
Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation

Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura

2025-01-30

Summary

This paper describes how researchers safety-tested OpenAI's new AI model, o3-mini, before its public release. They used an automated tool to check whether the model might produce unsafe or harmful responses.

What's the problem?

AI language models are becoming a big part of our lives, but they can sometimes cause problems like invading privacy, being biased, or spreading false information. It's really important to make sure these AIs are safe before letting everyone use them, but it's not easy to test for all the ways they might go wrong.

What's the solution?

Researchers from two universities in Spain (Mondragon University and the University of Seville) used a tool they created called ASTRAL to test o3-mini. The tool automatically generated and ran 10,080 tricky prompts designed to push the AI into doing something unsafe. The responses ASTRAL flagged as unsafe were then checked by hand to see which ones were real problems. In the end, they confirmed 87 cases where o3-mini did something it shouldn't have.
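
The paper itself does not include code, but the workflow described above can be pictured roughly as follows. This is a minimal illustrative sketch, not ASTRAL's actual implementation; the function names (generate_unsafe_prompts, query_model, classify_response) and the categories are hypothetical stand-ins for the generate-execute-flag-verify loop the researchers describe.

```python
# Illustrative sketch of an automated safety-testing loop (not ASTRAL's real code).
# Every function here is a hypothetical placeholder for a step described above.

def generate_unsafe_prompts(categories, per_category):
    """Stand-in for the test generator: produces tricky prompts per safety category."""
    return [(c, f"[{c}] provocative test prompt #{i}")
            for c in categories for i in range(per_category)]

def query_model(prompt):
    """Stand-in for calling the LLM under test (e.g., an early o3-mini endpoint)."""
    return f"model response to: {prompt}"

def classify_response(response):
    """Stand-in for the automated safety classifier; True means 'flagged as unsafe'."""
    return "provocative" in response  # placeholder heuristic, not a real classifier

def run_safety_tests(categories, per_category):
    flagged = []
    for category, prompt in generate_unsafe_prompts(categories, per_category):
        response = query_model(prompt)
        if classify_response(response):
            # Flagged outputs go to humans for manual verification,
            # mirroring how the 87 confirmed unsafe cases were identified.
            flagged.append((category, prompt, response))
    return flagged

if __name__ == "__main__":
    results = run_safety_tests(["privacy", "bias", "misinformation"], per_category=3)
    print(f"{len(results)} responses flagged for manual review")
```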

Why it matters?

This matters because it helps make AI safer for everyone to use. By finding problems before o3-mini is released to the public, OpenAI can fix these issues and make the model more trustworthy. This kind of testing is crucial as AI becomes more common in our daily lives, helping to prevent potential harm and ensure that AI tools are helpful rather than harmful. It's like putting a new car model through a thorough safety check before it hits the road.

Abstract

Large Language Models (LLMs) have become an integral part of our daily lives. However, they pose certain risks, including those that can harm individuals' privacy, perpetuate biases and spread misinformation. These risks highlight the need for robust safety mechanisms, ethical guidelines, and thorough testing to ensure their responsible deployment. Safety of LLMs is a key property that needs to be thoroughly tested prior to the model being deployed and made accessible to general users. This paper reports the external safety testing experience conducted by researchers from Mondragon University and the University of Seville on OpenAI's new o3-mini LLM as part of OpenAI's early access for safety testing program. In particular, we apply our tool, ASTRAL, to automatically and systematically generate up-to-date unsafe test inputs (i.e., prompts) that help us test and assess different safety categories of LLMs. We automatically generate and execute a total of 10,080 unsafe test inputs on an early o3-mini beta version. After manually verifying the test cases classified as unsafe by ASTRAL, we identify a total of 87 actual instances of unsafe LLM behavior. We highlight key insights and findings uncovered during the pre-deployment external testing phase of OpenAI's latest LLM.