CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig

2025-04-24

Summary

This paper introduces CRUST-Bench, a dataset for testing how well AI models can convert C code into safe Rust code. Rust is known for preventing certain types of bugs, memory errors in particular.

What's the problem?

The problem is that translating C to Rust is more than a word-for-word rewrite. Rust is designed to be much safer, especially with respect to memory errors, and it also has its own style and best practices. Most AI models struggle to produce translations that are both safe and easy to read.
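To make the gap between C and idiomatic safe Rust concrete, here is a minimal sketch. It is not taken from CRUST-Bench; the function `find_index` and the C snippet in the comment are hypothetical, chosen only for illustration. The C version takes a raw pointer plus a length and signals "not found" with -1, while the Rust version takes a slice, which carries its own length, and returns an `Option`.

```rust
// Hypothetical C original, shown for comparison only:
//
//   int find_index(const int *xs, size_t n, int target) {
//       for (size_t i = 0; i < n; i++)
//           if (xs[i] == target) return (int)i;
//       return -1;  /* sentinel for "not found" */
//   }

/// Safe Rust counterpart (illustrative, not from the benchmark).
/// The slice `&[i32]` replaces the pointer/length pair, so the function
/// cannot read past the end of the data, and `Option<usize>` replaces
/// the -1 sentinel, so callers must handle the "not found" case.
pub fn find_index(xs: &[i32], target: i32) -> Option<usize> {
    xs.iter().position(|&x| x == target)
}

fn main() {
    let data = [3, 7, 11];
    assert_eq!(find_index(&data, 7), Some(1));
    assert_eq!(find_index(&data, 5), None);
    println!("{:?}", find_index(&data, 11)); // prints "Some(2)"
}
```

A literal translation that kept the raw pointer would need an `unsafe` block, and one that kept the -1 sentinel would compile but would not read as natural Rust. These are the kinds of differences the summary refers to.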

What's the solution?

The researchers built CRUST-Bench, a collection of real C code examples and their ideal Rust translations, and used it to test and compare different AI models. The benchmark shows where the models succeed and where they still fail, particularly in satisfying Rust's safety rules and producing code that reads naturally to Rust programmers.

Why does it matter?

This matters because making it easier and safer to convert old C code to Rust could help prevent bugs and security problems in software that people rely on every day. CRUST-Bench also gives researchers a clear way to measure progress and improve future AI models for code translation.

Abstract

A dataset, CRUST-Bench, evaluates the ability of large language models to transpile C to safe Rust, highlighting the challenges in maintaining idiomatic patterns and memory safety.
