Struct-Bench: A Benchmark for Differentially Private Structured Text Generation

Shuaiqi Wang, Vikas Raunak, Arturs Backurs, Victor Reis, Pei Zhou, Sihao Chen, Longqi Yang, Zinan Lin, Sergey Yekhanin, Giulia Fanti

2025-09-17

Summary

This paper introduces a new way to test how well artificial datasets protect privacy while remaining useful, specifically for structured data (such as tables) that also contains natural-language text, the kind of data common in business settings.

What's the problem?

Creating fake data that mimics real data is useful for things like training computer models without revealing private information. However, existing methods for checking the quality of this fake data don't work well when the real data is structured (think spreadsheets or databases) and contains natural language. They struggle to verify whether the relationships *within* the data are faithfully preserved, which is crucial for the fake data to be useful.

What's the solution?

The researchers developed a framework called Struct-Bench. It asks users to describe the structure of their data using a formal notation called a Context-Free Grammar (CFG). They then assembled a benchmark of seven datasets (five real-world and two synthetically generated), each annotated with a CFG, and used Struct-Bench to evaluate how well different methods generate private synthetic data. They also built a public leaderboard so researchers can compare their methods.
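To make the CFG idea concrete, here is a minimal, hypothetical sketch (not the paper's actual grammar format or any of its datasets) of how a toy grammar for flat "key=value" records might be written down and used to check whether a synthetic record conforms to the expected structure:

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   record -> pair (";" pair)*
#   pair   -> key "=" value
#   key    -> "product" | "review" | "rating"
#   value  -> free text for reviews, a single digit 1-5 for ratings
VALUE_PATTERN = {
    "product": re.compile(r"^\w+$"),
    "review": re.compile(r"^[\w ,.!'-]+$"),
    "rating": re.compile(r"^[1-5]$"),
}

def matches_grammar(record: str) -> bool:
    """Return True if a record conforms to the toy grammar above."""
    seen = []
    for pair in record.split(";"):
        if "=" not in pair:
            return False
        key, value = (part.strip() for part in pair.split("=", 1))
        if key not in VALUE_PATTERN or not VALUE_PATTERN[key].match(value):
            return False
        seen.append(key)
    # Require each field to appear exactly once.
    return sorted(seen) == sorted(VALUE_PATTERN)

print(matches_grammar("product=widget; review=works well; rating=5"))  # True
print(matches_grammar("product=widget; rating=six"))                   # False
```

A check like this captures only syntactic validity; Struct-Bench's metrics go further, also measuring how well the synthetic data reproduces correlations across fields, but grammar conformance is the entry point the framework asks users to supply.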

Why it matters?

This work is important because it provides a standardized way to evaluate privacy-preserving synthetic data, especially for the kind of structured data commonly found in businesses. By identifying the weaknesses of current methods, it encourages the development of better techniques for generating useful fake data that protects sensitive information, and the public leaderboard fosters competition and progress in the field.

Abstract

Differentially private (DP) synthetic data generation is a promising technique for utilizing private datasets that otherwise cannot be exposed for model training or other analytics. While much research literature has focused on generating private unstructured text and image data, in enterprise settings, structured data (e.g., tabular) is more common, often including natural language fields or components. Existing synthetic data evaluation techniques (e.g., FID) struggle to capture the structural properties and correlations of such datasets. In this work, we propose Struct-Bench, a framework and benchmark for evaluating synthetic datasets derived from structured datasets that contain natural language data. The Struct-Bench framework requires users to provide a representation of their dataset structure as a Context-Free Grammar (CFG). Our benchmark comprises 5 real-world and 2 synthetically generated datasets, each annotated with CFGs. We show that these datasets demonstrably present a great challenge even for state-of-the-art DP synthetic data generation methods. Struct-Bench also includes reference implementations of different metrics and a leaderboard, thereby providing researchers a standardized evaluation platform to benchmark and investigate privacy-preserving synthetic data generation methods. Further, we also present a case study showing how to use Struct-Bench to improve the synthetic data quality of Private Evolution (PE) on structured data. The benchmark and the leaderboard have been publicly made available at https://struct-bench.github.io.