GenCtrl -- A Formal Controllability Toolkit for Generative Models

Emily Cheng, Carmen Amo Alonso, Federico Danieli, Arno Blaas, Luca Zappella, Pau Rodriguez, Xavier Suau

2026-01-12

Summary

This research investigates whether we can truly control the output of powerful AI models, like those used to generate text or images, and to what extent. It doesn't just look at *how* to control them, but asks if control is even fundamentally possible.

What's the problem?

As AI models become more common, people want to precisely guide what they create. Many techniques exist for this, like carefully crafting instructions or retraining the model, but no one has rigorously established whether these methods actually give us reliable control. We don't know the limits of what we can ask these models to do, or whether they will always surprise us with unexpected results.

What's the solution?

The researchers developed a mathematical way to characterize the 'controllable set' of a model – basically, all the outputs you can reliably steer it to produce. They created an algorithm that estimates this set by repeatedly querying the model, and the estimate comes with formal guarantees on its accuracy, even without knowing how the model works internally. They tested this on both language models (for text) and models that create images from text.
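To make the idea concrete, here is a toy sketch of black-box controllable-set estimation. This is not the paper's actual algorithm: the function names, the distance check, and the Hoeffding-style sample-size formula are illustrative assumptions. The only ingredient taken from the paper's setup is the premise that the model is a black box with bounded outputs, so a distribution-free concentration bound applies.

```python
import math
import random

def pac_sample_size(epsilon: float, delta: float) -> int:
    """Hoeffding-style sample complexity (illustrative, not the paper's bound):
    with this many samples, an empirical average of a [0, 1]-bounded quantity
    lies within epsilon of its mean with probability at least 1 - delta,
    with no assumptions on the underlying distribution."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def estimate_controllable_set(model, controls, targets, tol, n_samples):
    """Monte Carlo estimate of which targets the black-box `model` can be
    steered to: a target counts as reachable if some sampled control input
    drives the model's output within `tol` of it."""
    reachable = set()
    for _ in range(n_samples):
        u = random.choice(controls)   # sample a control input (e.g. a prompt)
        y = model(u)                  # query the black box
        for t in targets:
            if abs(y - t) <= tol:     # hypothetical scalar distance check
                reachable.add(t)
    return reachable

# Toy usage: a trivially transparent "model" whose output equals its input.
random.seed(0)
n = pac_sample_size(epsilon=0.1, delta=0.05)
est = estimate_controllable_set(lambda u: u, controls=[0, 1, 2],
                                targets=[0, 5], tol=0.5, n_samples=n)
```

In this toy run, target 0 is in the estimated controllable set while target 5 is not, since no available control input can drive the output near it – the kind of fundamental limit the paper's framework is designed to expose.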

Why it matters?

This work is important because it shows that controlling AI models is often more difficult and fragile than we think. It suggests we need to spend less time just *trying* to control these models and more time understanding their fundamental limitations. This understanding is crucial for building AI systems that are safe, reliable, and behave as we intend.

Abstract

As generative models become ubiquitous, there is a critical need for fine-grained control over the generation process. Yet, while controlled generation methods from prompting to fine-tuning proliferate, a fundamental question remains unanswered: are these models truly controllable in the first place? In this work, we provide a theoretical framework to formally answer this question. Framing human-model interaction as a control process, we propose a novel algorithm to estimate the controllable sets of models in a dialogue setting. Notably, we provide formal guarantees on the estimation error as a function of sample complexity: we derive probably-approximately correct bounds for controllable set estimates that are distribution-free, employ no assumptions except for output boundedness, and work for any black-box nonlinear control system (i.e., any generative model). We empirically demonstrate the theoretical framework on different tasks in controlling dialogue processes, for both language models and text-to-image generation. Our results show that model controllability is surprisingly fragile and highly dependent on the experimental setting. This highlights the need for rigorous controllability analysis, shifting the focus from simply attempting control to first understanding its fundamental limits.