General Multimodal Protein Design Enables DNA-Encoding of Chemistry
Jarrid Rector-Brooks, Théophile Lambert, Marta Skreta, Daniel Roth, Yueming Long, Zi-Qi Li, Xi Zhang, Miruna Cretu, Francesca-Zhoufan Li, Tanvi Ganapathy, Emily Jin, Avishek Joey Bose, Jason Yang, Kirill Neklyudov, Yoshua Bengio, Alexander Tong, Frances H. Arnold, Cheng-Hao Liu
2026-04-08
Summary
This research introduces DISCO, an AI model that designs completely new enzymes (proteins that speed up chemical reactions) from scratch. These are not tweaked versions of existing enzymes; they are novel proteins that catalyze reactions not known to occur in nature.
What's the problem?
Enzymes are amazing at catalyzing reactions, but the types of reactions they can do are limited by what evolution has already discovered. While we can use computers to *design* proteins, existing methods struggle to create functional enzymes without already knowing which parts of the protein will actually do the work – the catalytic residues. Essentially, designing enzymes from the ground up, without pre-defining the key components, is a huge challenge.
What's the solution?
The researchers developed DISCO (DIffusion for Sequence-structure CO-design), an AI model that designs a protein's sequence (the order of amino acids) and its 3D structure at the same time. The only input it needs is a reactive intermediate, a short-lived molecule that forms *during* a reaction, and it builds an enzyme around that molecule. DISCO doesn't need to be told where the catalytic residues should be; it works that out on its own. The researchers also developed inference-time scaling methods, which spend extra computation during generation to steer designs toward objectives defined on both the sequence and the structure. They tested the approach by designing heme enzymes for carbene-transfer reactions (carbenes are highly reactive carbon species), including alkene cyclopropanation, spirocyclopropanation, and B-H and C(sp^3)-H insertions. The designed enzymes showed high activities, exceeding those of engineered enzymes.
Why does it matter?
DISCO widens the range of chemistry that genetically encoded proteins can carry out. Enzymes that catalyze reactions nature hasn't explored could be used to synthesize new materials, develop more efficient industrial processes, and create new medicines. Random mutagenesis of one selected design confirmed that its activity can be improved through directed evolution, an established laboratory method, so the designs are starting points rather than end points. That gives a scalable path to more powerful and versatile biocatalysts.
Abstract
Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules, as well as inference-time scaling methods that optimize objectives across both modalities. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, B-H, and C(sp^3)-H insertions, with high activities exceeding those of engineered enzymes. Random mutagenesis of a selected design further confirmed that enzyme activity can be improved through directed evolution. By providing a scalable route to evolvable enzymes, DISCO broadens the potential scope of genetically encodable transformations. Code is available at https://github.com/DISCO-design/DISCO.
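The abstract mentions inference-time scaling methods that optimize objectives across both modalities but does not describe them. As a rough illustration of the general idea only, the sketch below uses best-of-N selection, one common form of inference-time scaling: draw more candidates, score each on a sequence objective and a structure objective, and keep the best. Everything here is hypothetical and is not DISCO's method or API: `sample_design` is a random stand-in for a co-design sampler, and both scoring functions and their weights are toy placeholders.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Design:
    sequence: str                              # one letter per residue
    coords: List[Tuple[float, float, float]]   # one toy 3D point per residue


def sample_design(rng: random.Random, length: int = 30) -> Design:
    """Hypothetical stand-in for a sequence-structure sampler (purely random)."""
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    coords = [
        (rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform(-10, 10))
        for _ in range(length)
    ]
    return Design(seq, coords)


def sequence_score(d: Design) -> float:
    """Toy sequence objective: fraction of His/Cys residues (placeholder only)."""
    return sum(aa in "HC" for aa in d.sequence) / len(d.sequence)


def structure_score(d: Design) -> float:
    """Toy structure objective: negative mean distance of points from the origin."""
    mean_r = sum((x * x + y * y + z * z) ** 0.5 for x, y, z in d.coords) / len(d.coords)
    return -mean_r


def combined_score(d: Design, w_seq: float = 1.0, w_struct: float = 0.1) -> float:
    """Weighted sum over both modalities; the weights are arbitrary assumptions."""
    return w_seq * sequence_score(d) + w_struct * structure_score(d)


def best_of_n(n: int, seed: int = 0) -> Design:
    """Sample n candidates and return the one with the highest combined score."""
    rng = random.Random(seed)
    candidates = [sample_design(rng) for _ in range(n)]
    return max(candidates, key=combined_score)


if __name__ == "__main__":
    for n in (1, 8, 64):
        best = best_of_n(n)
        print(n, round(combined_score(best), 3))
```

With a fixed seed, the candidates drawn for a small N are a prefix of those drawn for a larger N, so the best score never decreases as N grows. That is the sense in which extra inference-time compute buys better designs in this toy setting.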