Composer 2 Technical Report

Cursor Research, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair, Charlie Snell, Chen Lu, Chen Shen, Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner

2026-03-30

Summary

This paper introduces Composer 2, a new artificial intelligence model built specifically to help with software engineering tasks, such as planning and writing code.

What's the problem?

Creating AI that can reliably write complex code and handle real-world software development challenges is difficult. Existing models often struggle with long-term planning, make mistakes during coding, and lose consistency when working on larger projects. Essentially, they weren't very good at acting as a helpful coding assistant over extended periods.

What's the solution?

The researchers trained Composer 2 in two stages. First, continued pretraining broadened its general knowledge and strengthened its basic coding skills. Then they applied reinforcement learning, a technique in which the model learns by attempting to solve coding problems and being rewarded for success. Importantly, they trained it using the same tools and environment a real programmer would use, and tested it on problems taken from actual software projects. They also created a new benchmark, CursorBench, to specifically test its abilities on these kinds of tasks.
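The reinforcement-learning stage described above can be illustrated with a toy sketch: sample an attempt at a task, score it with an automatic check (standing in for running a project's test suite), and reinforce attempts that succeed. Everything here is hypothetical and greatly simplified — the task format, the `run_tests` reward function, and the `ToyPolicy` "model" are illustrative stand-ins, not anything from the report.

```python
import random

def run_tests(attempt, task):
    """Stand-in reward signal: 1.0 if the attempt solves the task, else 0.0.
    In real RL training this would be replaced by executing the project's
    actual test suite against the model's code."""
    return 1.0 if attempt == task["answer"] else 0.0

class ToyPolicy:
    """A toy 'model' that samples candidate fixes in proportion to learned
    weights and reinforces whichever choices earn reward."""
    def __init__(self, candidates):
        self.weights = {c: 1.0 for c in candidates}

    def sample(self):
        # Weighted random choice over candidates.
        total = sum(self.weights.values())
        r = random.uniform(0, total)
        acc = 0.0
        for c, w in self.weights.items():
            acc += w
            if r <= acc:
                return c
        return c  # fallback for floating-point edge cases

    def reinforce(self, choice, reward, lr=0.5):
        # Increase the weight of rewarded choices.
        self.weights[choice] += lr * reward

def rl_phase(policy, tasks, steps=200):
    """One simplified RL loop: try a task, score the attempt, reinforce."""
    for _ in range(steps):
        task = random.choice(tasks)
        attempt = policy.sample()
        reward = run_tests(attempt, task)
        policy.reinforce(attempt, reward)
    return policy
```

After enough steps, the policy's highest weight concentrates on the candidate that passes the check — the same feedback structure, at a vastly larger scale, that drives the training described here.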

Why it matters?

Composer 2 represents a significant step forward in AI-assisted software engineering. It performs much better than previous versions of Composer and competes with the best coding AI models available. This work also shows a successful method for creating specialized AI models that excel in specific fields, like coding, by focusing training on realistic tasks and environments.

Abstract

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.