Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi

2026-02-12

Summary

This paper introduces Step 3.5 Flash, a new artificial intelligence model designed to be both highly capable and efficient in how much computing power it uses. It's built to be a strong foundation for creating helpful 'agents', AI systems that can reason and take actions.

What's the problem?

Creating AI agents that can genuinely reason and solve complex problems has required huge models, which are typically slow and expensive to run. Existing models either sacrifice intelligence for speed or are too resource-intensive for practical use in many real-world situations. The challenge is to build an agent that is both highly capable *and* cheap enough in computing power to operate at scale.

What's the solution?

The researchers built Step 3.5 Flash around a 'Mixture-of-Experts' design, which lets the model activate only the experts best suited to each input: the full model holds 196 billion parameters, but only 11 billion are active at any time during inference (see the sketch below). They also changed how the model processes long contexts, interleaving cheap 'sliding-window' attention layers with full attention layers at a 3:1 ratio and predicting multiple tokens at once, which makes ongoing conversations and multi-step tasks faster and cheaper. Finally, they created a reinforcement learning framework that trains the model on both verifiable signals (answers that can be checked automatically) and human preference feedback, allowing it to continuously improve its skills in math, coding, and tool use.
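
To make the Mixture-of-Experts idea concrete, here is a minimal top-k routing layer in PyTorch. This is a sketch under standard MoE assumptions, not Step 3.5 Flash's actual architecture; the sizes (8 experts, top-2 routing, 512-dimensional tokens) are placeholders for illustration, far smaller than the paper's 196B-total / 11B-active configuration.

```python
# Minimal sketch of sparse Mixture-of-Experts routing with top-k gating.
# Only the selected experts run for each token, so most of the model's
# parameters stay inactive on any given forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network scores each expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 512)                                 # 16 tokens
print(TopKMoE()(x).shape)                                # torch.Size([16, 512])
```

With top-2 routing over 8 experts, each token pays the compute cost of only a quarter of the expert parameters, which is the same lever Step 3.5 Flash uses to keep 11B of its 196B parameters active.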

Why it matters?

Step 3.5 Flash is important because it demonstrates that it’s possible to create AI agents that are as good as the very best models currently available, but without the same enormous computational costs. This makes it more practical to deploy these advanced AI systems in real-world applications, like helping businesses automate tasks or providing personalized assistance, opening the door for wider use of powerful AI.

Abstract

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
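
The abstract's "interleaved 3:1 sliding-window/full attention" can be pictured as three windowed layers for every full-attention layer. The sketch below, assuming a standard causal sliding-window mask, shows one plausible layout; the layer count and window size are illustrative, not the paper's reported values, and the MTP-3 multi-token prediction head is not covered here.

```python
# A 3:1 interleave of sliding-window and full attention layers, with a causal
# sliding-window mask. Layer count and window size are placeholders, not
# values reported for Step 3.5 Flash.
import torch

def layer_pattern(n_layers: int, ratio: int = 3) -> list[str]:
    # every (ratio + 1)-th layer attends over the full context;
    # the other layers attend only within a local window
    return ["full" if (i + 1) % (ratio + 1) == 0 else "window" for i in range(n_layers)]

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    q = torch.arange(seq_len).unsqueeze(1)   # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)   # key positions (columns)
    return (k <= q) & (k > q - window)       # causal and within the last `window` tokens

print(layer_pattern(8))
# ['window', 'window', 'window', 'full', 'window', 'window', 'window', 'full']
print(sliding_window_mask(5, 3).int())
# each query row attends to at most its 3 most recent tokens (itself included)
```

Because the windowed layers touch only a fixed number of past tokens, their cost grows linearly with context length; reserving every fourth layer for full attention preserves long-range access while cutting most of the quadratic attention cost in long, multi-round agentic interactions.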