
InCoder-32B: Code Foundation Model for Industrial Scenarios

Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Shawn Guo, Haowen Wang, Weicheng Gu, Yaxin Du, Joseph Li, Fanglin Xu, Yizhi Li, Lin Jing, Yuanbo Wang, Yuhan Gao, Ruihao Gong, Chuan Hao, Ran Tao, Aishan Liu, Tuney Zheng, Ganqu Cui, Zhoujun Li, Mingjie Tang

2026-03-18

Summary

This paper introduces InCoder-32B, a new large language model specifically designed for complex coding tasks found in real-world industries, not just general programming.

What's the problem?

Current large language models are very good at writing code in general, but they struggle with tasks that require reasoning about specific details of computer hardware, specialized programming languages, or strict limits on things like memory and processing power, all of which are common constraints in professional coding environments.

What's the solution?

The researchers trained InCoder-32B from scratch using an efficient architecture and a staged training process. First, it learned from a large amount of general code, and was then refined on carefully curated code from specific industries. In a middle training phase, they gradually extended how much code the model can consider at once, from 8,000 to 128,000 tokens, while adding synthetic industrial problems to help it learn to reason. Finally, in post-training, they verified the model's generated code by actually executing it to make sure it worked correctly.
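The final, execution-based verification step can be sketched very roughly: run each generated solution against unit tests and keep only the samples that pass. This is a minimal illustration, not the paper's actual pipeline; the function names and the in-process `exec` approach are assumptions for clarity (a real system would sandbox the execution).

```python
# Minimal sketch of execution-grounded verification (illustrative only):
# a generated solution is kept only if its unit tests actually pass.

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Execute a candidate solution, then its tests; failure of either rejects it."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function(s)
        exec(test_code, namespace)       # assertions raise on failure
        return True
    except Exception:
        return False

# Example: two model samples for the same task; only the correct one survives.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

verified = [c for c in (good, bad) if passes_tests(c, tests)]
```

Filtering samples this way grounds training feedback in real program behavior rather than in surface similarity to reference code.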

Why it matters?

InCoder-32B is important because it’s the first openly available model of its size that can handle these complex, industry-specific coding challenges. This means it can help developers in areas like chip design, optimizing graphics cards, and creating software for smaller devices, and it provides a strong starting point for others to build upon.

Abstract

Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.