
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Junru Lu, Jiarui Qin, Lingfeng Qiao, Yinghui Li, Xinyi Dai, Bo Ke, Jianfeng He, Ruizhi Qiao, Di Yin, Xing Sun, Yunsheng Wu, Yinsong Liu, Shuangyin Liu, Mingkong Tang, Haodong Lin, Jiayi Kuang, Fanxu Meng, Xiaojuan Tang, Yunjia Xi, Junjie Huang, Haotong Yang, Zhenyi Shen

2026-01-01

Summary

This paper introduces Youtu-LLM, a new language model that is relatively small (just under 2 billion parameters) but still very capable, especially on tasks that require planning and reasoning like an 'agent'. It aims to show that you don't necessarily need a huge model to get smart, agent-like behavior.

What's the problem?

Many smaller language models are created by simplifying larger ones, which can limit their ability to truly *think* and plan. Existing small models often struggle with tasks that require remembering information over a long period or performing complex, multi-step actions. The challenge is building a small model that can reason, plan, and act effectively without needing massive computing resources.

What's the solution?

The researchers built Youtu-LLM from the ground up, rather than shrinking a bigger model. They used a design called 'Multi-Latent Attention' (MLA) that lets the model handle very long pieces of text (up to 128k tokens) while keeping memory use low. They also carefully staged the data it learned from, starting with general knowledge and gradually shifting toward harder science, technology, engineering, and math (STEM) problems and tasks where the model needs to act like an agent. Finally, they specifically trained it on examples of planning and reflecting on its own actions in areas like math, coding, and tool use.
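
To give a rough sense of why this attention design saves memory on very long inputs, here is a minimal, illustrative sketch of the general latent-attention idea: instead of caching a full key and value vector for every token, the model caches one small compressed latent vector per token and expands it back into keys and values when attention is computed. This is not Youtu-LLM's actual implementation; the class and dimension names (`LatentAttention`, `d_model`, `d_latent`) are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentAttention(nn.Module):
    """Toy single-head sketch of latent (compressed-KV) attention.

    Instead of caching full keys and values per token, we cache a small
    latent vector per token and re-expand it into K and V on the fly.
    Dimensions are illustrative, not Youtu-LLM's real configuration.
    """

    def __init__(self, d_model: int = 512, d_latent: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_compress = nn.Linear(d_model, d_latent)   # token -> small latent (the only thing cached)
        self.k_expand = nn.Linear(d_latent, d_model)      # latent -> key
        self.v_expand = nn.Linear(d_latent, d_model)      # latent -> value
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)
        latent = self.kv_compress(x)                      # small per-token state
        k = self.k_expand(latent)
        v = self.v_expand(latent)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return self.out_proj(attn @ v)

x = torch.randn(1, 16, 512)
print(LatentAttention()(x).shape)  # torch.Size([1, 16, 512])
```

With a design like this, the per-token cache during generation is the small latent vector rather than the full keys and values, which is what makes a 128k-token context practical within a small memory footprint.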

Why it matters?

Youtu-LLM demonstrates that it's possible to build a small language model that performs surprisingly well, even beating larger models on certain 'agentic' tasks, meaning tasks that require planning, reasoning, and taking action. This is important because smaller models are cheaper to run and can be used in more places, like on phones or other devices with limited processing power, without sacrificing intelligence.

Abstract

We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows: (1) Compact Architecture with Long-Context Support: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it ideal for long-horizon agent and reasoning tasks. (2) Principled "Commonsense-STEM-Agent" Curriculum: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting the pre-training data distribution from general commonsense to complex STEM and agentic tasks, we ensure the model acquires deep cognitive abilities rather than superficial alignment. (3) Scalable Agentic Mid-training: Specifically for the agentic mid-training, we employ diverse data construction schemes to synthesize rich and varied trajectories across math, coding, and tool-use domains. This high-quality data enables the model to internalize planning and reflection behaviors effectively. Extensive evaluations show that Youtu-LLM sets a new state-of-the-art for sub-2B LLMs. On general benchmarks, it achieves competitive performance against larger models, while on agent-specific tasks, it significantly surpasses existing SOTA baselines, demonstrating that lightweight models can possess strong intrinsic agentic capabilities.
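
The "Commonsense-STEM-Agent" curriculum described in the abstract is essentially a data-scheduling idea: early pre-training batches are dominated by general text, and later stages shift the mixture toward STEM and agentic trajectories. The sketch below shows one simple way such a staged mixture could be expressed; the stage boundaries, source names, and proportions are invented for illustration and are not the paper's actual schedule.

```python
# Hypothetical staged data mixture for a "Commonsense -> STEM -> Agent" curriculum.
# Stage names follow the paper's description; every number here is made up.
CURRICULUM = [
    # (stage name, fraction of total training tokens, sampling weights per data source)
    ("commonsense", 0.60, {"web_general": 0.80, "stem": 0.15, "agent_traj": 0.05}),
    ("stem",        0.30, {"web_general": 0.40, "stem": 0.50, "agent_traj": 0.10}),
    ("agentic",     0.10, {"web_general": 0.20, "stem": 0.35, "agent_traj": 0.45}),
]

def mixture_at(progress: float) -> dict:
    """Return the source sampling weights at a point in training (0.0 -> 1.0)."""
    cumulative = 0.0
    for _, fraction, weights in CURRICULUM:
        cumulative += fraction
        if progress <= cumulative:
            return weights
    return CURRICULUM[-1][2]

print(mixture_at(0.10))  # early training: mostly general web text
print(mixture_at(0.95))  # late training: mostly STEM and agent trajectories
```

The point of a progressive shift like this is that the model first builds broad language and world knowledge, then spends its later training budget on the harder reasoning, coding, and tool-use data it needs for agentic behavior.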