Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng
2025-02-06
Summary
This paper introduces Token Assorted, a method that improves how AI language models think and reason. It mixes regular text with special compressed (latent) tokens so that the reasoning process becomes shorter and more efficient without losing effectiveness.
What's the problem?
Large language models are good at reasoning when they use a step-by-step thought process, but writing out every step takes many words and a lot of computing power. Many of those words are only there to make the text flow smoothly and aren't actually crucial for the reasoning itself.
What's the solution?
The researchers created a hybrid system that represents the reasoning process with both regular text and special compressed tokens. They used a VQ-VAE (vector-quantized variational autoencoder) to create these compressed tokens, which pack the important information into a much more compact form. They tested this method in two ways: by training a new model from scratch on a maze-solving problem, and by fine-tuning existing models on logic and math problems. To make the learning process smoother, they randomly mixed the compressed tokens with regular text during training.
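To give a feel for the kind of compression involved: a VQ-VAE discretizes continuous encoder outputs by snapping each one to its nearest entry in a learned codebook, and the entry's index then serves as the discrete latent token. The sketch below is a minimal illustration of that nearest-neighbor lookup, not the paper's implementation; the `quantize` function, the toy 2-D codebook, and the example vectors are all assumptions made for demonstration.

```python
def quantize(encoding, codebook):
    """Return the index of the codebook vector nearest to `encoding`
    (squared Euclidean distance). That index is the latent token id."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(encoding, codebook[i]))

# Toy codebook with three 2-D entries (purely illustrative).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
tokens = [quantize(v, codebook) for v in [[0.9, 1.1], [0.1, -0.2]]]  # → [1, 0]
```

Each continuous vector is thus replaced by a single integer, which is how a long stretch of reasoning text can be condensed into a short sequence of discrete latent tokens.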
Why it matters?
This research matters because it could make AI language models better at solving complex problems while using less computing power. By compressing part of the reasoning process, the models can work more efficiently without losing their ability to think through problems step-by-step. This could lead to smarter, faster AI systems that can handle a wider range of tasks, from solving math problems to making logical decisions.
Abstract
Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words support textual coherence rather than core reasoning information, and processing these inputs consumes substantial computational resources. In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by VQ-VAE, significantly reducing the length of reasoning traces. We explore the use of latent trace abstractions in two scenarios: 1) training the model from scratch for the Keys-Finding Maze problem, 2) fine-tuning LLMs on this hybrid data with an extended vocabulary including unseen latent tokens, for both logical and mathematical reasoning problems. To facilitate effective learning, we introduce a simple training procedure that randomly mixes latent and text tokens, which enables fast adaptation to new latent tokens. Our approach consistently outperforms the baseline methods on various benchmarks.
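The random-mixing training procedure mentioned in the abstract can be sketched as follows. This is a hypothetical illustration rather than the authors' code: `latent_of` merely stands in for the trained VQ-VAE encoder, and `TEXT_VOCAB_SIZE`, `CHUNK`, and `p_latent` are made-up parameters, not values from the paper.

```python
import random

TEXT_VOCAB_SIZE = 32000   # assumed size of the base text vocabulary
CHUNK = 16                # assumed number of text tokens per latent token

def latent_of(chunk):
    # Stand-in for the VQ-VAE encoder: derives a fake latent token id
    # (drawn from the extended vocabulary range) from the chunk contents.
    return TEXT_VOCAB_SIZE + (sum(chunk) % 64)

def mix_trace(text_tokens, p_latent=0.5):
    """With probability p_latent, replace a random-length prefix of the
    reasoning trace with latent tokens (one per CHUNK text tokens),
    leaving the remainder of the trace as plain text."""
    n_chunks = len(text_tokens) // CHUNK
    k = random.randint(0, n_chunks) if random.random() < p_latent else 0
    latent_part = [latent_of(text_tokens[i * CHUNK:(i + 1) * CHUNK])
                   for i in range(k)]
    return latent_part + text_tokens[k * CHUNK:]
```

Because each training example abstracts a different-length prefix, the model sees latent and text tokens side by side in many configurations, which is what lets it adapt quickly to the new latent vocabulary.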