FASTER: Rethinking Real-Time Flow VLAs
Yuxiang Lu, Zhe Liu, Xianzhe Fan, Zhenya Yang, Jinghua Hou, Junyi Li, Kaixin Ding, Hengshuang Zhao
2026-03-20
Summary
This paper focuses on making Vision-Language-Action (VLA) models, which interpret images and language and then perform actions, run much faster in the real world, especially when they must react quickly to changes in their environment.
What's the problem?
Current methods for speeding up these models prioritize making the actions look smooth, but they do not address how long the model takes to *start* reacting to something new. These models typically complete an entire series of sampling calculations before any movement begins, which creates a delay, especially on less powerful computers. That delay is a major problem when quick reactions are needed, such as in a fast-paced game or a dynamic environment.
What's the solution?
The researchers developed a new technique called FASTER, which stands for Fast Action Sampling for ImmediaTE Reaction. Instead of giving every part of the planned action sequence equal sampling effort, FASTER prioritizes the actions that need to happen *right now*: it adapts how the model samples possible actions so that the first steps of a reaction are computed in far fewer denoising steps, while the rest of the trajectory keeps its full quality. They also paired this with a more efficient streaming pipeline for sending actions to the robot, so movements can start as soon as they are ready.
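The idea of compressing near-term denoising can be sketched as a per-action time schedule for flow sampling. The snippet below is a minimal illustration, not the paper's implementation: the function name, the `fast_prefix` parameter, and the specific schedule shape are all assumptions made for clarity.

```python
import numpy as np

def horizon_aware_schedule(horizon, num_steps, fast_prefix=1):
    """Illustrative per-action flow-matching time schedules (t goes 1 -> 0).

    Near-term actions (index < fast_prefix) are fully denoised after a
    single integration step; later actions in the chunk follow the usual
    constant multi-step schedule. Hypothetical sketch, not FASTER itself.
    """
    # Constant schedule: evenly spaced times from 1 (pure noise) to 0 (clean).
    base = np.linspace(1.0, 0.0, num_steps + 1)
    sched = np.tile(base, (horizon, 1))           # shape: (horizon, num_steps + 1)
    # Near-term actions jump straight to t = 0 after the first step,
    # compressing their denoising into one step.
    sched[:fast_prefix, 1:] = 0.0
    return sched

sched = horizon_aware_schedule(horizon=4, num_steps=10)
print(sched[0])   # first action: ready after a single step
print(sched[-1])  # last action: denoised gradually over all 10 steps
```

The point of the sketch is the asymmetry: the first action's schedule collapses to one step (so the robot can move almost immediately), while distant actions keep the full schedule that preserves trajectory quality.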
Why it matters?
This work is important because it allows VLA models to be used in real-time applications where quick responses are essential. By significantly reducing the reaction time, FASTER enables robots to perform complex tasks, like playing table tennis, with a level of responsiveness that wasn't possible before, even on standard computer hardware. This opens the door for more practical and useful robots that can interact with the world around them more effectively.
Abstract
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient, forcing the system to complete all sampling steps before any movement can start and forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction by tenfold (e.g., in π_{0.5} and X-VLA) into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
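One plausible reading of the abstract's reaction-time claim can be worked through with simple arithmetic: if an environmental change arrives at a uniformly random moment while a chunk is executing, the policy finishes the current chunk (a uniform wait over the execution horizon) and then spends the TTFA sampling the new chunk. The numbers below are purely illustrative, not measurements from the paper.

```python
# Back-of-envelope model: reaction latency assumed uniform over
# [TTFA, TTFA + execution horizon]. All values are illustrative.
ttfa = 0.30        # Time to First Action: seconds of sampling before motion
horizon_s = 0.50   # execution horizon of one action chunk, in seconds

best = ttfa                      # change arrives just as the chunk ends
worst = ttfa + horizon_s         # change arrives just after a chunk starts
mean = ttfa + horizon_s / 2.0    # mean of the uniform distribution
print(f"reaction latency in [{best:.2f}s, {worst:.2f}s], mean {mean:.2f}s")
```

Under this reading, shrinking the TTFA (FASTER's single-step near-term denoising) shifts the entire latency distribution down, which is why it targets the TTFA rather than only the smoothness of the sampled trajectory.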