ACG: Action Coherence Guidance for Flow-based VLA models
Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo
2025-10-28
Summary
This paper focuses on improving how robots follow instructions based on both what they see and what they are told to do, specifically when those robots learn by imitating human demonstrations. It addresses a common issue where robots mimic human movements too closely, imperfections included, which leads to clumsy and inaccurate actions.
What's the problem?
When robots learn by watching humans perform tasks, they tend to copy everything, including small mistakes such as shaky hands or pauses. This 'noise' in the demonstrations makes the robot's movements less smooth and consistent, causing it to drift off course and fail, especially in tasks that demand precision, like carefully manipulating objects. Essentially, the robot is too literal: it reproduces the exact way a human moved rather than the *intent* behind the action.
What's the solution?
The researchers developed a method called Action Coherence Guidance (ACG) that smooths the robot's actions *while* it is performing a task, with no extra training required. It acts like a real-time 'coach' that steers the robot toward fluid, purposeful movements, filtering out the small imperfections inherited from the human demonstrations. Because the guidance is applied at test time, during the robot's operation, the underlying policy and its training procedure are left untouched.
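The summary does not spell out the algorithm, but the general idea of training-free test-time guidance for a flow-matching policy can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's actual ACG objective: `coherence_gradient` uses a simple jerk (second-difference) penalty as a stand-in coherence score, and `sample_with_guidance` runs a plain Euler sampler whose velocity is nudged down that penalty's gradient at every step.

```python
import numpy as np

def coherence_gradient(actions):
    """Gradient of a hypothetical coherence penalty (not ACG's real objective):
    0.5 * sum_t ||a_{t+1} - 2*a_t + a_{t-1}||^2 over an action chunk of
    shape (T, D). Penalizes jerky, high-frequency motion."""
    d2 = actions[2:] - 2 * actions[1:-1] + actions[:-2]  # second differences, (T-2, D)
    grad = np.zeros_like(actions)
    grad[2:] += d2          # each second difference touches three timesteps,
    grad[1:-1] += -2 * d2   # so its gradient is scattered back onto all three
    grad[:-2] += d2
    return grad

def sample_with_guidance(velocity_fn, x0, steps=50, guidance_scale=0.2):
    """Euler integration of a flow-matching ODE from noise x0 to an action
    chunk, with each step nudged down the coherence-penalty gradient.
    Training-free: only the sampling loop changes, not the model."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)  # the (frozen) policy's learned velocity field
        x = x + dt * (v - guidance_scale * coherence_gradient(x))
    return x
```

As a toy check of the intended effect, one can point the velocity field at a deliberately noisy demonstration chunk; the guided sample lands visibly smoother (lower total jerk) than the unguided one, which is the qualitative behavior the paper attributes to ACG.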
Why it matters?
This work is important because it makes robots much more reliable at performing complex tasks, particularly those requiring fine motor skills. By improving the smoothness and accuracy of robot movements, ACG allows robots to successfully complete tasks in various environments, including real-world scenarios, and opens the door for more practical applications of robot assistants.
Abstract
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter, which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG, respectively.