TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He
2025-10-08
Summary
This paper focuses on improving how large reasoning models can be guided to make better decisions when working with information in tables. It introduces a new method called TaTToo, a reward model that scores the individual steps of a model's reasoning over tabular data.
What's the problem?
Existing methods for guiding these models, called Process Reward Models (PRMs), work well on text-only reasoning but struggle when the reasoning involves tables. They have trouble judging table-specific operations, such as retrieving the relevant part of a table or interacting with its schema, and this creates a performance bottleneck on tasks that require table analysis.
What's the solution?
The researchers developed TaTToo, a new system that reasons explicitly about the steps involved in tabular reasoning. They built a dataset of over 60,000 step-level annotations of good reasoning steps, using tool-based execution to verify the accuracy of each step. TaTToo is trained in two phases: first, cold-start supervised fine-tuning teaches it tool-use reasoning patterns; then reinforcement learning, guided by the tool-verified rewards, refines its verification ability.
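The core idea of verifying a reasoning step by executing a tool over the table can be sketched as follows. This is a minimal illustration, not the paper's implementation: the table format, the claim schema, and the binary 0/1 reward are all assumptions made for the example.

```python
# Hypothetical sketch of tool-grounded step verification: a reasoning step
# that claims a value derived from a table is checked by actually executing
# a simple aggregation "tool" and comparing results.

TABLE = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 80},
    {"region": "north", "sales": 50},
]

def run_tool(op, column, where=None):
    """Execute a simple aggregation over TABLE (stand-in for a real tool call)."""
    rows = [r for r in TABLE if where is None or r[where[0]] == where[1]]
    values = [r[column] for r in rows]
    if op == "sum":
        return sum(values)
    if op == "count":
        return len(values)
    raise ValueError(f"unsupported op: {op}")

def step_reward(claimed_value, op, column, where=None):
    """Reward a step: 1.0 if the tool-verified value matches the claim, else 0.0."""
    return 1.0 if run_tool(op, column, where) == claimed_value else 0.0

# A step claiming "total northern sales are 170" is confirmed by execution;
# a step claiming 200 is refuted.
print(step_reward(170, "sum", "sales", where=("region", "north")))  # -> 1.0
print(step_reward(200, "sum", "sales", where=("region", "north")))  # -> 0.0
```

In the paper's actual pipeline, such tool-verified judgments supply the step-level labels for supervised fine-tuning and shape the reward signal during reinforcement learning.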
Why it matters?
This work matters because it substantially improves the ability of large reasoning models to handle tasks involving tabular data. With only 8 billion parameters, TaTToo improves downstream models by 30.9% at inference, outperforms much larger existing reward models, and generalizes across different test-time scaling strategies, making it a valuable tool for a wide range of data analysis and reasoning applications.
Abstract
Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.
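One common way a PRM drives test-time scaling is best-of-N selection: sample several candidate reasoning chains, score each step with the PRM, and keep the chain whose weakest step is strongest. The sketch below assumes this setup; the PRM is a stub, and aggregating by the minimum step score is one conventional choice, not necessarily the paper's exact strategy.

```python
# Illustrative best-of-N test-time scaling with a process reward model (PRM).
# prm_score_steps is a stubbed heuristic standing in for a learned model.

def prm_score_steps(steps):
    """Stand-in PRM: score each reasoning step in [0, 1]."""
    return [0.0 if "guess" in s else 1.0 for s in steps]

def best_of_n(candidates):
    """Pick the candidate chain whose weakest step scores highest."""
    return max(candidates, key=lambda steps: min(prm_score_steps(steps)))

candidates = [
    ["retrieve sub-table", "guess the total", "answer 200"],
    ["retrieve sub-table", "sum the sales column", "answer 170"],
]
print(best_of_n(candidates))  # -> the chain without the guessed step
```

The same per-step scores plug into other TTS strategies (e.g. step-level beam search), which is what the abstract's claim of generalizability across diverse TTS strategies refers to.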