Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

Yuanchen Zhou, Shuo Jiang, Jie Zhu, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang

2025-08-22

Summary

This paper introduces a new way to train large language models (LLMs) to be better at financial reasoning, focusing on how they arrive at answers, not just the answers themselves.

What's the problem?

Current methods for supervising how LLMs reason, called Process Reward Models (PRMs), work well for general knowledge and for subjects like science and math, but they struggle with the specific demands of finance. Financial reasoning requires precise, structured logical steps and must be factually accurate and compliant with regulations, requirements that general-purpose reward models do not handle well.

What's the solution?

The researchers created Fin-PRM, a specialized reward model designed specifically for financial tasks. It doesn't just look at the final answer; it evaluates each step of the LLM's reasoning process and the overall path it takes to get there, checking that both align with sound financial logic. They used this model in three ways: to select high-quality reasoning paths for the LLM to learn from during supervised fine-tuning, to provide dense step-by-step rewards during reinforcement learning, and to pick the best of several candidate answers at inference time (Best-of-N).
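To make the third use concrete, here is a minimal Python sketch of reward-informed Best-of-N inference: sample several reasoning traces, score each one with a process reward model, and keep the highest-scoring trace. The function names (generate_candidates, prm_score) are illustrative placeholders, not the paper's actual API.

```python
from typing import Callable, List


def best_of_n(
    question: str,
    generate_candidates: Callable[[str, int], List[str]],
    prm_score: Callable[[str, str], float],
    n: int = 8,
) -> str:
    """Sample n reasoning trajectories and keep the one the PRM scores highest."""
    candidates = generate_candidates(question, n)  # n full reasoning traces
    scores = [prm_score(question, trace) for trace in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


# Toy usage with stand-in functions: pretend longer traces score higher.
demo_traces = ["short answer", "a longer, more detailed reasoning trace"]
picked = best_of_n(
    "What is the company's net profit margin?",
    generate_candidates=lambda q, n: demo_traces[:n],
    prm_score=lambda q, trace: len(trace),  # stand-in for a real PRM score
    n=2,
)
print(picked)
```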

Why it matters?

This work is important because it shows that tailoring reward models to specific fields, like finance, significantly improves the performance of LLMs. The models trained with Fin-PRM were much better at financial reasoning tasks than those trained with general-purpose models, leading to substantial gains in accuracy and reliability. This means LLMs can potentially be used more effectively and safely in real-world financial applications.

Abstract

Process Reward Models (PRMs) have emerged as a promising framework for supervising intermediate reasoning in large language models (LLMs), yet existing PRMs are primarily trained on general or Science, Technology, Engineering, and Mathematics (STEM) domains and fall short in domain-specific contexts such as finance, where reasoning is more structured, symbolic, and sensitive to factual and regulatory correctness. We introduce Fin-PRM, a domain-specialized, trajectory-aware PRM tailored to evaluate intermediate reasoning steps in financial tasks. Fin-PRM integrates step-level and trajectory-level reward supervision, enabling fine-grained evaluation of reasoning traces aligned with financial logic. We apply Fin-PRM in both offline and online reward learning settings, supporting three key applications: (i) selecting high-quality reasoning trajectories for distillation-based supervised fine-tuning, (ii) providing dense process-level rewards for reinforcement learning, and (iii) guiding reward-informed Best-of-N inference at test time. Experimental results on financial reasoning benchmarks, including CFLUE and FinQA, demonstrate that Fin-PRM consistently outperforms general-purpose PRMs and strong domain baselines in trajectory selection quality. Downstream models trained with Fin-PRM yield substantial improvements over baselines, with gains of 12.9% in supervised learning, 5.2% in reinforcement learning, and 5.1% in test-time performance. These findings highlight the value of domain-specialized reward modeling for aligning LLMs with expert-level financial reasoning. Our project resources will be available at https://github.com/aliyun/qwen-dianjin.
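As a rough illustration of the step-level and trajectory-level supervision described in the abstract, the sketch below blends a mean per-step score with a holistic trajectory score using an assumed weighting alpha; the actual aggregation used by Fin-PRM may differ.

```python
from typing import List


def combined_reward(step_rewards: List[float], trajectory_reward: float,
                    alpha: float = 0.5) -> float:
    """Blend the mean step-level reward with the trajectory-level reward.

    alpha is an assumed mixing weight, not a value reported in the paper.
    """
    step_component = sum(step_rewards) / len(step_rewards) if step_rewards else 0.0
    return alpha * step_component + (1.0 - alpha) * trajectory_reward


# Example: four reasoning steps scored individually plus one holistic score.
print(combined_reward([0.9, 0.8, 0.7, 0.95], trajectory_reward=0.85))
```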