CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation
Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Bo Zhang, Conghui He
2024-09-06

Summary
This paper introduces Character Detection Matching (CDM), a metric for evaluating how well models recognize mathematical formulas. It aims to give a fairer and more accurate assessment than existing text-based metrics.
What's the problem?
Recognizing mathematical formulas is difficult because they have complicated structure and varied notation, and existing evaluation metrics such as BLEU and Edit Distance have notable limitations. These metrics compare text, so they do not account for the fact that the same formula can be written as several different LaTeX strings. A correct prediction can therefore be penalized simply for using a different notation than the ground truth, which makes comparisons between models unfair.
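To make the problem concrete, here is a minimal sketch, not taken from the paper, that computes a character-level edit distance between two LaTeX strings that should render as the same formula. The helper names `edit_distance` and `normalized_edit_distance` and the two example strings are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length, so 0.0 means identical."""
    if not a and not b:
        return 0.0
    return edit_distance(a, b) / max(len(a), len(b))


# Illustrative pair: both strings typeset x with subscript i and superscript 2.
ground_truth = r"x_{i}^{2}"
prediction = r"x^2_i"

# A text-based metric reports a nonzero error even though the
# rendered formulas look the same.
print(normalized_edit_distance(ground_truth, prediction))
```

The nonzero distance here comes entirely from notation choices (brace usage and script order), which is the kind of discrepancy a text-based metric cannot distinguish from a real recognition error.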
What's the solution?
The authors propose a new evaluation metric called Character Detection Matching (CDM) that focuses on comparing images of formulas instead of just text representations. By converting both the predicted and actual formulas into images, CDM uses visual features to match characters more accurately, taking into account their positions in the formula. This method provides a more reliable way to assess formula recognition compared to traditional text-based methods.
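The idea of matching characters by identity and position can be sketched with a toy example. This is not the authors' implementation: it skips rendering and detection entirely and assumes each character has already been reduced to a hypothetical `(glyph, x, y)` tuple with normalized coordinates. The greedy nearest-neighbor matching, the `max_dist` threshold, and the F1-style score are all simplifying assumptions made for illustration.

```python
from math import hypot


def match_characters(pred, truth, max_dist=0.1):
    """Greedily pair predicted and ground-truth characters.

    Each character is a (glyph, x, y) tuple. Two characters match when they
    share the same glyph and their centers are within max_dist of each other.
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(truth)
    true_pos = 0
    for glyph, x, y in pred:
        # Candidate ground-truth characters: same glyph, close enough in space.
        candidates = [
            (hypot(x - tx, y - ty), idx)
            for idx, (tg, tx, ty) in enumerate(unmatched)
            if tg == glyph and hypot(x - tx, y - ty) <= max_dist
        ]
        if candidates:
            _, idx = min(candidates)  # take the nearest candidate
            unmatched.pop(idx)
            true_pos += 1
    false_pos = len(pred) - true_pos
    false_neg = len(unmatched)
    return true_pos, false_pos, false_neg


def matching_score(pred, truth, max_dist=0.1):
    """F1-style score over matched characters (an assumed scoring rule)."""
    tp, fp, fn = match_characters(pred, truth, max_dist)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Hypothetical layout for "x^2 + y": the 2 sits above the baseline.
truth = [("x", 0.1, 0.5), ("2", 0.2, 0.3), ("+", 0.4, 0.5), ("y", 0.6, 0.5)]
# Prediction that typesets the 2 as a subscript instead of a superscript.
wrong_script = [("x", 0.1, 0.5), ("2", 0.2, 0.7), ("+", 0.4, 0.5), ("y", 0.6, 0.5)]

print(matching_score(truth, truth))         # identical layouts
print(matching_score(wrong_script, truth))  # misplaced character lowers the score
```

Because the comparison happens on rendered positions rather than on source text, two different LaTeX strings that produce the same layout would score the same, while a character in the wrong place is counted as an error.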
Why does it matter?
This research is important because it improves how we evaluate formula recognition models, making it easier to compare their performance fairly. By using CDM, researchers can better understand which models are truly effective at recognizing complex mathematical expressions, ultimately advancing the development of more accurate educational tools and software.
Abstract
Formula recognition presents significant challenges due to the complicated structure and varied notation of mathematical expressions. Despite continuous advancements in formula recognition models, the evaluation metrics employed for these models, such as BLEU and Edit Distance, still exhibit notable limitations. These metrics overlook the fact that the same formula has diverse representations, and they are highly sensitive to the distribution of training data, which causes unfairness in formula recognition evaluation. To this end, we propose a Character Detection Matching (CDM) metric, which ensures evaluation objectivity by designing an image-level rather than LaTeX-level metric score. Specifically, CDM renders both the model-predicted LaTeX and the ground-truth LaTeX formulas into image-formatted formulas, then employs visual feature extraction and localization techniques for precise character-level matching, incorporating spatial position information. This spatially aware character-matching method offers a more accurate and equitable evaluation than the previous BLEU and Edit Distance metrics, which rely solely on text-based character matching. Experimentally, we evaluated various formula recognition models using the CDM, BLEU, and ExpRate metrics. The results demonstrate that CDM aligns more closely with human evaluation standards and provides a fairer comparison across different models by eliminating discrepancies caused by diverse formula representations.