GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research
Xinqi Li, Yiqun Liu, Shan Jiang, Enrong Zheng, Huaijin Zheng, Wenhao Dai, Haodong Deng, Dianhai Yu, Yanjun Ma
2025-10-30
Summary
This paper introduces GraphNet, a new collection of over 2,700 real-world examples of how deep learning models are structured as computational graphs, and provides a way to measure how well different tools can optimize these models for speed.
What's the problem?
Currently, it's hard to reliably compare the performance of different 'tensor compilers' – the tools that translate deep learning models into efficient code for specific hardware. Existing benchmarks don't always accurately reflect real-world model complexity or consider whether optimizations introduce errors. There was a need for a standardized, realistic dataset and a better metric to evaluate these compilers.
What's the solution?
The researchers created GraphNet, a dataset containing computational graphs from various deep learning tasks and frameworks. They also developed a new metric called the 'Speedup Score', which measures how much faster a compiler makes a model run while also checking that the compiled model still produces correct results. They further extended this into an 'Error-aware Speedup Score' that incorporates error information to pinpoint where compilers fail or produce incorrect outputs. They then used this dataset and these metrics to test two popular compilers, CINN and TorchInductor.
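The core idea of a tolerance-aware speedup metric can be sketched in a few lines. The paper's exact S(t) formula is not reproduced here; the function below (`speedup_score`, a hypothetical name) only illustrates the principle of gating the runtime speedup on numerical correctness at a tunable tolerance t:

```python
import numpy as np

def speedup_score(t_eager, t_compiled, output_ref, output_opt, tol):
    """Illustrative tolerance-aware speedup score (not the paper's S(t)).

    Returns the runtime speedup only if the compiled output matches the
    reference within tolerance `tol`; otherwise the run gets no credit.
    """
    max_err = np.max(np.abs(output_ref - output_opt))
    if max_err > tol:            # incorrect under tolerance t -> no credit
        return 0.0
    return t_eager / t_compiled  # plain runtime speedup

# Example: 2x faster and within tolerance
print(speedup_score(2.0, 1.0, np.ones(4), np.ones(4) + 1e-7, tol=1e-5))  # 2.0
```

Tightening `tol` makes the metric stricter: the same compiled model can score well at a loose tolerance and zero at a tight one, which is exactly the trade-off a tunable tolerance level exposes.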
Why it matters?
This work is important because it provides a common ground for evaluating and improving tensor compilers. By having a realistic dataset and a robust metric, developers can more effectively optimize deep learning models, leading to faster training and inference times, and ultimately making AI more efficient and accessible. It helps identify bottlenecks in compiler performance and guides future development efforts.
Abstract
We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose the benchmark metric Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels, offering a reliable measure of general optimization capability. Furthermore, we extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information and helps compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline with graph extraction and compiler evaluation tools is available at https://github.com/PaddlePaddle/GraphNet.
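Benchmarking a compiler against eager execution rests on a careful timing loop: warm up (so compilation and caching costs are excluded), repeat the measurement, and take a robust statistic such as the median. The sketch below uses a stand-in workload rather than a real model or GraphNet's actual evaluation tools; `bench` and the two lambdas are hypothetical names for illustration only:

```python
import time
import statistics

def bench(fn, warmup=3, reps=10):
    """Median wall-clock time of fn(), after warmup runs.

    Warmup runs absorb one-time costs (e.g. JIT compilation) so they
    do not contaminate the steady-state measurement.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# Stand-in workloads: "eager" vs a pretend "compiled" variant.
eager = lambda: sum(i * i for i in range(10_000))
compiled = lambda: sum(i * i for i in range(5_000))  # pretend it's faster

speedup = bench(eager) / bench(compiled)
print(f"speedup: {speedup:.2f}x")
```

In a real PyTorch setting the "compiled" callable would come from `torch.compile(model)` (which uses TorchInductor by default), and correctness of its outputs would be checked separately before the speedup is credited, as the Speedup Score requires.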