Scaling Laws for Deepfake Detection

Wenhao Wang, Longqi Cai, Taihong Xiao, Yuxiao Wang, Ming-Hsuan Yang

2025-10-28

Summary

This research investigates how deepfake detection systems improve as they are trained on a broader variety of real images and deepfake creation techniques, looking for predictable patterns in their performance.

What's the problem?

Detecting deepfakes is getting harder because new ways to create them are constantly emerging, and detection systems often only learn to spot the fakes they've already seen. Existing datasets weren't large or diverse enough to properly study how to build detection systems that generalize to *all* deepfakes, not just the ones they were trained on. Researchers needed a way to understand how much data, both real images and examples of fakes, is actually required to create a reliable detector.

What's the solution?

The researchers created a massive new dataset called ScaleDF, containing over 5.8 million real images from 51 different sources and over 8.8 million fake images generated by 102 different deepfake methods. They then tested deepfake detection models on this dataset and found a consistent pattern: as the number of real image types or deepfake methods the model was trained on increased, the error rate decreased in a predictable way, following what’s called a 'power law'. This is similar to what’s been observed in the development of large language models like ChatGPT.
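A power law of this kind means the detection error decreases as error ≈ a · N^(−b), where N is the number of real domains or deepfake methods seen in training. The sketch below (not the paper's code; the constants 0.30 and 0.35 are invented for illustration) shows how such a law can be fitted to hypothetical error measurements and then used to forecast how many deepfake methods would be needed to reach a target error rate:

```python
# Illustrative sketch: fit a power law error(N) = a * N^(-b) to
# hypothetical detection-error measurements, then extrapolate the
# number of deepfake methods N needed to reach a target error.
# All numbers below are made up for illustration only.
import numpy as np

# Hypothetical average detection error after training on N deepfake
# methods (synthetic data generated from an exact power law).
N = np.array([4, 8, 16, 32, 64, 102], dtype=float)
error = 0.30 * N ** -0.35

# Fit log(error) = log(a) - b * log(N) by linear least squares.
slope, log_a = np.polyfit(np.log(N), np.log(error), 1)
a, b = np.exp(log_a), -slope

# Forecast how many methods would be needed for a target error rate:
# solve a * N^(-b) = target  =>  N = (target / a) ** (-1 / b).
target = 0.05
N_needed = (target / a) ** (-1.0 / b)
print(f"fitted error ~ {a:.2f} * N^-{b:.2f}")
print(f"methods needed for {target:.0%} error ~ {N_needed:.0f}")
```

Because the synthetic data follows the power law exactly, the fit recovers a ≈ 0.30 and b ≈ 0.35; with noisy real measurements the same procedure gives an estimate rather than an exact recovery.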

Why it matters?

This work is important because it provides a way to predict how much more data is needed to improve deepfake detection to a certain level of accuracy. Instead of just randomly collecting more data, we can now strategically focus on adding the types of real images and deepfake methods that will have the biggest impact on performance. This helps us stay ahead of the evolving deepfake technology and build more robust detection systems, ultimately making it harder for malicious deepfakes to spread.

Abstract

This paper presents a systematic study of scaling laws for the deepfake detection task. Specifically, we analyze the model performance against the number of real image domains, deepfake generation methods, and training images. Since no existing dataset meets the scale requirements for this research, we construct ScaleDF, the largest dataset to date in this field, which contains over 5.8 million real images from 51 different datasets (domains) and more than 8.8 million fake images generated by 102 deepfake methods. Using ScaleDF, we observe power-law scaling similar to that shown in large language models (LLMs). Specifically, the average detection error follows a predictable power-law decay as either the number of real domains or the number of deepfake methods increases. This key observation not only allows us to forecast the number of additional real domains or deepfake methods required to reach a target performance, but also inspires us to counter the evolving deepfake technology in a data-centric manner. Beyond this, we examine the role of pre-training and data augmentations in deepfake detection under scaling, as well as the limitations of scaling itself.