Federated Computation of ROC and PR Curves

Xuefeng Xu, Graham Cormode

2025-10-07

Summary

This paper tackles the problem of evaluating how well a machine learning model performs when it is trained with Federated Learning, a technique where data stays on users' devices instead of being collected in one central place.

What's the problem?

Normally, to see how good a classifier is, you'd look at its ROC and Precision-Recall curves. Computing these curves requires the model's raw prediction scores and the true labels. But in Federated Learning, the central server can't access this raw data for privacy reasons: it can't simply ask every device for its individual predictions and labels. So evaluating the model's performance becomes genuinely difficult.
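For context, this is the centralized computation the server cannot do in the federated setting: sort predictions by score and sweep a threshold, accumulating true-positive and false-positive rates. The `roc_curve` helper below is a minimal illustrative sketch, not code from the paper.

```python
def roc_curve(scores, labels):
    """Centralized ROC curve: sweep a threshold over the sorted scores.

    scores: per-example prediction scores (higher = more positive).
    labels: 0/1 ground-truth labels.
    Returns parallel lists (fpr, tpr) of curve points.
    """
    # Sort examples from highest to lowest score.
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(labels)            # total positives
    N = len(labels) - P        # total negatives
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]    # curve starts at the origin
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / N)
        tpr.append(tp / P)
    return fpr, tpr

# A perfectly separating classifier traces the ideal curve:
fpr, tpr = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Note that every step of this computation touches an individual example's score and label, which is exactly the access the federated server is denied.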

What's the solution?

The researchers came up with a way to *estimate* the ROC and PR curves without collecting the raw data. They use differential privacy, which adds carefully calibrated noise so that no individual's contribution can be inferred. The key idea is to estimate specific points in the distribution of prediction scores across all users (quantiles) and use those as thresholds to build an approximate curve. They also prove bounds on how much error to expect in the approximation, and how that error trades off against the privacy level and the communication cost.
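A minimal single-machine sketch of the general idea: a fixed grid of thresholds stands in for the paper's DP-estimated score quantiles, and each simulated client adds Laplace noise to its local counts before the server aggregates them into an approximate ROC curve. The function name, the noise placement, and the use of a plain Laplace mechanism (rather than the paper's distributed DP protocol with its accuracy guarantees) are all illustrative assumptions.

```python
import numpy as np

def federated_dp_roc(client_data, num_thresholds=9, epsilon=1.0, seed=0):
    """Toy sketch: approximate a ROC curve from noisy per-client counts.

    client_data: list of (scores, labels) pairs, one per simulated client.
    Thresholds on a fixed grid stand in for DP-estimated quantiles of the
    global score distribution (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    thresholds = np.linspace(0.1, 0.9, num_thresholds)
    tp = np.zeros(num_thresholds)
    fp = np.zeros(num_thresholds)
    P = N = 0.0
    for scores, labels in client_data:
        scores, labels = np.asarray(scores), np.asarray(labels)
        for j, t in enumerate(thresholds):
            above = scores >= t
            # Each client perturbs its local counts with Laplace noise
            # before sharing them (Laplace mechanism, scale 1/epsilon).
            tp[j] += (above & (labels == 1)).sum() + rng.laplace(0, 1 / epsilon)
            fp[j] += (above & (labels == 0)).sum() + rng.laplace(0, 1 / epsilon)
        P += (labels == 1).sum() + rng.laplace(0, 1 / epsilon)
        N += (labels == 0).sum() + rng.laplace(0, 1 / epsilon)
    # Server-side aggregation: noisy rates, clipped to the valid range.
    tpr = np.clip(tp / max(P, 1.0), 0.0, 1.0)
    fpr = np.clip(fp / max(N, 1.0), 0.0, 1.0)
    return fpr, tpr
```

The server only ever sees noisy aggregate counts, never any individual score or label; larger cohorts and smaller noise scales (higher epsilon) shrink the gap between the estimated and true curves, which is the accuracy/privacy/communication trade-off the paper quantifies.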

Why it matters?

This work is important because it allows us to confidently evaluate machine learning models trained with Federated Learning, which is crucial for applications where privacy is paramount, like healthcare or personal finance. It provides a practical way to check if a model is actually good at its job while still protecting user data, and it balances the need for accuracy with the need for privacy and efficient communication.

Abstract

Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers, offering detailed insights into the trade-offs between true positive rate vs. false positive rate (ROC) or precision vs. recall (PR). However, in Federated Learning (FL) scenarios, where data is distributed across multiple clients, computing these curves is challenging due to privacy and communication constraints. Specifically, the server cannot access raw prediction scores and class labels, which are used to compute the ROC and PR curves in a centralized setting. In this paper, we propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy. We provide theoretical bounds on the Area Error (AE) between the true and estimated curves, demonstrating the trade-offs between approximation accuracy, privacy, and communication cost. Empirical results on real-world datasets demonstrate that our method achieves high approximation accuracy with minimal communication and strong privacy guarantees, making it practical for privacy-preserving model evaluation in federated systems.