Power Transform Revisited: Numerically Stable, and Federated
Xuefeng Xu, Graham Cormode
2025-10-07
Summary
This paper examines a common data preparation technique called power transforms, which reshape data so that it behaves more like a normal distribution. The authors identify why direct implementations of these transforms are unreliable, propose fixes, and extend the transforms to settings where the data is spread across multiple sources.
What's the problem?
Power transforms, while helpful for data analysis and machine learning, can be surprisingly unstable when implemented directly in floating-point arithmetic. Rounding, cancellation, and overflow in the intermediate calculations can produce inaccurate results or even crash a program, especially when the data or the transform parameter takes extreme values.
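As a minimal sketch of the kind of issue involved (an illustration, not code from the paper): the Box-Cox transform, a standard power transform, is defined as (x^λ - 1)/λ for λ ≠ 0 and log(x) for λ = 0. Implementing the textbook formula directly loses precision when λ is close to zero, whereas an algebraically equivalent form using expm1 does not.

```python
import numpy as np

def boxcox_naive(x, lam):
    # Textbook formula: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0.
    # When lam is tiny, x**lam is extremely close to 1, so the subtraction
    # cancels almost all significant digits; large x and lam can also overflow.
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

def boxcox_stable(x, lam):
    # Algebraically identical form expm1(lam * log(x)) / lam: expm1 computes
    # exp(t) - 1 accurately for small t, so the cancellation is avoided.
    return np.log(x) if lam == 0 else np.expm1(lam * np.log(x)) / lam

x = np.array([1.0 + 1e-9, 2.0, 1e6])
print(boxcox_naive(x, 1e-12))   # first entry collapses to 0.0 from cancellation
print(boxcox_stable(x, 1e-12))  # stays close to log(x), the lam -> 0 limit
```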
What's the solution?
The researchers analyzed *why* these numerical issues happen and developed remedies for them. They then went further, adapting power transforms to federated learning, a setting where data is distributed across many devices or locations. This setting introduces new challenges, both in numerical stability and in how the data differs across participants, and the paper presents methods that address both; a hypothetical sketch of what such a federated computation can look like is given below.
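To make the federated setting concrete, here is a hypothetical sketch, not the paper's actual protocol: the Box-Cox parameter λ is usually chosen by maximizing a profile log-likelihood that depends on the data only through a few sums, so each client could report those sums instead of raw values and a server could pick λ by grid search. The function names (client_summary, pooled_loglik) and the grid-search strategy are illustrative assumptions.

```python
import numpy as np

def boxcox(x, lam):
    # Numerically stable Box-Cox transform (see the earlier sketch).
    return np.log(x) if lam == 0 else np.expm1(lam * np.log(x)) / lam

def client_summary(x, lam):
    # Each client sends only aggregate statistics for this candidate lambda:
    # count, sum and sum of squares of the transformed values, and sum of logs.
    y = boxcox(x, lam)
    return np.array([x.size, y.sum(), np.square(y).sum(), np.log(x).sum()])

def pooled_loglik(summaries, lam):
    # The server pools the per-client sums and evaluates the Box-Cox profile
    # log-likelihood: -(n/2) * log(var of transformed data) + (lam - 1) * sum(log x).
    n, s1, s2, slog = np.sum(summaries, axis=0)
    var = s2 / n - (s1 / n) ** 2
    return -0.5 * n * np.log(var) + (lam - 1.0) * slog

# Three simulated clients, each holding positive-valued data.
clients = [np.random.default_rng(i).lognormal(size=200) for i in range(3)]
grid = np.linspace(-2.0, 2.0, 81)
best = max(grid, key=lambda lam: pooled_loglik([client_summary(c, lam) for c in clients], lam))
print("selected lambda:", float(best))
```

This sketch only illustrates that clients need not share raw data to fit a shared transform; it does not capture the stability or distributional fixes that the paper itself contributes.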
Why it matters?
This work is important because it makes a widely used data preparation step much more dependable. By improving the stability of power transforms, the researchers ensure that data analysis and machine learning models are built on a more solid foundation, leading to more trustworthy and accurate outcomes, particularly in complex scenarios like federated learning where data privacy and distribution are key concerns.
Abstract
Power transforms are popular parametric techniques for making data more Gaussian-like, and are widely used as preprocessing steps in statistical analysis and machine learning. However, we find that direct implementations of power transforms suffer from severe numerical instabilities, which can lead to incorrect results or even crashes. In this paper, we provide a comprehensive analysis of the sources of these instabilities and propose effective remedies. We further extend power transforms to the federated learning setting, addressing both numerical and distributional challenges that arise in this context. Experiments on real-world datasets demonstrate that our methods are both effective and robust, substantially improving stability compared to existing approaches.