Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning
Andrei Semenov, Philip Zmushko, Alexander Pichugin, Aleksandr Beznosikov
2024-12-17
Summary
This paper shows that simple changes to a model's architecture can strengthen data protection in Vertical Federated Learning (VFL), a setup that allows different organizations to collaboratively train machine learning models while keeping their data private.
What's the problem?
In VFL, each organization keeps its own features locally and exchanges only intermediate model outputs, so no party sees another's raw data. However, this exchange still carries risks: in a feature reconstruction attack, an adversary tries to recreate the original input features from the information shared during training. Finding ways to protect the data effectively throughout the training process is therefore crucial.
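To make the threat concrete, below is a minimal Python/PyTorch sketch of a gradient-based feature reconstruction attack on a client-side model. It is an illustration rather than the specific attacks studied in the paper: it assumes the attacker knows the client model and observes the embedding it sends out, and the model, shapes, and hyperparameters are made up for the example.

```python
# Illustrative sketch (not the paper's exact attack): gradient-based feature
# reconstruction against a VFL client ("bottom") model. The attacker is assumed
# to know the model f and to observe the embedding z = f(x_true), then searches
# for an input whose embedding matches z.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical client model held by a data-owning party.
client_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 16))
for p in client_model.parameters():
    p.requires_grad_(False)  # the attacker only optimizes the candidate input

x_true = torch.randn(1, 20)             # private features the attacker never sees
with torch.no_grad():
    z_observed = client_model(x_true)   # embedding sent to the server (observed)

# Attack: optimize a candidate input so its embedding matches the observed one.
x_hat = torch.zeros(1, 20, requires_grad=True)
optimizer = torch.optim.Adam([x_hat], lr=0.05)

for step in range(2000):
    optimizer.zero_grad()
    loss = ((client_model(x_hat) - z_observed) ** 2).mean()
    loss.backward()
    optimizer.step()

print("reconstruction error:", (x_hat.detach() - x_true).norm().item())
```

If the optimization drives the embedding mismatch to zero, the recovered candidate can end up close to the private input, which is exactly the kind of leakage the paper aims to prevent.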
What's the solution?
The authors show that even basic transformations of the model architecture can significantly improve protection against these attacks. They argue that feature reconstruction cannot succeed without knowledge of the prior distribution of the data, so a thoughtfully designed model can be resilient to attempts to recover sensitive features. Their experiments confirm that Multi-Layer Perceptron (MLP)-based models resist state-of-the-art feature reconstruction attacks while maintaining good predictive performance.
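For context, the sketch below shows what an MLP-based client ("bottom") model looks like in a simple two-party VFL / split-learning setup, where only the embedding and its gradient cross the party boundary. The layer sizes, data, and training loop are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative two-party VFL setup with an MLP-based bottom model, the kind of
# architecture the paper reports to be resistant to feature reconstruction.
import torch
import torch.nn as nn

class MLPBottom(nn.Module):
    """Client-side model: maps private features to an embedding."""
    def __init__(self, in_dim=20, hidden=64, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, emb_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class TopModel(nn.Module):
    """Server-side model: consumes client embeddings and predicts labels."""
    def __init__(self, emb_dim=16, num_classes=2):
        super().__init__()
        self.head = nn.Linear(emb_dim, num_classes)
    def forward(self, z):
        return self.head(z)

bottom, top = MLPBottom(), TopModel()
optimizer = torch.optim.SGD(list(bottom.parameters()) + list(top.parameters()), lr=0.1)

x = torch.randn(32, 20)                  # private features (stay on the client)
y = torch.randint(0, 2, (32,))           # labels (held by the active party / server)

z = bottom(x)                            # only the embedding crosses the boundary
loss = nn.functional.cross_entropy(top(z), y)
optimizer.zero_grad()
loss.backward()                          # gradients w.r.t. z flow back to the client
optimizer.step()
```

The paper's point is that with MLP-based bottom models, the embeddings exchanged in this loop are much harder to invert back into the private features.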
Why it matters?
This research is important because it addresses a key challenge in collaborative machine learning: how to protect sensitive data while still benefiting from shared insights. By improving data security in VFL, this work helps ensure that organizations can collaborate without compromising privacy, which is essential in fields such as healthcare, finance, and other domains that handle personal information.
Abstract
Vertical Federated Learning (VFL) aims to enable collaborative training of deep learning models while maintaining privacy protection. However, the VFL procedure still has components that are vulnerable to attacks by malicious parties. In our work, we consider feature reconstruction attacks, a common risk targeting input data compromise. We theoretically claim that feature reconstruction attacks cannot succeed without knowledge of the prior distribution on data. Consequently, we demonstrate that even simple model architecture transformations can significantly impact the protection of input data during VFL. Confirming these findings with experimental results, we show that MLP-based models are resistant to state-of-the-art feature reconstruction attacks.
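As a rough intuition for the abstract's claim about the prior distribution (an illustration, not the paper's formal result): if the client model's first layer maps many input features to fewer embedding coordinates, the observed embedding alone does not determine the input.

```latex
% Illustration, not the paper's theorem: with a linear first layer and m < d,
% the observed embedding has an entire affine subspace of valid preimages.
\[
  z = Wx, \qquad W \in \mathbb{R}^{m \times d}, \quad m < d,
\]
\[
  \{\, x' : Wx' = z \,\} = \{\, x + v : v \in \ker W \,\}, \qquad \dim \ker W \ge d - m > 0 .
\]
```

Every point of that subspace reproduces the observed embedding, so without a prior over the inputs the attacker has no principled way to single out the true one.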