Leveraging Multi-Party Computation for Privacy-Preserving Machine Learning
-
Name:
Leveraging Multi-Party Computation for Privacy-Preserving Machine Learning
-
Venue:
252 / BBB
-
Date:
2026-04-30
-
Speaker:
-
Time:
15:00
-
Machine Learning (ML) is no longer merely a research topic. At the time of writing
this thesis, ML has become a well-studied and mature field, enabling the development
of real-world industrial applications based on trained models, such as face recognition,
fraud detection, recommendation systems, and more. In parallel, Generative AI, exemplified
by models such as OpenAI's ChatGPT, has gained significant attention and is rapidly
transforming how people approach and perform everyday tasks.
To train an effective machine learning model, a well-structured and comprehensive dataset
is an essential prerequisite. In real-world applications, large companies such as Google and
Amazon often collect and own extensive training datasets from their end users, enabling
them to train models independently. However, even for these large companies, datasets
are typically collected within the scope of their own business domains and can thus be
further improved by integrating datasets from other sources. Yet strict privacy regulations
such as the General Data Protection Regulation (GDPR) prohibit companies from directly
sharing their data with others, and in some cases business considerations further
discourage them from doing so. As a result, the following
research question arises: Can different entities collaboratively train a machine learning
model without exposing their private datasets?
Privacy-preserving machine learning (PPML) provides cryptographic mechanisms that
enable model training without revealing the training data. Within the domain of PPML,
different entities can collaboratively train a machine learning model from scratch or
aggregate their local training results in a federated learning (FL) scenario. To achieve this
goal, we apply secure multi-party computation (MPC) techniques, which enable mutually
distrustful parties to jointly evaluate a function without revealing their private inputs.
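As a toy illustration of the secret-sharing idea underlying MPC (not one of the protocols proposed in this thesis), the following sketch shows additive secret sharing over a prime field: each input is split into random shares that sum to the secret, and the parties can add their shares locally to obtain a sharing of the sum without ever seeing the inputs.

```python
import random

P = 2**61 - 1  # a Mersenne prime; all arithmetic is modulo P

def share(secret, n=3):
    """Split a secret into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; fewer than all n shares reveal nothing about the secret."""
    return sum(shares) % P

# Two private inputs are shared among three computing parties.
x_shares = share(42)
y_shares = share(100)

# Each party adds its own shares locally: no communication, no leakage.
sum_shares = [(a + b) % P for a, b in zip(x_shares, y_shares)]

assert reconstruct(sum_shares) == 142
```

Addition is "free" in this scheme because the sharing is linear; multiplication and comparisons require interactive protocols, which is where most of the efficiency challenges in MPC-based training arise.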
In this thesis, we investigate how privacy-preserving machine learning can be achieved
using MPC protocols. Specifically, we propose and experiment with MPC protocols that
are more efficient compared to existing ones. Our first contribution is a four-party secret-
sharing scheme called X-sharing, along with a set of four-party protocols built upon this
scheme. We explore how four-party neural network training can be accelerated using
this new sharing method and compare its performance against existing approaches. Our
second contribution is the development of new protocols for two-party training of gradient
boosting decision trees (GBDT). We analyze the underlying modular protocols required
for private GBDT and propose efficient two-party protocols to improve training efficiency.
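To give a sense of what such a modular building block looks like, the snippet below computes the standard XGBoost-style split gain from gradient and Hessian sums, in the clear. In a private GBDT protocol, these sums are held as secret shares, so the additions, squarings, and the division must each be realized by an efficient two-party sub-protocol; the formula and regularization parameter here follow the common GBDT formulation, not a specific construction from this thesis.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """XGBoost-style split gain from per-side gradient (g) and Hessian (h)
    sums, with L2 regularization lam. Plaintext version for illustration."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return score(g_left, h_left) + score(g_right, h_right) - parent

# A split that separates opposite-sign gradients has positive gain...
print(split_gain(2.0, 1.0, -2.0, 1.0))  # 4.0
# ...while a split over identical halves gains nothing.
print(split_gain(0.0, 1.0, 0.0, 1.0))   # 0.0
```

During tree construction this gain is evaluated for every candidate split, so even small per-operation savings in the underlying two-party protocols compound across the whole training run.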
Our third contribution is a maliciously secure aggregation protocol for federated
learning that provides protection against poisoning attacks. The protocol targets a
two-server setting in which clients efficiently share their gradient updates with the
servers and assist them in generating message authentication codes (MACs).
We prove the security of the proposed protocols within the universally composable (UC)
framework.
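The role MACs play in such an aggregation can be illustrated with a simplified sketch: each client shares its update additively between two servers together with a sharing of a linear MAC tag. Because both the shares and the tags are linear, the servers can aggregate locally and then verify the aggregate in one check. This is a generic information-theoretic MAC construction for illustration only; in particular, the MAC key would itself be secret-shared in a real protocol, and the names below are hypothetical.

```python
import random

P = 2**61 - 1  # prime field for values, shares, and MAC tags

def additive_share(v):
    """Split v into two shares, one per server."""
    s0 = random.randrange(P)
    return s0, (v - s0) % P

# Global MAC key (secret-shared in a real protocol; in the clear here
# purely for illustration).
key = random.randrange(P)

def client_message(update):
    """Client shares its update and a linear MAC tag t = key * update."""
    tag = (key * update) % P
    return additive_share(update), additive_share(tag)

# Each server sums the shares it received from all clients locally.
updates = [5, 7, 11]
msgs = [client_message(u) for u in updates]
agg_v = [sum(m[0][i] for m in msgs) % P for i in (0, 1)]
agg_t = [sum(m[1][i] for m in msgs) % P for i in (0, 1)]

# Reconstruct the aggregate and its tag; t == key * v holds because the
# MAC is linear, and fails if either server tampered with its shares.
v = sum(agg_v) % P
t = sum(agg_t) % P
assert t == (key * v) % P
assert v == sum(updates)
```

The linearity of the MAC is what makes this efficient: the servers never need to verify individual client updates, only the single aggregated value.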