This paper is available on arXiv under the CC BY 4.0 DEED license.
Authors:
(1) Zhipeng Wang, Department of Computing, Imperial College London;
(2) Nanqing Dong, Department of Computer Science, University of Oxford;
(3) Jiahao Sun, Data Science Institute, Imperial College London;
(4) William Knottenbelt, Department of Computing, Imperial College London.
Federated Learning (FL) is a machine learning paradigm that enables multiple decentralized clients to collaboratively train a model under the orchestration of a central aggregator. Traditional FL solutions rely on trusting the centralized aggregator to form cohorts of clients fairly and honestly. In reality, however, a malicious aggregator could discard and replace clients’ trained models, or launch Sybil attacks to insert fake clients. Such behavior gives the aggregator more power to control clients in the FL setting and to determine the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs (ZKPs) to tackle the issue of a malicious aggregator during the model aggregation process.
To guarantee correct aggregation results, the aggregator must provide a proof in each round. The proof demonstrates to the clients that the aggregator has executed the intended behavior faithfully. To further reduce the clients’ verification cost, we employ a blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the nodes validating and maintaining the blockchain data) can verify the proof without learning the clients’ local and aggregated models. The theoretical analysis and empirical results show that zkFL achieves better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
Federated Learning (FL) is a privacy-preserving machine learning paradigm that allows multiple clients to collaboratively train a global model without sharing their raw data [23]. In FL, each participant (i.e., client) performs local training on its own private dataset and communicates only the model updates to the central server (i.e., aggregator). The aggregator aggregates the model updates and sends the updated global model back to the clients. This process repeats iteratively until the global model converges or a stopping criterion is fulfilled. During the cross-device FL process, participants need to place their trust in the aggregator to create cohorts of clients in a fair and unbiased manner.
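The round structure described above can be sketched in a few lines. This is a minimal illustration, not the setup used in this paper: the function names `local_update` and `aggregate`, the plain gradient step, the unweighted average, and all numeric values are assumptions chosen for readability.

```python
from typing import List


def local_update(weights: List[float], gradient: List[float], lr: float = 0.5) -> List[float]:
    """Hypothetical local training: one gradient step computed on a client's private data."""
    return [w - lr * g for w, g in zip(weights, gradient)]


def aggregate(models: List[List[float]]) -> List[float]:
    """Unweighted coordinate-wise average of client models (an assumption;
    practical systems often weight clients by dataset size)."""
    n = len(models)
    return [sum(column) / n for column in zip(*models)]


# The aggregator broadcasts the current global model.
global_model = [0.0, 0.0]

# Each client derives a gradient from data that never leaves the device (made-up values).
client_gradients = [[2.0, -4.0], [6.0, 0.0], [-2.0, 4.0]]

# Clients train locally and send only their updated models to the aggregator.
local_models = [local_update(global_model, g) for g in client_gradients]

# The aggregator combines the models and would send the result back for the next round.
new_global_model = aggregate(local_models)
```

Note that nothing in this loop lets a client check that `aggregate` was actually applied to every submitted model; clients simply receive `new_global_model` and trust it.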
However, a potential vulnerability is that an actively malicious adversary with control over the aggregator could exploit this trust [17]. For instance, the adversary could carry out a Sybil attack [7] by simulating numerous fake client devices, or selectively favor the model updates of previously compromised clients from the pool of available participants. These attacks could enable the adversary to manipulate the final training results in FL, compromising the integrity of the learning process. Safeguarding against such threats is imperative to maintain the effectiveness and security of cross-device FL.
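To make the impact of these two attacks concrete, the toy example below compares a fair average with what a dishonest aggregator could report. The model values, the number of fake clients, and the helper name `average` are illustrative assumptions, not figures from this paper.

```python
from typing import List


def average(models: List[List[float]]) -> List[float]:
    """Coordinate-wise unweighted average of the given models."""
    n = len(models)
    return [sum(column) / n for column in zip(*models)]


# Models submitted by three honest clients (made-up values).
honest_models = [[1.0, 1.0], [1.0, 3.0], [1.0, 2.0]]
fair_result = average(honest_models)

# Sybil attack: the aggregator injects three fake clients carrying a crafted model.
sybil_models = [[9.0, 9.0] for _ in range(3)]
sybil_result = average(honest_models + sybil_models)

# Selective aggregation: the aggregator keeps only the client it favors.
favored_result = average(honest_models[:1])
```

In all three cases the clients receive a single vector from the aggregator, so without additional evidence they cannot tell `fair_result` apart from the manipulated outputs.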
In this work, we present zkFL (cf. Fig. 1), an innovative approach that integrates zero-knowledge proofs (ZKPs) into FL. Without changing the learning setup of the underlying FL method, this integration guarantees the integrity of the data aggregated by the centralized aggregator. ZKPs [12,28,6,22] are widely recognized cryptographic tools that enable secure and private computations while safeguarding the underlying data. In essence, ZKPs empower a prover to convince a verifier of a specific fact without revealing any information beyond that fact itself. Within the context of zkFL, ZKPs play a pivotal role in addressing the challenge posed by a potentially malicious aggregator during the model aggregation process.
To achieve accurate aggregation results, the aggregator must provide a proof for each round, demonstrating to the clients that it has faithfully executed the intended behavior for aggregating the model updates. By verifying these proofs, the clients can confirm that the aggregation was carried out honestly. Furthermore, to minimize the verification burden on the clients, we propose a blockchain-based zkFL solution to handle the proof in a zero-knowledge manner. As shown in Fig. 2, in this approach, the blockchain acts as a decentralized and trustless platform, allowing miners, the nodes validating and maintaining the blockchain data [25,4], to verify the validity of the ZKP without compromising the confidentiality of the clients’ models. By incorporating blockchain technology into our zkFL system, we establish a robust and scalable framework for conducting zero-knowledge proof verification in a decentralized and transparent manner. This not only enhances the overall efficiency of the zkFL system but also reinforces the confidentiality of the participants’ data, making it a promising solution for secure and privacy-conscious cross-device FL.
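As a rough intuition for how an aggregation result can be checked against client submissions without publishing the individual updates, the sketch below uses a toy Pedersen-style additively homomorphic commitment. This is a swapped-in illustration and not the zkFL protocol: it is not a zero-knowledge proof, it reveals the aggregate and the summed blinding factor to the verifier, and the group parameters `P`, `Q`, `G`, `H` are tiny, insecure values chosen only so the arithmetic is easy to follow. All function names are hypothetical.

```python
import secrets
from typing import List, Tuple

# Toy parameters, far too small to be secure: P = 2*Q + 1 with P and Q both prime.
P, Q = 2039, 1019
# 4 and 9 are squares mod P, so each generates the subgroup of prime order Q.
G, H = 4, 9


def commit(value: int, blind: int) -> int:
    """Pedersen-style commitment G^value * H^blind mod P."""
    return (pow(G, value, P) * pow(H, blind, P)) % P


def client_commit(update: int) -> Tuple[int, int]:
    """A client commits to its (integer-encoded) update with a fresh random blinding factor.
    In this sketch the client sends both the update and the blind to the aggregator."""
    blind = secrets.randbelow(Q)
    return commit(update, blind), blind


def verify_sum(commitments: List[int], claimed_sum: int, blind_sum: int) -> bool:
    """The product of the commitments must equal a commitment to the claimed sum."""
    product = 1
    for c in commitments:
        product = (product * c) % P
    return product == commit(claimed_sum, blind_sum)


updates = [5, 12, 7]                        # made-up integer-encoded client updates
pairs = [client_commit(u) for u in updates]
commitments = [c for c, _ in pairs]         # published by the clients
blind_sum = sum(b for _, b in pairs)        # computed by the aggregator

honest_ok = verify_sum(commitments, sum(updates), blind_sum)
# The aggregator swaps one client's update (7) for its own value (100).
tampered_ok = verify_sum(commitments, sum(updates) - 7 + 100, blind_sum)
```

Here `honest_ok` is `True` and `tampered_ok` is `False`, because the published commitments bind the aggregator to the sum of the committed updates. A verifier in this sketch only needs the commitments, the claimed sum, and `blind_sum`, never the individual updates.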
Our contributions can be summarized as follows:
We present zkFL, an innovative ZKP-based FL system that can be integrated with existing FL methods. zkFL empowers clients to independently verify proofs generated by the centralized aggregator, thereby ensuring the accuracy and validity of model aggregation results. zkFL effectively addresses the threats posed by a malicious aggregator during the model aggregation process, enhancing security and trust in the collaborative FL setting.
We integrate zkFL with blockchain technology to minimize clients’ computation costs for verification. Leveraging the zero-knowledge property of ZKPs, our blockchain-based zkFL significantly improves overall efficiency while preserving clients’ model privacy.
We present a rigorous theoretical analysis of the security, privacy, and efficiency of zkFL. We further evaluate these properties under benchmark FL setups. The results of these experiments demonstrate the practical feasibility and effectiveness of zkFL in real-world scenarios.