This paper is available on arxiv under CC 4.0 license.
Authors:
(1) Berrenur Saylam;
(2) Özlem Durmaz İncel.
Federated Learning: Collaborative and Privacy-Preserving Model Training
State of the Art Federated Learning Software Platforms and Testbeds
Sensors and Edge-Sensing Devices
Federated Learning Applications on Sensing Devices
Conclusions, Acknowledgment, and References
This section delves into the field of FL, a method created to produce accurate models while protecting the privacy of individuals and institutions. As mentioned, FL proposes a collaborative learning paradigm in which numerous devices take part in training a shared model under the direction of a central server. The following subsections investigate the nuances of this collaboration procedure and its various architectural strategies.
Collaborative Model Training: FL is a privacy-respecting paradigm. First, the server distributes a model with randomly initialized parameters to the participating devices. Each device uses its local data to train the shared model, ensuring no raw data leaves the device. Instead, the server receives only the updated model parameters. This iterative procedure typically relies on model averaging and continues until a stopping criterion is reached Wang and Preininger (2019); Rieke et al. (2020) (a visual scheme of the process can be seen in Figure 2).
By aggregating the updated parameters from all devices, a global model is obtained, embodying the collective knowledge of the participants while preserving data privacy.
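To make the procedure concrete, the following minimal sketch simulates one such training loop in Python with a linear model and simple model averaging. The names, toy data, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of federated rounds: the server broadcasts parameters,
# each client trains locally on private data, the server averages updates.
import numpy as np

rng = np.random.default_rng(0)

def local_train(params, X, y, lr=0.1, epochs=5):
    """One client's local SGD on a linear model; raw data never leaves."""
    w = params.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # MSE gradient on local data
        w -= lr * grad
    return w

# Server initializes a shared model with random parameters.
global_w = rng.normal(size=3)

# Each client holds its own private dataset (simulated here).
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]

for round_ in range(10):
    # Clients receive the global model and return updated parameters only.
    client_ws = [local_train(global_w, X, y) for X, y in clients]
    # Server aggregates by simple model averaging.
    global_w = np.mean(client_ws, axis=0)
```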
2.1. Architectural Categories
FL encompasses three main architectural categories: horizontal, vertical, and federated transfer learning (FTL) Yang et al. (2019). These categories are determined by the data distribution and the nature of the collaboration. In the horizontal architecture, data features are similar across devices, with variations primarily in data volume.
The vertical architecture, on the other hand, focuses on data that correspond to the same samples (shared IDs) but differ in feature spaces. Horizontal architecture, often called sample-based FL, is commonly employed in medical cases where different local devices share similar data structures.
In contrast, vertical architecture, known as feature-based FL, is well-suited for aggregating user data from different servers. This architecture enables communication and collaboration between devices, requiring the aggregation of diverse features Aledhari et al. (2020).
An example of horizontal architecture is the Google keyboard, where users from different regions update model parameters locally, which are then aggregated by the server. In contrast, the vertical architecture finds its relevance in scenarios where multiple institutions, such as banks, collaborate to generate personalized models by aggregating knowledge from different domains of the same person.
FTL also leverages multiple data sources to train a model on the server. The goal is to identify commonalities across features, reducing the overall error. FTL supports both homogeneous (different samples) and heterogeneous (different features) training approaches, making it particularly valuable in healthcare applications Jing et al. (2019).
Each FL architecture offers distinct advantages. The horizontal architecture ensures data independence, allowing devices to update model parameters without extensive communication. Vertical architecture enables collaboration between different servers, facilitating the aggregation of diverse features. FTL leverages different data sources to identify common patterns, ultimately improving model accuracy Aledhari et al. (2020); Yao et al. (2021).
These architectural approaches find applications across various domains where privacy and data security are paramount. For example, in the healthcare domain, by leveraging FL, healthcare institutions can collaboratively train models while preserving patient privacy, facilitating personalized healthcare services, and advancing medical research.
2.2. Challenges
Federated Learning encompasses various research areas Kairouz et al. (2021); Saeed et al. (2020), including optimization Li et al. (2020), communication efficiency Konečný et al. (2016), personalization Wang et al. (2019), fault tolerance Bonawitz et al. (2019), privacy preservation Bonawitz et al. (2017), and computational efficiency Zhou et al. (2020).
As is the case with many methodologies, FL has both advantages and drawbacks. Its challenges mainly arise from device, data, and model heterogeneity (Figure 3).
Device heterogeneity: The use of different devices in data collection, such as urban sensors, smartphones, wearable devices, and autonomous vehicle sensors Jiang et al. (2020); Nguyen et al. (2021), introduces variations in sensing capabilities, memory, data processing, battery life, and communication capacity.
Consequently, the diverse nature of IoT devices gives rise to challenges in fault tolerance, communication efficiency, and computational efficiency.
Data heterogeneity: Device heterogeneity leads to non-IID (non-identically and independently distributed) data, manifesting in different labels, feature distributions, and concept drift Xu et al. (2021). For example, when data is collected from multiple devices, each device may have only a subset of the activity classes in Human Activity Recognition (HAR) research. Moreover, different individuals may assign different labels to the same activity.
Furthermore, variations in feature distributions can occur due to individuals’ unique characteristics and behaviours. Addressing these challenges requires distributed optimization techniques and personalized approaches.
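As an illustration of the HAR example above, the following sketch simulates label skew across clients: each simulated device observes only a subset of the activity classes, so local label distributions differ. The client names and class assignments are hypothetical.

```python
# Illustrative label skew: each client sees only some HAR activity classes.
import numpy as np

rng = np.random.default_rng(1)
activities = ["walking", "sitting", "standing", "running", "cycling"]

# Hypothetical assignment: each client records only some activities.
client_classes = {
    "phone_user_A": ["walking", "sitting"],
    "watch_user_B": ["running", "cycling"],
    "phone_user_C": ["sitting", "standing", "walking"],
}

for client, classes in client_classes.items():
    labels = rng.choice(classes, size=100)          # local label sample
    counts = {a: int((labels == a).sum()) for a in activities}
    print(client, counts)   # zero counts reveal the missing classes
```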
Model heterogeneity: While the standard shared model used in the basic FL architecture facilitates aggregation, it may not be suitable for all scenarios. Model heterogeneity arises from the diverse needs of client devices.
For example, when constructing a model for predicting user activities using data from smartphones and smartwatches, adapting the model based on each device’s capacity may be necessary.
Additionally, clients may have privacy concerns that discourage sharing certain model parts. This challenge underscores the need for further research on privacy preservation and client participation.
2.3. Federation Algorithms
FL algorithms vary based on the model aggregation methods used in the literature. The initial algorithm proposed by Google, known as FedAvg McMahan et al. (2017), employs the average function for aggregation. Subsequently, several variations of averaging methods emerged, including weighted averaging Shlezinger et al. (2020), one model selection, best model averaging Yao et al. (2021), stochastic controlled average Karimireddy et al. (2020), and periodic averaging combined with quantization Reisizadeh et al. (2020).
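The core of FedAvg-style aggregation can be sketched in a few lines: client models are averaged, weighted by the size of each client's local dataset. This is a minimal illustration of the weighted-averaging idea, not the full algorithm from McMahan et al. (2017).

```python
# A sketch of FedAvg-style aggregation: client parameter vectors are
# averaged, weighted by local dataset size n_k.
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()                 # n_k / n
    stacked = np.stack(client_params)             # (num_clients, dim)
    return weights @ stacked                      # sum_k (n_k / n) * theta_k

# Example: three clients with unequal amounts of local data.
params = [np.array([1.0, 2.0]), np.array([2.0, 0.0]), np.array([0.0, 4.0])]
print(fedavg(params, client_sizes=[100, 50, 50]))  # -> [1.0, 2.0]
```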
However, FedAvg has limitations when working with non-IID data and fails to capture fine-grained information specific to each device Wu et al. (2020).
Other widely used algorithms include FedMA Wang et al. (2020), which constructs a global model layer-wise using matching and averaging, specifically designed for neural network architectures to address data heterogeneity. FedPer Sannara et al. (2021) tackles data heterogeneity caused by non-IID distributions by providing personalized models composed of base and personalized layers.
The base layers are aggregated on the server, while clients focus on representation learning in the personalized layer using transfer learning.
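A minimal sketch of the FedPer split follows: only the base layers travel to the server for aggregation, while each client keeps its personalized head local. The class name and layer shapes are illustrative assumptions, not the architecture from the cited work.

```python
# Sketch of the FedPer idea: aggregate "base" layers, keep "personal"
# layers on each device.
import numpy as np

class ClientModel:
    def __init__(self, rng):
        self.base = rng.normal(size=(8, 4))       # shared representation
        self.personal = rng.normal(size=(4, 2))   # stays on the device

rng = np.random.default_rng(2)
clients = [ClientModel(rng) for _ in range(3)]

# Server aggregates base layers only (simple average here).
global_base = np.mean([c.base for c in clients], axis=0)

# Each client adopts the shared base but retains its personal head.
for c in clients:
    c.base = global_base.copy()
```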
Moreover, to apply any of these aggregation methods, the server must receive models from all participating devices, which causes latency problems. Slower clients are called stragglers, and the resulting issue is known as the straggler problem Zaharia et al. (2008). In other words, this can be framed as a trade-off between synchronous and asynchronous learning. The literature proposes asynchronous Xie et al. (2019) and semi-synchronous Wu et al. (2020) methods to overcome the straggler problem. In addition to latency, when the server faces unbalanced and non-IID data, even these proposed methods converge slowly Duan et al. (2019).
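One way to picture asynchronous aggregation, in the spirit of Xie et al. (2019), is a server that mixes in each client update as soon as it arrives and down-weights stale updates instead of waiting for stragglers. The mixing rule and constants below are simplified, illustrative choices.

```python
# Sketch of asynchronous aggregation with staleness-weighted mixing.
import numpy as np

global_w = np.zeros(3)
server_version = 0

def staleness_weight(staleness, alpha=0.6):
    """Older updates (larger staleness) get a smaller mixing weight."""
    return alpha / (1.0 + staleness)

def on_client_update(client_w, client_version):
    global global_w, server_version
    s = server_version - client_version   # how outdated the update is
    a = staleness_weight(s)
    global_w = (1 - a) * global_w + a * client_w
    server_version += 1

# Example: a fresh update followed by a straggler's stale one.
on_client_update(np.ones(3), client_version=0)       # staleness 0, weight 0.6
on_client_update(2 * np.ones(3), client_version=0)   # staleness 1, weight 0.3
print(global_w)
```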
In Liu et al. (2021), the authors propose a partial aggregation strategy (FedPA) in which the server updates its model using models from only a subset of clients, determined by a reinforcement learning model. The number of aggregated clients is calculated adaptively and dynamically, taking data distribution and device heterogeneity into account and preventing the accuracy loss caused by selecting an inappropriate aggregation number.
Similarly, in Gao et al. (2021), the authors propose the n-soft sync aggregation algorithm to address cloud idle time. This algorithm combines the benefits of synchronous and asynchronous aggregation by uploading only the models of n clients in each round, striking a balance between transmission overhead and training time. This approach ensures faster completion of training by corresponding clients.
In Ergün et al. (2022), the authors introduce FL with Secure Aggregation (FedSecAgg), a multi-party computation (MPC)-based aggregation protocol that ensures privacy and security in FL scenarios. It addresses the challenge of privacy leakage during the aggregation process by employing secure MPC techniques, allowing aggregation to be performed while preserving the privacy of individual client updates.
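The generic mechanism behind such MPC-based schemes can be illustrated with pairwise masking (as in Bonawitz et al. (2017)): each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while individual updates remain hidden. This is a toy illustration of the principle, not the actual FedSecAgg protocol.

```python
# Toy pairwise-masking secure aggregation: masks cancel in the sum.
import numpy as np

rng = np.random.default_rng(3)
updates = [rng.normal(size=2) for _ in range(3)]   # private client updates

n = len(updates)
pair_masks = {(i, j): rng.normal(size=2)
              for i in range(n) for j in range(i + 1, n)}

masked = []
for i in range(n):
    m = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            m += mask   # the lower-indexed client adds the shared mask
        elif b == i:
            m -= mask   # the higher-indexed client subtracts it
    masked.append(m)

# The server sees only masked vectors, yet their sum equals the true sum.
print(np.allclose(sum(masked), sum(updates)))      # True
```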
In He et al. (2020), Federated Meta-Learning with Model-Agnostic Meta-Learning (FedMeta-MAML) is proposed. FedMeta-MAML combines the principles of FL with model-agnostic meta-learning (MAML). Meta-learning enables rapid adaptation to new tasks and clients by learning a good initialization for client models. FedMeta-MAML aggregates client updates using a meta-update rule, allowing clients to learn how to learn from their local data.
FL with Momentum SGD (FedMoSGD) extends federated learning by incorporating the momentum technique Xu et al. (2021), which is commonly used in stochastic gradient descent (SGD) optimization. The momentum technique accelerates convergence and improves the robustness of the optimization process by adding a fraction of the previous update direction to the current update direction.
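A server-side momentum step of this kind might look like the following sketch, where the server blends the direction implied by the newly aggregated client model with the previous update direction. The update rule and hyperparameters are illustrative assumptions, not the exact FedMoSGD formulation.

```python
# Sketch of momentum applied to federated updates: a velocity term mixes
# the previous update direction into the current one.
import numpy as np

global_w = np.zeros(3)
velocity = np.zeros(3)

def momentum_step(avg_client_w, lr=1.0, beta=0.9):
    """Blend the new aggregate direction with the previous direction."""
    global global_w, velocity
    delta = avg_client_w - global_w       # direction implied by clients
    velocity = beta * velocity + delta    # accumulate past directions
    global_w = global_w + lr * velocity

momentum_step(np.ones(3))   # first round: velocity equals the new delta
momentum_step(np.ones(3))   # second round: momentum carries motion forward
print(global_w)
```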
Another method proposed in the literature is FedProx, which addresses the challenges of heterogeneity Li et al. (2020). It is a generalization and reparametrization of the popular FedAvg method, allowing for local updating and low client participation to tackle communication costs.
The proposed framework introduces modifications that address both system and statistical heterogeneity, providing convergence guarantees in scenarios with non-identically distributed data across devices. Empirically, FedProx demonstrates more robust convergence and improved accuracy compared to FedAvg on realistic federated datasets, especially in highly heterogeneous settings.
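FedProx's key modification is a proximal term added to each client's local objective, (μ/2)·||w − w_global||², which penalizes drift away from the current global model under heterogeneous data. A minimal sketch of such a local update follows; the toy data and hyperparameters are illustrative.

```python
# Sketch of a FedProx-style local update (Li et al., 2020): the proximal
# gradient mu * (w - w_global) keeps local updates near the global model.
import numpy as np

def fedprox_local_train(w_global, X, y, mu=0.1, lr=0.1, steps=20):
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)   # local MSE gradient
        grad_prox = mu * (w - w_global)          # pulls w back toward w_global
        w -= lr * (grad_loss + grad_prox)
    return w

rng = np.random.default_rng(4)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
print(fedprox_local_train(np.zeros(3), X, y))
```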
The field of FL has witnessed advancements in aggregation methods to overcome challenges posed by non-IID data, latency, and privacy. These state-of-the-art algorithms provide valuable contributions toward enhancing FL processes’ efficiency, privacy, and convergence.
2.4. Metrics
To evaluate and compare FL algorithms, various metrics are employed in the literature. These metrics provide insights into different aspects of FL performance and effectiveness. When assessing FL algorithms, it is crucial to consider metrics that capture model performance, communication cost, computational power, and fairness.
In the literature Semwal et al. (2020), the commonly used metrics are: accuracy and loss for model performance; the number of global rounds and the amount of transmitted data for communication cost; the number of local rounds and convergence for computational power; and fairness, i.e., similar performance across clients.
Model performance in FL is evaluated using accuracy and loss metrics, which measure the accuracy of predictions and the deviation between predicted and actual values. Higher accuracy and lower loss values indicate better model performance in terms of prediction quality and generalization. These metrics are also used to compare FL performance with traditional ML algorithms.
Communication cost is crucial in FL systems, with metrics like global rounds and transmitted data assessing efficiency. Minimizing the number of rounds and reducing the transmitted data volume can optimize the overall communication cost in FL systems.
Computational power metrics evaluate how efficiently algorithms reach stable solutions. The number of local rounds reflects the computational burden on individual client devices during training, while convergence measures how quickly the algorithm reaches a stable and optimal solution; faster convergence indicates a more computationally efficient approach.
Fairness is an essential consideration in FL, aiming to ensure equitable performance and benefits for all participating clients. Fairness metrics assess the consistency of model performance across different clients, avoiding biases or imbalances during the learning process.
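One simple, illustrative way to operationalize this fairness notion is to report the spread of per-client accuracies alongside the mean, so that a model serving only some clients well is penalized. The numbers below are hypothetical.

```python
# Illustrative fairness summary: mean, spread, and worst-case accuracy
# across clients; a lower spread indicates a fairer model.
import numpy as np

client_accuracies = np.array([0.91, 0.88, 0.62, 0.90])   # hypothetical

mean_acc = client_accuracies.mean()
fairness_gap = client_accuracies.std()    # lower spread = fairer model
worst_client = client_accuracies.min()

print(f"mean={mean_acc:.3f}, std={fairness_gap:.3f}, worst={worst_client:.2f}")
```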