Global warming is on the rise, with carbon dioxide, methane, and nitrous oxide at their highest atmospheric levels compared to the past. Data scientists, data engineers, and cloud experts have all come forward to create a more sustainable environment by following best practices in Machine Learning. Machine Learning models have a detrimental effect on the environment because they consume substantial computational resources and energy while being trained for thousands of hours on specialized hardware accelerators in data centers.

The average temperature has been rising steadily over the last three decades (since 1980), as illustrated in the figure below. All major meteorological agencies and bodies show similar trends, which has prompted environmentalists, geologists, and technology experts across domains to come forward and set standards for controlling the temperature rise.

Global average temperature anomaly from 1880 to 2012, compared to the 1951-1980 long-term average. Source: NASA Earth Observatory.

Research into curbing the energy expenditure of ML models has led to "state-of-the-art" models that differ from conventional Machine Learning approaches by following decentralized training. Instead of Centralized ML, with the server responsible for handling all ML training tasks, in Federated Learning individual devices train on their own local data and send the updated model to the cloud/server, which aggregates the models from the different devices and pushes the updated model back to the devices.

With the gradual advancement of Federated Learning (FL), the importance of FL in Sustainability has been realized, particularly when rechargeable devices can collect ambient energy from the environment, saving cost in both wireless and networked edge environments.

Federated Learning and Client Data and its importance

Federated Learning (FL) settings can be applied as either cross-silo or cross-device. In a cross-silo scenario, clients are generally few, with high availability during all rounds, and are likely to have similar data distributions for training, e.g., hospitals. This scenario serves more as a use case to consider Independent and Identically Distributed (IID) distributions. For the second use case, we can consider a cross-device system that will likely encompass thousands of clients with very different data distributions (non-IID), participating in just a few rounds, e.g., training of next-word prediction models on mobile devices.

Thus FL can be seen to serve two different partition schemes: a uniform partition (IID), where each client has approximately the same proportion of each class from the original dataset, and a heterogeneous partition (non-IID), for which each client has an unbalanced and different proportion of each class (a minimal partitioning sketch is shown below).

In addition to handling different data distributions, it has been possible to lay out an analytical Carbon Footprint Model for FL, which provides a first-of-its-kind quantitative estimation method. It enables a detailed study of CO2e emissions resulting from both hardware training and communication between servers and clients, and gives a solid foundation for the roadmap toward environmentally-friendly federated learning (a simplified estimation sketch also appears below).

Moreover, the FL setup enables researchers to conduct carbon sensitivity analysis on real FL hardware under different settings, strategies, and tasks.
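To make the two partition schemes concrete, here is a minimal sketch of how a dataset's label indices might be split across clients. It is not taken from the article: the Dirichlet-based split and the `alpha` parameter are illustrative assumptions, one common way to produce unbalanced, client-specific class proportions for the non-IID case.

```python
import numpy as np

def iid_partition(labels, num_clients, seed=0):
    """Uniform (IID) partition: shuffle indices and split them evenly,
    so each client gets roughly the same class proportions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return np.array_split(idx, num_clients)

def non_iid_partition(labels, num_clients, alpha=0.5, seed=0):
    """Heterogeneous (non-IID) partition: for each class, split its samples
    across clients using Dirichlet proportions, giving unbalanced and
    different class mixes per client (smaller alpha -> more skew)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        c_idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(c_idx)).astype(int)
        for client, shard in enumerate(np.split(c_idx, cuts)):
            client_idx[client].extend(shard.tolist())
    return [np.array(ix) for ix in client_idx]

# Toy example: 10,000 samples, 10 classes, 5 clients.
labels = np.random.randint(0, 10, size=10_000)
iid_clients = iid_partition(labels, num_clients=5)
non_iid_clients = non_iid_partition(labels, num_clients=5, alpha=0.3)
print([len(ix) for ix in iid_clients], [len(ix) for ix in non_iid_clients])
```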
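The idea behind the carbon footprint model can also be sketched in code. The function below is a simplified, assumed formulation: device power, local training time, model size, network energy per GB, and grid carbon intensity are all hypothetical placeholders rather than values from the referenced paper; it simply separates the training and communication terms.

```python
def fl_co2e_estimate(num_clients, rounds, train_hours_per_round,
                     device_power_w, model_size_gb,
                     comm_kwh_per_gb=0.06, grid_gco2e_per_kwh=475):
    """Rough CO2e estimate (grams) for an FL deployment.

    Assumptions (placeholders, not values from the article):
    - every client trains for `train_hours_per_round` each round at
      `device_power_w` watts;
    - each round, every client uploads and downloads the full model once;
    - `comm_kwh_per_gb` is an assumed network energy intensity;
    - `grid_gco2e_per_kwh` is an assumed average grid carbon intensity.
    """
    train_kwh = num_clients * rounds * train_hours_per_round * device_power_w / 1000
    comm_kwh = num_clients * rounds * 2 * model_size_gb * comm_kwh_per_gb
    total_gco2e = (train_kwh + comm_kwh) * grid_gco2e_per_kwh
    return {"training_kwh": train_kwh, "communication_kwh": comm_kwh,
            "co2e_grams": total_gco2e}

# Hypothetical example: 100 clients, 50 rounds, 0.1 h of local training per
# round on a 5 W edge device, exchanging a 0.05 GB model each round.
print(fl_co2e_estimate(100, 50, 0.1, 5, 0.05))
```

Keeping the training and communication terms separate is what makes the kind of sensitivity analysis discussed next possible.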
Studies and experiments have shown that CO2e emissions depend on a wide range of hyper-parameters, that emissions from communication between clients and servers can range from 0.4% of total emissions to more than 95%, and that efficient strategies can reduce CO2e emissions by up to 60%.

FL will continue to cast its long-lasting impact on total CO2e emissions. This might be further facilitated by factoring in physical location, relevant and sustainable deep learning tasks, model architecture, FL aggregation strategy, and hardware efficiency.

Why Sustainability in Federated Learning?

One of the most important factors to consider in FL is carbon emissions. As research has already demonstrated that proper design of the FL setup leads to a decrease in these emissions, the released CO2e serves as a crucial metric for quantifying the FL deployment.

FL is known to converge quicker, with fewer FL rounds, when the number of local epochs is increased. However, this does not guarantee a smaller overall energy consumption.

The figure below illustrates how Federated Learning can have a long-lasting impact on the environment through efficient algorithms that reduce device-to-server communications on the one hand, and through the use of advanced hardware with better processing capabilities and greater transparency on energy consumption on the other.

In comparison to centralized systems, where cooling in data centers accounts for up to 40% of the total energy consumed, FL does not need or use this parameter. On the other hand, FL can use the Power Usage Effectiveness (PUE) ratio.

Use of Renewable Energy availability during training in devices

There are different initiatives to compensate carbon emissions, either by carbon offsetting or with purchases of Renewable Energy Credits (RECs) in the USA or Tradable Green Certificates (TGCs) in the EU. Carbon offsetting is an action initiated to compensate for polluting actions via investments in environment-friendly projects, such as renewable energies or massive tree planting (Anderson, 2012).

Devices can also depend on renewable energy resources for their own energy generation, which can be accomplished primarily in two ways, each of which strategizes how devices send updates to the central server in an FL setup.

In the first use case, as illustrated in the figure below, clients are opportunists about using their energy during the training process, which causes a degradation in performance. Some of the main characteristics of this process are:
- Clients participate in training based on energy availability.
- The energy generation process is not uniform across devices.
- The global model is biased towards clients with more frequent energy arrivals, causing a performance loss in accuracy.

In the second use case, as illustrated in the figure below, clients are pessimists about using their energy and wait for the slowest client to have enough energy before starting the training process. As a result, the process may be slow, but it gives better performance. A small simulation contrasting the two strategies is sketched below.
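To make the trade-off concrete, here is a minimal toy simulation of the two participation strategies. It is not from the referenced papers: the Bernoulli energy arrivals, per-device arrival rates, and energy units are made-up placeholders chosen purely for illustration.

```python
import numpy as np

def simulate(strategy, num_clients=10, rounds=50, energy_per_round=1.0, seed=0):
    """Toy simulation of energy-aware participation under random,
    non-uniform energy arrivals (all quantities are placeholders)."""
    rng = np.random.default_rng(seed)
    arrival_rate = rng.uniform(0.2, 0.9, size=num_clients)  # differs per device
    battery = np.zeros(num_clients)
    participation = np.zeros(num_clients)
    global_updates = 0

    for _ in range(rounds):
        battery += rng.binomial(1, arrival_rate)   # ambient energy arrivals
        ready = battery >= energy_per_round
        if strategy == "opportunist" and ready.any():
            # Train whenever a client has energy: fast, but clients with
            # frequent arrivals dominate and can bias the global model.
            participation += ready
            battery[ready] -= energy_per_round
            global_updates += 1
        elif strategy == "pessimist" and ready.all():
            # Wait for the slowest client: balanced participation,
            # but far fewer global updates in the same wall-clock time.
            participation += 1
            battery -= energy_per_round
            global_updates += 1
    return participation, global_updates

for s in ("opportunist", "pessimist"):
    part, ups = simulate(s)
    print(s, "-> global updates:", ups, "| per-client participation:", part)
```

Under these assumptions, the opportunist run completes many more global updates but with very uneven per-client participation, while the pessimist run keeps participation balanced at the cost of far fewer updates.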
Instead of strictly adhering to either of these principles, there can be an optimal client scheduling process for training. Client selection in conventional federated learning algorithms is primarily based on the assumption that all clients are inherently available to participate in training if chosen. Clients have the flexibility to drop out, and dropouts may occur uniformly at random (which does not bias the training).

The most important areas of focus in FL have been on selecting clients to maximize the convergence rate or to reduce the communication overhead of training. In contrast, the optimal scheduling process can help strategize energy consumption by allowing selected clients to participate in the training process through a stochastic process based on their energy profile, instead of allowing their participation in all rounds. This scheduling process ensures convergence by keeping constant the number of clients participating in one global round.

A unique characteristic of this scheduling is that clients are allowed to perform local training at each global round, but the global model is updated using only the local updates from the clients that were originally scheduled for that global round.

If we look at this from the perspective of accuracy and the number of rounds (in other words, the time needed for model convergence), we see that the optimal scheduling (Algorithm 1) together with FedAvg performs better in terms of accuracy, while Benchmark 1 has a fairly steady accuracy across different numbers of global rounds. In contrast, Benchmark 2 demonstrates an increase in accuracy as the number of global rounds increases.

On the other side, we should also consider the CO2e emissions (expressed in grams, i.e., lower is better) for both centralized learning and FL when they reach the target accuracies, under different setups.

Benchmark 1: Each client participates in training as soon as it has enough energy and then waits until the next energy arrival.
Benchmark 2: The global model is updated only when all clients have received energy, i.e., the server waits until all clients have energy available before initiating a global update.

Metrics for Carbon Emissions

We need to quantify cloud sustainability in a Federated Learning (FL) environment. In addition to Sustainable Supply Chain promotions, we need to put a strong emphasis on smarter, efficient enterprise data centers, where we can measure carbon-free energy scores along with Power Usage Effectiveness. We provide definitions of the key metrics in terms of the Google Cloud Platform in the reference figure below.

Google CFE%: This is the average percentage of carbon-free energy consumed by a user application in a particular location on an hourly basis, taking into account the investments Google has made in renewable energy in that location. This means that in addition to the carbon-free energy that is already supplied by the grid, renewable energy generation has been added in that location to reach the 24/7 carbon-free energy objective.

Grid carbon intensity (gCO2eq/kWh): This metric indicates the average lifecycle gross emissions per unit of energy from the grid. It should be used to compare regions in terms of the carbon intensity of their electricity from the local grid. For regions that are similar in CFE%, it indicates the relative emissions for when your workload is not running on carbon-free energy.

Google Cloud net carbon emissions (Scope 2 market-based): Google invests in enough renewable energy and carbon offsets to neutralize the global operational carbon emissions footprint of Google Cloud per the GHG Protocol under the Scope 2 market-based methodology.
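As a small illustration of how the first two metrics can guide deployment decisions, the sketch below ranks candidate regions by CFE% and uses grid carbon intensity as a tie-breaker. The region names and numbers are hypothetical placeholders, not published Google Cloud values.

```python
# Illustrative sketch: rank candidate regions by the two metrics defined above,
# preferring higher CFE% and, as a tie-breaker, lower grid carbon intensity.
# Region names and figures are hypothetical placeholders, NOT published values.
from typing import List, Tuple

Region = Tuple[str, float, float]  # (name, cfe_percent, grid_gco2e_per_kwh)

def rank_regions(regions: List[Region]) -> List[Region]:
    """Sort regions so the most carbon-friendly option for a new FL
    deployment or batch workload comes first."""
    return sorted(regions, key=lambda r: (-r[1], r[2]))

candidate_regions = [
    ("region-a", 90.0, 80.0),
    ("region-b", 60.0, 300.0),
    ("region-c", 60.0, 250.0),
]

for name, cfe, intensity in rank_regions(candidate_regions):
    print(f"{name}: CFE% = {cfe}, grid intensity = {intensity} gCO2eq/kWh")
```

The same ordering logic underpins the region-selection strategies discussed next.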
Unique Cloud Strategies

As Google Cloud Platform is actively concentrating on increasing the CFE% for each of the Google Cloud regions, deploying solutions in regions with a higher percentage of carbon-free energy increases the sustainability of the solution. Some of the unique propositions for cloud AI specialists and architects are:
- Pick a lower-carbon region for your new applications: Build and run new applications in the region with the highest CFE% available to the application.
- Run batch jobs in a lower-carbon region: Plan batch workloads by picking the region with the highest CFE% in order to maximize the carbon-free energy supplying the job.
- Set an organizational policy for lower carbon regions: Allow usage of resources and services in certain regions while restricting access and usage in other regions.
- Efficient use of services: Increase the efficiency of cloud apps so they use less energy (and consequently emit less carbon) by relying more on serverless products like Cloud Run and Cloud Functions, which automatically scale up and down based on the workload and conserve energy as much as possible. In addition, right-sizing VMs also plays an important role in conserving energy.

The figure below illustrates how different variants of Federated Learning - FedAvg, FedAdam, FedAdaGrad - with different adaptive ML optimizers can work together with the right cloud optimizations to reduce carbon emissions.

Cloud & ML Optimizations in Federated Learning

Conclusion and Future Work

With the current research results and potential opportunities of Federated Learning (FL), there are focused directions in sustainably building federated and distributed learning schemes for large-scale networks. In large-scale deployments, millions of devices jointly train machine learning models over large volumes of data. Some of these research directions include formalizing the fundamental performance limits of distributed training under stochastic and unknown energy arrival processes. Exploration and research on model quantization and compression techniques will continue, helping models adapt to resource and energy arrival patterns and characterizing the relationship between energy renewal processes and training performance.

The end goal of deploying a simple and scalable federated learning strategy with provable convergence guarantees can be satisfied with Sustainable Federated Learning, in which devices can rely on intermittent energy availability. Such a framework can significantly improve training performance compared to energy-agnostic benchmarks.

References
A First Look into the Carbon Footprint of Federated Learning - https://arxiv.org/pdf/2102.07627.pdf
Sharing carbon-free energy percentage for Google Cloud regions - https://cloud.google.com/blog/topics/sustainability/sharing-carbon-free-energy-percentage-for-google-cloud-regions
Sustainable Federated Learning - https://arxiv.org/pdf/2102.11274.pdf
Google Cloud Sustainability - https://cloud.google.com/sustainability
Adaptive Federated Optimization - https://arxiv.org/pdf/2003.00295.pdf