We predict the growth of Federated Learning, a new framework for Artificial Intelligence (AI) model development that is distributed over millions of mobile devices. Federated Learning models are hyper-personalized for a user, involve minimal latency and low infrastructure overheads, and are privacy-preserving by design. This article is a beginner-level primer on Federated Learning.
Disclaimer: the author is an investor and advisor in the Federated Learning startup S20.ai. In case you are wondering, S20 stands for “Software 2.0”.
The AI market is dominated by tech giants such as Google, Amazon and Microsoft, which offer cloud-based AI solutions and APIs. In traditional AI methods, sensitive user data is sent to servers where the models are trained.
Recently we have begun to see a decentralized AI model, called Federated Learning, born at the intersection of on-device AI, blockchain, and edge computing/IoT. In contrast to traditional AI methods, Federated Learning brings the models to the data source or client device for training and inferencing. The local copies of the model on the device eliminate the network latency and costs incurred by continuously sharing data with the server. Because the model is local, its responses are hyper-personalized for a particular user. Federated Learning uses the computing and storage resources on the user’s device, reducing cloud infrastructure overheads even at scale. Additionally, Federated Learning techniques are privacy-preserving by design.
Figure 1. Federated Learning models are hyper-personalized for a particular user, involve minimal latency and low infrastructure overheads, and are privacy-preserving by design.
Federated Learning can be broadly classified as Single Party or Multi-Party. In a Single Party system, only one entity is involved in the governance of the distributed data capture and flow system. This could take several forms, such as a smartphone or IoT app, network devices, distributed data warehouses, or machines used by employees. Models are trained in a federated manner on data that has the same structure across all client devices, and in most cases each data point is unique to the device or user. For example, a music recommendation engine that recommends music to the users of an app can be federated this way.
Figure 2. Federated Learning model development. A user’s phone personalizes the model locally, based on her usage (A). Many users’ updates are then aggregated (B) to form a consensus change (C) to the shared model. This process is then repeated.
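The loop in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration of federated averaging on a toy one-feature linear model, not the implementation used by any particular framework. The function names (`local_update`, `federated_average`, `run_rounds`), the learning rate, and the toy data are all assumptions made for the example.

```python
# Toy federated averaging for a linear model y = w * x + b.
# All names, hyperparameters, and data here are illustrative assumptions.

def local_update(weights, data, lr=0.05, epochs=5):
    """Step (A): refine a copy of the shared model on one device's own data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y      # prediction error on one local sample
            w -= lr * err * x          # gradient step for the slope
            b -= lr * err              # gradient step for the intercept
    return (w, b)


def federated_average(updates, sizes):
    """Steps (B) and (C): average client models, weighted by local data size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def run_rounds(client_data, rounds=200):
    """Repeat the train-locally / aggregate cycle; raw data never leaves a client."""
    shared = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(shared, data) for data in client_data]
        sizes = [len(data) for data in client_data]
        shared = federated_average(updates, sizes)
    return shared


# Three simulated devices, each holding private samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0), (1.5, 4.0)],
    [(1.0, 3.0), (3.0, 7.0)],
]
w, b = run_rounds(clients)
print(round(w, 2), round(b, 2))
```

Only the model parameters `(w, b)` travel between the simulated devices and the aggregator; the `(x, y)` samples stay in each client's list, which is the property the figure is meant to convey.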
In a Multi-Party system, two or more organizations or franchisees form an alliance to train a shared model on their individual datasets through Federated Learning. Here the major value Federated Learning adds is that each participating entity keeps its data private while working toward a common goal. The data structures and parameters are usually similar but need not be identical, so a lot of pre-processing is required at each client to standardize model inputs. A neutral third party could provide the infrastructure to aggregate model weights and establish trust among the clients. For example, multiple banks could use Federated Learning to train a powerful common fraud detection model without sharing their sensitive customer data with each other.
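The client-side pre-processing in a Multi-Party setup can be pictured as each participant mapping its own records onto a feature schema the alliance has agreed on. The sketch below is only an illustration: the two banks, their field names, the shared schema, and the currency conversion rate are all hypothetical.

```python
# Each participant converts its local records to a shared feature schema
# before local training. Banks, field names, and the rate are hypothetical.

SHARED_SCHEMA = ("amount_usd", "is_foreign", "hour_of_day")


def standardize_bank_a(record):
    """Hypothetical bank A: USD amounts as strings, a country code, HH:MM times."""
    return {
        "amount_usd": float(record["amt"]),
        "is_foreign": 0 if record["country"] == "US" else 1,
        "hour_of_day": int(record["time"].split(":")[0]),
    }


def standardize_bank_b(record, inr_per_usd=80.0):
    """Hypothetical bank B: INR amounts, a boolean 'domestic' flag, integer hours."""
    return {
        "amount_usd": record["amount_inr"] / inr_per_usd,
        "is_foreign": 0 if record["domestic"] else 1,
        "hour_of_day": record["hour"],
    }


def to_vector(features):
    """Order features by the shared schema so every client feeds identical inputs."""
    return [features[name] for name in SHARED_SCHEMA]


a = to_vector(standardize_bank_a({"amt": "120.50", "country": "FR", "time": "23:14"}))
b = to_vector(standardize_bank_b({"amount_inr": 4000.0, "domestic": True, "hour": 9}))
print(a)
print(b)
```

Both vectors come out in the same order and units, so the same model architecture can be trained at either bank without the raw records ever being exchanged.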
Popular recent Federated Learning frameworks include TensorFlow Federated, an open-source framework by Google for experimenting with machine learning and other computations on decentralized data. PySyft is an open-source library built on top of PyTorch for encrypted, privacy-preserving deep learning. Federated AI Technology Enabler (FATE) is an open-source project initiated by WeBank’s AI group to provide a secure computing framework to support the Federated AI ecosystem.
Several new startups such as S20.ai, Owkin, and Snips have emerged in this space, creating new tools and enterprise solutions around Federated Learning and other secure computation techniques across different verticals.
Federated Learning is still in its early stages and faces numerous challenges with its design and deployment.
For instance, its feasibility is highly constrained by the capability of edge devices to perform local training and inferencing. This may not remain a major barrier to entry, as most newly launched smartphones and IoT devices are equipped with GPUs or sufficient computing hardware to run powerful AI models. Today, however, it is still impossible to train a network on-device without either compromising device performance and user experience or compressing the model and accepting lower accuracy.
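To make the compression trade-off concrete, here is a minimal sketch of uniform 8-bit quantization of a weight vector, which is one common way to shrink a model for on-device use. It is an illustration under simplifying assumptions (a single scale per vector, made-up weights), not a description of how any specific framework compresses models.

```python
# Symmetric 8-bit quantization of a weight vector (illustrative sketch).
# The weights below are made up; real models quantize per layer or per channel.

def quantize(weights, bits=8):
    """Map floats to signed integers using one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0.0:                           # all-zero vector: avoid dividing by zero
        scale = 1.0
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


weights = [0.8123, -0.4071, 0.0312, -1.27, 0.5555]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Each weight now fits in one byte instead of four, at the price of a rounding error of at most half the scale per weight. That loss of precision is the source of the accuracy drop described above.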
Local training of supervised models requires labeled data, which in most cases is unavailable or difficult to produce. A good way to tackle this challenge is to define the Federated Learning problem and design the data pipeline so that labels are captured implicitly, for example from the user’s interactions, or from feedback on model responses inferred from actions taken or events triggered.
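As a rough sketch of what capturing labels implicitly could look like, consider the music recommendation example from earlier. The event names, the 30-second skip threshold, and the 80% completion rule below are assumptions chosen for illustration, not rules taken from any real app.

```python
# Derive training labels from interaction events instead of explicit annotation.
# Event names and thresholds are illustrative assumptions.

def implicit_label(event):
    """Return 1 (liked), 0 (disliked), or None when the signal is too weak."""
    action = event["action"]
    if action in ("added_to_playlist", "replayed"):
        return 1
    if action == "skipped" and event["seconds_played"] < 30:
        return 0
    if action == "played" and event["seconds_played"] >= 0.8 * event["track_length"]:
        return 1
    return None


def build_local_dataset(events):
    """Keep only events with a usable label; everything stays on the device."""
    dataset = []
    for event in events:
        label = implicit_label(event)
        if label is not None:
            dataset.append((event["track_id"], label))
    return dataset


events = [
    {"track_id": "t1", "action": "skipped", "seconds_played": 8, "track_length": 200},
    {"track_id": "t2", "action": "played", "seconds_played": 195, "track_length": 200},
    {"track_id": "t3", "action": "played", "seconds_played": 60, "track_length": 240},
    {"track_id": "t4", "action": "added_to_playlist", "seconds_played": 30, "track_length": 180},
]
print(build_local_dataset(events))
```

Ambiguous events, such as the partially played track `t3`, are dropped rather than guessed, which trades dataset size for label quality.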
In real-world cases, model convergence time is higher in the federated setup than in the traditional central training approach. There can also be reliability issues, where not all devices participate in the Federated Learning process because of connectivity problems, differing app usage patterns and model training times, or irregular or missed updates. Federated Learning should be considered only when the size of the data and the cost of aggregating it from distributed sources are very high, when inconsistent model versions across clients do not affect the experience too much for a significant time window, and when the central model can converge with minimal client participation.
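The effect of unreliable participation can be simulated with a small sketch. Here each client drops out of a round with some probability, and the server applies an update only when a minimum number of clients report back. The dropout probability, the participation threshold, and the stand-in for local training are all assumptions made for the example.

```python
import random

# Simulate federated rounds in which some clients fail to report.
# The dropout rate, threshold, and "training" rule are illustrative assumptions.

def run_round(shared, client_targets, min_clients, dropout_prob, rng):
    """Aggregate only if enough clients report back; otherwise keep the old model."""
    updates = []
    for target in client_targets:
        if rng.random() < dropout_prob:
            continue                              # client offline or update missed
        # Stand-in for local training: move halfway toward the client's optimum.
        updates.append(shared + 0.5 * (target - shared))
    if len(updates) < min_clients:
        return shared, len(updates), False        # too few reports: skip this round
    return sum(updates) / len(updates), len(updates), True


rng = random.Random(0)                            # fixed seed for repeatable runs
client_targets = [1.0, 2.0, 3.0, 4.0, 5.0]
shared = 0.0
skipped = 0
for _ in range(30):
    shared, reported, applied = run_round(shared, client_targets, 3, 0.4, rng)
    if not applied:
        skipped += 1
print(shared, skipped)
```

Skipped rounds leave every client on a stale model, and rounds with few participants pull the shared model toward whichever clients happened to report. Both effects slow convergence compared with central training.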
Finally, the mindset among large companies of centrally aggregating data and creating silos for competitive advantage will be a major obstacle to the adoption of Federated Learning. Effective data protection policies, together with appropriate incentives and business models around decentralizing data, can tackle these issues and help develop the Federated AI ecosystem.
Epilogue: I am writing a series of blogs on Data Science, Machine Learning, Federated Learning, Product Management and Career Success Stories. You can follow me to get these in your Medium feed.