Authors: Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N. Galtier, Bennett A. Landman, Klaus Maier-Hein, Sebastien Ourselin, Micah Sheller, Ronald M. Summers, Andrew Trask, Daguang Xu, Maximilian Baust, M. Jorge Cardoso

Abstract

Data-driven machine learning (ML) has emerged as a promising approach for building accurate and robust statistical models from medical data, which is collected in huge volumes by modern healthcare systems. Existing medical data is not fully exploited by ML, primarily because it sits in data silos and privacy concerns restrict access to it. However, without access to sufficient data, ML will be prevented from reaching its full potential and, ultimately, from making the transition from research to clinical practice. This paper considers key factors contributing to this issue and explores how federated learning (FL) may provide a solution.

Introduction

Research on artificial intelligence (AI), and particularly the advances in machine learning (ML) and deep learning (DL), has led to disruptive innovations in radiology, pathology, genomics and other fields. Modern DL models feature millions of parameters that need to be learned from sufficiently large curated data sets in order to achieve clinical-grade accuracy, while being safe, fair, equitable and generalising well to unseen data [1–5]. For example, training an AI-based tumour detector requires a large database encompassing the full spectrum of possible anatomies, pathologies, and input data types.
Data like this is hard to obtain, because health data is highly sensitive and its usage is tightly regulated [6]. Even if data anonymisation could bypass these limitations, it is now well understood that removing metadata such as the patient's name or date of birth is often not enough to preserve privacy [7]. It is, for example, possible to reconstruct a patient's face from computed tomography (CT) or magnetic resonance imaging (MRI) data [8]. Another reason why data sharing is not systematic in healthcare is that collecting, curating and maintaining a high-quality data set takes considerable time, effort and expense. Consequently, such data sets may have significant business value, which makes it less likely that they will be freely shared. Instead, data collectors often retain fine-grained control over the data they have gathered.

Federated learning (FL) [9–11] is a learning paradigm that seeks to address the problem of data governance and privacy by training algorithms collaboratively without exchanging the data itself. Originally developed for other domains, such as mobile and edge-device use cases [12], it has recently gained traction for healthcare applications [13–20]. FL makes it possible to gain insights collaboratively, e.g., in the form of a consensus model, without moving patient data beyond the firewalls of the institutions in which they reside. Instead, the ML process occurs locally at each participating institution, and only model characteristics (e.g., parameters, gradients) are transferred, as depicted in Fig. 1. Recent research has shown that models trained by FL can achieve performance levels comparable to those of models trained on centrally hosted data sets, and superior to those of models that only see isolated single-institutional data [16, 17].

Fig. 1: (a) FL aggregation server: the typical FL workflow, in which a federation of training nodes receives the global model, intermittently resubmits partially trained models to a central server for aggregation, and then continues training on the consensus model that the server returns. (b) FL peer to peer: an alternative formulation of FL, in which each training node exchanges its partially trained models with some or all of its peers and each node performs its own aggregation. (c) Centralised training: the general non-FL training workflow, in which data-acquiring sites donate their data to a central Data Lake from which they and others are able to extract data for local, independent training.

The successful implementation of FL could thus hold significant potential for enabling precision medicine at large scale. It could lead to models that yield unbiased decisions, optimally reflect an individual's physiology and are sensitive to rare diseases, while respecting governance and privacy concerns. However, FL still requires rigorous technical consideration to ensure that the algorithm proceeds optimally without compromising safety or patient privacy. Nevertheless, it has the potential to overcome the limitations of approaches that require a single pool of centralised data.

We envision a federated future for digital health. In this paper, we outline the benefits and impact of FL for medical applications (section "Data-driven medicine requires federated efforts"), and we highlight the key considerations and challenges of implementing FL for digital health (section "Technical considerations").

Data-driven medicine requires federated efforts

ML, and especially DL, is becoming the de facto knowledge-discovery approach in many industries, but successfully implementing data-driven applications requires large and diverse data sets. However, medical data sets are difficult to obtain (subsection "The reliance on data").
FL addresses this issue by enabling collaborative learning without centralising data (subsection "The promise of federated efforts"), and it has already found its way into digital health applications (subsection "Current FL efforts for digital health"). This new learning paradigm requires consideration from, but also offers benefits to, various healthcare stakeholders (section "Impact on stakeholders").

The reliance on data

Data-driven approaches rely on data that truly represent the underlying data distribution of the problem. While this is a well-known requirement, state-of-the-art algorithms are usually evaluated on carefully curated data sets, often originating from only a few sources. This can introduce biases in which demographics (e.g., gender, age) or technical imbalances (e.g., acquisition protocol, equipment manufacturer) skew predictions and reduce accuracy for certain groups or sites. However, to capture subtle relationships between disease patterns, socio-economic and genetic factors, as well as complex and rare cases, it is crucial to expose a model to diverse cases.

The need for large databases for AI training has spawned many initiatives that seek to pool data from multiple institutions. This data is often amassed into so-called Data Lakes. Some have been built to leverage the commercial value of data, e.g., IBM's Merge Healthcare acquisition [21]. Others serve as a resource for economic growth and scientific progress, e.g., NHS Scotland's National Safe Haven [22], the French Health Data Hub [23] and Health Data Research UK [24].

Substantial, albeit smaller, initiatives include the Human Connectome [25], the UK Biobank [26], the Cancer Imaging Archive (TCIA) [27], NIH CXR8 [28], NIH DeepLesion [29], the Cancer Genome Atlas (TCGA) [30] and the Alzheimer's Disease Neuroimaging Initiative (ADNI) [31]. They also include medical grand challenges [32], such as the CAMELYON challenge [33], the International Multimodal Brain Tumor Segmentation (BraTS) challenge [34–36] and the Medical Segmentation Decathlon [37]. Public medical data is usually task- or disease-specific and is often released with varying degrees of licence restrictions, which sometimes limits its exploitation.

Centralising or releasing data, however, poses not only regulatory, ethical and legal challenges related to privacy and data protection, but also technical ones. Anonymising healthcare data, controlling access to it and transferring it safely is a non-trivial and sometimes impossible task. Anonymised data from the electronic health record can appear innocuous and GDPR/PHI compliant, but just a few data elements may allow for patient reidentification [7]. The same applies to genomic data and medical images, which can be as unique as a fingerprint [38]. Therefore, unless the anonymisation process destroys the fidelity of the data, likely rendering it useless, patient reidentification or information leakage cannot be ruled out. Gated access for approved users is often proposed as a putative solution to this issue. However, besides limiting data availability, this is only practical when the consent granted by the data owners is unconditional, since recalling data from those who may have had access to it is practically unenforceable.

The promise of federated efforts

The promise of FL is simple: to address privacy and data governance challenges by enabling ML from non-co-located data.
In a FL setting, each data controller not only defines its own governance processes and associated privacy policies, but also controls data access and has the ability to revoke it. This applies to both the training and the validation phase. In this way, FL could create new opportunities. For example, it could allow large-scale, in-institutional validation, or enable novel research on rare diseases, where incidence rates are low and the data sets at each single institution are too small.

As depicted in Fig. 2, a FL workflow can be realised with different topologies and compute plans. The two most common approaches for healthcare applications are aggregation-server approaches [16–18] and peer-to-peer approaches [15, 39]. In all cases, FL implicitly offers a certain degree of privacy, as FL participants never directly access data from other institutions and only receive model parameters that are aggregated over several participants. In a FL workflow with an aggregation server, the participating institutions can even remain unknown to each other. However, it has been shown that the models themselves can, under certain conditions, memorise information [40–43]. Therefore, mechanisms such as differential privacy [44, 45] or learning from encrypted data have been proposed to further enhance privacy in a FL setting (cf. section "Technical considerations"). Overall, the potential of FL for healthcare applications has sparked interest in the community [46], and FL techniques are a growing area of research [12, 20].

Fig. 2: FL topologies, i.e., the communication architecture of a federation. Centralised: the aggregation server coordinates the training iterations and collects, aggregates and distributes the models to and from the training nodes (Hub & Spoke). Decentralised: each training node is connected to one or more peers, and aggregation occurs on each node in parallel.
Hierarchical: federated networks can be composed of several sub-federations, which can be built from a mix of peer-to-peer and aggregation-server federations. FL compute plans, i.e., the trajectory of a model across several partners: sequential training / cyclic transfer learning; aggregation server; peer to peer.

Current FL efforts for digital health

FL is a general learning paradigm that removes the data-pooling requirement for AI model development, so its range of application spans the whole of AI for healthcare. FL provides an opportunity to capture larger data variability and to analyse patients across different demographics. It may therefore enable disruptive innovations in the future, and it is already being employed today. In the context of electronic health records (EHR), for example, FL helps to represent and to find clinically similar patients [13, 47]. It is also used to predict hospitalisations due to cardiac events [14], mortality and ICU stay time [19]. The applicability and advantages of FL have also been demonstrated in medical imaging, for whole-brain segmentation in MRI [15] and for brain tumour segmentation [16, 17]. Recently, the technique has been employed for fMRI classification to find reliable disease-related biomarkers [18], and it has been suggested as a promising approach in the context of COVID-19 [48].

It is worth noting that FL efforts require agreements that define the scope, aim and technologies used. Because FL is still novel, these can be difficult to pin down. In this context, today's large-scale initiatives really are the pioneers of tomorrow's standards for safe, fair and innovative collaboration in healthcare applications.

These initiatives include consortia that aim to advance academic research, such as the Trustworthy Federated Data Analytics (TFDA) project [49] and the German Cancer Consortium's Joint Imaging Platform [50], which enable decentralised research across German medical imaging research institutions. Another example is an international research collaboration that uses FL to develop AI models for the assessment of mammograms [51]. The study showed that the FL-generated models outperformed those trained on a single institute's data. They were also more generalisable, so they still performed well on other institutes' data. However, FL is not limited to academic environments.

By linking healthcare institutions, not only research centres, FL can have a direct clinical impact. The ongoing HealthChain project [52], for example, aims to develop and deploy a FL framework across four hospitals in France. This solution generates common models that can predict treatment response for breast cancer and melanoma patients. It helps oncologists to determine the most effective treatment for each patient from their histology slides or dermoscopy images. Another large-scale effort is the Federated Tumour Segmentation (FeTS) initiative [53]. FeTS is an international federation of 30 committed healthcare institutions that use an open-source FL framework with a graphical user interface. Its aim is to improve tumour boundary detection, including brain glioma, breast tumours, liver tumours and bone lesions in multiple myeloma patients.

Another area of impact is industrial research and translation. FL enables collaborative research even between competing companies. In this context, one of the largest initiatives is the Melloddy project [54], which aims to deploy multi-task FL across the data sets of 10 pharmaceutical companies. The partners train a common predictive model that infers how chemical compounds bind to proteins. In doing so, they intend to optimise the drug discovery process without revealing their highly valuable in-house data.

Impact on stakeholders

FL constitutes a paradigm shift away from centralised data lakes, and it is important to understand its impact on the various stakeholders in a FL ecosystem.

Clinicians

Clinicians are usually exposed to only a subgroup of the population, determined by their location and demographic environment. This may lead to biased assumptions about the probability of certain diseases or about how diseases are interconnected. By using ML-based systems, e.g., as a second reader, clinicians can augment their own expertise with expert knowledge from other institutions, ensuring a consistency of diagnosis that is not attainable today. This applies to ML-based systems in general. However, systems trained in a federated fashion are potentially able to yield even less biased decisions and higher sensitivity to rare cases.

Patients

Patients are usually treated locally. Establishing FL on a global scale could ensure high-quality clinical decisions regardless of the treatment location. In particular, patients who require medical attention in remote areas could benefit from the same high-quality ML-aided diagnoses that are available in hospitals with a large number of cases. The same holds true for rare or geographically uncommon diseases, which are likely to have milder consequences if faster and more accurate diagnoses can be made. FL may also lower the hurdle for becoming a data donor, since patients can be reassured that their data remains with their own institution and that data access can be revoked.

Hospitals and practices

Hospitals and practices can remain in full control and possession of their patient data, with complete traceability of data access, which limits the risk of misuse by third parties.
However, this will require investment in on-premise computing infrastructure or private-cloud service provision. It will also require adherence to standardised and synoptic data formats, so that ML models can be trained and evaluated seamlessly. The amount of compute capability needed depends, of course, on whether a site participates only in evaluation and testing efforts or also in training efforts. Even relatively small institutions can participate and still benefit from the collective models generated.

Researchers and AI developers

Researchers and AI developers stand to benefit from access to a potentially vast collection of real-world data, which will particularly benefit smaller research labs and start-ups. Resources can thus be directed towards solving clinical needs and the associated technical problems, rather than depending on the limited supply of open data sets. At the same time, research will be needed on algorithmic strategies for federated training, e.g., how to combine models or updates efficiently and how to be robust to distribution shifts [11, 12, 20]. FL-based development also means that the researcher or AI developer cannot investigate or visualise all of the data on which the model is trained. For example, it is not possible to look at an individual failure case to understand why the current model performs poorly on it.

Healthcare providers

Healthcare providers in many countries are affected by the ongoing paradigm shift from volume-based (i.e., fee-for-service-based) to value-based healthcare, which is in turn strongly connected to the successful establishment of precision medicine. The goal is not to promote more expensive individualised therapies. It is to achieve better outcomes sooner through more focused treatment, thereby reducing costs.
FL has the potential to increase the accuracy and robustness of healthcare AI while reducing costs and improving patient outcomes, and it may therefore be vital to precision medicine.

Manufacturers

Manufacturers of healthcare software and hardware could benefit from FL as well. Combining what is learned from many devices and applications, without revealing patient-specific information, can facilitate the continuous validation or improvement of their ML-based systems. However, realising such a capability may require significant upgrades to local compute, data storage, networking capabilities and associated software.

Technical considerations

FL is perhaps best known from the work of Konečný et al. [55], but various other definitions have been proposed in the literature [9, 11, 12, 20]. A FL workflow (Fig. 1) can be realised via different topologies and compute plans (Fig. 2), but the goal remains the same: to combine knowledge learned from non-co-located data. In this section, we discuss in more detail what FL is and highlight the key challenges and technical considerations that arise when applying FL in digital health.

Federated learning definition

FL is a learning paradigm in which multiple parties train collaboratively without the need to exchange or centralise data sets. A general formulation of FL reads as follows. Let $\mathcal{L}$ denote a global loss function obtained via a weighted combination of $K$ local losses $\{\mathcal{L}_k\}_{k=1}^{K}$, computed from private data $X_k$, which resides at the individual involved parties and is never shared among them:

$$\min_{\phi} \mathcal{L}(X; \phi) \quad \text{with} \quad \mathcal{L}(X; \phi) = \sum_{k=1}^{K} w_k \, \mathcal{L}_k(X_k; \phi), \qquad (1)$$

where $w_k > 0$ denote the respective weight coefficients.

In practice, each participant typically obtains and refines a global consensus model by conducting a few rounds of optimisation locally before sharing updates, either directly or via a parameter server.
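The following minimal sketch makes Eq. (1) and this pattern of local rounds followed by sharing concrete. It assumes a toy one-parameter linear model trained with gradient descent on a squared loss, a hub-and-spoke topology, and weights w_k = n_k / n proportional to each participant's sample count. These weights are a common FedAvg-style choice; Eq. (1) does not prescribe them. The function names, learning rate and toy data are all illustrative assumptions. Only the parameter update is returned to the aggregator; the (x, y) pairs never are.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair: toy stand-in for a site's private data X_k


def local_loss(theta: float, data: List[Sample]) -> float:
    """Local loss L_k: mean squared error of the toy model y ≈ theta * x on one site."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def global_loss(theta: float, sites: List[List[Sample]]) -> float:
    """Eq. (1) with assumed weights w_k = n_k / n (share of all samples held by site k)."""
    n = sum(len(d) for d in sites)
    return sum(len(d) / n * local_loss(theta, d) for d in sites)


def local_train(theta: float, data: List[Sample], steps: int, lr: float) -> float:
    """A few local gradient-descent steps; runs inside the site, so the data stay local."""
    for _ in range(steps):
        grad = sum(2 * (theta * x - y) * x for x, y in data) / len(data)
        theta -= lr * grad
    return theta


def fedavg_round(theta: float, sites: List[List[Sample]],
                 steps: int = 5, lr: float = 0.05) -> float:
    """One hub-and-spoke round: broadcast theta, train locally, average the updates."""
    n = sum(len(d) for d in sites)
    deltas = [local_train(theta, d, steps, lr) - theta for d in sites]  # only these are shared
    return theta + sum(len(d) / n * delta for d, delta in zip(sites, deltas))


# Two hypothetical sites whose data both follow y = 2x.
site_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
site_b = [(0.5, 1.0), (1.5, 3.0)]
sites = [site_a, site_b]

theta = 0.0  # initial global model
for _ in range(20):
    theta = fedavg_round(theta, sites)
print(round(theta, 3), global_loss(theta, sites) < 1e-6)  # prints: 2.0 True
```

With sample-proportional weights, `global_loss` equals the mean loss over the pooled data (3.3 at `theta = 1.0` here), which is one reason this weighting is popular. In this toy example both sites share the same optimum, so averaging converges cleanly. With heterogeneous data, the local optima differ, and many local steps can pull the averaged model away from the minimiser of Eq. (1).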
The more rounds of local training are performed, the less it is guaranteed that the overall procedure minimises Eq. (1) [9, 12]. The actual process for aggregating parameters depends on the network topology, as nodes might be segregated into sub-networks due to geographical or legal constraints (see Fig. 2). Aggregation strategies can rely on a single aggregating node (hub-and-spoke models) or on multiple nodes without any centralisation. An example of the latter is peer-to-peer FL, where connections exist between all or a subset of the participants and model updates are shared only between directly connected sites [15, 56]. An example of centralised FL aggregation is given in Algorithm 1. Note that aggregation strategies do not necessarily require information about the full model update. Clients might choose to share only a subset of the model parameters to reduce communication overhead or to ensure better privacy preservation [10]. They might also do so to produce multi-task learning algorithms in which only part of the parameters is learned in a federated manner.

A framework that enables arbitrary compute plans can be built by decoupling the computing resources (data and servers) from the compute plan, as depicted in Fig. 2. The compute plan defines the trajectory of a model across several partners, to be trained and evaluated on specific data sets.

Challenges and considerations

Despite its advantages, FL does not solve all the issues inherent to learning on medical data. Successful model training still depends on factors such as data quality, bias and standardisation [2]. These issues have to be addressed in both federated and non-federated learning efforts through appropriate measures. These include careful study design, common protocols for data acquisition, structured reporting, and sophisticated methodologies for discovering bias and hidden stratification.
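One simple building block of such bias-discovery methodology is to report performance per site or subgroup instead of as a single aggregate number. A model that is good on average but poor for one group then does not go unnoticed. The sketch below assumes that each evaluation record carries a group tag, such as the acquisition site or scanner manufacturer. The function name `stratified_accuracy`, the 0.1 margin and the toy records are hypothetical choices for illustration, not a method from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, int]  # (group tag, true label, predicted label)


def stratified_accuracy(records: Iterable[Record], margin: float = 0.1
                        ) -> Tuple[float, Dict[str, float], List[str]]:
    """Overall accuracy, per-group accuracy, and groups trailing the overall value by > margin."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, label, pred in records:
        total[group] += 1
        correct[group] += int(label == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    flagged = sorted(g for g, acc in per_group.items() if acc < overall - margin)
    return overall, per_group, flagged


# Hypothetical evaluation records: the model is perfect on site A but weak on site B.
records = [("A", 1, 1)] * 8 + [("B", 1, 1), ("B", 0, 1), ("B", 1, 0), ("B", 0, 0)]
overall, per_group, flagged = stratified_accuracy(records)
print(overall, per_group, flagged)
```

In this example, site A reaches an accuracy of 1.0 and site B only 0.5, so B is flagged even though the overall accuracy is about 0.83. In a federated setting, each participant could compute such counts locally and share only the aggregates.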
In the following, we touch upon the key aspects of FL that are particularly relevant to digital health and that need to be taken into account when establishing FL. For technical details and in-depth discussion, we refer the reader to recent surveys [11, 12, 20].

Data heterogeneity

Medical data is particularly diverse. This is due not only to the variety of modalities, dimensionalities and characteristics in general, but also to variation within a specific protocol, caused by factors such as acquisition differences, the brand of the medical device or local demographics. FL may help address certain sources of bias through potentially increased diversity of data sources. However, inhomogeneous data distribution poses a challenge for FL algorithms and strategies, as many of them assume independently and identically distributed (IID) data across the participants. In general, strategies such as FedAvg [9] are prone to fail under these conditions [9, 57, 58], which partly defeats the very purpose of collaborative learning strategies. Recent results, however, indicate that FL training is still feasible [59], even if medical data is not uniformly distributed across the institutions [16, 17] or includes a local bias [51]. Research addressing this problem includes, for example, FedProx [57], a part-data-sharing strategy [58] and FL with domain adaptation [18]. Another challenge is that data heterogeneity may lead to a situation in which the globally optimal solution is not optimal for an individual local participant. All participants should therefore agree on the definition of model training optimality before training.

Privacy and security

Healthcare data is highly sensitive and must be protected accordingly, following appropriate confidentiality procedures. Some of the key considerations are therefore the trade-offs, strategies and remaining risks regarding the privacy-preserving potential of FL.

Privacy vs.
performance: It is important to note that FL does not solve all potential privacy issues and, like ML algorithms in general, will always carry some risks. Privacy-preserving techniques for FL offer levels of protection that exceed those of today's commercially available ML models [12]. However, there is a trade-off with performance, and these techniques may affect, for example, the accuracy of the final model [10]. Furthermore, future techniques and/or ancillary data could be used to compromise a model that was previously considered low-risk.

Level of trust: Broadly speaking, participating parties can enter two types of FL collaboration:

Trusted: In FL consortia in which all parties are considered trustworthy and are bound by an enforceable collaboration agreement, many of the more nefarious motivations can be eliminated. These include deliberate attempts to extract sensitive information or to intentionally corrupt the model. This reduces the need for sophisticated countermeasures, and the collaboration can fall back on the principles of standard collaborative research.

Non-trusted: In FL systems that operate on a larger scale, it might be impractical to establish an enforceable collaborative agreement. Some clients may deliberately try to degrade performance, bring the system down or extract information from other parties. Security strategies will therefore be required to mitigate these risks. Examples include advanced encryption of model submissions, secure authentication of all parties, traceability of actions, differential privacy, verification systems, execution integrity, model confidentiality and protection against adversarial attacks.

Information leakage: By definition, FL systems avoid sharing healthcare data among participating institutions.
However, the shared information may still indirectly expose private data used for local training. This can happen, e.g., through model inversion [60] of the model updates, through the gradients themselves [61] or through adversarial attacks [62, 63]. FL differs from traditional training in that the training process is exposed to multiple parties. This increases the risk of leakage via reverse engineering if adversaries can observe model changes over time or observe specific model updates (i.e., a single institution's update). The same risk arises if they can manipulate the model, e.g., to induce additional memorisation by others through gradient-ascent-style attacks. Countermeasures may be needed, such as limiting the granularity of the updates, adding noise [16, 18] and ensuring adequate differential privacy [44]. Developing them is still an active area of research [12].

Traceability and accountability

As with all safety-critical applications, the reproducibility of a system is important for FL in healthcare. In contrast to centralised training, FL requires multi-party computations in environments that vary considerably in terms of hardware, software and networks. Traceability of all system assets is thus mandatory throughout the training process, including data access history, training configurations and hyperparameter tuning. In non-trusted federations in particular, traceability and accountability processes require execution integrity. After the training process reaches the mutually agreed model optimality criteria, it may also be helpful to measure the contribution of each participant, such as the computational resources consumed and the quality of the data used for local training. These measurements could then be used to determine appropriate compensation and to establish a revenue model among the participants [64]. One implication of FL is that researchers cannot investigate the data on which models are trained in order to make sense of unexpected results. Moreover, the collaborating parties will need to approve, as not violating privacy, any statistical measurements taken on the training data as part of the model development workflow. Each site has access to its own raw data, but federations may decide to provide a secure intra-node viewing facility to cater for this need. Alternatively, they may provide some other way to increase the explainability and interpretability of the global model.

System architecture

Large-scale FL among consumer devices, as in McMahan et al. [9], differs from FL in healthcare. Healthcare institutions are often equipped with relatively powerful computational resources and reliable, higher-throughput networks. These enable the training of larger models with many more local training steps, and the sharing of more model information between nodes. These characteristics of FL in healthcare also bring specific challenges, such as:
- ensuring data integrity when communicating, by using redundant nodes;
- designing secure encryption methods to prevent data leakage;
- designing appropriate node schedulers that make the best use of the distributed computational devices and reduce idle time.

The administration of such a federation can be realised in different ways. In situations requiring the most stringent data privacy between parties, training may operate via some form of "honest broker" system, in which a trusted third party acts as the intermediary and facilitates access to data. This setup requires an independent entity controlling the overall system, which may not always be desirable, since it could involve additional cost and procedural viscosity. However, it has the advantage that the precise internal mechanisms can be abstracted away from the clients, making the system more agile and simpler to update.
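The architecture challenges above include designing secure encryption methods to prevent data leakage. One risk identified earlier is that an observer can see a single institution's update. The sketch below therefore illustrates pairwise additive masking, the core idea of secure aggregation protocols. Every pair of clients shares a random mask that one client adds and the other subtracts. An aggregator, or an honest broker, sees only masked vectors, but the masks cancel in the sum. This is a swapped-in, simplified technique for illustration, not the paper's method. It makes three assumptions: pairwise seeds are pre-shared (real protocols derive them via key agreement and also handle client dropouts); Python's non-cryptographic `random` module is used; and values are encoded as fixed-point integers modulo 2^32 so that the masks cancel exactly. All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

SCALE = 2 ** 16     # fixed-point precision (assumed)
MODULUS = 2 ** 32   # all arithmetic is modulo 2^32 so that masks cancel exactly


def encode(values: List[float]) -> List[int]:
    """Maps floats to fixed-point integers in [0, MODULUS)."""
    return [round(v * SCALE) % MODULUS for v in values]


def decode(values: List[int]) -> List[float]:
    """Inverse of encode, interpreting the upper half of the range as negative numbers."""
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in values]


def pair_mask(seed: int, dim: int) -> List[int]:
    """Mask from a seed assumed to be pre-shared by one pair of clients (random is NOT a CSPRNG)."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]


def mask_update(client: int, update: List[float],
                seeds: Dict[Tuple[int, int], int]) -> List[int]:
    """Client side: for each pair (i, j) with i < j, client i adds the mask and client j subtracts it."""
    masked = encode(update)
    for (i, j), seed in seeds.items():
        if client not in (i, j):
            continue
        sign = 1 if client == i else -1
        mask = pair_mask(seed, len(update))
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[float]:
    """Aggregator side: sums masked vectors; the pairwise masks cancel, leaving only the total."""
    total = [sum(column) % MODULUS for column in zip(*masked_updates)]
    return decode(total)


# Three hypothetical institutions with small model updates.
updates = [[0.5, -1.25], [0.25, 0.75], [-0.5, 2.0]]
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # hypothetical pre-shared pairwise seeds
masked = [mask_update(k, u, seeds) for k, u in enumerate(updates)]
print(aggregate(masked))  # prints: [0.25, 1.5]
```

Each masked vector looks random on its own, yet the aggregate equals the sum of the three updates. A production protocol would additionally need authenticated key exchange and dropout recovery. It could also add differential-privacy noise on top, since the sum itself can still leak information.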
By contrast, in a peer-to-peer system

Conclusion

ML, and particularly DL, has led to a wide range of innovations in digital healthcare. All ML methods benefit greatly from access to data that approximates the true global distribution. FL is therefore a promising approach for obtaining powerful, accurate, safe, robust and unbiased models. By enabling multiple parties to train collaboratively without the need to exchange or centralise data sets, FL neatly addresses issues related to the egress of sensitive medical data. As a consequence, it may open novel research and business avenues and has the potential to improve patient care globally. However, not all technical questions have been answered yet [12]. Despite this, we truly believe that the potential impact of FL on precision medicine, and ultimately on improving medical care, is very promising.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

References

1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436 (2015).
2. Wang, F., Casalino, L. P. & Khullar, D. Deep learning in medicine—promise, progress, and challenges. JAMA Intern. Med. 179, 293–294 (2019).
3. Chartrand, G. et al. Deep learning: a primer for radiologists. Radiographics 37, 2113–2131 (2017).
4. De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342 (2018).
5. Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, 843–852 (IEEE, 2017).
6. Van Panhuis, W. G. et al. A systematic review of barriers to data sharing in public health. BMC Public Health 14, 1144 (2014).
7. Rocher, L., Hendrickx, J. M. & De Montjoye, Y.-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10, 1–9 (2019).
8. Schwarz, C. G. et al. Identification of anonymous MRI research participants with face-recognition software. N. Engl. J. Med. 381, 1684–1686 (2019).
9. McMahan, B., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, 1273–1282 (2017).
10. Li, T., Sahu, A. K., Talwalkar, A. & Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37, 50–60 (IEEE, 2020).
11. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10, 12 (2019).
12. Kairouz, P. et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019).
13. Lee, J. et al. Privacy-preserving patient similarity learning in a federated environment: development and analysis. JMIR Med. Inform. 6, e20 (2018).
14. Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).
15. Roy, A. G., Siddiqui, S., Pölsterl, S., Navab, N. & Wachinger, C. BrainTorrent: a peer-to-peer environment for decentralized federated learning. arXiv preprint arXiv:1905.06731 (2019).
16. Li, W. et al. Privacy-preserving federated brain tumour segmentation. In International Workshop on Machine Learning in Medical Imaging, 133–141 (Springer, 2019).
17. Sheller, M. J., Reina, G. A., Edwards, B., Martin, J. & Bakas, S. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. In International MICCAI Brainlesion Workshop, 92–104 (Springer, 2018).
18. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. arXiv preprint arXiv:2001.05647 (2020).
19. Huang, L. et al. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019).
20. Xu, J. & Wang, F. Federated learning for healthcare informatics. arXiv preprint arXiv:1911.06270 (2019).
21. Roy, A. & Banerjee, A. IBM's Merge Healthcare acquisition. https://www.reuters.com/article/us-merge-healthcare-m-a-ibm/ibm-to-buy-merge-healthcare-in-1-billion-deal-idUSKCN0QB1ML20150806 (2015) (Accessed 10 February 2020).
22. NHS Scotland's National Safe Haven. https://www.gov.scot/publications/charter-safe-havens-scotland-handling-unconsented-data-national-health-service-patient-records-support-research-statistics/pages/4/ (2015) (Accessed 10 February 2020).
23. Cuggia, M. & Combes, S. The French Health Data Hub and the German Medical Informatics Initiatives: two national projects to promote data sharing in healthcare. Yearbook Med. Informat. 28, 195–202 (2019).
24. Health Data Research UK. https://www.hdruk.ac.uk/ (Health Data Research UK, 2020) (Accessed 10 February 2020).
25. Sporns, O., Tononi, G. & Kötter, R. The human connectome: a structural description of the human brain. PLoS Comput. Biol. 1, e42 (2005). https://doi.org/10.1371/journal.pcbi.0010042
26. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015). https://doi.org/10.1371/journal.pmed.1001779
27. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).
28. Wang, X. et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2097–2106 (IEEE, 2017).
29. Yan, K., Wang, X., Lu, L. & Summers, R. M.
Deeplesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. , 036501 (2018). J Med. Imaging. 5 Tomczak, K., Czerwińska, P. & Wiznerowicz, M. The cancer genome atlas (tcga): an immeasurable source of knowledge. , A68 (2015). Contemp. Oncol. 19 Jack Jr., C. R. et al. The alzheimer’s disease neuroimaging initiative (adni): Mri methods. , 685–691 (2008). J. Magn. Reson. Imaging 27 . (2020) (Accessed 24 July 2020). Grand Challenge-a Platform for End-to-end Development of Machine Learning Solutions in Biomedical Imaging https://grand-challenge.org/ Litjens, G. et al. 1399 h&e-stained sentinel lymph node sections of breast cancer patients: the camelyon dataset. , giy065 (2018). GigaScience 7 Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (brats). , 1993–2024 (2014). IEEE Trans. Med. Imaging 34 Bakas, S. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. (2018). arXiv preprint arXiv:1811.02629 Bakas, S. et al. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. , 170117 (2017). Sci. Data 4 Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. (2019). arXiv preprint arXiv:1902.09063 Yeh, F.-C. et al. Quantifying differences and similarities in whole-brain white matter architecture using local connectome fingerprints. , e1005203 (2016). PLoS Comput. Biol. 12 Chang, K. et al. Distributed deep learning networks among institutions for medical imaging. , 945–954 (2018). J. Am. Med. Inform. Assoc. 25 Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In , 3-18 (IEEE, 2017). 2017 IEEE Symposium on Security and Privacy (SP) Sablayrolles, A., Douze, M., Ollivier, Y., Schmid, C. & Jégou, H. 
White-box vs black-box: Bayes optimal strategies for membership inference. In Chaudhuri, K. & Salakhutdinov, R. (eds) , 5558–5567. (PMLR, 2019). Proceedings of the 36th International Conference on Machine Learning, {ICML} 97 http://proceedings.mlr.press/v97/sablayrolles19a.html Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization. In , (OpenReview.net, 2017). 5th International Conference on Learning Representations, {ICLR}. https://openreview.net/forum?id=Sy8gdB9xx Carlini, N., Liu, C., Erlingsson, Ú., Kos, J. & Song, D. The secret sharer: evaluating and testing unintended memorization in neural networks. In Heninger, N. & Traynor, P. (eds) { } ({ } , 267–284. ({USENIX} Association, Santa Clara, CA, USA, 2019). 28th USENIX Security Symposium USENIX Security 19 https://www.usenix.org/conference/usenixsecurity19/presentation/carlini Abadi, M. et al. Deep learning with differential privacy. In , 308–318 (ACM, 2016). Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In , 1310–1321 (ACM, 2015). Proceedings of the 22nd ACM SIGSAC conference on computer and communications security Langlotz, C. P. et al. A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 nih/rsna/acr/the academy workshop. , 781–791 (2019). Radiology 291 Kim, Y., Sun, J., Yu, H. & Jiang, X. Federated Tensor Factorization for Computational Phenotyping. In . 887–895. (ACM, Halifax, NS, Canada, 2017). Proceedings of the 23rd {ACM} {SIGKDD} International Conference on Knowledge Discoveryand Data Mining https://doi.org/10.1145/3097983.3098118 He, C., Annavaram, M. & Avestimehr, S. Fednas: Federated deep learning via neural architecture search. (2020). https://sites.google.com/view/cvpr20-nas/ Trustworthy federated data analytics (tfda). (2020) (Accessed 28 May 2020). 
https://tfda.hmsp.center/ Joint Imaging Platform (Jip). (2020) (Accessed 28 May 2020). https://jip.dktk.dkfz.de/jiphomepage/ Medical institutions collaborate to improve mammogram assessment ai. (2020) (Accessed 28 May 2020). https://blogs.nvidia.com/blog/2020/04/15/federated-learning-mammogram-assessment/ Healthchain consortium. (2020) (Accessed 28 May 2020). https://www.substra.ai/en/healthchain-project The federated tumor segmentation (fets) initiative. (2020) (Accessed 28 May 2020). https://www.fets.ai Machine learning ledger orchestration for drug discovery. (2020). Accessed 28 May 2020. https://cordis.europa.eu/project/id/831472 Konečny`, J., McMahan, H. B., Ramage, D. & Richtárik, P. Federated optimization: Distributed machine learning for on-device intelligence. (2016). arXiv preprint arXiv:1610.02527 Lalitha, A., Kilinc, O. C., Javidi, T. & Koushanfar, F. Peer-to-peer federated learning on graphs. (2019). arXiv preprint arXiv:1901.11173 Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A. & Smith, V. Federated optimization in heterogeneous networks. (2018). arXiv preprint arXiv:1812.06127 Zhao, Y. et al. Federated learning with non-iid data. (2018). arxivabs/1806.00582 Li, X., Huang, K., Yang, W., Wang, S. & Zhang, Z. On the convergence of fedavg on non-IID data. (2020). https://openreview.net/forum?id=HJxNAnVtDS Wu, B. et al. P3sgd: patient privacy preserving SGD for regularizing deep CNNs in pathological image classification. In (pp. 2099–2108) (2019). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Zhu, L., Liu, Z. & Han, S. Deep leakage from gradients. In Wallach, H. M. et al. (eds) , 14747–14756. (2019). Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems http://papers.nips.cc/paper/9617-deep-leakage-from-gradients Wang, Z. et al. Beyond inferring class representatives: user-level privacy leakage from federated learning. In 2512–2520. 
(IEEE, Paris, France, 2019). 2019 {IEEE} Conferenceon Computer Communications, {INFOCOM} https://doi.org/10.1109/INFOCOM.2019.8737416 Hitaj, B., Ateniese, G. & Perez-Cruz, F. Deep models under the gan: information leakage from collaborative deep learning. In , CCS’17, 603–618 (Association for Computing Machinery, New York, NY, USA, 2017). Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security Ghorbani, A. & Zou, J. Data shapley: Equitable valuation of data for machine learning. In (pp. 2242-2251) (2019). International Conference on Machine Learning Acknowledgements Ukusungulwa kwe-UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value-Based Healthcare, i-Wellcome/EPSRC Centre for Medical Engineering (WT203148/Z/16/Z), i-Wellcome Flagship Programme (WT213038/Z/18/Z), i-Intramural Research Programme of the National Institutes of Health (NIH) Clinical Center, i-National Cancer Institute of the NIH under award number U01CA242871, i-National Institute of Neurological Disorders and Stroke of the NIH under award number R01NS042645, kanye ne-Helmholtz Initiative and Networking Fund (i-project “Trustworthy Federated Data Analytics”) ne-PRIME program ye-Deutscher Akademischer Austauschdienst (DAAD Okuzenzakalelayo iyatholakala ngaphansi kwe-CC by 4.0 Deed (i-Attribution 4.0 International) isivumelwano. Okuzenzakalelayo Ngaphandle kwe-CC by 4.0 Deed (i-Attribution 4.0 International) isicelo. Ukubuyekezwa ku-Nature