A New Privacy-First AI Predicts COVID Severity Using X-Rays and Medical Records

Authors: Ittai Dayan, Holger R. Roth, Aoxiao Zhong, Ahmed Harouni, Amilcare Gentili, Anas Z. Abidin, Andrew Liu, Anthony Beardsworth Costa, Bradford J. Wood, Chien-Sung Tsai, Chih-Hung Wang, Chun-Nan Hsu, C. K. Lee, Peiying Ruan, Daguang Xu, Dufan Wu, Eddie Huang, Felipe Campos Kitamura, Griffin Lacey, Gustavo César de Antônio Corradi, Gustavo Nino, Hao-Hsin Shin, Hirofumi Obinata, Hui Ren, Jason C. Crane, Jesse Tetreault, Jiahui Guan, John W. Garrett, Joshua D. Kaggie, Jung Gil Park, Keith Dreyer, Krishna Juluru, Kristopher Kersten, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Marius George Linguraru, Masoom A. Haider, Meena AbdelMaseeh, Nicola Rieke, Pablo F. Damasceno, Pedro Mario Cruz e Silva, Pochuan Wang, Sheng Xu, Shuichi Kawano, Sira Sriswasdi, Soo Young Park, Thomas M. Grist, Varun Buch, Watsamon Jantarabenjakul, Weichung Wang, Won Young Tak, Xiang Li, Xihong Lin, Young Joon Kwon, Abood Quraini, Andrew Feng, Andrew N. Priest, Baris Turkbey, Benjamin Glicksberg, Bernardo Bizzo, Byung Seok Kim, Carlos Tor-Díez, Chia-Cheng Lee, Chia-Jung Hsu, Chin Lin, Chiu-Ling Lai, Christopher P. Hess, Colin Compas, Deepeksha Bhatia, Eric K. Oermann, Evan Leibovitz, Hisashi Sasaki, Hitoshi Mori, Isaac Yang, Jae Ho Sohn, Krishna Nand Keshava Murthy, Li-Chen Fu, Matheus Ribeiro Furtado de Mendonça, Mike Fralick, Min Kyu Kang, Mohammad Adil, Natalie Gangai, Peerapon Vateekul, Pierre Elnajjar, Sarah Hickman, Sharmila Majumdar, Shelley L. McLeod, Sheridan Reed, Stefan Gräf, Stephanie Harmon, Tatsuya Kodama, Thanyawee Puthanakit, Tony Mazzulli, Vitor Lima de Lavor, Yothin Rakvongthai, Yu Rim Lee, Yuhong Wen, Fiona J. Gilbert, Mona G. Flores, Quanzheng Li

Abstract

Federated learning (FL) is a method used for training artificial intelligence models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing.
Here we used data from 20 institutes across the globe to train an FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays. EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation.

Main

The scientific, academic, medical and data science communities have come together in the face of the COVID-19 pandemic crisis to rapidly assess novel paradigms in artificial intelligence (AI) that are rapid and secure, and potentially incentivize data sharing and model training and testing without the usual privacy and data ownership hurdles of conventional collaborations [1, 2]. Healthcare providers, researchers and industry have pivoted their focus to address unmet and critical clinical needs created by the crisis, with remarkable results [3–9]. Clinical trial recruitment has been expedited and facilitated by national regulatory bodies and an international cooperative spirit [10–12]. The data analytics and AI disciplines have always fostered open and collaborative approaches, embracing concepts such as open-source software, reproducible research, data repositories and making anonymized datasets publicly available [13, 14]. The pandemic has emphasized the need to expeditiously conduct data collaborations that empower the clinical and scientific communities to respond to rapidly evolving and widespread global challenges [15–17].

A concrete example of these kinds of collaboration is our previous work on an AI-based SARS-COV-2 clinical decision support (CDS) model. This CDS model was developed at Mass General Brigham (MGB) and was validated across data from multiple health systems.
CXR was selected as the imaging input because it is widely available and commonly indicated by guidelines such as those provided by ACR [22], the Fleischner Society [23], the WHO [24], national thoracic societies [25], national health ministry COVID handbooks and radiology societies across the world [26]. The output of the CDS model was a score, termed CORISK [27], that corresponds to oxygen support requirements and that could aid in triaging patients by frontline clinicians [28–30]. Healthcare providers are known to prefer models that have been validated on their own data [27]. To date, most AI models, including the aforementioned CDS model, have been trained and validated on ‘narrow’ data that often lack diversity [31, 32], potentially resulting in overfitting and lower generalizability. This can be mitigated by training with diverse data from multiple sites without centralizing the data [33], using methods such as transfer learning [34, 35] or FL. FL is a method used to train AI models on disparate data sources, without the data being transported or exposed outside their original location [36].

Federated learning supports the rapid launch of centrally orchestrated experiments with improved traceability of data and assessment of algorithmic changes and impact [37]. One approach to FL, called client-server, sends an ‘untrained’ model to other servers (‘nodes’) that conduct partial training tasks, in turn sending the results back to be merged in the central (‘federated’) server [36]. Governance of data for FL is maintained locally, alleviating privacy concerns, with only model weights or gradients communicated between client sites and the federated server [38, 39]. FL has already shown promise in recent medical imaging applications [40–43], including in COVID-19 analysis [8, 44, 45].
A notable example is a mortality prediction model in patients infected with SARS-COV-2 that uses clinical features, albeit limited in terms of number of modalities and scale [46].

Our objective was to develop a robust, generalizable model that could assist in triaging patients. We showed that the CDS model can be federated successfully, because it uses data inputs that are relatively common in clinical practice and that do not rely heavily on operator-dependent assessments of the patient’s condition (such as clinical impressions or reported symptoms). Instead, laboratory results, vital signs, an imaging study and a commonly captured demographic (that is, age) were used. We therefore retrained the CDS model with diverse data using a client-server FL approach to develop a new global FL model, which we named EXAM. Our hypothesis was that EXAM would perform better than local models and would generalize better across healthcare systems.

Results

The EXAM model architecture

The EXAM model is based on the CDS model mentioned above [27]. In total, 20 features (19 from the EMR and one CXR) were used as input to the model. The outcome (that is, ‘ground truth’) labels were assigned based on patient oxygen therapy after 24- and 72-hour periods from initial admission to the emergency department (ED). A detailed list of the requested features and outcomes can be seen in Table 1.

The outcome labels of patients were set to 0, 0.25, 0.50 and 0.75 depending on the most intensive oxygen therapy the patient received in the prediction window. The oxygen therapy categories were, respectively, room air (RA), low-flow oxygen (LFO), high-flow oxygen (HFO)/noninvasive ventilation (NIV) or mechanical ventilation (MV). If the patient died within the prediction window, the outcome label was set to 1.
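The label scheme just described can be written as a small lookup. The sketch below is illustrative only: the function name `outcome_label`, the string codes used for each therapy and the choice to treat an empty therapy list as room air are assumptions, not part of the study's code.

```python
# Label values for each oxygen therapy category, ordered from least to
# most intensive, as listed in the text.
THERAPY_LABELS = {
    "RA": 0.0,        # room air
    "LFO": 0.25,      # low-flow oxygen
    "HFO/NIV": 0.50,  # high-flow oxygen or noninvasive ventilation
    "MV": 0.75,       # mechanical ventilation
}
DEATH_LABEL = 1.0


def outcome_label(therapies, died=False):
    """Return the outcome label for one prediction window (24 h or 72 h).

    `therapies` holds the category codes recorded within the window.
    The label follows the most intensive therapy received; death within
    the window overrides it.
    """
    if died:
        return DEATH_LABEL
    # Assumption: no recorded therapy is treated as room air.
    return max((THERAPY_LABELS[t] for t in therapies), default=THERAPY_LABELS["RA"])


if __name__ == "__main__":
    print(outcome_label(["RA", "LFO"]))           # most intensive is LFO
    print(outcome_label(["MV", "RA"]))            # most intensive is MV
    print(outcome_label(["HFO/NIV"], died=True))  # death overrides therapy
```

Each case would carry two such labels, one for the 24-h window and one for the 72-h window.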
For EMR features, only the first values captured in the ED were used, and data preprocessing included deidentification, missing value imputation and normalization to zero-mean and unit variance. The model therefore fuses information from both EMR and CXR features, using a 34-layer convolutional neural network (ResNet34) to extract features from a CXR and a Deep & Cross network to concatenate these features with the EMR features (for more details, see Methods). The model output is a risk score, termed the EXAM score, which is a continuous value in the range 0–1 for each of the 24- and 72-h predictions, corresponding to the labels described above.

Federating the model

The EXAM model was trained using a cohort of 16,148 cases, making it not only one of the first FL models for COVID-19 but also a very large, multicontinent development project in clinically relevant AI (Fig. 1a,b). Data were not harmonized between sites before extraction and, reflecting real-life clinical informatics circumstances, the authors did not conduct a meticulous harmonization of the data inputs (Fig. 1c,d).

Fig. 1: a, World map indicating the 20 different client sites contributing to the EXAM study. b, Number of cases contributed by each institution or site (client 1 represents the site contributing the largest number of cases). c, Chest X-ray intensity distribution at each client site. d, Age of patients at each client site, showing minimum and maximum ages (asterisks), mean age (triangles) and standard deviation (horizontal bars). The number of samples of each client site is shown in Supplementary Table 1.

We compared locally trained models with the global FL model on each client’s test data.
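To make the client-server idea concrete, here is a minimal sketch of one way a server can merge client results: each client runs a few local training steps, and the server averages the returned weights in proportion to each client's number of cases (federated averaging). The toy linear model, the function names and the size-weighted averaging rule are all assumptions for illustration; the text states only that partial training results are sent back and merged on the federated server.

```python
import random


def local_update(weights, data, lr=0.05, steps=5):
    """Partial training on one client: a few gradient steps on squared
    error for the toy model y ~ a * x + b. Raw data never leaves here."""
    a, b = weights
    n = len(data)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in data) / n
        a, b = a - lr * grad_a, b - lr * grad_b
    return (a, b)


def federated_round(global_weights, clients):
    """One round: send the global weights out, collect locally updated
    weights and average them, weighted by client dataset size."""
    total = sum(len(data) for data in clients)
    merged = [0.0, 0.0]
    for data in clients:
        local = local_update(global_weights, data)  # only weights come back
        share = len(data) / total
        merged[0] += share * local[0]
        merged[1] += share * local[1]
    return tuple(merged)


def run_demo(rounds=200, seed=0):
    """Three synthetic clients of unequal size sharing one true model."""
    rng = random.Random(seed)
    clients = []
    for n in (200, 50, 20):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        clients.append([(x, 3.0 * x - 1.0 + rng.gauss(0, 0.01)) for x in xs])
    weights = (0.0, 0.0)  # the 'untrained' model
    for _ in range(rounds):
        weights = federated_round(weights, clients)
    return weights


if __name__ == "__main__":
    print(run_demo())  # should end up close to (3.0, -1.0)
```

In this toy setting the smallest client benefits from the other clients' data without ever seeing it, which is the effect the study measures at much larger scale.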
Federated training gave a significant performance improvement (P ≪ 1 × 10⁻³, Wilcoxon signed-rank test) of 16% (as defined by average AUC when running the model on the respective local test sets: from 0.795 to 0.920, or 12.5 percentage points) (Fig. 2a). It also resulted in a 38% generalizability improvement (as defined by average AUC when running the model on all test sets: from 0.667 to 0.920, or 25.3 percentage points) of the best global model for prediction of 24-h oxygen treatment compared with models trained only on a site’s own data (Fig. 2b). For the prediction results of 72-h oxygen treatment, the best global model training resulted in an average performance improvement of 18% compared to locally trained models, while generalizability of the global model improved on average by 34% (Extended Data Fig. 1). The stability of our results was validated by repeating three runs of local and FL training on different randomized data splits.

Fig. 2: a, Performance on each client’s test set in prediction of 24-h oxygen treatment for models trained on local data only (Local) versus that of the best global model available on the server (FL (gl. best)). Av., average test performance across all sites. b, Generalizability (average performance on other sites’ test data, as represented by average AUC) as a function of a client’s dataset size (no. of cases). The green horizontal line denotes the generalizability performance of the best global model. Client 14 had only cases with RA treatment, so the evaluation metric (av. AUC) was not applicable to it (Methods). Data for client 14 were also excluded from computation of average generalizability in local models.

Local models that were trained using unbalanced cohorts (for example, mostly mild cases of COVID-19) markedly benefited from the FL approach, with a substantial improvement in prediction average AUC performance for categories with only a few cases.
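The 24-h gains are quoted in two ways, as a relative percentage and as percentage points, and both follow directly from the AUC values given above. The short check below reproduces them; the helper name `improvement` is hypothetical.

```python
def improvement(auc_before, auc_after):
    """Return (percentage points, relative improvement in %) between two
    average AUC values."""
    points = (auc_after - auc_before) * 100
    relative = (auc_after - auc_before) / auc_before * 100
    return round(points, 1), round(relative)


if __name__ == "__main__":
    # Local test sets: 0.795 -> 0.920
    print(improvement(0.795, 0.920))  # (12.5, 16)
    # Generalizability across all test sets: 0.667 -> 0.920
    print(improvement(0.667, 0.920))  # (25.3, 38)
```

The relative figure divides the gain by the starting AUC, which is why a 12.5-point gain from 0.795 reads as 16% while a 25.3-point gain from 0.667 reads as 38%.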
The benefit for unbalanced cohorts was evident at client site 16 (an unbalanced dataset), where most patients experienced mild disease severity and only a few cases were severe. The FL model achieved a higher true-positive rate for the two positive (severe) cases and a markedly lower false-positive rate compared to the local model, both shown in the receiver operating characteristic (ROC) plots and confusion matrices (Fig. 3a and Extended Data Fig. 2). More important, the generalizability of the FL model was considerably increased over the locally trained model.

Fig. 3: a, ROC at client site 16, with unbalanced data and mostly mild cases. b, ROC of the local model at client site 12 (a small dataset), mean ROC of models trained on larger datasets corresponding to the five client sites in the Boston area (1, 4, 5, 6, 8) and ROC of the best global model in prediction of 72-h oxygen treatment for different thresholds of EXAM score (left, middle, right). The mean ROC is calculated based on five locally trained models while the gray area denotes the ROC standard deviation. ROCs for three different cutoff values (t) of the EXAM risk score are shown. Pos and neg denote the number of positive and negative cases, respectively, as defined by this range of EXAM score.

In the case of client sites with relatively small datasets, the best FL model markedly outperformed not only the local model but also those trained on larger datasets from five client sites in the Boston area of the USA (Fig. 3b).

The global model performed well in predicting oxygen needs at 24/72 h in patients who were both COVID positive and negative (Extended Data Fig. 3).

Validation at independent sites

Following initial training, EXAM was subsequently tested at three independent validation sites: Cooley Dickinson Hospital (CDH), Martha’s Vineyard Hospital (MVH) and Nantucket Cottage Hospital (NCH), all in Massachusetts, USA. The model was not retrained at these sites and it was used only for validation purposes.
The cohort size and model inference results are summarized in Table 2, and the ROC curves and confusion matrices for the largest dataset (from CDH) are shown in Fig. 4. The operating point was set to discriminate between nonmechanical ventilation and mechanical ventilation (MV) treatment (or death). The FL global trained model, EXAM, achieved an average AUC of 0.944 and 0.924 for 24- and 72-h prediction tasks, respectively (Table 2), which exceeded the average performance among sites used in training EXAM. For prediction of MV treatment (or death) at 24 h, EXAM achieved a sensitivity of 0.950 and specificity of 0.882 at CDH, and a sensitivity of 1.000 and specificity of 0.934 at MVH. NCH did not have any cases with MV/death at 24 h. In regard to 72-h MV prediction, EXAM achieved a sensitivity of 0.929 and specificity of 0.880 at CDH, sensitivity of 1.000 and specificity of 0.976 at MVH and sensitivity of 1.000 and specificity of 0.929 at NCH.

Fig. 4: a,b, Performance (ROC) (top) and confusion matrices (bottom) of the EXAM FL model on the CDH dataset for prediction of oxygen requirement at 24 h (a) and 72 h (b). ROCs for three different cutoff values (t) of the EXAM risk score are shown.

For MV at CDH at 72 h, EXAM had a low false-negative rate of 7.1%. Representative failure cases are presented in Extended Data Fig. 4, which shows two false-negative cases from CDH: one case had many missing EMR data features, and the other had a CXR with a motion artifact and some missing EMR features.

Use of differential privacy

A primary motivation for healthcare institutes to use FL is to preserve the security and privacy of their data, as well as adherence to data compliance measures. For FL, there remains the potential risk of model ‘inversion’ or even the reconstruction of training images from the model gradients themselves [47].
To counter these risks, security-enhancing measures were used to mitigate risk in the event of data ‘interception’ during site-server communication [48, 49]. We experimented with techniques to avoid interception of FL data, and added a security feature that we believe could encourage more institutions to use FL. We thus validated previous findings showing that partial weight sharing, and other differential privacy techniques, can successfully be applied in FL [50]. Through investigation of a partial weight-sharing scheme [50–52], we showed that models can reach a comparable performance even when only 25% of weight updates are shared (Extended Data Fig. 5).

Discussion

This is a large, real-world healthcare FL study in terms of number of sites and number of data points used. We believe that it provides a powerful proof-of-concept of the feasibility of using FL for fast and collaborative development of needed AI models in healthcare. Our study involved multiple sites across four continents and under the oversight of different regulatory bodies, and thus holds the promise of being provided to different regulated markets in an expedited way. The global FL model, EXAM, proved to be more robust and achieved better results at individual sites than any model trained on only local data. We believe that consistent improvement was achieved owing to a larger, but also a more diverse, dataset, the use of data inputs that can be standardized and avoidance of clinical impressions/reported symptoms. These factors played an important part in increasing the benefits from this FL approach and its impact on performance, generalizability and, ultimately, the model’s usability.

For a client site with a relatively small dataset, two typical approaches could be used for fitting a useful model: one is to train locally with its own data, the other is to apply a model trained on a larger dataset.
For sites with small datasets, it would have been virtually impossible to build a performant deep learning model using only their local data. The finding that these two approaches were outperformed on all three prediction tasks by the global FL model indicates that the benefit for client sites with small datasets arising from participation in FL collaborations is substantial. This is probably a reflection of FL’s ability to capture more diversity than local training, and to mitigate the bias present in models trained on a homogeneous population. An under-represented population or age group in one hospital/region might be highly represented in another region, such as children who might be differentially affected by COVID-19, including disease manifestations in lung imaging [46].

The validation results confirmed that the global model is robust, supporting our hypothesis that FL-trained models are generalizable across healthcare systems. They provide a compelling case for the use of predictive algorithms in COVID-19 patient care, and the use of FL in model creation and testing. By participating in this study the client sites received access to EXAM, to be further validated ahead of pursuing any regulatory approval or future introduction into clinical care. Plans are under way to validate EXAM prospectively in ‘production’ settings at MGB leveraging COVID-19 targeted resources [53], as well as at different sites that were not a part of the EXAM training.

More than 200 prediction models to support decision-making in patients with COVID-19 have been published [19]. Unlike the majority of publications focused on diagnosis of COVID-19 or prediction of mortality, we predicted oxygen requirements that have implications for patient management. We also used cases with unknown SARS-COV-2 status, and so the model could provide input to the physician ahead of receiving a result for PCR with reverse transcription (RT–PCR), making it useful for a real-life clinical setting.
The model’s imaging input is used in common practice, in contrast with models that use chest computed tomography, a diagnostic modality that lacks consensus support. The model’s design was constrained to objective predictors, unlike many published studies that leveraged subjective clinical impressions. The data collected reflect varied incidence rates, and thus the ‘population momentum’ we encountered is more diverse. This implies that the algorithm can be useful in populations with different incidence rates.

Patient cohort identification and data harmonization are not novel issues in research and data science [54], but they are further complicated when using FL, given the lack of visibility on other sites’ datasets. Improvements to clinical information systems are needed to streamline data preparation, leading to better leverage of a network of sites participating in FL. This, in conjunction with hyperparameter engineering, can allow algorithms to ‘learn’ more effectively from larger data batches and adapt model parameters to a particular site for further personalization, for example through further fine-tuning on that site [39]. A system that would allow seamless, close-to real-time model inference and results processing would also be of benefit and would ‘close the loop’ from training to model deployment.

Because data were not centralized, they are not readily accessible. Any future analysis of the results, beyond what was derived and collected, is therefore limited.

Like other machine learning models, EXAM is limited by the quality of the training data. Institutions interested in deploying this algorithm for clinical care need to understand the potential biases in the training. For example, the labels used as ground truth in the training of the EXAM model were derived from 24- and 72-h oxygen consumption in the patient; it is assumed that the oxygen delivered to the patient equates to the oxygen need.
Since our data access was limited, we did not have sufficient available information for the generation of detailed statistics regarding failure causes, post hoc, at most sites. However, we did study failure cases from the largest independent test site, CDH, and were able to generate hypotheses that we can test in the future. For high-performing sites, it seems that most failure cases fall into one of two categories: (1) low quality of input data (for example, missing data or motion artifact in CXR); or (2) out-of-distribution data (for example, a very young patient).

In future, we also intend to investigate the potential for a ‘population drift’ due to different phases of disease progression. We believe that, owing to the diversity across the 20 sites, this risk may have been mitigated.

A feature that would enhance these kinds of large-scale collaboration is the ability to predict the contribution of each client site towards improving the global FL model. This would help in client site selection, and in the prioritization of data acquisition and annotation efforts. Future approaches may incorporate automated hyperparameter searching [55], neural architecture search [56] and other automated machine learning approaches [57] to find the optimal training parameters for each client site more efficiently.

Known issues of batch normalization (BN) in FL [58] motivated us to keep our base model for image feature extraction fixed [49] to reduce the divergence between unbalanced client sites. Future work might explore different types of normalization techniques to allow the training of AI models in FL more effectively when client data are not independent and identically distributed (non-IID).

Recent works on privacy attacks within the FL setting have raised concerns on data leakage during model training [59]. Meanwhile, protection algorithms remain underused and are constrained by several factors. Although differential privacy algorithms [36, 48, 49] show good protection, they may weaken the model’s performance.
Encryption algorithms, such as homomorphic encryption [60], maintain performance but may substantially increase message size and training time. A quantifiable way to measure privacy would allow better choices for deciding the minimal privacy parameters necessary while maintaining clinically acceptable performance [36, 48, 49].

Following further validation, we envision deployment of the EXAM model in the ED setting as a way to evaluate risk at both the per-patient and population level, and to provide clinicians with an additional reference point when performing the frequently difficult task of triaging patients. We also envision using the model as a more sensitive population-level metric to help balance resources between regions, hospitals and departments. Our hope is that similar FL efforts can break the data silos and allow for faster development of much-needed AI models in the near future.

Methods

Ethics approval

All procedures were conducted in accordance with the principles for human experimentation as defined in the Declaration of Helsinki and International Conference on Harmonization Good Clinical Practice guidelines, and were approved by the relevant institutional review boards at the following validation sites: CDH, MVH, NCH and at the following training sites: MGB, Mass General Hospital (MGH), Brigham and Women’s Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital (all eight of these hospitals were covered under MGB’s ethics board reference, no. 2020P002673, and informed consent was waived by the institutional review board (IRB)). Similarly, participation of the remaining sites was approved by their respective relevant institutional review processes: Children’s National Hospital in Washington, DC (no. 00014310, IRB certified exempt); NIHR Cambridge Biomedical Research Centre (no. 20/SW/0140, informed consent waived); The Self-Defense Forces Central Hospital in Tokyo (no.
02-014, informed consent waived); National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration (no. 202108026 W, informed consent waived); Tri-Service General Hospital in Taiwan (no. B202105136, informed consent waived); Kyungpook National University Hospital in South Korea (no. KNUH 2020-05-022, informed consent waived); Faculty of Medicine, Chulalongkorn University in Thailand (nos. 490/63, 291/63, informed consent waived); Diagnosticos da America SA in Brazil (no. 26118819.3.0000.5505, informed consent waived); University of California, San Francisco (no. 20-30447, informed consent waived); VA San Diego (no. H200086, IRB certified exempt); University of Toronto (no. 20-0162-C, informed consent waived); National Institutes of Health in Bethesda, Maryland (no. 12-CC-0075, informed consent waived); University of Wisconsin-Madison School of Medicine and Public Health (no. 2016-0418, informed consent waived); Memorial Sloan Kettering Cancer Center in New York (no. 20-194, informed consent waived); and Mount Sinai Health System in New York (no. IRB-20-03271, informed consent waived).

MI-CLAIM guidelines for reporting of clinical AI models were followed (Supplementary Note 2).

Study setting

The study included data from 20 institutions (Fig. 1a
): MGB, MGH, Brigham and Women’s Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital; Children’s National Hospital in Washington, DC; NIHR Cambridge Biomedical Research Centre; The Self-Defense Forces Central Hospital in Tokyo; National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration; Tri-Service General Hospital in Taiwan; Kyungpook National University Hospital in South Korea; Faculty of Medicine, Chulalongkorn University in Thailand; Diagnosticos da America SA in Brazil; University of California, San Francisco; VA San Diego; University of Toronto; National Institutes of Health in Bethesda, Maryland; University of Wisconsin-Madison School of Medicine and Public Health; Memorial Sloan Kettering Cancer Center in New York; and Mount Sinai Health System in New York. Institutions were recruited between March and May 2020. Dataset curation started in June 2020 and the final data cohort was added in September 2020. Between August and October 2020, 140 independent FL runs were conducted to develop the EXAM model and, by the end of October 2020, EXAM was made public on NVIDIA NGC [61–63]. Data from three independent sites were used for independent validation: CDH, MVH and NCH, all in Massachusetts, USA. These three hospitals had patient population characteristics different from the training sites. The data used for the algorithm validation consisted of patients who were admitted to the ED at these sites between March 2020 and February 2021 and who satisfied the same inclusion criteria as the data used to train the FL model.

Data collection

The 20 client sites prepared a total of 16,148 cases (both positive and negative) for the purposes of training, validation and testing of the model (Fig. 1b). Medical data were accessed for patients who satisfied the study inclusion criteria.
Client sites strove to include all COVID-positive cases from the beginning of the pandemic in December 2019 up to the time they started local training for the EXAM study. All local training had started by 30 September 2020. The sites also included other patients in the same period with negative RT–PCR test results. Because most of the sites had more SARS-COV-2-negative than -positive patients, we limited the number of negative patients included to at most 95% of the total cases at each client site.

A ‘case’ included a CXR and the requisite data inputs taken from the patient’s medical record. A breakdown of the cohort size of the dataset for each client site is shown in Fig. 1b. The distribution and patterns of CXR image intensity (pixel values) varied greatly among sites owing to a multitude of patient- and site-specific factors, such as different device manufacturers and imaging protocols, as shown in Fig. 1c,d. Patient age and the distribution of EMR features also varied greatly among sites, as expected given the differing demographics of globally distributed hospitals (Extended Data Fig. 6).

Patient inclusion criteria

Patient inclusion criteria were: (1) patient presented to the hospital’s ED or equivalent; (2) patient had a RT–PCR test performed at any time between presentation to the ED and discharge from the hospital; (3) patient had a CXR in the ED; and (4) patient’s record had at least five of the EMR values detailed in Table 1. The CXR, laboratory results and vitals used were the first available for capture during the visit to the ED. The model did not use any CXR, laboratory results or vitals acquired after the patient left the ED.

Model input

In total, 21 EMR features were used as input to the model.
The outcome (that is, ground truth) labels were assigned based on patient requirements 24 and 72 h after initial admission to the ED. A detailed list of the requested EMR features and outcomes can be seen in Table 1.

The distribution of oxygen treatment using different devices at different client sites is shown in Extended Data Fig. 7, which details the device usage at admission to the ED and after 24- and 72-h periods. The difference in dataset distribution between the largest and smallest client sites can be seen in Extended Data Fig. 8.

The number of positive COVID-19 cases, as confirmed by a single RT–PCR test obtained at any time between presentation to the ED and discharge from the hospital, is listed in Supplementary Table 1. Each client site was asked to randomly split its dataset into three parts: 70% for training, 10% for validation and 20% for testing. For both 24- and 72-h outcome prediction models, random splits for each of the three repeated local and FL training and evaluation experiments were independently generated.

EXAM model development

There is wide variation in the clinical course of patients who present to hospital with symptoms of COVID-19, with some experiencing a rapid deterioration in respiratory function that requires different interventions to prevent or mitigate hypoxemia (refs. 62, 63). A critical decision made during the evaluation of a patient at the initial point of care, or in the ED, is whether the patient is likely to require more invasive or resource-limited countermeasures or interventions (such as MV or monoclonal antibodies), and should therefore receive a scarce but effective therapy, a therapy with a narrow risk–benefit ratio due to side effects or a higher level of care, such as admittance to the intensive care unit (ref. 64).
In contrast, a patient who is at lower risk of requiring invasive oxygen therapy may be placed in a less intensive care setting such as a regular ward, or even released from the ED for continuing self-monitoring at home (ref. 65). EXAM was developed to help triage such patients.

Of note, the model is not approved by any regulatory agency at this time and should be used only for research purposes.

EXAM score

EXAM was trained using FL; it outputs a risk score (termed the EXAM score) similar to CORISK (ref. 27; Extended Data Fig. 9a). The score corresponds to a patient’s oxygen support requirements within two windows (24 and 72 h) after initial presentation to the ED. Extended Data Fig. 9b illustrates how CORISK and the EXAM score can be used for patient triage.

Chest X-ray images were preprocessed to select the anterior position image and exclude lateral view images, and then scaled to a resolution of 224 × 224. As shown in Extended Data Fig. 9a, the model fuses information from both EMR and CXR features using the Deep & Cross network (ref. 68); the CXR features are based on a modified ResNet34 with spatial attention (ref. 66) pretrained on the CheXpert dataset (ref. 67). To combine these different data types, a 512-dimensional feature vector was extracted from each CXR image using the pretrained ResNet34 with spatial attention, then concatenated with the EMR features as the input for the Deep & Cross network. The final output was a continuous value in the range 0–1 for both 24- and 72-h predictions, corresponding to the labels described above, as shown in Extended Data Fig. 9b. We used cross-entropy as the loss function and ‘Adam’ as the optimizer. The model was implemented in TensorFlow (ref. 69) using the NVIDIA Clara Train SDK (ref. 70). The average AUC for the classification tasks (≥LFO, ≥HFO/NIV or ≥MV) was calculated and used as the final evaluation metric, with normalization to zero-mean and unit variance.
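The evaluation metric described above can be illustrated with a minimal sketch; it is not the study's code. It assumes each case has one predicted score and one ordinal outcome level, encoded here as 0 = RA, 1 = LFO, 2 = HFO/NIV and 3 = MV. The function names and the integer encoding are hypothetical, and each threshold must split the cases into two non-empty groups.

```python
def auc(scores, positives):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(scores, levels, thresholds=(1, 2, 3)):
    """Mean AUC over the binary tasks level >= threshold.

    Assumed encoding (hypothetical): 0 = RA, 1 = LFO, 2 = HFO/NIV, 3 = MV,
    so the thresholds correspond to >=LFO, >=HFO/NIV and >=MV.
    """
    per_task = [auc(scores, [lvl >= t for lvl in levels]) for t in thresholds]
    return sum(per_task) / len(per_task)


# Made-up scores and outcome levels for four cases (illustration only).
example_scores = [0.1, 0.6, 0.3, 0.9]
example_levels = [0, 1, 2, 3]
example_average = average_auc(example_scores, example_levels)
```

Averaging over the three thresholds gives a single number that rewards scores which rank patients correctly at every level of oxygen support, not only at one cut-off.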
CXR images were preprocessed to select the correct series and exclude lateral view images, then scaled to a resolution of 224 × 224 (ref. 27).

Feature imputation and normalization

A MissForest algorithm (ref. 71) was used to impute EMR features, based on the local training dataset. If an EMR feature was completely missing from a client site dataset, the mean value of that feature, calculated exclusively on data from MGB client sites, was used. EMR features were then rescaled to zero-mean and unit variance based on statistics calculated on data from the MGB client sites.

Details of EMR–CXR data fusion using the Deep & Cross network

To model the interactions of features from EMR and CXR data at the case level, a deep-feature scheme based on a Deep & Cross network architecture (ref. 68) was used. Binary and categorical features for the EMR inputs, as well as the 512-dimensional image features from the CXR, were transformed into fused dense vectors of real values by embedding and stacking layers. The transformed dense vectors served as input to the fusion framework, which specifically employed a crossing network to enforce fusion among input from different sources. The crossing network performed explicit feature crossing within its layers, by conducting inner products between the original input feature and output from the previous layer, thus increasing the degree of interaction across features. At the same time, two individual classic deep neural networks with several stacked, fully connected feed-forward layers were trained. The final output of our framework was then derived from the concatenation of both classic and crossing networks.

FL details

Arguably the most established form of FL is implementation of the federated averaging algorithm as proposed by McMahan et al. (ref. 72), or variations thereof. This algorithm can be realized using a client-server setup where each participating site acts as a client.
One can think of FL as a method that aims to minimize a global loss function by reducing a set of local loss functions, each estimated at one site. By minimizing each client site’s local loss while also synchronizing the learned client site weights on a centralized aggregation server, one can minimize global loss without needing to access the entire dataset in a centralized location. Each client site learns locally and shares model weight updates with a central server that aggregates contributions, using secure sockets layer encryption and communication protocols. The server then sends an updated set of weights to each client site after aggregation, and sites resume training locally. The server and client sites iterate back and forth until the model converges (Extended Data Fig. 9c).

A pseudoalgorithm of FL is shown in Supplementary Note 1. In our experiments, we set the number of federated rounds at T = 200, with one local training epoch per round t at each client. The number of clients, K, was up to 20, depending on the network connectivity of clients or available data for a specific targeted outcome period (24 or 72 h). The number of local training iterations, n_k, depends on the dataset size at each client k and is used to weight each client’s contribution when aggregating the model weights in federated averaging. During the FL training task, each client site selects its best local model by tracking the model’s performance on its local validation set. At the same time, the server determines the best global model based on the average validation scores sent from each client site to the server after each FL round. After FL training finishes, the best local models and the best global model are automatically shared with all client sites and evaluated on their local test data.

When training on local data only (the baseline), we set the epoch number to 200.
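The aggregation step of federated averaging described above can be sketched as follows. This is an illustration under simplifying assumptions, not the Clara Train implementation: each client's model is represented as a flat list of floats, and the function name and example numbers are hypothetical.

```python
def federated_average(client_weights, client_iterations):
    """Weighted mean of client weight vectors.

    client_weights: one flat list of floats per client, all the same length.
    client_iterations: n_k for each client k (its number of local training
    iterations), used as that client's aggregation weight.
    """
    total = sum(client_iterations)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n_k for w, n_k in zip(client_weights, client_iterations)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients; the second ran three times as many iterations,
# so the aggregated weights sit closer to its values.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full round the server would send the aggregated weights back to every client, which then resumes local training from them.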
The Adam optimizer was used for both local training and FL, with an initial learning rate of 5 × 10⁻⁵ and a stepwise learning rate decay by a factor of 0.5 after every 40 epochs, which is important for the convergence of federated averaging (ref. 73). Random affine transformations (rotation, translation, shear and scaling) and random intensity noise and shifts were applied to the images for data augmentation during training.

Owing to the sensitivity of BN layers (ref. 58) when dealing with different clients in a nonindependent and identically distributed setting, we found that the best model performance occurred when keeping the parameters of the pretrained ResNet34 with spatial attention (ref. 47) fixed during FL. The Deep & Cross network that combines image features with EMR features does not contain BN layers and was therefore not affected by BN instability issues.

In this study we investigated a privacy-preserving scheme that shares only partial model updates between server and client sites. The weight updates were ranked during each iteration by magnitude of contribution, and only a certain percentage of the largest weight updates was shared with the server. To be exact, weight updates (also known as gradients) were shared only if their absolute value was above a certain percentile threshold, τ_k(t) (Extended Data Fig. 5), which was computed from all non-zero gradients, ΔW_k(t), and could be different for each client k in each FL round t. Variations of this scheme could include additional clipping of large gradients (ref. 49) or differential privacy schemes (ref. 51) that add random noise to the gradients, or even to the raw data, before feeding them into the network.

Statistical analysis

We conducted a Wilcoxon signed-rank test to confirm the significance of the observed improvement in performance between the locally trained model and the FL model for the 24- and 72-h time points (Fig. 2 and Extended Data Fig. 1). The null hypothesis was rejected with one-sided P < 1 × 10⁻³ in both cases.
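For readers unfamiliar with the test, the following is a minimal sketch of a one-sided Wilcoxon signed-rank test on paired per-site scores. It is not the analysis code used in the study: it computes an exact P value from the sign-flip null distribution, drops zero differences, and gives tied absolute differences their mid-rank. The function name and the example AUC values are made up for illustration.

```python
def wilcoxon_signed_rank_greater(x, y):
    """Exact one-sided test of H1: paired values in x tend to exceed those in y.

    Returns (W_plus, p_value), where W_plus is the sum of ranks of the
    positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    doubled_ranks = [0] * n  # twice the mid-rank, so ties stay integer-valued
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for pos in range(start, end + 1):
            doubled_ranks[order[pos]] = start + end + 2
        start = end + 1
    w_plus2 = sum(r for r, d in zip(doubled_ranks, diffs) if d > 0)
    # Null distribution: every rank joins W_plus independently with
    # probability 1/2. Count sign patterns per rank sum by dynamic programming.
    total = sum(doubled_ranks)
    counts = [1] + [0] * total
    for r in doubled_ranks:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    p_value = sum(counts[w_plus2:]) / 2 ** n
    return w_plus2 / 2, p_value


# Hypothetical per-site AUCs (not study data): FL beats local at all 5 sites,
# so only one of the 2**5 sign patterns is as extreme as the observed one.
local_auc = [0.70, 0.75, 0.80, 0.65, 0.72]
fl_auc = [0.90, 0.88, 0.91, 0.89, 0.93]
w_plus, p = wilcoxon_signed_rank_greater(fl_auc, local_auc)
```

With only five pairs the smallest attainable one-sided P value is 1/32, which shows why a comparison across many client sites is needed to reach the significance level reported above.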
Pearson’s correlation was used to assess the generalizability (robustness of the average AUC value to other client sites’ test data) of locally trained models in relation to their local dataset size (r = 0.43, P = 0.035, degrees of freedom (df) = 17 for the 24-h model and r = 0.62, P = 0.003, df = 16 for the 72-h model). This indicates that dataset size alone is not the only factor determining a model’s robustness to unseen data.

To compare ROC curves from the global FL model and local models trained at different sites (Extended Data Fig. 3), we bootstrapped 1,000 samples from the data and computed the resulting AUCs. We then calculated the difference between the two series and standardized it using the formula D = (AUC1 – AUC2)/s, where D is the standardized difference, s is the standard deviation of the bootstrap differences and AUC1 and AUC2 are the corresponding bootstrapped AUC series. By comparing D with the normal distribution, we obtained the P values shown in Supplementary Table 2. The results show that the null hypothesis was rejected with very low P values, indicating the statistical significance of the superiority of FL outcomes. The computation of P values was conducted in R with the pROC library (ref. 74).

Because the model predicts a discrete outcome through a continuous score from 0 to 1, a straightforward calibration evaluation such as a qqplot is not possible. Hence, for a quantified estimate of calibration we quantified discrimination (Extended Data Fig. 10). We conducted one-way analysis of variance (ANOVA) tests to compare local and FL model scores among four ground truth categories (RA, LFO, HFO, MV). The F-statistic, calculated as the variation between the sample means divided by the variation within the samples and representing the degree of dispersion among different groups, was used to quantify the models.
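The F-statistic defined above can be computed as in the following minimal sketch. It implements the standard one-way ANOVA definition rather than the study's own code, and the function name and toy groups are hypothetical.

```python
def one_way_f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation between the sample means, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy scores for three well-separated groups (illustration only).
f_value = one_way_f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Moving the group means further apart while keeping the spread inside each group fixed increases F, which is why larger values indicate more separable outcome categories.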
Our results show that the F-values of five different local sites are 245.7, 253.4, 342.3, 389.8 and 634.8, while that of the FL model is 843.5. Given that larger F-values mean that groups are more separable, the scores from our FL model clearly show a greater dispersion among the four ground truth categories. Furthermore, the P value of the ANOVA test on the FL model is <2 × 10⁻¹⁶, indicating that the FL prediction scores are statistically significantly different among the different prediction classes.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The dataset from the 20 institutes that participated in this study remains under their custody. These data were used for training at each of the local sites and were not shared with any of the other participating institutions or with the federated server, and they are not publicly available. Data from the independent validation sites are maintained by CAMCA, and access can be requested by contacting Q.L. Based on determination by CAMCA, a data-sharing review and amendment of IRB for research purposes can be conducted by MGB research administration and in accordance with MGB IRB and policy.

Code availability

All code and software used in this study are publicly available at NGC. To access them, log in as a guest or create a profile, then enter one of the URLs below. The trained models, data preparation guidelines, code for training, validation of the model testing, readme file, installation guidelines and license files are publicly available at NVIDIA NGC (ref. 61). The federated learning software is available as part of the Clara Train SDK. Alternatively, use this command to download the model: “wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/med/clara_train_covid19_exam_ehr_xray/versions/1/zip -O clara_train_covid19_exam_ehr_xray_1.zip”.
Model, code and documentation (NVIDIA NGC): https://ngc.nvidia.com/catalog/models/nvidia:med:clara_train_covid19_exam_ehr_xray
Clara Train SDK: https://ngc.nvidia.com/catalog/containers/nvidia:clara-train-sdk
Direct model download: https://api.ngc.nvidia.com/v2/models/nvidia/med/clara_train_covid19_exam_ehr_xray/versions/1/zip

References

1. Budd, J. et al. Digital technologies in the public-health response to COVID-19. Nat. Med. 26, 1183–1192 (2020).
2. Moorthy, V., Henao Restrepo, A. M., Preziosi, M.-P. & Swaminathan, S. Data sharing for novel coronavirus (COVID-19). Bull. World Health Organ. 98, 150 (2020).
3. Chen, Q., Allot, A. & Lu, Z. Keep up with the latest coronavirus research. Nature 579, 193 (2020).
4. Fabbri, F., Bhatia, A., Mayer, A., Schlotter, B. & Kaiser, J. BCG IT spend pulse: how COVID-19 is shifting tech priorities. https://www.bcg.com/publications/2020/how-covid-19-is-shifting-big-it-spend (2020).
5. Candelon, F., Reichert, T., Duranton, S., di Carlo, R. C. & De Bondt, M. The rise of the AI-powered company in the postcrisis world. https://www.bcg.com/en-gb/publications/2020/business-applications-artificial-intelligence-post-covid (2020).
6. Chao, H. et al. Integrative analysis for COVID-19 patient outcome prediction. Med. Image Anal. 67, 101844 (2021).
7. Zhu, X. et al. Joint prediction and time estimation of COVID-19 developing severe symptoms using chest CT scan. Med. Image Anal. 67, 101824 (2021).
8. Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).
9. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S. & Jamalipour Soufi, G. Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020).
10. COVID-19 Studies from the World Health Organization Database. https://clinicaltrials.gov/ct2/who_table (2020).
11. ACTIV. https://www.nih.gov/research-training/medical-research-initiatives/activ (2020).
12. Coronavirus Treatment Acceleration Program (CTAP). US Food and Drug Administration https://www.fda.gov/drugs/coronavirus-covid-19-drugs/coronavirus-treatment-acceleration-program-ctap (2020).
13. Gleeson, P., Davison, A. P., Silver, R. A. & Ascoli, G. A. A commitment to open source in neuroscience. Neuron 96, 964–965 (2017).
14. Piwowar, H. et al. The state of OA: a large-scale analysis of the prevalence and impact of open access articles. PeerJ. 6, e4375 (2018).
15. European Society of Radiology (ESR). What the radiologist should know about artificial intelligence – an ESR white paper. Insights Imaging 10, 44 (2019).
16. Pesapane, F., Codari, M. & Sardanelli, F. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur. Radiol. Exp. 2, 35 (2018).
17. Price, W. N. 2nd & Cohen, I. G. Privacy in the age of medical big data. Nat. Med. 25, 37–43 (2019).
18. Liang, W. et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern. Med. 180, 1081–1089 (2020).
19. Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. Brit. Med. J. 369, m1328 (2020).
20. Zhang, L. et al. D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19. J. Thromb. Haemost. 18, 1324–1329 (2020).
21. Sands, K. E. et al. Patient characteristics and admitting vital signs associated with coronavirus disease 2019 (COVID-19)-related mortality among patients admitted with noncritical illness. https://doi.org/10.1017/ice.2020.461 (2020).
22. American College of Radiology. ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection (2020).
23. Rubin, G. D. et al. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society. Radiology 296, 172–180 (2020).
24. World Health Organization. Use of chest imaging in COVID-19. https://www.who.int/publications/i/item/use-of-chest-imaging-in-covid-19 (2020).
25. Jamil, S. et al. Diagnosis and management of COVID-19 disease. Am. J. Respir. Crit. Care Med. 201, 10 (2020).
26. Redmond, C. E., Nicolaou, S., Berger, F. H., Sheikh, A. M. & Patlas, M. N. Emergency radiology during the COVID-19 pandemic: The Canadian Association of Radiologists Recommendations for Practice. Can. Assoc. Radiologists J. 71, 425–430 (2020).
27. Buch, V. et al. Development and validation of a deep learning model for prediction of severe outcomes in suspected COVID-19 Infection. Preprint at https://arxiv.org/abs/2103.11269 (2021).
28. Lyons, C. & Callaghan, M. The use of high-flow nasal oxygen in COVID-19. Anaesthesia 75, 843–847 (2020).
29. Whittle, J. S., Pavlov, I., Sacchetti, A. D., Atwood, C. & Rosenberg, M. S. Respiratory support for adult patients with COVID-19. J. Am. Coll. Emerg. Physicians Open 1, 95–101 (2020).
30. Ai, J., Li, Y., Zhou, X. & Zhang, W. COVID-19: treating and managing severe cases. Cell Res. 30, 370–371 (2020).
31. Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019).
32. Cahan, E. M., Hernandez-Boussard, T., Thadaney-Israni, S. & Rubin, D. L. Putting the data before the algorithm in big data addressing personalized healthcare. NPJ Digit. Med. 2, 78 (2019).
33. Thrall, J. H. et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J. Am. Coll. Radiol. 15, 504–508 (2018).
34. Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).
35. Gao, Y. & Cui, Y. Deep transfer learning for reducing health care disparities arising from biomedical data inequality. Nat. Commun. 11, 5131 (2020).
36. Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
37. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 12 (2019).
38. Ma, C. et al. On safeguarding privacy and security in the framework of federated learning. IEEE Netw. 34, 242–248 (2020).
39. Brisimi, T. S. et al. Federated learning of predictive models from federated Electronic Health Records. Int. J. Med. Inform. 112, 59–67 (2018).
40. Roth, H. R. et al. Federated learning for breast density classification: a real-world implementation. In Proc. Second MICCAI Workshop, DART 2020 and First MICCAI Workshop, DCL 2020 Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning (eds. Albarqouni, S. et al.) Vol. 12444, 181–191 (Springer International Publishing, 2020).
41. Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
42. Remedios, S. W., Butman, J. A., Landman, B. A. & Pham, D. L. in Federated Gradient Averaging for Multi-Site Training with Momentum-Based Optimizers (eds Remedios, S. W. et al.) (Springer, 2020).
43. Xu, Y. et al. A collaborative online AI engine for CT-based COVID-19 diagnosis. Preprint at https://www.medrxiv.org/content/10.1101/2020.05.10.20096073v2 (2020).
44. Raisaro, J. L. et al. SCOR: A secure international informatics infrastructure to investigate COVID-19. J. Am. Med. Inform. Assoc. 27, 1721–1726 (2020).
45. Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).
46. Nino, G. et al. Pediatric lung imaging features of COVID-19: a systematic review and meta-analysis. Pediatr. Pulmonol. 56, 252–263 (2021).
47. Fredrikson, M., Jha, S. & Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proc. 22nd ACM SIGSAC Conference on Computer and Communications Security 1322–1333 https://doi.org/10.1145/2810103.2813677 (2015).
48. Zhu, L., Liu, Z. & Han, S. in Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 14774–14784 (Curran Associates, Inc., 2019).
49. Kaissis, G. A., Makowski, M. R., Rückert, D. & Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2, 305–311 (2020).
50. Li, W. et al. in Privacy-Preserving Federated Brain Tumour Segmentation 133–141 (Springer, 2019).
51. Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) https://doi.org/10.1109/allerton.2015.7447103 (2015).
52. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
53. Estiri, H. et al. Predicting COVID-19 mortality with electronic medical records. NPJ Digit. Med. 4, 15 (2021).
54. Jiang, G. et al. Harmonization of detailed clinical models with clinical study data standards. Methods Inf. Med. 54, 65–74 (2015).
55. Yang, D. et al. in Searching Learning Strategy with Reinforcement Learning for 3D Medical Image Segmentation https://doi.org/10.1007/978-3-030-32245-8_1 (2019).
56. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learning Res. 20, 1–21 (2019).
57. Yao, Q. et al. Taking human out of learning applications: a survey on automated machine learning. Preprint at https://arxiv.org/abs/1810.13306 (2019).
58. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conf. Machine Learning, PMLR 37, 448–456 (2015).
59. Kaufman, S., Rosset, S. & Perlich, C. Leakage in data mining: formulation, detection, and avoidance. In Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 556–563 (2011).
60. Zhang, C. et al. BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In Proc. 2020 USENIX Annual Technical Conference, ATC 2020, 493–506 (2020).
61. Nvidia NGC Catalog: COVID-19 Related Models. https://ngc.nvidia.com/catalog/models?orderBy=scoreDESC&pageNumber=0&query=covid&quickFilter=models&filters (2020).
62. Marini, J. J. & Gattinoni, L. Management of COVID-19 respiratory distress. JAMA 323, 2329–2330 (2020).
63. Cook, T. M. et al. Consensus guidelines for managing the airway in patients with COVID-19: Guidelines from the Difficult Airway Society, the Association of Anaesthetists, the Intensive Care Society, the Faculty of Intensive Care Medicine and the Royal College of Anaesthetists. Anaesthesia 75, 785–799 (2020).
64. Galloway, J. B. et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J. Infect. 81, 282–288 (2020).
65. Kilaru, A. S. et al. Return hospital admissions among 1419 COVID-19 patients discharged from five U.S. emergency departments. Acad. Emerg. Med. 27, 1039–1042 (2020).
66. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/cvpr.2016.90 (2016).
67. Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc. AAAI Conf. Artif. Intell. 33, 590–597 (2019).
68. Wang, R., Fu, B., Fu, G. & Wang, M. Deep & Cross network for Ad Click predictions. In Proc. ADKDD’17 Article no. 12 (2017).
69. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association 265–283 (2016).
70. NVIDIA Clara Imaging. https://developer.nvidia.com/clara-medical-imaging (2020).
71. Stekhoven, D. J. & Bühlmann, P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
72. McMahan, H., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. http://proceedings.mlr.press/v54/mcmahan17a.html (2017).
73. Hsieh, K., Phanishayee, A., Mutlu, O. & Gibbons, P. B. The non-IID data quagmire of decentralized machine learning. In Proc. 37th International Conf. Machine Learning PMLR 119 (2020).
74. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).

Acknowledgements

The views expressed in this study are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health and Social Care or any of the organizations associated with the authors. MGB thanks the following individuals for their support: J. Brink, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; M. Kalra, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; N. Neumark, Center for Clinical Data Science, Mass General Brigham, Boston, MA; T. Schultz, Department of Radiology, Massachusetts General Hospital, Boston, MA; N. Guo, Center for Advanced […]. The Faculty of Medicine, Chulalongkorn University thanks the Ratchadapisek Sompoch Endowment Fund RA (PO) (no. 001/63) for the collection and management of COVID-19-related clinical data and biological specimens for the Research Task Force, Faculty of Medicine, Chulalongkorn University. NIHR Cambridge Biomedical Research Centre thanks A. Priest, who is supported by the NIHR (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust).
National Taiwan University MeDA Lab and the MAHC and Taiwan National Health Insurance Administration thank the MOST Joint Research Center for AI Technology, All Vista Healthcare, the National Health Insurance Administration, Taiwan, the Ministry […]

https://data.ucsf.edu/covid19

This paper is available on Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.