A New Privacy-Preserving AI Predicts COVID Severity Using X-Rays and Medical Records

Authors: Ittai Dayan, Holger R. Roth, Aoxiao Zhong, Ahmed Harouni, Amilcare Gentili, Anas Z. Abidin, Andrew Liu, Anthony Beardsworth Costa, Bradford J. Wood, Chien-Sung Tsai, Chih-Hung Wang, Chun-Nan Hsu, C. K. Lee, Peiying Ruan, Daguang Xu, Dufan Wu, Eddie Huang, Felipe Campos Kitamura, Griffin Lacey, Gustavo César de Antônio Corradi, Gustavo Nino, Hao-Hsin Shin, Hirofumi Obinata, Hui Ren, Jason C. Crane, Jesse Tetreault, Jiahui Guan, John W. Garrett, Joshua D. Kaggie, Jung Gil Park, Keith Dreyer, Krishna Juluru, Kristopher Kersten, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Marius George Linguraru, Masoom A. Haider, Meena AbdelMaseeh, Nicola Rieke, Pablo F. Damasceno, Pedro Mario Cruz e Silva, Pochuan Wang, Sheng Xu, Shuichi Kawano, Sira Sriswasdi, Soo Young Park, Thomas M. Grist, Varun Buch, Watsamon Jantarabenjakul, Weichung Wang, Won Young Tak, Xiang Li, Xihong Lin, Young Joon Kwon, Abood Quraini, Andrew Feng, Andrew N. Priest, Baris Turkbey, Benjamin Glicksberg, Bernardo Bizzo, Byung Seok Kim, Carlos Tor-Díez, Chia-Cheng Lee, Chia-Jung Hsu, Chin Lin, Chiu-Ling Lai, Christopher P. Hess, Colin Compas, Deepeksha Bhatia, Eric K. Oermann, Evan Leibovitz, Hisashi Sasaki, Hitoshi Mori, Isaac Yang, Jae Ho Sohn, Krishna Nand Keshava Murthy, Li-Chen Fu, Matheus Ribeiro Furtado de Mendonça, Mike Fralick, Min Kyu Kang, Mohammad Adil, Natalie Gangai, Peerapon Vateekul, Pierre Elnajjar, Sarah Hickman, Sharmila Majumdar, Shelley L. McLeod, Sheridan Reed, Stefan Gräf, Stephanie Harmon, Tatsuya Kodama, Thanyawee Puthanakit, Tony Mazzulli, Vitor Lima de Lavor, Yothin Rakvongthai, Yu Rim Lee, Yuhong Wen, Fiona J. Gilbert, Mona G. Flores, Quanzheng Li

Abstract

Federated learning (FL) is a method used for training artificial intelligence models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing. Here we used data from 20 institutes across the globe to train an FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays.
EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation to the emergency room, and it provided a 16% improvement in average AUC measured across all participating sites and a […]

Main

The scientific, academic, medical and data science communities have come together in the face of the COVID-19 pandemic crisis to rapidly assess novel paradigms in artificial intelligence (AI) that are rapid and secure, and that potentially incentivize data sharing and model training and testing without the usual privacy and data ownership hurdles of conventional collaborations (refs. 1, 2). Healthcare providers, researchers and industry have pivoted their focus to address unmet and critical clinical needs created by the crisis, with remarkable results (refs. 3–9). Clinical trial recruitment has been expedited and facilitated by national regulatory bodies and an international cooperative spirit (refs. 10–12). The data analytics and AI disciplines have always fostered open and collaborative approaches, embracing concepts such as open-source software, reproducible research, data repositories and making anonymized datasets publicly available (refs. 13, 14). The pandemic has emphasized the need to expeditiously conduct data collaborations that empower the clinical and scientific communities when responding to rapidly evolving and widespread global challenges. Data sharing has ethical, regulatory and legal complexities that are underscored, and perhaps somewhat complicated, by the recent entrance of large technology companies into the healthcare data world (refs. 15–17).

One concrete example of these types of collaboration is our previous work on an AI-based SARS-COV-2 clinical decision support (CDS) model.
This CDS model was developed at Mass General Brigham (MGB) and was validated across multiple health systems' data. The inputs to the CDS model were chest X-ray (CXR) images, vital signs, demographic data and laboratory values that were shown in previous publications to be predictive of outcomes of patients with COVID-19 (refs. 18–21). CXR was selected as the imaging input because it is widely available and commonly indicated by guidelines such as those provided by ACR (ref. 22), the Fleischner Society (ref. 23), the WHO (ref. 24), national thoracic societies (ref. 25), national health ministry COVID handbooks and radiology societies across the world (ref. 26). The output of the CDS model was a score, termed CORISK (ref. 27), that corresponds to oxygen support requirements and that could help frontline clinicians triage patients (refs. 28–30). Healthcare providers have been known to prefer models that were validated on their own data (ref. 27). To date, most AI models, including the aforementioned CDS model, have been trained and validated on 'narrow' data that often lack diversity (refs. 31, 32), potentially resulting in overfitting and lower generalizability. This can be mitigated by training with diverse data from multiple sites without centralization of data (ref. 33), using methods such as transfer learning (refs. 34, 35) or FL. FL is a method used to train AI models on disparate data sources, without the data being transported or exposed outside their original location (ref. 36).

Federated learning supports the rapid launch of centrally orchestrated experiments with improved traceability of data and assessment of algorithmic changes and impact (ref. 37). One approach to FL, called client-server, sends an 'untrained' model to other servers ('nodes') that conduct partial training tasks, in turn sending the results back to be merged in the central ('federated') server (ref. 36).

Governance of data for FL is maintained locally, alleviating privacy concerns, with only model weights or gradients communicated between client sites and the federated server (refs. 38, 39). FL has already shown promise in recent medical imaging applications (refs. 40–43) and in COVID-19 analysis (refs. 8, 44, 45). A notable example is a mortality prediction model in patients infected with SARS-COV-2 that uses clinical features, albeit limited in terms of number of modalities and scale (ref. 46).

Our objective was to develop a robust, generalizable model that could assist in triaging patients. We theorized that the CDS model can be federated successfully, given its use of data inputs that are relatively common in clinical practice and that do not rely heavily on operator-dependent assessments of patient condition (such as clinical impressions or reported symptoms). Rather, laboratory results, vital signs, an imaging study and a commonly captured demographic (that is, age) were used. We therefore retrained the CDS model with diverse data using a client-server FL approach to develop a new global FL model, which was named EXAM, using CXR and EMR features as input. By leveraging FL, participating institutions would not have to transfer data to a […] Our hypothesis was that EXAM would perform better than local models and would generalize better across healthcare systems.
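The client-server scheme described above, in which clients train locally and the server merges what they send back, can be illustrated with a minimal sketch. It assumes a federated-averaging style merge weighted by each client's case count; the text does not state the exact aggregation rule used for EXAM, and `local_update`, `federated_average` and the toy gradients are hypothetical names and values chosen only for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def local_update(global_weights: Weights, gradient: Weights, lr: float = 0.1) -> Weights:
    """Stand-in for local training: one gradient step starting from the global weights."""
    return {
        name: [w - lr * g for w, g in zip(values, gradient[name])]
        for name, values in global_weights.items()
    }


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Merge client weights on the server, weighting each client by its number of cases.

    Assumption: size-weighted averaging; only weights travel, never patient data.
    """
    total = sum(client_sizes)
    merged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    global_weights: Weights = {"layer": [0.0, 0.0]}
    # Hypothetical fixed gradients; real clients would compute them from local data.
    gradients = [{"layer": [1.0, -2.0]}, {"layer": [3.0, 2.0]}]
    sizes = [300, 100]
    for _ in range(3):  # three federated rounds
        updates = [local_update(global_weights, g) for g in gradients]
        global_weights = federated_average(updates, sizes)
    print(global_weights)
```

Weighting by case count is one common design choice; an unweighted mean would give small sites the same influence as large ones.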
Results

The EXAM model architecture

The EXAM model is based on the CDS model mentioned above (ref. 27). In total, 20 features (19 from the EMR and one CXR) were used as input to the model. The outcome (that is, 'ground truth') labels were assigned based on patient oxygen therapy after 24- and 72-h periods from initial admission to the emergency department (ED). A detailed list of the requested features and outcomes can be seen in Table 1.

The outcome labels of patients were set to 0, 0.25, 0.50 and 0.75 depending on the most intensive oxygen therapy the patient received in the prediction window. The oxygen therapy categories were, respectively, room air (RA), low-flow oxygen (LFO), high-flow oxygen (HFO)/noninvasive ventilation (NIV) or mechanical ventilation (MV). If the patient died within the prediction window, the outcome label was set to 1. This resulted in each case being assigned two labels in the range 0–1, corresponding to each of the prediction windows (that is, 24 and 72 h).

For EMR features, only the first values captured in the ED were used, and data preprocessing included deidentification, missing value imputation and normalization to zero-mean and unit variance. The model therefore fuses information from both EMR and CXR features, using a 34-layer convolutional neural network (ResNet34) to extract features from a CXR and a Deep & Cross network to concatenate the features together with the EMR features (for more expanded details, see Methods). The model output is a risk score, termed the EXAM score, which is a continuous value in the range 0–1 for each of the 24- and 72-h predictions corresponding to the labels described above.

Federating the model

The EXAM model was trained using a cohort of 16,148 cases, making it not only among the first FL models for COVID-19 but also a very large, multicontinent development project in clinically relevant AI (Fig. 1a,b).
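The outcome labelling described under the model architecture (0, 0.25, 0.50, 0.75 for the most intensive oxygen therapy, and 1 for death) can be sketched as a small lookup. The label values come from the text; the string keys, the function name `outcome_label` and the choice to treat an empty therapy list as room air are assumptions made for illustration.

```python
from typing import Iterable

# Label values as stated in the text; the string keys are hypothetical names.
OXYGEN_LABELS = {"RA": 0.0, "LFO": 0.25, "HFO/NIV": 0.50, "MV": 0.75}
DEATH_LABEL = 1.0


def outcome_label(therapies: Iterable[str], died: bool) -> float:
    """Return the label for one prediction window (24 h or 72 h).

    therapies: every oxygen therapy the patient received in the window.
    died: whether the patient died within the window.
    """
    if died:
        return DEATH_LABEL
    # Most intensive therapy wins; no recorded therapy is treated as room air (assumption).
    return max((OXYGEN_LABELS[t] for t in therapies), default=0.0)


if __name__ == "__main__":
    # Each case gets two labels, one per prediction window.
    label_24h = outcome_label(["RA", "LFO"], died=False)
    label_72h = outcome_label(["LFO", "HFO/NIV"], died=False)
    print(label_24h, label_72h)
```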
Data between sites were not harmonized before extraction and, in light of real-life clinical informatics circumstances, a meticulous harmonization of the data input was not conducted by the authors (Fig. 1c,d).

Fig. 1: a, World map indicating the 20 different client sites contributing to the EXAM study. b, Number of cases contributed by each institution or site (client 1 represents the site contributing the largest number of cases). c, Chest X-ray intensity distribution at each client site. d, Age of patients at each client site, showing minimum and maximum ages (asterisks), mean age (triangles) and standard deviation (horizontal bars).

We compared locally trained models with the global FL model on each client's test data. Training the model through FL resulted in a significant performance improvement (P ≪ 1 × 10⁻³, Wilcoxon signed-rank test) of 16% (as defined by average AUC when running the model on respective local test sets: from 0.795 to 0.920, or 12.5 percentage points) (Fig. 2a). It also resulted in a 38% generalizability improvement (as defined by average AUC when running the model on all test sets: from 0.667 to 0.920, or 25.3 percentage points) of the best global model for prediction of 24-h oxygen treatment compared with models trained only on a site's own data (Fig. 2b). For the prediction results of 72-h oxygen treatment, the best global model training resulted in an average performance improvement of 18% compared to locally trained models, while generalizability of the global model improved on average by 34% (Extended Data Fig. 1). The stability of our results was validated by repeating three runs of local and FL training on different randomized data splits.
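The two ways of quoting the improvement above (relative percentage and percentage points) follow from simple arithmetic on the reported average AUC values. The short check below uses only the figures given in the text; the function name `improvement` is a hypothetical helper.

```python
from typing import Tuple


def improvement(before: float, after: float) -> Tuple[float, float]:
    """Return (percentage points, relative improvement in %) between two AUC values."""
    points = (after - before) * 100
    relative = (after - before) / before * 100
    return points, relative


if __name__ == "__main__":
    # Local test sets: 0.795 -> 0.920; all test sets (generalizability): 0.667 -> 0.920.
    for label, before in (("performance", 0.795), ("generalizability", 0.667)):
        points, relative = improvement(before, 0.920)
        print(f"{label}: {points:.1f} points, {relative:.1f}% relative")
```

The relative figures come out at roughly 15.7% and 37.9%, which round to the 16% and 38% quoted in the text.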
Fig. 2: a, Performance on each client's test set in prediction of 24-h oxygen treatment for models trained on local data only (Local) versus that of the best global model available on the server (FL (gl. best)). b, Generalizability (average performance on other sites' test data, as represented by average AUC) as a function of a client's dataset size (no. of cases). The green horizontal line denotes the generalizability performance of the best global model. The performance for 18 of 20 clients is shown, because client 12 had outcomes only for 72-h oxygen (Extended Data Fig. 1) and client 14 had cases only with RA treatment, such that the evaluation metric (av. AUC) was not applicable in either of these cases (Methods). Data for client 14 were also excluded from the computation of average generalizability in local models.

Local models that were trained using unbalanced cohorts (for example, mostly mild cases of COVID-19) markedly benefited from the FL approach, with a substantial improvement in average AUC prediction performance for categories with only a few cases. This was evident at client site 16 (an unbalanced dataset), with most patients experiencing mild disease severity and with only a few severe cases. The FL model achieved a higher true-positive rate for the two positive (severe) cases and a markedly lower false-positive rate compared to the local model, both shown in the receiver operating characteristic (ROC) plots and confusion matrices (Fig. 3a and Extended Data Fig. 2). More importantly, the generalizability of the FL model was considerably increased over the locally trained model.

Fig. 3: a, ROC at client site 16, with unbalanced data and mostly mild cases.
b, ROC of the local model at client site 12 (a small dataset), mean ROC of models trained on larger datasets corresponding to the five client sites in the Boston area (1, 4, 5, 6, 8) and ROC of the best global model in prediction of 72-h oxygen treatment for different thresholds of EXAM score (left, middle, right). The mean ROC is calculated based on five locally trained models while the gray area denotes the ROC standard deviation. ROCs for three different cutoff values (t) of the EXAM risk score are shown. Pos and neg denote the number of positive and negative cases, respectively, as defined by this range of EXAM score.

In the case of client sites with relatively small datasets, the best FL model markedly outperformed not only the local model but also those trained on larger datasets from five client sites in the Boston area of the USA (Fig. 3b).

The global model performed well in predicting oxygen needs at 24/72 h in patients both COVID positive and negative (Extended Data Fig. 3).

Validation at independent sites

Following initial training, EXAM was subsequently tested at three independent validation sites: Cooley Dickinson Hospital (CDH), Martha's Vineyard Hospital (MVH) and Nantucket Cottage Hospital (NCH), all in Massachusetts, USA. The model was not retrained at these sites and it was used only for validation purposes. The cohort size and model inference results are summarized in Table 2, and the ROC curves and confusion matrices for the largest dataset (from CDH) are shown in Fig. 4. The operating point was set to discriminate between nonmechanical ventilation and mechanical ventilation (MV) treatment (or death). The global trained FL model, EXAM, achieved an average AUC of 0.944 and 0.924 for the 24- and 72-h prediction tasks, respectively (Table 2), which exceeded the average performance among sites used in training EXAM.
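The average AUC used throughout these results can be sketched as the mean of several binary AUCs, one per cutoff on the outcome scale. The sketch below assumes the three cutoffs sit at the label values 0.25, 0.50 and 0.75 (that is, ≥LFO, ≥HFO/NIV and ≥MV); the function names are hypothetical, and the pairwise AUC is a plain-Python stand-in rather than the authors' implementation.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # Mirrors the situation of a site with only room-air cases: AUC is undefined.
        raise ValueError("AUC is undefined without both positive and negative cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_auc(
    outcomes: Sequence[float],
    scores: Sequence[float],
    cutoffs: Sequence[float] = (0.25, 0.50, 0.75),  # assumed: >=LFO, >=HFO/NIV, >=MV
) -> float:
    """Binarize the outcome labels at each cutoff and average the resulting AUCs."""
    aucs: List[float] = []
    for t in cutoffs:
        binary = [int(y >= t) for y in outcomes]
        aucs.append(auc(binary, scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    outcomes = [0.0, 0.25, 0.5, 0.75]  # toy ground-truth labels
    scores = [0.2, 0.1, 0.6, 0.9]      # toy EXAM-style risk scores
    print(average_auc(outcomes, scores))
```

Raising an error when a class is missing makes explicit why a site with only one outcome category cannot be scored with this metric.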
For prediction of MV treatment (or death) at 24 h, EXAM achieved a sensitivity of 0.950 and specificity of 0.882 at CDH, and a sensitivity of 1.000 and specificity of 0.934 at MVH. NCH did not have any cases with MV/death at 24 h. In regard to 72-h MV prediction, EXAM achieved a sensitivity of 0.929 and specificity of 0.880 at CDH, sensitivity of 1.000 and specificity of 0.976 at MVH and sensitivity of 1.000 and specificity of 0.929 at NCH.

Fig. 4: a,b, Performance (ROC) (top) and confusion matrices (bottom) of the EXAM FL model on the CDH dataset for prediction of oxygen requirement at 24 h (a) and 72 h (b). ROCs for three different cutoff values (t) of the EXAM risk score are shown.

For MV at CDH at 72 h, EXAM had a low false-negative rate of 7.1%. Extended Data Fig. 4 shows two false-negative cases from CDH, where one case had many missing EMR data features and the other had a CXR with a motion artifact and some missing EMR features.

Use of differential privacy

A primary motivation for healthcare institutes to use FL is to preserve the security and privacy of their data, as well as adherence to data compliance measures. For FL, there remains the potential risk of model 'inversion' (ref. 47) or even the reconstruction of training images from the model gradients themselves (ref. 48). To counter these risks, security-enhancing measures were used to mitigate risk in the event of data 'interception' during site-server communication (ref. 49). We experimented with techniques to avoid interception of FL data, and added a security feature that we believe could encourage more institutions to use FL. We thus validated previous findings showing that partial weight sharing, and other differential privacy techniques, can successfully be applied in FL (ref. 50). Through investigation of a partial weight-sharing scheme (refs. 50–52), we showed that models can reach a comparable performance even when only 25% of weight updates are shared (Extended Data Fig.
5).

Discussion

This study is a large, real-world healthcare FL study in terms of number of sites and number of data points used. We believe that it provides a powerful proof-of-concept of the feasibility of using FL for fast and collaborative development of needed AI models in healthcare. Our study involved multiple sites across four continents and under the oversight of different regulatory bodies, and thus holds the promise of being provided to different regulated markets in an expedited way. The global FL model, EXAM, proved to be more robust and achieved better results at individual sites than any model trained on only local data. We believe that consistent improvement was achieved owing to a larger, but also a more diverse, dataset, the use of data inputs that can be standardized and avoidance of clinical impressions/reported symptoms. These factors played an important part in increasing the benefits from this FL approach and its impact on performance, generalizability and, ultimately, the model's usability.

For a client site with a relatively small dataset, two typical approaches could be used for fitting a useful model: one is to train locally with its own data, the other is to apply a model trained on a larger dataset. For sites with small datasets, it would have been virtually impossible to build a performant deep learning model using only their local data. The finding that these two approaches were outperformed on all three prediction tasks by the global FL model indicates that the benefit for client sites with small datasets arising from participation in FL collaborations is substantial. This is probably a reflection of FL's ability to capture more diversity than local training, and to mitigate the bias present in models trained on a homogeneous population.
An under-represented population or age group in one hospital/region might be highly represented in another region, such as children who might be differentially affected by COVID-19, including disease manifestations in lung imaging (ref. 46).

The validation results confirmed that the global model is robust, supporting our hypothesis that FL-trained models are generalizable across healthcare systems. They provide a compelling case for the use of predictive algorithms in COVID-19 patient care, and for the use of FL in model creation and testing. By participating in this study, the client sites received access to EXAM, to be further validated ahead of pursuing any regulatory approval or future introduction into clinical care. […] (ref. 53), as well as at different sites that were not a part of the EXAM training.

Over 200 prediction models to support decision-making in patients with COVID-19 have been published (ref. 19). Unlike the majority of publications focused on diagnosis of COVID-19 or prediction of mortality, we predicted oxygen requirements that have implications for patient management. We also used cases with unknown SARS-COV-2 status, and so the model could provide input to the physician ahead of receiving a result for PCR with reverse transcription (RT–PCR), making it useful for a real-life clinical setting. The model's imaging input is used in common practice, in contrast with models that use chest computed tomography, a nonconsensual diagnostic modality. The model's design was constrained to objective predictors, unlike many published studies that leveraged subjective clinical impressions. The data collected reflect rates of […]

Patient cohort identification and data harmonization are not novel issues in research and data science (ref. 54), but are further complicated, when using FL, given the lack of visibility on other sites' datasets.
Improvements to clinical information systems are needed to streamline data preparation, leading to better leverage of a network of sites participating in FL. This, in conjunction with hyperparameter engineering, can allow algorithms to 'learn' more effectively from larger data batches and adapt model parameters to a particular site for further personalization, for example through further fine-tuning on that site (ref. 39). A system that would allow seamless, close to real-time model inference and results processing would also be of benefit and would 'close the loop' from training to model deployment.

Because data were not centralized, they are not readily accessible. Given this, any future analysis of the results, beyond what was derived and collected, is limited.

Similar to other machine learning models, EXAM is limited by the quality of the training data. Institutions interested in deploying this algorithm for clinical care need to understand potential biases in the training. For example, the labels used as ground truth in the training of the EXAM model were derived from 24- and 72-h oxygen consumption in the patient; it is assumed that the oxygen delivered to the patient equates to the oxygen need. However, in the early phase of the COVID-19 pandemic, many patients were provided high-flow oxygen prophylactically, regardless of their oxygen need.

Since our data access was limited, we did not have sufficient available information for the generation of detailed statistics regarding failure causes, post hoc, at most sites. However, we did study failure cases from the largest independent test site, CDH, and were able to generate hypotheses that we can test in the future.
For high-performing sites, it seems that most failure cases fall into one of two categories: (1) low quality of input data, for example missing data or motion artifact in CXR; or (2) out-of-distribution data, for example a very young patient.

In future, we also intend to investigate the potential for a 'population drift' due to different phases of disease progression. We believe that, owing to the diversity across the 20 sites, this risk may have been mitigated.

A feature that would enhance these kinds of large-scale collaboration is the ability to predict the contribution of each client site towards improving the global FL model. This will help in client site selection, and in prioritization of data acquisition and annotation efforts. The latter is especially important given the high costs and difficult logistics of these large-consortia endeavors, and it will enable these endeavors to capture diversity rather than the sheer quantity of data samples.

Future approaches may incorporate automated hyperparameter searching (ref. 55), neural architecture search (ref. 56) and other automated machine learning (ref. 57) approaches to find the optimal training parameters for each client site more efficiently.

Known issues of batch normalization (BN) in FL (ref. 58) motivated us to fix our base model for image feature extraction (ref. 49). Future work might explore different types of normalization techniques to allow more effective training of AI models in FL when client data are not independent and identically distributed.

Recent works on privacy attacks within the FL setting have raised concerns on data leakage during model training (ref. 59). Meanwhile, protection algorithms remain underexplored and constrained by multiple factors. While differential privacy algorithms (refs. 36, 48, 49) show good protection, they may weaken the model's performance. Encryption algorithms, such as homomorphic encryption (ref. 60), maintain performance but may substantially increase message size and training time.
A quantifiable way to measure privacy would allow better choices for deciding the minimal privacy parameters necessary while maintaining clinically acceptable performance (refs. 36, 48, 49).

Following further validation, we envision deployment of the EXAM model in the ED setting as a way to evaluate risk at both the per-patient and population level, and to provide clinicians with an additional reference point when performing the frequently difficult task of triaging patients. We also envision using the model as a more sensitive population-level metric to help balance resources between regions, hospitals and departments. Our hope is that similar FL efforts can break the data silos and allow for faster development of much-needed AI models in the near future.

Methods

Ethics approval

All procedures were conducted in accordance with the principles for human experimentation as defined in the Declaration of Helsinki and International Conference on Harmonization Good Clinical Practice guidelines, and were approved by the relevant institutional review boards at the following validation sites: CDH, MVH, NCH and at the following training sites: MGB, Mass General Hospital (MGH), Brigham and Women's Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital (all eight of these hospitals were covered under MGB's ethics board reference, no. 2020P002673, and informed consent was waived by the institutional review board (IRB)). Similarly, participation of the remaining sites was approved by their respective relevant institutional review processes: Children's National Hospital in Washington, DC (no. 00014310, IRB certified exempt); NIHR Cambridge Biomedical Research Centre (no. 20/SW/0140, informed consent waived); The Self-Defense Forces Central Hospital in Tokyo (no. 02-014, informed consent waived); National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration (no.
202108026 W, informed consent waived); Tri-Service General Hospital in Taiwan (no. B202105136, informed consent waived); Kyungpook National University Hospital in South Korea (no. KNUH 2020-05-022, informed consent waived); Faculty of Medicine, Chulalongkorn University in Thailand (nos. 490/63, 291/63, informed consent waived); Diagnosticos da America SA in Brazil (no. 26118819.3.0000.5505, informed consent waived); University of California, San Francisco (no. 20-30447, informed consent waived); VA San Diego (no. H200086, IRB certified exempt); University of Toronto (no. 20-0162-C, informed consent waived); National Institutes of Health in Bethesda, Maryland (no. 12-CC-0075, informed consent waived); University of Wisconsin-Madison School of Medicine and Public Health (no. 2016-0418, informed consent waived); Memorial Sloan Kettering Cancer Center in New York (no. 20-194, informed consent waived); and Mount Sinai Health System in New York (no. IRB-20-03271, informed consent waived).

MI-CLAIM guidelines for reporting of clinical AI models were followed (Supplementary Note 2).

Study setting

The study included data from 20 institutions (Fig.
1a): MGB, MGH, Brigham and Women's Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital; Children's National Hospital in Washington, DC; NIHR Cambridge Biomedical Research Centre; The Self-Defense Forces Central Hospital in Tokyo; National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration; Tri-Service General Hospital in Taiwan; Kyungpook National University Hospital in South Korea; Faculty of Medicine, Chulalongkorn University in Thailand; Diagnosticos da America SA in Brazil; University of California, San Francisco; VA San Diego; University of Toronto; National Institutes of Health in Bethesda, Maryland; University of Wisconsin-Madison School of Medicine and Public Health; Memorial Sloan Kettering Cancer Center in New York; and Mount Sinai Health System in New York. […] (refs. 61–63).

Data from three independent sites were used for independent validation: CDH, MVH and NCH, all in Massachusetts, USA. These three hospitals had patient population characteristics different from those of the training sites. The data used for algorithm validation consisted of patients admitted to the ED at these sites between March 2020 and February 2021 who satisfied the same inclusion criteria as the data used to train the FL model.

Data collection

The 20 client sites prepared a total of 16,148 cases (both positive and negative) for the purposes of training, validation and testing of the model (Fig. 1b). Medical data were accessed in relation to patients who satisfied the study inclusion criteria. Client sites strived to include all COVID-positive cases from the beginning of the pandemic in December 2019 and up to the time they started local training for the EXAM study. All local training had started by 30 September 2020. The sites also included other patients in the same period with negative RT–PCR test results.
Since most of the sites had more SARS-COV-2-negative than -positive patients, we limited the number of negative patients included to, at most, 95% of the total cases at each client site.

A 'case' included a CXR and the requisite data inputs taken from the patient's medical record. A breakdown of the cohort size of the dataset for each client site is shown in Fig. 1b. The distribution and patterns of CXR image intensity (pixel values) varied greatly among sites owing to a multitude of patient- and site-specific factors, such as different device manufacturers and imaging protocols, as shown in Fig. 1c,d. Patient age and EMR feature distribution varied greatly among sites, as expected owing to the differing demographics between globally distributed hospitals (Extended Data Fig. 6).

Patient inclusion criteria

Patient inclusion criteria were: (1) patient presented to the hospital's ED or equivalent; (2) patient had a RT–PCR test performed at any time between presentation to the ED and discharge from the hospital; (3) patient had a CXR in the ED; and (4) patient's record had at least five of the EMR values detailed in Table 1, all obtained in the ED, and the relevant outcomes captured during hospitalization. Of note, the CXR, laboratory results and vitals used were the first available for capture during the visit to the ED. The model did not incorporate any CXR, laboratory results or vitals acquired after leaving the ED.

Model input

In total, 21 EMR features were used as input to the model. The outcome (that is, ground truth) labels were assigned based on patient requirements after 24- and 72-h periods from initial admission to the ED. A detailed list of the requested EMR features and outcomes can be seen in Table 1.

The distribution of oxygen treatment using different devices at different client sites is shown in Extended Data Fig.
7, which details device usage at admission to the ED and after the 24- and 72-h periods. The difference in dataset distribution between the largest and smallest client sites can be seen in Extended Data Fig. 8.

The number of positive COVID-19 cases, as confirmed by a single RT–PCR test obtained at any time between presentation to the ED and discharge from the hospital, is listed in Supplementary Table 1. Each client site was asked to randomly split its dataset into three parts: 70% for training, 10% for validation and 20% for testing. For both 24- and 72-h outcome prediction models, random splits for each of the three repeated local and FL training and evaluation experiments were independently generated.

EXAM model development
There is wide variation in the clinical course of patients who present to hospital with symptoms of COVID-19, with some experiencing rapid deterioration in respiratory function that requires different interventions to prevent or mitigate hypoxemia (refs. 62,63). A critical decision made during the evaluation of a patient at the initial point of care, or in the ED, is whether the patient is likely to require more invasive or resource-limited countermeasures or interventions (such as MV or monoclonal antibodies), and should therefore receive a scarce but effective therapy, a therapy with a narrow risk–benefit ratio due to side effects or a higher level of care, such as admittance to the intensive care unit (ref. 64). In contrast, a patient who is at lower risk of requiring invasive oxygen therapy may be placed in a less intensive care setting such as a regular ward, or even released from the ED for continuing self-monitoring at home (ref. 65). EXAM was developed to help triage such patients.

The model is not approved by any regulatory agency at this time and should be used for research purposes only.
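The 70/10/20 random split requested from each client site, repeated independently for three experiments, could be sketched roughly as follows. This is only an illustration: the function name `split_site_cases`, the seeds and the cohort size of 1,000 are hypothetical, and the study's actual splitting code is not shown in the text.

```python
import numpy as np

def split_site_cases(n_cases, seed, fractions=(0.7, 0.1, 0.2)):
    """Randomly partition case indices into train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_cases)           # shuffled case indices
    n_train = int(round(fractions[0] * n_cases))
    n_val = int(round(fractions[1] * n_cases))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]             # remainder goes to testing
    return train, val, test

# Three independently generated splits, one per repeated experiment
# (seeds and cohort size are illustrative assumptions).
splits = [split_site_cases(n_cases=1000, seed=s) for s in (0, 1, 2)]
for train, val, test in splits:
    print(len(train), len(val), len(test))
```

Using a different seed per repetition keeps the three experiments independent while leaving each individual split reproducible.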
EXAM score
EXAM was trained using FL; it outputs a risk score (termed EXAM score) similar to CORISK (ref. 27) (Extended Data Fig. 9a) and can be used in the same way to triage patients. It corresponds to a patient's oxygen support requirements within two windows, 24 and 72 h, after initial presentation to the ED. Extended Data Fig. 9b illustrates how CORISK and the EXAM score can be used for patient triage.

Chest X-ray images were preprocessed to select the anterior position image and exclude lateral view images, and then scaled to a resolution of 224 × 224. As shown in Extended Data Fig. 9a, the model fuses information from both EMR and CXR features (based on a modified ResNet34 with spatial attention (ref. 66) pretrained on the CheXpert dataset (ref. 67)) and the Deep & Cross network (ref. 68). To converge these different data types, a 512-dimensional feature vector was extracted from each CXR image using a pretrained ResNet34 with spatial attention, then concatenated with the EMR features as the input for the Deep & Cross network. The final output was a continuous value in the range 0–1 for both 24- and 72-h predictions, corresponding to the labels described above, as shown in Extended Data Fig. 9b. We used cross-entropy as the loss function and "Adam" as the optimizer. The model was implemented in TensorFlow (ref. 69) using the NVIDIA Clara Train SDK (ref. 70). The average AUC for the classification tasks (≥LFO, ≥HFO/NIV or ≥MV) was calculated and used as the final evaluation metric, with normalization to zero-mean and unit variance. CXR images were preprocessed to select the correct series and exclude lateral view images, then scaled to a resolution of 224 × 224 (ref. 27).

Feature imputation and normalization
A MissForest algorithm (ref. 71) was used to impute EMR features, based on the local training dataset. If an EMR feature was completely missing from a client site dataset, the mean value of that feature, calculated exclusively on data from MGB client sites, was used.
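As a rough illustration of the fallback for features that are entirely absent at a site, combined with rescaling against reference-site statistics, consider the sketch below. It does not reproduce MissForest itself, so partially missing entries are left as NaN here. The function name `fill_and_standardize` and the toy values are assumptions for illustration only.

```python
import numpy as np

def fill_and_standardize(site_x, ref_mean, ref_std):
    """site_x: (n_cases, n_features) array with NaN marking missing values.
    ref_mean, ref_std: per-feature statistics from the reference sites."""
    x = site_x.astype(float).copy()
    # Features with no observed value at this site fall back to the
    # reference-site mean (assumed here to stand in for the MGB mean).
    all_missing = np.isnan(x).all(axis=0)
    x[:, all_missing] = ref_mean[all_missing]
    # Rescale every feature with the reference statistics.
    return (x - ref_mean) / ref_std

# Toy example: the second feature is missing for every case at this site.
site_x = np.array([[1.0, np.nan],
                   [3.0, np.nan]])
scaled = fill_and_standardize(site_x,
                              ref_mean=np.array([2.0, 5.0]),
                              ref_std=np.array([1.0, 2.0]))
print(scaled)
```

Taking the statistics from one fixed set of reference sites means every client applies the same transformation, so the shared model sees inputs on a common scale.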
After imputation, EMR features were rescaled to zero mean and unit variance based on statistics calculated on data from the MGB client sites.

Details of EMR–CXR data fusion using the Deep & Cross network
To model the interactions of features from EMR and CXR data at the case level, a deep-feature scheme was used based on a Deep & Cross network architecture (ref. 68). Binary and categorical features for the EMR inputs, as well as 512-dimensional image features in the CXR, were transformed into fused dense vectors of real values by embedding and stacking layers. The transformed dense vectors served as input to the fusion framework, which specifically employed a crossing network to enforce fusion among input from different sources. The crossing network performed explicit feature crossing within its layers, by conducting inner products between the original input feature and output from the previous layer, thus increasing the degree of interaction across features. At the same time, two individual classic deep neural networks with several stacked, fully connected feed-forward layers were trained. The final output of our framework was then derived from the concatenation of both classic and crossing networks.

FL details
Arguably the most established form of FL is implementation of the federated averaging algorithm as proposed by McMahan et al. (ref. 72), or variations thereof. This algorithm can be realized using a client-server setup where each participating site acts as a client. One can think of FL as a method aiming to minimize a global loss function by reducing a set of local loss functions, which are estimated at each site. By minimizing each client site's local loss while also synchronizing the learned client site weights on a centralized aggregation server, one can minimize global loss without needing to access the entire dataset in a centralized location.
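One common way to write this objective, following the federated averaging formulation of McMahan et al., is as a weighted sum of local losses. The notation is an assumption made for illustration: w denotes the shared model weights, L_k the local loss at client k, n_k the amount of training data at client k, K the number of clients and n the total across clients.

```latex
\min_{w} \; L(w) = \sum_{k=1}^{K} \frac{n_k}{n} \, L_k(w),
\qquad n = \sum_{k=1}^{K} n_k
```

Each client only ever evaluates its own L_k(w) on local data, which is why the global objective can be reduced without pooling the datasets.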
Each client site learns locally and shares model weight updates with a central server that aggregates contributions using secure sockets layer encryption and communication protocols. The server then sends an updated set of weights to each client site after aggregation (Extended Data Fig. 9c).

A pseudoalgorithm of FL is shown in Supplementary Note 1. In our experiments, we set the number of federated rounds at T = 200, with one local training epoch per round t at each client. The number of clients, K, was up to 20 depending on the network connectivity of clients or the data available for a specific targeted outcome period (24 or 72 h). The weighting term, n_k, depends on the dataset size at each client k and is used to weigh each client's contributions when aggregating the model weights in federated averaging. During the FL training task, each client site selects its best local model by tracking the model's performance on its local validation set. At the same time, the server determines the best global model based on the average validation scores sent from each client site to the server after each FL round. After FL training finishes, the best local models and the best global model are automatically shared with all client sites and evaluated on their local test data.

The Adam optimizer was used for both local training and FL, with an initial learning rate of 5 × 10⁻⁵ and a stepwise learning rate decay with a factor of 0.5 after every 40 epochs, which is important for the convergence of federated averaging (ref. 73). Random affine transformations, including rotation, translations, shear, scaling and random intensity noise and shifts, were applied to the images for data augmentation during training.
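The size-weighted aggregation and the stepwise learning rate decay described above might look roughly like this. It is a minimal sketch, assuming the server averages full weight arrays (the study may aggregate updates instead); the names `federated_average` and `learning_rate` and the toy weights are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model weights.
    client_weights: one list of np.ndarray layers per client.
    client_sizes: n_k for each client, used as aggregation weight."""
    total = float(sum(client_sizes))
    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=float)
        for weights, n_k in zip(client_weights, client_sizes):
            acc += (n_k / total) * weights[layer]
        averaged.append(acc)
    return averaged

def learning_rate(epoch, initial=5e-5, factor=0.5, step=40):
    """Step decay: multiply the rate by `factor` after every `step` epochs."""
    return initial * factor ** (epoch // step)

# Toy example with two clients and a single one-layer "model".
clients = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
global_weights = federated_average(clients, client_sizes=[1, 3])
print(global_weights[0])   # the larger client dominates the average
print(learning_rate(0), learning_rate(40))
```

With one local epoch per round, the epoch index passed to `learning_rate` coincides with the federated round index.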
Owing to the sensitivity of BN layers (ref. 58) when dealing with different clients in a nonindependent and identically distributed setting, we found that the best model performance occurred when keeping the pretrained ResNet34 with spatial attention (ref. 47) parameters fixed during FL training (that is, using a learning rate of zero for those layers). The Deep & Cross network that combines image features with EMR features does not contain BN layers and hence was not affected by BN instability issues.

In this study we investigated a privacy-preserving scheme that shares only partial model updates between server and client sites. The weight updates were ranked during each iteration by magnitude of contribution, and only a certain percentage of the largest weight updates was shared with the server. To be exact, weight updates (also known as gradients) were shared only if their absolute value was above a certain percentile threshold, τ_k(t) (Extended Data Fig. 5), which was computed from all non-zero gradients, ΔW_k(t), and could be different for each client k in each FL round t. Variations of this scheme could include additional clipping of large gradients or differential privacy schemes (ref. 49) that add random noise to the gradients, or even to the raw data, before feeding into the network (ref. 51).

Statistical analysis
We conducted a Wilcoxon signed-rank test to confirm the significance of the observed improvement in performance between the locally trained model and the FL model for the 24- and 72-h time points (Fig. 2 and Extended Data Fig. 1). The null hypothesis was rejected with one-sided P ≪ 1 × 10⁻³ in both cases.

Pearson's correlation was used to assess the generalizability (robustness of the average AUC value to other client sites' test data) of locally trained models in relation to the respective local dataset size.
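Returning to the partial weight-sharing scheme described earlier in this section, the percentile thresholding of weight updates could be sketched as below. This is an illustrative assumption rather than the study's code: the function name `select_updates_to_share`, the 50th-percentile setting and the toy update vector are all hypothetical, and updates exactly at the threshold are kept here.

```python
import numpy as np

def select_updates_to_share(delta_w, percentile):
    """Keep only weight updates whose magnitude reaches a percentile
    threshold computed over the non-zero updates; zero out the rest."""
    magnitudes = np.abs(delta_w)
    nonzero = magnitudes[magnitudes > 0]
    if nonzero.size == 0:
        return np.zeros_like(delta_w, dtype=float)
    tau = np.percentile(nonzero, percentile)   # per-client, per-round threshold
    return np.where(magnitudes >= tau, delta_w, 0.0)

# Toy update vector for one client in one round.
delta_w = np.array([0.0, -0.1, 0.2, -0.4, 0.8])
shared = select_updates_to_share(delta_w, percentile=50)
print(shared)   # only the two largest-magnitude updates survive
```

Because the threshold is recomputed from each client's own updates, it can differ between clients and between rounds, as the text notes.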
In the Pearson analysis, only a moderate correlation was observed (r = 0.43, P = 0.035, degrees of freedom (df) = 17 for the 24-h model and r = 0.62, P = 0.003, df = 16 for the 72-h model). This indicates that dataset size alone is not the only factor determining a model's robustness to unseen data.

To compare ROC curves from the global FL model and local models trained at different sites (Extended Data Fig. 3), we bootstrapped 1,000 samples from the data and computed the resulting AUCs. We then calculated the difference between the two series and standardized it using the formula D = (AUC1 − AUC2)/s, where D is the standardized difference, s is the standard deviation of the bootstrap differences and AUC1 and AUC2 are the corresponding bootstrapped AUC series. By comparing D with the normal distribution, we obtained the P values illustrated in Supplementary Table 2. The results show that the null hypothesis was rejected with very low P values, indicating the statistical significance of the superiority of FL outcomes. The computation of P values was conducted in R with the pROC library (ref. 74).

Since the model predicts a discrete outcome as a continuous score from 0 to 1, a straightforward calibration evaluation such as a qqplot is not possible. Hence, for a quantified estimate of calibration we quantified discrimination (Extended Data Fig. 10). We conducted one-way analysis of variance (ANOVA) tests to compare local and FL model scores among four ground truth categories (RA, LFO, HFO, MV). The F-statistic, calculated as the variation between the sample means divided by the variation within the samples and representing the degree of dispersion among different groups, was used to quantify the models. F-values of five different local sites are 245.7, 253.4, 342.3, 389.8 and 634.8, while that of the FL model is 843.5. Given that larger F-values mean that groups are more separable, the scores from our FL model clearly show greater dispersion among the four ground truth categories.
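The F-statistic described above, between-group variation divided by within-group variation, can be computed directly. The sketch below uses made-up scores for the four categories purely for illustration; the function name `one_way_anova_f` is hypothetical, and the study itself ran its statistics in R.

```python
import numpy as np

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    k = len(groups)            # number of categories
    n = all_scores.size        # total number of scores
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical model scores grouped by ground truth category.
scores = {
    "RA":  [0.10, 0.15, 0.20],
    "LFO": [0.30, 0.35, 0.45],
    "HFO": [0.55, 0.60, 0.70],
    "MV":  [0.80, 0.85, 0.95],
}
print(one_way_anova_f(list(scores.values())))
```

A larger value means the category means are far apart relative to the spread inside each category, which is the sense in which the text calls the groups more separable.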
The P value of the ANOVA test on the FL model is <2 × 10⁻¹⁶, indicating that the FL prediction scores are statistically significantly different among the different prediction classes.

Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The dataset from the 20 institutes that participated in this study remains under their custody. These data were used for training at each of the local sites and were not shared with any of the other participating institutions or with the federated server, and they are not publicly available. Data from the independent validation sites are maintained by CAMCA, and access can be requested by contacting Q.L. Based on determination by CAMCA, a data-sharing review and amendment of the IRB for research purposes can be conducted by MGB research administration, in accordance with MGB IRB and policy.

Code availability
All code and software used in this study are publicly available at NGC. To access, log in as a guest or create a profile, then enter one of the URLs below. The trained models, data preparation guidelines, code for training, validating and testing of the model, readme file, installation guideline and license files are publicly available at NVIDIA NGC (ref. 61): https://ngc.nvidia.com/catalog/models/nvidia:med:clara_train_covid19_exam_ehr_xray. The federated learning software is available as part of the Clara Train SDK: https://ngc.nvidia.com/catalog/containers/nvidia:clara-train-sdk. Alternatively, use this command to download the model: "wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/med/clara_train_covid19_exam_ehr_xray/versions/1/zip -O clara_train_covid19_exam_ehr_xray_1.zip".

References
1. Budd, J. et al. Digital technologies in the public-health response to COVID-19. Nat. Med. 26, 1183–1192 (2020).
2. Moorthy, V., Henao Restrepo, A.
M., Preziosi, M.-P. & Swaminathan, S. Data sharing for novel coronavirus (COVID-19). Bull. World Health Organ. 98, 150 (2020).
3. Chen, Q., Allot, A. & Lu, Z. Keep up with the latest coronavirus research. Nature 579, 193 (2020).
4. Fabbri, F., Bhatia, A., Mayer, A., Schlotter, B. & Kaiser, J. BCG IT spend pulse: how COVID-19 is shifting tech priorities. https://www.bcg.com/publications/2020/how-covid-19-is-shifting-big-it-spend (2020).
5. Candelon, F., Reichert, T., Duranton, S., di Carlo, R. C. & De Bondt, M. The rise of the AI-powered company in the postcrisis world. https://www.bcg.com/en-gb/publications/2020/business-applications-artificial-intelligence-post-covid (2020).
6. Chao, H. et al. Integrative analysis for COVID-19 patient outcome prediction. Med. Image Anal. 67, 101844 (2021).
7. Zhu, X. et al. Joint prediction and time estimation of COVID-19 developing severe symptoms using chest CT scan. Med. Image Anal. 67, 101824 (2021).
8. Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).
9. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S. & Jamalipour Soufi, G. Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020).
10. COVID-19 Studies from the World Health Organization Database. https://clinicaltrials.gov/ct2/who_table (2020).
11. ACTIV. https://www.nih.gov/research-training/medical-research-initiatives/activ (2020).
12. Coronavirus Treatment Acceleration Program (CTAP). US Food and Drug Administration https://www.fda.gov/drugs/coronavirus-covid-19-drugs/coronavirus-treatment-acceleration-program-ctap (2020).
13. Gleeson, P., Davison, A. P., Silver, R. A. & Ascoli, G. A. A commitment to open source in neuroscience. Neuron 96, 964–965 (2017).
14. Piwowar, H. et al. The state of OA: a large-scale analysis of the prevalence and impact of open access articles. PeerJ 6, e4375 (2018).
15. European Society of Radiology (ESR). What the radiologist should know about artificial intelligence – an ESR white paper. Insights Imaging 10, 44 (2019).
16. Pesapane, F., Codari, M. & Sardanelli, F. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur. Radiol. Exp. 2, 35 (2018).
17. Price, W. N. 2nd & Cohen, I. G. Privacy in the age of medical big data. Nat. Med. 25, 37–43 (2019).
18. Liang, W. et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern. Med. 180, 1081–1089 (2020).
19. Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. Brit. Med. J. 369, m1328 (2020).
20. Zhang, L. et al. D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19. J. Thromb. Haemost. 18, 1324–1329 (2020).
21. Sands, K. E. et al. Patient characteristics and admitting vital signs associated with coronavirus disease 2019 (COVID-19)-related mortality among patients admitted with noncritical illness. https://doi.org/10.1017/ice.2020.461 (2020).
22. American College of Radiology. ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection (2020).
23. Rubin, G. D. et al. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society. Radiology 296, 172–180 (2020).
24. World Health Organization. Use of chest imaging in COVID-19. https://www.who.int/publications/i/item/use-of-chest-imaging-in-covid-19 (2020).
25. Jamil, S. et al. Diagnosis and management of COVID-19 disease. Am. J. Respir. Crit. Care Med. 201, 10 (2020).
26. Redmond, C. E., Nicolaou, S., Berger, F.
H., Sheikh, A. M. & Patlas, M. N. Emergency radiology during the COVID-19 pandemic: The Canadian Association of Radiologists Recommendations for Practice. Can. Assoc. Radiologists J. 71, 425–430 (2020).
27. Buch, V. et al. Development and validation of a deep learning model for prediction of severe outcomes in suspected COVID-19 infection. Preprint at https://arxiv.org/abs/2103.11269 (2021).
28. Lyons, C. & Callaghan, M. The use of high-flow nasal oxygen in COVID-19. Anaesthesia 75, 843–847 (2020).
29. Whittle, J. S., Pavlov, I., Sacchetti, A. D., Atwood, C. & Rosenberg, M. S. Respiratory support for adult patients with COVID-19. J. Am. Coll. Emerg. Physicians Open 1, 95–101 (2020).
30. Ai, J., Li, Y., Zhou, X. & Zhang, W. COVID-19: treating and managing severe cases. Cell Res. 30, 370–371 (2020).
31. Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019).
32. Cahan, E. M., Hernandez-Boussard, T., Thadaney-Israni, S. & Rubin, D. L. Putting the data before the algorithm in big data addressing personalized healthcare. NPJ Digit. Med. 2, 78 (2019).
33. Thrall, J. H. et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J. Am. Coll. Radiol. 15, 504–508 (2018).
34. Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).
35. Gao, Y. & Cui, Y. Deep transfer learning for reducing health care disparities arising from biomedical data inequality. Nat. Commun. 11, 5131 (2020).
36. Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
37. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 12 (2019).
38. Ma, C. et al. On safeguarding privacy and security in the framework of federated learning. IEEE Netw. 34, 242–248 (2020).
39. Brisimi, T. S. et al.
Federated learning of predictive models from federated Electronic Health Records. Int. J. Med. Inform. 112, 59–67 (2018).
40. Roth, H. R. et al. Federated learning for breast density classification: a real-world implementation. In Proc. Second MICCAI Workshop, DART 2020 and First MICCAI Workshop, DCL 2020 Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning (eds Albarqouni, S. et al.) Vol. 12444, 181–191 (Springer International Publishing, 2020).
41. Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
42. Remedios, S. W., Butman, J. A., Landman, B. A. & Pham, D. L. in Federated Gradient Averaging for Multi-Site Training with Momentum-Based Optimizers (eds Remedios, S. W. et al.) (Springer, 2020).
43. Xu, Y. et al. A collaborative online AI engine for CT-based COVID-19 diagnosis. Preprint at https://www.medrxiv.org/content/10.1101/2020.05.10.20096073v2 (2020).
44. Raisaro, J. L. et al. SCOR: A secure international informatics infrastructure to investigate COVID-19. J. Am. Med. Inform. Assoc. 27, 1721–1726 (2020).
45. Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).
46. Nino, G. et al. Pediatric lung imaging features of COVID-19: a systematic review and meta-analysis. Pediatr. Pulmonol. 56, 252–263 (2021).
47. Fredrikson, M., Jha, S. & Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proc. 22nd ACM SIGSAC Conference on Computer and Communications Security 1322–1333 https://doi.org/10.1145/2810103.2813677 (2015).
48. Zhu, L., Liu, Z. & Han, S. in Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 14774–14784 (Curran Associates, Inc., 2019).
49. Kaissis, G. A., Makowski, M. R., Rückert, D. & Braren, R. F.
Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2, 305–311 (2020).
50. Li, W. et al. in Privacy-Preserving Federated Brain Tumour Segmentation 133–141 (Springer, 2019).
51. Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) https://doi.org/10.1109/allerton.2015.7447103 (2015).
52. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
53. Estiri, H. et al. Predicting COVID-19 mortality with electronic medical records. NPJ Digit. Med. 4, 15 (2021).
54. Jiang, G. et al. Harmonization of detailed clinical models with clinical study data standards. Methods Inf. Med. 54, 65–74 (2015).
55. Yang, D. et al. in Searching Learning Strategy with Reinforcement Learning for 3D Medical Image Segmentation https://doi.org/10.1007/978-3-030-32245-8_1 (2019).
56. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learning Res. 20, 1–21 (2019).
57. Yao, Q. et al. Taking human out of learning applications: a survey on automated machine learning. Preprint at https://arxiv.org/abs/1810.13306 (2019).
58. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conf. Machine Learning, PMLR 37, 448–456 (2015).
59. Kaufman, S., Rosset, S. & Perlich, C. Leakage in data mining: formulation, detection, and avoidance. In Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 556–563 (2011).
60. Zhang, C. et al. BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In Proc. 2020 USENIX Annual Technical Conference, ATC 2020, 493–506 (2020).
61.
Nvidia NGC Catalog: COVID-19 Related Models. https://ngc.nvidia.com/catalog/models?orderBy=scoreDESC&pageNumber=0&query=covid&quickFilter=models&filters (2020).
62. Marini, J. J. & Gattinoni, L. Management of COVID-19 respiratory distress. JAMA 323, 2329–2330 (2020).
63. Cook, T. M. et al. Consensus guidelines for managing the airway in patients with COVID-19: Guidelines from the Difficult Airway Society, the Association of Anaesthetists, the Intensive Care Society, the Faculty of Intensive Care Medicine and the Royal College of Anaesthetists. Anaesthesia 75, 785–799 (2020).
64. Galloway, J. B. et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J. Infect. 81, 282–288 (2020).
65. Kilaru, A. S. et al. Return hospital admissions among 1419 COVID-19 patients discharged from five U.S. emergency departments. Acad. Emerg. Med. 27, 1039–1042 (2020).
66. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/cvpr.2016.90 (2016).
67. Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc. AAAI Conf. Artif. Intell. 33, 590–597 (2019).
68. Wang, R., Fu, B., Fu, G. & Wang, M. Deep & Cross network for Ad Click predictions. In Proc. ADKDD'17 Article no. 12 (2017).
69. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283 (USENIX Association, 2016).
70. NVIDIA Clara Imaging. https://developer.nvidia.com/clara-medical-imaging (2020).
71. Stekhoven, D. J. & Bühlmann, P. MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
72. McMahan, H., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. (2017).
http://proceedings.mlr.press/v54/mcmahan17a.html
73. Hsieh, K., Phanishayee, A., Mutlu, O. & Gibbons, P. B. The non-IID data quagmire of decentralized machine learning. In Proc. 37th International Conf. Machine Learning, PMLR 119 (2020).
74. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).

Acknowledgements
The views expressed in this study are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health and Social Care or any of the organizations associated with the authors. MGB thanks the following individuals for their support: J. Brink, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; M. Kalra, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; N. Neumark, Center for Clinical Data Science, Massachusetts General Brigham, Boston, MA; T. Schultz, Department of Radiology, Massachusetts General Hospital, Boston, MA; N. Guo, Center for Advanced Medical Computing and Analysis, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; J. K

The Faculty of Medicine, Chulalongkorn University thanks the Ratchadapisek Sompoch Endowment Fund RA (PO) (no. 001/63) for the collection and management of COVID-19-related clinical data and biological specimens for the Research Task Force, Faculty of Medicine, Chulalongkorn University. The NIHR Cambridge Biomedical Research Centre thanks A. Priest, who is supported by the NIHR (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust).
National Taiwan University MeDA Lab and the MAHC and Taiwan National Health Insurance Administration thank the MOST Joint Research Center for AI https://data.ucsf.edu/covid19

This paper is available on Nature under the CC BY 4.0 Deed (Attribution 4.0 International) license.