Authors: Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletarì, Holger R. Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N. Galtier, Bennett A. Landman, Klaus Maier-Hein, Sébastien Ourselin, Micah Sheller, Ronald M. Summers, Andrew Trask, Daguang Xu, Maximilian Baust, M. Jorge Cardoso

Abstract

Data-driven machine learning (ML) has emerged as a promising approach for building accurate and robust statistical models from medical data, which is collected in huge volumes by modern healthcare systems. Existing medical data is not fully exploited by ML, primarily because it sits in data silos and privacy concerns restrict access to it. However, without access to sufficient data, ML will be prevented from reaching its full potential and, ultimately, from making the transition from research to clinical practice. This paper considers the key factors contributing to this issue, explores how federated learning (FL) may provide a solution for the future of digital health and highlights the challenges and considerations that need to be addressed.

Introduction

Research on artificial intelligence (AI), and particularly the advances in machine learning (ML) and deep learning (DL) [1], has led to disruptive innovations in radiology, pathology, genomics and other fields. Modern DL models feature millions of parameters that need to be learned from sufficiently large curated data sets in order to achieve clinical-grade accuracy, while being safe, fair, equitable and generalising well to unseen data [2,3,4,5].
For example, training an AI-based tumour detector requires a large database encompassing the full spectrum of possible anatomies, pathologies and input data types. Data like this is hard to obtain, because health data is highly sensitive and its use is tightly regulated [6]. Even if data anonymisation could bypass these limitations, it is now well understood that removing metadata such as patient name or date of birth is often not enough to preserve privacy [7]. It is, for example, possible to reconstruct a patient’s face from computed tomography (CT) or magnetic resonance imaging (MRI) data [8]. Another reason why data sharing is not systematic in healthcare is that collecting, curating and maintaining a high-quality data set takes considerable time, effort and expense. Consequently, such data sets may have significant commercial value, making it less likely that they will be shared freely.

Federated learning (FL) [9,10,11] is a learning paradigm that seeks to address the problem of data governance and privacy by training algorithms collaboratively without exchanging the data itself. It has recently gained traction for healthcare applications [13,14,15,16,17,18,19,20]. FL enables gaining insights collaboratively, e.g., in the form of a consensus model, without moving patient data beyond the firewalls of the institutions in which they reside. Instead, the ML process occurs locally at each participating institution and only model characteristics (e.g., parameters, gradients) are transferred, as depicted in Fig. 1. Recent research has shown that models trained by FL can achieve performance levels comparable to ones trained on centrally hosted data sets and superior to models that only see isolated single-institutional data [16,17].
Fig. 1. (a) FL aggregation server: the typical FL workflow in which a federation of training nodes receives the global model, resubmits its partially trained models to a central server intermittently for aggregation and then continues training on the consensus model that the server returns. (b) FL peer to peer: an alternative formulation of FL in which each training node exchanges its partially trained models with some or all of its peers and each does its own aggregation. (c) Centralised training: the general non-FL training workflow in which data-acquiring sites donate their data to a central Data Lake from which they and others are able to extract data for local, independent training.

A successful implementation of FL could thus hold significant potential for enabling precision medicine at large scale, leading to models that yield unbiased decisions, optimally reflect an individual’s physiology and are sensitive to rare diseases, while respecting governance and privacy concerns. We envision a federated future for digital health and, with this perspective paper, we share our consensus view with the aim of providing context and detail for the community regarding the benefits and impact of FL for medical applications (section “Data-driven medicine requires federated efforts”), as well as highlighting key considerations and challenges of implementing FL for digital health (section “Technical considerations”).

Data-driven medicine requires federated efforts

ML, and especially DL, is becoming the de facto knowledge discovery approach in many industries, but successfully implementing data-driven applications requires large and diverse data sets. However, medical data sets are difficult to obtain (subsection “The reliance on data”).
FL addresses this issue by enabling collaborative learning without centralising data (subsection “The promise of federated efforts”) and has already found its way to digital health applications (subsection “Current FL efforts for digital health”). This new learning paradigm requires consideration from, but also offers benefits to, various healthcare stakeholders (section “Impact on stakeholders”).

The reliance on data

Data-driven approaches rely on data that truly represent the underlying data distribution of the problem. While this is a well-known requirement, state-of-the-art algorithms are usually evaluated on carefully curated data sets, often originating from only a few sources. This can introduce biases where demographics (e.g., gender, age) or technical imbalances (e.g., acquisition protocol, equipment manufacturer) skew predictions and adversely affect the accuracy for certain groups or sites. However, to capture subtle relationships between disease patterns, socio-economic and genetic factors, as well as complex and rare cases, it is crucial to expose a model to diverse cases.

The need for large databases for AI training has spawned many initiatives seeking to pool data from multiple institutions. This data is often amassed into so-called Data Lakes. These have been built with the aim of leveraging either the commercial value of data, e.g., IBM’s Merge Healthcare acquisition [21], or as a resource for economic growth and scientific progress, e.g., NHS Scotland’s National Safe Haven [22], the French Health Data Hub [23] and Health Data Research UK [24].
Substantial, albeit smaller, initiatives include the Human Connectome [25], the UK Biobank [26], the Cancer Imaging Archive (TCIA) [27], NIH CXR8 [28], NIH DeepLesion [29], the Cancer Genome Atlas (TCGA) [30], the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [31], as well as medical grand challenges [32] such as the CAMELYON challenge [33], the International multimodal Brain Tumor Segmentation (BraTS) challenge [34,35,36] or the Medical Segmentation Decathlon [37]. Public medical data is usually task- or disease-specific and is often released with varying degrees of licence restrictions, sometimes limiting its exploitation.

Centralising or releasing data, however, poses not only regulatory, ethical and legal challenges related to privacy and data protection, but also technical ones. Anonymising, controlling access to and safely transferring healthcare data is a non-trivial, and sometimes impossible, task. Anonymised data from the electronic health record can appear innocuous and GDPR/PHI compliant, but just a few data elements may allow for patient reidentification [7]. The same applies to genomic data and medical images, which can be as unique as a fingerprint [38]. Therefore, unless the anonymisation process destroys the fidelity of the data, likely rendering it useless, patient reidentification or information leakage cannot be ruled out. Gated access for approved users is often proposed as a putative solution to this issue. However, besides limiting data availability, this is only practical for cases in which the consent granted by the data owners is unconditional, since recalling data from those who may have had access to it is practically unenforceable.

The promise of federated efforts

The promise of FL is simple: to address privacy and data governance challenges by enabling ML from non-co-located data.
In a FL setting, each data controller not only defines its own governance processes and associated privacy policies, but also controls data access and has the ability to revoke it. This includes both the training and the validation phase. In this way, FL could create new opportunities, e.g., by allowing large-scale, in-institutional validation, or by enabling novel research on rare diseases, where incidence rates are low and the data sets at each single institution are too small. Moving the model to the data and not vice versa has another major advantage: high-dimensional, storage-intense medical data does not have to be duplicated from local institutions in a centralised pool and duplicated again by every user that uses this data for local model training. As the model is transferred to the local institutions, it can scale naturally with a potentially growing global data set without disproportionately increasing data storage requirements.

As depicted in Fig. 2, a FL workflow can be realised with different topologies and compute plans. The two most common ones for healthcare applications are via an aggregation server [16,17,18] and peer-to-peer approaches [15,39]. In all cases, FL implicitly offers a certain degree of privacy, since FL participants never directly access data from other institutions and only receive model parameters that are aggregated over several participants. In a FL workflow with an aggregation server, the participating institutions can even remain unknown to each other. However, the shared models themselves may still leak information about the data they were trained on [40,41,42,43]. Therefore, mechanisms such as differential privacy [44,45] or learning from encrypted data [46] have been proposed to further enhance privacy in a FL setting (cf. section “Technical considerations”). Overall, the potential of FL for healthcare applications has sparked interest in the community and FL techniques are a growing area of research [12,20].
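As a rough illustration of the two topologies named above, the following Python sketch contrasts one aggregation step under an aggregation server with one under a peer-to-peer layout. It is a minimal sketch under stated assumptions, not the method of any cited system: models are plain lists of floats, local training is left out entirely, the server weights each site by a hypothetical sample count, and the function names (`average`, `server_round`, `peer_to_peer_round`) and the chain of neighbours are invented for this example.

```python
from typing import Dict, List

Params = List[float]  # toy stand-in for a model's parameter vector


def average(models: List[Params], weights: List[float]) -> Params:
    """Weighted element-wise mean of several parameter vectors."""
    total = sum(weights)
    size = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) / total for i in range(size)]


def server_round(local_models: List[Params], n_samples: List[int]) -> List[Params]:
    """Hub and spoke: one server aggregates every site's model and
    sends the same consensus model back to all sites."""
    consensus = average(local_models, [float(n) for n in n_samples])
    return [list(consensus) for _ in local_models]


def peer_to_peer_round(local_models: List[Params],
                       neighbours: Dict[int, List[int]]) -> List[Params]:
    """Decentralised: each site averages its own model with the models
    of its directly connected peers only; no central server exists."""
    updated = []
    for k, model in enumerate(local_models):
        group = [model] + [local_models[j] for j in neighbours[k]]
        updated.append(average(group, [1.0] * len(group)))
    return updated


# Three hypothetical sites, each holding a one-parameter model.
models = [[1.0], [2.0], [6.0]]

# Assumed sample counts 10, 10 and 20 for the server-side weighting.
print(server_round(models, [10, 10, 20]))          # [[3.75], [3.75], [3.75]]

# Assumed chain 0 - 1 - 2: site 1 sees both peers, sites 0 and 2 see only site 1.
print(peer_to_peer_round(models, {0: [1], 1: [0, 2], 2: [1]}))  # [[1.5], [3.0], [4.0]]
```

With a server, every site leaves the round holding the same consensus model; in the peer-to-peer case the sites still differ after one round and only move towards agreement over repeated exchanges.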
Fig. 2. FL topologies: communication architecture of a federation. (a) Centralised: the aggregation server coordinates the training iterations and collects, aggregates and distributes the models to and from the training nodes (hub and spoke). (b) Decentralised: each training node is connected to one or more peers and aggregation occurs on each node in parallel. (c) Hierarchical: federated networks can be composed from several sub-federations, which can be built from a mix of peer-to-peer and aggregation server federations (d). FL compute plans: trajectory of a model across several partners. (e) Sequential training/cyclic transfer learning. (f) Aggregation server. (g) Peer to peer.

Current FL efforts for digital health

Since FL is a general learning paradigm that removes the data pooling requirement for AI model development, the application range of FL spans the whole of AI for healthcare. By providing an opportunity to capture larger data variability and to analyse patients across different demographics, FL may enable disruptive innovations for the future, but it is also being employed right now.

In the context of electronic health records (EHR), for example, FL helps to represent and to find clinically similar patients [13,47], as well as to predict hospitalisations due to cardiac events [14], mortality and ICU stay time [19]. The applicability and advantages of FL have also been demonstrated in the field of medical imaging, for whole-brain segmentation in MRI [15], as well as brain tumour segmentation [16,17]. Recently, the technique has been employed for fMRI classification to find reliable disease-related biomarkers [18] and suggested as a promising approach in the context of COVID-19 [48].

It is worth noting that FL efforts require agreements to define the scope, aim and technologies used which, since FL is still novel, can be difficult to pin down.
In this context, today’s large-scale initiatives really are the pioneers of tomorrow’s standards for safe, fair and innovative collaboration in healthcare applications.

These include consortia that aim to advance academic research, such as the Trustworthy Federated Data Analytics (TFDA) project [49] and the German Cancer Consortium’s Joint Imaging Platform [50], which enable decentralised research across German medical imaging research institutions. Another example is an international research collaboration that uses FL to develop AI models for the assessment of mammograms [51]. The study showed that the FL-generated models outperformed those trained on a single institute’s data and were more generalisable, so that they still performed well on other institutes’ data. However, FL is not limited just to academic environments.

By linking healthcare institutions, not restricted to research centres, FL can have direct clinical impact. The ongoing HealthChain project [52], for example, aims to develop and deploy a FL framework across four hospitals in France. This solution generates common models that can predict treatment response for breast cancer and melanoma patients. It helps oncologists to determine the most effective treatment for each patient from their histology slides or dermoscopy images. Another large-scale effort is the Federated Tumour Segmentation (FeTS) initiative [53], an international federation of 30 committed healthcare institutions using an open-source FL framework with a graphical user interface. The aim is to improve tumour boundary detection, including brain glioma, breast tumours, liver tumours and bone lesions from multiple myeloma patients.

Another area of impact is within industrial research and translation.
FL enables collaborative research between companies, even competing ones. In this context, one of the largest initiatives is the Melloddy project [54], which aims to deploy multi-task FL across the data sets of 10 pharmaceutical companies. By training a common predictive model, which infers how chemical compounds bind to proteins, the partners intend to optimise the drug discovery process without revealing their highly valuable in-house data.

Impact on stakeholders

FL represents a paradigm shift from centralised data lakes and it is important to understand its impact on the various stakeholders in a FL ecosystem.

Clinicians

Clinicians are usually exposed to a sub-group of the population based on their location and demographic environment, which may cause biased assumptions about the probability of certain diseases or their interconnection. By using ML-based systems, e.g., as a second reader, they can augment their own expertise with expert knowledge from other institutions, ensuring a consistency of diagnosis not attainable today. While this applies to ML-based systems in general, systems trained in a federated fashion are potentially able to yield even less biased decisions and higher sensitivity to rare cases, as they were likely exposed to a more complete data distribution. However, this requires some up-front effort, such as compliance with agreements, e.g., regarding the data structure and annotation.

Patients

Patients are usually treated locally. Establishing FL on a global scale could ensure high quality of clinical decisions regardless of the treatment location. In particular, patients requiring medical attention in remote areas could benefit from the same high-quality ML-aided diagnoses that are available in hospitals with a large number of cases.
The same holds true for rare, or geographically uncommon, diseases, which are likely to have milder consequences if faster and more accurate diagnoses can be made. FL may also lower the hurdle for becoming a data donor, since patients can be reassured that the data remains with their own institution and data access can be revoked.

Hospitals and practices

Hospitals and practices can remain in full control and possession of their patient data with complete traceability of data access, limiting the risk of misuse by third parties. However, this will require investment in on-premise computing infrastructure or private-cloud service provision and adherence to standardised and synoptic data formats so that ML models can be trained and evaluated seamlessly. The amount of necessary compute capability depends, of course, on whether a site is only participating in evaluation and testing efforts or also in training efforts. Even relatively small institutions can participate and they will still benefit from the collective models generated.

Researchers and AI developers

Researchers and AI developers stand to benefit from access to a potentially vast collection of real-world data, which will certainly impact smaller research labs and start-ups. Resources can thus be directed towards solving clinical needs and associated technical problems rather than relying on the limited supply of open data sets. FL-based development also implies that the researcher or AI developer cannot investigate or visualise all of the data on which the model is trained, e.g., it is not possible to look at an individual failure case to understand why the current model performs poorly on it.
Healthcare providers

Healthcare providers in many countries are affected by the ongoing paradigm shift from volume-based, i.e., fee-for-service-based, to value-based healthcare, which is in turn strongly connected to the successful establishment of precision medicine. This is not about promoting more expensive individualised therapies but about achieving better outcomes sooner through more focused treatment, thereby reducing costs.

Manufacturers

Manufacturers of healthcare software and hardware could benefit from FL as well, since combining the learning from many devices and applications, without revealing patient-specific information, can facilitate the continuous validation or improvement of their ML-based systems. However, realising such a capability may require significant upgrades to local compute, data storage, networking capabilities and associated software.

Technical considerations

FL is perhaps best known from the work of Konečnỳ et al. [55], but various other definitions have been proposed in the literature [9,11,12,20]. A FL workflow (Fig. 1) can be realised via different topologies and compute plans (Fig. 2), but the goal remains the same, i.e., to combine knowledge learned from non-co-located data. In this section, we discuss in more detail what FL is and highlight the key challenges and technical considerations that arise when applying FL in digital health.

Federated learning definition

FL is a learning paradigm in which multiple parties train collaboratively without the need to exchange or centralise data sets. A general formulation of FL reads as follows: let $\mathcal{L}$ denote a global loss function obtained via a weighted combination of $K$ local losses $\{\mathcal{L}_k\}_{k=1}^{K}$, computed from private data $X_k$, which resides at the individual involved parties and is never shared among them:

$$\min_{\phi} \mathcal{L}(X;\phi) \quad \text{with} \quad \mathcal{L}(X;\phi) = \sum_{k=1}^{K} w_k\,\mathcal{L}_k(X_k;\phi), \qquad (1)$$

where $\phi$ stands for the model parameters and $w_k > 0$ denote the respective weight coefficients.
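To make the weighted combination of local losses concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than part of the original formulation: the model is a toy one-parameter predictor `y ≈ phi * x`, each local loss is a mean squared error, the per-site data are invented, and the weights are chosen as each site's share of the total sample count (a common but not mandatory choice). Under exactly these assumptions the federated objective coincides with the loss one would compute on pooled data, which the last lines check.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def local_loss(data: Sequence[Sample], phi: float) -> float:
    """Mean squared error of the toy model y ~ phi * x on one site's private data."""
    return sum((phi * x - y) ** 2 for x, y in data) / len(data)


def global_loss(sites: Sequence[Sequence[Sample]],
                weights: Sequence[float],
                phi: float) -> float:
    """Weighted combination of the local losses.

    Each site only contributes a scalar loss value; raw samples are
    never passed between sites in this computation.
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    return sum(w * local_loss(data, phi) for w, data in zip(weights, sites))


# Hypothetical private data held by K = 3 sites.
sites = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(3.0, 6.2)],
    [(0.5, 1.0), (1.5, 2.9), (2.5, 5.1)],
]

# Assumption: weight each site by its share of the total number of samples.
n_total = sum(len(data) for data in sites)
weights = [len(data) / n_total for data in sites]

phi = 2.0
federated = global_loss(sites, weights, phi)

# Centralised reference value, computed here only to check the identity;
# a real federation would never pool the samples like this.
pooled = local_loss([sample for data in sites for sample in data], phi)

print(abs(federated - pooled) < 1e-12)  # True
```

Other weightings, for example equal weights per site, remain valid instances of the same formulation but no longer reproduce the pooled loss when sites hold different numbers of samples.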
In practice, each participant typically obtains and refines a global consensus model by conducting a few rounds of optimisation locally before sharing updates, either directly or via a parameter server. The more rounds of local training are performed, the less it is guaranteed that the overall procedure is minimising Eq. 1 [9,12]. The actual process for aggregating parameters depends on the network topology, as nodes might be segregated into sub-networks due to geographical or legal constraints (see Fig. 2). Aggregation strategies can rely on a single aggregating node (hub-and-spoke models), or on multiple nodes without any centralisation. An example is peer-to-peer FL, where connections exist between all or a subset of the participants and model updates are shared only between directly connected sites [15,56], whereas an example of centralised FL aggregation is given in Algorithm 1. Note that aggregation strategies do not necessarily require information about the full model update; clients might choose to share only a subset of the model parameters in order to reduce communication overhead, to ensure better privacy preservation or to produce multi-task learning algorithms in which only part of the parameters are learned in a federated manner [10].

A unifying framework enabling various training schemes may disentangle compute resources (data and servers) from the compute plan, as depicted in Fig. 2. The latter defines the trajectory of a model across several partners, to be trained and evaluated on specific data sets.

Challenges and considerations

Despite the advantages of FL, it does not solve all issues that are inherent to learning on medical data.
Successful model training still depends on factors like data quality, bias and standardisation [2]. These issues have to be solved for both federated and non-federated learning efforts via appropriate measures, such as careful study design, common protocols for data acquisition, structured reporting and sophisticated methodologies for discovering bias and hidden stratification. For further technical detail, see the surveys in [11,12,20].

Data heterogeneity

Medical data is particularly diverse, not only because of the variety of modalities, dimensionality and characteristics in general, but even within a specific protocol due to factors such as acquisition differences, brand of the medical device or local demographics. FL may help address certain sources of bias through potentially increased diversity of data sources, but inhomogeneous data distribution poses a challenge for FL algorithms and strategies, as many assume independently and identically distributed (IID) data across the participants. In general, strategies such as FedAvg [9] are prone to fail under these conditions [9,57,58], in part defeating the very purpose of collaborative learning strategies. Recent results, however, indicate that FL training is still feasible [59], even if medical data is not uniformly distributed across the institutions [16,17] or includes a local bias [51]. Research addressing this problem includes, for example, FedProx [57], a part-data-sharing strategy [58] and FL with domain adaptation [18]. Another challenge is that data heterogeneity may lead to a situation in which the global optimal solution is not optimal for an individual local participant. The definition of model training optimality should, therefore, be agreed by all participants before training.

Privacy and security

Healthcare data is highly sensitive and must be protected accordingly, following appropriate confidentiality procedures.
Therefore, some of the key considerations are the trade-offs, strategies and remaining risks regarding the privacy-preserving potential of FL.

Privacy vs. performance: It is important to note that FL does not solve all potential privacy issues and, similar to ML algorithms in general, will always carry some risks. Privacy-preserving techniques for FL offer levels of protection that exceed today’s commercially available ML models [12]. However, there is a trade-off in terms of performance and these techniques may affect, for example, the accuracy of the final model [10]. Furthermore, future techniques and/or ancillary data could be used to compromise a model previously considered to be low-risk.

Level of trust: Broadly speaking, participating parties can enter two types of FL collaboration:

Trusted: for FL consortia in which all parties are considered trustworthy and are bound by an enforceable collaboration agreement, we can eliminate many of the more nefarious motivations, such as deliberate attempts to extract sensitive information or to intentionally corrupt the model. This reduces the need for sophisticated counter-measures, falling back to the principles of standard collaborative research.

Non-trusted: in FL systems that operate on larger scales, it might be impractical to establish an enforceable collaborative agreement. Some clients may deliberately try to degrade performance, bring the system down or extract information from other parties. Hence, security strategies will be required to mitigate these risks, such as advanced encryption of model submissions, secure authentication of all parties, traceability of actions, differential privacy, verification systems, execution integrity, model confidentiality and protections against adversarial attacks.

Information leakage: By definition, FL systems avoid sharing healthcare data among participating institutions.
However, the shared information may still indirectly expose private data used for local training, e.g., by model inversion [60] of the model updates, the gradients themselves [61] or adversarial attacks [62,63]. FL differs from traditional training insofar as the training process is exposed to multiple parties, thereby increasing the risk of leakage via reverse engineering if adversaries can observe model changes over time, observe specific model updates (i.e., a single institution’s update), or manipulate the model (e.g., induce additional memorisation by others through gradient-ascent-style attacks). Developing counter-measures [16,18] and ensuring adequate differential privacy [44] may be needed and is still an active area of research [12].

Traceability and accountability

As in all safety-critical applications, the reproducibility of a system is important for FL in healthcare. In contrast to centralised training, FL requires multi-party computations in environments that exhibit considerable variety in terms of hardware, software and networks. Traceability of all system assets, including data access history, training configurations and hyperparameter tuning throughout the training processes, is thus mandatory. In non-trusted federations in particular, traceability and accountability processes require execution integrity. After the training process reaches the mutually agreed model optimality criteria, it may also be helpful to measure the contribution of each participant, such as the computational resources consumed or the quality of the data used for local training. These measurements could then be used to determine relevant compensation and to establish a revenue model among the participants [64]. One implication of FL is that researchers are not able to investigate the data upon which models are being trained in order to make sense of unexpected results.
Moreover, taking statistical measurements of their training data as part of the model development workflow will need to be approved by the collaborating parties as not violating privacy. Although each site will have access to its own raw data, federations may decide to provide some sort of secure intra-node viewing facility to cater for this need, or may provide some other way to increase the explainability and interpretability of the global model.

System architecture

Unlike running large-scale FL amongst consumer devices, as in McMahan et al. [9], healthcare institutional participants are equipped with relatively powerful computational resources and reliable, higher-throughput networks, enabling training of larger models with many more local training steps and sharing of more model information between nodes. These unique characteristics of FL in healthcare also bring challenges, such as ensuring data integrity when communicating by use of redundant nodes, designing secure encryption methods to prevent data leakage, or designing appropriate node schedulers to make best use of the distributed computational devices and reduce idle time.

The administration of such a federation can be realised in different ways. In situations requiring the most stringent data privacy between parties, training may operate via some sort of “honest broker” system, in which a trusted third party acts as the intermediary and facilitates access to data. This setup requires an independent entity controlling the overall system, which may not always be desirable, since it could involve additional cost and procedural viscosity. However, it has the advantage that the precise internal mechanisms can be abstracted away from the clients, making the system more agile and simpler to update. In a peer-to-peer system, each site interacts directly with some or all of the other participants.
In other words, there is no gatekeeper function; all protocols must be agreed.

Conclusion

ML, and particularly DL, has led to a wide range of innovations in the area of digital healthcare. As all ML methods benefit greatly from the ability to access data that approximates the true global distribution, FL is a promising approach to obtain powerful, accurate, safe, robust and unbiased models. By enabling multiple parties to train collaboratively without the need to exchange or centralise data sets, FL neatly addresses issues related to egress of sensitive medical data. As a consequence, it may open novel research and business avenues and has the potential to improve patient care globally. Already today, however, FL has an impact on nearly all stakeholders and the entire treatment cycle, ranging from improved medical image analysis that provides clinicians with better diagnostic tools, through true precision medicine that helps to find similar patients, to collaborative and accelerated drug discovery that decreases cost and time-to-market for pharma companies. Not all technical questions have been answered yet and FL will certainly be an active research area throughout the next decade [12]. Despite this, we truly believe that its potential impact on precision medicine and, ultimately, on improving medical care is very promising.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

References

1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436 (2015).
2. Wang, F., Casalino, L. P. & Khullar, D. Deep learning in medicine—promise, progress, and challenges. JAMA Intern. Med. 179, 293–294 (2019).
3. Chartrand, G. et al. Deep learning: a primer for radiologists. Radiographics 37, 2113–2131 (2017).
4. De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342 (2018).
5. Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision, 843–852 (IEEE, 2017).
6. Van Panhuis, W. G. et al. A systematic review of barriers to data sharing in public health. BMC Public Health 14, 1144 (2014).
7. Rocher, L., Hendrickx, J. M. & De Montjoye, Y.-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10, 1–9 (2019).
8. Schwarz, C. G. et al. Identification of anonymous MRI research participants with face-recognition software. N. Engl. J. Med. 381, 1684–1686 (2019).
9. McMahan, B., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, 1273–1282 (2017). https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=Communicationefficient+learning+of+deep+networks+from+decentralized+data&btnG=
10. Li, T., Sahu, A. K., Talwalkar, A. & Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37, 50–60 (IEEE, 2020).
11. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 10, 12 (2019).
12. Kairouz, P. et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977 (2019).
13. Lee, J. et al. Privacy-preserving patient similarity learning in a federated environment: development and analysis. JMIR Med. Inform. 6, e20 (2018).
14. Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).
15. Roy, A. G., Siddiqui, S., Pölsterl, S., Navab, N. & Wachinger, C. Braintorrent: a peer-to-peer environment for decentralized federated learning. arXiv preprint arXiv:1905.06731 (2019).
16. Li, W. et al. Privacy-preserving federated brain tumour segmentation. In International Workshop on Machine Learning in Medical Imaging, 133–141 (Springer, 2019).
17. Sheller, M. J., Reina, G. A., Edwards, B., Martin, J. & Bakas, S. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. In International MICCAI Brainlesion Workshop, 92–104 (Springer, 2018).
18. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. arXiv preprint arXiv:2001.05647 (2020).
19. Huang, L. et al. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019).
20. Xu, J. & Wang, F. Federated learning for healthcare informatics. arXiv preprint arXiv:1911.06270 (2019).
21. Roy, A. & Banerjee, A. IBM’s Merge Healthcare acquisition. https://www.reuters.com/article/us-merge-healthcare-m-a-ibm/ibm-to-buy-merge-healthcare-in-1-billion-deal-idUSKCN0QB1ML20150806 (2015) (Accessed 10 February 2020).
22. NHS Scotland’s National Safe Haven. https://www.gov.scot/publications/charter-safe-havens-scotland-handling-unconsented-data-national-health-service-patient-records-support-research-statistics/pages/4/ (2015) (Accessed 10 February 2020).
23. Cuggia, M. & Combes, S. The French Health Data Hub and the German Medical Informatics Initiatives: two national projects to promote data sharing in healthcare. Yearbook Med. Informat. 28, 195–202 (2019).
24. Health Data Research UK. https://www.hdruk.ac.uk/ (Health Data Research UK, 2020) (Accessed 10 Feb 2020).
25. Sporns, O., Tononi, G. & Kötter, R. The human connectome: a structural description of the human brain. PLoS Comput. Biol. 1, e42 (2005). https://doi.org/10.1371/journal.pcbi.0010042
26. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015). https://doi.org/10.1371/journal.pmed.1001779
27. Clark, K. et al. The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).
28. Wang, X. et al. Chestx-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2097–2106 (IEEE, 2017).
29. Yan, K., Wang, X., Lu, L. & Summers, R. M. Deeplesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. J. Med. Imaging 5, 036501 (2018).
30. Tomczak, K., Czerwińska, P. & Wiznerowicz, M. The cancer genome atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. 19, A68 (2015).
31. Jack Jr., C. R. et al. The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J. Magn. Reson. Imaging 27, 685–691 (2008).
32. Grand Challenge-a Platform for End-to-end Development of Machine Learning Solutions in Biomedical Imaging. https://grand-challenge.org/ (2020) (Accessed 24 July 2020).
33. Litjens, G. et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7, giy065 (2018).
34. Menze, B. H. et al. The multimodal brain tumor image segmentation benchmark (BraTS). IEEE Trans. Med. Imaging 34, 1993–2024 (2014).
35. Bakas, S. et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BraTS challenge. arXiv preprint arXiv:1811.02629 (2018).
36. Bakas, S. et al. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 4, 170117 (2017).
37. Simpson, A. L. et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019).
38. Yeh, F.-C. et al.
Quantifying differences and similarities in whole-brain white matter architecture using local connectome fingerprints. , e1005203 (2016). PLoS Comput. Biol. 12 Chang, K. et al. Distributed deep learning networks among institutions for medical imaging. , 945–954 (2018). J. Am. Med. Inform. Assoc. 25 Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership inference attacks against machine learning models. In , 3-18 (IEEE, 2017). 2017 IEEE Symposium on Security and Privacy (SP) Sablayrolles, A., Douze, M., Ollivier, Y., Schmid, C. & Jégou, H. White-box vs black-box: Bayes optimal strategies for membership inference. In Chaudhuri, K. & Salakhutdinov, R. (eds) , 5558–5567. (PMLR, 2019). Proceedings of the 36th International Conference on Machine Learning, {ICML} 97 http://proceedings.mlr.press/v97/sablayrolles19a.html Zhang, C., Bengio, S., Hardt, M., Recht, B. & Vinyals, O. Understanding deep learning requires rethinking generalization. In , (OpenReview.net, 2017). 5th International Conference on Learning Representations, {ICLR}. https://openreview.net/forum?id=Sy8gdB9xx Carlini, N., Liu, C., Erlingsson, Ú., Kos, J. & Song, D. The secret sharer: evaluating and testing unintended memorization in neural networks. In Heninger, N. & Traynor, P. (eds) { } ({ } , 267–284. ({USENIX} Association, Santa Clara, CA, USA, 2019). 28th USENIX Security Symposium USENIX Security 19 https://www.usenix.org/conference/usenixsecurity19/presentation/carlini Abadi, M. et al. Deep learning with differential privacy. In , 308–318 (ACM, 2016). Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In , 1310–1321 (ACM, 2015). Proceedings of the 22nd ACM SIGSAC conference on computer and communications security Langlotz, C. P. et al. A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 nih/rsna/acr/the academy workshop. , 781–791 (2019). 
Radiology 291 Kim, Y., Sun, J., Yu, H. & Jiang, X. Federated Tensor Factorization for Computational Phenotyping. In . 887–895. (ACM, Halifax, NS, Canada, 2017). Proceedings of the 23rd {ACM} {SIGKDD} International Conference on Knowledge Discoveryand Data Mining https://doi.org/10.1145/3097983.3098118 He, C., Annavaram, M. & Avestimehr, S. Fednas: Federated deep learning via neural architecture search. (2020). https://sites.google.com/view/cvpr20-nas/ Trustworthy federated data analytics (tfda). (2020) (Accessed 28 May 2020). https://tfda.hmsp.center/ Joint Imaging Platform (Jip). (2020) (Accessed 28 May 2020). https://jip.dktk.dkfz.de/jiphomepage/ Medical institutions collaborate to improve mammogram assessment ai. (2020) (Accessed 28 May 2020). https://blogs.nvidia.com/blog/2020/04/15/federated-learning-mammogram-assessment/ Healthchain consortium. (2020) (Accessed 28 May 2020). https://www.substra.ai/en/healthchain-project The federated tumor segmentation (fets) initiative. (2020) (Accessed 28 May 2020). https://www.fets.ai Machine learning ledger orchestration for drug discovery. (2020). Accessed 28 May 2020. https://cordis.europa.eu/project/id/831472 Konečny`, J., McMahan, H. B., Ramage, D. & Richtárik, P. Federated optimization: Distributed machine learning for on-device intelligence. (2016). arXiv preprint arXiv:1610.02527 Lalitha, A., Kilinc, O. C., Javidi, T. & Koushanfar, F. Peer-to-peer federated learning on graphs. (2019). arXiv preprint arXiv:1901.11173 Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A. & Smith, V. Federated optimization in heterogeneous networks. (2018). arXiv preprint arXiv:1812.06127 Zhao, Y. et al. Federated learning with non-iid data. (2018). arxivabs/1806.00582 Li, X., Huang, K., Yang, W., Wang, S. & Zhang, Z. On the convergence of fedavg on non-IID data. (2020). https://openreview.net/forum?id=HJxNAnVtDS Wu, B. et al. P3sgd: patient privacy preserving SGD for regularizing deep CNNs in pathological image classification. 
In (pp. 2099–2108) (2019). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Zhu, L., Liu, Z. & Han, S. Deep leakage from gradients. In Wallach, H. M. et al. (eds) , 14747–14756. (2019). Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems http://papers.nips.cc/paper/9617-deep-leakage-from-gradients Wang, Z. et al. Beyond inferring class representatives: user-level privacy leakage from federated learning. In 2512–2520. (IEEE, Paris, France, 2019). 2019 {IEEE} Conferenceon Computer Communications, {INFOCOM} https://doi.org/10.1109/INFOCOM.2019.8737416 Hitaj, B., Ateniese, G. & Perez-Cruz, F. Deep models under the gan: information leakage from collaborative deep learning. In , CCS’17, 603–618 (Association for Computing Machinery, New York, NY, USA, 2017). Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security Ghorbani, A. & Zou, J. Data shapley: Equitable valuation of data for machine learning. In (pp. 2242-2251) (2019). International Conference on Machine Learning Reconhecimento Este trabalho foi apoiado pelo UK Research and Innovation London Medical Imaging & Artificial Intelligence Centre for Value-Based Healthcare, pelo Wellcome/EPSRC Centre for Medical Engineering (WT203148/Z/16/Z), pelo Wellcome Flagship Programme (WT213038/Z/18/Z), pelo Intramural Research Programme do National Institutes of Health (NIH) Clinical Center, pelo National Cancer Institute do NIH sob o número de prêmio U01CA242871, pelo National Institute of Neurological Disorders and Stroke do NIH sob o número de prêmio R01NS042645, bem como pelo Helmholtz Initiative and Networking Fund (projecto “Trustworthy Federated Data Analytics”) e o programa PRIME do German Academic Exchange Service (DAAD) com fundos do Ministério Federal da Educação e Pes This paper is under CC by 4.0 Deed (Attribution 4.0 International) license. 
available on nature Este documento é Licença CC by 4.0 Deed (Attribution 4.0 International). Disponível na Natureza