A New Privacy-First AI Predicts COVID Severity Using X-Rays and Medical Records

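Authors: Ittai Dayan, Holger R. Roth, Aoxiao Zhong, Ahmed Harouni, Amilcare Gentili, Anas Z. Abidin, Andrew Liu, Anthony Beardsworth Costa, Bradford J. Wood, Chien-Sung Tsai, Chih-Hung Wang, C. K. Lee, Daguang Xu, Dufan Wu, Eddie Huang, Felipe Campos Kitamura, Griffin Lacey, Gustavo César de Antônio Corradi, Gustavo Nino, Hirofumi Obinata, Hui Ren, Jason C. Crane, Jesse Tetreault, Jiahui Guan, John W. Garrett, Joshua D. Kaggie, Jung Gil Park, Keith Dreyer, Krishna Juluru, Kristopher Kersten, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Marius George Linguraru, Masoom A. Haider, Meena AbdelMaseeh, Nicola Rieke, Pablo F. Damasceno, Pedro Mario Cruz e Silva, Pochuan Wang, Sheng Xu, Shuichi Kawano, Sira Sriswasdi, Soo Young Park, Thomas M. Grist, Varun Buch, Watsamon Jantarabenjakul, Weichung Wang, Won Young Tak, Xiang Li, Xihong Lin, Young Joon Kwon, Abood Quraini, Andrew Feng, Andrew N. Priest, Baris Turkbey, Benjamin Glicksberg, Bernardo Bizzo, Byung Seok Kim, Carlos Tor-Díez, Chia-Cheng Lee, Chia-Jung Hsu, Chin Lin, Christopher P. Hess, Colin Compas, Deepeksha Bhatia, Eric K. Oermann, Evan Leibovitz, Hisashi Sasaki, Hitoshi Mori, Isaac Yang, Jae Ho Sohn, Krishna Nand Keshava Murthy, Li-Chen Fu, Matheus Ribeiro Furtado de Mendonça, Mike Fralick, Min Kyu Kang, Mohammad Adil, Natalie Gangai, Peerapon Vateekul, Pierre Elnajjar, Sarah Hickman, Sharmila Majumdar, Shelley L. McLeod, Sheridan Reed, Stefan Gräf, Stephanie Harmon, Tatsuya Kodama, Tony Mazzulli, Vitor Lima de Lavor, Yothin Rakvongthai, Yu Rim Lee, Yuhong Wen, Fiona J. Gilbert, Mona G. Flores, Quanzheng Li

Abstract

Federated learning (FL) is a method for training artificial intelligence (AI) models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing. Here we used data from 20 institutes across the globe to train an FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays. EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation to the emergency room, and it provided a 16% improvement in average AUC measured across all participating sites.

Main

Faced with the COVID-19 pandemic crisis, the scientific, academic, medical and data science communities came together to rapidly assess novel paradigms in AI that are fast and secure, and that can potentially incentivize data sharing and model training and testing without the privacy and data ownership hurdles of conventional collaborations. Healthcare providers, researchers and industry have pivoted to address the unmet and critical clinical needs created by the crisis, with remarkable results. Clinical trial recruitment has been expedited and facilitated by national regulatory bodies and an international spirit of cooperation. The data analytics and AI disciplines have always fostered open and collaborative approaches, embracing concepts such as open-source software, reproducible research, data repositories and making anonymized datasets publicly available. The pandemic has emphasized the need to conduct data collaborations expeditiously, empowering the clinical and scientific communities when responding to rapidly evolving and widespread global challenges.

A concrete example of these types of collaboration is our previous work on an AI-based SARS-CoV-2 clinical decision support (CDS) model. This CDS model was developed at Mass General Brigham (MGB) and was validated on data from multiple health systems. Chest X-ray (CXR) was selected as the imaging input because it is widely available and is commonly indicated by guidelines such as those provided by the ACR, the Fleischner Society, the WHO, national thoracic societies, national health ministry COVID handbooks and radiological societies across the world. The output of the CDS model was a score, termed CORISK, that corresponds to oxygen support requirements and that could aid frontline clinicians in triaging patients. Healthcare providers are known to prefer models that have been validated on their own data. To date, most AI models, including the aforementioned CDS model, have been trained and validated on 'narrow' data that often lack diversity, potentially resulting in overfitting and lower generalizability. This can be mitigated by training with diverse data from multiple sites without centralizing the data, using methods such as transfer learning or FL. FL is a method used to train AI models on disparate data sources without the data being transported or exposed outside their original location.

Federated learning supports the rapid launch of centrally orchestrated experiments, with improved traceability of data and assessment of algorithmic changes and their impact. One approach to FL, called client-server, sends an 'untrained' model to other servers ('nodes') that conduct partial training tasks, in turn sending the results back to be merged in the central ('federated') server.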
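Governance of data for FL is maintained locally, alleviating privacy concerns, with only model weights or gradients communicated between client sites and the federated server. FL has already shown promise in recent medical imaging applications, including in COVID-19 analysis. A notable example is a mortality prediction model for patients infected with SARS-CoV-2 that uses clinical features, albeit limited in terms of number of modalities and scale.

Our objective was to develop a generalizable model that could assist in triaging patients. We theorized that the CDS model could be federated successfully, given its use of data inputs that are relatively common in clinical practice and that do not rely on operator-dependent assessments of patient condition (such as clinical impressions or reported symptoms). Our hypothesis was that EXAM would perform better than local models and would generalize better across healthcare systems.

Results

The EXAM model architecture

The EXAM model is based on the CDS model mentioned above. In total, 20 features (19 from the EMR and one CXR) were used as input to the model. The outcome (that is, 'ground truth') labels were assigned based on the patient's oxygen therapy after 24- and 72-h periods from initial admission to the emergency department (ED).

The outcome labels of patients were set to 0, 0.25, 0.50 and 0.75 depending on the most intensive oxygen therapy the patient received in the prediction window. The oxygen therapy categories were, respectively, room air (RA), low-flow oxygen (LFO), high-flow oxygen (HFO)/noninvasive ventilation (NIV) or mechanical ventilation (MV). For EMR features, only the first values captured in the ED were used, and data preprocessing included deidentification, missing value imputation and normalization to zero mean and unit variance.

The model therefore fuses information from both EMR and CXR features, using a 34-layer convolutional neural network (ResNet34) to extract features from a CXR and a Deep & Cross network to concatenate these features with the EMR features (for more details, see Methods). The model output is a risk score, termed the EXAM score, which is a continuous value in the range 0–1 for each of the 24- and 72-h predictions corresponding to the labels described above.

Federating the model

The EXAM model was trained using a cohort of 16,148 cases, making it not only among the first FL models for COVID-19 but also a very large, multicontinent development project in clinically relevant AI (Fig. 1a,b). Data between sites were not harmonized before extraction and, reflecting real-life clinical informatics circumstances, a meticulous harmonization of the data input was not conducted by the authors (Fig. 1c,d).

Fig. 1: a, World map indicating the 20 different client sites contributing to the EXAM study. b, Number of cases contributed by each institution or site (client 1 represents the site contributing the largest number of cases). c, Chest X-ray intensity distribution at each client site. d, Age of patients at each client site, showing minimum and maximum ages (asterisks), mean age (triangles) and standard deviation (horizontal bars).

We compared locally trained models with the global FL model on each client's test data. Training the model through FL resulted in a significant performance improvement (P ≪ 1 × 10^-3, Wilcoxon signed-rank test) of 16% (as defined by average AUC when running the model on the respective local test sets: from 0.795 to 0.920, or 12.5 percentage points) (Fig. 2a). It also resulted in a 38% improvement in generalizability (as defined by average AUC when running the model on all test sets: from 0.667 to 0.920, or 25.3 percentage points) of the best global model for prediction of 24-h oxygen treatment, compared with models trained only on a site's own data (Fig. 2b). For prediction of 72-h oxygen treatment, training of the best global model resulted in an average performance improvement of 18% compared with locally trained models, while generalizability of the global model improved on average by 34% (Extended Data Fig. 1). The stability of our results was validated by repeating local and FL training three times on different randomized data splits.

Fig. 2: a, Performance on each client's test set in prediction of 24-h oxygen treatment for models trained on local data only (Local) versus that of the best global model available on the server (FL (gl. best)). b, Generalizability (average performance on other sites' test data, as represented by average AUC) as a function of a client's dataset size (no. of cases). The green horizontal line denotes the generalizability performance of the best global model. Client 14 had only cases with RA treatment, so the evaluation metric (AUC) was not applicable there (see Methods); data for client 14 were also excluded from the computation of average generalizability in the local models.

Local models that were trained using unbalanced cohorts (for example, mostly mild cases of COVID-19) markedly benefited from the FL approach, with a substantial improvement in average AUC prediction performance for categories with only a few cases. This was evident at client site 16 (an unbalanced dataset), where most patients experienced mild disease severity and only a few had severe disease (Fig. 3a and Extended Data Fig. 2). More importantly, the generalizability of the FL model was considerably increased over that of the locally trained model.

Fig. 3: a, ROC at client site 16, with unbalanced data and mostly mild cases.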
b, ROC of the local model at client site 12 (a small dataset), mean ROC of models trained on larger datasets corresponding to the five client sites in the Boston area (1, 4, 5, 6, 8) and ROC of the best global model in prediction of 72-h oxygen treatment for different thresholds of EXAM score (left, middle, right). The mean ROC is calculated based on five locally trained models while the gray area denotes the ROC standard deviation. ROCs for three different cutoff values (t) of the EXAM risk score are shown. Pos and neg denote the number of positive and negative cases, respectively, as defined by this range of EXAM score.

In the case of client sites with relatively small datasets, the best FL model markedly outperformed not only the local model but also those trained on larger datasets from five client sites in the Boston area of the USA (Fig. 3b).

The global model performed well in predicting oxygen needs at 24/72 h in patients both COVID positive and negative (Extended Data Fig. 3).

Validation at independent sites

Following initial training, EXAM was subsequently tested at three independent validation sites in Massachusetts, USA: Cooley Dickinson Hospital (CDH), Martha's Vineyard Hospital (MVH) and Nantucket Cottage Hospital (NCH). Results are summarized in Table 2, and the ROC curves and confusion matrices for the largest dataset (from CDH) are shown in Fig. 4. The operating point was set to discriminate between nonmechanical ventilation and mechanical ventilation (MV) treatment (or death). The FL global trained model, EXAM, achieved an average AUC of 0.944 and 0.924 for 24- and 72-h prediction tasks, respectively (Table 2), which exceeded the average performance among sites used in training EXAM. For prediction of MV treatment (or death) at 24 h, EXAM achieved a sensitivity of 0.950 and specificity of 0.882 at CDH, and a sensitivity of 1.000 and specificity of 0.934 at MVH. NCH did not have any cases with MV/death at 24 h. In regard to 72-h MV prediction, EXAM achieved a sensitivity of 0.929 and specificity of 0.880 at CDH, sensitivity of 1.000 and specificity of 0.976 at MVH and sensitivity of 1.000 and specificity of 0.929 at NCH.
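The sensitivity and specificity values reported for the validation sites follow directly from confusion-matrix counts at the chosen operating point (MV or death versus everything else). The following is a minimal Python sketch of that arithmetic. The function name `binary_metrics` and all four counts are hypothetical and chosen for illustration only; they are not the study's actual confusion matrices.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity and false-negative rate from confusion-matrix counts."""
    positives = tp + fn  # cases that truly required MV (or died)
    negatives = tn + fp  # cases that did not
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "false_negative_rate": fn / positives,
    }


# Hypothetical counts for illustration only; NOT taken from the study.
metrics = binary_metrics(tp=13, fn=1, tn=88, fp=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By construction the false-negative rate equals 1 minus the sensitivity, so a sensitivity of 0.929 corresponds to a false-negative rate of about 7.1%.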
Fig. 4: a,b, Performance (ROC) (top) and confusion matrices (bottom) of the EXAM FL model on the CDH dataset for prediction of oxygen requirement at 24 h (a) and 72 h (b). ROCs for three different cutoff values (t) of the EXAM risk score are shown.

For MV at CDH at 72 h, EXAM had a low false-negative rate of 7.1%. Representative failure cases are presented in Extended Data Fig. 4, which shows two false-negative cases from CDH: one case had many missing EMR data features, and the other had a CXR with a motion artifact and some missing EMR features.

Use of differential privacy

A primary motivation for healthcare institutes to use FL is to preserve the security and privacy of their data, as well as to adhere to data compliance measures. FL still carries potential risks, such as model inversion or even the reconstruction of training images from the model gradients themselves. To counter these risks, security-enhancing measures were used to mitigate risk in the event of data 'interception' during site-server communication. We experimented with techniques to avoid interception of FL data, and added a security feature that we believe could encourage more institutions to use FL. We thus validated previous findings showing that partial weight sharing, and other differential privacy techniques, can successfully be applied in FL. Through investigation of a partial weight-sharing scheme, we showed that models can reach a comparable performance even when only 25% of weight updates are shared (Extended Data Fig. 5).

Discussion

This study features a large, real-world healthcare FL study in terms of number of sites and number of data points used. We believe that it provides a powerful proof-of-concept of the feasibility of using FL for fast and collaborative development of needed AI models in healthcare. Our study involved multiple sites across four continents and under the oversight of different regulatory bodies, and thus holds the promise of being provided to different regulated markets in an expedited way. The global FL model, EXAM, proved to be more robust and achieved better results at individual sites than any model trained on only local data. We believe that consistent improvement was achieved owing to a larger, but also a more diverse, dataset, the use of data inputs that can be standardized and avoidance of clinical impressions/reported symptoms.
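These factors played an important part in increasing the benefits from this FL approach and its impact on performance, generalizability and, ultimately, the model's usability.

For a client site with a relatively small dataset, two typical approaches could be used for fitting a useful model: one is to train locally with its own data, the other is to apply a model trained on a larger dataset. For sites with small datasets, it would have been virtually impossible to build a performant deep learning model using only their local data. The finding that these two approaches were outperformed on all three prediction tasks by the global FL model indicates that the benefit for client sites with small datasets arising from participation in FL collaborations is substantial. This is probably a reflection of FL's ability to capture more diversity than local training, and to mitigate the bias present in models trained on a homogeneous population. An under-represented population or age group in one hospital/region might be highly represented in another region, such as children who might be differentially affected by COVID-19, including disease manifestations in lung imaging.

The validation results confirmed that the global model is robust, supporting our hypothesis that FL-trained models are generalizable across healthcare systems. They provide a compelling case for the use of predictive algorithms in COVID-19 patient care, and for the use of FL in model creation and testing. By participating in this study, the client sites received access to EXAM for further validation, including at sites that were not part of the EXAM training, ahead of pursuing any regulatory approval or future introduction into clinical care.

More than 200 prediction models to support decision-making in patients with COVID-19 have been published. Unlike the majority of publications, which focused on diagnosis of COVID-19 or prediction of mortality, we predicted oxygen requirements, which have implications for patient management. We also used cases with unknown SARS-CoV-2 status, so the model can provide input to the physician before a result from PCR with reverse transcription (RT–PCR) is received, making it useful in a real-life clinical setting. The model's imaging input is used in common practice, in contrast with models that use chest computed tomography, a non-consensus diagnostic modality. The model's design was constrained to objective predictors, unlike many published studies that leveraged subjective clinical impressions. The data collected reflect varied incidence rates, and thus the 'population momentum' we encountered is more diverse. This implies that the algorithm can be useful in populations with different incidence rates.

Patient cohort identification and data harmonization are not novel issues in research and data science, but they are further complicated when using FL, given the lack of visibility of other sites' datasets. Improvements to clinical information systems are needed to streamline data preparation, leading to better leverage of a network of sites participating in FL. A system that would allow seamless, close-to real-time model inference and results processing would also be of benefit and would 'close the loop' from training to model deployment.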
Because data were not centralized, they are not readily accessible. Given that, any future analysis of the results, beyond what was derived and collected, is limited.

Similar to other machine learning models, EXAM is limited by the quality of the training data. Institutions interested in deploying this algorithm for clinical care need to understand potential biases in the training. For example, the labels used as ground truth in the training of the EXAM model were derived from 24- and 72-h oxygen consumption in the patient; it is assumed that the oxygen delivered to the patient equates to the oxygen need.

Because our data access was limited, we did not have sufficient information to generate detailed statistics on the causes of failure at most sites. However, we did study failure cases from the largest independent test site, CDH, and were able to generate hypotheses that we can test in the future.

In the future, we also intend to investigate the potential for a 'population drift' due to different phases of disease progression. We believe that, owing to the diversity across the 20 sites, this risk may have been mitigated.

A feature that would enhance these kinds of large-scale collaboration is the ability to predict the contribution of each client site towards improving the global FL model. This will help in client site selection, and in prioritization of data acquisition and annotation efforts. The latter is especially important given the high costs and difficult logistics of these large-consortia endeavors, and it will enable these endeavors to capture diversity rather than the sheer quantity of data samples.

Future approaches may incorporate automated hyperparameter searching, neural architecture search and other automated machine learning approaches to find the optimal training parameters for each client site more efficiently.

Known issues of batch normalization (BN) in FL motivated us to fix our base model for image feature extraction. Future work might explore different types of normalization techniques to allow more effective training of AI models in FL when client data are not independent and identically distributed.

Recent works on privacy attacks within the FL setting have raised concerns on data leakage during model training. Meanwhile, protection algorithms remain underexplored and constrained by multiple factors. While differential privacy algorithms show good protection, they may weaken the model's performance. Encryption algorithms, such as homomorphic encryption, maintain performance but may substantially increase message size and training time. A quantifiable way to measure privacy would allow better choices for deciding the minimal privacy parameters necessary while maintaining clinically acceptable performance.
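Following further validation, we envision deployment of the EXAM model in the ED setting as a way to evaluate risk at both the per-patient and population level, and to provide clinicians with an additional reference point when carrying out the frequently difficult task of triaging patients.

Methods

Ethics approval

All procedures were conducted in accordance with the principles for human experimentation as defined in the Declaration of Helsinki and the International Conference on Harmonization Good Clinical Practice guidelines, and were approved by the relevant institutional review boards at the following validation sites (CDH, MVH and NCH) and at the following training sites (MGB, Mass General Hospital (MGH), Brigham and Women's Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital). All eight of these hospitals were covered under MGB's ethics board reference no. 2020P002673, and informed consent was waived by the institutional review board. Participation of the remaining sites was similarly approved by their respective institutional review processes.

MI-CLAIM guidelines for reporting of clinical AI models were followed (Supplementary Note 2).

Study setting

The study included data from 20 institutions (Fig. 1a): MGB, MGH, Brigham and Women's Hospital, Newton-Wellesley Hospital, North Shore Medical Center and Faulkner Hospital; Children's National Hospital in Washington, DC; NIHR Cambridge Biomedical Research Centre; The Self-Defense Forces Central Hospital in Tokyo; National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration; Tri-Service General Hospital in Taiwan; Kyungpook National University Hospital in South Korea; Faculty of Medicine, Chulalongkorn University in Thailand; Diagnosticos da America SA in Brazil; University of California, San Francisco; VA San Diego; University of Toronto; National Institutes of Health in Bethesda, Maryland; University of Wisconsin-Madison School of Medicine and Public Health; Memorial Sloan Kettering Cancer Center in New York; and Mount Sinai Health System in New York.

Data from three independent sites were used for independent validation: CDH, MVH and NCH, all in Massachusetts, USA. These three hospitals had patient population characteristics different from the training sites. The data used for the algorithm validation consisted of patients admitted to the ED at these sites between March 2020 and February 2021, and that satisfied the same inclusion criteria as the data used to train the FL model.

Data collection

The 20 client sites prepared a total of 16,148 cases (both positive and negative) for the purposes of training, validation and testing of the model (Fig. 1b). Medical data were accessed in relation to patients who satisfied the study inclusion criteria. Client sites strived to include all COVID-positive cases from the beginning of the pandemic in December 2019 up to the time they started local training for the EXAM study. All local training had started by 30 September 2020.

A 'case' included a CXR and the requisite data inputs taken from the patient's medical record. The distribution and patterns of CXR image intensity (pixel values) varied greatly among sites owing to a multitude of patient- and site-specific factors, such as different device manufacturers and imaging protocols, as shown in Fig. 1c,d. Patient age and EMR feature distribution varied greatly among sites, as expected owing to the differing demographics between globally distributed hospitals (Extended Data Fig. 6).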
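Patient inclusion criteria

Patient inclusion criteria were: (1) patient presented to the hospital's ED or equivalent; (2) patient had a RT–PCR test performed at any time between presentation to the ED and discharge from the hospital; (3) patient had a CXR in the ED; and (4) patient's record had at least five of the EMR values detailed in Table 1, all obtained in the ED, and the relevant outcomes captured during hospitalization. Of note, the CXR, laboratory results and vitals used were the first available for capture during the visit to the ED. The model did not incorporate any CXR, laboratory results or vitals acquired after leaving the ED.

Model input

In total, 21 EMR features were used as input to the model. The outcome (that is, ground truth) labels were assigned based on patient requirements after 24- and 72-h periods from initial admission to the ED (Table 1).

The distribution of oxygen treatment using different devices at different client sites is shown in Extended Data Fig. 7, which details the device usage at admission to the ED and after 24- and 72-h periods. The difference in dataset distribution between the largest and smallest client sites can be seen in Extended Data Fig. 8.

The number of positive COVID-19 cases, as confirmed by a single RT–PCR test obtained at any time between presentation to the ED and discharge from the hospital, is listed in Supplementary Table 1. Each client site was asked to randomly split its dataset into three parts: 70% for training, 10% for validation and 20% for testing. For both 24- and 72-h outcome prediction models, random splits were generated independently for each of the three repeated local and FL training and evaluation experiments.

EXAM model development

There is wide variation in the clinical course of patients who present to hospital with symptoms of COVID-19, with some experiencing rapid deterioration in respiratory function requiring different interventions to prevent or mitigate hypoxemia. A critical decision made during the evaluation of a patient at the initial point of care, or in the ED, is whether the patient is likely to require more invasive or resource-limited countermeasures or interventions (such as MV or monoclonal antibodies), and should therefore receive a scarce but effective therapy, a therapy with a narrow risk–benefit ratio due to side effects, or a higher level of care, such as admittance to the intensive care unit. In contrast, a patient who is at lower risk of requiring invasive oxygen therapy may be placed in a less intensive care setting such as a regular ward, or even released from the ED for continued self-monitoring at home. EXAM was developed to help triage such patients.

Of note, the model is not approved by any regulatory agency at this time and it should be used only for research purposes.

EXAM score

EXAM was trained using FL; it outputs a risk score (termed EXAM score) similar to CORISK (Extended Data Fig. 9a) and can be used in the same way to triage patients. It corresponds to a patient's oxygen support requirements within two windows, 24 and 72 h, after initial presentation to the ED. Extended Data Fig. 9b illustrates how CORISK and the EXAM score can be used for patient triage.

Chest X-ray images were preprocessed to select the anterior position image and exclude lateral view images, and then scaled to a resolution of 224 × 224. As shown in Extended Data Fig. 9a, the model fuses information from both EMR and CXR features (based on a modified ResNet34 with spatial attention, pretrained on the CheXpert dataset) using the Deep & Cross network. To converge these different data types, a 512-dimensional feature vector was extracted from each CXR image using a pretrained ResNet34 with spatial attention, then concatenated with the EMR features as the input for the Deep & Cross network. We used cross-entropy as the loss function and 'Adam' as the optimizer. The model was implemented using the NVIDIA Clara Train SDK. The average AUC for the classification tasks (≥LFO, ≥HFO/NIV or ≥MV) was calculated and used as the final evaluation metric.

Feature imputation and normalization

A MissForest algorithm was used to impute EMR features. If an EMR feature was completely missing from a client site dataset, the mean value of that feature, calculated exclusively on data from MGB client sites, was used. EMR features were then normalized to zero mean and unit variance.

Details of EMR–CXR data fusion using the Deep & Cross network

To model the interactions of features from EMR and CXR data at the case level, a deep-feature scheme was used based on a Deep & Cross network architecture. Binary and categorical features for the EMR inputs, as well as 512-dimensional image features in the CXR, were transformed into fused dense vectors of real values by embedding and stacking layers. The transformed dense vectors served as input to the fusion framework, which specifically employed a crossing network to enforce fusion among input from different sources. The crossing network performed explicit feature crossing within its layers, by conducting inner products between the original input feature and output from the previous layer, thus increasing the degree of interaction across features. At the same time, two individual classic deep neural networks with several stacked, fully connected feed-forward layers were trained. The final output of our framework was then derived from the concatenation of both classic and crossing networks.

FL details

Arguably the most established form of FL is implementation of the federated averaging algorithm as proposed by McMahan et al. This algorithm can be realized using a client-server setup where each participating site acts as a client. One can think of FL as a method aiming to minimize a global loss function by reducing a set of local loss functions, which are estimated at each site. By minimizing each client site's local loss while also synchronizing the learned client site weights on a centralized aggregation server, one can minimize global loss without needing to access the entire dataset in a centralized location. Each client site learns locally and shares model weight updates with a central server, which aggregates contributions using secure sockets layer encryption and communication protocols (Extended Data Fig. 9c).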
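The aggregation step of federated averaging can be sketched in a few lines. In McMahan et al.'s formulation, the server averages the clients' weights, with each client weighted by the size of its local dataset. The following is a minimal NumPy illustration under stated assumptions: the function name `federated_average`, the toy weight arrays and the client sizes are all hypothetical, and this is not the Clara Train SDK implementation used in the study.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Average per-layer weights across clients, weighted by local dataset size.

    client_weights: one list of np.ndarray per client (same layer shapes for all).
    client_sizes:   number of local training cases per client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum(weights[layer] * (size / total)
            for weights, size in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]


# Toy example with two hypothetical clients and a single one-layer "model".
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [100, 300]
global_weights = federated_average(clients, sizes)
print(global_weights[0])
```

In this toy case the larger client contributes three quarters of the average, so the aggregated layer is [2.5, 3.5]. In a full FL round the server would send these weights back to every client before the next round of local training.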
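A pseudoalgorithm of FL is shown in Supplementary Note 1. In our experiments, we set the number of federated rounds at T = 200, with one local training epoch per round t at each client. The number of clients, K, was up to 20, depending on the network connectivity of clients or the data available for a specific targeted outcome period (24 or 72 h). The value n_k depends on the dataset size at each client k and is used to weigh each client's contributions when aggregating the model weights in federated averaging. During the FL training task, each client site selects its best local model by tracking the model's performance on its local validation set. At the same time, the server determines the best global model based on the average validation scores sent from each client site to the server after each FL round. After FL training finishes, the best local models and the best global model are automatically shared with all client sites and evaluated on their local test data.

The Adam optimizer was used for both local training and FL, with an initial learning rate of 5 × 10^-5 and a stepwise learning rate decay by a factor of 0.5 after every 40 epochs, which is important for the convergence of federated averaging. Random affine transformations, including rotation, translation, shear and scaling, together with random intensity noise and shifts, were applied to the images for data augmentation during training.

Owing to the sensitivity of BN layers when dealing with different clients in a nonindependent and identically distributed setting, we found that the best model performance occurred when keeping the parameters of the pretrained ResNet34 with spatial attention fixed during FL training (that is, using a learning rate of zero for those layers). The Deep & Cross network that combines image features with EMR features does not contain BN layers and hence was not affected by BN instability issues.

In this study we investigated a privacy-preserving scheme that shares only partial model updates between server and client sites. The weight updates were ranked during each iteration by magnitude of contribution, and only a certain percentage of the largest weight updates was shared with the server. Specifically, only updates whose magnitude exceeded a percentile threshold were shared (Extended Data Fig. 5); the threshold was computed from all non-zero gradients, ΔW_k(t), and could differ for each client k in each FL round t. Variations of this scheme could include additional clipping of large gradients or differential privacy schemes that add random noise to the gradients, or even to the raw data, before feeding into the network.

Statistical analysis

We conducted a Wilcoxon signed-rank test to confirm the significance of the observed improvement in performance between the locally trained model and the FL model for the 24- and 72-h time points (Fig. 2 and Extended Data Fig. 1). The null hypothesis was rejected with one-sided P ≪ 1 × 10^-3 in both cases.

Pearson's correlation was used to assess the generalizability (robustness of the average AUC value to other client sites' test data) of locally trained models in relation to respective local dataset size. Only a moderate correlation was observed (r = 0.43, P = 0.035, degrees of freedom (df) = 17 for the 24-h model and r = 0.62, P = 0.003, df = 16 for the 72-h model).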
This indicates that dataset size alone is not the only factor determining a model's robustness to unseen data.

To compare ROC curves from the global FL model and local models trained at different sites (Extended Data Fig. 3), we bootstrapped 1,000 samples from the data and computed the resulting AUCs. We then calculated the difference between the two series and standardized it using the formula D = (AUC1 – AUC2)/s, where D is the standardized difference, s is the standard deviation of the bootstrap differences and AUC1 and AUC2 are the corresponding bootstrapped AUC series. By comparing D with a normal distribution, we obtained the P values illustrated in Supplementary Table 2. The results show that the null hypothesis was rejected with very low P values, indicating the statistical significance of the superiority of FL outcomes. The computation of P values was conducted in R with the pROC library.

Since the model predicts a discrete outcome as a continuous score from 0 to 1, a straightforward calibration evaluation such as a qqplot is not possible. Hence, for a quantified estimate of calibration we quantified discrimination (Extended Data Fig. 10). We conducted one-way analysis of variance (ANOVA) tests to compare local and FL model scores among four ground truth categories (RA, LFO, HFO, MV). The F-statistic, calculated as the variation between the sample means divided by variation within the samples and representing the degree of dispersion among different groups, was used to quantify the models. Our results show that the F-values of five different local sites are 245.7, 253.4, 342.3, 389.8 and 634.8, while that of the FL model is 843.5. Given that larger F-values mean that groups are more separable, the scores from our FL model clearly show a greater dispersion among the four ground truth categories. Furthermore, the P value of the ANOVA test on the FL model is <2 × 10^-16, indicating that the FL prediction scores are statistically significantly different among different prediction classes.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The datasets from the 20 institutes that participated in this study remain under their custody. These data were used for training at each of the local sites and were not shared with any of the other participating institutions or with the federated server, and they are not publicly available. Data from the independent validation sites are maintained by CAMCA, and access can be requested by contacting Q.L.

Code availability

All code and software used in this study are publicly available at NGC. To access, log in as a guest or create a profile then enter one of the URLs below.
The trained models, data preparation guidelines, code for training, validating and testing of the model, readme file, installation guideline and license files are publicly available at NVIDIA NGC: https://ngc.nvidia.com/catalog/models/nvidia:med:clara_train_covid19_exam_ehr_xray

The federated learning software is available as part of the Clara Train SDK: https://ngc.nvidia.com/catalog/containers/nvidia:clara-train-sdk

Alternatively, use this command to download the model: "wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/med/clara_train_covid19_exam_ehr_xray/versions/1/zip"

References

1. Budd, J. et al. Digital technologies in the public-health response to COVID-19. Nat. Med. 26, 1183–1192 (2020).
2. Moorthy, V., Henao Restrepo, A. M., Preziosi, M.-P. & Swaminathan, S. Data sharing for novel coronavirus (COVID-19). Bull. World Health Organ. 98, 150 (2020).
3. Chen, Q., Allot, A. & Lu, Z. Keep up with the latest coronavirus research. Nature 579, 193 (2020).
4. Fabbri, F., Bhatia, A., Mayer, A., Schlotter, B. & Kaiser, J. BCG IT spend pulse: how COVID-19 is shifting tech priorities. https://www.bcg.com/publications/2020/how-covid-19-is-shifting-big-it-spend (2020).
5. Candelon, F., Reichert, T., Duranton, S., di Carlo, R. C. & De Bondt, M. The rise of the AI-powered company in the postcrisis world. https://www.bcg.com/en-gb/publications/2020/business-applications-artificial-intelligence-post-covid (2020).
6. Chao, H. et al. Integrative analysis for COVID-19 patient outcome prediction. Med. Image Anal. 67, 101844 (2021).
7. Zhu, X. et al. Joint prediction and time estimation of COVID-19 developing severe symptoms using chest CT scan. Med. Image Anal. 67, 101824 (2021).
8. Yang, D. et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med. Image Anal. 70, 101992 (2021).
9. Minaee, S., Kafieh, R., Sonka, M., Yazdani, S. & Jamalipour Soufi, G. Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 65, 101794 (2020).
10. COVID-19 Studies from the World Health Organization Database. https://clinicaltrials.gov/ct2/who_table (2020).
11. ACTIV. https://www.nih.gov/research-training/medical-research-initiatives/activ (2020).
12. Coronavirus Treatment Acceleration Program (CTAP). US Food and Drug Administration https://www.fda.gov/drugs/coronavirus-covid-19-drugs/coronavirus-treatment-acceleration-program-ctap (2020).
13. Gleeson, P., Davison, A. P., Silver, R. A. & Ascoli, G. A. A commitment to open source in neuroscience. Neuron 96, 964–965 (2017).
14. Piwowar, H. et al. The state of OA: a large-scale analysis of the prevalence and impact of open access articles. PeerJ 6, e4375 (2018).
15. European Society of Radiology (ESR). What the radiologist should know about artificial intelligence – an ESR white paper. Insights Imaging 10, 44 (2019).
16. Pesapane, F., Codari, M. & Sardanelli, F. Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur. Radiol. Exp. 2, 35 (2018).
17. Price, W. N. 2nd & Cohen, I. G. Privacy in the age of medical big data. Nat. Med. 25, 37–43 (2019).
18. Liang, W. et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern. Med. 180, 1081–1089 (2020).
19. Wynants, L. et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. Brit. Med. J. 369, m1328 (2020).
20. Zhang, L. et al. D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19. J. Thromb. Haemost. 18, 1324–1329 (2020).
21. Sands, K. E. et al. Patient characteristics and admitting vital signs associated with coronavirus disease 2019 (COVID-19)-related mortality among patients admitted with noncritical illness. https://doi.org/10.1017/ice.2020.461 (2020).
22. American College of Radiology. ACR recommendations for the use of chest radiography and computed tomography (CT) for suspected COVID-19 infection. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection (2020).
23. Rubin, G. D. et al. The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society. Radiology 296, 172–180 (2020).
24. World Health Organization. Use of chest imaging in COVID-19. https://www.who.int/publications/i/item/use-of-chest-imaging-in-covid-19 (2020).
25. Jamil, S. et al. Diagnosis and management of COVID-19 disease. Am. J. Respir. Crit. Care Med. 201, 10 (2020).
26. Redmond, C. E., Nicolaou, S., Berger, F. H., Sheikh, A. M. & Patlas, M. N. Emergency radiology during the COVID-19 pandemic: The Canadian Association of Radiologists Recommendations for Practice. Can. Assoc. Radiologists J. 71, 425–430 (2020).
27. Buch, V. et al. Development and validation of a deep learning model for prediction of severe outcomes in suspected COVID-19 Infection. Preprint at https://arxiv.org/abs/2103.11269 (2021).
28. Lyons, C. & Callaghan, M. The use of high-flow nasal oxygen in COVID-19. Anaesthesia 75, 843–847 (2020).
29. Whittle, J. S., Pavlov, I., Sacchetti, A. D., Atwood, C. & Rosenberg, M. S. Respiratory support for adult patients with COVID-19. J. Am. Coll. Emerg. Physicians Open 1, 95–101 (2020).
30. Ai, J., Li, Y., Zhou, X. & Zhang, W. COVID-19: treating and managing severe cases. Cell Res. 30, 370–371 (2020).
31. Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019).
32. Cahan, E. M., Hernandez-Boussard, T., Thadaney-Israni, S. & Rubin, D. L. Putting the data before the algorithm in big data addressing personalized healthcare. NPJ Digit. Med. 2, 78 (2019).
33. Thrall, J. H. et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J. Am. Coll. Radiol. 15, 504–508 (2018).
34. Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).
35. Gao, Y. & Cui, Y. Deep transfer learning for reducing health care disparities arising from biomedical data inequality. Nat. Commun. 11, 5131 (2020).
36. Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
37. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 12 (2019).
38. Ma, C. et al. On safeguarding privacy and security in the framework of federated learning. IEEE Netw. 34, 242–248 (2020).
39. Brisimi, T. S. et al. Federated learning of predictive models from federated Electronic Health Records. Int. J. Med. Inform. 112, 59–67 (2018).
40. Roth, H. R. et al. Federated learning for breast density classification: a real-world implementation. In Proc. Second MICCAI Workshop, DART 2020 and First MICCAI Workshop, DCL 2020 Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning (eds. Albarqouni, S. et al.) Vol. 12,444, 181–191 (Springer International Publishing, 2020).
41. Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
42. Remedios, S. W., Butman, J. A., Landman, B. A. & Pham, D. L. in Federated Gradient Averaging for Multi-Site Training with Momentum-Based Optimizers (eds Remedios, S. W. et al.) (Springer, 2020).
43. Xu, Y. et al. A collaborative online AI engine for CT-based COVID-19 diagnosis. Preprint at https://www.medrxiv.org/content/10.1101/2020.05.10.20096073v2 (2020).
44. Raisaro, J. L. et al. SCOR: A secure international informatics infrastructure to investigate COVID-19. J. Am. Med. Inform. Assoc. 27, 1721–1726 (2020).
45. Vaid, A. et al. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: machine learning approach. JMIR Med. Inform. 9, e24207 (2021).
46. Nino, G. et al. Pediatric lung imaging features of COVID-19: a systematic review and meta-analysis. Pediatr. Pulmonol. 56, 252–263 (2021).
47. Fredrikson, M., Jha, S. & Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proc. 22nd ACM SIGSAC Conference on Computer and Communications Security 1322–1333 https://doi.org/10.1145/2810103.2813677 (2015).
48. Zhu, L., Liu, Z. & Han, S. in Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 14774–14784 (Curran Associates, Inc., 2019).
49. Kaissis, G. A., Makowski, M. R., Rückert, D. & Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2, 305–311 (2020).
50. Li, W. et al. in Privacy-Preserving Federated Brain Tumour Segmentation 133–141 (Springer, 2019).
51. Shokri, R. & Shmatikov, V. Privacy-preserving deep learning. In Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) https://doi.org/10.1109/allerton.2015.7447103 (2015).
52. Li, X. et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
53. Estiri, H. et al. Predicting COVID-19 mortality with electronic medical records. NPJ Digit. Med. 4, 15 (2021).
54. Jiang, G. et al. Harmonization of detailed clinical models with clinical study data standards. Methods Inf. Med. 54, 65–74 (2015).
55. Yang, D. et al. in Searching Learning Strategy with Reinforcement Learning for 3D Medical Image Segmentation https://doi.org/10.1007/978-3-030-32245-8_1 (2019).
56. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learning Res. 20, 1–21 (2019).
57. Yao, Q. et al. Taking human out of learning applications: a survey on automated machine learning. Preprint at https://arxiv.org/abs/1810.13306 (2019).
58. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conf. Machine Learning, PMLR 37, 448–456 (2015).
59. Kaufman, S., Rosset, S. & Perlich, C. Leakage in data mining: formulation, detection, and avoidance. In Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 556–563 (2011).
60. Zhang, C. et al. BatchCrypt: efficient homomorphic encryption for cross-silo federated learning. In Proc. 2020 USENIX Annual Technical Conference, ATC 2020 493–506 (2020).
61. Nvidia NGC Catalog: COVID-19 Related Models. https://ngc.nvidia.com/catalog/models?orderBy=scoreDESC&pageNumber=0&query=covid&quickFilter=models&filters (2020).
62. Marini, J. J. & Gattinoni, L. Management of COVID-19 respiratory distress. JAMA 323, 2329–2330 (2020).
63. Cook, T. M. et al. Consensus guidelines for managing the airway in patients with COVID-19: Guidelines from the Difficult Airway Society, the Association of Anaesthetists the Intensive Care Society, the Faculty of Intensive Care Medicine and the Royal College of Anaesthetist. Anaesthesia 75, 785–799 (2020).
64. Galloway, J. B. et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J. Infect. 81, 282–288 (2020).
65. Kilaru, A. S. et al. Return hospital admissions among 1419 COVID-19 patients discharged from five U.S. emergency departments. Acad. Emerg. Med. 27, 1039–1042 (2020).
66. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/cvpr.2016.90 (2016).
67. Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc. AAAI Conf. Artif. Intell. 33, 590–597 (2019).
68. Wang, R., Fu, B., Fu, G. & Wang, M. Deep & Cross network for Ad Click predictions. In Proc. ADKDD'17 Article no. 12 (2017).
69. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283 (USENIX Association, 2016).
70. NVIDIA Clara Imaging. https://developer.nvidia.com/clara-medical-imaging (2020).
71. Stekhoven, D. J. & Bühlmann, P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118 (2012).
72. McMahan, H., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. http://proceedings.mlr.press/v54/mcmahan17a.html (2017).
73. Hsieh, K., Phanishayee, A., Mutlu, O. & Gibbons, P. B. The non-IID data quagmire of decentralized machine learning. In Proc. 37th International Conf. Machine Learning, PMLR 119 (2020).
74. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).

Acknowledgements

The views expressed in this study are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health and Social Care or any of the organizations associated with the authors. MGB thanks the following individuals for their support: J. Brink, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; M. Kalra, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA; N. Neumark, Center for Clinical Data Science, Massachusetts General Brigham, Boston, MA; T. Schultz, Department of Radiology, Massachusetts General Hospital, Boston, MA; N. Guo, Center for Advanced Medical Computing and Analysis, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA. The Faculty of Medicine, Chulalongkorn University thanks the Ratchadapisek Sompoch Endowment Fund RA (PO) (no. 001/63) for the collection and management of COVID-19-related clinical data and biological specimens for the Research Task Force, Faculty of Medicine, Chulalongkorn University. NIHR Cambridge Biomedical Research Centre thanks A. Priest, who is supported by the NIHR (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust). National Taiwan University MeDA Lab and MAHC and Taiwan National Health Insurance Administration thank the MOST Joint Research Center for AI Technology and All Vista Healthcare; the National Health Insurance Administration, Taiwan; the Ministry of Science and Technology, Taiwan; and the National Center for Theoretical Sciences, Taiwan. The National Institutes of Health (NIH) acknowledges the NIH Medical Research Scholars Program. https://data.ucsf.edu/covid19

This paper is available under the CC BY 4.0 Deed (Attribution 4.0 International) license.