
Research Suggests AI Models Can Deliver More Accurate Diagnoses Without Discrimination

by Demographic, December 31st, 2024

Too Long; Didn't Read

Researchers propose positive-sum fairness as a way to improve AI in healthcare, showing that larger disparities in performance aren't harmful as long as they benefit all groups. This approach emphasizes equity and better outcomes for each group, ensuring no subgroup is disadvantaged.
  1. Abstract and Introduction

  2. Related work

  3. Methods

    3.1 Positive-sum fairness

    3.2 Application

  4. Experiments

    4.1 Initial results

    4.2 Positive-sum fairness

  5. Conclusion and References

5 Conclusion

In this paper, we presented the notion of positive-sum fairness and argued that larger disparities are not necessarily harmful, as long as they do not come at the expense of any specific subgroup's performance. We analyzed the general performance, standard fairness, and positive-sum fairness of four models, each leveraging sensitive attributes in a different way.
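As a rough illustration of this criterion (not the exact formulation used in the paper), the sketch below assumes we have a per-subgroup performance score such as AUC for a baseline model and for a candidate model; the hypothetical `is_positive_sum_fair` helper accepts the candidate only if no subgroup performs worse than it did under the baseline, even when the between-group gap widens.

```python
from typing import Dict

def is_positive_sum_fair(
    baseline_scores: Dict[str, float],
    candidate_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Loose positive-sum check: no subgroup's score may drop below its
    baseline value by more than `tolerance`, regardless of how the
    disparity between subgroups changes."""
    return all(
        candidate_scores[group] >= baseline_scores[group] - tolerance
        for group in baseline_scores
    )

# Hypothetical per-group AUCs: the gap grows from 0.02 to 0.05, but every
# group improves, so the larger disparity is not considered harmful here.
baseline = {"group_a": 0.82, "group_b": 0.80}
candidate = {"group_a": 0.88, "group_b": 0.83}
print(is_positive_sum_fair(baseline, candidate))  # True
```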


Our study highlights the need for a nuanced understanding of fairness metrics and their implications in real-world applications. Incorporating medical knowledge is crucial for using sensitive information appropriately and evaluating fairness accurately, particularly when models show large performance disparities.


Whereas traditional methods often aim for equality, positive-sum fairness focuses on equity, pushing each group toward its highest possible performance level. This can lead to better overall outcomes, as it encourages addressing the specific needs and challenges of each group without diminishing the quality of care for others. However, because it is defined as an optimization problem, it can also have unintended side effects: it may inadvertently prioritize larger or better-represented groups by concentrating effort on the groups with the greatest impact on overall performance rather than on those with the most critical needs. It should therefore be noted that meeting the positive-sum fairness criterion alone does not guarantee that a model is fair from an egalitarian perspective, and using this notion in conjunction with other metrics gives a more holistic understanding of a model's fairness.
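To make the point about combining criteria concrete, here is a purely illustrative snippet (hypothetical scores, not results from the paper) that reports a standard disparity measure, the largest between-group gap, alongside the positive-sum check, showing that the two views can disagree and are best read together.

```python
def max_performance_gap(scores: dict) -> float:
    """Standard egalitarian-style view: largest gap between any two groups."""
    return max(scores.values()) - min(scores.values())

baseline = {"group_a": 0.82, "group_b": 0.80}
candidate = {"group_a": 0.88, "group_b": 0.83}

# Positive-sum view: is every group at least as well off as under the baseline?
no_group_harmed = all(candidate[g] >= baseline[g] for g in baseline)

# The disparity widens (0.02 -> 0.05) yet no group is harmed: a model can
# fail an equality-of-performance criterion while satisfying positive-sum
# fairness, which is why the two should be reported together.
print(f"baseline gap = {max_performance_gap(baseline):.2f}")
print(f"candidate gap = {max_performance_gap(candidate):.2f}")
print(f"no group harmed = {no_group_harmed}")
```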


Because positive-sum fairness is a relative measure, it requires a baseline. Further work in this area would include developing a more robust baseline or adapting the approach to remove the need for one. It would also be worthwhile to compare models tested out of domain, to include other sensitive attributes such as sex and age, and to account for confounding factors.


Disclosure of Interests. The authors declare that there are no conflicts of interest regarding the publication of this paper.



Authors:

(1) Samia Belhadj∗, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(2) Sanguk Park [0009-0005-0538-5522]*, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(3) Ambika Seth, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(4) Hesham Dar [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]);

(5) Thijs Kooi [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]).


This paper is available on arXiv under a CC BY-NC-SA 4.0 license.