Methods
In this paper, we presented the notion of positive-sum fairness and argued that larger disparities are not necessarily harmful, as long as they do not come at the expense of any subgroup's performance. We analyzed the overall performance, standard fairness, and positive-sum fairness of four models, each leveraging sensitive attributes in a different way.
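As one possible reading of the criterion above, the sketch below compares per-subgroup scores (for example, AUC) of a candidate model against a baseline. The candidate passes if no subgroup's score drops, up to a hypothetical tolerance `tol`, and the standard disparity (best minus worst subgroup score) is reported alongside for both models. The function name `positive_sum_report`, the subgroup labels, and the scores are illustrative assumptions, not the authors' implementation or results.

```python
from typing import Dict

Scores = Dict[str, float]


def positive_sum_report(baseline: Scores, candidate: Scores, tol: float = 0.0) -> Dict[str, object]:
    """Hypothetical helper: compare a candidate's per-subgroup scores to a baseline's."""
    if baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same subgroups")
    # Per-subgroup change in performance relative to the baseline.
    deltas = {g: candidate[g] - baseline[g] for g in baseline}
    return {
        "deltas": deltas,
        # Positive-sum condition (as read here): no subgroup loses more than `tol`.
        "positive_sum_fair": all(d >= -tol for d in deltas.values()),
        # Standard disparity for each model: best minus worst subgroup score.
        "baseline_gap": max(baseline.values()) - min(baseline.values()),
        "candidate_gap": max(candidate.values()) - min(candidate.values()),
    }


report = positive_sum_report(
    baseline={"group_a": 0.80, "group_b": 0.75},
    candidate={"group_a": 0.88, "group_b": 0.76},
)
print(report["positive_sum_fair"], round(report["baseline_gap"], 2), round(report["candidate_gap"], 2))
```

In this made-up example the gap widens from 0.05 to 0.12, yet no subgroup is worse off, so the candidate passes the check; a purely equality-based criterion would instead flag the wider gap.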
Our study highlights the need for a nuanced understanding of fairness metrics and their implications in real-world applications. Incorporating medical knowledge is crucial both when using sensitive information and when evaluating fairness, particularly in cases where models show a large performance disparity.
Whereas traditional methods often aim for equality, positive-sum fairness focuses on equity, pushing each group to achieve its highest possible performance level. This can lead to better overall outcomes, as it encourages addressing the specific needs and challenges of each group without diminishing the quality of care for others. However, because it is defined as an optimization problem, it can also have unintended side effects: it may inadvertently prioritize larger or better-represented groups by concentrating effort on the groups with the greatest impact on overall performance rather than on those with the most critical needs. Therefore, meeting the positive-sum fairness criterion alone does not ensure that a model is fair from an egalitarian perspective, and using this notion in conjunction with other metrics can give a more holistic understanding of a model's fairness.
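The prioritization risk described above can be illustrated with a toy model-selection step. The sketch keeps only candidates that leave no subgroup worse off than the baseline, then picks the one with the best overall score. As an assumption for illustration, overall performance is approximated by a size-weighted mean of subgroup scores (pooled metrics such as AUC are not, in general, such a mean). All function names, group names, sizes, and scores are hypothetical.

```python
from typing import Dict, Optional

Scores = Dict[str, float]


def leaves_no_group_worse(baseline: Scores, candidate: Scores) -> bool:
    # Positive-sum condition as read from the text: no subgroup loses performance.
    return all(candidate[g] >= baseline[g] for g in baseline)


def size_weighted_score(scores: Scores, sizes: Dict[str, int]) -> float:
    # Illustrative stand-in for overall performance (an assumption, see lead-in).
    total = sum(sizes.values())
    return sum(scores[g] * sizes[g] for g in scores) / total


def disparity_gap(scores: Scores) -> float:
    # Standard disparity: best minus worst subgroup score.
    return max(scores.values()) - min(scores.values())


def select_by_overall(
    baseline: Scores, candidates: Dict[str, Scores], sizes: Dict[str, int]
) -> Optional[str]:
    # Keep only positive-sum-fair candidates, then maximise the overall score.
    admissible = [name for name, s in candidates.items() if leaves_no_group_worse(baseline, s)]
    if not admissible:
        return None
    return max(admissible, key=lambda name: size_weighted_score(candidates[name], sizes))


sizes = {"majority": 900, "minority": 100}
baseline = {"majority": 0.80, "minority": 0.70}
candidates = {
    "majority_boost": {"majority": 0.84, "minority": 0.70},
    "minority_boost": {"majority": 0.80, "minority": 0.73},
}

for name, scores in candidates.items():
    print(name, round(size_weighted_score(scores, sizes), 3), round(disparity_gap(scores), 2))
print("selected:", select_by_overall(baseline, candidates, sizes))
```

Both candidates pass the positive-sum check, but the size-weighted objective selects `majority_boost` (about 0.826 versus 0.793), even though it leaves the minority group unchanged and its gap (0.14) is twice that of `minority_boost` (0.07). Reporting a disparity metric alongside the positive-sum verdict makes this trade-off visible.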
As positive-sum fairness is a relative measure, it requires a baseline. Further work in this area would include developing a more robust baseline or adapting the approach to remove the need for one. It would also be worthwhile to compare models tested out of domain, include other sensitive attributes such as sex and age, and take confounding factors into account.
Disclosure of Interests. The authors declare that there are no conflicts of interest regarding the publication of this paper.
Authors:
(1) Samia Belhadj*, Lunit Inc., Seoul, Republic of Korea;
(2) Sanguk Park [0009-0005-0538-5522]*, Lunit Inc., Seoul, Republic of Korea;
(3) Ambika Seth, Lunit Inc., Seoul, Republic of Korea;
(4) Hesham Dar [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea;
(5) Thijs Kooi [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea.
This paper is available on arXiv under a CC BY-NC-SA 4.0 license.