
Research Suggests AI Models Can Deliver More Accurate Diagnoses Without Discrimination

by Demographic, December 31st, 2024

Too Long; Didn't Read

Researchers propose positive-sum fairness as a way to improve AI in healthcare, showing that larger disparities in performance aren't harmful as long as they benefit all groups. This approach emphasizes equity and better outcomes for each group, ensuring no subgroup is disadvantaged.
  1. Abstract and Introduction

  2. Related work

  3. Methods

    3.1 Positive-sum fairness

    3.2 Application

  4. Experiments

    4.1 Initial results

    4.2 Positive-sum fairness

  5. Conclusion and References

5 Conclusion

In this paper, we presented the notion of positive-sum fairness and argued that larger disparities are not necessarily harmful, as long as they do not come at the expense of any specific subgroup's performance. We analyzed the general performance, standard fairness, and positive-sum fairness of four models, each leveraging sensitive attributes in a different way.
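As a rough illustration of this criterion (not the exact formulation used in the paper), the sketch below assumes we have a per-subgroup performance score such as AUC for a baseline model and for a candidate model; the hypothetical `is_positive_sum_fair` helper accepts the candidate only if no subgroup performs worse than it did under the baseline, even when the between-group gap widens.

```python
from typing import Dict

def is_positive_sum_fair(
    baseline_scores: Dict[str, float],
    candidate_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> bool:
    """Loose positive-sum check: no subgroup's score may drop below its
    baseline value by more than `tolerance`, regardless of how the
    disparity between subgroups changes."""
    return all(
        candidate_scores[group] >= baseline_scores[group] - tolerance
        for group in baseline_scores
    )

# Hypothetical per-group AUCs: the gap grows from 0.02 to 0.05, but every
# group improves, so the larger disparity is not considered harmful here.
baseline = {"group_a": 0.82, "group_b": 0.80}
candidate = {"group_a": 0.88, "group_b": 0.83}
print(is_positive_sum_fair(baseline, candidate))  # True
```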


Our study highlights the need for a nuanced understanding of fairness metrics and their implications in real-world applications. Incorporating medical knowledge is crucial for using sensitive information appropriately and evaluating fairness accurately, particularly when models show large performance disparities.


Whereas traditional methods often aim for equality, positive-sum fairness focuses on equity, pushing each group toward its highest possible performance level. This can lead to better overall outcomes, as it encourages addressing the specific needs and challenges of each group without diminishing the quality of care for others. However, because it is defined as an optimization problem, it can also have unintended side effects: it may inadvertently prioritize larger or better-represented groups by concentrating effort on the groups with the greatest impact on overall performance rather than on those with the most critical needs. It should therefore be noted that meeting the positive-sum fairness criterion alone does not guarantee that a model is fair from an egalitarian perspective, and using this notion in conjunction with other metrics gives a more holistic understanding of a model's fairness.
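To make the point about combining criteria concrete, here is a purely illustrative snippet (hypothetical scores, not results from the paper) that reports a standard disparity measure, the largest between-group gap, alongside the positive-sum check, showing that the two views can disagree and are best read together.

```python
def max_performance_gap(scores: dict) -> float:
    """Standard egalitarian-style view: largest gap between any two groups."""
    return max(scores.values()) - min(scores.values())

baseline = {"group_a": 0.82, "group_b": 0.80}
candidate = {"group_a": 0.88, "group_b": 0.83}

# Positive-sum view: is every group at least as well off as under the baseline?
no_group_harmed = all(candidate[g] >= baseline[g] for g in baseline)

# The disparity widens (0.02 -> 0.05) yet no group is harmed: a model can
# fail an equality-of-performance criterion while satisfying positive-sum
# fairness, which is why the two should be reported together.
print(f"baseline gap = {max_performance_gap(baseline):.2f}")
print(f"candidate gap = {max_performance_gap(candidate):.2f}")
print(f"no group harmed = {no_group_harmed}")
```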


Because positive-sum fairness is a relative measure, it requires a baseline. Further work in this area would include developing a more robust baseline or adapting the approach to remove the need for one. It would also be worthwhile to compare models tested out of domain, to include other sensitive attributes such as sex and age, and to account for confounding factors.


Disclosure of Interests. The authors declare that there are no conflicts of interest regarding the publication of this paper.



Authors:

(1) Samia Belhadj∗, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(2) Sanguk Park [0009-0005-0538-5522]*, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(3) Ambika Seth, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(4) Hesham Dar [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]);

(5) Thijs Kooi [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]).


This paper is available on arXiv under a CC BY-NC-SA 4.0 license.