How AI Models Could Detect Lung Conditions Fairly

by Demographic, December 31st, 2024

Too Long; Didn't Read

Researchers find that M2 improves overall performance without harming any subgroup, while M4 enhances fairness at the cost of some subgroup performance.
  1. Abstract and Introduction

  2. Related work

  3. Methods

    3.1 Positive-sum fairness

    3.2 Application

  4. Experiments

    4.1 Initial results

    4.2 Positive-sum fairness

  5. Conclusion and References

4.2 Positive-sum fairness


A classifier with exactly the same overall performance and the same performance for every protected subgroup (race) as the baseline M1 would sit at coordinate (0,0). A classifier with a negative x-coordinate has lower overall performance than M1, and a classifier with a negative y-coordinate has at least one protected subgroup with a lower AUROC than M1 (that is, at least one subgroup was negatively impacted by the changes made to the baseline model).
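The coordinate system described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name, the subgroup labels, and the AUROC values in the usage note are hypothetical. The text only states that a negative y-coordinate means at least one subgroup has a lower AUROC than M1, so taking y as the smallest per-subgroup AUROC change is an assumption that happens to satisfy that property.

```python
from typing import Dict, Tuple


def positive_sum_coordinates(
    baseline_overall: float,
    baseline_by_group: Dict[str, float],
    model_overall: float,
    model_by_group: Dict[str, float],
) -> Tuple[float, float]:
    """Return (x, y) for a candidate model relative to a baseline.

    x: change in overall AUROC versus the baseline.
    y: smallest per-subgroup AUROC change (ASSUMPTION: the source only
       says that y < 0 means at least one subgroup was harmed).
    """
    # Overall performance gain (positive) or loss (negative).
    x = model_overall - baseline_overall
    # Worst-case subgroup change: negative as soon as one subgroup loses AUROC.
    y = min(
        model_by_group[group] - baseline_by_group[group]
        for group in baseline_by_group
    )
    return x, y
```

With made-up numbers, calling `positive_sum_coordinates(0.80, {"A": 0.82, "B": 0.76}, 0.83, {"A": 0.86, "B": 0.77})` gives a point with both coordinates positive, which corresponds to the region where overall performance improves and no subgroup is harmed.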


For lung lesions, Figure 3b shows M2 on the positive side of both the x and y axes, meaning that performance improved without harming any subgroup's performance. This holds even though Figure 3a shows a decrease in fairness (a larger disparity between the most advantaged and least advantaged subgroups) for M2 compared with M1. This matches the previous conclusion that the larger performance gap between protected subgroups for M2 compared with M1 cannot be considered harmful, as every protected subgroup's performance individually increased.


Fig. 3: We compare two fairness vs. performance frameworks side by side. In figure (a), we compute both the performance (AUROC) and the fairness (as 1 - the difference in AUROC between the most and least advantaged groups) of the 4 models per lesion. In figure (b), we show the difference in overall performance and in performance per protected subgroup between the 3 improved classifiers and the baseline M1. The x axis compares the performance of each improved classifier with the baseline, and the y axis shows whether at least one protected subgroup has been harmed by the modifications made to the baseline classifier.


On the other hand, for lung lesions, model M4 improved fairness (a smaller disparity between the most advantaged and least advantaged subgroups), as shown in Figure 3a. However, Figure 3b shows that M4 has a negative y-coordinate, meaning that at least one subgroup was harmed in the process of achieving a smaller disparity between protected subgroups.
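To make the contrast concrete, the sketch below computes the fairness measure from the Fig. 3 caption (1 minus the AUROC gap between the most and least advantaged groups) and lists the subgroups whose AUROC dropped relative to the baseline. The function names, subgroup labels, and AUROC values are hypothetical and chosen only to illustrate how a model can look fairer by this measure while still harming a subgroup; they are not results from the paper.

```python
from typing import Dict, List


def fairness_score(auroc_by_group: Dict[str, float]) -> float:
    """Fairness as 1 - (best subgroup AUROC - worst subgroup AUROC)."""
    values = list(auroc_by_group.values())
    return 1.0 - (max(values) - min(values))


def harmed_groups(
    baseline_by_group: Dict[str, float],
    model_by_group: Dict[str, float],
) -> List[str]:
    """Subgroups whose AUROC is lower than under the baseline."""
    return [
        group
        for group in baseline_by_group
        if model_by_group[group] < baseline_by_group[group]
    ]


# Hypothetical AUROC values, for illustration only (not from the paper).
baseline = {"A": 0.85, "B": 0.75}
candidate = {"A": 0.80, "B": 0.78}

# The candidate narrows the gap between subgroups, so its score is higher...
fairer = fairness_score(candidate) > fairness_score(baseline)
# ...but subgroup "A" performs worse than it did under the baseline.
harmed = harmed_groups(baseline, candidate)
```

In this made-up case `fairer` is true while `harmed` is non-empty, which is the pattern described for M4: a smaller disparity obtained partly by lowering one subgroup's performance.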


Authors:

(1) Samia Belhadj*, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(2) Sanguk Park [0009-0005-0538-5522]*, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(3) Ambika Seth, Lunit Inc., Seoul, Republic of Korea ([email protected]);

(4) Hesham Dar [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]);

(5) Thijs Kooi [0009-0003-6458-2097], Lunit Inc., Seoul, Republic of Korea ([email protected]).


This paper is available on arXiv under a CC BY-NC-SA 4.0 license.