M2 improves overall performance: it outperforms M1 in AUROC by leveraging an additional, medically relevant feature. Although it shows a larger disparity between subgroups, this does not equate to harm under positive-sum fairness.
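To make the M2 comparison concrete, here is a minimal sketch of a positive-sum style check. It assumes one reading of the criterion: a wider gap counts as harmful only if some subgroup falls below its baseline performance. The function names, the `tol` parameter, the group labels, and all AUROC values are hypothetical and are not taken from the study.

```python
def disparity(auroc_by_group):
    """Gap between the best- and worst-performing subgroup."""
    values = auroc_by_group.values()
    return max(values) - min(values)


def positive_sum_check(baseline, candidate, tol=0.0):
    """Compare a candidate model with a baseline, subgroup by subgroup.

    Assumption: a change is treated as harmful only when at least one
    subgroup performs worse than it did under the baseline.
    """
    harmed = sorted(g for g in baseline if candidate[g] < baseline[g] - tol)
    return {
        "disparity_change": disparity(candidate) - disparity(baseline),
        "harmed_groups": harmed,
        "harmful": bool(harmed),
    }


# Hypothetical AUROC values, for illustration only.
m1 = {"group_a": 0.80, "group_b": 0.78}
m2 = {"group_a": 0.86, "group_b": 0.80}
result = positive_sum_check(m1, m2)
```

In this made-up case the gap between groups widens, yet both groups improve, so the check does not flag the change as harmful.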
M4 improves fairness specifically for lung lesions and consolidation. Its gradient reversal layer is designed to remove race information from the learned features, which prevents demographic shortcuts from influencing the model's predictions.
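A gradient reversal layer of the kind mentioned for M4 can be sketched in PyTorch as follows. This is a generic illustration, not the authors' implementation: the class name `GradReverse`, the scaling factor `lambd`, and the toy tensors are assumptions. The layer is the identity in the forward pass and negates the gradient in the backward pass, so the feature extractor is pushed to make the adversary's task (here, race prediction) harder.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd  # hypothetical scaling factor for the reversed gradient
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: reversed for x, None for lambd.
        return -ctx.lambd * grad_output, None


# Toy check: stand-in features pass through the reversal into a dummy adversary loss.
features = torch.ones(3, requires_grad=True)
adversary_loss = GradReverse.apply(features, 0.5).sum()
adversary_loss.backward()
grad = features.grad  # gradient reaching the features has its sign flipped
```

In a full model, the reversal would sit between the shared feature extractor and the race-prediction head, while the disease-classification head receives the features directly.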
M3 displays inconsistent results, often performing below the baseline except for pneumonia. Its strategy of maximizing race prediction may intensify the demographic shortcuts the model already exploits.