Medical AI gets 66% worse when you use automated labels for training, and the benchmark hides it! [R][P]
A recent paper on fairness in medical image segmentation for breast cancer tumors found that segmentation models perform substantially worse for younger patients.
The common explanation is that higher breast density means harder cases. But that's not it. The bias is qualitative: younger patients have tumors that are larger, more variable, and fundamentally harder to learn from, not just more of the same hard cases.
Another interesting finding: training on automated labels may amplify bias in your model by 40%. Yet the benchmark doesn't show it, due to the ‘biased ruler’ effect, where evaluating against biased labels masks the true performance gap. This highlights the need for clean, unbiased labels for evaluation in medical imaging.
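A toy sketch of the ‘biased ruler’ effect (illustrative only; the masks, numbers, and use of the Dice score are my assumptions, not the paper's setup): a model that reproduces the systematic error of its automated training labels can score perfectly against those same labels, while clean labels reveal the real gap.

```python
import numpy as np

def dice(pred, truth):
    # Dice overlap between two binary masks
    inter = np.logical_and(pred, truth).sum()
    return 2 * inter / (pred.sum() + truth.sum())

# Hypothetical clean ground truth: 1000 pixels, 200 of them tumor
clean = np.zeros(1000, dtype=bool)
clean[:200] = True

# Automated labels systematically under-segment: they miss half the tumor
biased = clean.copy()
biased[100:200] = False

# A model trained on the biased labels tends to reproduce that bias
pred = biased.copy()

print(dice(pred, biased))  # scored with the biased ruler: looks perfect (1.0)
print(dice(pred, clean))   # scored with clean labels: the gap appears (~0.67)
```

Both evaluations use the same predictions; only the ruler changes, which is why a biased benchmark can hide a large true-performance drop.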
Paper – https://arxiv.org/abs/2511.00477 – International Symposium on Biomedical Imaging (ISBI) 2026 (oral)
submitted by /u/ade17_in