Software Unfairness Detection in Machine Learning-based Systems: A Systematic Mapping Study
Machine learning-based systems are increasingly deployed in high-stakes domains such as healthcare, finance, law, and e-commerce, where their predictions directly influence critical decisions. Although these systems offer powerful data-driven support, they also raise serious concerns about fairness, bias, and discrimination. As a result, detecting and addressing unfairness in machine learning software has become a central research challenge. This study presents a systematic mapping of research on software unfairness detection in machine learning systems, with the aim of consolidating existing fairness definitions, identifying major problem types, examining testing approaches, reviewing commonly used datasets, and highlighting open research gaps. A structured search was conducted across five major digital libraries and additional sources, covering publications from 2010 to 2025. From 1,805 initially identified records, 67 primary studies met the inclusion and quality assessment criteria. The findings show that research activity has grown sharply since 2019, peaking in 2022. Most studies appeared in conference proceedings, followed by journals and workshops. The literature addresses several themes, including analysis of existing fairness methods, bias mitigation strategies, testing techniques, and evaluation frameworks. Fairness testing was reported at the unit, integration, and system levels, with integration-level testing the most common. Frequently used datasets include COMPAS, Adult Census Income, and German Credit. Widely adopted tools such as IBM AI Fairness 360, Themis, and Aequitas were also identified. Overall, the mapping highlights progress in fairness research while emphasizing the need to integrate fairness more deeply into practical machine learning development.
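As context for the fairness definitions the mapping consolidates, the following is a minimal sketch of one widely used group-fairness metric, statistical parity difference, computed on synthetic data. The function name and the data are illustrative only; libraries such as IBM AI Fairness 360 provide production implementations of this and related metrics.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(favorable | unprivileged) - P(favorable | privileged).

    A value of 0 indicates parity; the sign shows which group
    receives favorable predictions more often. Here group == 1
    denotes the privileged group (an illustrative convention).
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_unpriv = y_pred[group == 0].mean()
    rate_priv = y_pred[group == 1].mean()
    return rate_unpriv - rate_priv

# Illustrative toy data: 1 = favorable prediction.
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(y_pred, group))  # 0.5
```

In this toy example the unprivileged group receives favorable predictions at a rate of 0.75 versus 0.25 for the privileged group, so the metric flags a disparity of 0.5; fairness testing tools apply checks of this kind automatically across many protected attributes.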