A Hybrid Fuzzy–Ensemble Machine Learning Framework for Non-Invasive Prediction of HER2 Status in Breast Cancer
HER2 status determination is a crucial task in breast cancer prognosis and treatment,
yet traditional diagnostic methods such as immunohistochemistry (IHC) and fluorescence in situ
hybridization (FISH) are invasive, time-consuming, and costly. Motivated by the need for scalable
and data-driven predictive approaches, we propose a hybrid machine learning framework that
integrates ensemble learning with fuzzy modeling for HER2 prediction using routinely available
clinical and immunohistochemical data. A dataset comprising 624 breast cancer patients from
Mahdieh Clinic (Kermanshah, Iran) was analyzed, with extensive feature engineering, scaling, and
class balancing applied. We developed an ensemble framework based on tree-based learners (Random
Forest, XGBoost, and LightGBM), combined through ensemble strategies and enhanced using fuzzy
feature representations and decision threshold optimization. The proposed hybrid model achieved
an accuracy of 0.816, an F1-score of 0.814, and an area under the ROC curve (AUC) of 0.862 on
the held-out test set, demonstrating strong discriminative capability and balanced classification
performance. This work highlights the potential of hybrid fuzzy–ensemble learning for
uncertainty-aware predictive analytics in biomedical decision support, aligning with the journal’s focus on
information processes, intelligent systems, and data mining.
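
To make the three ingredients named above concrete (fuzzy feature representations, combination of base-learner outputs, and decision threshold optimization), the following is a minimal, dependency-free sketch. It is not the implementation used in this work. The triangular membership functions, the low/medium/high partition of a feature scaled to [0, 1], probability averaging (soft voting) as the combination strategy, and F1 as the threshold-selection criterion are all illustrative assumptions, as are the function names. The toy probabilities stand in for outputs of the three tree-based learners and are not study data.

```python
from typing import List, Sequence, Tuple


def triangular_membership(x: float, left: float, peak: float, right: float) -> float:
    """Degree (0..1) to which x belongs to a triangular fuzzy set (assumed shape)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x: float) -> Tuple[float, float, float]:
    """Map a feature scaled to [0, 1] onto hypothetical low/medium/high memberships."""
    low = triangular_membership(x, 0.0, 0.0, 0.5)
    medium = triangular_membership(x, 0.0, 0.5, 1.0)
    high = triangular_membership(x, 0.5, 1.0, 1.0)
    return low, medium, high


def soft_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities of several models, per sample."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class; returns 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true: Sequence[int], probs: Sequence[float]) -> Tuple[float, float]:
    """Scan thresholds 0.05..0.95 and return the first one with the highest F1.

    In practice the scan would run on validation data, not on the test set.
    """
    best_t, best_f1 = 0.5, -1.0
    for i in range(1, 20):
        t = i / 20
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy stand-ins for the positive-class probabilities of three base learners.
y_val = [0, 0, 1, 1]
toy_probs = [
    [0.2, 0.4, 0.5, 0.9],
    [0.1, 0.5, 0.6, 0.8],
    [0.3, 0.3, 0.7, 0.7],
]
ensemble_probs = soft_vote(toy_probs)
threshold, f1 = best_f1_threshold(y_val, ensemble_probs)
```

In this sketch the fuzzy memberships would be appended to the original feature vector before training, so each base learner sees both the crisp value and its graded low/medium/high representation. The tuned threshold then replaces the default cut-off of 0.5 when converting averaged probabilities into HER2-positive or HER2-negative predictions.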