Unbalanced Data Mining Algorithms from IoT Sensors for Early Cockroach Infestation Prediction in Sewer Systems
Predictive pest management in urban sewer networks offers a sustainable alternative to reactive, biocide-based methods. Using data collected through an IoT architecture and validated with manual inspections across eight manholes over 113 days, we compare ten data mining algorithms, including classical methods (KNN, SVM, decision trees) and advanced ensemble techniques (XGBoost, LightGBM, CatBoost) optimized for imbalanced datasets. In this dataset, pest absence accounts for more than 77% of observations. Gradient boosting models with explicit handling of this class imbalance performed best, achieving a Macro-F1 score above 0.92 and high precision on the minority high-risk class. Explainability analysis using SHAP consistently identified elevated CO₂ concentration as the primary predictor of infestation, indicating that carbon dioxide is the most robust bioindicator of severe Periplaneta americana infestations and that it significantly outperforms conventional environmental variables such as temperature and humidity. Deployed in a real-time monitoring platform, the model generates interpretable heat maps that enable early identification of critical zones and support proactive, localized interventions, optimizing resource use and reducing dependence on biocides. The resulting system is scalable, interpretable, and operationally viable, and it is designed for direct integration into municipal asset management workflows. It shifts pest control from a reactive, labor-intensive process to a data-driven, predictive one and aligns with the Sustainable Development Goals for smart cities.
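To illustrate why Macro-F1 rather than accuracy is the natural headline metric under this degree of imbalance, the following is a minimal sketch in plain Python. It is not the study's pipeline: the three-level label encoding (0 = absent, 1 = low, 2 = high risk), the toy label vector, and the helper names `balanced_class_weights` and `macro_f1` are all assumptions made for illustration. The weighting rule shown (n_samples / (n_classes * class_count)) is the common "balanced" heuristic and is only one of several ways class imbalance can be handled explicitly.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 80% "absent", one "low", one "high" observation.
y_true = [0] * 8 + [1] + [2]
# A degenerate model that always predicts "absent".
y_pred_majority = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred_majority)) / len(y_true)
print("accuracy:", accuracy)
print("macro-F1:", round(macro_f1(y_true, y_pred_majority), 4))
print("class weights:", balanced_class_weights(y_true))
```

On this toy data the always-"absent" predictor reaches high accuracy while its Macro-F1 stays low, because the two minority classes each contribute an F1 of zero. The weights returned by `balanced_class_weights` could be passed as per-sample weights to a gradient boosting library, but the exact mechanism used in the study is not stated here.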