Class-Adaptive Ensemble-Vote Consistency for Semi-Supervised Text Classification with Imbalanced Data
Semi-supervised text classification (SSL-TC) faces significant hurdles in real-world applications due to the scarcity of labeled data and, more critically, highly imbalanced class distributions. Existing SSL methods often fail to recognize minority classes, degrading overall performance. To address these limitations, we propose Class-Adaptive Ensemble-Vote Consistency (AEVC), a novel semi-supervised learning framework built on a pre-trained language model backbone. AEVC introduces two key components. A Dynamically Weighted Ensemble Prediction (DWEP) module generates robust pseudo-labels by adaptively weighting multiple classification heads according to their class-specific confidence and consistency. A Class-Aware Pseudo-Label Adjustment (CAPLA) mechanism mitigates class imbalance through category-specific pseudo-label filtering, with relaxed thresholds for minority classes, and dynamic class weighting in the unsupervised loss. Extensive experiments on the USB benchmark, including constructed long-tail imbalanced datasets, demonstrate AEVC’s effectiveness. In balanced settings, AEVC consistently outperforms state-of-the-art baselines, reducing error rates relative to MultiMatch; under highly imbalanced conditions, the reduction over MultiMatch is even larger. Ablation studies confirm that both DWEP and CAPLA are indispensable, and human evaluation further validates AEVC’s improved accuracy and reliability on minority-class predictions. AEVC thus offers a robust and effective solution for semi-supervised text classification, particularly in environments characterized by severe class imbalance.
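To make the two mechanisms concrete, the following is a minimal sketch of how class-adaptive pseudo-label filtering and weighted ensemble voting could be realized. It is not the paper's implementation: the function names, the linear threshold schedule, and all parameter values are illustrative assumptions.

```python
import numpy as np

def class_adaptive_thresholds(class_freqs, base_tau=0.95, min_tau=0.6):
    """Relax the confidence threshold for rarer classes (assumed schedule):
    the most frequent class keeps base_tau; rarer classes interpolate
    linearly toward min_tau as their relative frequency shrinks."""
    freqs = np.asarray(class_freqs, dtype=float)
    scale = freqs / freqs.max()              # in (0, 1]; 1 for majority class
    return min_tau + (base_tau - min_tau) * scale

def ensemble_pseudo_labels(head_probs, head_weights, thresholds):
    """Weighted average of per-head softmax outputs; keep only predictions
    whose ensemble confidence clears the class-specific threshold.

    head_probs:   (n_heads, n_samples, n_classes) softmax outputs
    head_weights: (n_heads,) non-negative weights summing to 1
    thresholds:   (n_classes,) per-class confidence cut-offs
    Returns (labels, mask): pseudo-labels and a retention mask.
    """
    # Weighted sum over heads -> (n_samples, n_classes)
    avg = np.tensordot(head_weights, head_probs, axes=1)
    labels = avg.argmax(axis=1)
    conf = avg.max(axis=1)
    mask = conf >= thresholds[labels]        # class-aware filtering
    return labels, mask
```

Under this sketch, a minority-class prediction survives filtering at a lower confidence than a majority-class one, so more minority-class pseudo-labels enter the unsupervised loss.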