Integrating Attention Attribution and Pretrained Language Models for Transparent Discriminative Learning
This paper addresses the limited interpretability of discriminative learning and proposes an interpretable discriminative learning method based on attention attribution. Building on the representational power of pretrained language models, the method introduces a multi-head attention mechanism to capture both global and local semantic dependencies and obtain richer feature representations. Attention attribution is then used to compute importance scores for the input features during prediction; the resulting attribution distribution reveals the core semantic cues the model relies on for discrimination and makes the decision process more transparent. The framework consists of input embedding, contextual modeling, attribution feature extraction, and a classification layer, allowing the model to provide clear explanatory paths while maintaining high discriminative accuracy. The method is validated under multiple single-factor sensitivity settings, covering the effects of learning rate, sentence-order disruption, dropout rate, and class imbalance on performance and robustness. Experimental results show that the method remains stable and performs strongly on accuracy, precision, recall, and F1-score, preserving discriminative ability and consistent interpretability across these conditions. Overall, by integrating attention mechanisms with attribution methods, the paper balances performance and interpretability and offers an effective solution for discriminative learning in complex contexts.
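To make the described pipeline concrete, the sketch below shows one way a pretrained encoder, a classification head, and per-token attention attribution can fit together. It is a minimal illustration, not the paper's implementation: the model name ("bert-base-uncased"), the class and function names, and the gradient-times-attention scoring rule are assumptions chosen because they are a common form of attention attribution; the abstract does not specify the exact attribution formula.

```python
# Illustrative sketch only: model name, class names, and the gradient-times-attention
# attribution rule are assumptions, not the paper's exact method.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class AttributionClassifier(nn.Module):
    """Pretrained encoder + classification head that also exposes
    the attention maps needed for attribution."""

    def __init__(self, model_name: str = "bert-base-uncased", num_classes: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name, output_attentions=True)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Keep gradients on the attention maps so they can be attributed after backward().
        for attn in out.attentions:
            attn.retain_grad()
        pooled = out.last_hidden_state[:, 0]            # [CLS] representation
        logits = self.classifier(self.dropout(pooled))
        return logits, out.attentions


def attention_attribution(model, tokenizer, text, target_class=None):
    """Score each input token by attention weight x gradient (one common form of
    attention attribution), aggregated over all layers and heads."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    logits, attentions = model(enc["input_ids"], enc["attention_mask"])
    if target_class is None:
        target_class = logits.argmax(dim=-1).item()
    model.zero_grad()
    logits[0, target_class].backward()                  # gradients w.r.t. the chosen class

    seq_len = enc["input_ids"].shape[1]
    scores = torch.zeros(seq_len)
    for attn in attentions:                             # one (1, heads, seq, seq) tensor per layer
        contrib = (attn.detach() * attn.grad).clamp(min=0)
        scores += contrib.sum(dim=(1, 2))[0]            # attribution received by each token
    scores /= scores.sum() + 1e-12                      # normalise to a distribution

    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return logits.softmax(dim=-1).detach(), list(zip(tokens, scores.tolist()))


if __name__ == "__main__":
    # In practice the classifier head would be fine-tuned before attribution;
    # here it is randomly initialised and the output only demonstrates the mechanism.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AttributionClassifier()
    probs, token_scores = attention_attribution(
        model, tokenizer, "The plot is dull but the acting is superb."
    )
    for tok, s in sorted(token_scores, key=lambda x: -x[1])[:5]:
        print(f"{tok:>12s}  {s:.3f}")
```

The normalised scores form the kind of attribution distribution the abstract refers to: tokens with larger values are the semantic cues the prediction leans on most, which is what gives the classifier an explanatory path alongside its output.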