Mutual Refinement Distillation for Multimodal Emotion Recognition: Interactive Learning and Reverse Curriculum for Complex Sample Classification

With the rapid advancement of speech emotion recognition, the transition from unimodal to multimodal approaches has become inevitable. However, multimodal methods introduce new challenges, particularly classification ambiguity on complex samples, that unimodal approaches do not face. To address this, we propose a Mutual Refinement Distillation (MRD) method with three key components: (1) Modal Interaction Calibration, which enhances classification accuracy on complex samples; (2) Interactive Learning Constraints, which mitigate overfitting; and (3) Reverse Curriculum Learning, which further improves model robustness. Experiments on the MELD and IEMOCAP datasets demonstrate that our approach outperforms state-of-the-art emotion recognition methods, achieving a notable 6.07% improvement over the baseline on IEMOCAP.
