Symmetry-Aware Structured Representation Learning for Unified Multi-Modal Physiological Modeling in Affective State and Preference Inference

Decoding affective states and personal preferences from physiological responses remains a fundamental challenge in affective computing, owing to strong heterogeneity across neural, autonomic, and attentional signals, as well as the coupling between transient emotions and long-term preferences. Most existing methods address these factors independently and lack explicit mechanisms to preserve the intrinsic structural regularities and invariances of physiological affective responses, limiting their applicability in real-world scenarios such as music therapy. In this paper, we propose a symmetry-aware and structured multi-modal physiological modeling framework for joint affective state and preference inference. The framework integrates electroencephalography (EEG), peripheral physiological signals (GSR, BVP, EMG, respiration, and temperature), and eye-movement data (EOG) within a unified temporal modeling paradigm. At its core, a Dynamic Token Feature Extractor (DTFE) converts raw physiological time series into compact token representations without handcrafted features, and explicitly decomposes representation learning into cross-series and intra-series symmetry. These two complementary symmetry dimensions are realized through Cross-Series Intersection (CSI) and Intra-Series Intersection (ISI) mechanisms, enabling structured and interpretable physiological representations. A hierarchical cross-modal fusion strategy further integrates modality-level tokens in a symmetry-consistent manner, capturing dependencies among neural, autonomic, and attentional modalities. Extensive experiments on the DEAP dataset demonstrate consistent improvements over state-of-the-art methods under both single-task and multi-task settings.
The proposed model achieves 98.32% and 98.45% accuracy for valence and arousal prediction, respectively, and 97.96% accuracy for quadrant-based emotion classification in single-task evaluation, while attaining 92.8%, 91.8%, and 93.6% accuracy for valence, arousal, and liking prediction in joint multi-task settings. Additional robustness analyses under reduced training data confirm that symmetry-aware structured decomposition improves data efficiency and generalization. Overall, this work establishes a principled symmetry-preserving representation learning framework for robust affective decoding and intelligent, feedback-driven music therapy systems.
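To make the cross-series/intra-series decomposition concrete, the following is a minimal NumPy sketch of the two symmetry dimensions described above: raw multi-channel time series are windowed into tokens, then one self-attention pass attends across channels at each time step (cross-series, as in CSI) and another attends across time tokens within each channel (intra-series, as in ISI). The window size, two-feature tokens (per-window mean and standard deviation), and single-head unprojected attention are illustrative assumptions for exposition, not the paper's actual DTFE/CSI/ISI implementation.

```python
import numpy as np

def tokenize(x, win):
    """Turn raw series x of shape (channels, time) into tokens of shape
    (channels, n_tokens, 2), where each token is a window's (mean, std).
    Stand-in for the learned DTFE tokenizer."""
    c, t = x.shape
    n = t // win
    w = x[:, :n * win].reshape(c, n, win)
    return np.stack([w.mean(-1), w.std(-1)], axis=-1)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tok, axis):
    """Scaled dot-product self-attention over one axis of the token tensor.
    axis=0 attends across channels (cross-series symmetry, CSI-like);
    axis=1 attends across time tokens (intra-series symmetry, ISI-like)."""
    q = k = v = np.moveaxis(tok, axis, -2)          # attended axis -> second-to-last
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(tok.shape[-1])
    out = softmax(scores) @ v                        # convex mix along the chosen axis
    return np.moveaxis(out, -2, axis)                # restore original layout

# Toy example: 7 hypothetical physiological channels, 1280 samples each.
rng = np.random.default_rng(0)
signals = rng.standard_normal((7, 1280))
tok = tokenize(signals, win=128)                     # (7, 10, 2)
csi = self_attention(tok, axis=0)                    # mix information across channels
isi = self_attention(tok, axis=1)                    # mix information across time
fused = np.concatenate([csi, isi], axis=-1)          # (7, 10, 4) symmetry-aware tokens
```

The key design point the sketch illustrates is that the two attention passes operate on orthogonal axes of the same token tensor, so each symmetry dimension is modeled explicitly rather than entangled in a single flat attention over all tokens.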
