DMMAF-HAR: Dynamic Multi-Modal Adaptive Fusion for Human Activity Recognition in Complex Environments

Human Activity Recognition (HAR) faces significant challenges in dynamic real-world environments. This paper introduces DMMAF-HAR, a deep learning framework for robust HAR that integrates dynamic visual analysis, modality-specific enhancement, and context-aware adaptive fusion. The framework comprises three modules: a Dynamic Visual Chronometer Module (DVCM) that models video dynamics on physical time scales; a Modality-Specific Enhancement and Feature Extractor (MSEFE) that applies tailored processing to IMU, body-conduction, and acoustic data; and a Context-Adaptive Fusion and Classifier (CAFC) that fuses modalities adaptively according to context. Evaluated on the challenging MobiAct++ dataset, DMMAF-HAR achieves state-of-the-art performance, significantly outperforming single-modal and multi-modal baselines. Ablation studies confirm each module's contribution, and further analyses highlight robustness, cross-modality benefits, and computational efficiency. A complementary user study validates the framework's practical utility and perceived reliability. Our contributions include the integration of physical time scales, comprehensive modality-specific processing, and a novel context-aware adaptive fusion mechanism, which together yield superior robustness and accuracy for real-world HAR.
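
The abstract does not specify how the CAFC module is implemented; the following minimal PyTorch sketch illustrates one plausible form of context-aware adaptive fusion, in which a small gating network re-weights per-modality embeddings sample by sample before classification. All names and sizes here (ContextAdaptiveFusion, d_model, the four-modality toy setup) are illustrative assumptions, not details taken from DMMAF-HAR.

# Illustrative sketch only: a context-gated fusion head in the spirit of
# a CAFC-style module. Module names, dimensions, and the gating design
# are assumptions for exposition, not the paper's implementation.
import torch
import torch.nn as nn


class ContextAdaptiveFusion(nn.Module):
    """Fuse per-modality embeddings with context-dependent gates.

    Each modality (e.g., video, IMU, body conduction, acoustics) supplies
    a fixed-size embedding; a gating network maps their concatenation to
    one weight per modality, so unreliable modalities can be down-weighted
    per sample before classification.
    """

    def __init__(self, num_modalities: int, d_model: int, num_classes: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_modalities * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_modalities),
        )
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, d_model)
        b, m, d = feats.shape
        # One softmax-normalized weight per modality, conditioned on all modalities.
        weights = torch.softmax(self.gate(feats.reshape(b, m * d)), dim=-1)
        # Weighted sum over modalities -> (batch, d_model), then classify.
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    # Toy usage: four modalities, 64-d embeddings, ten activity classes.
    fusion = ContextAdaptiveFusion(num_modalities=4, d_model=64, num_classes=10)
    logits = fusion(torch.randn(8, 4, 64))
    print(logits.shape)  # torch.Size([8, 10])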
