From Motion Artifacts to Clinical Insight: Multi-Modal Deep Learning for Robust Arrhythmia Screening in Ambulatory ECG Monitoring
Motion artifacts corrupt wearable ECG signals and trigger false arrhythmia alarms, limiting the clinical adoption of continuous cardiac monitoring. We present a dual-stream deep learning framework for motion-robust binary arrhythmia classification through multi-modal sensor fusion and multi-SNR training. A ResNet-18 branch processes ECG spectrograms, while a CNN-BiLSTM branch encodes accelerometer motion patterns; attention-gated fusion with gate diversity regularization adaptively weights the modalities according to signal reliability. Training on MIT-BIH data augmented at three noise levels (24, 12, and 6 dB SNR) enables noise-invariant learning that generalizes to unseen conditions. The framework achieves 99.5% accuracy on clean signals and degrades gracefully to 88.2% at extreme noise (-6 dB SNR), a 46% improvement over single-SNR training. High gate diversity (σ > 0.37) confirms adaptive, context-dependent fusion. With a 0.09% false positive rate and real-time throughput (238 beats/second), the system enables practical continuous arrhythmia screening and establishes the foundation for hierarchical monitoring systems in which binary screening triggers detailed multi-class diagnosis.
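To make the fusion mechanism concrete, the following is a minimal PyTorch sketch of how attention-gated fusion with gate diversity regularization could be structured. The embedding dimensions, the two-way softmax gate, and the standard-deviation-based penalty are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AttentionGatedFusion(nn.Module):
    """Per-sample gating of ECG and accelerometer embeddings, so that
    motion-corrupted ECG can be down-weighted in favor of motion context."""

    def __init__(self, ecg_dim=512, acc_dim=128, fused_dim=256):
        super().__init__()
        self.ecg_proj = nn.Linear(ecg_dim, fused_dim)
        self.acc_proj = nn.Linear(acc_dim, fused_dim)
        # Gate network: maps both embeddings to two softmax weights,
        # one per modality (a simplifying assumption about the gate design).
        self.gate = nn.Sequential(
            nn.Linear(ecg_dim + acc_dim, 2),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Linear(fused_dim, 1)  # binary screening logit

    def forward(self, ecg_feat, acc_feat):
        g = self.gate(torch.cat([ecg_feat, acc_feat], dim=-1))  # (B, 2)
        fused = (g[:, :1] * self.ecg_proj(ecg_feat)
                 + g[:, 1:] * self.acc_proj(acc_feat))
        return self.classifier(fused), g


def gate_diversity_penalty(gates, weight=0.1):
    """Encourage gates to vary across the batch rather than collapse to a
    fixed mixture: penalize low standard deviation of the gate weights.
    (One plausible form of the diversity regularizer; hypothetical.)"""
    return weight * (-gates.std(dim=0).mean())


# Usage sketch: combine the screening loss with the diversity penalty.
ecg_feat, acc_feat = torch.randn(8, 512), torch.randn(8, 128)
model = AttentionGatedFusion()
logits, gates = model(ecg_feat, acc_feat)
labels = torch.randint(0, 2, (8, 1)).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels) \
       + gate_diversity_penalty(gates)
```

Under this formulation, minimizing the diversity term maximizes the spread of the gate weights across samples, which matches the reported behavior of high gate standard deviation (σ > 0.37) indicating context-dependent rather than static fusion.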