Multi-Method Explainability for Multi-Sensor Wearable HAR: Counterfactual, Gradient, and Shapley Analyses

Deep neural networks have become the dominant approach to wearable human activity recognition (HAR), yet their opacity creates practical barriers: clinicians cannot verify that predictions align with physiological knowledge, and engineers lack principled guidance on which sensors to retain when cost and power constraints demand simplification. Existing explainable AI (XAI) methods for HAR typically rely on a single attribution technique, risking method-specific biases, and report importance at granularities (individual channels or entire body locations) that do not map cleanly onto hardware design choices. This paper introduces a multi-method XAI framework that addresses these gaps by systematically combining counterfactual sensor-group ablation, Integrated Gradients (IG), and Shapley Value Sampling around a common deep learning backbone. We evaluate the framework on a Time-Distributed LSTM trained on the MHEALTH dataset, which records twelve activities via eight sensor groups (accelerometers, gyroscopes, magnetometers, and ECG) at the chest, ankle, and wrist; the model achieves 98.2% accuracy and a macro-F1 of 0.98. Four coordinated experiments probe model behaviour at global, sensor-group, and per-class levels. Counterfactual ablation reveals that removing the ankle magnetometer reduces accuracy on dynamic locomotion by 47.1 percentage points, while wrist accelerometer removal erodes confidence for static postures by more than 50%. Class-specific IG heatmaps and temporal attribution curves expose activity-dependent sensor signatures consistent with biomechanical expectations: ankle-centric patterns for gait, wrist- and chest-centric patterns for upper-body movements, and diffuse low-energy profiles for resting states. Global IG and Shapley rankings converge, with accelerometers accounting for 89% of total attribution mass. The agreement across causal, gradient-based, and game-theoretic perspectives strengthens confidence that the identified importance patterns reflect genuine model behaviour. Together, these sensor-group-level explanations provide actionable guidance for sensor selection, power-aware deployment, and clinically meaningful interpretation without sacrificing recognition performance.
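
To make the counterfactual sensor-group ablation concrete, the sketch below shows one minimal way such an experiment can be run: each sensor group's channels are zeroed in the evaluation windows and the resulting accuracy drop is compared to the unablated baseline. This is an illustrative sketch, not the paper's exact procedure; the channel-index mapping, the `predict_proba` wrapper, and the use of zero-imputation (which corresponds to the channel mean only if the data are standardized) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical mapping from MHEALTH sensor groups to column indices in the
# (n_windows, timesteps, channels) input tensor; the actual indices depend on
# the preprocessing pipeline and are assumed here for illustration.
SENSOR_GROUPS = {
    "chest_acc":  [0, 1, 2],
    "ecg":        [3, 4],
    "ankle_acc":  [5, 6, 7],
    "ankle_gyro": [8, 9, 10],
    "ankle_mag":  [11, 12, 13],
    "wrist_acc":  [14, 15, 16],
    "wrist_gyro": [17, 18, 19],
    "wrist_mag":  [20, 21, 22],
}

def counterfactual_ablation(predict_proba, X, y):
    """Zero out each sensor group in turn and report the accuracy drop.

    predict_proba: callable mapping an array of windows to class probabilities
                   (e.g. a wrapped Keras or PyTorch model); assumed interface.
    X: float array of shape (n_windows, timesteps, channels), standardized so
       that zero corresponds to the per-channel mean.
    y: integer class labels of shape (n_windows,).
    """
    baseline_acc = (predict_proba(X).argmax(axis=1) == y).mean()
    drops = {}
    for group, cols in SENSOR_GROUPS.items():
        X_cf = X.copy()
        X_cf[:, :, cols] = 0.0          # counterfactual: sensor group removed
        acc = (predict_proba(X_cf).argmax(axis=1) == y).mean()
        drops[group] = baseline_acc - acc
    return baseline_acc, drops
```

The same loop can be restricted to windows of a single activity class to obtain the per-class drops reported in the abstract (e.g. ankle magnetometer removal for dynamic locomotion); mean- or noise-imputation are common alternatives to zeroing when the inputs are not standardized.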
