SparseTrack: A Physics-Informed Transformer for Real-Time Human Motion Reconstruction from Sparse IMUs

Wearable inertial measurement units (IMUs) are widely used for human motion analysis because they are portable and independent of external infrastructure. However, many IMU-based motion capture systems require dense sensor configurations, which increase system complexity and cost and depend on users placing every sensor correctly, causing usability problems in long-term, real-world applications. Reducing the number of wearable sensors is therefore essential for improving accessibility, but sparse sensing introduces challenges of limited observability and biomechanical consistency. This paper introduces a sparse inertial human motion reconstruction framework that uses a minimal set of wearable sensors, with a focus on real-time operation and biomechanical plausibility. The framework integrates Movella Xsens DOT IMUs with a learning-based inverse kinematics pipeline and a real-time biomechanical digital twin for motion reconstruction and visualization. The evaluation consists of two phases. The first establishes a real-time motion streaming system to validate sensor alignment, coordinate-frame consistency, and end-to-end system latency. The second tests the learning-based sparse inference framework on motion data from the Virginia Tech Natural Motion Dataset, inferring proximal joint orientations from distal inertial measurements under limited sensing. The results show that the system can accurately reproduce human movements using only five sensors, achieving a local Mean Per-Joint Position Error (MPJPE) of 5.96 cm. Comparative ablations of the temporal backbone show that Transformer-based temporal modelling achieves better geometric accuracy and temporal smoothness than recurrent and convolutional baselines when trained on extended temporal data.
Further ablation studies show that physics-informed regularization and hard negative mining are essential for maintaining biomechanical accuracy while reducing high-frequency motion jitter. Real-time experiments demonstrate that the system operates within interactive latency limits, indicating that sparse inertial motion capture is viable for digital twin applications in biomechanics.
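For readers unfamiliar with the metric, the following minimal sketch illustrates one common way to compute a local MPJPE. It assumes that "local" means each pose is expressed relative to its root joint before errors are measured; the abstract does not state the exact alignment or joint set used, so the function name `local_mpjpe`, the `root` parameter, and the data layout are illustrative assumptions, not the paper's implementation.

```python
import math


def local_mpjpe(pred, target, root=0):
    """Mean per-joint position error after root alignment.

    pred, target: lists of frames; each frame is a list of (x, y, z)
    joint positions in the same unit (e.g. centimetres).
    root: index of the joint used as the local origin (assumed, not
    taken from the paper).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")

    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must contain the same number of joints")
        p_root, t_root = p_frame[root], t_frame[root]
        for p, t in zip(p_frame, t_frame):
            # Express both joints relative to their own root, then compare.
            diff = [(pi - pr) - (ti - tr)
                    for pi, pr, ti, tr in zip(p, p_root, t, t_root)]
            total += math.sqrt(sum(d * d for d in diff))
            count += 1
    # The root joint contributes zero error and is included in the mean
    # here; some evaluation protocols exclude it instead.
    return total / count


# One frame, two joints: the root offset is removed, leaving a 5-unit
# error on the second joint and zero on the root.
pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
target = [[(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
print(local_mpjpe(pred, target))
```

Because the root offset is subtracted first, a global translation error between prediction and ground truth does not affect the score; only the pose shape relative to the root does.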
