Eyes on Many: Evaluating Gaze, Hand, and Voice for Multi-Object Selection in Extended Reality
arXiv:2602.12406v1 Announce Type: new
Abstract: Interacting with multiple objects simultaneously makes us faster. A prerequisite for this interaction is selecting the objects, i.e., multi-object selection, which proceeds in two steps: (1) toggling multi-selection mode (mode-switching) and then (2) selecting all the intended objects (subselection). In extended reality (XR), each step can be performed with the eyes, hands, or voice. To examine how these design choices affect user performance, we evaluated four mode-switching techniques (SemiPinch, FullPinch, DoublePinch, and Voice) and three subselection techniques (Gaze+Dwell, Gaze+Pinch, and Gaze+Voice) in a user study. Results revealed that DoublePinch paired with Gaze+Pinch yielded the highest overall performance, whereas SemiPinch yielded the lowest. Although Voice-based mode-switching showed benefits, Gaze+Voice subselection was less favored, as participants perceived the required repetitive vocal commands as tedious. Overall, these findings provide empirical insights and inform design recommendations for multi-selection techniques in XR.
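
The abstract's two-step pipeline (mode-switching, then subselection) can be pictured as a small input state machine. The sketch below is a hypothetical illustration, not code from the paper: the event names, the `MultiSelect` class, the double-pinch window, and the mapping of a double pinch to mode-switching and gaze-plus-pinch to subselection are all assumptions made for concreteness.

```typescript
// Hypothetical sketch of the two-step multi-selection pipeline:
// step 1 toggles multi-selection mode (here via a double pinch),
// step 2 accumulates subselections (here via gaze + pinch).
// Event names and thresholds are illustrative, not from the paper.

type InputEvent =
  | { kind: "pinch"; timestampMs: number }            // a completed pinch gesture
  | { kind: "gazeTarget"; objectId: string | null };  // current gaze hit

class MultiSelect {
  private multiMode = false;                 // are we in multi-selection mode?
  private gazedObject: string | null = null; // object currently under gaze
  private lastPinchMs = -Infinity;           // for double-pinch detection
  readonly selection = new Set<string>();

  // Assumed double-pinch window; a real system would tune this.
  private static readonly DOUBLE_PINCH_WINDOW_MS = 400;

  handle(event: InputEvent): void {
    switch (event.kind) {
      case "gazeTarget":
        this.gazedObject = event.objectId;
        break;
      case "pinch": {
        const sinceLast = event.timestampMs - this.lastPinchMs;
        this.lastPinchMs = event.timestampMs;
        if (sinceLast <= MultiSelect.DOUBLE_PINCH_WINDOW_MS) {
          // Step 1: a double pinch toggles multi-selection mode.
          // (Note the ambiguity: a quick pinch after a subselection
          // also reads as a double pinch; a real design must resolve this.)
          this.multiMode = !this.multiMode;
          if (!this.multiMode) this.selection.clear();
        } else if (this.multiMode && this.gazedObject !== null) {
          // Step 2: in multi-mode, gaze + pinch subselects the gazed object.
          this.selection.add(this.gazedObject);
        }
        // Outside multi-mode, a single pinch would do ordinary
        // single-object selection; omitted here for brevity.
        break;
      }
    }
  }
}

// Usage: feed the machine a stream of already-recognized input events.
const ms = new MultiSelect();
ms.handle({ kind: "pinch", timestampMs: 0 });
ms.handle({ kind: "pinch", timestampMs: 200 });  // double pinch: multi-mode on
ms.handle({ kind: "gazeTarget", objectId: "cube-1" });
ms.handle({ kind: "pinch", timestampMs: 1200 }); // subselects cube-1
console.log([...ms.selection]);                  // ["cube-1"]
```

Swapping the mode-switching trigger (e.g., a voice command instead of a double pinch) or the subselection confirmation (dwell timeout or voice instead of pinch) only changes which events flip `multiMode` or call `selection.add`, which is one way to read the study's factorial comparison of techniques.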