Edge-Deployed Context-Aware LLM Framework for Low-Latency Bi-Directional Gaze-Speech HRI

The integration of Large Language Models (LLMs) has advanced Human-Robot Interaction (HRI), enabling more natural, context-aware multimodal exchanges via gaze and speech. However, current cloud-reliant LLM-HRI frameworks face critical challenges: latency, network dependency, privacy risks, and high operational costs. To address these, we propose the Efficient Edge-Deployed Multi-modal Human-Robot Interaction (EM-HRI) framework. EM-HRI features a novel hybrid LLM architecture that combines lightweight local models for rapid intent recognition with a compact, edge-deployable LLM for complex contextual reasoning. Optimized multimodal perception and a dynamic context graph further ensure real-time efficiency, robust situational awareness, and high interaction quality. Experimental results on an ambiguous robot-manipulation task demonstrate that EM-HRI substantially reduces response time and energy consumption while maintaining or improving task completion, user confidence, and ambiguity resolution compared with cloud-based LLM solutions. These findings validate EM-HRI’s potential for real-time, efficient, and intelligent HRI on edge devices, fostering more autonomous and reliable robotic applications.
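The abstract does not specify implementation details, but the hybrid routing idea (a fast local model for confident intents, with fallback to a compact edge LLM for ambiguous, context-heavy inputs) can be illustrated with a minimal Python sketch. All names, the keyword matcher, and the confidence threshold below are hypothetical placeholders, not the authors' code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    label: str
    confidence: float

class LightweightIntentModel:
    """Stand-in for a lightweight local intent model (here a trivial
    keyword matcher; a real system would use a small classifier)."""
    KEYWORDS = {"pick": "pick_object", "place": "place_object", "stop": "halt"}

    def predict(self, utterance: str, gaze_target: Optional[str]) -> Intent:
        for word, label in self.KEYWORDS.items():
            if word in utterance.lower():
                # Agreement between speech and gaze raises fast-path confidence.
                conf = 0.9 if gaze_target else 0.6
                return Intent(label, conf)
        return Intent("unknown", 0.0)

class EdgeLLM:
    """Placeholder for a compact, edge-deployable LLM that reasons over
    the dynamic interaction context."""
    def resolve(self, utterance: str, gaze_target: Optional[str],
                context: dict) -> Intent:
        # A real implementation would prompt a quantized local model here.
        return Intent(f"reasoned({utterance!r}, gaze={gaze_target})", 0.8)

CONFIDENCE_THRESHOLD = 0.75  # illustrative routing threshold

def route(utterance: str, gaze_target: Optional[str], context: dict) -> Intent:
    fast = LightweightIntentModel().predict(utterance, gaze_target)
    if fast.confidence >= CONFIDENCE_THRESHOLD:
        return fast  # low-latency local path
    return EdgeLLM().resolve(utterance, gaze_target, context)  # fallback path

if __name__ == "__main__":
    print(route("pick up that one", gaze_target="red_block", context={}))
    print(route("hand me the thing next to it", gaze_target=None, context={}))
```

The design intent mirrored here is that most utterances never reach the larger model, which is how such a hybrid can cut both latency and energy relative to routing everything through a single (cloud or local) LLM.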
