Toward Resource-Efficient Collaboration of Large AI Models in Mobile Edge Networks
arXiv:2602.13206v1 Announce Type: new
Abstract: The collaboration of large artificial intelligence (AI) models in mobile edge networks has emerged as a promising paradigm for meeting the growing demand for intelligent services at the network edge. By enabling multiple devices to cooperatively execute submodels or subtasks, collaborative AI improves inference efficiency and service quality under tight resource constraints. However, deploying large AI models in such environments remains challenging due to the intrinsic mismatch between model complexity and the limited computation, memory, and communication resources of edge networks. This article provides a comprehensive overview of the system architecture for collaborative AI in mobile edge networks, along with representative application scenarios in transportation and healthcare. We then survey recent advances in resource-efficient collaboration techniques, categorized into spatial and temporal approaches: spatial approaches include federated tuning, mixture of experts, patch-based diffusion, and hierarchical diffusion, while temporal approaches include split learning, cascading inference, speculative decoding, and routing inference. Building on these foundations, we propose a multi-stage diffusion framework that elastically distributes large generative models across heterogeneous edge resources. Experimental results demonstrate that the framework improves both the efficiency and the adaptability of data generation.
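
To make one of the surveyed temporal approaches concrete, below is a minimal, self-contained Python sketch of speculative decoding, the standard draft-then-verify scheme in which a cheap draft model proposes several tokens and an expensive target model verifies them in a single pass. This is an illustration of the general technique, not the authors' framework: the vocabulary, target_probs, and draft_probs are hypothetical toy stand-ins (e.g., the draft could run on-device and the target at an edge server).

import numpy as np

VOCAB = 8          # toy vocabulary size (assumption for illustration)
rng = np.random.default_rng(0)

def target_probs(context):
    # Next-token distribution of the expensive target model -- toy stand-in.
    logits = np.sin(np.arange(VOCAB) + len(context))
    p = np.exp(logits)
    return p / p.sum()

def draft_probs(context):
    # Next-token distribution of the cheap draft model: a blurred target.
    p = 0.7 * target_probs(context) + 0.3 / VOCAB
    return p / p.sum()

def speculative_step(context, k=4):
    # Draft k tokens cheaply, then verify them against the target model.
    # Standard accept/reject rule: a draft token x ~ q is kept with
    # probability min(1, p(x)/q(x)); on the first rejection we resample
    # from the residual distribution max(0, p - q), so outputs match the
    # target model's distribution exactly.
    drafts, qs = [], []
    ctx = list(context)
    for _ in range(k):                        # cheap autoregressive drafting
        q = draft_probs(ctx)
        x = rng.choice(VOCAB, p=q)
        drafts.append(x)
        qs.append(q)
        ctx.append(x)

    accepted = []
    ctx = list(context)
    for x, q in zip(drafts, qs):              # single verification pass
        p = target_probs(ctx)
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)                # draft token accepted
            ctx.append(x)
        else:                                 # rejected: resample residual
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            return accepted
    # All k drafts accepted: the target grants one extra token for free.
    accepted.append(int(rng.choice(VOCAB, p=target_probs(ctx))))
    return accepted

tokens = [0]
for _ in range(5):
    tokens += speculative_step(tokens)
print(tokens)

The same accept/reject logic carries over to a real edge deployment; only the two *_probs calls change, with the target model's verification pass batched over all drafted positions so the expensive model is invoked once per step rather than once per token.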