Joint Trajectory, RIS, and Computation Offloading Optimization via Decentralized Model-Based PPO in Urban Multi-UAV Mobile Edge Computing
arXiv:2603.20238v1 Announce Type: new
Abstract: Efficient computation offloading in multi-UAV edge networks becomes particularly challenging in dense urban areas, where line-of-sight (LoS) links are frequently blocked and user demand varies rapidly. Reconfigurable intelligent surfaces (RISs) can mitigate blockage by creating controllable reflected links, but realizing their potential requires tightly coupled decisions on UAV trajectories, offloading schedules, and RIS phase configurations. This joint optimization is hard to solve in practice because multiple UAVs must coordinate under limited information exchange, and purely model-free multi-agent reinforcement learning (MARL) often learns too slowly in highly dynamic environments. To address these challenges, we propose a decentralized model-based MARL framework. Each UAV optimizes mobility and offloading using observations from several hop neighbors, and submits an RIS phase proposal that is aggregated by a lightweight RIS controller. To boost sample efficiency and stability, agents learn local dynamics models and perform short horizon branched rollouts for proximal policy optimization (PPO) updates. Simulations show near centralized performance with improved throughput and energy efficiency at scale.