Sequential Cooperative Multi-Agent Online Learning and Adaptive Coordination Control in Dynamic and Uncertain Environments

Dynamic multi-agent systems must coordinate under partial information, time-varying disturbances, and abrupt non-stationarity while satisfying hard safety constraints. This paper proposes a sequential cooperative multi-agent online learning and adaptive coordination control framework for ordered missions. A task graph encodes precedence relations and activates stage-specific objectives, linking a global goal to a sequence of subtasks. On this structure, each agent runs a distributed online actor–critic update using local observations and event-triggered neighbor messages. The learned nominal inputs are then wrapped by a minimally invasive quadratic-program (QP) safety filter that enforces collision avoidance, formation/tracking constraints, and input saturation in real time, while an adaptive/robust term compensates bounded disturbances. Lyapunov-based analysis establishes uniform ultimate boundedness of the closed-loop signals and convergence of the online policies to a neighborhood of a cooperative optimum under mild conditions. In simulations on multi-robot formation tracking, dynamic target encirclement, and cooperative payload transportation (200 runs), the proposed method achieves 94.7% ± 2.6% task success, outperforming centralized MPC/DMPC (88.9% ± 3.7%) and single-stage safe MARL (86.3% ± 4.3%). It reduces average convergence time to 23.4 ± 4.1 s (vs. 28.8 ± 4.9 s for centralized MPC/DMPC) while maintaining zero safety violations. Event-triggered communication lowers the message rate to 3.2 msgs/(agent·s), compared with 10.0 msgs/(agent·s) under periodic-communication baselines, without degrading completion performance.
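To illustrate what "minimally invasive" means for the safety filter, the sketch below solves the simplest possible case in closed form: a single linear constraint a · u ≥ b on the input, with the filtered input chosen as close as possible to the nominal one. It is only an assumption-laden toy, not the paper's filter, which handles several constraints plus input saturation and would need a QP solver. The function name `safety_filter` and the symbols `u_nom`, `a`, and `b` are hypothetical.

```python
from typing import List, Sequence


def safety_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Closed-form solution of  min ||u - u_nom||^2  subject to  a . u >= b.

    Hypothetical single-constraint example; saturation limits are not handled.
    """
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    a_dot_a = sum(ai * ai for ai in a)
    if a_dot_u >= b or a_dot_a == 0.0:
        # Either the nominal input already satisfies the constraint, or the
        # constraint does not depend on u at all; return the nominal input.
        return list(u_nom)
    # Otherwise shift u_nom along a by the smallest amount that reaches
    # the constraint boundary a . u = b.
    scale = (b - a_dot_u) / a_dot_a
    return [ui + scale * ai for ai, ui in zip(a, u_nom)]


# A safe nominal input passes through unchanged.
safe = safety_filter([1.0, 0.0], [1.0, 0.0], 0.5)
# An unsafe nominal input is corrected only along the constraint direction.
corrected = safety_filter([-1.0, 2.0], [1.0, 0.0], 0.5)
```

In the second call only the first component changes, since the constraint direction `a` has no second component. This is the sense in which the filter modifies the learned input no more than safety requires.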
