Progressive Multi-Turn Reinforcement Learning for Dynamic User-Interactive Tool Agents
Recent advances in reinforcement learning for large language models have produced powerful agent frameworks that achieve strong performance on multi-turn tool use, interactive search, and complex reasoning. However, existing reinforcement learning frameworks for large language model agents face three critical limitations: difficulty handling dynamic user interactions owing to reliance on pre-scripted queries, limited scalability across varying interaction horizons due to fixed scaling schedules, and substantial reward engineering overhead requiring domain-specific manual tuning. We introduce Progressive Multi-Turn Reinforcement Learning for Dynamic User-Interactive Tool Agents, a novel framework that integrates progressive user-interactive training to overcome sparse reward signals, adaptive horizon management that monitors performance metrics and adjusts training complexity accordingly, and domain-adaptive tool orchestration that learns optimal tool selection patterns across domains. Extensive experiments on WebArena, TAU-Bench, Berkeley Function-Calling Leaderboard Version 3, BabyAI, and SciWorld demonstrate that our method achieves a 28.4% success rate on WebArena and 76.3% on TAU-Bench, substantially outperforming baselines such as ReAct (16.2%) and MUA-RL (24.6%), while maintaining 94.7% performance on embodied reasoning tasks and 78.9% cross-domain performance retention. Our work establishes a unified framework for realistic user interaction training, performance-adaptive complexity scaling, and domain-flexible tool orchestration.
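As a rough illustration of the adaptive horizon management component described above, the sketch below tracks a rolling success rate and grows or shrinks the maximum interaction horizon when that rate crosses a threshold. The class name, window size, thresholds, and step sizes are our own illustrative assumptions, not hyperparameters taken from the paper.

```python
# Minimal sketch of performance-adaptive horizon scaling (assumed design):
# a rolling success rate drives the maximum interaction horizon up or down.
from collections import deque


class AdaptiveHorizonManager:
    def __init__(self, init_horizon=4, min_horizon=2, max_horizon=16,
                 window=64, grow_threshold=0.7, shrink_threshold=0.3):
        self.horizon = init_horizon
        self.min_horizon = min_horizon
        self.max_horizon = max_horizon
        self.recent = deque(maxlen=window)        # rolling record of episode outcomes
        self.grow_threshold = grow_threshold      # raise complexity above this success rate
        self.shrink_threshold = shrink_threshold  # lower complexity below this success rate

    def record(self, success: bool) -> int:
        """Log one episode outcome and return the horizon for the next rollout."""
        self.recent.append(1.0 if success else 0.0)
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate >= self.grow_threshold and self.horizon < self.max_horizon:
                self.horizon += 1   # agent is succeeding: allow longer interactions
                self.recent.clear()
            elif rate <= self.shrink_threshold and self.horizon > self.min_horizon:
                self.horizon -= 1   # agent is struggling: shorten the horizon
                self.recent.clear()
        return self.horizon
```

In use, a training loop would call `record(...)` after each rollout and cap the next episode's turn budget at the returned horizon, which is one plausible way to realize performance-adaptive complexity scaling.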