SWEET-RL: Reinforcement Learning for Multi-Turn Collaborative Reasoning with LLM Agents
This study proposes SWEET-RL, a reinforcement learning framework for training LLM agents on multi-turn collaborative reasoning tasks with human or agent partners. A step-wise critic is trained on intermediate evaluation signals derived from task progression rather than from final answers. The method is evaluated on ColBench, which consists of 3,800 multi-turn collaboration sessions across software development and design tasks. SWEET-RL improves long-horizon task success rates by 24.3% and reduces dialogue-level error accumulation by 35.1%, indicating greater robustness in extended collaborative interactions.
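To make the idea of step-wise signals concrete, the following is a minimal sketch of turn-level credit assignment, not the SWEET-RL implementation. It assumes a hypothetical per-turn progress score in [0, 1] (the abstract does not specify how the intermediate evaluation signals are computed) and contrasts rewards derived from progress changes with a reward given only at the final turn. The function names, the example scores, and the discount factor `gamma` are all illustrative assumptions.

```python
from typing import List


def stepwise_rewards(progress: List[float]) -> List[float]:
    """Turn-level reward = change in the (hypothetical) progress score at that turn."""
    rewards = []
    previous = 0.0  # assumption: progress starts at zero before the first turn
    for score in progress:
        rewards.append(score - previous)
        previous = score
    return rewards


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of rewards from each turn to the end of the session."""
    running = 0.0
    out = []
    for reward in reversed(rewards):
        running = reward + gamma * running
        out.append(running)
    out.reverse()
    return out


# Hypothetical progress scores for a four-turn session (illustrative only).
progress = [0.25, 0.25, 0.75, 1.0]

# Step-wise signal: each turn is credited with the progress it produced.
step_rewards = stepwise_rewards(progress)
step_returns = returns_to_go(step_rewards)

# Final-answer-only signal: every earlier turn receives zero direct reward.
final_only = [0.0] * (len(progress) - 1) + [progress[-1]]
final_returns = returns_to_go(final_only)
```

In this toy session the second turn makes no progress and receives a step-wise reward of zero, whereas the final-only signal gives no direct indication of which turns helped. A learned critic would estimate such turn-level values rather than read them from a known progress score.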