Online Finetuning Decision Transformers with Pure RL Gradients
arXiv:2601.00167v1. Abstract: Decision Transformers (DTs) have emerged as a powerful framework for sequential decision making by formulating offline reinforcement learning (RL) as a sequence modeling problem. However, extending DTs to online settings with pure RL gradients remains largely unexplored, as existing approaches continue to rely heavily on supervised sequence-modeling objectives during online finetuning. We identify hindsight return relabeling, a standard component in online DTs, as a critical obstacle to RL-based finetuning: while beneficial […]
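
For readers unfamiliar with the term, hindsight return relabeling as mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes undiscounted returns, and the function names `returns_to_go` and `hindsight_relabel` are hypothetical, chosen only for illustration. During a rollout the policy is conditioned on a target return that shrinks as rewards arrive; afterwards, the stored return-to-go values are replaced by the returns the trajectory actually achieved.

```python
from typing import List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return-to-go: rtg[t] = sum of rewards from step t onward."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def hindsight_relabel(
    rewards: List[float], target_return: float
) -> Tuple[List[float], List[float]]:
    """Return (conditioning values used during rollout, relabeled values).

    Hypothetical helper: illustrates the general idea only.
    """
    conditioned = []
    remaining = target_return
    for r in rewards:
        # The value the policy was conditioned on at this step.
        conditioned.append(remaining)
        # The target shrinks by the reward just received.
        remaining -= r
    # In hindsight, replace the targets with the achieved returns-to-go.
    relabeled = returns_to_go(rewards)
    return conditioned, relabeled


if __name__ == "__main__":
    rollout_rtg, relabeled_rtg = hindsight_relabel([1.0, 0.0, 2.0], target_return=10.0)
    print(rollout_rtg)    # values the policy was conditioned on
    print(relabeled_rtg)  # values stored in the buffer after relabeling
```

In this sketch the relabeled sequence is consistent with the observed rewards, which suits a supervised sequence-modeling objective; the stored return-to-go then no longer matches the value the policy was actually conditioned on when it acted.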