Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs
arXiv:2603.00578v1 Announce Type: new
Abstract: Long chain-of-thought (CoT) has become a dominant paradigm for enhancing the reasoning capability of large reasoning models (LRMs); however, the performance gains often come with a substantial increase in reasoning budget. Recent studies show that existing CoT paradigms tend to induce systematic overthinking, unnecessarily coupling reasoning capability with reasoning cost. Most prior approaches reduce token usage through post hoc techniques such as token compression, truncation, or length penalties, without explicitly addressing the core mechanisms of reasoning. We propose Draft-Thinking, which guides models to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Through progressive curriculum learning, the model stably internalizes this efficient reasoning pattern as its capability scales. Moreover, Draft-Thinking introduces adaptive prompting, which makes reasoning depth a flexible, model-selectable behavior. Extensive experiments demonstrate that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance; for example, on MATH500 it achieves an 82.6% reduction in reasoning budget at the cost of only a 2.6% performance drop.
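The two mechanisms the abstract names, progressive curriculum learning and adaptive prompting, could be sketched as below. Every name, the linear budget schedule, and the prompt strings are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the abstract's two ideas: a progressive curriculum
# that gradually tightens the reasoning-token budget across training stages,
# and an adaptive prompt that leaves the choice of reasoning depth to the
# model. All names and constants are illustrative assumptions.

def curriculum_budget(stage: int, n_stages: int,
                      start_tokens: int = 4096, end_tokens: int = 512) -> int:
    """Token budget at a given curriculum stage, shrinking linearly
    from a long free-form CoT budget toward a terse draft budget."""
    frac = stage / max(n_stages - 1, 1)
    return round(start_tokens + frac * (end_tokens - start_tokens))

def adaptive_prompt(question: str, depth: str = "auto") -> str:
    """Prepend a depth hint; 'auto' lets the model pick its own depth."""
    hints = {
        "draft": "Answer with a concise draft: only the critical steps.",
        "full": "Reason step by step in full detail.",
        "auto": "Choose how much reasoning detail this question needs.",
    }
    return f"{hints[depth]}\n\nQuestion: {question}"

if __name__ == "__main__":
    # Budgets for a four-stage curriculum: 4096, 2901, 1707, 512.
    for s in range(4):
        print(s, curriculum_budget(s, 4))
    print(adaptive_prompt("What is 17 * 23?", depth="draft"))
```

Under this sketch, early stages train on full-length reasoning traces while later stages enforce progressively terser drafts, which matches the abstract's claim that the efficient pattern is internalized "as its capability scales".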