MetaThink: Empowering Large Reasoning Models with Adaptive Self-Correction at Inference Time

Large Reasoning Models (LRMs) face a fundamental challenge in balancing efficient “fast thinking” with accurate “slow thinking,” often struggling to adaptively trigger deeper reasoning without incurring significant computational overhead. This paper introduces \textit{MetaThink (MT)}, a novel inference-time adaptive refinement framework that equips LRMs with conditional self-correction capabilities without requiring any additional training. \textit{MetaThink} begins with an initial “fast thinking” phase, followed by a lightweight self-monitoring mechanism that assesses confidence through uncertainty markers. When low confidence or potential errors are detected, a refinement token triggers a targeted “slow thinking” phase guided by domain-specific prompts, allowing the model to introspectively review and correct its reasoning before producing a more accurate final answer. Our comprehensive evaluation across diverse and challenging benchmarks spanning mathematical reasoning, code generation, and scientific problem-solving tasks demonstrates that \textit{MetaThink} consistently achieves substantial and robust improvements in Pass@1 accuracy. Crucially, these gains are realized while maintaining competitive or even improved inference efficiency, outperforming existing inference-time baselines. Our findings underscore that \textit{MetaThink} offers an effective, training-free approach to enhancing the reliability and accuracy of LRMs on complex reasoning tasks by striking a superior balance between performance and efficiency.
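The control flow described in the abstract (fast pass, lightweight self-monitoring, conditional refinement pass) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `generate` stub, the specific uncertainty markers, and the refinement prompt are all assumptions standing in for a real LRM call and the paper's actual confidence signals.

```python
# Hypothetical sketch of a MetaThink-style inference loop.
# The model interface and uncertainty markers below are illustrative
# assumptions, not the method's actual components.

UNCERTAINTY_MARKERS = ("not sure", "maybe", "i think", "possibly")  # assumed markers

REFINE_PROMPT = "Review the reasoning above step by step and correct any errors."  # assumed


def generate(prompt: str) -> str:
    """Placeholder for an LRM call; returns canned text for demonstration."""
    if REFINE_PROMPT in prompt:
        return "Corrected reasoning. Final answer: 42."
    return "Maybe the answer is 41."


def metathink(question: str) -> str:
    # Fast-thinking phase: a single direct pass over the question.
    draft = generate(question)
    # Lightweight self-monitoring: scan the draft for uncertainty markers
    # as a cheap proxy for low confidence.
    low_confidence = any(m in draft.lower() for m in UNCERTAINTY_MARKERS)
    if not low_confidence:
        return draft
    # Slow-thinking phase: a targeted refinement pass over the draft,
    # guided by a correction prompt.
    return generate(f"{question}\n{draft}\n{REFINE_PROMPT}")
```

The key design point is that the second, more expensive pass runs only when the monitor flags the draft, which is how the framework aims to keep inference cost close to a single fast pass on easy inputs.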
