Reading or Reasoning? Format Decoupled Reinforcement Learning for Document OCR
arXiv:2601.08834v1 Announce Type: new Abstract: Reading text from images or scanned documents with OCR models has been a longstanding research focus. Intuitively, text reading is perceived as a straightforward perceptual task, and existing work primarily focuses on data engineering to strengthen supervised fine-tuning (SFT). In this work, we observe that even advanced OCR models exhibit significantly higher entropy on formatted text (e.g., formulas and tables) than on plain text, often by an order of magnitude. These […]
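
The entropy comparison described in the abstract can be illustrated with a minimal sketch. It assumes that per-token logits are available from the OCR model's decoder and that each token carries a boolean flag marking it as formatted (formula or table markup) or plain. The function names, the flagging scheme, and the toy logits are hypothetical and are not taken from the paper.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy_by_group(step_logits, is_formatted):
    """Average token entropy separately for formatted and plain tokens.

    step_logits: list of per-token logit lists (assumed model output).
    is_formatted: list of bools, one per token (hypothetical labeling).
    """
    groups = {"formatted": [], "plain": []}
    for logits, flag in zip(step_logits, is_formatted):
        key = "formatted" if flag else "plain"
        groups[key].append(token_entropy(logits))
    # An empty group yields NaN rather than a misleading zero.
    return {
        key: (sum(vals) / len(vals) if vals else float("nan"))
        for key, vals in groups.items()
    }


# Toy data: peaked distributions for plain tokens, flat ones for formatted tokens.
toy_logits = [
    [10.0, 0.0, 0.0, 0.0],  # plain, confident
    [9.0, 0.0, 0.0, 0.0],   # plain, confident
    [1.0, 1.0, 1.0, 1.0],   # formatted, uniform
    [2.0, 1.5, 1.0, 0.5],   # formatted, spread out
]
toy_flags = [False, False, True, True]
result = mean_entropy_by_group(toy_logits, toy_flags)
print(result)
```

On this toy input the formatted group has the higher mean entropy by construction; with real model outputs, the same per-group average is one way to check the gap the abstract reports.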