Find the best open-source OCR models in one place at Papers with Code [P]

Hi, I’ve created an overview of the most important OCR benchmarks, along with the top open models and links to their papers and code: https://paperswithcode.co/tasks/ocr.

This week, new OCR models were released by Baidu and Mistral.

Baidu released Unlimited OCR, a 3B-parameter model that builds on top of DeepSeek OCR and introduces a key innovation called Reference Sliding Window Attention (R-SWA). Mistral released OCR 4, which is available via an API.

OCR, or optical character recognition, is the task of converting PDFs or scanned documents into machine-readable text. There is, of course, huge interest in this task, as it enables ingesting all of a company's data for agentic use cases. AI agents work well with Markdown, so it can be valuable to turn all those messy PDF documents into that standardized, machine-readable format. This enables use cases like agentic RAG (retrieval-augmented generation), which powers chatbots both for internal use and for external customer support.

With so many OCR models released on Hugging Face over the last few months, it can be hard to know which one to use.

Hence, I’ve built this page, which lists the major OCR benchmarks along with the top-performing models and links to their code. It is hosted on Papers with Code, the website I maintain (a revival of the original site, which was taken down).

The top recommended benchmarks are OlmOCRBench, created by Ai2, and OmniDocBench, created by Shanghai AI Laboratory.

The current top recommendations are Chandra OCR 2 by Datalab and Mistral OCR v4. The former is openly available, so you can either self-host it or use Datalab's serverless API.

Let me know which other tasks you'd like to see major benchmarks for next!

Cheers,

Niels

open-source @ HF

submitted by /u/NielsRogge