SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]
Hello everyone! I’ve been independently researching and developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets: none taught my model to simply ground text in imagery — they all asked it to reason about the text or about the scene itself. That led me down a two-week side project to create SGOCR, an open-source dataset pipeline for generating spatially-grounded, OCR-focused VQA tuples with rich metadata to support diverse VLM training […]
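To make "spatially-grounded, OCR-focused VQA tuple" concrete, here is a minimal sketch of what one such record might look like. The class and field names below are purely illustrative assumptions, not SGOCR's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a spatially-grounded, OCR-focused VQA tuple.
# Field names are illustrative only, not SGOCR's real schema.
@dataclass
class GroundedVQATuple:
    image_id: str
    question: str    # e.g. "What text appears in the highlighted region?"
    answer: str      # the transcribed text
    bbox: tuple      # normalized (x_min, y_min, x_max, y_max) grounding box
    metadata: dict = field(default_factory=dict)  # rich per-sample metadata

    def bbox_area(self) -> float:
        """Area of the grounding box in normalized image coordinates."""
        x0, y0, x1, y1 = self.bbox
        return max(0.0, x1 - x0) * max(0.0, y1 - y0)

sample = GroundedVQATuple(
    image_id="img_0001",
    question="What text is inside the region (0.10, 0.20)-(0.45, 0.30)?",
    answer="EXIT",
    bbox=(0.10, 0.20, 0.45, 0.30),
    metadata={"font_guess": "sans-serif", "ocr_confidence": 0.97},
)
print(round(sample.bbox_area(), 4))
```

The key idea the post describes is that the supervision target is a literal transcription tied to a spatial region, rather than a reasoning chain about the scene.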