Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal
arXiv:2602.21225v1 Announce Type: new Abstract: We investigate whether progressive data scheduling, a curriculum learning strategy that incrementally increases training data exposure (33%$\rightarrow$67%$\rightarrow$100%), yields consistent efficiency gains across architecturally distinct document understanding models. Evaluating BERT (text-only, 110M parameters) and LayoutLMv3 (multimodal, 126M parameters) on the FUNSD and CORD benchmarks, we find that this schedule reduces wall-clock training time by approximately 33%, commensurate with the reduction from 10.0 to 6.67 effective epoch-equivalents of data. To isolate curriculum […]
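The epoch-equivalent figures can be checked with a little arithmetic. The sketch below assumes a 10-epoch budget split evenly across the three schedule phases; the abstract does not state this split, and it is used here only because it reproduces the reported 10.0 and 6.67 values. The function name `effective_epochs` is hypothetical.

```python
def effective_epochs(total_epochs, fractions):
    """Full-dataset passes processed when the epoch budget is split
    evenly across phases, each phase seeing a fraction of the data.

    Assumption: equal epochs per phase (not stated in the abstract).
    """
    epochs_per_phase = total_epochs / len(fractions)
    return sum(epochs_per_phase * f for f in fractions)


# Data fractions from the abstract: 33% -> 67% -> 100%.
fractions = [0.33, 0.67, 1.0]
baseline = 10.0  # standard training: every epoch sees 100% of the data

progressive = effective_epochs(baseline, fractions)
reduction = 1 - progressive / baseline

print(f"{progressive:.2f} epoch-equivalents, {reduction:.1%} less data processed")
```

Under this assumption the schedule processes about two thirds of the data seen by the baseline, which is consistent with the roughly 33% wall-clock reduction if training time scales linearly with the number of examples processed.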