HoloByte: Continuous Hyperspherical Distillation for Tokenizer-Free Modeling
arXiv:2603.16917v1 Announce Type: new Abstract: Sequence modeling universally relies on discrete subword tokenization to circumvent the $\mathcal{O}(N^2)$ computational intractability of native byte-level attention. However, this heuristic quantization imposes artificial morphological boundaries, enforces vocabulary dependence, and fractures the continuity of the optimization landscape. To resolve this dichotomy, we introduce \textbf{HoloByte}: a strictly tokenizer-free framework utilizing Continuous Hyperspherical Distillation. HoloByte partitions discrete byte sequences into fixed-capacity chunks and projects them into a continuous, strictly bounded hyperspherical manifold via an invertible, […]
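
The chunking step and the quadratic-cost argument in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `chunk_bytes`, `to_unit_sphere`, `attention_pairs`, the zero padding, and the chunk capacity are assumptions made for illustration. In particular, the abstract is truncated before it describes the actual projection, so plain L2 normalisation is used below only as a stand-in. Unlike the map the abstract calls invertible, this stand-in discards the vector norm and is not invertible in general.

```python
import math


def chunk_bytes(data: bytes, capacity: int, pad: int = 0) -> list[list[int]]:
    """Split a byte string into fixed-capacity chunks.

    The last chunk is right-padded with `pad` (an assumed convention;
    the abstract does not say how ragged tails are handled).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    chunks = []
    for start in range(0, len(data), capacity):
        chunk = list(data[start:start + capacity])
        chunk += [pad] * (capacity - len(chunk))
        chunks.append(chunk)
    return chunks


def to_unit_sphere(chunk: list[int]) -> list[float]:
    """Map a chunk of byte values to a point on the unit hypersphere.

    Stand-in only: centre each byte to [-1, 1], then L2-normalise.
    Centred values are never zero (127.5 is not an integer), so the
    norm is always positive. This is NOT the paper's projection.
    """
    centered = [(b - 127.5) / 127.5 for b in chunk]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered]


def attention_pairs(sequence_length: int) -> int:
    """Number of pairwise interactions in full self-attention, N^2."""
    return sequence_length * sequence_length


if __name__ == "__main__":
    chunks = chunk_bytes(b"hello world", capacity=4)
    point = to_unit_sphere(chunks[0])
    print(len(chunks), sum(x * x for x in point))

    n_bytes, capacity = 1024, 8
    print(attention_pairs(n_bytes), attention_pairs(n_bytes // capacity))
```

Under these assumptions, attending over chunks of capacity $W$ instead of individual bytes reduces the number of pairwise interactions from $N^2$ to $(N/W)^2$, a factor of $W^2$.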