[P] DWARF: O(1) KV cache attention derived from heterodyne receiver physics

DWARF uses a fixed circular buffer (about 1.5GB, regardless of context length). The tradeoff is that you don't get full attention over the whole context, but the physics-derived offset set recovers most of what matters.

Core result: a fixed ~1.5GB KV cache at any context length (versus ~52GB for a standard 7B at 100K tokens), achieved by computing attention at 44 physics-derived dyadic offsets rather than all past positions.
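To make the claimed mechanics concrete, here is a minimal stand-in sketch: a fixed-size circular KV buffer where each new token attends only at a small set of dyadic offsets (powers of two) behind it, so per-token cost and memory stay constant as the stream grows. This is not DWARF's implementation; the offset set, buffer size, and all names below are illustrative assumptions (the actual 44 physics-derived offsets are in the repo/paper).

```python
import math
import random

random.seed(0)

D = 8                                # head dimension (illustrative)
OFFSETS = [1, 2, 4, 8, 16, 32]       # stand-in dyadic set; DWARF's 44
                                     # physics-derived offsets differ
BUF = max(OFFSETS) + 1               # fixed ring capacity, independent of context

K = [[0.0] * D for _ in range(BUF)]  # circular KV cache: O(BUF) memory, not O(T)
V = [[0.0] * D for _ in range(BUF)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def step(t, q, k, v):
    """Write (k, v) into slot t % BUF, then attend only at the fixed offsets."""
    K[t % BUF], V[t % BUF] = k, v
    # Only offsets that fit inside the ring are reachable; everything
    # older has been overwritten -- that's the O(1)-memory tradeoff.
    idx = [(t - o) % BUF for o in OFFSETS if t - o >= 0]
    if not idx:
        return v
    scores = [dot(K[i], q) / math.sqrt(D) for i in idx]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    # Softmax-weighted mix of the values at the selected offsets.
    return [sum(w[j] * V[idx[j]][d] for j in range(len(idx))) / z
            for d in range(D)]

# Stream tokens; cost per step is O(len(OFFSETS)), not O(t).
for t in range(1000):
    vec = lambda: [random.gauss(0, 1) for _ in range(D)]
    out = step(t, vec(), vec(), vec())
```

With dyadic offsets the reachable span per step is bounded by the largest offset, which is what lets the buffer stay fixed while the token stream grows without bound.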

Code has been public for two weeks with 500+ clones.

The paper is written, LaTeX-compiled, and available.

GitHub: https://github.com/Lanerra/DWARF

submitted by /u/MariusNocturnum