Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]
I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:

- Cartridges: https://github.com/shreyansh26/cartridges
- STILL: https://github.com/shreyansh26/STILL-Towards-Infinite-Context-Windows

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

- cartridges reproduces corpus-specific compressed KV caches
- STILL reproduces reusable neural KV-cache compaction
- the STILL repo also compares against full-context inference, truncation, and cartridges

Here are the original papers / blogs:

- cartridges: https://arxiv.org/abs/2506.06266
- STILL […]