Open-source single-GPU reproductions of Cartridges and STILL for neural KV-cache compaction [P]

I implemented two recent ideas for long-context inference / KV-cache compaction and open-sourced both reproductions:

The goal was to make the ideas easy to inspect and run, with benchmark code and readable implementations instead of just paper/blog summaries.

Broadly:

  • cartridges reproduces corpus-specific compressed KV caches
  • STILL reproduces reusable neural KV-cache compaction
  • the STILL repo also compares against full-context inference, truncation, and cartridges

Here are the original papers / blogs –

Would be useful if you’re interested in long-context inference, memory compression, or practical systems tradeoffs around KV-cache reuse.

submitted by /u/shreyansh26
[link] [comments]

Liked Liked