MiniMax Cut Attention Compute by 28x at 1M Tokens

And Open-Sourced the Kernel
