FlashSketch: Sketch-Kernel Co-Design for Fast Sparse Sketching on GPUs
arXiv:2602.06071v1

Abstract: Sparse sketches such as the sparse Johnson-Lindenstrauss transform are a core primitive in randomized numerical linear algebra because they leverage random sparsity to reduce the arithmetic cost of sketching, while still offering strong approximation guarantees. Their random sparsity, however, is at odds with efficient implementations on modern GPUs, since it leads to irregular memory access patterns that degrade memory bandwidth utilization. Motivated by this tension, we pursue a sketch-kernel co-design approach: we design […]
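The cost argument for sparse sketches can be illustrated with a minimal CPU sketch of a generic sparse Johnson-Lindenstrauss transform (SJLT). Each column of the k × n sketch matrix S has exactly s nonzeros of value ±1/√s in distinct random rows. This is the textbook construction, not FlashSketch's sketch or its GPU kernel. The function names `make_sjlt` and `sjlt_apply` and the list-of-rows matrix representation are hypothetical choices for illustration. Applying S to an n × d matrix takes O(s·n·d) operations, compared with O(k·n·d) for a dense k × n sketch. The inner loop scatters each input row to s randomly chosen output rows, and that scatter is the kind of irregular memory access the abstract identifies as the obstacle on GPUs.

```python
import math
import random


def make_sjlt(n, k, s, seed=0):
    """Hypothetical helper: sample a k x n SJLT in column-wise sparse form.

    Returns, for each of the n input coordinates, a list of s (row, sign)
    pairs. Rows within a column are distinct; each nonzero of S equals
    sign / sqrt(s), so every column of S has unit Euclidean norm.
    """
    if not 1 <= s <= k:
        raise ValueError("need 1 <= s <= k")
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        rows = rng.sample(range(k), s)                  # s distinct target rows
        signs = [rng.choice((-1.0, 1.0)) for _ in rows]  # independent random signs
        cols.append(list(zip(rows, signs)))
    return cols


def sjlt_apply(cols, k, s, A):
    """Hypothetical helper: compute S @ A for a dense n x d matrix A (list of rows).

    Work is O(s * n * d): each input row is touched once and added into
    s output rows, instead of the O(k * n * d) of a dense sketch.
    """
    d = len(A[0])
    scale = 1.0 / math.sqrt(s)
    out = [[0.0] * d for _ in range(k)]
    for j, row in enumerate(A):          # stream over input rows in order
        for r, sign in cols[j]:          # scatter to s random output rows
            target = out[r]              # irregular (random) write location
            for c in range(d):
                target[c] += sign * scale * row[c]
    return out
```

With s = 1 this reduces to a CountSketch-style transform. Larger s costs proportionally more arithmetic but typically permits stronger approximation guarantees for a given sketch size k.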