Flash Attention Mechanics: How Tiled Attention Fits in SRAM

Self-attention is the operation that lets every token in a sequence influence every other token. The cost is an N×N matrix of pairwise…
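
To make the N×N cost concrete, here is a minimal sketch of the memory needed to hold one full matrix of pairwise values. The helper name `score_matrix_bytes`, the 2-byte (fp16) element size, and the sample sequence lengths are illustrative assumptions, not figures from the article.

```python
def score_matrix_bytes(seq_len: int, bytes_per_element: int = 2, heads: int = 1) -> int:
    """Bytes needed to materialize a full seq_len x seq_len matrix per head.

    Assumption: 2 bytes per element (fp16); pass 4 for fp32.
    """
    return heads * seq_len * seq_len * bytes_per_element


if __name__ == "__main__":
    # Hypothetical sequence lengths chosen only to show the quadratic growth.
    for n in (1024, 4096, 16384):
        mib = score_matrix_bytes(n) / 2**20
        print(f"N={n:>6}: {mib:,.0f} MiB per head")
```

Under these assumptions, each 4x increase in N multiplies the footprint by 16, which is why the full matrix quickly outgrows small on-chip memory.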
