[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on Blackwell
Google DeepMind dropped Gemma 4 today:
Gemma 4 31B: dense, 256K context, redesigned architecture targeting efficiency and long-context quality
Gemma 4 26B A4B: MoE, 26B total / 4B active per forward pass, 256K context
Both are natively multimodal (text, image, video, dynamic resolution).
We got both models running on MAX on launch day, on NVIDIA B200 and AMD MI355X from the same inference stack. On B200 we're seeing 15% higher output throughput than vLLM (happy to share more on methodology if useful).
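For anyone who wants to poke at it programmatically: MAX serves an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it. A minimal sketch of a chat request payload is below — the endpoint URL and the model id string are assumptions for illustration, not the exact identifiers we ship.

```python
import json

# Hypothetical model id and local endpoint -- check your `max serve` logs
# for the actual values on your machine.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "google/gemma-4-31b",  # assumed id; swap in the served model name
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in one sentence."}
    ],
    "max_tokens": 128,
}

# POST this JSON body to ENDPOINT once the server is up, e.g. with
# requests.post(ENDPOINT, json=payload) or any OpenAI client pointed
# at the local base URL.
print(json.dumps(payload, indent=2))
```

Because the API surface is OpenAI-compatible, existing tooling (the `openai` Python client with a custom `base_url`, curl scripts, eval harnesses) should work unchanged against either GPU backend.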
Free playground if you want to test without spinning anything up: https://www.modular.com/#playground
submitted by /u/carolinedfrasca