[R] Qwen3.5’s MoE architecture: A breakthrough or just incremental?
I've been reading through the release notes for the 397B-A17B model. The active parameter count is incredibly low for its overall size: going by the name, only about 17B of the 397B total parameters are active per token, roughly 4%. Do you guys think this specific MoE routing is a major breakthrough for open source, or is it just a natural, incremental step up from what we already had?
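For anyone who wants the ratio spelled out, here is a quick back-of-the-envelope sketch. It assumes the "397B-A17B" suffix means 397B total parameters and 17B active per token, which is how I read the name and is not something I'm quoting from the release notes. The function name `active_fraction` is just something I made up for illustration.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters used per token (both counts in billions)."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


# Assumed from the model name "397B-A17B": 397B total, 17B active per token.
total_b = 397.0
active_b = 17.0

frac = active_fraction(total_b, active_b)
print(f"{frac:.1%} of parameters active per token")  # prints "4.3% of parameters active per token"
```

This only says how sparse the model is per token. It says nothing about how the routing itself works, which is the part I'm actually asking about.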
submitted by /u/astrophile_ashish