[R] Qwen3.5’s MoE architecture: A breakthrough or just incremental?
I've been reading through the release notes for the 397B-A17B model. Going by the name, only about 17B of its 397B parameters (roughly 4%) are active per token, which is very low for its overall size. Do you guys think this specific MoE routing is a major breakthrough for open source, or is it just a natural, incremental step up from what we already had?

submitted by /u/astrophile_ashish
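
For anyone who wants to put a number on "low", here is a rough sketch of the arithmetic. It assumes the size tag follows the `<total>B-A<active>B` convention, so that `397B-A17B` means 397B total and 17B active parameters per token. The helper name `active_fraction` is made up for illustration and is not part of any library.

```python
import re


def active_fraction(tag: str) -> float:
    """Return active / total parameters for a size tag like '397B-A17B'.

    Assumes the tag encodes '<total>B-A<active>B' (an assumption based on
    the model name, not on anything stated in the release notes).
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", tag)
    if m is None:
        raise ValueError(f"unrecognized size tag: {tag!r}")
    total, active = float(m.group(1)), float(m.group(2))
    return active / total


frac = active_fraction("397B-A17B")
print(f"{frac:.1%} of parameters active per token")  # prints "4.3% of parameters active per token"
```

The same function can be run on the size tags of earlier MoE releases to see whether this ratio is a jump or a continuation of the existing trend.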