[D] 1T performance from a 397B model. How?
Is this pure architecture (Qwen3-Next), or are we seeing the results of massively improved synthetic data distillation?
submitted by /u/Altruistic-Rock-6797