How I topped the Open LLM Leaderboard using 2x 4090 GPUs – Research notes in Blog form
A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants of that model.
The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pre-training carves out discrete functional circuits in the layer stack that only work when preserved whole.
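The mechanics of the trick are simple: keep the whole layer stack, then splice an unmodified copy of a contiguous mid-stack block back in right after the original. Here's a toy sketch of that operation; the `Layer` class is a stand-in for a real decoder layer, and the `[40, 47)` range is purely illustrative (the post doesn't name the exact block):

```python
import copy

class Layer:
    """Toy stand-in for one transformer decoder layer."""
    def __init__(self, index):
        self.index = index  # pretend this is the layer's weights

def duplicate_block(layers, start, end):
    """Repeat layers[start:end] immediately after itself.

    Weights are untouched (deep copies of the originals); only the
    depth of the stack grows, from len(layers) to len(layers) + (end - start).
    """
    return layers[:end] + [copy.deepcopy(l) for l in layers[start:end]] + layers[end:]

# Qwen2-72B has 80 decoder layers; which 7-layer block to repeat is the
# part that took the experimentation.
stack = [Layer(i) for i in range(80)]
expanded = duplicate_block(stack, 40, 47)

print(len(expanded))                       # 87 layers: 80 + a repeated block of 7
print([l.index for l in expanded[45:50]])  # [45, 46, 40, 41, 42] -- the block repeats
```

In practice you'd do this with a passthrough-style merge over model shards rather than in-memory copies, but the layer arithmetic is exactly this.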
The whole thing was developed on 2x RTX 4090s in my basement; you don’t need massive compute to make real progress!
I’m now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on this dual GH200 rig (see my other posts). Code and new models are coming soon, including special RYS versions of Qwen3.5 27B and 35A3B.
Happy to answer questions.
I don’t write papers any more, so here is a full technical write-up in Blog format for your enjoyment.
I’m the same guy who built GLaDOS, and scored a crazy Nvidia GH200 system here on Reddit.
submitted by /u/Reddactor