How I topped the Open LLM Leaderboard using 2x 4090 GPUs – Research notes in Blog form
A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants of that model.
The weird finding: single-layer duplication does nothing. Too few layers, nothing. Too many, it gets worse. Only circuit-sized blocks of ~7 layers work. This suggests pre-training carves out discrete functional circuits in the layer stack that only work when preserved whole.
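The mechanics of the trick are simple: keep the whole layer stack, then splice an unmodified copy of a contiguous mid-stack block back in right after the original. Here's a toy sketch of that operation; the `Layer` class is a stand-in for a real decoder layer, and the `[40, 47)` range is purely illustrative (the post doesn't name the exact block):

```python
import copy

class Layer:
    """Toy stand-in for one transformer decoder layer."""
    def __init__(self, index):
        self.index = index  # pretend this is the layer's weights

def duplicate_block(layers, start, end):
    """Repeat layers[start:end] immediately after itself.

    Weights are untouched (deep copies of the originals); only the
    depth of the stack grows, from len(layers) to len(layers) + (end - start).
    """
    return layers[:end] + [copy.deepcopy(l) for l in layers[start:end]] + layers[end:]

# Qwen2-72B has 80 decoder layers; which 7-layer block to repeat is the
# part that took the experimentation.
stack = [Layer(i) for i in range(80)]
expanded = duplicate_block(stack, 40, 47)

print(len(expanded))                       # 87 layers: 80 + a repeated block of 7
print([l.index for l in expanded[45:50]])  # [45, 46, 40, 41, 42] -- the block repeats
```

In practice you'd do this with a passthrough-style merge over model shards rather than in-memory copies, but the layer arithmetic is exactly this.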
The whole thing was developed on 2x RTX 4090s in my basement; you don’t need massive compute to make real progress!
I’m now running current models (GLM-4.7, Qwen3.5, MiniMax M2.5) on this dual GH200 rig (see my other posts). Code and new models are coming soon, including special RYS versions of Qwen3.5 27B and 35A3B.
Happy to answer questions.
I don’t write papers any more, so here is a full technical write-up in Blog format for your enjoyment.
I’m the same guy who built GLaDOS, and scored a crazy Nvidia GH200 system here on Reddit.
submitted by /u/Reddactor