WM Arena: Compare world model predictions across 26 Atari games with blind battles and a perception quiz
I built WM Arena (arena.worldflux.ai), an interactive benchmark for visual world models on the Atari 100k suite.
Three modes:
– Visual Explorer: side-by-side real vs predicted frames across 26 games
– Blind Battle: Elo-ranked voting on anonymous model outputs
– Real or Predicted? Quiz: a perception test
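For anyone unfamiliar with how pairwise votes turn into a ranking, here is a minimal sketch of a standard Elo update after a single Blind Battle vote. The K-factor of 32, the starting rating of 1500, and the names `expected_score`, `update_elo`, `model_x`, and `model_y` are illustrative assumptions, not necessarily what the site uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k is an assumed K-factor; larger values make ratings move faster.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the total rating is conserved.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical anonymous models, both starting at an assumed 1500.
ratings = {"model_x": 1500.0, "model_y": 1500.0}

# A voter prefers model_x's predicted frames in one blind battle.
ratings["model_x"], ratings["model_y"] = update_elo(
    ratings["model_x"], ratings["model_y"], score_a=1.0
)
print(ratings)
```

With equal starting ratings the expected score is 0.5, so a single win moves each model by half the K-factor in opposite directions.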
Currently evaluating DIAMOND (NeurIPS ’24 Spotlight), TWISTER (ICLR ’25), IRIS (ICLR ’23), and STORM (NeurIPS ’23).
Every model runs its official code at a pinned commit. No re-implementations.
Try it: arena.worldflux.ai
Would love feedback from this community, especially on which models to add next. DreamerV3, Delta-IRIS, and EDELINE are on the roadmap.
submitted by /u/Confident_Gas_5266