Try Symphony (1 env) in response to Samas69420 (Proximal Policy Optimization with 512 envs)

I was scrolling through different topics and found that you were trying to train OpenAI's Humanoid.

Symphony is trained without parallel simulations. It is model-free and uses no behavioral cloning.

It is the result of 5 years of work on understanding humans. It is not optimized for speed, but the Humanoid is already running well before 8k episodes.

code: https://github.com/timurgepard/Symphony-S2/tree/main

paper: https://arxiv.org/abs/2512.10477 (it may read more like a book than a short paper)

submitted by /u/Timur_1988