Try Symphony (1 env) in response to Samas69420 (Proximal Policy Optimization with 512 envs)
I was scrolling through different topics and saw that you were trying to train OpenAI’s Humanoid. Symphony is trained without parallel simulations; it is model-free and uses no behavioral cloning. It is 5 years of work on understanding humans. It does not go for speed, but it runs well before 8k episodes.
Code: https://github.com/timurgepard/Symphony-S2/tree/main
Paper: https://arxiv.org/abs/2512.10477 (it might feel more like a book than a short paper)
Submitted by /u/Timur_1988