Toy environment question
So I built this toy environment, and I think no existing methods can really solve it. I’ve only tested Rainbow DQN and a simple actor-critic algorithm (forked bsuite), but it’s a pretty difficult problem because there’s a powerful local optimum and uniform exploration cannot break free of it (unless tuned to an unreasonable degree). I have a couple questions:
So far I’m thinking maybe Humanoid-v4, which I could imagine having the necessary structure, at least in theory: it has dense, structured rewards, and the powerful local optimum is standing still and just not falling over. Meanwhile, true locomotion is essentially controlled falling, so falling over potentially reveals the information needed to learn it. “Following the breadcrumbs” of different ways to fall over could therefore, in theory, lead to locomotion. What do y’all think?

submitted by /u/w41t3rpwnZ0RZ
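To make the "standing still" optimum a bit more concrete, here is a rough back-of-the-envelope sketch. It is not the actual Humanoid-v4 reward code: it assumes a per-step alive bonus of 5.0, a forward-velocity weight of 1.25, and a 1000-step time limit (which is what I recall from the Gymnasium defaults, so double-check them), and it ignores control and contact costs. The function name, the 40-step fall, and the 1 m/s gait are all made up for illustration.

```python
# Rough sketch: why "stand still" can beat "try to walk" early in training.
# All constants are assumptions for illustration, not measured values.

HEALTHY_REWARD = 5.0          # assumed per-step alive bonus
FORWARD_REWARD_WEIGHT = 1.25  # assumed weight on forward velocity
MAX_STEPS = 1000              # assumed episode time limit


def episode_return(steps_survived, forward_velocity):
    """Undiscounted return for surviving `steps_survived` steps at a constant
    forward velocity (m/s), ignoring control and contact costs."""
    steps = min(steps_survived, MAX_STEPS)
    per_step = HEALTHY_REWARD + FORWARD_REWARD_WEIGHT * forward_velocity
    return steps * per_step


stand_still = episode_return(MAX_STEPS, 0.0)  # never moves, never falls
early_lunge = episode_return(40, 1.0)         # hypothetical: lunges, falls at step 40
good_walker = episode_return(MAX_STEPS, 1.0)  # hypothetical stable gait at 1 m/s

print(stand_still, early_lunge, good_walker)
```

Under these assumptions, any exploratory move that ends in a fall scores far below standing still, even though a stable gait would eventually score higher. That gap is the local optimum described above.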