Use motion priors with TQC?
Is that possible? The AMP implementations on the internet assume you’re using on-policy PPO. In training, PPO’s collection time is huge (22 s for 2,000 steps) vs. TQC’s (~1 s for 32k steps). I am tight on time and not sure how to progress further.

I'm training a hybrid robot to climb stairs, and it has been a pain (at least above 3 cm). How do you know when physics is the problem and not the reward structure anymore? I have now spent 2 weeks playing with weights and curriculum. The robots refuse to climb!

submitted by /u/ishaan2479