Use motion priors with TQC?

Is that possible? The AMP (adversarial motion priors) implementations I can find online all assume on-policy PPO. In my training, PPO's collection time is huge (22 s for 2000 steps) vs. TQC's (~1 s for 32k steps).
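To put numbers on that gap, here is a quick back-of-envelope calculation. It assumes both timings are wall-clock time for data collection only, and that "~1 sec" is close enough to 1 s; the variable names are just for illustration.

```python
# Timings as quoted in the post; assumed to be wall-clock collection time only.
ppo_steps, ppo_seconds = 2_000, 22.0
tqc_steps, tqc_seconds = 32_000, 1.0  # "~1 sec" is approximate

# Environment steps collected per second for each setup.
ppo_rate = ppo_steps / ppo_seconds
tqc_rate = tqc_steps / tqc_seconds

# How many times faster TQC collects data than PPO.
ratio = tqc_rate / ppo_rate

print(f"ppo: {ppo_rate:.0f} steps/s")
print(f"tqc: {tqc_rate:.0f} steps/s")
print(f"tqc collects roughly {ratio:.0f}x faster")
```

So the difference is on the order of a few hundred times, which is why switching back to PPO just to use AMP is not attractive here.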

I am tight on time and not sure how to progress further. I'm training a hybrid robot to climb stairs and it has been a pain (at least for steps above 3 cm). How do you know when the physics is the problem and not the reward structure anymore? I have now spent 2 weeks playing with reward weights and the curriculum. The robots refuse to climb!

submitted by /u/ishaan2479