How do I improve this (quadruped RL learning)

I’m new to RL and new to MuJoCo, so I have no idea which variables I should tune. Here are the variables I’ve rewarded/penalized:

I’ve rewarded the following:

+ r_upright + r_height + r_vx + r_vy + r_yaw + r_still + r_energy + r_posture + r_slip 

and I’ve placed penalties on:

p_vy = w_vy * vy^2
p_yaw = w_yaw * yaw_rate^2
p_still = w_still * ( (vx^2 + vy^2 + vz^2) + 0.05*(wx^2 + wy^2 + wz^2) )
p_energy = w_energy * ||q_des - q_ref||^2
p_posture = w_posture * Σ_over_12_joints (q - q_stance)^2
p_slip = w_foot_slip * Σ_over_sole-floor_contacts (v_x^2 + v_y^2)
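
For reference, here is a minimal sketch of how the penalty terms above might be computed with NumPy. The function name `penalty_terms`, the argument layout, and the weight dictionary are assumptions made for illustration, not the actual training code; it also assumes `yaw_rate` is the z component of the body angular velocity.

```python
import numpy as np

def penalty_terms(lin_vel, ang_vel, q, q_des, q_ref, q_stance, foot_vel_xy, w):
    """Compute each penalty from the list above and return them by name.

    All names here are hypothetical placeholders:
      lin_vel      (vx, vy, vz) base linear velocity
      ang_vel      (wx, wy, wz) base angular velocity
      q, q_des, q_ref, q_stance   joint-angle vectors (12 joints)
      foot_vel_xy  one (v_x, v_y) row per foot currently touching the floor
      w            dict of weights keyed like the w_* names in the post
    """
    lin_vel = np.asarray(lin_vel, dtype=float)
    ang_vel = np.asarray(ang_vel, dtype=float)
    q = np.asarray(q, dtype=float)
    q_des = np.asarray(q_des, dtype=float)
    q_ref = np.asarray(q_ref, dtype=float)
    q_stance = np.asarray(q_stance, dtype=float)
    foot_vel_xy = np.asarray(foot_vel_xy, dtype=float)

    vy = lin_vel[1]
    yaw_rate = ang_vel[2]  # assumption: yaw rate is wz

    return {
        "p_vy": w["vy"] * vy**2,
        "p_yaw": w["yaw"] * yaw_rate**2,
        "p_still": w["still"] * (np.sum(lin_vel**2) + 0.05 * np.sum(ang_vel**2)),
        "p_energy": w["energy"] * np.sum((q_des - q_ref) ** 2),
        "p_posture": w["posture"] * np.sum((q - q_stance) ** 2),
        # Sums to 0.0 when no foot is in contact (empty array).
        "p_slip": w["foot_slip"] * np.sum(foot_vel_xy**2),
    }
```

Returning the terms separately rather than as one summed scalar makes it possible to log each one per episode and see which term dominates before changing any weight.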

submitted by /u/aeauo