PPO vs SAC on real robot
I’m working on an RL project comparing two algorithms: SAC and PPO. The setup is a robot with 3 arms (driven by 3 servos) attached to a plate on which a ball rolls. Infrared sensors under the plate detect the ball’s position (observations), and the plate is tilted (pitch and roll actions) to bring the ball to the center and stabilize it. The reward is based on the ball’s distance from the center of the plate, the ball’s speed (which must be kept low), and a penalty for overly jerky robot movements.

When training with PPO I reach a fairly good policy that balances the ball on the plate. With SAC, however, I struggle a lot, and I believe it comes down to the hyperparameters. Training is done with a ring around the plate so the ball can’t fall off; episodes have a fixed length of 128 steps (I’ve set it up this way), so they never terminate early. At the moment this is the definition:
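(The original reward definition didn’t make it into the post. A minimal sketch consistent with the description above might look like the following; the weights `w_dist`, `w_vel`, `w_jerk` are hypothetical placeholders, not the actual values used.)

```python
import numpy as np

def reward(ball_pos, ball_vel, action, prev_action,
           w_dist=1.0, w_vel=0.1, w_jerk=0.01):
    """Hypothetical reward sketch: penalize distance from the plate
    center, ball speed, and jerky changes in the pitch/roll commands."""
    # Distance of the ball from the plate center (from the IR sensor readings)
    dist = np.linalg.norm(ball_pos)
    # Ball speed, which should stay low
    speed = np.linalg.norm(ball_vel)
    # Penalize jerky movement via the change between consecutive actions
    jerk = np.linalg.norm(np.asarray(action) - np.asarray(prev_action))
    return -(w_dist * dist + w_vel * speed + w_jerk * jerk)
```

Note that with this shape the reward is maximal (zero) only when the ball sits still at the center and the plate commands don’t change.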
The agent tends to take very small actions; it almost learns to stay still. Could anyone explain why? Find training metrics here:

submitted by /u/Constant_Tiger7490
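(For context on the "it depends on the parameters" suspicion: if you are using Stable-Baselines3, a sketch of the SAC knobs most often tied to this symptom is below. The environment name and all values are assumptions, not the actual configuration; `ent_coef` and `target_entropy` in particular control how strongly SAC is pushed toward random, i.e. non-tiny, actions.)

```python
from stable_baselines3 import SAC

# env = ...  # your ball-on-plate environment (placeholder, not a real gym id)

model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,      # SB3 default
    buffer_size=100_000,     # replay buffer; small for a real-robot budget
    batch_size=256,
    ent_coef="auto_0.1",     # auto-tuned entropy coefficient, initial value 0.1
    target_entropy=-2.0,     # often set to -action_dim; 2 actions: pitch, roll
    train_freq=1,
    gradient_steps=1,
)
model.learn(total_timesteps=200_000)
```

With a 2-dimensional action space, an over-aggressive jerk penalty combined with a collapsing entropy coefficient is a plausible (though unconfirmed) route to a near-still policy, so these are worth checking against the logged `ent_coef` curve in the training metrics.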