PPO and Normalization

Hi all,
I’ve been working on a multi-agent PPO implementation for Mad Pod Racing on CodinGame, using a simple multi-layer perceptron for both the agents and the critic.

For the input data, I have distances in [0, 16000] and speeds in [0, 700]. I first scaled the raw values by their known maximums to bring them into a smaller range. After short training with this simple scaling, my agent stabilized at mediocre performance.
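For reference, the max-scaling I'm describing looks roughly like this (a minimal sketch; the function name and exact feature layout are just for illustration, not my actual code):

```python
import numpy as np

DIST_MAX = 16000.0   # distance range [0, 16000]
SPEED_MAX = 700.0    # speed range [0, 700]

def scale_obs(distance, speed):
    """Scale each raw feature into [0, 1] by its known maximum."""
    return np.array([distance / DIST_MAX, speed / SPEED_MAX],
                    dtype=np.float32)
```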

Then I tried z-score normalization instead, but performance dropped significantly. (I ran into a similar issue in a CNN image-recognition project.)
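In case it matters: by z-score normalization I mean subtracting the mean and dividing by the standard deviation. In PPO this is often done with running statistics updated online (e.g. Welford's algorithm) rather than fixed dataset statistics. A sketch of that common pattern, purely for illustration (class and method names are my own, not from any specific library):

```python
import numpy as np

class RunningNormalizer:
    """Online z-score normalization via Welford's algorithm.

    Statistics are updated as observations arrive, so the
    normalization tracks the data the agent actually sees.
    """
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.m2 = np.zeros(shape, dtype=np.float64)   # sum of squared deviations
        self.count = 0
        self.eps = eps

    def update(self, x):
        # Welford's one-pass update of mean and M2
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        var = self.m2 / max(self.count - 1, 1)
        return (x - self.mean) / np.sqrt(var + self.eps)
```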

Do you know whether input normalization is generally supposed to improve performance, or is it more likely there's a bug in my code?

submitted by /u/kalyklos
