PPO and Normalization
Hi all,
I’ve been working on building a Multi-Agent PPO for Mad Pod Racing on CodinGame, using a simple multi-layer perceptron for both the agents and the critic.
For the input data, I have distances in [0, 16000] and speeds in [0, 700]. I first scaled the raw values by their maximums to bring them into [0, 1]. With this simple scaling and a short training run, my agent stabilized at mediocre performance.
Then I tried normalizing the data with a z-score (subtracting the mean and dividing by the standard deviation), but performance dropped significantly. (I hit a similar issue in a CNN image-recognition project.)
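For context, here's a minimal sketch of the two schemes I'm comparing (NumPy, with the bounds above; the running z-score normalizer uses Welford's algorithm, which is how PPO implementations typically do it rather than with fixed batch statistics — the class and function names are just illustrative):

```python
import numpy as np

# Known feature bounds: distance in [0, 16000], speed in [0, 700]
OBS_MAX = np.array([16000.0, 700.0])

def max_scale(obs):
    """Scheme 1: divide each feature by its known maximum -> roughly [0, 1]."""
    return obs / OBS_MAX

class RunningZScore:
    """Scheme 2: online z-score normalization (Welford's algorithm).

    Maintains a running mean/variance over all observations seen so far,
    so the normalization statistics evolve with the data distribution.
    """
    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)   # sum of squared deviations from the mean
        self.count = 0
        self.eps = eps

    def update(self, obs):
        """Fold one observation into the running statistics."""
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        """Return (obs - mean) / std using the current running statistics."""
        var = self.m2 / max(self.count - 1, 1)
        return (obs - self.mean) / np.sqrt(var + self.eps)
```

One thing I'm now wondering: if the z-score stats are computed once from an early batch (instead of updated like above), they can drift badly as the policy changes — maybe that's related to the drop I'm seeing.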
Do you know whether input normalization is generally supposed to improve performance here, or does this sound more like a bug in my code?
submitted by /u/kalyklos