Is this reward curve useless?

[Image: reward function curve]

I’m using SAC (Soft Actor-Critic) for multi-agent RL (MARL). How do I reduce the variance? Lower values are better here. Over time the curve hits 9 or lower more often, but there is so much volatility that I can't get my agents to perform reliably.
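To put a number on "hits 9 or lower more often", here is a minimal sketch of a sliding-window summary of the logged values. The function name `rolling_stats`, the window size, and the idea of logging one value per episode are assumptions for illustration, not part of the original setup; only the threshold of 9 comes from the post.

```python
from collections import deque


def rolling_stats(values, window=100, threshold=9.0):
    """Sliding-window mean, standard deviation, and hit rate.

    values:    per-episode values from the curve (lower is better).
    window:    number of most recent episodes to summarize (assumed).
    threshold: a value at or below this counts as a "hit" (9 in the post).
    """
    buf = deque(maxlen=window)  # keeps only the last `window` values
    means, stds, hit_rates = [], [], []
    for v in values:
        buf.append(v)
        n = len(buf)
        mean = sum(buf) / n
        var = sum((x - mean) ** 2 for x in buf) / n  # population variance
        means.append(mean)
        stds.append(var ** 0.5)
        hit_rates.append(sum(1 for x in buf if x <= threshold) / n)
    return means, stds, hit_rates
```

If the windowed mean drifts down and the hit rate drifts up while the windowed standard deviation stays flat, the curve is probably showing learning with persistent noise rather than no learning at all.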

My alpha term (the entropy temperature) is close to 0 (it came down all the way from 0.99), and the Q-loss and V-loss are close to 0, but my entropy term keeps increasing. What can I do?
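For reference, a minimal sketch of how alpha and entropy are coupled, assuming the commonly used automatic temperature tuning with a learnable `log_alpha` and a fixed `target_entropy`. The post does not say whether this setup is used, and the function name and learning rate are hypothetical.

```python
def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on the usual temperature loss.

    Assumed loss: L = -log_alpha * mean(log_pi + target_entropy)
    so dL/dlog_alpha = -mean(log_pi) - target_entropy
                     = entropy_estimate - target_entropy.
    """
    # Monte Carlo estimate of policy entropy from sampled action log-probs
    entropy_estimate = -sum(log_probs) / len(log_probs)
    grad = entropy_estimate - target_entropy
    # Entropy above target -> positive grad -> log_alpha (and alpha) shrinks
    return log_alpha - lr * grad
```

Under this assumption, alpha falling toward 0 is what the update does whenever the measured entropy sits above the target, so comparing the logged entropy against `target_entropy` is a reasonable first check.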

submitted by /u/Markovvy