Is this reward curve useless?
I'm using SAC for MARL. How do I reduce the variance of this curve? Lower values are better. Over time the agents hit 9 or lower more often, but the curve is so volatile that they don't perform reliably. My alpha term is close to 0 (it came down all the way from 0.99), and the Q-loss and V-loss are both close to 0, yet my entropy term keeps increasing. What can I do?

submitted by /u/Markovvy
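To check whether the curve carries a trend underneath the noise, one option is to track a trailing mean and the fraction of recent episodes at or below the target value. The following is only a minimal sketch: `rolling_stats`, the window size, and the sample values are hypothetical and not taken from the actual run; only the threshold of 9 and the "lower is better" convention come from the post.

```python
from collections import deque


def rolling_stats(values, window=100, threshold=9.0):
    """Trailing mean and hit rate over the last `window` episodes.

    Lower is better, so a "hit" is an episode value <= threshold.
    Returns two lists with the same length as `values`.
    """
    buf = deque(maxlen=window)  # keeps only the most recent `window` values
    means, hit_rates = [], []
    for v in values:
        buf.append(v)
        means.append(sum(buf) / len(buf))
        hit_rates.append(sum(1 for x in buf if x <= threshold) / len(buf))
    return means, hit_rates


# Hypothetical per-episode values, for illustration only.
episode_values = [12, 8, 15, 9, 7, 14]
means, hit_rates = rolling_stats(episode_values, window=3)
print(means)
print(hit_rates)
```

If the hit rate rises steadily while the raw curve stays noisy, the agents are still improving on average, and the remaining problem is variance rather than a lack of learning.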