Is this reward curve useless?

digitado ⋅ 22 June 2026

I’m using SAC for MARL. How do I reduce the variance? Lower values are better. Over time my agents hit 9 or lower more often, but the curve is so volatile that I can’t get them to perform reliably.
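To put a number on "hitting 9 or lower more often", here is a minimal sketch that tracks a trailing mean, standard deviation, and hit rate over the logged values. The function name `rolling_stats`, the window size, and the sample values are hypothetical, chosen only for illustration; they are not data from my run.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(values, window=100, threshold=9.0):
    """Return one (mean, std, hit_rate) tuple per step over a trailing window.

    hit_rate is the fraction of values in the window at or below `threshold`
    (lower is better here, so a rising hit_rate means improvement).
    """
    buf = deque(maxlen=window)  # keeps only the most recent `window` values
    out = []
    for v in values:
        buf.append(v)
        hit_rate = sum(x <= threshold for x in buf) / len(buf)
        out.append((mean(buf), pstdev(buf), hit_rate))
    return out


# Hypothetical per-episode values, for illustration only (not real data).
episode_values = [14, 12, 15, 9, 13, 8, 11, 9, 7, 12, 8, 9]
stats = rolling_stats(episode_values, window=4, threshold=9.0)

for step, (m, s, h) in enumerate(stats):
    print(f"step {step:2d}  mean={m:5.2f}  std={s:4.2f}  hit_rate={h:.2f}")
```

If the trailing hit rate climbs while the trailing standard deviation stays flat, the curve is trending the right way even though individual episodes remain noisy.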

My alpha term is close to 0 (it came down all the way from 0.99), and Q-loss and V-loss are both close to 0, but my entropy term keeps increasing. What can I do?
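For context on how alpha and entropy interact, here is a minimal sketch of the temperature update, assuming automatic temperature tuning with a log-alpha parameterization as in common SAC implementations. I am not claiming this matches my exact code: `update_log_alpha`, the learning rate, the target entropy, and the log-probability value are all hypothetical.

```python
import math


def update_log_alpha(log_alpha, mean_log_prob, target_entropy, lr=3e-4):
    """One gradient-descent step on the usual temperature loss.

    Assumed loss: L = -log_alpha * (mean_log_prob + target_entropy)
    Policy entropy is estimated as -mean_log_prob, so:
      entropy > target  ->  gradient > 0  ->  log_alpha goes down
      entropy < target  ->  gradient < 0  ->  log_alpha goes up
    """
    grad = -(mean_log_prob + target_entropy)  # dL / d(log_alpha)
    return log_alpha - lr * grad


# Hypothetical numbers: entropy estimate 1.0 (mean_log_prob = -1.0),
# target entropy -2.0, starting from alpha = 0.99.
log_alpha = math.log(0.99)
for _ in range(1000):
    log_alpha = update_log_alpha(
        log_alpha, mean_log_prob=-1.0, target_entropy=-2.0, lr=0.01
    )

alpha = math.exp(log_alpha)
print(f"alpha after 1000 steps: {alpha:.3e}")
```

Under these assumptions, alpha keeps shrinking for as long as the entropy estimate sits above the target, so an alpha near 0 together with rising entropy is what this update rule would produce rather than a contradiction.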

submitted by /u/Markovvy