MORL: How to deal with global rewards and reward shaping to incentivize the desired result?

I’m trying to create a cooperative multi-agent game in which agents have to work together to complete it. The goal is to finish as fast as possible (minimizing time) while maximizing the game score. The game has intermediate subgoals.

Currently, I am running episodes in which the agents complete a game. My reward is a scalar weighted sum: R = w1*time + w2*score + shaped rewards, where w1 + w2 = 1.
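To make the setup concrete, here is a minimal sketch of that scalarization. The function name, the default weights, and the choice to negate the elapsed time (so that finishing faster gives a higher reward) are assumptions for illustration, not my exact code.

```python
def scalarized_reward(elapsed_time, score, shaped, w1=0.5, w2=0.5):
    """Weighted sum of the two objectives plus a shaping term.

    Assumption: time enters with a negative sign because the goal is to
    minimize it; score enters positively because the goal is to maximize it.
    """
    # The weights are constrained to sum to 1, as in R = w1*time + w2*score.
    assert abs(w1 + w2 - 1.0) < 1e-9, "w1 and w2 must sum to 1"
    return w1 * (-elapsed_time) + w2 * score + shaped


# Example: 10 time units, score 4, no shaping bonus.
r = scalarized_reward(elapsed_time=10.0, score=4.0, shaped=0.0)
```

Written this way, the shaping term sits outside the weighted objectives, which is exactly the part I am unsure how to treat.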

My struggle is how to handle reward shaping, since the shaped rewards are not really part of the global objective. I have read up on potential-based reward shaping, but I am not sure I understand its consequences. Doesn’t it affect the value of my global objective too?
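As far as I understand it, potential-based shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to each step's reward. The sketch below checks numerically that these terms telescope over an episode, so their discounted sum depends only on the first and last potentials. The potential values and function names here are hypothetical; in my game Phi would be something like the fraction of subgoals completed.

```python
def shaping_term(phi_s, phi_next, gamma):
    """Potential-based shaping reward F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def shaping_return(phis, gamma):
    """Discounted sum of the shaping terms along one trajectory.

    phis[t] is the (hypothetical) potential of the state visited at step t.
    """
    return sum(
        (gamma ** t) * shaping_term(phis[t], phis[t + 1], gamma)
        for t in range(len(phis) - 1)
    )


def telescoped(phis, gamma):
    """Closed form of the same sum: gamma^T * Phi(s_T) - Phi(s_0)."""
    horizon = len(phis) - 1
    return (gamma ** horizon) * phis[-1] - phis[0]


# Hypothetical potentials along an episode; the terminal potential is set to 0.
phis = [0.2, 0.5, 0.9, 0.0]
gamma = 0.9
total = shaping_return(phis, gamma)
closed_form = telescoped(phis, gamma)
```

If that is right, the shaped return differs from the unshaped one only by a term fixed by the start state (when the terminal potential is 0), so the ranking of policies should not change even though the numeric return does. That would suggest logging time and score without the shaping term when reporting the global objective.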

Hoping to hear from people who have found a workaround for these types of problems.

submitted by /u/Markovvy