Should reward functions always show a sigmoid function-like outcome?


My reward function looks more like this

I'm also curious which checkpoint you would use for inference. Picking the one at the peak gives the highest reward, but the model does not seem robust there, whereas a checkpoint from the plateau may be more reliable.
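To make the peak-versus-plateau question concrete, here is a minimal sketch of one way to choose. It assumes there is one evaluation reward per saved checkpoint, and it scores each window of checkpoints by its mean reward minus a penalty for how much the reward varies inside that window. The function names, the `window` and `risk_weight` parameters, and the reward values are all made up for illustration; none of them come from my actual run.

```python
import numpy as np


def rolling_stats(rewards, window):
    """Mean and standard deviation of every length-`window` slice of a 1-D reward series."""
    r = np.asarray(rewards, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(r, window)
    return windows.mean(axis=1), windows.std(axis=1)


def pick_checkpoint(rewards, window=5, risk_weight=1.0):
    """Return the index of the checkpoint at the centre of the best-scoring window.

    Score = rolling mean - risk_weight * rolling std, so a narrow spike
    can lose to a slightly lower but stable plateau.
    """
    mean, std = rolling_stats(rewards, window)
    best_start = int(np.argmax(mean - risk_weight * std))
    return best_start + window // 2


# Hypothetical evaluation rewards, one per saved checkpoint:
# a sharp spike early on, then a flat stretch at a lower level.
rewards = [1, 2, 4, 9, 3, 6, 6.2, 6.1, 6.0, 6.1, 6.2]

peak = int(np.argmax(rewards))               # checkpoint with the single highest reward
stable = pick_checkpoint(rewards, window=5)  # checkpoint inside the steadiest high-reward stretch

print("peak checkpoint:", peak)
print("plateau checkpoint:", stable)
```

Setting `risk_weight` to 0 reduces this to picking the best rolling average, and raising it favours flatter regions more strongly. This only measures stability across neighbouring checkpoints, so it would still be worth evaluating the chosen checkpoint over several seeds or episodes before trusting it.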

submitted by /u/Markovvy