Should reward functions always show a sigmoid function-like outcome?
My reward function looks more like this (plot not shown). I'm also curious what you would use for inference. Going for the peak might be best in terms of reward, but the model does not seem robust there, whereas on the plateau the model may be more reliable. submitted by /u/Markovvy
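One way to make the peak-versus-plateau trade-off concrete is to score each checkpoint by its neighbourhood rather than by its own reward. The sketch below is only an illustration under stated assumptions: the `rewards` list is made-up data, `pick_checkpoint`, `window`, and `penalty` are hypothetical names, and "rolling mean minus rolling standard deviation" is just one possible stability heuristic, not an established rule.

```python
from statistics import mean, pstdev


def pick_checkpoint(rewards, window=5, penalty=1.0):
    """Return the index of the checkpoint whose neighbourhood scores best.

    Score = rolling mean - penalty * rolling std over `window` checkpoints,
    so an isolated spike can lose to a slightly lower but stable plateau.
    """
    if window < 1 or window > len(rewards):
        raise ValueError("window must be between 1 and len(rewards)")
    half = window // 2
    best_idx, best_score = None, float("-inf")
    for start in range(len(rewards) - window + 1):
        chunk = rewards[start:start + window]
        score = mean(chunk) - penalty * pstdev(chunk)
        if score > best_score:
            # Report the centre of the window as the chosen checkpoint.
            best_idx, best_score = start + half, score
    return best_idx


# Hypothetical per-checkpoint evaluation rewards: a sharp peak, then a plateau.
rewards = [0.1, 0.3, 0.9, 0.2, 0.6, 0.62, 0.61, 0.63, 0.6]

peak = max(range(len(rewards)), key=rewards.__getitem__)  # highest single reward
stable = pick_checkpoint(rewards, window=3)               # best stable region

print("peak checkpoint:", peak, "reward:", rewards[peak])
print("stable checkpoint:", stable, "reward:", rewards[stable])
```

With this toy data the raw maximum lands on the spike, while the penalised score lands inside the plateau. Raising `penalty` or `window` pushes the choice further toward stability, so both would need tuning against held-out evaluation runs.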