Reinforcement learning kinda made me realize something uncomfortable
the model isn’t trying to “do the right thing”
it’s trying to win whatever game you accidentally designed??
and if your reward is even a little off, it won’t fail, it’ll optimize the wrong thing perfectly
feels less like training intelligence and more like designing a system that can’t outsmart you
is this why so many RL demos look good in theory but fall apart in real use?
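the “optimize the wrong thing perfectly” part is easy to see in a toy sketch. here’s a made-up example (everything below is hypothetical, not from any real benchmark): a tiny circular track where the intended goal is reaching the finish, but the reward also pays a bonus every time you pass a checkpoint — so the best-scoring policy just loops over the checkpoint forever instead of finishing:

```python
# Toy sketch of reward mis-specification ("reward hacking").
# Hypothetical setup: a 1-D circular track with positions 0..5,
# a checkpoint at position 2 worth +1 on every visit, and the
# finish line at position 5 worth +10 once (episode then ends).

def run(policy, horizon=40):
    """Roll out a policy (a function pos -> step of +1 or -1) and sum reward."""
    pos, total = 0, 0
    for _ in range(horizon):
        pos = (pos + policy(pos)) % 6
        if pos == 2:
            total += 1    # checkpoint bonus, paid on EVERY visit
        if pos == 5:
            total += 10   # finish bonus
            break         # episode ends at the finish line
    return total

# Policy A: the intended behaviour -- drive straight to the finish.
finish = lambda pos: 1

# Policy B: the exploit -- shuttle back and forth over the checkpoint.
loop = lambda pos: 1 if pos < 2 else -1

print(run(finish))  # 11: one checkpoint pass + the finish bonus
print(run(loop))    # 20: farms the checkpoint, never finishes
```

nothing here “fails” — the looping policy is the optimal one for the reward as written, which is exactly the problem.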
submitted by /u/TaleAccurate793