Reinforcement learning kinda made me realize something uncomfortable

the model isn’t trying to “do the right thing”
it’s trying to win whatever game you accidentally designed

and if your reward is even a little off, it won’t fail; it’ll optimize the wrong thing perfectly

feels less like training intelligence and more like designing a game the system can’t exploit

is this why so many RL demos look good in theory but fall apart in real use?
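a toy sketch of what I mean (made-up setup, not from any real RL system): say the intended goal is “reach x=10”, but the reward you actually wrote pays for movement. An optimizer over policies will happily prefer oscillating in place forever:

```python
# Hypothetical gridworld: intended goal is reaching x=10,
# but the proxy reward pays +1 per unit of *movement*.

def run(policy, steps=20):
    x, proxy, reached = 0, 0, False
    for t in range(steps):
        dx = policy(t, x)
        x += dx
        proxy += abs(dx)  # misspecified reward: movement, not progress
        if x == 10:
            reached = True
    return proxy, reached

go_to_goal = lambda t, x: 1 if x < 10 else 0       # intended behaviour
oscillate  = lambda t, x: 1 if t % 2 == 0 else -1  # the reward hack

policies = {"go_to_goal": go_to_goal, "oscillate": oscillate}
best = max(policies, key=lambda name: run(policies[name])[0])
# the proxy scores "oscillate" at 20 and "go_to_goal" at only 10,
# so the optimizer picks the policy that never reaches the goal
```

the agent isn’t broken here; it’s optimizing exactly what it was told to, which is the uncomfortable part.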

submitted by /u/TaleAccurate793
