What do you think of Yann LeCun's view of RL as the cherry on top of the ML cake?
Title says it all. I'm not an expert in pure RL research; so far I have worked mainly on foundation models. I'm curious to hear from experts what they think about the role of modern RL, in particular:
– Will it be just the very last fine-tuning layer of bigger foundation models? If so, which RL approaches do you think are most prominent?
– Will there be (or are there already) models that use RL more as a core layer of the whole model?
My gut feeling is that RL is very cool, but the hype has died down in recent years because diffusion and foundation models perform and scale much better, and in practice a lot of RL is perceived as mainly "reward engineering". Please correct me, as I might be very wrong 🙂
submitted by /u/Amazing-Coat5160