What do you think of Yann LeCun's opinion that RL is the cherry on top of the ML cake?

Title says it all. I'm not an expert in pure RL research; I have worked mainly on foundation models so far.

I'm curious to hear from experts what they think the role of modern RL is, in particular:

– Will it be just the very last fine-tuning layer on top of bigger foundation models? If so, which RL approaches do you think are the most prominent?

– Will there be (or are there already) models that use RL as a core layer of the whole model?

My gut feeling is that RL is very cool, but the hype has died down in the last few years because diffusion and foundation models perform and scale much better, and a lot of RL is perceived in practice as mainly “reward engineering”.

Please correct me as I might be very wrong 🙂

submitted by /u/Amazing-Coat5160