Training from scratch with RL: Mad science or the next frontier?
Is it “crazy” to train generative models from scratch using only a reward signal? Not necessarily, but you’d be trading the efficiency of maximum likelihood estimation (MLE), which gives dense supervision at every token, for a massive uphill battle against the “cold start” problem. RL agents learn by exploring, and a model starting from random weights will almost certainly produce pure noise, so it rarely (if ever) receives a positive reward to bootstrap the learning process.
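A toy sketch of why the cold start bites (hypothetical setup, not any specific model): treat a randomly initialized model as a uniform sampler over characters, and pay reward only when the sample lands in a tiny set of “valid” outputs. The reward rate is vanishingly small, so almost every rollout yields zero gradient signal.

```python
import random

# Hypothetical toy setup: an untrained "policy" sampling characters
# uniformly, and a sparse reward that pays off only for a few valid strings.
VOCAB = "abcdefghijklmnopqrstuvwxyz"
VALID = {"cat", "dog", "sun"}  # stand-in for "coherent output"

def sample_episode(length=3, rng=random):
    # A randomly initialized model is essentially a uniform sampler.
    return "".join(rng.choice(VOCAB) for _ in range(length))

def estimate_reward_rate(episodes=100_000, seed=0):
    # Monte Carlo estimate of how often a random policy earns any reward.
    rng = random.Random(seed)
    hits = sum(sample_episode(rng=rng) in VALID for _ in range(episodes))
    return hits / episodes

# 3 valid strings out of 26**3 = 17,576 possibilities: roughly 0.017% of
# rollouts get a nonzero reward, so REINFORCE-style updates are almost
# always zero-information.
print(f"expected reward rate ~ {3 / 26**3:.6f}")
print(f"empirical reward rate ~ {estimate_reward_rate():.6f}")
```

And this is the generous case: with a realistic vocabulary and sequence length, the chance of stumbling onto a rewarded output by pure exploration drops off exponentially, which is exactly why RL is usually applied on top of an MLE-pretrained model.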
submitted by /u/Delicious-Mall-5552