Training from scratch with RL: Mad science or the next frontier?
Is it “crazy” to train generative models from scratch using only a reward signal? Not necessarily, but you’d be trading the efficiency of maximum likelihood estimation (MLE), which gives dense supervision at every token, for a massive uphill battle against the “cold start” problem. RL agents learn by exploring, and a model starting from random weights will almost certainly produce pure noise, so it rarely (if ever) receives a positive reward to bootstrap the learning process.
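A toy sketch of why the cold start bites (hypothetical setup, not any specific model): treat a randomly initialized model as a uniform sampler over characters, and pay reward only when the sample lands in a tiny set of “valid” outputs. The reward rate is vanishingly small, so almost every rollout yields zero gradient signal.

```python
import random

# Hypothetical toy setup: an untrained "policy" sampling characters
# uniformly, and a sparse reward that pays off only for a few valid strings.
VOCAB = "abcdefghijklmnopqrstuvwxyz"
VALID = {"cat", "dog", "sun"}  # stand-in for "coherent output"

def sample_episode(length=3, rng=random):
    # A randomly initialized model is essentially a uniform sampler.
    return "".join(rng.choice(VOCAB) for _ in range(length))

def estimate_reward_rate(episodes=100_000, seed=0):
    # Monte Carlo estimate of how often a random policy earns any reward.
    rng = random.Random(seed)
    hits = sum(sample_episode(rng=rng) in VALID for _ in range(episodes))
    return hits / episodes

# 3 valid strings out of 26**3 = 17,576 possibilities: roughly 0.017% of
# rollouts get a nonzero reward, so REINFORCE-style updates are almost
# always zero-information.
print(f"expected reward rate ~ {3 / 26**3:.6f}")
print(f"empirical reward rate ~ {estimate_reward_rate():.6f}")
```

And this is the generous case: with a realistic vocabulary and sequence length, the chance of stumbling onto a rewarded output by pure exploration drops off exponentially, which is exactly why RL is usually applied on top of an MLE-pretrained model.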
submitted by /u/Delicious-Mall-5552