RL + Generative Models
A question for people working in RL and image generative models (diffusion, flow-based, etc.). There seems to be more and more emerging work on RL fine-tuning techniques for these models. I'm interested to know: is it crazy to try to train these models from scratch with a reward signal only (i.e. without any supervision data)?
What techniques could be used to overcome issues with reward sparsity / cold start / training instability?
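For concreteness, one standard family of answers is score-function (REINFORCE-style) policy gradients with a baseline for variance reduction. The sketch below is a hypothetical toy, not from the post: a one-parameter Gaussian "generator" is trained purely from a black-box reward, with no supervision data, starting from a cold-start parameter far from the optimum. Subtracting a moving-average baseline from the reward is one common trick against the instability the post asks about.

```python
import numpy as np

# Toy, hypothetical illustration: train a one-parameter "generator"
# x ~ N(mu, 1) from a black-box reward alone (no supervision data).
# The reward prefers samples near 3; its gradient is never used.

rng = np.random.default_rng(0)

def reward(x):
    return -(x - 3.0) ** 2  # black-box score only

mu = 0.0          # generator parameter; "cold start" far from optimum
baseline = 0.0    # moving-average baseline for variance reduction
lr = 0.05

for step in range(2000):
    x = rng.normal(mu, 1.0, size=32)            # sample a batch
    r = reward(x)
    adv = r - baseline                          # advantage = reward - baseline
    # REINFORCE: d/dmu log N(x; mu, 1) = (x - mu)
    grad = np.mean(adv * (x - mu))
    mu += lr * grad
    baseline = 0.9 * baseline + 0.1 * r.mean()  # track average reward

print(mu)  # should approach 3.0
```

Real diffusion/flow fine-tuning methods (e.g. DDPO-style approaches) apply the same idea per denoising step, usually with a KL penalty to a reference model to keep training stable; training from scratch removes that reference, which is a large part of why reward-only training is hard.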
submitted by /u/amds201