I spent 3 days trying to “outsmart” an RL agent, and it taught me I’m the one who needs training.
I’ve been diving into the deep end of reinforcement learning and generative models lately, specifically trying to see if I could train a simple diffusion model from scratch using nothing but a reward signal. On paper, it sounded like a fun weekend experiment; in practice, it was a 72-hour masterclass in frustration. By Sunday night, I was staring at a screen of pure static; every time I adjusted the hyperparameters, the model would either collapse into a […]