I just updated my RL notes!

digitado ⋅ 25 June 2026

They cover both foundational material, such as the policy gradient theorem, and recent methods, such as GRPO.
