DeepLearning.AI’s course on reinforcement learning is confusing me here.

digitado ⋅ 4 June 2026

Earlier, they define the r term as a sequence-level reward, then claim that you can get the individual contribution of each token by subtracting a token-level baseline. How on earth does that even work? They never elaborate on this, and most of the time they don't clarify whether r is sequence-level or token-level in these explanations. This has really frustrated me, especially since this “explanation” comes from a course that’s supposed to make these ideas more accessible.
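To make the question concrete, here is a minimal sketch of one common reading, which is an assumption and not necessarily what the course intends. If the only reward arrives at the end of the sequence and there is no discounting, the return from every token position is the same sequence-level r. A token-level baseline (for example, a learned estimate of the expected final reward given the text generated so far) then yields a different advantage at each position even though r is shared. The function name `token_advantages` and all numbers are hypothetical.

```python
def token_advantages(sequence_reward, token_baselines):
    """Per-token advantage A_t = r - b_t.

    Assumes a single reward at the end of the sequence and no discounting,
    so every token position sees the same return r.
    """
    return [sequence_reward - b for b in token_baselines]


# Hypothetical values: one scalar reward for the whole sequence,
# and one baseline estimate per token position.
sequence_reward = 1.0
token_baselines = [0.25, 0.5, 0.75]

advantages = token_advantages(sequence_reward, token_baselines)
print(advantages)
```

Under this reading, r itself is never split across tokens. The per-token signal comes entirely from how much the shared outcome differs from what the baseline expected at each position.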

submitted by /u/No_Lynx5887
