DeepLearning.AI’s course on reinforcement learning is confusing me here.
Earlier, they define the r term as a sequence-level reward, then claim that you can get the individual contribution of each token by subtracting a token-level baseline. How on earth does that even work? They never elaborate on this, and most of the time they never clarify whether r is sequence-level or token-level in these explanations. This has really frustrated me, especially since this “explanation” comes from a course that’s supposed to make these ideas more accessible.
submitted by /u/No_Lynx5887