Bellman Expectation Equation as Dot Products!
I reformulated the Bellman Expectation Equation using vector dot products instead of the usual nested summation (Σ) notation.
g = γ⃗ · r⃗
o⃗ = r⃗ + γv⃗’
q = p⃗ · o⃗
v = π⃗ · q⃗
Together they express the full Bellman Expectation Equation: discounted return (g), one-step Bellman backup (o for outcome), Q-value as expected outcome (q) given dynamics (p), and state value (v) as expected value under policy π. This makes the computational structure of the MDP immediately visible.
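As a sanity check, here's a minimal NumPy sketch of the q and v steps on a made-up 2-state, 2-action MDP (the numbers and array names are illustrative, not from the post). It computes the one-step outcome o = r + γv′, then q as a dot product with the dynamics, then v as a dot product with the policy, and verifies the result against the explicit nested-sum form:

```python
import numpy as np

gamma = 0.9
# P[s, a, s'] = transition probability p(s' | s, a)  (hypothetical values)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
# R[s, a, s'] = reward for the transition
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
# pi[s, a] = probability of taking action a in state s
pi = np.array([[0.6, 0.4],
               [0.3, 0.7]])
v = np.zeros(2)  # current estimate of v(s')

# One-step outcome: o = r + gamma * v'   (v broadcasts over the s' axis)
o = R + gamma * v
# Q-value as a dot product over next states: q(s, a) = p . o
q = np.einsum('sap,sap->sa', P, o)
# State value as a dot product over actions: v(s) = pi . q
v_new = np.einsum('sa,sa->s', pi, q)

# The same computation written as nested sums, for comparison:
v_sum = np.array([
    sum(pi[s, a] * sum(P[s, a, sp] * (R[s, a, sp] + gamma * v[sp])
                       for sp in range(2))
        for a in range(2))
    for s in range(2)])
assert np.allclose(v_new, v_sum)
```

The dot-product form and the ΣΣ form give identical values; the vector version just makes the expectation structure (and the vectorized implementation) explicit.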
Useful for:
RL students, dynamic programming, temporal difference learning, Q-learning, policy evaluation, value iteration.
RL professors who empathize with students struggling with ΣΣΣΣ!
The Curious!
Comments are appreciated!
submitted by /u/Positive_Engine_5935