Bellman Expectation Equation as Dot Products!

I reformulated the Bellman Expectation Equation using vector dot products instead of the usual nested sigma (summation) notation.

g = γ⃗ · r⃗

o⃗ = r⃗ + γv⃗′

q = p⃗ · o⃗

v = π⃗ · q⃗

Together they express the full Bellman Expectation Equation: discounted return (g), one-step Bellman backup (o for outcome), Q-value as the expected outcome (q) under the dynamics (p), and state value (v) as the expected Q-value under the policy π. Writing it this way makes the computational structure of the MDP immediately visible: each expectation is just a dot product between a probability (or discount) vector and a value vector.
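The four dot products translate directly into NumPy. Here is a minimal sketch on a made-up two-state, two-action MDP (all the numbers below are illustrative assumptions, not from the cheatsheet):

```python
import numpy as np

gamma = 0.9  # discount factor

# g = γ⃗ · r⃗ : discounted return over a sampled reward sequence
rewards = np.array([1.0, 0.0, 2.0])           # r⃗: rewards r_0, r_1, r_2
discounts = gamma ** np.arange(len(rewards))  # γ⃗: 1, γ, γ²
g = discounts @ rewards                       # scalar discounted return

# o⃗ = r⃗ + γv⃗′ : one-step backup, one entry per successor state
r = np.array([1.0, -1.0])      # expected immediate reward per successor
v_next = np.array([0.5, 2.0])  # current value estimate v′ per successor
o = r + gamma * v_next         # outcome vector

# q = p⃗ · o⃗ : Q-value is the expected outcome under the dynamics
p = np.array([0.7, 0.3])  # transition probabilities for this (s, a)
q_sa = p @ o              # Q(s, a)

# v = π⃗ · q⃗ : state value is the expected Q-value under the policy
pi = np.array([0.6, 0.4])      # policy probabilities over two actions
q = np.array([q_sa, 0.0])      # Q-values for both actions (second is made up)
v = pi @ q                     # V(s)
```

Each `@` is exactly one of the four dot products above, so the code mirrors the equations line for line.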

Useful for:

RL students, dynamic programming, temporal difference learning, Q-learning, policy evaluation, value iteration.

RL professors who empathize with students struggling with ΣΣΣΣ!

The Curious!

PDF: github.com/khosro06001/bellman-equation-cheatsheet/blob/main/Bellman_Equation__Khosro_Pourkavoos__cheatsheet.pdf

Comments are appreciated!

submitted by /u/Positive_Engine_5935
