Trying to clarify something about the Bellman equation

I’m checking if my understanding is correct.

In an MDP, is it accurate to say that:

State does NOT directly produce reward or next state.

Instead, the structure is always:

State → Action → (Reward, Next State)

So:

  • Immediate expected reward at state s is the average over actions, weighted by the policy π(a|s), of the expected reward Σ_r r · p(r | s, a)
  • Future value is the same policy-weighted average over actions of Σ_s' p(s' | s, a) · v(s'), discounted by γ

Meaning both reward and transition depend on (s,a), not on s alone.

Is this the correct way to think about it?
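If it helps to see the structure concretely, here is a minimal sketch of the Bellman expectation backup for v_π on a tiny tabular MDP. All states, actions, rewards, and probabilities below are made up purely for illustration; the point is that both reward and next state come from p(s', r | s, a), and the policy π(a|s) supplies the weights for averaging over actions.

```python
# Minimal sketch: Bellman expectation backup on a hypothetical 2-state MDP.
# All tables below are invented for illustration.

gamma = 0.9  # discount factor

# p[(s, a)] -> list of (probability, reward, next_state):
# reward and next state depend on the (s, a) pair, never on s alone.
p = {
    ("s0", "left"):  [(1.0, 0.0, "s0")],
    ("s0", "right"): [(0.8, 1.0, "s1"), (0.2, 0.0, "s0")],
    ("s1", "left"):  [(1.0, 0.0, "s0")],
    ("s1", "right"): [(1.0, 2.0, "s1")],
}

# pi[s] -> {action: probability}: the policy provides the weights
# used to average over actions.
pi = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 0.1, "right": 0.9},
}

def bellman_backup(v, s):
    """One Bellman expectation backup:
    v(s) = sum_a pi(a|s) * sum_{s',r} p(s',r|s,a) * (r + gamma * v(s'))
    """
    return sum(
        pi[s][a] * sum(prob * (r + gamma * v[s2]) for prob, r, s2 in p[(s, a)])
        for a in pi[s]
    )

# Iterative policy evaluation: repeat the backup until the values settle.
v = {"s0": 0.0, "s1": 0.0}
for _ in range(500):
    v = {s: bellman_backup(v, s) for s in v}

print(v)
```

Note that the backup never asks "what reward does state s give" on its own: every term inside the sum is indexed by the (s, a) pair, and the only place s alone appears is as the argument to π.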

https://preview.redd.it/hj7ry9m1qtkg1.png?width=1577&format=png&auto=webp&s=c6f16285370679631d2904b5b85669ddb73d30a4

submitted by /u/New-Yogurtcloset1818
