Necessity of computation in decision making

I’ve been thinking about a question that seems surprisingly under-discussed in RL: Why do we assume an agent can always compute the correct action immediately? That assumption is built into the standard MDP formulation, but every policy is ultimately a computer program, and computation is a finite resource.

I wrote a blog exploring this idea through computability theory and thought MDPs. The main observation is that if policies are resource bounded, then “thinking” (or computation) isn’t just a convenience—it can fundamentally expand what policies can represent.
I will illustrate this through a simple XOR construction and a toy experiment.

Feedback and discussion are very welcome!

submitted by /u/chanbpy
[link] [comments]

Liked Liked