Necessity of computation in decision making
I’ve been thinking about a question that seems surprisingly under-discussed in RL: Why do we assume an agent can always compute the correct action immediately? That assumption is built into the standard MDP formulation, but every policy is ultimately a computer program, and computation is a finite resource.
I wrote a blog exploring this idea through computability theory and thought MDPs. The main observation is that if policies are resource bounded, then “thinking” (or computation) isn’t just a convenience—it can fundamentally expand what policies can represent.
I will illustrate this through a simple XOR construction and a toy experiment.
Feedback and discussion are very welcome!
submitted by /u/chanbpy
[link] [comments]