Undiscounted Semi-Markov Decision Processes With Countably Infinite Action Spaces

In this article we study semi-Markov decision processes (SMDPs) under the limiting ratio average pay-off criterion, generally known as the undiscounted pay-off. The action space of the decision maker is allowed to be countably infinite, and we place no restriction on the reward function. We prove the existence of a near-optimal (ϵ-optimal) strategy for the decision maker, which turns out to be a deterministic semi-stationary strategy. We discuss an efficient algorithm for computing a near-optimal pure semi-stationary strategy for such an SMDP model. In addition, under some standard ergodicity conditions, we propose an optimality equation for these SMDP models.
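
For orientation, the two notions the abstract relies on can be written out as a sketch. The notation below is assumed for illustration and is not taken from the article: s is the initial state, π a strategy, (X_m, A_m) the state and action at the m-th decision epoch, r the one-step reward, and τ̄ the expected sojourn time. The article's own definitions may differ in detail, for instance in the choice of lim inf or lim sup.

```latex
% Sketch under assumed notation; symbols are illustrative, not the article's.
% Limiting ratio average (undiscounted) pay-off of strategy \pi from state s:
\[
  \phi(s,\pi) \;=\; \liminf_{n\to\infty}
  \frac{\mathbb{E}_{\pi,s}\!\left[\sum_{m=0}^{n} r(X_m, A_m)\right]}
       {\mathbb{E}_{\pi,s}\!\left[\sum_{m=0}^{n} \bar{\tau}(X_m, A_m)\right]}
\]
% A strategy \pi^{*}_{\epsilon} is \epsilon-optimal (near-optimal) if, for every state s,
\[
  \phi(s,\pi^{*}_{\epsilon}) \;\ge\; \sup_{\pi}\,\phi(s,\pi) \;-\; \epsilon .
\]
```

With a countably infinite action space the supremum over strategies need not be attained, which is one reason to work with ϵ-optimal rather than exactly optimal strategies.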
