A Concentration Bound for TD(0) with Function Approximation
arXiv:2312.10424v4 Announce Type: replace-cross Abstract: We derive a uniform all-time concentration bound of the type ‘for all $n \geq n_0$ for some $n_0$’ for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from that of offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive […]
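
For concreteness, the setting described in the abstract (online TD(0) with linear function approximation, driven by a single sample path of a Markov chain) can be sketched as follows. This is a minimal illustration and not the paper's own code: the function name `td0_linear`, the initial state, the zero initialization, and the step-size schedule $a_n = 1/(n+1)$ are all assumptions made for the example, and the paper's step-size conditions may differ.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma=0.9, n_steps=10000, seed=0):
    """Online TD(0) with linear features along one sample path.

    P     : (S, S) row-stochastic transition matrix of the Markov chain
    r     : (S,) reward received on leaving each state
    Phi   : (S, d) feature matrix; row s is the feature vector phi(s)
    gamma : discount factor in [0, 1)
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)  # assumed initialization
    s = 0                # assumed (arbitrary) initial state
    for n in range(n_steps):
        # Next state is drawn from the chain itself: a single trajectory,
        # not i.i.d. samples from the stationary distribution.
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error under the current linear estimate.
        delta = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        a_n = 1.0 / (n + 1)  # assumed step-size schedule
        theta = theta + a_n * delta * Phi[s]
        s = s_next
    return theta
```

As a sanity check, on a one-state chain with reward 1 and `gamma=0.5` the value function is $1/(1-0.5) = 2$, so `td0_linear(np.array([[1.0]]), np.array([1.0]), np.array([[1.0]]), gamma=0.5)` should return an estimate close to 2.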