Reinforcement Learning with Multi-Step Lookahead Information Via Adaptive Batching
We study tabular reinforcement learning problems with multiple steps of lookahead information. Before acting, the learner observes $ell$ steps of future transition and reward realizations: the exact state the agent would reach and the rewards it would collect under any possible course of action. While it has been shown that such information can drastically boost the value, finding the optimal policy is NP-hard, and it is common to apply one of two tractable heuristics: processing the lookahead in […]