A JAX Implementation of Sutton’s 1992 IDBD (Alberta Plan Step 1)

I just started a D.Eng and am interested in the Alberta Plan for AI research and its focus on continual online learning. I’m starting with the foundational papers Sutton recommends in the top 10 papers list on his personal page. To that end, my first dive is a JAX implementation of the experiments in Sutton’s 1992 paper on IDBD. The results are good, and I have this subreddit to thank for turning me on to JAX.
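
For anyone who hasn't seen IDBD: below is a minimal JAX sketch of the per-weight update rules from the paper, run on a small tracking task. It is not the code behind the write-up. The function names (`idbd_step`, `run`) are made up for illustration, and the defaults (20 inputs, 5 relevant, one sign flip every 20 examples, theta = 0.01, initial step-size 0.05) are my reading of the paper's setup and should be checked against it.

```python
import jax
import jax.numpy as jnp


def idbd_step(learner, x, target, theta):
    """One IDBD update for a linear predictor."""
    w, beta, h = learner
    delta = target - jnp.dot(w, x)           # prediction error
    beta = beta + theta * delta * x * h      # meta-update of the log step-sizes
    alpha = jnp.exp(beta)                    # per-weight step-sizes
    w = w + alpha * delta * x                # LMS update, one step-size per weight
    # h is a decaying trace of recent weight changes; the bracket is clipped at 0
    h = h * jnp.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return (w, beta, h), delta ** 2


def run(key, n_steps=20_000, n_inputs=20, n_relevant=5,
        flip_every=20, init_alpha=0.05, theta=0.01):
    """Hypothetical driver: track a target whose relevant weights flip sign."""
    k_x, k_flip = jax.random.split(key)
    xs = jax.random.normal(k_x, (n_steps, n_inputs))
    flip_idx = jax.random.randint(k_flip, (n_steps,), 0, n_relevant)
    do_flip = (jnp.arange(n_steps) % flip_every) == 0

    # Assumed target: first n_relevant weights are +/-1, the rest are 0.
    w_true0 = jnp.concatenate(
        [jnp.ones(n_relevant), jnp.zeros(n_inputs - n_relevant)])
    learner0 = (jnp.zeros(n_inputs),                       # w
                jnp.full(n_inputs, jnp.log(init_alpha)),   # beta = log(alpha)
                jnp.zeros(n_inputs))                       # h

    def body(carry, inp):
        w_true, learner = carry
        x, idx, flip = inp
        # Flip the sign of one relevant target weight on flip steps.
        w_true = w_true.at[idx].multiply(jnp.where(flip, -1.0, 1.0))
        target = jnp.dot(w_true, x)
        learner, sq_err = idbd_step(learner, x, target, theta)
        return (w_true, learner), sq_err

    (_, (w, beta, h)), sq_errs = jax.lax.scan(
        body, (w_true0, learner0), (xs, flip_idx, do_flip))
    return jnp.exp(beta), sq_errs


alphas, sq_errs = run(jax.random.PRNGKey(0))
```

`alphas` holds the final per-weight step-sizes and `sq_errs` the squared error at every step. Putting the whole loop inside `jax.lax.scan` keeps the online, one-example-at-a-time structure while letting JAX compile it as a single loop.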

I was able to reproduce the plots from the paper. Write-up of my results here:
https://blog.9600baud.net/sutton92.html

I haven’t had a chance to publish the source or a Python package yet, but it’s on my to-do list. I’d love any feedback on this approach to learning the foundations of RL. Autostep is next.

submitted by /u/debian_grey_beard
