Looking for collaborator / mentor to implement reduced version of MuZero (e.g., for Ms. Pacman)
Hi,
I’m looking for somebody who would be interested in jointly implementing a reduced version of MuZero over the next few weeks. I’m not sure yet whether it’s computationally feasible within a reasonable budget, but the original paper shows some analyses for Ms. Pacman. An aspirational goal could be to break the algorithm down into individual pieces and add sophistication step by step, so that we eventually reproduce some of the original analyses for that one environment. Ideally, I would try it without looking at the published pseudocode.
I would also be happy if someone experienced would agree to occasionally give me advice.
In terms of my own RL experience: I have implemented PPO for MuJoCo, first based on the paper (as far as I got) and then adding the remaining details from the “37 implementation details”. I haven’t done anything on Atari or tree search yet, and I have not yet worked with distributed GPUs.
Thanks for your potential interest!
(contact via DM here, or via contact details in the linked repo)
submitted by /u/adrische