Partially observable Matsuzawa. Can any RL algorithm generalize in this way?
Fully observable Matsuzawa puzzles are grid worlds where an agent must pick up coins in a particular order, travel down a long hallway, then pick up coins in order again. The secondary chamber has its coins in exactly the same locations as the primary chamber.

https://i.imgur.com/5nvi0oe.png

Coins must be picked up in the order of their face numbers. Coins in the secondary chamber can be picked up only when no coins remain in the primary chamber. reward […]
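
To make the pickup rules concrete, here is a rough Python sketch of just the coin bookkeeping. It is not a full environment: movement, the hallway, observations, and the reward are all left out. The `Puzzle` class, the `try_pickup` method, and the chamber-local `(row, col)` coordinates are names I made up for illustration, and I am assuming "order of face number" means ascending order.

```python
from dataclasses import dataclass, field


@dataclass
class Puzzle:
    """Coin bookkeeping only; movement, hallway, and reward are omitted."""

    # Hypothetical layout: face number -> (row, col) in chamber-local coordinates.
    layout: dict
    primary: dict = field(init=False)
    secondary: dict = field(init=False)

    def __post_init__(self):
        # The secondary chamber mirrors the primary: same faces, same locations.
        self.primary = dict(self.layout)
        self.secondary = dict(self.layout)

    def try_pickup(self, chamber, pos):
        """Try to pick up a coin at `pos` in `chamber`; return True on success."""
        if chamber not in ("primary", "secondary"):
            raise ValueError(f"unknown chamber: {chamber!r}")
        # Secondary coins are locked while any primary coin remains.
        if chamber == "secondary" and self.primary:
            return False
        coins = self.primary if chamber == "primary" else self.secondary
        if not coins:
            return False
        # Assumption: "order of face number" means lowest remaining face first.
        face = min(coins)
        if coins[face] != pos:
            return False
        del coins[face]
        return True


puzzle = Puzzle({1: (0, 0), 2: (1, 2)})
out_of_order = puzzle.try_pickup("primary", (1, 2))    # coin 2 before coin 1
locked = puzzle.try_pickup("secondary", (0, 0))        # primary not yet cleared
```

Both of those attempts are rejected. Once the primary coins are taken in order, the same positions become valid pickups in the secondary chamber, which is the part an agent has to remember across the hallway.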