Is RL post-training in ‘imagined environments’ a path to continual RL? Trying to understand this deeper
I’ve been reading more about training in imagined environments, especially the Dreamer line of work and RialTo, and I’m curious how this could apply to CL. Take the example of a robot deployed in a home that notices a high failure rate when picking up a specific object (say, cans in a kitchen). It then builds a world model of the kitchen from its deployment data, generates can-grasping rollouts within it, and RL […]
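To make my mental model concrete, here's a rough sketch of the loop I'm imagining, with toy stand-ins everywhere: a linear `WorldModel` in place of the RSSM Dreamer would learn from deployment logs, a made-up "grasp success" reward, and random search on imagined returns in place of an actor-critic update. All names and shapes here are my own assumptions, not from any of these papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned world model: predicts next latent state and
# reward from (state, action). In Dreamer this would be an RSSM trained
# on the robot's deployment data; here it's just a random linear system.
class WorldModel:
    def __init__(self, state_dim, action_dim):
        self.A = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.B = rng.normal(scale=0.1, size=(state_dim, action_dim))
        # Hypothetical "grasp success" reward: closeness to a goal latent.
        self.goal = rng.normal(size=state_dim)

    def step(self, state, action):
        next_state = np.tanh(self.A @ state + self.B @ action)
        reward = -np.linalg.norm(next_state - self.goal)
        return next_state, reward

def imagine_rollout(model, policy, state, horizon=15):
    """Return the total reward of a trajectory rolled out entirely
    inside the world model -- no real-world interaction."""
    total = 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model.step(state, action)
        total += reward
    return total

def train_in_imagination(model, state_dim, action_dim, iters=200):
    """Improve a linear policy purely on imagined returns, via random
    search (standing in for Dreamer's actor-critic update)."""
    start = rng.normal(size=state_dim)  # e.g. an encoded kitchen observation
    W = np.zeros((action_dim, state_dim))
    best = imagine_rollout(model, lambda s: W @ s, start)
    for _ in range(iters):
        cand = W + rng.normal(scale=0.1, size=W.shape)
        ret = imagine_rollout(model, lambda s: cand @ s, start)
        if ret > best:  # keep the perturbation only if it helps
            best, W = ret, cand
    return W, best

model = WorldModel(state_dim=8, action_dim=3)
policy, imagined_return = train_in_imagination(model, state_dim=8, action_dim=3)
print(policy.shape, imagined_return)
```

The part I'm unsure about for continual learning is what happens around this loop: when to rebuild the world model as the kitchen changes, and how to keep the fine-tuned policy from forgetting skills the imagined rollouts never exercise.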