Can PPO learn through “Imagination” similar to Dreamer?

Hi everyone,

I’ve been diving into the Dreamer paper recently, and I find the concept of learning a policy through “imagination” (i.e., rolling out trajectories within a latent world model) absolutely fascinating.

This got me wondering: Can the PPO (Proximal Policy Optimization) algorithm also be trained through imagination?

Specifically, instead of interacting with a real environment, could we plug PPO into a learned world model and update its policy purely on imagined rollouts? I’d love to hear your thoughts on the technical feasibility, and whether any existing papers have explored this.
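To make the question concrete, here is a minimal toy sketch of what I have in mind, in plain NumPy. Everything in it is a hypothetical stand-in: `world_model`, `policy_logp`, `imagine_rollout`, and all hyperparameters are illustrative, not Dreamer's actual architecture or any real PPO implementation. The point is just that PPO's clipped surrogate only needs (state, action, reward, log-prob) tuples, and nothing forces those to come from a real environment:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2          # toy state/action dimension
GAMMA = 0.99     # discount factor
HORIZON = 15     # imagination horizon, Dreamer-style short rollouts

def world_model(state, action):
    """Stand-in for a *learned* dynamics model. In a real system this would
    be a trained (latent) transition network, not a hand-written function."""
    next_state = np.tanh(state + 0.1 * action)
    reward = -np.sum(next_state ** 2)  # toy reward: stay near the origin
    return next_state, reward

def policy_logp(theta, state, action):
    """Log-prob (up to a constant) of a unit-variance Gaussian policy
    whose mean is a linear function of the state."""
    return -0.5 * np.sum((action - state @ theta) ** 2)

def imagine_rollout(theta, start_state):
    """Roll the policy forward entirely inside the world model:
    no real-environment steps are taken."""
    states, actions, rewards = [], [], []
    s = start_state
    for _ in range(HORIZON):
        a = s @ theta + rng.normal(size=DIM)  # sample from the Gaussian policy
        s2, r = world_model(s, a)
        states.append(s); actions.append(a); rewards.append(r)
        s = s2
    return states, actions, np.array(rewards)

def discounted_returns(rewards):
    """Discounted return-to-go at each step of the rollout."""
    out = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + GAMMA * running
        out[t] = running
    return out

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (per-step, to be maximized)."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

# One imagined PPO step: collect a rollout under theta_old inside the world
# model, then evaluate the clipped surrogate for a perturbed theta_new.
theta_old = np.zeros((DIM, DIM))
theta_new = theta_old + 0.01 * rng.normal(size=(DIM, DIM))
states, actions, rewards = imagine_rollout(theta_old, rng.normal(size=DIM))
returns = discounted_returns(rewards)
advantages = returns - returns.mean()  # crude baseline, just for the sketch
logp_old = np.array([policy_logp(theta_old, s, a) for s, a in zip(states, actions)])
logp_new = np.array([policy_logp(theta_new, s, a) for s, a in zip(states, actions)])
surrogate = ppo_clip_loss(np.exp(logp_new - logp_old), advantages).mean()
```

My worry is the part this sketch glosses over: with a hand-written `world_model` the rollout data is perfect, but with a learned one, model error compounds over the horizon, which is presumably why Dreamer keeps its imagined rollouts short.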

Thanks!

submitted by /u/audi_etron
