Using RL with a Transformer that outputs structured actions (index + complex object) — architecture advice?
Hi everyone,

I’m working on a research project where my advisor suggested combining reinforcement learning with a transformer model, and I’m trying to figure out what the best architecture might look like. I unfortunately can’t share too many details about the actual project (sorry!), but I’ll try to explain the technical structure as clearly as possible using simplified examples.

Problem setup (simplified example)

Imagine we have a sequence where each element is represented by a super-token containing many […]