Vocabulary Restriction of VLAs (Vision Language Action)
Hello, I wanted to ask how do you restrict the output vocabulary/ possible actions of VLAs. Specifically I am reading currently the research papers of RT-2 and OpenVLA. OpenVLA references RT-2 and RT-2 says nothing specifically, it just says in the fine-tuning phase: “Thus, to ensure that RT-2 outputs valid action tokens during decoding, we constrain its output vocabulary via only sampling valid action tokens when the model is prompted with a robot-action task …” So do you […]