[R] Good modern alternatives to Perceiver/PerceiverIO for datasets with many modalities?
I’ve been working on developing foundation models for massively multimodal datasets (around 30–40 different modalities in a single dataset; you can think of it like a robot with a lot of different sensors). Most scientific papers I’ve seen from the last couple of years use Perceiver, which I find a really intuitive and elegant solution (you literally just slap the name of the modality onto the data and let the model handle the rest).
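For anyone unfamiliar, here's a minimal sketch of the idea I mean (hypothetical names, assuming PyTorch): each modality's tokens get a learned modality embedding added, everything is concatenated into one sequence, and a fixed set of latents cross-attends to it, so the latent size doesn't grow with the number of sensors.

```python
import torch
import torch.nn as nn

class TinyPerceiver(nn.Module):
    """Toy Perceiver-style encoder: latents cross-attend to modality-tagged tokens."""

    def __init__(self, num_modalities, dim=64, num_latents=32):
        super().__init__()
        # One learned embedding per modality ("slap on the name of the modality")
        self.modality_emb = nn.Embedding(num_modalities, dim)
        # Fixed-size latent array, independent of how many input tokens arrive
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, inputs):
        # inputs: list of (modality_id, tokens), tokens shaped (B, T_i, dim)
        tagged = [tok + self.modality_emb.weight[mid] for mid, tok in inputs]
        x = torch.cat(tagged, dim=1)                       # (B, sum T_i, dim)
        q = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.cross_attn(q, x, x)                  # latents read all tokens
        return out                                          # (B, num_latents, dim)

model = TinyPerceiver(num_modalities=3)
cam = torch.randn(2, 10, 64)   # e.g. camera: 10 tokens per sample
imu = torch.randn(2, 5, 64)    # e.g. IMU: 5 tokens per sample
z = model([(0, cam), (1, imu)])
print(z.shape)  # latent summary, same size regardless of input length
```

The nice property for the 30–40 modality setting is that adding a sensor only costs one new embedding row and more keys/values; the latent bottleneck stays fixed.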
However, Perceiver is half a decade old at this point. Before committing all my training resources to a model based on it, I wanted to ask whether there are any fundamentally better architectures people have moved to recently for this kind of task.
submitted by /u/Affectionate_Use9936