[R] Vision+Time Series data Encoder

Hi there,

Does anyone have experience working with a joint vision + time-series encoder? I have been looking for recent papers on this, but so far I have only found one NeurIPS paper, HPT (https://github.com/liruiw/HPT). I also went through the papers that cite it, but no luck yet.

I want to use a pre-trained encoder that takes both vision (video clips) and time-series data (robot proprioception) and produces a single embedding vector, which I will then use for downstream tasks. There are many strong vision encoders such as V-JEPA and PE, and some time-series encoders such as MOMENT, but I am looking for a unified one, preferably trained on robotic manipulation data.
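
For context, the fallback I am hoping to avoid is late fusion of two separately pre-trained encoders. Here is a minimal sketch of that baseline, assuming the per-frame video features and per-timestep proprioception features have already been extracted by whatever encoders you pick. The function names and toy numbers are made up for illustration; they do not come from HPT or any library.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of per-step feature vectors into one vector."""
    if not tokens:
        raise ValueError("need at least one feature vector")
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def fuse(video_feats: Sequence[Sequence[float]],
         proprio_feats: Sequence[Sequence[float]]) -> Vector:
    """Late fusion: pool each modality over time, normalize, then concatenate."""
    video_vec = l2_normalize(mean_pool(video_feats))
    proprio_vec = l2_normalize(mean_pool(proprio_feats))
    return video_vec + proprio_vec  # list concatenation, not elementwise add


# Toy stand-ins for encoder outputs (hypothetical values):
# 2 video frames with 2-dim features, 2 proprio timesteps with 3-dim features.
video_feats = [[3.0, 0.0], [3.0, 8.0]]
proprio_feats = [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]

embedding = fuse(video_feats, proprio_feats)
```

This gives a single vector, but the two modalities never interact during encoding, which is why I would rather find a model that was pre-trained on both jointly.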

Thanks

submitted by /u/zillur-av