Exploring Physical Intelligence Emergence via Omni-Modal Architecture and Physical Data Engine
arXiv:2602.07064v1 Announce Type: new
Abstract: Physical understanding remains brittle in omni-modal models because key physical attributes are visually ambiguous and sparsely represented in web-scale data. We present OmniFysics, a compact omni-modal model that unifies understanding across images, audio, video, and text, with integrated speech and image generation. To inject explicit physical knowledge, we build a physical data engine with two components. FysicsAny produces physics-grounded instruction–image supervision by mapping salient objects to verified physical attributes through hierarchical retrieval over […]