Sparse Self-Prompt Guided Stereo Matching for Real-World Generalization

Stereo matching has witnessed rapid advances on curated benchmarks, yet deploying models in unconstrained real-world environments remains a fundamental challenge. This paper presents a sparse self-prompt guided network (SSPGNet) for stereo matching with strong generalization across diverse environments. Our core innovation lies in a sparse self-prompt guidance mechanism: 1) a sparse disparity map, used as a prompt, is self-estimated from visual foundation model features via cost aggregation; and 2) the sparse disparity is progressively refined into dense disparity maps through cross-attention-based stereo feature interaction, enabling sparse-to-dense disparity prediction. Additionally, we collected a diverse set of indoor and outdoor stereo pairs using a ZED 2 camera to assess the real-world performance of our model. Extensive experiments demonstrate that the proposed sparse-to-dense prompt mechanism not only preserves the semantic awareness of visual foundation models but also enhances stereo correspondence reasoning, achieving strong performance on public benchmarks and our in-the-wild dataset. These results highlight the potential of SSPGNet for direct deployment in real-world stereo perception systems. The code and data will be made publicly available upon publication.
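To make the first step of the mechanism concrete, the following is a minimal NumPy sketch of how a sparse disparity prompt could be derived from left and right feature maps. It is an illustration under stated assumptions, not the SSPGNet implementation: the abstract does not specify the cost aggregation or the selection rule, so the correlation cost volume, the softmax confidence, and the names `correlation_cost_volume`, `sparse_disparity_prompt`, `keep_ratio`, and `temperature` are all hypothetical. The features are assumed to be L2-normalized arrays of shape (C, H, W), standing in for visual foundation model features.

```python
import numpy as np


def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) features.

    Hypothetical stand-in for the paper's cost aggregation step.
    Entries with no valid match (column x < d) are set to -inf.
    """
    channels, height, width = feat_left.shape
    if max_disp > width:
        raise ValueError("max_disp must not exceed the image width")
    volume = np.full((max_disp, height, width), -np.inf)
    for d in range(max_disp):
        # A left pixel at column x is compared with the right pixel at x - d.
        left = feat_left[:, :, d:]
        right = feat_right[:, :, : width - d]
        volume[d, :, d:] = (left * right).sum(axis=0)
    return volume


def sparse_disparity_prompt(volume, keep_ratio=0.1, temperature=0.1):
    """Keep only the most confident disparities as a sparse prompt.

    Assumption: confidence is the peak of a softmax over the disparity axis,
    and roughly the top `keep_ratio` fraction of pixels is retained.
    Returns (sparse, mask); `sparse` holds NaN wherever `mask` is False.
    """
    logits = volume / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # stabilise the softmax
    prob = np.exp(logits)  # exp(-inf) = 0 for invalid candidates
    prob /= prob.sum(axis=0, keepdims=True)

    disparity = prob.argmax(axis=0)
    confidence = prob.max(axis=0)

    threshold = np.quantile(confidence, 1.0 - keep_ratio)
    mask = confidence >= threshold
    sparse = np.where(mask, disparity.astype(np.float64), np.nan)
    return sparse, mask
```

In the second step described above, a sparse map of this kind would act as the prompt that is progressively densified through cross-attention between the stereo features; that refinement stage is not sketched here because the abstract gives no further detail about it.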
