MV-S2CD: A Modality-Bridged Vision Foundation Model-Based Framework for Unsupervised Optical–SAR Change Detection
Unsupervised change detection (UCD) from heterogeneous bitemporal optical–SAR imagery is challenging due to modality discrepancy, speckle/illumination variations, and the absence of change annotations. We propose MV-S2CD, a vision foundation model (VFM)-based framework that learns a modality-bridged latent space and produces dense change maps in a fully unsupervised manner. To robustly adapt pretrained VFM priors to heterogeneous inputs with minimal task-specific parameters, MV-S2CD incorporates lightweight modality-specific adapters and parameter-efficient low-rank adaptation (LoRA) in high-level layers. A shared projector embeds the two observations into a common geometry, enabling consistent cross-modal comparison and reducing sensor-induced domain shift. Building on the bridged representation, we design a dual-branch change reasoning module that decouples structure-sensitive cues from semantic-consistency cues: a structure pathway preserves fine boundaries and local variations, while a semantic-consistency pathway employs reliability gating and multi-scale context aggregation to suppress pseudo-changes caused by modality-specific nuisances and residual misregistration. For label-free optimization, we develop a difference-centric self-supervision scheme with two perturbation views and reliability-guided pseudo partitioning, which jointly enforces pseudo-unchanged invariance and pseudo-changed/unchanged separability alongside sparsity and edge-preserving regularization. Experiments on three heterogeneous optical–SAR benchmarks demonstrate that MV-S2CD consistently improves the precision–recall trade-off and achieves state-of-the-art performance among unsupervised methods, while remaining backbone-flexible and efficient.
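As a rough illustration of the difference-centric self-supervision idea, the following minimal sketch partitions pixels by the magnitude of their cross-modal feature difference and combines an invariance term, a margin-based separability term, and a sparsity term. It is not the MV-S2CD implementation: the function names, the quantile fractions `low_frac` and `high_frac`, the `margin`, and the `sparsity_weight` are all hypothetical choices made for this example, and the two perturbation views, reliability gating, and edge-preserving term are omitted.

```python
import math


def _mean(values):
    """Arithmetic mean that returns 0.0 for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def difference_map(feat_a, feat_b):
    """Per-pixel Euclidean distance between two lists of feature vectors,
    assumed to already live in a shared (bridged) latent space."""
    return [math.dist(a, b) for a, b in zip(feat_a, feat_b)]


def pseudo_partition(diff, low_frac=0.3, high_frac=0.1):
    """Split pixel indices into pseudo-unchanged (smallest differences) and
    pseudo-changed (largest differences); the remainder stays unlabeled.
    The fractions are illustrative assumptions, not values from the paper."""
    n = len(diff)
    order = sorted(range(n), key=diff.__getitem__)
    n_low = min(n, max(1, round(n * low_frac)))
    n_high = min(n - n_low, max(1, round(n * high_frac)))  # keep sets disjoint
    return order[:n_low], order[n - n_high:]


def self_supervised_loss(diff, unchanged, changed, margin=1.0, sparsity_weight=0.1):
    """Sum of three illustrative terms:
    - invariance: pull pseudo-unchanged differences toward zero,
    - separability: hinge pushing pseudo-changed differences above `margin`,
    - sparsity: small penalty on the mean difference (changes are rare)."""
    invariance = _mean(diff[i] for i in unchanged)
    separability = _mean(max(0.0, margin - diff[i]) for i in changed)
    sparsity = _mean(diff)
    return invariance + separability + sparsity_weight * sparsity


if __name__ == "__main__":
    # Toy "optical" and "SAR" embeddings for ten pixels (hypothetical data).
    optical = [[0.0, 0.0] for _ in range(10)]
    sar = [[0.0, 0.1 * i] for i in range(10)]
    d = difference_map(optical, sar)
    unchanged_idx, changed_idx = pseudo_partition(d)
    print(unchanged_idx, changed_idx, self_supervised_loss(d, unchanged_idx, changed_idx))
```

In a real pipeline the feature vectors would come from the adapted VFM and shared projector, and the loss would be computed on differentiable tensors rather than Python lists; the list-based form here only makes the partitioning and the three terms easy to inspect.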