CTCF: A Three-Level Coarse-to-Fine Cascade for Unsupervised Deformable Medical Image Registration
Deformable medical image registration aims to spatially align anatomical structures across volumetric scans. Recent transformer-based methods achieve high overlap accuracy but often produce deformation fields with topological violations. We propose CTCF, a Cascade Transformer for Coarse-to-Fine registration that wraps a lightweight coarse-and-refine envelope around a core registration module. Level 1 provides a coarse displacement estimate at quarter resolution, Level 2 performs the main registration via a Swin Transformer encoder with deformable cross-attention and a learned super-resolution decoder, and Level 3 applies error-driven flow refinement at half resolution. The two outer levels add only 3.0% parameter overhead yet improve registration accuracy while maintaining competitive deformation regularity relative to external baselines. The model is trained end-to-end with a composite unsupervised loss combining local normalized cross-correlation, diffusion regularization, inverse-consistency, and Jacobian-based topology preservation. On the OASIS brain MRI benchmark, CTCF achieves the highest Dice score of 0.8208 among the compared unsupervised methods while producing the lowest SDlogJ among the compared methods, with all Dice improvements statistically significant at p < 0.001 by the Wilcoxon signed-rank test. On IXI, CTCF also achieves the best Dice, HD95, SDlogJ, and fold percentage among the compared methods. A five-round ablation study validates each component: cascade decomposition isolates each level’s contribution, and resolution scaling experiments confirm the framework’s scalability, yielding further accuracy gains with zero parameter overhead.