Low-Light Video Enhancement via Fast–Slow Dual Branches and Flow-Guided Attention
Low-light video enhancement aims to restore clear, color-faithful, and temporally consistent visual content from video sequences captured at extremely low signal-to-noise ratios and under high-dynamic-range conditions. Existing multi-frame enhancement methods typically apply the same spatio-temporal sampling and feature extraction strategy to all frames, which makes it difficult to achieve long-range temporal denoising and accurate fast-motion modeling at the same time. To address this trade-off, we propose a low-light video enhancement framework based on a Fast–Slow dual-branch architecture. The video signal is decomposed into two complementary feature streams. The Slow branch uses sparse temporal sampling and high spatial resolution, is built on a Vision Transformer backbone, and focuses on long-range temporal denoising and high-frequency texture restoration in static and slow-moving regions. The Fast branch uses dense temporal sampling and low spatial resolution, is built on a ViT-Tiny backbone, and efficiently captures large-scale motion and rapid illumination changes. To mitigate the discrepancy in sampling rates and spatial resolutions between the two branches, we further introduce a flow branch based on a pre-trained StreamFlow model and design a Flow-Guided Cross-Attention (FGCA) module. FGCA first uses optical flow to geometrically modulate and progressively align Fast-branch features, and then injects the flow-enhanced Fast features into the Slow branch at each space-time location via lightweight pixel-wise cross-attention. This mechanism thus cascades coarse geometric alignment with fine semantic fusion.
Experiments on two real-world low-light video datasets, SDSD-indoor and SDSD-outdoor, demonstrate that our method consistently outperforms several representative approaches in terms of PSNR, SSIM, AB(Var), and MABD, while effectively suppressing motion blur and ghosting artifacts in dynamic night scenes, yielding temporally stable and perceptually pleasing results.
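To make the two-stage idea behind FGCA concrete, the following is a minimal NumPy sketch of flow-based alignment followed by pixel-wise cross-attention. It is an illustration under stated assumptions, not the implementation evaluated here: the function names `warp_with_flow` and `pixelwise_cross_attention` are hypothetical, the Fast features are assumed to be already upsampled to the Slow-branch resolution, the flow is assumed to store per-pixel `(dx, dy)` offsets, attention is taken over the Fast frames at each pixel, and the learned query/key/value projections of a real attention layer are omitted.

```python
import numpy as np

def warp_with_flow(feat, flow):
    """Backward-warp feat (H, W, C) with flow (H, W, 2) of (dx, dy) offsets,
    using bilinear sampling and border clamping."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def pixelwise_cross_attention(slow, fast_stack):
    """slow: (H, W, C) queries; fast_stack: (T, H, W, C) keys and values.
    Each Slow pixel attends only to the T aligned Fast features at the
    same location (no learned projections in this sketch)."""
    C = slow.shape[-1]
    scores = np.einsum("hwc,thwc->hwt", slow, fast_stack) / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    fused = np.einsum("hwt,thwc->hwc", weights, fast_stack)
    return slow + fused, weights  # residual injection into the Slow features

# Toy example: 3 Fast frames aligned to one Slow frame with zero flow.
rng = np.random.default_rng(0)
H, W, C, T = 4, 5, 8, 3
slow = rng.standard_normal((H, W, C))
fast = rng.standard_normal((T, H, W, C))
flows = np.zeros((T, H, W, 2))
aligned = np.stack([warp_with_flow(f, fl) for f, fl in zip(fast, flows)])
out, weights = pixelwise_cross_attention(slow, aligned)
```

Because attention is restricted to the same pixel across the aligned Fast frames, its cost grows linearly with the number of pixels rather than quadratically. This is one plausible reading of "lightweight pixel-wise cross-attention", and it relies on the warping step to establish spatial correspondence first.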