Few-Shot Remote Sensing Scene Classification Based on Diffusion Augmentation and Multimodal Feature Fusion
Few-shot remote sensing scene classification (FSRSSC) aims to identify the scene classes of images from only a few labeled samples. It faces the challenges of labeled-data scarcity as well as the intricacy and diversity of remote sensing images, which exhibit high intraclass variance and interclass similarity. To address these challenges, in this article we propose a novel framework named MMFF-Net, which consists of four key components: diffusion augmentation (DA), multiscale feature fusion (MSFF), a dual attention fusion module (DAFM), and information interaction mutual attention (IIMA). DA generates high-quality augmented samples for the support set. MSFF extracts local spatial details, and DAFM fuses the local features with the global features. Furthermore, the IIMA module enables information interaction between the query set and the support set. In addition, we use word2vec to obtain semantic features and reduce the disparity between them and the visual features with an LSE loss. Comparative experiments against multiple models on three benchmark remote sensing scene (RSS) datasets validate the effectiveness of the proposed MMFF-Net, demonstrating the superiority and feasibility of our approach in most FSRSSC cases.
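To make the two fusion ideas named above concrete, the sketch below gives a minimal numpy illustration of (a) attention-weighted fusion of a local and a global feature view, in the spirit of the DAFM, and (b) cross-attention from query features to support features, in the spirit of the IIMA module. This is an illustrative toy, not the authors' implementation: the scoring rule, dimensions, and function names (`dual_attention_fuse`, `mutual_attention`) are assumptions for exposition only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention_fuse(local_feat, global_feat):
    """Toy stand-in for the DAFM idea: weight the local and global
    feature views by a softmax over per-view scores, then sum them.
    (The per-view score here is just the feature mean; the real module
    would learn its attention weights.)"""
    stacked = np.stack([local_feat, global_feat])        # (2, d)
    weights = softmax(stacked.mean(axis=1))              # (2,) view weights
    return (weights[:, None] * stacked).sum(axis=0)      # (d,) fused feature

def mutual_attention(query_feats, support_feats):
    """Toy stand-in for the IIMA idea: scaled dot-product cross-attention
    lets each query feature aggregate support-set information, which is
    then added back residually."""
    d = query_feats.shape[-1]
    attn = softmax(query_feats @ support_feats.T / np.sqrt(d), axis=-1)
    return query_feats + attn @ support_feats            # (n_query, d)

# Example: 5 query features attending over a 3-shot support set.
rng = np.random.default_rng(0)
fused = dual_attention_fuse(rng.normal(size=8), rng.normal(size=8))
enriched = mutual_attention(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
```

In the full framework this interaction would run symmetrically in both directions (query-to-support and support-to-query) before classification; the one-directional version above only shows the attention mechanics.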