A Multi-Modal Multi-Task Framework for Nationwide 50 cm Building-Use Mapping in China
Accurate, large-scale building-use information at very high spatial resolution is critical for urban, economic, and risk-related applications. We present a nationwide framework for building-use mapping at 50 cm resolution in China by fusing very-high-resolution RGB imagery with points-of-interest (POI) data. A multi-task U-Net with a ResNet-34 backbone jointly predicts building footprints and three building-use classes (residential, commercial, industrial), using POI-based probability maps as additional input channels. We construct a labeled dataset of approximately 100,000 buildings from 90 cities and a 351-million-tile inference dataset covering about 54% of China’s land area. Experiments show that POI fusion and multi-task learning significantly improve performance over imagery-only baselines. In the final nationwide product, the residential, commercial, and industrial classes achieve per-class accuracies of 0.9711, 0.9664, and 0.9854, with F1-scores of 0.8416, 0.5828, and 0.8143, respectively. The resulting building-use database can support a wide range of downstream applications, including catastrophe risk assessment, exposure mapping, and urban analytics.
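The per-class accuracies and F1-scores reported above differ considerably, most visibly for the commercial class. One common reason for such a gap, offered here only as an illustration and not as an analysis of the paper's results, is class imbalance: when a class is rare, the many true negatives raise one-vs-rest accuracy while F1 ignores them. The minimal sketch below computes both metrics from a confusion table; the function name `per_class_metrics` and all counts are hypothetical and are not taken from the dataset.

```python
def per_class_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for one class, one-vs-rest."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators for classes that are never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts (assumption): a rare class covering 5% of 10,000 samples.
acc, prec, rec, f1 = per_class_metrics(tp=300, fp=200, fn=200, tn=9300)
print(f"accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}, F1={f1:.3f}")
```

With these illustrative counts, accuracy is 0.96 while F1 is only 0.60, because the 9,300 true negatives enter the accuracy but not the F1-score.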