BEDiff: denoising diffusion probabilistic models for building extraction

Yanjing Lei, Yuan Wang, Sixian Chan, Jie Hu, Xiaolong Zhou, Hongkai Zhang

Optoelectronics Letters, 2025, Vol. 21, Issue 5: 298-305. DOI: 10.1007/s11801-025-4072-2


Abstract

Accurately identifying the distribution of buildings in remote sensing images with complex backgrounds is challenging. The emergence of diffusion models suggests using the reverse denoising process to separate the building distribution from such backgrounds. Building on this idea, we propose the building extraction diffusion model (BEDiff), a framework that refines building footprints extracted from remote sensing images step by step. We first design booster guidance, a mechanism that extracts structural and semantic features from remote sensing images to serve as priors that guide the diffusion process. We then introduce a cross-feature fusion module (CFM) that bridges the semantic gap between different types of features, so that the features extracted by booster guidance are integrated into the diffusion process more effectively. BEDiff marks the first application of diffusion models to the task of building extraction. Extensive experiments on the Beijing building dataset demonstrate the superior performance of BEDiff, confirming its effectiveness and its potential to improve building extraction accuracy in complex urban landscapes.
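
The stepwise reverse denoising that the abstract refers to can be illustrated with the standard DDPM ancestral sampling loop of Ho et al. (2020). The sketch below is not BEDiff's implementation. It assumes a linear noise schedule with the usual DDPM defaults, treats the mask as a flat list of pixels in [-1, 1], and uses a hypothetical `oracle_denoiser` in place of the trained network; `image_prior` is an illustrative stand-in for the priors that booster guidance would supply.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def reverse_step(x_t, t, eps_pred, betas, alpha_bars, rng):
    """One ancestral sampling step x_t -> x_{t-1} given the predicted noise."""
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # no noise is added at the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def oracle_denoiser(x_t, t, image_prior, alpha_bars):
    """Hypothetical stand-in for a trained network: it assumes the prior
    already equals the clean mask and inverts the forward noising relation
    x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps to recover eps."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(a_bar) * p) / math.sqrt(1.0 - a_bar)
            for x, p in zip(x_t, image_prior)]


def sample_mask(image_prior, denoiser, num_steps, rng):
    """Start from Gaussian noise and denoise stepwise, conditioned on the prior."""
    betas = linear_beta_schedule(num_steps)
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in image_prior]
    for t in reversed(range(num_steps)):
        eps_pred = denoiser(x, t, image_prior, alpha_bars)
        x = reverse_step(x, t, eps_pred, betas, alpha_bars, rng)
    return [1 if v > 0 else 0 for v in x]  # threshold to building / background


if __name__ == "__main__":
    prior = [1.0, -1.0, 1.0, 1.0, -1.0]  # +1 = building, -1 = background
    print(sample_mask(prior, oracle_denoiser, 50, random.Random(0)))
```

In a real system the oracle would be replaced by a learned noise predictor that sees only the image-derived features, and the quality of the final mask would depend on how well those features are fused into the denoiser.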

Cite this article

Yanjing Lei, Yuan Wang, Sixian Chan, Jie Hu, Xiaolong Zhou, Hongkai Zhang. BEDiff: denoising diffusion probabilistic models for building extraction. Optoelectronics Letters, 2025, 21(5): 298-305. DOI: 10.1007/s11801-025-4072-2


RIGHTS & PERMISSIONS

Tianjin University of Technology
