Greedy Pruning Algorithm for DETR Architecture Networks Based on Global Optimization

Qiubo HUANG, Jingsai XU, Yakui ZHANG, Mei WANG, Dehua CHEN

Journal of Donghua University (English Edition), 2025, Vol. 42, Issue (1): 96-105. DOI: 10.19884/j.1672-5220.202403010

Information Technology and Artificial Intelligence | Research Article


Abstract

The end-to-end object detection Transformer (DETR) established the paradigm of the Transformer architecture in the field of object detection. Its end-to-end detection process and its set-prediction formulation have made it one of the most popular network architectures in recent years, and a large body of work has built on and improved DETR. However, DETR and its variants require substantial memory and computational cost, and the vast number of parameters in these networks hinders model deployment. To address this issue, a greedy pruning (GP) algorithm is proposed and applied to the variant denoising-DETR (DN-DETR), eliminating redundant parameters in the Transformer architecture of DN-DETR. Considering the different roles of the multi-head attention (MHA) module and the feed-forward network (FFN) module in the Transformer architecture, a modular greedy pruning (MGP) algorithm is further proposed. This algorithm treats the two modules separately and applies to each its own optimal strategy and parameters. The effectiveness of the proposed algorithm is validated on the COCO 2017 dataset. The model obtained through the MGP algorithm reduces the parameters by 49% and the number of floating point operations (FLOPs) by 44% compared with the Transformer architecture of DN-DETR, while the mean average precision (mAP) of the model increases from 44.1% to 45.3%.
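The modular pruning idea described above can be sketched in a few lines. This is only an illustrative sketch under assumed interfaces: the function names, the importance-score dictionaries, and the keep ratios are hypothetical stand-ins, not the paper's actual pruning criterion or hyperparameters. The key point it demonstrates is that MHA and FFN units are ranked and pruned separately, each with its own budget.

```python
def greedy_prune(scores, keep_ratio):
    """Greedily keep the highest-scoring units until the keep ratio is met.

    `scores` maps a unit id (e.g. an attention head or an FFN neuron)
    to an importance score; the lowest-scoring units are pruned first.
    """
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def modular_greedy_prune(mha_scores, ffn_scores, mha_keep, ffn_keep):
    """Prune the MHA and FFN modules separately, each with its own ratio."""
    return greedy_prune(mha_scores, mha_keep), greedy_prune(ffn_scores, ffn_keep)


# Toy example: 4 attention heads and 10 FFN neurons with made-up scores.
heads = {"h0": 0.9, "h1": 0.1, "h2": 0.5, "h3": 0.7}
neurons = {f"n{i}": float(i) for i in range(10)}
kept_heads, kept_neurons = modular_greedy_prune(heads, neurons,
                                                mha_keep=0.5, ffn_keep=0.3)
```

In a real setting the scores would come from a sensitivity or loss-change measure on the trained DN-DETR model, and pruning would be interleaved with fine-tuning; the sketch only captures the per-module greedy selection.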

Keywords

model pruning / object detection Transformer(DETR) / Transformer architecture / object detection

Cite this article

Qiubo HUANG, Jingsai XU, Yakui ZHANG, Mei WANG, Dehua CHEN. Greedy Pruning Algorithm for DETR Architecture Networks Based on Global Optimization. Journal of Donghua University (English Edition), 2025, 42(1): 96-105. DOI: 10.19884/j.1672-5220.202403010



Funding

Shanghai Municipal Commission of Economy and Information Technology, China(202301054)
