End-to-end object detection using a query-selection encoder with hierarchical feature-aware attention

Zuyi WANG , Zhimeng ZHENG , Jun MENG , Li XU

Front. Inform. Technol. Electron. Eng, 2025, Vol. 26, Issue 8: 1324-1340. DOI: 10.1631/FITEE.2400960
Research Article

Abstract

End-to-end object detection methods have attracted extensive interest recently since they alleviate the need for complicated human-designed components and simplify the detection pipeline. However, these methods suffer from slower training convergence and inferior detection performance compared to conventional detectors, as their feature fusion and selection processes are constrained by insufficient positive supervision. To address this issue, we introduce a novel query-selection encoder (QSE) designed for end-to-end object detectors to improve the training convergence speed and detection accuracy. QSE is composed of multiple encoder layers stacked on top of the backbone. A lightweight head network is added after each encoder layer to continuously optimize features in a cascading manner, providing more positive supervision for efficient training. Additionally, a hierarchical feature-aware attention (HFA) mechanism is incorporated in each encoder layer, including in- and cross-level feature attention, to enhance the interaction between features from different levels. HFA can effectively suppress similar feature representations and highlight discriminative ones, thereby accelerating the feature selection process. Our method is highly versatile in accommodating both CNN- and Transformer-based detectors. Extensive experiments were conducted on the popular benchmark datasets MS COCO, CrowdHuman, and PASCAL VOC to demonstrate the effectiveness of our method. The results showed that CNN- and Transformer-based detectors using QSE can achieve better end-to-end performance within fewer training epochs.
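
The cascaded structure described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify how HFA suppresses similar features, so plain scaled dot-product attention with a residual connection stands in for both the in-level and cross-level steps, and the lightweight head is reduced to a single linear scorer. All function names (`attend`, `encoder_layer`, `head_scores`, `run_encoder`) and the top-k selection rule are assumptions made for illustration.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    # Scaled dot-product attention with a residual connection.
    # Assumption: keys double as values; the paper's HFA is more elaborate.
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        ctx = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)]
        out.append([qi + ci for qi, ci in zip(q, ctx)])
    return out


def encoder_layer(levels):
    # Step 1: in-level attention (tokens attend within their own level).
    in_level = [attend(lv, lv) for lv in levels]
    # Step 2: cross-level attention (tokens attend to all other levels).
    out = []
    for i, lv in enumerate(in_level):
        others = [t for j, other in enumerate(in_level) if j != i for t in other]
        out.append(attend(lv, others) if others else lv)
    return out


def head_scores(levels, w):
    # Hypothetical lightweight head: one linear scorer plus sigmoid per token.
    return [[sigmoid(dot(t, w)) for t in lv] for lv in levels]


def run_encoder(levels, head_weights, top_k):
    # One head per encoder layer; the per-layer scores are where auxiliary
    # (positive) supervision would be attached during training.
    per_layer_scores = []
    for w in head_weights:
        levels = encoder_layer(levels)
        per_layer_scores.append(head_scores(levels, w))
    # Query selection: keep the top_k tokens ranked by the last head's scores.
    ranked = [
        (s, tok)
        for lv, sc in zip(levels, per_layer_scores[-1])
        for tok, s in zip(lv, sc)
    ]
    ranked.sort(key=lambda p: p[0], reverse=True)
    return [tok for _, tok in ranked[:top_k]], per_layer_scores
```

Calling `run_encoder(levels, head_weights, top_k)` with `levels` as a list of feature levels (each a list of equal-length token vectors) returns the selected query tokens together with the scores produced after every layer. In a trainable version, each entry of the per-layer scores would receive its own loss term, which is the sense in which a cascade of heads provides additional supervision.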

Keywords

End-to-end object detection / Query-selection encoder / Hierarchical feature-aware attention

Cite this article

Zuyi WANG, Zhimeng ZHENG, Jun MENG, Li XU. End-to-end object detection using a query-selection encoder with hierarchical feature-aware attention. Front. Inform. Technol. Electron. Eng, 2025, 26(8): 1324-1340. DOI: 10.1631/FITEE.2400960

RIGHTS & PERMISSIONS

Zhejiang University Press
