YOLOv8-RGBT-MT: A Fast Multi-Modal Fusion Network for Worker-and-Equipment Detection on Construction Sites

Journal of Beijing Institute of Technology ›› 2026, Vol. 35 ›› Issue (2) : 244 -252.

DOI: 10.15918/j.jbit1004-0579.2025.079

Abstract

Dust, fog, and other adverse conditions on construction sites can cause visible-light imaging to fail and make small targets hard to detect, while high resource consumption hinders model deployment. To address these challenges, an enhanced, lightweight algorithm is proposed. The algorithm employs a hybrid architecture that integrates red-green-blue (RGB, visible light) and thermal infrared images, together referred to as RGBT multi-modal images, through a fusion framework based on You Only Look Once version 8 (YOLOv8) and Mamba-Transformer (MT). We refer to this integrated model as YOLOv8-RGBT-MT. In terms of network improvements, a frequency enhancement module is first employed to enhance the visible-light and infrared images. Then, a module integrating Mamba and Transformer components is designed to replace the base convolutional blocks in the backbone network, thereby expanding the model's receptive field and improving feature extraction in complex backgrounds. Finally, a multi-modal feature fusion mechanism is introduced that integrates complementary information from visible and infrared images via an adaptive weighting strategy, improving both detection accuracy and robustness for small targets. Experimental results demonstrate that, compared with YOLOv8-RGBT, the enhanced algorithm improves mAP50 by 18.7% while reducing inference time by 79.7%.
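
The adaptive weighting idea behind the fusion step can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so everything here is an assumption made for illustration: the function names `softmax` and `adaptive_fuse`, the use of mean absolute activation as a per-modality score, and the toy feature values are all hypothetical and are not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(rgb_feat, ir_feat):
    """Fuse two equal-length feature vectors with input-dependent weights.

    Assumption: each modality is scored by its mean absolute activation.
    The scoring function used in the paper is not given in the abstract.
    """
    if not rgb_feat or len(rgb_feat) != len(ir_feat):
        raise ValueError("feature vectors must be non-empty and equal in length")
    rgb_score = sum(abs(v) for v in rgb_feat) / len(rgb_feat)
    ir_score = sum(abs(v) for v in ir_feat) / len(ir_feat)
    # Softmax turns the two scores into weights that sum to 1.
    w_rgb, w_ir = softmax([rgb_score, ir_score])
    fused = [w_rgb * r + w_ir * t for r, t in zip(rgb_feat, ir_feat)]
    return fused, (w_rgb, w_ir)


# Hypothetical features: RGB is washed out (e.g. by dust), thermal is strong.
rgb = [0.1, 0.0, 0.2, 0.1]
ir = [1.5, 2.0, 1.0, 1.5]
fused, (w_rgb, w_ir) = adaptive_fuse(rgb, ir)
print(f"w_rgb={w_rgb:.3f}, w_ir={w_ir:.3f}")
```

In this toy case the thermal branch receives the larger weight because its activations are stronger, which mirrors the intended behavior when visible-light imaging degrades. A learned fusion module would typically replace the hand-written score with trainable parameters.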

Keywords

RGBT images / multi-modal feature fusion / YOLOv8 / Mamba-Transformer / real-time object detection

Cite this article

Yan Li, Cunxin Sun, Baihai Zhang. YOLOv8-RGBT-MT: A Fast Multi-Modal Fusion Network for Worker-and-Equipment Detection on Construction Sites. Journal of Beijing Institute of Technology, 2026, 35(2): 244-252. DOI: 10.15918/j.jbit1004-0579.2025.079
