An Object Detection Algorithm Based on Deep Learning and Salient Feature Fusion for Roadside Surveillance Camera

Yang He , Lisheng Jin , Huanhuan Wang , Xinyu Sun , Zhen Huo , Guangqi Wang

CAAI Transactions on Intelligence Technology, 2026, Vol. 11, Issue 1: 279-295. DOI: 10.1049/cit2.12406

ORIGINAL RESEARCH

Abstract

In intelligent transportation systems, object detection in surveillance video is one of the key functions. The performance of existing surveillance-video object detection algorithms suffers from conflicts between object features, which degrades precision. Therefore, an object detection algorithm based on deep learning and salient feature fusion is proposed. The method introduces a non-weight-sharing network that processes the salient features of the image and fuses them with the features extracted from the RGB branch. Unlike previous solutions, the salient-feature extraction branch uses the boundary features and statistical features of the image and fuses the features of the two branches within the efficient layer aggregation networks (ELAN) structure. In addition, a convolutional block attention module (CBAM) is embedded in the ELAN structure to improve the efficiency of feature utilisation. Training and evaluation are carried out on a purpose-built surveillance-video feature-conflict dataset, in which eight scenes are constructed through orthogonal experiments. The experimental results show that the proposed method significantly improves object detection performance in feature-conflict scenes of intelligent transportation system surveillance video.
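To illustrate the kind of input the salient-feature branch consumes, the sketch below builds a boundary map and local statistical maps for a grayscale image and stacks them into a multi-channel tensor. This is a minimal NumPy approximation, not the paper's implementation: a Sobel gradient magnitude stands in for the Canny-style boundary features, windowed mean/std stand in for the statistical features, and all function names are hypothetical.

```python
import numpy as np

def boundary_map(gray):
    """Sobel gradient magnitude, a simple stand-in for Canny-style boundary features."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def local_stats(gray, k=3):
    """Windowed mean and standard deviation as simple statistical features."""
    pad = np.pad(gray.astype(float), k // 2, mode="edge")
    h, w = gray.shape
    mean = np.zeros((h, w))
    std = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + k, j:j + k]
            mean[i, j] = win.mean()
            std[i, j] = win.std()
    return mean, std

def salient_features(gray):
    """Stack boundary and statistical maps into one multi-channel tensor,
    ready for a separate (non-weight-sharing) feature-extraction branch."""
    edges = boundary_map(gray)
    mean, std = local_stats(gray)
    return np.stack([edges, mean, std], axis=0)  # shape (3, H, W)

# Toy image: a vertical step edge between columns 3 and 4.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
feats = salient_features(img)
print(feats.shape)  # (3, 8, 8)
```

In the paper's architecture this tensor would be processed by its own convolutional branch and fused with the RGB-branch features inside the ELAN structure, rather than being used directly.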

Keywords

autonomous vehicle / image processing / intelligent transportation systems / transportation

Cite this article

Yang He, Lisheng Jin, Huanhuan Wang, Xinyu Sun, Zhen Huo, Guangqi Wang. An Object Detection Algorithm Based on Deep Learning and Salient Feature Fusion for Roadside Surveillance Camera. CAAI Transactions on Intelligence Technology, 2026, 11(1): 279-295. DOI: 10.1049/cit2.12406


Acknowledgements

This work is supported by the National Key Research and Development Programme of China (2021YFB3202200), the National Natural Science Foundation of China (52072333), and Hebei Provincial Department of Education in the postgraduate innovation ability training funding project (CXZZBS2023061).

Conflict of interest statement

The authors declare no conflicts of interest.

Data availability statement

Not applicable.

