Multi-level temporal feature fusion with feature exchange strategy for multiple object tracking

Yisu Ge, Wenjie Ye, Guodao Zhang, Mengying Lin

Optoelectronics Letters ›› 2024, Vol. 20 ›› Issue (8): 505-512. DOI: 10.1007/s11801-024-4139-5

Abstract

As neural network research has deepened, object detection has advanced rapidly in recent years, and video object detection methods have gradually attracted scholarly attention, especially frameworks that combine multiple object tracking and detection. Most current works build the paradigm for joint multiple object tracking and detection through multi-task learning. In contrast, this paper proposes a multi-level temporal feature fusion structure that improves the performance of the framework by exploiting the constraint of temporal consistency in video. To train the temporal network end-to-end, a feature exchange training strategy is put forward so that the temporal feature fusion structure can be trained efficiently. The proposed method is evaluated on several widely acknowledged benchmarks and obtains encouraging results compared with well-known joint detection and tracking frameworks. The ablation experiments identify a suitable position for temporal feature fusion within the network.
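To make the ideas in the abstract more concrete, the following PyTorch sketch shows one way a multi-level temporal feature fusion module with a feature exchange step could be organized. The module names (TemporalFusionBlock, MultiLevelTemporalFusion), the channel sizes, the concatenation-plus-1×1-convolution fusion operator, and the random-swap reading of the feature exchange strategy are illustrative assumptions, not the exact design used in the paper.

```python
# Minimal sketch of multi-level temporal feature fusion with a feature
# exchange step. All design choices here (fusion operator, channel sizes,
# random swap as "feature exchange") are illustrative assumptions.
import torch
import torch.nn as nn


class TemporalFusionBlock(nn.Module):
    """Fuses same-level features from the current and previous frame."""

    def __init__(self, channels: int):
        super().__init__()
        # Concatenate along channels, then project back with a 1x1 convolution.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_cur: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([feat_cur, feat_prev], dim=1))


class MultiLevelTemporalFusion(nn.Module):
    """Applies temporal fusion independently at each feature-pyramid level."""

    def __init__(self, level_channels=(64, 128, 256)):
        super().__init__()
        self.blocks = nn.ModuleList(TemporalFusionBlock(c) for c in level_channels)

    def forward(self, feats_cur, feats_prev, exchange_prob: float = 0.0):
        fused = []
        for block, cur, prev in zip(self.blocks, feats_cur, feats_prev):
            # Feature exchange (training only): with some probability, swap the
            # roles of current and previous features at this level so the fusion
            # block cannot rely on frame order alone -- an assumed interpretation
            # of the paper's feature exchange training strategy.
            if self.training and torch.rand(1).item() < exchange_prob:
                cur, prev = prev, cur
            fused.append(block(cur, prev))
        return fused


if __name__ == "__main__":
    fusion = MultiLevelTemporalFusion()
    sizes = zip((64, 128, 256), (64, 32, 16))
    cur, prev = [], []
    for c, s in sizes:
        cur.append(torch.randn(1, c, s, s))
        prev.append(torch.randn(1, c, s, s))
    out = fusion(cur, prev, exchange_prob=0.5)
    print([f.shape for f in out])  # shapes match the per-level inputs
```

In a full joint detection and tracking pipeline, the fused feature maps would feed the detection and identity-embedding heads, and the exchange probability would apply only during training; how this maps onto the paper's actual framework is an assumption here.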

Cite this article

Yisu Ge, Wenjie Ye, Guodao Zhang, Mengying Lin. Multi-level temporal feature fusion with feature exchange strategy for multiple object tracking. Optoelectronics Letters, 2024, 20(8): 505‒512 https://doi.org/10.1007/s11801-024-4139-5
