AHAT: adaptive hybrid association strategy for multi-object tracking in complex motion scenes
Keyu Wang, Yajie Yang, Xiuxian Li
Autonomous Intelligent Systems, 2026, Vol. 6, Issue 1: 7
Multi-object tracking (MOT) has long been a challenging task in computer vision, particularly in complex scenes with intricate motion patterns and frequent occlusions. Existing approaches often face significant hurdles in maintaining consistent and accurate trajectories under such demanding conditions. The integration of motion and appearance cues has proven beneficial, yet most methods rely on static fusion strategies that fail to adapt to dynamic scene variations. In this paper, we propose Adaptive Hybrid Association Tracking (AHAT), a novel framework designed to address the limitations of traditional MOT methods. AHAT employs a two-stage dynamic feature selection mechanism. The first stage combines motion and appearance features to achieve high-precision matching for high-scoring detection boxes, while the second stage utilizes a dynamic threshold for simple matching against low-scoring detection boxes. This approach effectively reduces trajectory fragmentation and ID switches, improving tracking robustness in crowded and dynamic environments. Notably, AHAT achieves a 5% improvement in HOTA in scenarios with low detection confidence or high motion complexity and reduces identity switches by over 10%. These results highlight AHAT’s effectiveness in practical applications, especially in video surveillance and robotics where high accuracy and real-time performance are crucial. The modular design of AHAT allows for seamless integration into existing tracking frameworks, offering a simple yet effective solution.
Adaptive hybrid association / Data association / Multi-object tracking / Visual tracking
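The abstract's two-stage association can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `two_stage_associate`, the fusion weight `alpha`, the fixed gates, and the use of IoU as the motion cue and cosine similarity as the appearance cue are all assumptions standing in for the paper's actual dynamic-threshold design.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def cosine(u, v):
    """Cosine similarity between two appearance embeddings."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def two_stage_associate(tracks, detections, score_thresh=0.6,
                        alpha=0.5, fused_gate=0.7, low_iou_gate=0.3):
    """tracks/detections: dicts with 'box' and 'feat'; detections also carry 'score'.
    Returns a list of (track_index, detection_index) matches."""
    high = [i for i, d in enumerate(detections) if d['score'] >= score_thresh]
    low = [i for i, d in enumerate(detections) if d['score'] < score_thresh]
    matches, unmatched = [], list(range(len(tracks)))

    # Stage 1: fuse motion (IoU) and appearance (cosine) cues for
    # high-scoring detections, then solve the assignment optimally.
    if high and unmatched:
        cost = np.array([[1.0 - (alpha * iou(tracks[t]['box'], detections[d]['box'])
                                 + (1 - alpha) * cosine(tracks[t]['feat'],
                                                        detections[d]['feat']))
                          for d in high] for t in unmatched])
        for r, c in zip(*linear_sum_assignment(cost)):
            if cost[r, c] < fused_gate:  # reject weak fused matches
                matches.append((unmatched[r], high[c]))
        matched = {t for t, _ in matches}
        unmatched = [t for t in unmatched if t not in matched]

    # Stage 2: simple IoU-only matching of remaining tracks against
    # low-scoring detections (a fixed gate stands in for the paper's
    # dynamic threshold).
    if low and unmatched:
        cost = np.array([[1.0 - iou(tracks[t]['box'], detections[d]['box'])
                          for d in low] for t in unmatched])
        for r, c in zip(*linear_sum_assignment(cost)):
            if 1.0 - cost[r, c] >= low_iou_gate:
                matches.append((unmatched[r], low[c]))
    return matches
```

Reserving the fused motion-appearance cost for confident detections while recovering occluded targets from low-scoring boxes is what reduces trajectory fragmentation in this style of tracker; the gating constants here are placeholders for tuned or adaptive values.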