Scale-aware Gaussian mixture loss for crowd localization transformers

Alabi Mehzabin Anisha, Sriram Chellappan

High-Confidence Computing ›› 2025, Vol. 5 ›› Issue (3): 100296. DOI: 10.1016/j.hcc.2024.100296

Research Article


Abstract

A fundamental problem in crowd localization using computer vision techniques stems from intrinsic scale shifts. Scale shifts occur when the crowd density within an image is uneven and chaotic, a feature common in dense crowds. At locations nearer to the camera, crowd density is lower than at locations farther away. Consequently, the number of pixels representing a person varies significantly across an image depending on the camera’s position. Existing crowd localization methods do not effectively handle scale shifts, resulting in relatively poor performance on dense crowd images. In this paper, we explicitly address this challenge. Our method, called Gaussian Loss Transformers (GLT), directly incorporates scale variations in crowds by adapting the loss function to handle them in the end-to-end training pipeline. To inform the model about the scale variations within the crowd, we use a Gaussian mixture model (GMM) to pre-process the ground truths into non-overlapping clusters. Each cluster’s information is then used as a weighting factor when computing the localization loss for that cluster. Extensive experiments on state-of-the-art datasets and computer vision models show that our method improves localization performance on dense crowd images. We also analyze the effect of multiple parameters in our technique and report their impact on crowd localization performance.
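The pipeline the abstract describes — fit a GMM to the ground-truth head annotations, hard-assign each point to a non-overlapping cluster, then weight each cluster's contribution to the localization loss — can be sketched as below. This is an illustrative, dependency-free sketch, not the paper's implementation: clustering on the vertical image coordinate as a scale proxy and the inverse-frequency weighting are assumptions, and the function names `fit_gmm_1d` and `cluster_and_weight` are hypothetical.

```python
import math

def fit_gmm_1d(values, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture to `values` via EM.
    Initialization uses evenly spaced quantiles, so the result is
    deterministic for a given input."""
    n = len(values)
    ordered = sorted(values)
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    spread = max(ordered[-1] - ordered[0], 1e-3)
    sigmas = [spread / k] * k          # broad initial spread
    pis = [1.0 / k] * k                # uniform mixing weights
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for v in values:
            ps = [
                pis[j] * math.exp(-((v - mus[j]) ** 2) / (2 * sigmas[j] ** 2))
                / (sigmas[j] * math.sqrt(2 * math.pi))
                for j in range(k)
            ]
            s = sum(ps) or 1e-12
            resp.append([p / s for p in ps])
        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            pis[j] = nj / n
            mus[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - mus[j]) ** 2 for r, v in zip(resp, values)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))
    return mus, sigmas, pis

def cluster_and_weight(y_coords, k=2):
    """Hard-assign each annotation to its most likely component and
    derive an inverse-frequency weight per cluster (one illustrative
    weighting choice; the paper's exact scheme may differ)."""
    mus, sigmas, pis = fit_gmm_1d(y_coords, k)
    labels = []
    for v in y_coords:
        ps = [
            pis[j] * math.exp(-((v - mus[j]) ** 2) / (2 * sigmas[j] ** 2)) / sigmas[j]
            for j in range(k)
        ]
        labels.append(ps.index(max(ps)))
    counts = [max(labels.count(j), 1) for j in range(k)]
    # Rarer (typically denser, farther-away) clusters get a larger weight.
    weights = [len(y_coords) / (k * c) for c in counts]
    return labels, weights
```

During training, each point's localization-loss term would then be multiplied by the weight of its cluster, so that sparsely represented scale groups are not dominated by the majority scale.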

Keywords

Gaussian mixture model / Crowd localization / Vision transformers

Cite this article

Alabi Mehzabin Anisha, Sriram Chellappan. Scale-aware Gaussian mixture loss for crowd localization transformers. High-Confidence Computing, 2025, 5(3): 100296. DOI: 10.1016/j.hcc.2024.100296


CRediT authorship contribution statement

Alabi Mehzabin Anisha: Writing - original draft, Visualization, Validation, Methodology, Formal analysis, Data curation, Conceptualization. Sriram Chellappan: Writing - review & editing, Supervision, Resources, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
