Contactless interaction recognition and interactor detection in multi-person scenes
Jiacheng LI, Ruize HAN, Wei FENG, Haomin YAN, Song WANG
Front. Comput. Sci., 2024, Vol. 18, Issue (5): 185325
Human interaction recognition is an essential task in video surveillance. Existing work on human interaction recognition mainly focuses on scenarios that contain only the close-contact interacting subjects, with no other people present. In this paper, we handle more practical but more challenging scenarios in which the interacting subjects are contactless and other subjects not involved in the interactions of interest are also present in the scene. To address this problem, we propose an Interactive Relation Embedding Network (IRE-Net) that simultaneously identifies the subjects involved in the interaction and recognizes their interaction category. Because this is a new problem, we also build a new dataset with annotations and metrics for performance evaluation. Experimental results on this dataset show that the proposed method significantly outperforms current methods developed for human interaction recognition and group activity recognition.
human-human interaction recognition / multiperson scene / contactless interaction / human relation modeling
Higher Education Press