Learning group interaction for sports video understanding from a perspective of athlete
Rui HE , Zehua FU , Qingjie LIU , Yunhong WANG , Xunxun CHEN
Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (4) : 184705
Learning the interactions between small groups is a key step in understanding team sports videos. Most recent research on team sports videos adopts the perspective of the audience rather than that of the athlete. Team sports videos, such as volleyball and basketball videos, contain abundant intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better capture intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are divided into two teams, the features of the two teams’ activities and interactions are enhanced by Graph Convolutional Networks and then classified to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is constructed by augmenting the Volleyball dataset with 9660 additional team activity labels. A baseline is established for comparison, and our experimental results demonstrate the effectiveness of the proposed method. Moreover, the idea behind our method can be directly applied to another video-based task, Group Activity Recognition; experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete’s view, we present an interpretation that shows how the Group Scene Graph can be used to analyze team activities and provide professional gaming suggestions.
group scene graph / group activity recognition / scene graph generation / graph convolutional network / sports video understanding
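The abstract describes enhancing per-player features over a relation graph with Graph Convolutional Networks after partitioning the players into two teams. The snippet below is a minimal, hypothetical sketch of that propagation step; all names, shapes, and the fully connected adjacency are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN layer: symmetrically normalized propagation followed by ReLU."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # D^{-1/2}
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt       # D^{-1/2} (A + I) D^{-1/2}
    return np.maximum(norm @ feats @ weight, 0)  # ReLU activation

rng = np.random.default_rng(0)
n_players, dim = 12, 16                          # e.g., 6 players per volleyball team
team = np.array([0] * 6 + [1] * 6)               # team partition of the player nodes

# Fully connect the players; intra-team and inter-team edges could be
# weighted differently to reflect the two relation types in the paper.
adj = np.ones((n_players, n_players)) - np.eye(n_players)

feats = rng.standard_normal((n_players, dim))    # per-player appearance features
weight = rng.standard_normal((dim, dim))         # learnable layer weight
enhanced = gcn_layer(adj, feats, weight)         # context-enhanced features
print(enhanced.shape)                            # (12, 16)
```

The enhanced node features would then be fed to a classifier that predicts the intra-team and inter-team relation labels forming the Group Scene Graph.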
Higher Education Press