Accelerating temporal action proposal generation via high performance computing
Tian WANG, Shiye LEI, Youyou JIANG, Choi CHANG, Hichem SNOUSSI, Guangcun SHAN, Yao FU
Front. Comput. Sci., 2022, Vol. 16, Issue (4): 164317
Temporal action proposal generation aims to output the starting and ending times of each potential action in long videos, and it often suffers from high computation cost. To address this issue, we propose a new temporal convolution network called Multipath Temporal ConvNet (MTCN). We further introduce a novel high-performance ring parallel architecture into temporal action proposal generation to meet the demands of large memory occupation and a large number of videos. Notably, the newly developed architecture reduces the total data transmission by adding a connection between the multiple computing loads. Compared to the traditional Parameter Server architecture, our parallel architecture achieves higher efficiency on temporal action detection tasks with multiple GPUs. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods with high recall and high temporal precision. In addition, we propose a time metric to evaluate speed performance in the distributed training process.
temporal convolution / temporal action proposal generation / deep learning
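To make the contrast between a ring topology and a Parameter Server concrete, the following is a minimal sketch of the standard ring all-reduce scheme, simulated in plain Python on in-memory lists. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the abstract does not specify the exact communication schedule, and the names `ring_allreduce` and `parameter_server_traffic` are hypothetical helpers introduced here. The sketch counts how many values each worker sends, and compares that with the traffic handled by a single central server.

```python
def ring_allreduce(grads):
    """Simulate a standard ring all-reduce over equally sized gradient lists.

    Assumption: this is the textbook scheme (scatter-reduce, then all-gather),
    not necessarily the exact schedule used in the paper.
    Returns (buffers, sent): the per-worker results and values sent per worker.
    """
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "sketch assumes the gradient splits evenly into n chunks"
    chunk = size // n
    bufs = [list(g) for g in grads]
    sent = [0] * n

    def sl(c):
        # Index range covered by chunk c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: after n - 1 steps, worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing messages so all sends in a step are simultaneous.
        msgs = []
        for i in range(n):
            c = (i - step) % n
            msgs.append((c, bufs[i][sl(c)]))
            sent[i] += chunk
        for i, (c, data) in enumerate(msgs):
            dst = (i + 1) % n
            bufs[dst][sl(c)] = [a + b for a, b in zip(bufs[dst][sl(c)], data)]

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            msgs.append((c, bufs[i][sl(c)]))
            sent[i] += chunk
        for i, (c, data) in enumerate(msgs):
            bufs[(i + 1) % n][sl(c)] = data

    return bufs, sent


def parameter_server_traffic(n, size):
    """Values crossing the central server's link in one synchronous update:
    it receives `size` values from each of n workers and sends `size` back."""
    return 2 * n * size


if __name__ == "__main__":
    workers, length = 4, 8
    grads = [[w + 1] * length for w in range(workers)]
    reduced, sent = ring_allreduce(grads)
    print("reduced gradient on worker 0:", reduced[0])
    print("values sent per ring worker:", sent)
    print("values through the central server:", parameter_server_traffic(workers, length))
```

In this scheme each worker sends 2(n - 1)/n times the gradient size, which stays below twice the gradient size however many workers are added, whereas the central server's traffic grows linearly with the number of workers.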
Higher Education Press