Short video preloading via domain knowledge assisted deep reinforcement learning

Yuhong Xie, Yuan Zhang, Tao Lin, Zipeng Pan, Si-Ze Qian, Bo Jiang, Jinyao Yan

2024, Vol. 10, Issue 6: 1826-1836. DOI: 10.1016/j.dcan.2024.01.006

Research article

Abstract

Short video applications like TikTok have seen significant growth in recent years. One common behavior of users on these platforms is watching and swiping through videos, which can lead to a significant waste of bandwidth. As such, an important challenge in short video streaming is to design a preloading algorithm that can effectively decide which videos to download, at what bitrate, and when to pause the download in order to reduce bandwidth waste while improving the Quality of Experience (QoE). However, designing such an algorithm is non-trivial, especially when considering the conflicting objectives of minimizing bandwidth waste and maximizing QoE. In this paper, we propose an end-to-end Deep reinforcement learning framework with Action Masking called DAM that leverages domain knowledge to learn an optimal policy for short video preloading. To achieve this, we introduce a reward shaping technique to minimize bandwidth waste and use action masking to make actions more reasonable, reduce playback rebuffering, and accelerate the training process. We have conducted extensive experiments using real-world video datasets and network traces including 4G/WiFi/5G. Our results show that DAM improves the QoE score by 3.73%-11.28% compared to state-of-the-art algorithms, and achieves an average bandwidth waste of only 10.27%-12.07%, outperforming all baseline methods.
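The abstract does not spell out the masking rules DAM uses, so the following is only a minimal sketch of the general action-masking idea, under stated assumptions: the action space is one "preload video i" action per candidate video plus a final "pause" action, and the only domain rule is a hypothetical buffer cap. The names `masked_softmax`, `build_mask`, and `max_buffer_seconds` are illustrative and do not come from the paper.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with disallowed actions forced to zero.

    logits: raw policy scores, one per action.
    mask:   booleans, True where the action is allowed.
    """
    if not any(mask):
        raise ValueError("at least one action must remain valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(score for score, ok in zip(logits, mask) if ok)
    exps = [math.exp(score - peak) if ok else 0.0
            for score, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def build_mask(buffered_seconds, max_buffer_seconds):
    """Hypothetical domain rule: skip videos whose buffer is already full.

    Produces one entry per candidate video, followed by a final
    "pause" action that is always allowed.
    """
    mask = [b < max_buffer_seconds for b in buffered_seconds]
    mask.append(True)  # pausing the download is always a valid choice
    return mask


# Three candidate videos; the first already holds 6 s of content (assumed values).
buffers = [6.0, 1.5, 0.0]
mask = build_mask(buffers, max_buffer_seconds=4.0)
probs = masked_softmax([2.0, 0.5, 0.1, -1.0], mask)
```

Here the first action has the highest raw score but is masked out, so the policy can only sample from the remaining three actions. Masking at the probability level, rather than penalizing bad actions through the reward, is one way such a constraint could keep the agent from exploring choices that domain knowledge already rules out.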

Keywords

Short video preloading / Deep reinforcement learning / Reward shaping / Action masking / Domain knowledge

Cite this article

Yuhong Xie, Yuan Zhang, Tao Lin, Zipeng Pan, Si-Ze Qian, Bo Jiang, Jinyao Yan. Short video preloading via domain knowledge assisted deep reinforcement learning. 2024, 10(6): 1826-1836. DOI: 10.1016/j.dcan.2024.01.006
