High-speed encrypted traffic classification by using payload features

Yan Xinge, He Liukun, Xu Yifan, Cao Jiuxin, Wang Liangmin, Xie Guyang

2025, Vol. 11, Issue (2): 412-423. DOI: 10.1016/j.dcan.2024.02.003
Original article


Abstract

Traffic encryption techniques help cyberattackers hide their presence and activities. Traffic classification is an important method for preventing network threats. However, owing to the tremendous traffic volume and limited computing resources, most existing traffic classification techniques are inapplicable to high-speed network environments. In this paper, we propose a two-stage High-speed Encrypted Traffic Classification (HETC) method. First, to efficiently detect whether traffic is encrypted, HETC focuses on randomly sampled short flows and extracts aggregation entropies together with chi-square test features, which measure how the byte composition and distribution differ between encrypted and unencrypted flows. Second, HETC adds binary features to the previous features and performs fine-grained traffic classification by combining these payload features with a Random Forest model. The experimental results show that HETC achieves a 94% F-measure in detecting encrypted flows and an 85%-93% F-measure in classifying fine-grained flows on a 1-KB flow-length dataset, outperforming the state-of-the-art comparison methods. Moreover, HETC does not need to wait for the end of a flow and can extract mass computing features. The average time for HETC to process each flow is only 2 or 16 ms, which is lower than the flow duration in most cases, making it a good candidate for high-speed traffic classification.
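To make the first stage more concrete, the following is a minimal Python sketch of two generic payload randomness features: the Shannon entropy of the byte distribution and a Pearson chi-square statistic against a uniform byte distribution. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define HETC's "aggregation entropies" or its exact chi-square features, so the function names, the 1024-byte truncation (chosen to mirror the 1-KB flow length mentioned above), and the feature layout are all hypothetical.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def chi_square_uniform(payload: bytes) -> float:
    """Pearson chi-square statistic against a uniform distribution over 256 byte values.

    Values near 255 (the degrees of freedom) are typical of random-looking data;
    much larger values indicate a skewed byte distribution.
    """
    if not payload:
        return 0.0
    expected = len(payload) / 256
    counts = Counter(payload)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def payload_features(payload: bytes, max_len: int = 1024) -> list[float]:
    """Hypothetical feature vector computed on the first max_len payload bytes."""
    head = payload[:max_len]
    return [byte_entropy(head), chi_square_uniform(head)]


if __name__ == "__main__":
    # Plain-text-like payload: low entropy, large chi-square statistic.
    text_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 22
    # Perfectly flat byte histogram, used here as a stand-in for ciphertext.
    flat = bytes(range(256)) * 4
    print(payload_features(text_like))
    print(payload_features(flat))
```

Feature vectors of this kind could then be passed to a Random Forest classifier, as the abstract describes for the second stage. Because the features only need a prefix of the payload, they can be computed before the flow ends.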

Keywords

Traffic classification / Flow analysis / Information entropy / Machine learning / Randomness test

Cite this article

Yan Xinge, He Liukun, Xu Yifan, Cao Jiuxin, Wang Liangmin, Xie Guyang. High-speed encrypted traffic classification by using payload features. Digital Communications and Networks, 2025, 11(2): 412-423. DOI: 10.1016/j.dcan.2024.02.003


CRediT authorship contribution statement

Xinge Yan: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing - original draft, Writing - review & editing. Liukun He: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review & editing. Yifan Xu: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review & editing. Jiuxin Cao: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Software, Writing - original draft. Liangmin Wang: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Resources, Visualization, Writing - original draft. Guyang Xie: Data curation, Formal analysis, Resources, Software, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. U1736216.

