Network traffic classification: Techniques, datasets, and challenges
Ahmad Azab, Mahmoud Khasawneh, Saed Alrabaee, Kim-Kwang Raymond Choo, Maysa Sarsour
2024, Vol. 10, Issue 3: 676-692.
Network traffic classification links network traffic to the application, protocol, or service group that generated it. Understanding this link is important, for example, for facilitating lawful interception, ensuring quality of service, preventing application choke points, and identifying malicious behavior. In this paper, we review existing network traffic classification techniques, including port-based identification, deep packet inspection, statistical features combined with machine learning, and deep learning algorithms. We explain the implementations, advantages, and limitations of these techniques, and we survey the publicly available datasets used in the literature. Finally, we discuss existing and emerging challenges, as well as future research directions.
Network classification / Machine learning / Deep learning / Deep packet inspection / Traffic monitoring