Fast counting the cardinality of flows for big traffic over sliding windows
Jingsong SHAN, Yinjin FU, Guiqiang NI, Jianxin LUO, Zhaofeng WU
Fast counting the cardinality of flows for big traffic over sliding windows
Counting the cardinality of flows for massive high-speed traffic over sliding windows is still a challenging work under time and space constrains, but plays a key role in many network applications, such as traffic management and routing optimization in software defined network. In this paper, we propose a novel data structure (called LRU-Sketch) to address the problem. The significant contributions are as follows. 1) The proposed data structure adapts a well-known probabilistic sketch to sliding window model; 2) By using the least-recently used (LRU) replacement policy, we design a highly time-efficient algorithm for timely forgetting stale information, which takes constant (O(1)) time per time slot; 3) Moreover, a further memory-reducing schema is given at a cost of very little loss of accuracy; 4) Comprehensive experiments, performed on two real IP trace files, confirm that the proposed schema attains high accuracy and high time efficiency.
probabilistic data structure / sketch / streaming data / cardinality / flow
[1] |
Callegari C, Giordano S, Pagano M, Procissi G. Opencounter: counting unknown flows in software defined networks. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems. 2015, 1–7
CrossRef
Google scholar
|
[2] |
Callegari C, Pietro A D, Giordano S, Pepe T, Procissi G. The loglog counting reversible sketch: a distributed architecture for detecting anomalies in backbone networks. In: Proceedings of IEEE International Conference on Communications. 2012, 1287–1291
CrossRef
Google scholar
|
[3] |
Estan C, Varghese G. New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems, 2003, 21(3): 270–313
CrossRef
Google scholar
|
[4] |
Estan C, Varghese G, Fisk M E. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Transactions on Networking, 2006, 14(5): 925–937
CrossRef
Google scholar
|
[5] |
Chen W J, Liu Y, Guan Y. Cardinality change-based early detection of large-scale cyber-attacks. In: Proceedings of IEEE International Conference on Communications. 2013, 1788–1796
CrossRef
Google scholar
|
[6] |
Cao J, Jin Y, Chen A, Bu T, Zhang Z L. Identifying high cardinality internet hosts. In: Proceedings of IEEE International Conference on Communications. 2009, 810–818
CrossRef
Google scholar
|
[7] |
Zheng Y Q, Li M. ZOE: fast cardinality estimation for large-scale RFID systems. In: Proceedings of IEEE International Conference on Communications. 2013, 908–916
CrossRef
Google scholar
|
[8] |
Chen A, Li L E, Cao J. Tracking cardinality distributions in network traffic. In: Proceedings of IEEE International Conference on Communications. 2009, 819–827
CrossRef
Google scholar
|
[9] |
Huang Q, Lee P P C. Ld-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. In: Proceedings of IEEE International Conference on Communications. 2014, 1420–1428
CrossRef
Google scholar
|
[10] |
Huang Q, Lee P P C. A hybrid local and distributed sketching design for accurate and scalable heavy key detection in network data streams. Computer Networks, 2015, 91: 298–315
CrossRef
Google scholar
|
[11] |
Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In: Proceedings of the 21st Symposium on Principles of Database Systems. 2002, 1–16
CrossRef
Google scholar
|
[12] |
Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms. 1999, 909–910
|
[13] |
Hua Y, Xiao B, Veeravalli B, Feng D. Locality-sensitive bloom filter for approximate membership query. IEEE Transactions on Computers, 2012, 61(6): 817–830
CrossRef
Google scholar
|
[14] |
Yu Y, Qian C, Li X. Distributed and collaborative traffic monitoring in software defined networks. In: Proceedings of the 3rd Workshop on Hot Topics in Software Defined Networking. 2014, 85–90
CrossRef
Google scholar
|
[15] |
Whang K, Zanden B T V, Taylor H M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 1990, 15(02): 208–229
CrossRef
Google scholar
|
[16] |
Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209
CrossRef
Google scholar
|
[17] |
Giroire F. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 2009, 157(2): 406–427
CrossRef
Google scholar
|
[18] |
Durand M, Flajolet P. Loglog counting of large cardinalities (extended abstract). In: Proceedings of European Symposium on Algorithms. 2003, 605–617
|
[19] |
Ońeil E, Ońeil P, Weikum G. The LRU-K page replacement algorithm for database disk buffering. In: Proceedings of ACM Special Interest Group on Management Of Data. 1993, 297–306
CrossRef
Google scholar
|
[20] |
Heule S, Nunkesser M, Hall A. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In: Proceedings of Joint International Conference on Extending Database Technology. 2013, 683–692
CrossRef
Google scholar
|
[21] |
Chen A, Cao J. Distinct counting with a self-learning bitmap. In: Proceedings of the 25th International Council for Open and Distance Education. 2009, 1171–1174
CrossRef
Google scholar
|
[22] |
Metwally A, Agrawal D, El Abbadi A. Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic. In: Proceedings of Joint International Conference on Extending Database Technology. 2008, 618–629
CrossRef
Google scholar
|
[23] |
Aouiche K, Lemire D. A comparison of five probabilistic view-size estimation techniques in OLAP. In: Proceedings of the 10th ACM International Workshop on Data Warehousing and OLAP. 2007, 17–24
CrossRef
Google scholar
|
[24] |
Chabchoub Y, Chiky R, Dogan B. How can sliding hyperloglog and EWMA detect port scan attacks in IP traffic? EURASIP Journal on Information Security, 2014, 2014: 5
|
[25] |
Ben-Basat R, Einziger G, Friedman R, Kassner Y. Heavy hitters in streams and sliding windows. In: Proceedings of IEEE International Conference on Communications. 2016
CrossRef
Google scholar
|
[26] |
Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 2002, 31(6): 1794–1813
CrossRef
Google scholar
|
[27] |
Kim H, O’Hallaron D R. Counting network flows in real time. In: Proceedings of IEEE Global Communications Conference. 2003, 3888–3893
|
[28] |
Sanjuàs-Cuxart J, Barlet-Ros P, Solé-Pareta J. Counting flows over sliding windows in high speed networks. In: Proceedings of International Conference on Research in Networking. 2009, 79–91
CrossRef
Google scholar
|
[29] |
Yi K, Wang L, Wei Z W. Indexing for summary queries: theory and practice. ACM Transactions on Database Systems, 2014, 39(1): 2
CrossRef
Google scholar
|
[30] |
Zhang Z, Wang B Q, Lan J L. Identifying elephant flows in Internet backbone traffic with bloom filters and LRU. Computer Communications, 2015, 61: 70–78
CrossRef
Google scholar
|
[31] |
Mitzenmacher M, Upfal E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. New York: Cambridge University Press, 2005
CrossRef
Google scholar
|
[32] |
Jain R, Rountier S A. Packet trains-measurements and a new model for computer network traffic. IEEE Journal on Selected Areas in Communications, 1986, 4(6): 986–995
CrossRef
Google scholar
|
/
〈 | 〉 |