Fast counting the cardinality of flows for big traffic over sliding windows

Jingsong SHAN , Yinjin FU , Guiqiang NI , Jianxin LUO , Zhaofeng WU

Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (1) : 119 -129.

PDF (1167KB)
Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (1) : 119 -129. DOI: 10.1007/s11704-016-6053-x
RESEARCH ARTICLE

Fast counting the cardinality of flows for big traffic over sliding windows

Author information +
History +
PDF (1167KB)

Abstract

Counting the cardinality of flows for massive high-speed traffic over sliding windows is still a challenging work under time and space constrains, but plays a key role in many network applications, such as traffic management and routing optimization in software defined network. In this paper, we propose a novel data structure (called LRU-Sketch) to address the problem. The significant contributions are as follows. 1) The proposed data structure adapts a well-known probabilistic sketch to sliding window model; 2) By using the least-recently used (LRU) replacement policy, we design a highly time-efficient algorithm for timely forgetting stale information, which takes constant (O(1)) time per time slot; 3) Moreover, a further memory-reducing schema is given at a cost of very little loss of accuracy; 4) Comprehensive experiments, performed on two real IP trace files, confirm that the proposed schema attains high accuracy and high time efficiency.

Keywords

probabilistic data structure / sketch / streaming data / cardinality / flow

Cite this article

Download citation ▾
Jingsong SHAN, Yinjin FU, Guiqiang NI, Jianxin LUO, Zhaofeng WU. Fast counting the cardinality of flows for big traffic over sliding windows. Front. Comput. Sci., 2017, 11(1): 119-129 DOI:10.1007/s11704-016-6053-x

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

Callegari C, Giordano S, Pagano M, Procissi G. Opencounter: counting unknown flows in software defined networks. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems. 2015, 1–7

[2]

Callegari C, Pietro A D, Giordano S, Pepe T, Procissi G. The loglog counting reversible sketch: a distributed architecture for detecting anomalies in backbone networks. In: Proceedings of IEEE International Conference on Communications. 2012, 1287–1291

[3]

Estan C, Varghese G. New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Transactions on Computer Systems, 2003, 21(3): 270–313

[4]

Estan C, Varghese G, Fisk M E. Bitmap algorithms for counting active flows on high-speed links. IEEE/ACM Transactions on Networking, 2006, 14(5): 925–937

[5]

Chen W J, Liu Y, Guan Y. Cardinality change-based early detection of large-scale cyber-attacks. In: Proceedings of IEEE International Conference on Communications. 2013, 1788–1796

[6]

Cao J, Jin Y, Chen A, Bu T, Zhang Z L. Identifying high cardinality internet hosts. In: Proceedings of IEEE International Conference on Communications. 2009, 810–818

[7]

Zheng Y Q, Li M. ZOE: fast cardinality estimation for large-scale RFID systems. In: Proceedings of IEEE International Conference on Communications. 2013, 908–916

[8]

Chen A, Li L E, Cao J. Tracking cardinality distributions in network traffic. In: Proceedings of IEEE International Conference on Communications. 2009, 819–827

[9]

Huang Q, Lee P P C. Ld-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. In: Proceedings of IEEE International Conference on Communications. 2014, 1420–1428

[10]

Huang Q, Lee P P C. A hybrid local and distributed sketching design for accurate and scalable heavy key detection in network data streams. Computer Networks, 2015, 91: 298–315

[11]

Babcock B, Babu S, Datar M, Motwani R, Widom J. Models and issues in data stream systems. In: Proceedings of the 21st Symposium on Principles of Database Systems. 2002, 1–16

[12]

Gibbons P B, Matias Y. Synopsis data structures for massive data sets. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms. 1999, 909–910

[13]

Hua Y, Xiao B, Veeravalli B, Feng D. Locality-sensitive bloom filter for approximate membership query. IEEE Transactions on Computers, 2012, 61(6): 817–830

[14]

Yu Y, Qian C, Li X. Distributed and collaborative traffic monitoring in software defined networks. In: Proceedings of the 3rd Workshop on Hot Topics in Software Defined Networking. 2014, 85–90

[15]

Whang K, Zanden B T V, Taylor H M. A linear-time probabilistic counting algorithm for database applications. ACM Transactions on Database Systems, 1990, 15(02): 208–229

[16]

Flajolet P, Martin G N. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 1985, 31(2): 182–209

[17]

Giroire F. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 2009, 157(2): 406–427

[18]

Durand M, Flajolet P. Loglog counting of large cardinalities (extended abstract). In: Proceedings of European Symposium on Algorithms. 2003, 605–617

[19]

Ońeil E, Ońeil P, Weikum G. The LRU-K page replacement algorithm for database disk buffering. In: Proceedings of ACM Special Interest Group on Management Of Data. 1993, 297–306

[20]

Heule S, Nunkesser M, Hall A. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. In: Proceedings of Joint International Conference on Extending Database Technology. 2013, 683–692

[21]

Chen A, Cao J. Distinct counting with a self-learning bitmap. In: Proceedings of the 25th International Council for Open and Distance Education. 2009, 1171–1174

[22]

Metwally A, Agrawal D, El Abbadi A. Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic. In: Proceedings of Joint International Conference on Extending Database Technology. 2008, 618–629

[23]

Aouiche K, Lemire D. A comparison of five probabilistic view-size estimation techniques in OLAP. In: Proceedings of the 10th ACM International Workshop on Data Warehousing and OLAP. 2007, 17–24

[24]

Chabchoub Y, Chiky R, Dogan B. How can sliding hyperloglog and EWMA detect port scan attacks in IP traffic? EURASIP Journal on Information Security, 2014, 2014: 5

[25]

Ben-Basat R, Einziger G, Friedman R, Kassner Y. Heavy hitters in streams and sliding windows. In: Proceedings of IEEE International Conference on Communications. 2016

[26]

Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 2002, 31(6): 1794–1813

[27]

Kim H, O’Hallaron D R. Counting network flows in real time. In: Proceedings of IEEE Global Communications Conference. 2003, 3888–3893

[28]

Sanjuàs-Cuxart J, Barlet-Ros P, Solé-Pareta J. Counting flows over sliding windows in high speed networks. In: Proceedings of International Conference on Research in Networking. 2009, 79–91

[29]

Yi K, Wang L, Wei Z W. Indexing for summary queries: theory and practice. ACM Transactions on Database Systems, 2014, 39(1): 2

[30]

Zhang Z, Wang B Q, Lan J L. Identifying elephant flows in Internet backbone traffic with bloom filters and LRU. Computer Communications, 2015, 61: 70–78

[31]

Mitzenmacher M, Upfal E. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. New York: Cambridge University Press, 2005

[32]

Jain R, Rountier S A. Packet trains-measurements and a new model for computer network traffic. IEEE Journal on Selected Areas in Communications, 1986, 4(6): 986–995

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg

AI Summary AI Mindmap
PDF (1167KB)

Supplementary files

Supplementary Material

1462

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/