MR-CLOPE: A MapReduce based transactional clustering algorithm for DNS query log analysis

Ye-feng Li, Jia-jin Le, Mei Wang, Bin Zhang, Liang-xu Liu

Journal of Central South University, 2015, Vol. 22, Issue 9: 3485-3494.

DOI: 10.1007/s11771-015-2888-9

Abstract

DNS (domain name system) query log analysis has been a popular research topic in recent years. CLOPE, a representative transactional clustering algorithm, can be readily applied to DNS query log mining. However, the algorithm is inefficient when processing large-scale data. The MR-CLOPE algorithm is proposed as an extension and improvement of CLOPE based on MapReduce. Unlike previous parallel clustering methods, it uses a two-stage MapReduce implementation framework, in which each stage is implemented by one kind of MapReduce task. In the first stage, the DNS query logs are divided into multiple splits and the CLOPE algorithm is executed on each split. The second stage usually iterates many times to merge the small clusters into larger, satisfactory ones. In these two stages, a novel partition process is designed to randomly spread out the original sub-clusters, which are then moved and merged in the map phase of the second stage according to the defined merge criteria. In this way, the advantages of the original CLOPE algorithm are kept and its disadvantages are addressed in the proposed framework to achieve better clustering performance. The experimental results show that MR-CLOPE is not only faster than CLOPE but also achieves better clustering quality on DNS query logs.
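
The two-stage idea can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation: it runs without Hadoop, it omits the random partition step, and all function and class names are hypothetical. The per-cluster score follows the profit criterion of the original CLOPE algorithm (size times transaction count divided by width to the power r, with r the repulsion parameter). The abstract does not state the merge criteria, so the rule used here (merge two sub-clusters when doing so raises the summed score) is an assumption made only for illustration.

```python
from collections import Counter


class Cluster:
    """CLOPE cluster summary: item histogram plus transaction count."""

    def __init__(self):
        self.occ = Counter()  # item -> number of occurrences
        self.n = 0            # number of transactions in the cluster

    @property
    def size(self):           # S(C): total item occurrences
        return sum(self.occ.values())

    @property
    def width(self):          # W(C): number of distinct items
        return len(self.occ)

    def add(self, txn):
        self.occ.update(txn)
        self.n += 1

    def merge(self, other):
        self.occ.update(other.occ)
        self.n += other.n


def score(size, width, n, r):
    """One cluster's term in the CLOPE profit numerator: S * N / W^r."""
    return size * n / width ** r if width else 0.0


def delta_add(cluster, txn, r):
    """Change in the profit numerator if txn (a set of items) joins cluster."""
    new_width = len(set(cluster.occ) | txn)
    new_size = cluster.size + len(txn)
    return (score(new_size, new_width, cluster.n + 1, r)
            - score(cluster.size, cluster.width, cluster.n, r))


def clope_one_pass(transactions, r):
    """Stage 1, per split: put each transaction where the gain is largest."""
    clusters = []
    for txn in transactions:
        txn = set(txn)
        best, best_gain = None, delta_add(Cluster(), txn, r)  # open a new cluster
        for c in clusters:
            gain = delta_add(c, txn, r)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            best = Cluster()
            clusters.append(best)
        best.add(txn)
    return clusters


def merge_gain(a, b, r):
    """ASSUMED merge criterion: change in summed score if a and b are merged."""
    width = len(set(a.occ) | set(b.occ))
    return (score(a.size + b.size, width, a.n + b.n, r)
            - score(a.size, a.width, a.n, r)
            - score(b.size, b.width, b.n, r))


def merge_pass(clusters, r):
    """Stage 2, one iteration: fold each sub-cluster into its best partner."""
    kept, changed = [], False
    for c in clusters:
        target = max(kept, key=lambda k: merge_gain(k, c, r), default=None)
        if target is not None and merge_gain(target, c, r) > 0:
            target.merge(c)
            changed = True
        else:
            kept.append(c)
    return kept, changed


def mr_clope_sketch(transactions, n_splits=2, r=2.0, max_iter=10):
    """Run stage 1 on each split, then iterate stage 2 until nothing merges."""
    splits = [transactions[i::n_splits] for i in range(n_splits)]
    clusters = [c for s in splits for c in clope_one_pass(s, r)]
    for _ in range(max_iter):
        clusters, changed = merge_pass(clusters, r)
        if not changed:
            break
    return clusters
```

In this sketch a transaction would be the set of domain names queried by one client; how the paper actually builds transactions from DNS query logs is not described in the abstract, so that encoding is also an assumption.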

Keywords

DNS data mining / MR-CLOPE algorithm / transactional clustering algorithm / MapReduce framework

Cite this article

Ye-feng Li, Jia-jin Le, Mei Wang, Bin Zhang, Liang-xu Liu. MR-CLOPE: A MapReduce based transactional clustering algorithm for DNS query log analysis. Journal of Central South University, 2015, 22(9): 3485-3494. DOI: 10.1007/s11771-015-2888-9
