Data-driven human and bot recognition from web activity logs based on hybrid learning techniques

Marek Gajewski, Olgierd Hryniewicz, Agnieszka Jastrzębska, Mariusz Kozakiewicz, Karol Opara, Jan Wojciech Owsiński, Sławomir Zadrożny, Tomasz Zwierzchowski

2024, Vol. 10, Issue 4: 1178-1188.

DOI: 10.1016/j.dcan.2023.01.020
Research article

Abstract

Distinguishing between web traffic generated by bots and by humans is an important task in the evaluation of online marketing campaigns. One of the main challenges is that performance metrics are only partially available: although some users can be unambiguously classified as bots, the correct label is uncertain in many cases. This calls for classifiers capable of explaining their decisions. This paper demonstrates two such mechanisms based on features carefully engineered from web logs. The first is a manually designed rule-based system. The second is a hierarchical model that first performs clustering and then classification using human-centred, interpretable methods. The stability of the proposed methods is analyzed, and a minimal set of features that convey the class-discriminating information is selected. The proposed data processing and analysis methodology is successfully applied to real-world data sets from online publishers.

Keywords

Web logs / Classification / Clustering / Web traffic / Bots / Interpretability

Cite this article

Marek Gajewski, Olgierd Hryniewicz, Agnieszka Jastrzębska, Mariusz Kozakiewicz, Karol Opara, Jan Wojciech Owsiński, Sławomir Zadrożny, Tomasz Zwierzchowski. Data-driven human and bot recognition from web activity logs based on hybrid learning techniques. 2024, 10(4): 1178-1188. DOI: 10.1016/j.dcan.2023.01.020
