Exploring Machine Learning Techniques for the Detection and Multi-Label Classification of Smart Contract Vulnerabilities

Aedah Alrehaili, Maher Wasl Alharby

Journal of Systems Science and Systems Engineering: 1-24. DOI: 10.1007/s11518-025-5720-6

Abstract

This study aims to strengthen blockchain security by developing a machine learning framework that automates the detection and classification of smart contract vulnerabilities. Our main contribution is the systematic transformation of the unlabeled BCCC-VulSCs-2023 dataset into a multi-label classification resource. We label the contracts automatically with the Oyente tool, which enables the simultaneous detection of seven types of vulnerabilities. We then train traditional machine learning algorithms: Random Forest, Gradient Boosting, Decision Tree, Logistic Regression, and Gaussian Naive Bayes. These models are combined with feature selection and dimensionality reduction techniques, namely Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA), and with thorough hyperparameter optimization. Our method demonstrates significant improvements over existing approaches. In binary classification, the optimized Random Forest classifier combined with RFE achieves an accuracy of 92.8% and an F1-score of 92.9%. In the multi-label setting, it achieves a satisfactory precision of 72.6% across the vulnerability categories. These results highlight the potential of interpretable machine learning models for effective smart contract security auditing. Such models could reduce financial risks and promote trust in the blockchain ecosystem through automated vulnerability assessments.
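The labeling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The label names (`vuln_1` to `vuln_7`), the `findings` dictionary, and the helper functions are hypothetical, since the abstract names neither the seven vulnerability types nor the format of the analyzer output. The sketch also assumes micro-averaged precision, which the abstract does not specify.

```python
from typing import Dict, List, Set

# Hypothetical placeholder names: the abstract mentions seven vulnerability
# types but does not list them.
LABELS: List[str] = [f"vuln_{i}" for i in range(1, 8)]


def to_multi_hot(findings: Dict[str, Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """Turn per-contract sets of flagged labels into fixed-length 0/1 vectors."""
    return {
        contract: [1 if label in flagged else 0 for label in labels]
        for contract, flagged in findings.items()
    }


def binary_target(vector: List[int]) -> int:
    """Collapse a multi-hot vector to vulnerable (1) or not vulnerable (0)."""
    return 1 if any(vector) else 0


def micro_precision(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged precision: TP / (TP + FP), pooled over all labels."""
    tp = fp = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up analyzer findings for three contracts (illustrative only).
findings = {
    "contract_a": {"vuln_1", "vuln_3"},
    "contract_b": set(),
    "contract_c": {"vuln_7"},
}
encoded = to_multi_hot(findings, LABELS)
```

Each contract ends up with one seven-element vector, so the same labeled data can serve both tasks: `binary_target` gives the vulnerable versus non-vulnerable target, and the full vectors give the multi-label target on which a per-label metric such as precision is computed.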

Keywords

Blockchain security / smart contracts / secure contracts / vulnerability detection / machine learning / multi-label classification

Cite this article

Aedah Alrehaili, Maher Wasl Alharby. Exploring Machine Learning Techniques for the Detection and Multi-Label Classification of Smart Contract Vulnerabilities. Journal of Systems Science and Systems Engineering: 1-24. DOI: 10.1007/s11518-025-5720-6

RIGHTS & PERMISSIONS

Systems Engineering Society of China and Springer-Verlag GmbH Germany
