iSKIN: Integrated application of machine learning and Mondrian conformal prediction to detect skin sensitizers in cosmetic raw materials

Weikaixin Kong , Jie Zhu , Peipei Shan , Huiyan Ying , Tongyu Chen , Bowen Zhang , Chao Peng , Zihan Wang , Yifan Wang , Liting Huang , Suzhen Bi , Weining Ma , Zhuo Huang , Sujie Zhu , Xueyan Liu , Chun Li

SmartMat ›› 2024, Vol. 5 ›› Issue (6) : e1278

DOI: 10.1002/smm2.1278
RESEARCH ARTICLE

Abstract

Sensitizers in cosmetic materials have traditionally been identified through animal experiments. However, with growing concerns over animal ethics and bans on such experiments worldwide, alternative methods such as machine learning are gaining prominence for their efficiency and cost-effectiveness. In this study, to develop a robust sensitizer detection model, we first constructed benchmark data sets from previous studies and a public database, collecting 589 sensitizers and 831 nonsensitizers. A graph-based autoencoder and Mondrian conformal prediction (MCP) were then combined to build a robust sensitizer detector, iSKIN. On the independent test set, the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (ROCAUC) of the iSKIN model without MCP were 0.472 and 0.804, respectively, both higher than those of the three baseline models. With the significance level in MCP set to 0.7, the MCC and ROCAUC of iSKIN reached 0.753 and 0.927, respectively. Regrouping experiments showed that the MCP method robustly improves model performance. Key structure analysis identified seven key substructures in sensitizers that can guide cosmetic material design. Notably, long chains with halogen atoms and phenyl groups with two chlorine atoms at ortho-positions were identified as features of potential sensitizers. Finally, a user-friendly web tool for the iSKIN model (http://www.iskin.work/) was deployed for use by other researchers. In summary, the proposed iSKIN model achieves state-of-the-art performance to date, which can contribute to the safety evaluation of cosmetic raw materials and provide a reference for the chemical structure design of these materials.
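To make the role of MCP concrete, the following is a minimal sketch of class-conditional (Mondrian) inductive conformal prediction for a binary classifier. It is not the iSKIN implementation: the nonconformity score (one minus the predicted class probability), the function names `mondrian_p_values` and `prediction_sets`, and the toy calibration data are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def mondrian_p_values(cal_probs, cal_labels, test_probs):
    """Class-conditional (Mondrian) conformal p-values.

    cal_probs  : (n_cal, n_classes) predicted probabilities on a calibration set
    cal_labels : (n_cal,) true integer labels of the calibration set
    test_probs : (n_test, n_classes) predicted probabilities for new compounds
    Returns an (n_test, n_classes) array of p-values.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    test_probs = np.asarray(test_probs, dtype=float)
    p_values = np.zeros_like(test_probs)
    for c in range(cal_probs.shape[1]):
        # Assumed nonconformity score: 1 - probability assigned to class c.
        # "Mondrian" means each class is calibrated only on its own examples.
        cal_scores = 1.0 - cal_probs[cal_labels == c, c]
        test_scores = 1.0 - test_probs[:, c]
        # Count calibration scores at least as nonconforming as each test score.
        n_ge = (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)
        p_values[:, c] = (n_ge + 1) / (len(cal_scores) + 1)
    return p_values

def prediction_sets(p_values, significance):
    """A class enters the prediction set when its p-value exceeds the significance level."""
    return p_values > significance

# Hypothetical calibration data: class 0 = nonsensitizer, class 1 = sensitizer.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
cal_labels = [0, 0, 0, 1, 1]
test_probs = [[0.85, 0.15]]

p = mondrian_p_values(cal_probs, cal_labels, test_probs)
strict = prediction_sets(p, significance=0.5)   # fewer labels survive
lenient = prediction_sets(p, significance=0.2)  # more labels survive
print(p, strict, lenient)
```

Because each class is calibrated separately, the error rate is controlled per class, which is useful when sensitizers and nonsensitizers are imbalanced (589 versus 831 here). Raising the significance level shrinks the prediction sets, so fewer compounds receive a confident label; how iSKIN converts these sets into the reported MCC and ROCAUC is not described in the abstract.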

Keywords

conformal prediction / cosmetic raw material / deep learning / machine learning

Cite this article

Weikaixin Kong, Jie Zhu, Peipei Shan, Huiyan Ying, Tongyu Chen, Bowen Zhang, Chao Peng, Zihan Wang, Yifan Wang, Liting Huang, Suzhen Bi, Weining Ma, Zhuo Huang, Sujie Zhu, Xueyan Liu, Chun Li. iSKIN: Integrated application of machine learning and Mondrian conformal prediction to detect skin sensitizers in cosmetic raw materials. SmartMat, 2024, 5(6): e1278 DOI:10.1002/smm2.1278

RIGHTS & PERMISSIONS

2024 The Authors. SmartMat published by Tianjin University and John Wiley & Sons Australia, Ltd.
