Advances in small molecule representations and AI-driven drug research: bridging the gap between theory and application

Junxi Liu, Shan Chang, Qingtian Deng, Yulian Ding, Yi Pan

Chinese Journal of Natural Medicines, 2025, Vol. 23, Issue (11): 1391-1208.

DOI: 10.1016/S1875-5364(25)60946-0
Review

Abstract

Artificial intelligence (AI) researchers and cheminformatics specialists strive to identify effective drug precursors while optimizing costs and accelerating development. Digital molecular representation plays a crucial role in this effort by making molecules machine-readable, thereby enhancing the accuracy of molecular prediction tasks and facilitating evidence-based decision-making. This review surveys small-molecule representations and the AI-driven drug discovery downstream tasks that use them. It begins with a compilation of small-molecule databases, followed by an analysis of fundamental molecular representations and of the models that learn representations from these initial forms, capturing patterns and salient features across extensive chemical spaces. It then examines downstream drug discovery tasks built on the learned representations, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction, and drug generation. It concludes by highlighting the challenges and opportunities that machine learning (ML) methods face in molecular representation and in improving downstream task performance. Additionally, small-molecule representations and AI-based downstream tasks show significant potential for identifying the medicinal substances of traditional Chinese medicine (TCM) and for facilitating TCM target discovery.

Keywords

Small molecular representation / Drug-target interaction prediction / Drug-target affinity prediction / Drug property prediction / De novo drug generation / Traditional Chinese medicine

Cite this article

Junxi Liu, Shan Chang, Qingtian Deng, Yulian Ding, Yi Pan. Advances in small molecule representations and AI-driven drug research: bridging the gap between theory and application. Chinese Journal of Natural Medicines, 2025, 23(11): 1391-1208. DOI: 10.1016/S1875-5364(25)60946-0
