Protein acetylation sites with complex-valued polynomial model

Wenzheng BAO, Bin YANG

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (3) : 183904. DOI: 10.1007/s11704-023-2640-9
Interdisciplinary
RESEARCH ARTICLE


Abstract

Protein acetylation is the addition of acetyl groups (CH3CO-) to lysine residues on protein chains. As one of the most common protein post-translational modifications, lysine acetylation plays an important role in different organisms. In this study, we developed a human-specific method that uses a cascade classifier of complex-valued polynomial models (CVPM), combined with sequence and structural feature descriptors, to address the imbalance between positive and negative samples. Complex-valued gene expression programming and differential evolution are used to search for the optimal CVPM. We also carried out a systematic and comprehensive analysis of the acetylation data and the prediction results. The proposed method achieves a specificity (Sp) of 79.15%, a sensitivity (Sn) of 78.17%, an accuracy (ACC) of 78.66%, an F1 score of 78.76%, and a Matthews correlation coefficient (MCC) of 0.5733, outperforming other state-of-the-art methods.
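For readers unfamiliar with the five measures reported above, the following is a minimal sketch of how Sn, Sp, ACC, F1, and MCC are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions, and the sketch assumes the data contain at least one positive and one negative sample.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, ACC, F1 and MCC from confusion-matrix counts.

    tp/fn: acetylated sites predicted correctly / missed.
    tn/fp: non-acetylated sites predicted correctly / wrongly flagged.
    """
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)         # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "F1": f1, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike ACC, MCC uses all four cells of the confusion matrix in both numerator and denominator, which is why it is commonly reported alongside accuracy when positive and negative samples are imbalanced.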

Keywords

protein acetylation / complex-valued polynomial model / machine learning

Cite this article

Wenzheng BAO, Bin YANG. Protein acetylation sites with complex-valued polynomial model. Front. Comput. Sci., 2024, 18(3): 183904 https://doi.org/10.1007/s11704-023-2640-9

Wenzheng Bao received the PhD degree in Computer Science from Tongji University, China in 2018. He is an associate professor and master’s supervisor at the School of Information Engineering, Xuzhou University of Technology, China. His research interests include bioinformatics and machine learning.

Bin Yang received the PhD degree in Computer Science from Shandong University, China in 2014. He is a professor and master’s supervisor at the School of Information Science and Engineering, Zaozhuang University, China. His research interests include bioinformatics and machine learning.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), the Xuzhou Science and Technology Plan Project (KC21047), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), the Young Talents of Science and Technology in Jiangsu, the Key Research Program of the Science Foundation of Shandong Province (ZR2020KE001), the talent project of “Qingtan Scholar” of Zaozhuang University, the PhD research startup foundation of Zaozhuang University (No. 2014BS13), and the Zaozhuang University Foundation (No. 2015YY02).

RIGHTS & PERMISSIONS

2024 Higher Education Press