Structured Learning in Biological Domain

Canh Hao Nguyen

doi:10.1007/s11518-020-5461-5

Journal of Systems Science and Systems Engineering, 2020, Vol. 29, Issue 4: 440-453. DOI: 10.1007/s11518-020-5461-5


Abstract

The biological domain has been blessed with ever more data from biotechnologies and data integration tools. In the renaissance of machine learning and artificial intelligence, data-driven biological knowledge discovery holds great promise. However, it is not straightforward because of the complexity of the domain knowledge hidden in the data. At any level, be it atoms, molecules, cells, or organisms, there are rich interdependencies among biological components. Machine learning approaches in this domain usually involve analyzing interdependency structures encoded in graphs and related formalisms. In this report, we review our work on developing new machine learning methods for these applications, with improved performance over state-of-the-art methods. We show how the networks among biological components can be used to predict properties.

Keywords

Structured learning / sparse modeling / systems biology / deep learning

Cite this article


Canh Hao Nguyen. Structured Learning in Biological Domain. Journal of Systems Science and Systems Engineering, 2020, 29(4): 440-453. DOI: 10.1007/s11518-020-5461-5

