Understanding traditional Chinese medicine via statistical learning of expert-specific Electronic Medical Records
Yang Yang, Qi Li, Zhaoyang Liu, Fang Ye, Ke Deng
Background: Traditional Chinese medicine (TCM) has recently attracted considerable attention from various disciplines. However, TCM remains mysterious because of its unique philosophy and theoretical thinking, and the lack of high-quality data makes a thorough understanding of TCM critically challenging. In this study, we introduce the Zhou Archive, a large-scale database of expert-specific Electronic Medical Records (EMRs) containing information about 73,000+ visits to one TCM doctor over more than 35 years. Covering the full spectrum of the diagnosis-treatment model behind TCM practice, the archive provides an opportunity to understand TCM from a data-driven perspective.
Methods: Through a series of text-processing steps, we transformed the semi-structured EMRs in the archive into a well-structured feature table. Based on this feature table, we carried out a series of statistical analyses to learn principles of TCM clinical practice from the archive, including correlation analysis, enrichment analysis, embedding analysis and association pattern discovery.
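As an illustration of the kind of enrichment analysis named above, the following minimal sketch tests whether a herb is over-represented among visits that show a given symptom. It is not the authors' procedure: the one-sided hypergeometric test, the set-per-visit representation of the feature table, the function names `hypergeom_sf` and `enrichment`, and the toy feature names are all assumptions made for this example.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment(visits, symptom, herb):
    """One-sided test: is `herb` over-represented among visits with `symptom`?"""
    N = len(visits)                                       # all visits
    K = sum(herb in v for v in visits)                    # visits prescribing the herb
    n = sum(symptom in v for v in visits)                 # visits showing the symptom
    k = sum(symptom in v and herb in v for v in visits)   # visits with both
    return k, hypergeom_sf(k, N, K, n)


# Toy feature table (hypothetical feature names): one set of features per visit.
visits = [
    {"cough", "herb_A"},
    {"cough", "herb_A"},
    {"cough", "herb_A"},
    {"cough", "herb_B"},
    {"fever", "herb_B"},
    {"fever", "herb_B"},
    {"fever", "herb_A"},
    {"fever", "herb_B"},
]

overlap, p_value = enrichment(visits, "cough", "herb_A")
print(f"overlap={overlap}, p-value={p_value:.4f}")
```

With thousands of features, a test like this would be run over many symptom-herb pairs, so some form of multiple-testing correction would be needed before any pair is interpreted.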
Results: The proposed data processing procedure yields a structured feature table of 14,000+ features, with a feature codebook, a term dictionary and a term-feature map as byproducts. Statistical analysis of the feature table reveals underlying principles of the diagnosis-treatment model of TCM, helping us better understand TCM practice from a data-driven perspective.
Conclusion: Expert-specific EMRs provide opportunities to understand TCM from a data-driven perspective. Taking advantage of recent progress in NLP for Chinese, we can process a large number of TCM EMRs efficiently and gain insights via statistical analysis.
Keywords: TCM / EMRs / data-driven perspective / Chinese text mining / statistical analysis