1 Introduction
Researchers in data science have recently emphasized the value of data and the significance of data mining. In particular, Nature and Science have published special issues on big data [1, 2]. These special issues demonstrate the relevance of big data to all scientific fields and frame the requirements of data processing from a new, higher-level perspective that ushers in the era of big data. Big data is likewise in demand in the field of traditional Chinese medicine (TCM). Specifically, big data is required by the clinical research-sharing systems established at the clinical research base of TCM in China [3], which was organized by the China Academy of Chinese Medical Sciences. Moreover, this kind of data is necessary for the computerization of hospital information systems and for Internet advancement [4].
Related theory and technical support should be provided to develop the national data center of TCM [5]. The real-world clinical research paradigm of TCM [6] has attracted much attention because it responds to current clinical research dilemmas and anticipates the development trends of TCM and of medical science as a whole. This paradigm was published in the journal Traditional Chinese Medicine.
The paradigm of real-world clinical scientific research is “data-oriented” and “combines medical practice and scientific computation”. Therefore, data computing is necessary in TCM. Big data collection, management, analysis, and demonstration have been applied successfully in the field of TCM, along with techniques and methods such as complexity [7], intelligence [8], data [9], and computing sciences. Thus, this study systematically illustrates the concept, connotation, value, research fields, and calculation methods of scientific computing in TCM based on these methods and as guided by the real-world clinical research paradigm.
2 Discipline of scientific computing in TCM
The real-world clinical research paradigm of TCM is human-centered, data-oriented, and problem-driven. It alternates medical practice with scientific computation and integrates clinical practice with scientific research [6]. This paradigm is derived from the typical model of TCM research and combines concepts, theories, and techniques from modern clinical epidemiology, evidence-based medicine, statistics, and complexity, intelligence, data, and information sciences. The “data-oriented” perspective is both the key technology and the premise of the real-world clinical research paradigm. Therefore, all kinds of real-world clinical diagnosis and treatment information must be collected comprehensively and extended to include ancient and modern literature, future clinical data, experimental biology data (such as genomics, proteomics, and metabolomics data), and Internet-derived data on human health in daily living.
The generalization and consolidation of these data open up a new prospect for TCM clinical research supported by big data. The “data-oriented” perspective is the inevitable direction of TCM clinical research, the point at which western medicine and TCM can be organically combined, the key to their technological complementarity, and the premise of the real-world clinical research paradigm of TCM.
This paradigm mainly advocates the “alternation of medical practice with scientific computation”. It is an essential and contemporary “clinic-to-clinic” approach. Scientific computation is a useful tool in the big data era. To a certain extent, it surpasses humans in memorizing and analyzing data from clinical medical practice, and it can discover laws and knowledge accurately and comprehensively. However, computation results must be verified in clinical practice; therefore, computation should alternate with medical practice.
The “data-oriented” premise and key technology, together with the main form of “the alternation of medical practice with scientific computation”, highlight the demand for data, information, intelligence, complexity, and computing sciences in real-world clinical scientific TCM research. To address problems of data acquisition, analysis, management, and validation, these disciplines must be adapted to the characteristics of TCM. Rather than applying the techniques simply and separately, a dedicated discipline is needed that combines two things: first, a clear understanding of the characteristics of TCM data, TCM theory, and the associated technical problems; and second, in-depth research in data, information, intelligence, and complexity science and technology. On this basis, applicable theory and technology can be developed according to the demands of data analysis in real-world clinical scientific research on TCM.
The scientific computation of TCM caters to these requirements and can proceed in two ways: the methods of complexity, intelligence, data, and information sciences can be applied to clinical research, or TCM concepts, theories, and knowledge can be incorporated into the related computing disciplines so that the resulting TCM-specific techniques in turn enhance the efficiency and level of clinical research. The former is limited to the technological aspect but is developing, whereas the latter is nascent and interdisciplinary.
Scientific computation is the basis for data mining analysis with respect to the collection and structured entry of symptoms in TCM according to the “disease-symptom-syndrome-prescription-effect” framework. Inspection symptoms are investigated through image processing and pattern recognition technology to acquire information such as the color of the tongue and the texture, color, and luster of the face [10, 11]. Signal processing, voice analysis, and pattern recognition technology are used to determine symptoms related to auscultation and olfaction, such as voice, coughs, breaths, and body odor [12, 13]. Machine learning techniques are used to optimize the inquiry scale [14]. Moreover, vibration signal analysis and pattern recognition techniques are applied to obtain pulse rate, rhythm, force, pattern, and type [15]. In data mining, feature selection and classification modeling can reduce the information on numerous clinical symptoms to an optimal subset. The transition from symptoms to syndrome can then be modeled and simulated [16]. Complex network and association rule methods can be applied in core prescription mining and in the analysis of the incorporation or removal of medicines [17]. Furthermore, drug effects can be used to detect significant interactions in TCM prescriptions for patients [18]. These approaches apply current intelligent data processing technology to the analysis of Chinese medicine information.
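The association-rule step mentioned above can be illustrated with a minimal Python sketch. It computes support and confidence over a handful of prescriptions and keeps the herb pairs that meet a support threshold. The herb names, the four prescriptions, and the helper names (`support`, `confidence`, `frequent_pairs`) are hypothetical and chosen only for illustration; they are not taken from the cited studies, which may use different algorithms and thresholds.

```python
from itertools import combinations

# Hypothetical prescriptions: each one is a set of herb names (illustrative only).
prescriptions = [
    {"ginseng", "licorice", "atractylodes"},
    {"ginseng", "licorice", "poria"},
    {"ginseng", "licorice"},
    {"poria", "atractylodes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0


def frequent_pairs(transactions, min_support):
    """Herb pairs whose support is at least min_support."""
    herbs = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(herbs, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


pairs = frequent_pairs(prescriptions, min_support=0.5)
rule_conf = confidence({"ginseng"}, {"licorice"}, prescriptions)
print(pairs, rule_conf)
```

In this toy data only the ginseng and licorice pair passes the threshold, which is the sense in which such a pair could be called a candidate "core" combination; real prescription mining would need far larger data and clinical review of the resulting rules.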
Studies that combine information technology with TCM concepts and data characteristics are still scarce. One example applies Yin-Yang and the Five Elements theory to machine learning in the form of the Bayesian Yin-Yang Intelligence System [19]. Another proposes multi-label learning for the diagnosis of mixed TCM syndromes because it is highly accurate [20].
The scientific computation of TCM emphasizes the computational side of the real-world clinical research paradigm of TCM and is related to TCM informatics and TCM engineering in clinical research. TCM informatics is an emergent discipline that grew out of studies on Chinese medicine and information science and is based on the movement laws of dynamic phenomena. It considers overall and dynamic criteria and uses computer and network technology. Moreover, it examines information phenomena and information laws in the TCM field in order to exhibit, manage, analyze, simulate, and disseminate TCM information. In the process, data on essential internal relations are determined, converted, and shared [21]. The scientific computation of TCM overlaps with TCM informatics, with the following emphases. First, as the object of computation, data are conceptually broader than information; TCM computation therefore focuses on clinical data. Second, TCM computation is directed by the theories of complexity, intelligence, data, and computing sciences. It concentrates on the collection, analysis, and mining of clinical data to detect and establish rules, as well as a system of individualized clinical diagnosis and treatment.
TCM engineering employs the theories, methods, and techniques of modern natural and engineering sciences synthetically under the guidance of TCM theory. Its scope covers theoretical systems, experimental research, clinical care, education, scientific research, and production and management decision-making, and its approach is interdisciplinary, multi-method, multi-approach, multi-tool, and multi-perspective (macro and micro). To promote TCM modernization, industrialization, and internationalization, it addresses problems of theory, technique, and practice in the construction of versatile technology platforms [22]. Furthermore, it contributes to life science and human health. TCM engineering focuses more on the engineering perspective, whereas the scientific computation of TCM analyzes and mines the theories and techniques of TCM clinical diagnosis and treatment systems from the viewpoint of complexity and data sciences.
Large amounts of data have been generated in various scientific fields, creating an urgent demand for scientific computation in the big data age. Building on the system and data characteristics of biological and social studies, biological computation and social computation were established successively [23]; the theories and techniques of data science are thereby integrated into the analysis of biological and social data, and knowledge of the mechanisms of biological and social systems improves the efficiency and the results of the computations. The scientific computation of TCM can thus thrive by applying the commonly advantageous technology produced by the advancement of these disciplines. In the process, clinical research on TCM develops further.
3 Theoretical framework of the scientific computation of TCM
The scientific computation of TCM involves humans, Chinese medicine, and related medical knowledge, as do biological, social, and complex systems. Consequently, given comparatively limited resources, the behavior of the TCM system as a whole essentially cannot be determined through the independent analysis of its individual parts. Thus, this behavior should be predicted over a wide range of time or space. As a result, the scientific computation of TCM must be examined on the basis of holism rather than reductionism. The main characteristics of TCM computation are comprehensive big data samples and correlations. Notably: (1) a holistic view of TCM is important; and (2) a unique optimal solution, or even an optimal solution, generally does not exist. Therefore, we should accept any effective solution. Overall, we aim to obtain effective scientific computation solutions for TCM using digital human body systems, objective collection, expert systems, knowledge engineering, parallel systems, complex system science, and data mining, guided by the principles of “clinic-to-clinic” and “continuous exploration and improvement” and with reference to social computing theory and technology [24]. These aims are illustrated in the theoretical framework of the scientific computation of TCM in Fig. 1.
3.1 Expert and parallel systems and knowledge engineering
The TCM system is extremely complex. Moreover, the proposed real-world clinical research paradigm is “human-centered” and involves patients and doctors, who are both independent and interactive. Hence, system behavior cannot be effectively described by conventional methods and models alone. In this respect, the relationship between doctor and patient is modeled by digital human body modeling, objective collection, expert systems, and knowledge engineering, both individually and in combination.
The digital human body system determines substance composition and generates the mechanical, mathematical, and information models of the human body system [25]. The objective collection system simulates the acquisition of symptoms from the four TCM diagnostic methods using image, sound, smell, and pressure sensors [26]. On this basis, knowledge rules, reasoning, and artificial neural networks can be used to build expert-assisted diagnosis and treatment systems for medical diseases, chronic hepatitis, and sub-health [27] and to establish TCM knowledge engineering research and systems [28].
The parallel TCM system consists of an expert system, knowledge engineering, and a practical system [24]. Clinical scientific research can be managed through the collaboration and comparison of artificial and practical diagnoses with treatment behavior in the parallel system. The parallel system of TCM computation fundamentally compares and analyzes practical and artificial systems based on their connection to provide a “reference” and to “predict” their future status. Accordingly, this parallel system adjusts the control approach to obtain effective solutions to complex problems or to solve issues regarding the implementation of learning and training objectives.
3.2 Analysis and mining of big data
At present, most scientific computation in TCM relies on passive observation and statistics, because research objects are difficult to experiment on actively and repeatedly. Trial results and conclusions often generalize poorly because controlled and randomized trials are affected by uncontrollable and unobservable factors. Thus, analytic reasoning methods alone cannot solve TCM computation problems. Instead, data mining, machine learning, and pattern recognition are important methods in the analysis and mining of big data.
As mentioned previously, acquiring rules and knowledge through expert systems, knowledge engineering, and traditional expert inquiry is largely infeasible. Therefore, the study of feature selection, classification, clustering, rule extraction, and complex network technology is particularly significant for clinical tasks such as core optimization and syndrome-effect analysis.
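As one illustration of the feature selection task named above, the following Python sketch ranks binary symptom indicators by information gain with respect to a syndrome label. Information gain is only one of many possible selection criteria and is used here as an assumption; the symptom names, syndrome labels, and the four records are hypothetical toy data.

```python
from collections import Counter
from math import log2

# Hypothetical records: binary symptom indicators plus one syndrome label each.
records = [
    ({"fatigue": 1, "pale_tongue": 1, "thirst": 0}, "qi_deficiency"),
    ({"fatigue": 1, "pale_tongue": 1, "thirst": 1}, "qi_deficiency"),
    ({"fatigue": 0, "pale_tongue": 0, "thirst": 1}, "yin_deficiency"),
    ({"fatigue": 0, "pale_tongue": 0, "thirst": 0}, "yin_deficiency"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    labels = [y for _, y in records]
    gain = entropy(labels)
    for value in {x[feature] for x, _ in records}:
        subset = [y for x, y in records if x[feature] == value]
        gain -= len(subset) / len(records) * entropy(subset)
    return gain


def rank_features(records):
    """Feature names sorted from most to least informative."""
    features = sorted(records[0][0])
    return sorted(features, key=lambda f: information_gain(records, f), reverse=True)


ranking = rank_features(records)
print(ranking)
```

In the toy data, fatigue and pale tongue separate the two syndromes perfectly while thirst carries no information, so thirst is ranked last and would be the first candidate for removal from the symptom subset.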
TCM computation is a discipline directed by the real-world TCM clinical research paradigm. It is based on complexity, intelligence, data, and computing sciences. Within the theoretical framework described above, TCM computation methods draw on the different advantages of various subjects, including:
(1) Mathematical statistics is the most fundamental computation method and was vital in previous TCM studies. The bionic optimization method optimizes techniques based on the biological transportation mechanism, which can accomplish tasks such as symptom, core, and path optimization.
(2) Data mining techniques effectively derive underlying knowledge from big data. The feature extraction and selection method extracts the essential features from these data and can uncover the knowledge and rules behind them. This method can be used to explore core symptoms as well. The classification modeling method is not only used to simulate clinical concepts, but it also determines the diagnosis experience and knowledge of doctors. The association rules method can be used to determine the relationships among symptoms, syndromes, and prescription drugs in mining and in diagnosis. The disease-symptom-syndrome-prescription-effect relationship is most relevant.
(3) TCM is a complicated system; therefore, the complex network method can be utilized to study the complex disease-symptom-syndrome-prescription-effect relationship. This relationship can be either direct or nonlinear. The system dynamics method can simulate the onset of diseases to study the occurrence and development of different diseases.
(4) A system that simulates the diagnosis process can be constructed using expert systems and knowledge engineering methods to determine medical treatment mechanisms.
(5) The integration method incorporates the methods described above to individualize TCM diagnosis and treatment.
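The complex network method in item (3) can be sketched, under simplifying assumptions, as a herb co-occurrence graph in which the weight of an edge counts the prescriptions that contain both herbs. The weighted degree of a node is then a crude indicator of how central a herb is. The prescriptions, herb names, and function names below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical prescriptions (illustrative herb names only).
prescriptions = [
    ["ginseng", "licorice", "poria"],
    ["ginseng", "licorice"],
    ["licorice", "cinnamon"],
]


def cooccurrence_network(prescriptions):
    """Undirected weighted graph stored as a dict of dicts.

    The weight of edge (a, b) is the number of prescriptions
    in which herbs a and b appear together.
    """
    graph = defaultdict(dict)
    for herbs in prescriptions:
        for a, b in combinations(sorted(set(herbs)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def weighted_degree(graph):
    """Sum of edge weights attached to each node."""
    return {node: sum(neighbors.values()) for node, neighbors in graph.items()}


graph = cooccurrence_network(prescriptions)
degree = weighted_degree(graph)
core = max(degree, key=degree.get)
print(degree, core)
```

Weighted degree is the simplest centrality measure; a fuller analysis might also examine community structure or other centralities, and the same graph representation could be extended with symptom and syndrome nodes.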
4 Application fields of the scientific computation of TCM
The scientific computation of TCM combines scientific computation with the field of Chinese medicine following biological and social computation. TCM is studied and applied by extracting quintessential information from complex and redundant data on the basis of Chinese medicine and complexity, intelligence, data and computing sciences. Therefore, the interaction between internal and external human factors and the relationships among etiology, pathogenesis, disease location and complex states must be explored through the treatment perspective. Some of the aspects that require investigation are as follows.
4.1 Medical equipment for symptom determination
With the aid of wearable computing devices, information can be acquired along multiple dimensions, including time, location, environment, and physiological and motion signals. Big data regarding the human body can be collected through the continuous and long-term monitoring of these multi-dimensional signals. This health-related information is valuable and has a promising market outlook. TCM diagnostic instruments address the four basic diagnostic methods: inspection, auscultation, inquiry, and palpation [10, 11].
4.2 Structured knowledge system for electronic health recording
Clinical data are vital firsthand information in TCM clinical research. Highly experienced TCM practitioners incorporate much empirical knowledge into clinical data to generate remarkable therapeutic effects [3, 29].
4.3 Analysis of interactions among symptoms
The relationships between symptoms must be determined to explore the subset of core symptoms. Most of the existing Chinese medical research based on machine learning does not consider the correlation between medical connotations and the symptoms described by the data. However, much of the TCM data on symptoms and syndromes are clearly defined medically. Thus, the interactions between symptoms and syndromes must be studied along with the TCM concepts behind these associations [14, 30].
4.4 Analysis of the correlation between symptoms and syndromes (disease)
Correlations can be analyzed in diagnosis modeling. In practical TCM data mining, a clinical case may present one or more of various syndromes. This task can therefore be regarded as a multi-label classification problem in machine learning. However, existing solutions to multi-label classification ignore the problems of unbalanced data and label inconsistency [16, 20].
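The multi-label formulation and the imbalance problem can be made concrete with a small Python sketch. It encodes each case as a binary indicator vector over syndromes and reports, for each syndrome, how many times rarer it is than the most frequent one. The cases, syndrome names, and the particular imbalance ratio used here are illustrative assumptions, not the measures used in the cited work.

```python
# Hypothetical cases: each clinical case carries one or more syndrome labels.
cases = [
    {"qi_deficiency"},
    {"qi_deficiency", "blood_stasis"},
    {"qi_deficiency"},
    {"yin_deficiency", "qi_deficiency"},
]


def binarize(cases):
    """Return the sorted label list and a 0/1 indicator matrix (one row per case)."""
    labels = sorted(set().union(*cases))
    matrix = [[int(label in case) for label in labels] for case in cases]
    return labels, matrix


def label_counts(labels, matrix):
    """Number of cases that carry each label."""
    return {label: sum(row[j] for row in matrix) for j, label in enumerate(labels)}


def imbalance_ratios(counts):
    """Count of the most frequent label divided by the count of each label."""
    top = max(counts.values())
    return {label: top / n for label, n in counts.items()}


labels, matrix = binarize(cases)
counts = label_counts(labels, matrix)
ratios = imbalance_ratios(counts)
print(labels, matrix, ratios)
```

A ratio well above 1 flags a syndrome with few positive cases, for which a classifier trained without reweighting or resampling is likely to perform poorly.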
4.5 Analysis of the core prescription and the incorporation and removal of patent medicine
Effective cures to diseases must be developed, and the incorporation and elimination of these cures must be examined as well. In line with these objectives, valuable information must be extracted from both historical and recent literature [31].
4.6 Analysis of the relationship between prescription and efficacy
The matching of prescription and efficacy assessments should be analyzed. The clinical efficacy of TCM has been confirmed, and the utility of different prescriptions is a hot topic in the progress of TCM research. Lu et al. examined nearly 6000 real cases of a pandemic and determined the effects of prescriptions from hospitals. TCM treated the fever during the pandemic more effectively than western medicine. Moreover, combinations of TCM and western medicine were rarely curative [32].
4.7 Analysis of data obtained from biomedical instruments
The Mars500 study is a psychological and physiological isolation experiment conducted by Russia, the European Space Agency, and China in preparation for a manned spaceflight to Mars at an unspecified future date. This study yields valuable psychological and medical data on the effects of the planned long-term mission in deep space. The experiment investigated the technical challenges, the work capability of the crew, and the management of long-distance spaceflight. The main concerns during a Martian flight are health problems, isolation, and the hermetically closed and confined environment. Li et al. described the regular pattern of syndromes using a statistical method and presented machine learning methods to mine the relationship between computerized symptoms and syndromes differentiated by experts. Using feature selection, they screened out 10 key factors that are essential to syndrome differentiation in TCM. The average precision of the multi-label classification model reached 80% in this study [33].
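For readers unfamiliar with the metric, the following Python sketch computes one common example-based definition of average precision for multi-label ranking: for each case, the labels are ranked by predicted score, and the precision at the rank of every true label is averaged. Whether the cited study used exactly this definition is not stated in the text, so the formula, the scores, and the label names below are assumptions for illustration only.

```python
def average_precision(scores, relevant):
    """Example-based average precision for one case.

    scores:   dict mapping each label to a predicted score
    relevant: non-empty set of true labels, all present in scores
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this true label
    return total / len(relevant)


def mean_average_precision(all_scores, all_relevant):
    """Mean of the per-case average precision values."""
    values = [average_precision(s, r) for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values)


# Hypothetical predictions for two cases.
all_scores = [
    {"qi_deficiency": 0.9, "blood_stasis": 0.2, "yin_deficiency": 0.6},
    {"qi_deficiency": 0.1, "blood_stasis": 0.8, "yin_deficiency": 0.3},
]
all_relevant = [
    {"qi_deficiency", "blood_stasis"},
    {"blood_stasis"},
]

first = average_precision(all_scores[0], all_relevant[0])
overall = mean_average_precision(all_scores, all_relevant)
print(first, overall)
```

In the first case one true syndrome is ranked first and the other third, giving (1/1 + 2/3) / 2; the second case ranks its single true syndrome first and scores 1.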
4.8 Application of bioinformatics technology in TCM
Lu et al. used metabolomics technology to determine the curative effect of the medicine etretinate by comparing healthy subjects with patients with psoriasis over the treatment cycles. They noted that this technology can be used to analyze the effect of Chinese medicine [34].
5 Conclusions and perspectives
This study comprehensively illustrated the key technology of “the real-world clinical research paradigm of TCM” in consideration of big data characteristics. This paradigm is “data-oriented” and is mainly presented as “the alternation of medical practice with scientific computation.” Furthermore, this study proposes research directions for the scientific computation of TCM. These directions strengthen the research on TCM problems and promote cooperation between TCM clinical researchers and information experts. The main content of current research and the methods at different levels follow these directions.
The scientific computation of TCM can be distinguished from other scientific computations by its characteristics. Thus, we establish a new branch of scientific computation for TCM called TCM computation or Chinese medical computation. The study of TCM computation already has a certain foundation; however, the development of the real-world clinical research paradigm must be taken into account. Scientific computation is used to develop solutions and to advance in-depth studies on clinical research processes. Thus, computation methods should be developed in line with the characteristics of Chinese medical data. Big data accumulate unexpectedly, and deep learning is a popular technique that utilizes these data. Furthermore, TCM data analysis has special requirements; therefore, novel algorithms should be developed or existing deep learning algorithms should be revised for application to TCM data. This research direction is expected to facilitate the implementation and advancement of the real-world clinical research paradigm, to enhance TCM, and to contribute to the development of the medical field.
Higher Education Press and Springer-Verlag Berlin Heidelberg