Introduction
Traditional Chinese medicine (TCM), with its particular theory and specific diagnosis and treatment methods, developed from long-term clinical practice [1, 2]. Compared with Western medicine, which is based on biological mechanisms and pathophysiological phenotypes, TCM usually focuses on functional clinical phenotype investigation and is considered an experience-based discipline. Under the premise of inheriting TCM properties and advantages [3, 4], the use of data analysis and machine learning to provide scientific explanation is an urgent concern in TCM research. Various studies have recently used data mining methods to extract medical knowledge from clinical data [5, 6]. Zhou et al. [7] built a TCM clinical data warehouse for medical discovery and clinical decision support. Some research methods, such as the latent tree model [8] and the multidimensional reduction method [9], have been proposed to detect specific TCM knowledge (e.g., latent symptom clusters for syndrome) from clinical data.
Herb prescription, which consists of multiple herb ingredients with complicated combination regularities, is one of the main therapeutic solutions for clinical management [10, 11]. Therefore, detecting herb regularities from clinical data is an important topic for TCM knowledge discovery.
Yan et al. [4] designed a clinical protocol to evaluate the core effective drug patterns by investigating the treatment information of prestigious Chinese medicine clinicians. By analyzing the frequency of use of single herbs and herbal formulas, Chen et al. [12] investigated the prescription patterns of Chinese herbal products for patients with sleep disorder and major depressive disorder. Cao [13] developed a method based on hierarchical clustering of herb-pair efficacies to determine common prescription patterns and discussed the common patterns that are mathematically coincident with blood–Qi theory. A parameter-free algorithm was proposed to detect herb–herb interactions [14] and to help understand the effectiveness of herb combinations. Chen et al. [15] developed a stacking multivariate linear regression technique to predict the bioactivity capacity of herbal medicines from their chromatographic fingerprints. Chen et al. [16] proposed a tripartite graph mining approach to detect symptom–herb relationships.
However, the complicated personalized manifestations of real-world patients make the detection of effective TCM treatments for specific disease conditions difficult. In this study, a multistage analysis method that integrates propensity case matching, complex network analysis, and herb set enrichment analysis (HSEA) was proposed to identify effective herb prescriptions for insomnia treatment (Fig. 1). First, propensity case matching [17] was applied to compare effective and ineffective cases and eliminate the effects of confounding factors. Then, core herb network extraction (CHNE) [18] and HSEA [19, 20] were combined to identify core effective herb prescriptions. The results of herb set enrichment and prescription effectiveness ratio comparison revealed that Gui Pi Wan, one of the most commonly used standardized prescriptions for insomnia [21], may have a positive effect on curing insomnia in these patients. A method based on mutual information [22] was used to identify strong herb–symptom relationships and, consequently, to investigate the indications of the discovered effective herb prescriptions.
Materials and methods
Clinical data set of insomnia
Data on the diagnoses and treatments of 955 insomnia cases were collected from real-world observational studies. Each case had one or more (up to four) clinical visits, for a total of 2049 visits. Each clinical record contains symptoms, a diagnosis (i.e., syndrome), and a herb prescription (536 distinct herbs in total). There are 842 effective cases and 113 ineffective cases in the data set (Table 1).
Propensity case matching
In this study, the demographic characteristics and symptoms of patients differed significantly, resulting in poor comparability between the effective and ineffective samples. To eliminate sample bias, the confounding factors in the two sample groups had to be properly balanced, so propensity case matching was used [17]. In a control experiment, this method effectively balances the distribution of observed covariates to control confounding bias. There were a large number of confounding factors (in addition to gender and age, the 955 cases contained 1128 distinct symptoms). As more confounding factors were included in the matching, fewer samples could be matched. Therefore, we matched as many control samples as possible while omitting as few confounding factors as possible.
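The matching step can be illustrated with a minimal sketch. The text does not state which matching algorithm was used, so the greedy 1:1 nearest-neighbour matching with a caliper shown here is an assumption, as are the function name `greedy_caliper_match`, the caliper value, and the example scores. The propensity scores are assumed to have been estimated beforehand (for example, by a logistic regression of effectiveness on gender, age, and the selected symptoms).

```python
def greedy_caliper_match(minority_scores, majority_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    minority_scores: propensity scores of the smaller group (e.g. ineffective cases)
    majority_scores: propensity scores of the larger group (e.g. effective cases)
    caliper: maximum allowed score difference for a pair (hypothetical value)
    Returns a list of (minority_index, majority_index) pairs.
    """
    available = dict(enumerate(majority_scores))  # majority cases not yet used
    pairs = []
    for i, score in enumerate(minority_scores):
        if not available:
            break
        # closest still-available majority case
        j = min(available, key=lambda k: abs(available[k] - score))
        if abs(available[j] - score) <= caliper:
            pairs.append((i, j))
            del available[j]  # each majority case is matched at most once
    return pairs


# Hypothetical scores, for illustration only
ineffective = [0.30, 0.62, 0.95]
effective = [0.10, 0.31, 0.60, 0.33]
print(greedy_caliper_match(ineffective, effective))
```

A tighter caliper gives better covariate balance but leaves more cases unmatched, which mirrors the trade-off between the number of confounders and the number of matched samples described above.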
Core herb network extraction
CHNE [18], which has been widely applied in prescription compatibility analysis in the TCM field, was used to extract the core herb compatibilities of clinical prescriptions. CHNE defined $n$ prescriptions containing $m$ herbs and built an undirected herb network of $m$ nodes, where a link between herbs $i$ and $j$ indicated that the two herbs appeared in the same prescription. The weight $w_{ij}$ of the link between herbs $i$ and $j$ represented their co-occurrence in the $n$ prescriptions, while $w_{ii}$ represented the occurrence of herb $i$ in all prescriptions. The power-law distribution of the weights was investigated to detect the highly frequent herb combination sub-networks.
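The network construction described above can be sketched as follows. This is an illustration under stated assumptions, not the CHNE implementation of [18]: the function names are hypothetical, and the core sub-network is selected here with a plain weight threshold, which stands in for the power-law-based selection mentioned in the text.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(prescriptions):
    """Return (herb_count, pair_weight) for a list of prescriptions.

    herb_count[h]      : number of prescriptions containing herb h
    pair_weight[(a, b)]: number of prescriptions containing both a and b (a < b)
    """
    herb_count = Counter()
    pair_weight = Counter()
    for prescription in prescriptions:
        herbs = sorted(set(prescription))  # ignore duplicates within a prescription
        herb_count.update(herbs)
        pair_weight.update(combinations(herbs, 2))
    return herb_count, pair_weight


def core_edges(pair_weight, min_weight):
    """Keep only links whose co-occurrence weight reaches min_weight (assumed rule)."""
    return {pair: w for pair, w in pair_weight.items() if w >= min_weight}


# Toy prescriptions with placeholder herb names
toy = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B", "D"]]
counts, weights = cooccurrence_network(toy)
print(core_edges(weights, min_weight=2))
```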
Herb set enrichment analysis
On the basis of gene set enrichment analysis [19, 20], we used HSEA to quantitatively determine the differences among prescription herbs in effectiveness score. HSEA focused on the overall effectiveness of a herb set rather than on any single herb. HSEA aimed to assess whether the herbs of a set were uniformly distributed throughout a ranked herb sequence or concentrated toward its top. This method included four steps.
(1) The $n$ prescriptions (herb sets) contained $N$ herbs, which were ranked to form a sequence $L$ by the correlation $r$, computed by t-test (the tested herb set is denoted $S$, with $N_S$ herbs).
(2) The enrichment score $ES$ of herb set $S$ was computed. $ES$ reflects the degree to which $S$ is over-represented at the top of the sequence $L$. Walking $L$ from head to tail, a hit term is added to a running sum if the herb belongs to $S$; if not, a miss term is subtracted. Here, a weighting term was used to correct $ES$, avoiding large $ES$ values when most herbs of $S$ are in the middle of the sequence $L$.
(3) The significance of an observed $ES$ was estimated through a permutation test. The original effectiveness labels were reassigned to the samples, the herbs were reordered, and $ES$ was computed again. This step was repeated 1000 times to obtain an $ES$ histogram. Finally, the P-value for $S$ was estimated.
(4) The enrichment statistic was computed for each herb set. Multiple hypothesis testing was used to correct the significance of each herb set. Then, the false discovery rate (FDR) was computed to control the ratio of false positives.
We obtained the enriched herb sets with small P-values. Finally, HSEA identified the herb prescriptions with significantly high effectiveness rates.
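Steps (2) and (3) can be sketched in code. The exact HSEA formula did not survive in the text, so this sketch follows the running-sum statistic of standard gene set enrichment analysis and should be read as an assumption: the exponent `p`, the normalisation of hits and misses, and all function names are illustrative. The permutation test here shuffles the herb order, a simplification swapped in for the sample-label permutation described in step (3).

```python
import random


def enrichment_score(ranked_herbs, scores, herb_set, p=1.0):
    """GSEA-style running sum; returns the largest deviation from zero."""
    hits = [h in herb_set for h in ranked_herbs]
    n_miss = len(ranked_herbs) - sum(hits)
    hit_norm = sum(abs(scores[h]) ** p for h, hit in zip(ranked_herbs, hits) if hit)
    if hit_norm == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for h, hit in zip(ranked_herbs, hits):
        # add a weighted hit term, or subtract a uniform miss term
        running += abs(scores[h]) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_herbs, scores, herb_set, n_perm=1000, seed=0):
    """One-sided P-value for positive enrichment, by shuffling the herb order."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_herbs, scores, herb_set)
    values = [scores[h] for h in ranked_herbs]  # score profile stays fixed
    exceed = 0
    for _ in range(n_perm):
        perm = list(ranked_herbs)
        rng.shuffle(perm)
        perm_scores = dict(zip(perm, values))  # herbs reassigned to rank positions
        if enrichment_score(perm, perm_scores, herb_set) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy ranking with placeholder herb names and scores
ranked = ["a", "b", "c", "d", "e", "f"]
toy_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": -1.0, "e": -2.0, "f": -3.0}
print(permutation_p_value(ranked, toy_scores, {"a", "b"}))
```

With `p = 0` the statistic reduces to an unweighted Kolmogorov–Smirnov-like running sum; a positive `p` gives more weight to herbs with larger correlation values.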
Detection of herb–symptom relationships
Symptoms constitute an important factor for patient diagnosis during clinical diagnosis and treatment in TCM. Therefore, detecting strong herb–symptom relationships is essential for clinical prescription studies. Based on the evaluation standard of information theory, conditional mutual information can well reflect the relevance of two variables under certain conditions [22]. The present study aimed to identify relevant herb–symptom associations regarding insomnia that could be regarded as supplements to effective prescriptions for individualized patients. In the correlation analysis of herb and symptom, effectiveness was set as a condition and added to the mutual information concerning herb–symptom relationships. Therefore, effectiveness-based mutual information (EMI) for a herb–symptom pair $(h, s)$ was defined and used to reflect the herb–symptom relationship under the effectiveness constraint, with $h$ and $s$ representing the herb and symptom, respectively, and $E$ and $\bar{E}$ representing the effective and ineffective samples, respectively. According to information theory, a large EMI value for a herb and a symptom represents a strong correlation between them. Therefore, strong herb–symptom relationships with positive effectiveness could be found.
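As an illustration of the idea, the sketch below computes the standard conditional mutual information between herb presence and symptom presence given effectiveness. The study's exact EMI formula is not recoverable from the text, so this is an assumed stand-in; the function name and the 0/1 record encoding are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(records):
    """I(H; S | E) in bits.

    records: iterable of (herb_present, symptom_present, effective) triples,
             each element 0 or 1, one triple per sample.
    """
    records = list(records)
    n = len(records)
    n_hse = Counter(records)
    n_he = Counter((h, e) for h, s, e in records)
    n_se = Counter((s, e) for h, s, e in records)
    n_e = Counter(e for h, s, e in records)
    cmi = 0.0
    for (h, s, e), c in n_hse.items():
        # p(h,s,e) * log2[ p(h,s|e) / (p(h|e) * p(s|e)) ]
        cmi += c / n * log2(c * n_e[e] / (n_he[(h, e)] * n_se[(s, e)]))
    return cmi


# Toy data: herb and symptom always co-occur within both outcome groups
dependent = [(1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0)]
print(conditional_mutual_information(dependent))
```

Only observed combinations enter the sum, so zero counts never reach the logarithm.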
Results
Propensity case matching
First, 955 cases were selected to match the control cases. Distinct results were obtained for case matching using different numbers of symptoms. Second, symptoms were filtered according to their frequency, and the numbers of matched cases were compared (Table 2). Table 2 shows that selecting fewer symptoms yielded more matched control samples. To ensure sufficient research cases, we selected the 10 symptoms (Table 3) whose frequencies were more than 170, which include the typical symptoms of insomnia, such as difficulty falling asleep, dreaminess, ease of waking up, and poor spiritual fitness. These symptoms, together with gender and age, were considered confounding factors. Finally, 107 effective and 107 ineffective samples were matched. The propensity scores of all samples and matched samples are shown in Fig. 2.
A total of 201 effective and 269 ineffective prescriptions were found in the 214 matched cases. The effective prescriptions contained 269 distinct herbs in total, while the ineffective prescriptions included 273 distinct herbs. Herb frequencies of the effective and ineffective samples are shown in Fig. 3A. More than half of the herbs had frequencies below 10 (70% and 66% of the herbs in effective and ineffective samples, respectively). Using a chi-square test, we calculated the P-values for the top 10 herbs in the effective and ineffective prescriptions. As shown in Table 4, the significant herbs in effective prescriptions were Poria with Hostwood (P = 0.033) and Caulis Polygoni Multiflori (P = 0.0006), which have a higher proportion in the effective group than in the ineffective group. From a clinical perspective, these two herbs are necessary drugs to cure insomnia. Semen Ziziphi Spinosae is used as a sedative drug with a good effect on Qi–Yin deficiency in insomnia. Polygala tenuifolia Willd and Rhizoma Acori Tatarinowii are commonly paired drugs that cure kidney–heart dysfunction in insomnia. The major efficacy of Radix Glycyrrhizae is to regulate herbal properties. There are 439 distinct symptoms in the 214 matched cases, and 88% of the symptoms have frequencies below 10 (Fig. 3B).
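The per-herb comparison can be sketched as a chi-square test on a 2×2 table (herb present or absent versus effective or ineffective prescriptions). This is a minimal sketch: the text does not say whether a continuity correction was applied, so none is used here, and the counts in the example are invented for illustration.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    a: effective prescriptions containing the herb
    b: effective prescriptions without the herb
    c: ineffective prescriptions containing the herb
    d: ineffective prescriptions without the herb
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not taken from Table 4
print(chi_square_2x2(30, 10, 10, 30))
```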
To investigate whether the clinical characteristics of these 214 samples differ from those of the whole data set, we calculated the frequencies of the related features of these samples. The results showed that 64.5% of the samples were female patients and a large proportion (69.2%) were patients aged between 30 and 60 years. To further find the features of these samples that differ from the whole data set (955 samples), we carried out a Bernoulli test to identify the significant features of the matched cases (Table 5) [23]. The frequencies of the demographic features in the matched samples showed no significant differences from those of the whole data set. However, the numbers of patients with some syndromes in the matched cases, such as blood-stasis syndrome (P = 3.68E–08), blood deficiency (P = 1.85E–07), kidney weakness (P = 4.12E–06), Qi deficiency (P = 6.24E–05), and restlessness (P = 7.11E–05), are significantly higher than those in the whole data set.
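One way to read the Bernoulli test is as an exact binomial tail probability: given the proportion of a feature in the whole data set, how likely is a count at least as large as the one seen in the matched sample? The sketch below follows that reading, which is an assumption; the function name and example numbers are hypothetical and do not reproduce Table 5.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: number of matched cases with the feature
    n: number of matched cases
    p: proportion of the feature in the whole data set
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 60 of 214 matched cases have a syndrome
# that occurs in 15% of the whole data set
print(binomial_upper_tail(60, 214, 0.15))
```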
Effective prescription detection
The core herb sub-network was extracted from the 201 effective prescriptions for 107 clinical cases. The core herb network contains 8 herbs and 19 herb combinations (Fig. 5A). Then, the similarities between the 201 effective prescriptions and the core herb network were computed, and 63 prescriptions sharing four or more herbs with the core herb network were obtained. To observe the herb effectiveness enrichment of these 63 prescriptions and the core herb network, HSEA was used to analyze their herb set enrichment (Table 6). The P-values for 32 herb sets were below 0.05, which implied significant effectiveness for these prescriptions. The enrichment result for the top-ranked herb set (P < 0.0001) was the best. This herb set was composed of Astragalus root, jujube, Codonopsis pilosula, Prepared licorice, Semen Ziziphi Spinosae, Radix Aucklandiae, Rhizoma Atractylodis Macrocephalae, Poria with Hostwood, Arillus Longan, Polygala tenuifolia Willd, ginger, and Radix Angelicae Sinensis, which is exactly the composition of Gui Pi Wan. We then found that several other enriched herb sets were also exactly Gui Pi Wan, while two further herb sets were Gui Pi Wan with herbs added and removed, respectively. Therefore, the enrichment results indicated that Gui Pi Wan had better effectiveness in curing insomnia in these patients compared with the other herb prescriptions.
The P value for the first core herb network was 0.0398 and was ranked 29th, suggesting that the core network extraction results were not good. Therefore, the core sub-network was extracted again from the 32 prescriptions. The second set of core herb compatibilities contained 13 compatibility pairs (Fig. 5B). In the two extracted core networks, Semen Ziziphi Spinosae and Radix Glycyrrhizae were compatible with more herbs than the other herbs among the high-frequency compatibilities. In addition, Radix Glycyrrhizae was used to regulate herbal properties. Meanwhile, as a common drug pair of Tian Wang Bu Xin Dan, which is a classic prescription for insomnia [24], Semen Ziziphi Spinosae and Poria Cocos were used together to enhance their heart-nourishing and mind-calming effects. The similarities between all prescriptions and the extracted herb sets were computed, and the prescriptions sharing more than 60% of their herbs with each of these herb sets were selected (Table 7). The effectiveness ratios differed between these herb sets (Table 7). Compared with the original prescriptions, the prescriptions selected for the best-performing herb set had a higher effectiveness ratio (76.9% vs. 42.8% in matched samples and 94.2% vs. 84.9% in all samples). Prescriptions with high herb overlap with this herb set were Gui Pi Wan and its modifications. Therefore, the result indicated that Gui Pi Wan might have a better effect on treating insomnia in these patients than the other herb prescriptions. This is reasonable, since syndromes such as blood deficiency, Qi deficiency, and spleen deficiency, which are the indications of Gui Pi Wan for insomnia, are the main diagnoses in the matched samples (Table 5).
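The overlap filter and the effectiveness ratio can be sketched as follows. The text does not define how the 60% overlap was measured, so this sketch assumes it is the fraction of the reference herb set contained in a prescription; the function names and toy records are hypothetical.

```python
def overlap_fraction(prescription, reference):
    """Fraction of the reference herbs that also appear in the prescription."""
    reference = set(reference)
    return len(set(prescription) & reference) / len(reference)


def effectiveness_ratio(records, reference, min_overlap=0.6):
    """Share of effective prescriptions among those overlapping the reference set.

    records: list of (herbs, effective) pairs, where effective is True or False
    Returns None when no prescription passes the overlap filter.
    """
    selected = [
        effective
        for herbs, effective in records
        if overlap_fraction(herbs, reference) > min_overlap
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Toy records with placeholder herb names
reference_set = {"A", "B", "C", "D", "E"}
toy_records = [
    (["A", "B", "C", "D"], True),        # overlap 0.8, kept
    (["A", "B", "C", "X"], True),        # overlap 0.6, not strictly above 0.6
    (["A", "B", "C", "D", "E"], False),  # overlap 1.0, kept
    (["X"], False),                      # overlap 0.0, dropped
]
print(effectiveness_ratio(toy_records, reference_set))
```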
Detection of effective herb–symptom relationships
Effectiveness-based conditional mutual information was used to detect strong herb–symptom relationships under effectiveness conditions. A total of 342 herbs and 439 symptoms existed in the 214 samples. A total of 84 420 EMI values of herb and symptom were obtained (Fig. 6), with 4684 EMI values (5.5%) above 0.01. These values were considered to indicate relatively strong herb–symptom relationships. The top 10 herb–symptom relationships with the highest EMIs are shown in Table 8, and they are mostly consistent with TCM empirical knowledge. For example, it is well recognized that Rhizoma Atractylodis Macrocephalae has a certain curative effect on spleen and stomach weakness and asthenia, while Codonopsis Pilosula can relieve abdominal distension. Remission relationships were likewise noted in Radix Glehniae–tooth mark on the tongue (rank: 3), Bombyx Batryticatus–nervous system (rank: 7), and Radix Paeoniae Rubra–mild menstruation (rank: 9).
Next, the relevant symptoms of Gui Pi Wan whose EMIs were greater than 0.01 were extracted from the herb–symptom relationships. Table 9 shows the top 20 symptoms when symptoms are sorted by the sum of EMI. The heart–spleen deficiency syndrome of insomnia included abdominal distension (rank: 3), weak pulse (rank: 4), asthenia (rank: 5), fatigued limbs (rank: 9), loose stools (rank: 13), poor appetite (rank: 15), and pale tongue (rank: 20). These strongly relevant symptoms of Gui Pi Wan are typical in insomnia and are useful for investigating the indications of the effective herb prescriptions.
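The ranking in this paragraph can be sketched as a filter-and-sum over herb–symptom EMI values. This is a minimal sketch under assumptions: the EMI values are taken as a precomputed dictionary keyed by (herb, symptom), and the function name and toy values are hypothetical.

```python
from collections import defaultdict


def rank_symptoms(emi, prescription_herbs, threshold=0.01, top_k=20):
    """Rank symptoms by the summed EMI over the herbs of one prescription.

    emi: dict mapping (herb, symptom) to an EMI value
    prescription_herbs: set of herbs in the prescription of interest
    Only pairs with EMI above the threshold contribute to the sum.
    """
    totals = defaultdict(float)
    for (herb, symptom), value in emi.items():
        if herb in prescription_herbs and value > threshold:
            totals[symptom] += value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy EMI values with placeholder herb and symptom names
toy_emi = {
    ("h1", "s1"): 0.05,
    ("h2", "s1"): 0.02,
    ("h1", "s2"): 0.03,
    ("h3", "s2"): 0.50,   # herb not in the prescription, ignored
    ("h2", "s3"): 0.005,  # below the threshold, ignored
}
print(rank_symptoms(toy_emi, {"h1", "h2"}))
```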
Discussion
Symptoms are key clinical manifestations for disease classification with underlying molecular mechanisms [25], and they are considered the sole clinical phenotypes for clinical diagnosis in TCM [26, 27]. Li et al. [28] proposed a network-based correlation analysis method to detect herb–symptom associations. Prescription is the basic treatment method in TCM clinical treatment. Therefore, identifying effective herb prescriptions for particular diseases and detecting herb–symptom relationships are important for TCM clinical practice. The present study proposed a multistage analysis method that integrates propensity case matching, complex network analysis, and HSEA to identify effective herb prescriptions for particular diseases. First, diagnosis and treatment data for 955 insomnia cases were collected from well-designed observational studies. Second, based on propensity case matching, 214 samples were matched with balanced confounding factors. Then, core network extraction and herb set enrichment were combined to identify core effective herb prescriptions. The positive effectiveness rate of the prescription Gui Pi Wan (76.9% vs. 42.8% for matched samples; 94.2% vs. 84.9% for all samples) indicated that it may have a better effect on curing insomnia in these patients compared with other herb prescriptions in our clinical data. Finally, effectiveness-based mutual information was used to detect effective herb–symptom relationships, in which strong herb–symptom correlations were regarded as indications for clinical herb modifications targeting individualized patients.
However, our study has several shortcomings that need to be addressed in future work. First, ineffective samples were too few (11.83%), resulting in only 214 matched samples. In the future, more clinical cases must be collected to conduct a comprehensive evaluation of the method. Second, the basic features of TCM are holism and treatment based on syndrome differentiation [29–31]. TCM syndrome is defined as a diagnostic classification of pathological changes, which is based on symptoms and signs [32]. Therefore, the approach to detect effective herb prescriptions should incorporate syndromes as additional components. Finally, the effectiveness of herb prescriptions clearly depends on the complicated interactions of their herb ingredients. However, in this study, complex relationships between herb ingredients were not considered. In addition, Li et al. [33] studied a systems pharmacology strategy for drug discovery and combination, which would be useful to investigate the molecular mechanisms of herb prescriptions. In the future, syndrome and herb–herb relationships will be considered to detect effective herb prescriptions from large-scale TCM clinical data.
Higher Education Press and Springer-Verlag Berlin Heidelberg