1 Introduction
Hypertrophic cardiomyopathy (HCM) is the most common inherited heart disorder, affecting 1 in 200–500 adults worldwide [1, 2]. It is characterized by left ventricular (LV) hypertrophy, which is measured via echocardiography or other imaging techniques. HCM is believed to be the most common cause of sudden cardiac death among adolescents and young adults. However, the majority of patients with HCM live normal or near-normal lives and typically remain clinically silent [3, 4]. These marked differences in clinical manifestation and prognosis among individuals with HCM have prompted researchers to investigate the underlying mechanisms.
Several hypotheses have been proposed to explain this extreme phenotypic variability. One such hypothesis describes a pattern of disease progression for HCM that culminates in the end-stage or “burnout” phase [5]; this process is continuous from onset to end stage and is accompanied by adverse cardiac remodeling. Patients in the early phases are frequently asymptomatic, and the hypertrophic phenotype is generally absent. As the disease evolves, advanced functional deterioration occurs in the left ventricle; this condition is defined as either a hypokinetic-dilated phase or a restrictive phenotype. However, previous cohort studies have suggested that only a small proportion of patients with HCM progress to the end stage [6, 7]. Why some patients undergo remodeling and progression while others do not has not yet been elucidated. The highly diverse clinical presentation of HCM prompted us to speculate that the disease may have distinct subtypes.
Recent studies have revealed that patients with HCM caused by different mutations exhibit differences in symptom severity and prognosis [8, 9]. Genes or modifiable risk factors reportedly influence the phenotypic severity of HCM [10]. However, modifier genes and their variants remain largely unknown. Moreover, the current approach is insufficient for quantitatively estimating risk for HCM progression, and such estimation is highly desired in clinical practice.
In the current study, a consensus clustering approach was applied to identify the clinical subtypes of HCM on the basis of echocardiography data. Interestingly, two major clinical subtypes were identified, delineating the diversity of clinical outcomes in HCM. By using machine learning methods, we identified subtype-associated genes that could effectively distinguish HCM subtypes, providing further insights into the genetic context, clinical prognosis, and potential interventions. Collectively, these findings may help bridge knowledge gaps among phenotypes, genotypes, and prognoses.
2 Materials and methods
2.1 Study population
The study cohort comprised 793 patients with sporadic HCM recruited from Tongji Hospital, Wuhan, China, between 2007 and 2019. HCM was diagnosed on the basis of a maximal end-diastolic LV wall thickness ≥15 mm on echocardiography or cardiac magnetic resonance imaging, in the absence of abnormal loading conditions or other cardiac or systemic diseases capable of producing the magnitude of hypertrophy, e.g., congenital heart diseases, aortic stenosis, uncontrolled hypertension, or phenocopy conditions. More limited hypertrophy (13–14 mm) was considered diagnostic in patients with a family history of HCM [11, 12]. Peripheral blood samples were obtained from all the participants upon enrollment. All clinical variables, particularly echocardiography characteristics, 24 h Holter recordings, and natural history, were collected from patients’ medical records, with data collection blinded to patient genotype at the start of the study.
2.2 Clinical subtype identification
The R package ConsensusClusterPlus v1.48.0 was employed to identify clinical subtypes on the basis of echocardiography features [13]. This package provides quantitative stability evidence for determining cluster count and membership in an unsupervised analysis. In particular, the consensus clustering method repeatedly subsamples from a set of items and clusters each subsample into a specified number of clusters. Then, pairwise consensus values, i.e., the proportion of times in which two items occupied the same cluster out of the number of times they occurred in the same subsample, are calculated and stored in a symmetric consensus matrix for each cluster count. Figure S1 shows all echocardiography variables that are commonly used to evaluate cardiac structure and function. Among them, three variables with more than 20% missing data were excluded from further analysis. The seven remaining variables, namely, the thickness of the interventricular septum (IVS) and left ventricular posterior wall (LVPW), left ventricular end-diastolic diameter (LVEDD), left atrial diameter (LAD), left ventricular ejection fraction (LVEF), and the E/A (mitral inflow velocity curves) and septal E/Eʹ (annular tissue Doppler signals) ratios, were used for the clustering analysis. Agglomerative hierarchical clustering was performed with a subsampling ratio of 0.8 for 1000 iterations. Consensus matrices, subtype-consensus plots, and item-consensus plots were used to determine the optimal number of subtypes.
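The consensus values described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ConsensusClusterPlus implementation: the Euclidean distance, the average linkage, the function names, and the toy two-dimensional points are all assumptions made for the example, whereas the study used seven echocardiography variables and 1000 iterations.

```python
import random
from itertools import combinations

def average_linkage(points, k):
    """Agglomerative clustering (Euclidean distance, average linkage; both assumed)."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    # Merge the two closest clusters until only k clusters remain.
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

def consensus_matrix(data, k, n_iter=50, ratio=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of items shared a cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(round(ratio * n)))
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)
        labels = average_linkage([data[i] for i in idx], k)
        for a in range(size):
            for b in range(size):
                sampled[idx[a]][idx[b]] += 1
                if labels[a] == labels[b]:
                    together[idx[a]][idx[b]] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Toy data: two well-separated groups of three items each (hypothetical values).
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
M = consensus_matrix(toy, k=2)
```

For a stable partition, entries of the matrix sit close to 0 or 1; values in between indicate item pairs whose co-membership depends on the subsample, which is the signal used to compare candidate cluster counts.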
2.3 Simplification of clustering by creating a decision tree
The overrepresentation or underrepresentation of a variable in each subtype was calculated via v-test with the catdes function of the R package FactoMineR v2.0 on the basis of hypergeometric distribution. The contribution of each echocardiography variable to subtype clustering was measured through permutation accuracy importance by using random forest. To construct a simple decision tree model that can discriminate patient subtypes, we used the ctree function of the party v1.3-5 package, a conditional inference framework that estimates a regression relationship via binary recursive partitioning. In particular, patient subtypes determined via the aforementioned consensus clustering were used as input for decision tree modeling to identify the key parameters necessary for distinguishing among patient subtypes, and a classifier that could be applied to external cohorts was created.
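Permutation accuracy importance can be illustrated with a short sketch. It is a simplification under stated assumptions: the `predict` callable stands in for a fitted classifier (the study used a random forest, which typically evaluates this on out-of-bag samples), and the function name, feature layout, and toy rows are hypothetical.

```python
import random

def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and the label
        shuffled = [row[:feature] + [v] + row[feature + 1:]
                    for row, v in zip(X, column)]
        drops.append(baseline - accuracy(shuffled))
    return sum(drops) / n_repeats

# Toy rows: feature 0 determines the label, feature 1 is noise (hypothetical values).
X = [[0, 5], [0, 1], [1, 3], [1, 7], [0, 2], [1, 4]]
y = [0, 0, 1, 1, 0, 1]
predict = lambda row: row[0]  # stand-in for a fitted classifier

importance_signal = permutation_importance(predict, X, y, feature=0)
importance_noise = permutation_importance(predict, X, y, feature=1)
```

A variable that the classifier relies on loses accuracy when shuffled, whereas shuffling an unused variable leaves accuracy unchanged.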
2.4 Follow-up and clinical outcomes
Follow-up with the recruited patients was conducted by March 2019 through face-to-face interviews and/or telephone conversations. The primary end point was death due to cardiovascular diseases, including heart failure-related and sudden deaths. Other clinical outcomes included all-cause death, heart transplant, nonfatal stroke, and progression to New York Heart Association (NYHA) class III/IV.
2.5 Whole-exome sequencing (WES) for all the patients
DNA was extracted from whole blood, and WES was performed on an Illumina platform. The details are described in the Supplementary Methods (Table S1).
2.6 Role of HCM-associated genes in subtype classification
To compare the proportion of patients that carried mutations in known HCM-associated genes (Table S2) between subtypes [14], we identified putative causal mutations in these genes, and the recommendations of the American College of Medical Genetics and Genomics (ACMG) were adopted to determine the pathogenicity of each variant [15]. In particular, only rare nonsynonymous or truncating variants (nonsense, frameshift, and splice sites) in HCM-associated genes, with MAF ≤0.1% in the East Asian population from public databases, and labeled deleterious in functional prediction methods, were subjected to ACMG evaluation. For the TTN gene, only truncating variants were retained. Subsequently, we compared the proportions of patients that carried mutations in each of these genes between different subtypes to determine whether mutations in some of these genes were linked to subtype classification. Moreover, we compared the prognoses of patients that carried mutations in these genes with those that did not carry mutations to evaluate the role of HCM-associated genes in predicting the outcomes of patients. The overall effect of these mutations in predicting subtype classification and survival was assessed via the area under the receiver operating characteristic (ROC) curve (AUC), wherein the status of carrying mutations for each patient was regarded as a predictor.
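When the predictor is a binary carrier status, the AUC reduces to the average of sensitivity and specificity. The following minimal sketch computes the AUC through pairwise comparison of positives and negatives; the function name and the toy carrier and subtype vectors are hypothetical and are not study data.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 = carries a mutation, 1 = belongs to subtype 1.
carrier = [1, 1, 0, 0, 1, 0, 0, 0]
subtype1 = [1, 0, 1, 0, 1, 0, 0, 1]

value = auc(carrier, subtype1)  # sensitivity 0.50, specificity 0.75 -> 0.625
```

An AUC near 0.5 therefore means that carrier status separates the two groups barely better than chance.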
2.7 Novel subtype-specific gene identification based on rare variants
We then estimated the effects of rare variants on subtype classification at a whole-exome scale, not merely in known HCM-associated genes. A total of 136 654 nonsynonymous and truncating variants with MAF < 1% in both our population and East Asian populations from public databases (1000 Genomes Project, Exome Aggregation Consortium, and Genome Aggregation Database) were subjected to subsequent analyses. After gene-based annotation, multiple in-silico computational methods were employed for the functional prediction of variants.
To quantify the mutation burden for each gene, we first assessed the pathogenicity of the included variants by using the in-silico computational methods with the best performance in functional prediction, namely REVEL, VEST3, MetaLR, and M-CAP [16]. For each variant, the average score calculated among the four algorithms was considered the combined prediction score. The variant-level prediction scores across the entire gene were accumulated as an overall mutation burden for this gene. Accordingly, a score matrix of gene numbers × sample numbers was generated, where the mutational profile for each sample was represented by 17 033 gene burden scores.
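The construction of the burden matrix can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the data layout, the function names, the gene and variant identifiers, and the scores are hypothetical, and the sketch counts each carried variant once because the text does not state whether zygosity was weighted.

```python
from collections import defaultdict

def combined_score(scores):
    """Average of the per-algorithm pathogenicity scores of one variant."""
    return sum(scores) / len(scores)

def burden_matrix(variants, carriers):
    """Sum combined scores of each sample's variants per gene.

    variants: {variant_id: (gene, [REVEL, VEST3, MetaLR, M-CAP])}
    carriers: {sample_id: [variant_id, ...]}
    returns:  {sample_id: {gene: burden}}; genes without variants are omitted (burden 0).
    """
    matrix = {}
    for sample, variant_ids in carriers.items():
        row = defaultdict(float)
        for vid in variant_ids:
            gene, scores = variants[vid]
            row[gene] += combined_score(scores)
        matrix[sample] = dict(row)
    return matrix

# Hypothetical toy input (not study data).
variants = {
    "v1": ("GENE_A", [0.8, 0.6, 0.7, 0.9]),  # combined score 0.75
    "v2": ("GENE_A", [0.2, 0.4, 0.2, 0.0]),  # combined score 0.20
    "v3": ("GENE_B", [0.5, 0.5, 0.5, 0.5]),  # combined score 0.50
}
carriers = {"P1": ["v1", "v2"], "P2": ["v3"], "P3": []}

burden = burden_matrix(variants, carriers)
```

Here patient P1 accumulates a burden of 0.95 for GENE_A from two variants, while P3, who carries no qualifying variant, has a burden of zero for every gene.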
To explore the differences in genetic basis between subtypes, we attempted to model the additive effect of gene mutation burden on HCM subtype propensity. Considering that the number of genes was, relatively, considerably larger than the number of samples, which would lead to an overfitting problem and generate models with poor generalization capability, we introduced the L1-norm to penalize the weight of the model parameters; that is, we aimed to find the best compromise between model complexity and empirical risk and identify a minimum number of feature genes to best explain the observations. Consequently, we adopted a logistic regression model with L1 regularization, which can force coefficient values to be 0, generating a sparse solution to selecting the leading genes of each subtype.
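For reference, one common way to write an L1-regularized logistic regression is shown below. The symbols are illustrative assumptions rather than the authors' notation, and the parameterization with a constant C multiplying the data term (so that C acts as the inverse of the regularization strength) is assumed to match the C value reported in the Results.

```latex
% Symbols are illustrative assumptions, not the authors' notation.
% x_i \in \mathbb{R}^{p}: gene burden scores of patient i (p genes)
% y_i \in \{0, 1\}: 1 if patient i belongs to subtype 1, 0 otherwise
% w \in \mathbb{R}^{p}: gene weights, b: intercept, n: number of patients
\[
  \hat{p}_i = \frac{1}{1 + \exp\!\left(-(w^{\top} x_i + b)\right)}
\]
\[
  \min_{w,\, b} \; \lVert w \rVert_1
  + C \sum_{i=1}^{n} \left[ -y_i \log \hat{p}_i - (1 - y_i) \log\!\left(1 - \hat{p}_i\right) \right],
  \qquad \lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert
\]
```

Under this form, a smaller C gives the penalty term more relative weight, which drives more coefficients to exactly zero and leaves fewer selected genes.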
2.8 Protein–protein interaction network analysis
Subsequently, we seeded the subtype-specific genes identified above into the STRING Interactome integrated protein–protein interaction (PPI) to build networks through selected connection pairs, with evidence confidence scores over 400. Then, we sought to identify modules that were tightly condensed across the entire network by using the InfoMap algorithm. Members within a module are likely to work collectively to perform biological functions. The aforementioned procedures were implemented using NetworkAnalyst v3.0 [17]. Finally, the biological functions of the observed modules were determined through enrichment analysis and annotated with the Enrichr web server [18].
2.9 Individual PPI networks for HCM with reduced LVEF
We retrieved the published proteome expression dataset PXD008934 from ProteomeXchange, which contained the proteomic changes characterized via mass spectrometry in nine human heart tissues with HCM accompanying preserved (53.12% ± 3.75%, n = 4) or reduced ejection fraction (25.00% ± 9.35%, n = 5). Individual networks for samples with reduced LVEF were built following the procedures proposed by Maron et al. [19]. A Pearson’s correlation matrix was first calculated for each gene pair from all samples with preserved LVEF. Then, each sample with reduced LVEF was added and the correlation matrix was recalculated. Gene pairs with correlations that were significantly changed were mapped to the STRING Interactome. This procedure resulted in a network that represented the dysfunctional or perturbed system of the corresponding sample. We then used a hypergeometric test to determine whether an individualized network was significantly enriched with the identified genes and other genes associated with HCM endophenotypes [19]. The Benjamini–Hochberg procedure was applied for multiple hypothesis tests.
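The enrichment test and the multiple-testing correction can be sketched as follows. The function names and the numbers in the usage lines are hypothetical, and the one-sided (upper-tail) form of the hypergeometric test is an assumption, since the text does not state which tail was used.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the gene set of interest."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical toy numbers: 10 background genes, 3 in the gene set,
# a network of 4 genes containing 2 of them.
p_enrich = hypergeom_upper_tail(N=10, K=3, n=4, k=2)   # 70 / 210 = 1/3
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In this setting N would be the number of background proteins, K the size of the gene set, n the number of nodes in one individualized network, and k their overlap, with one p-value per network entering the correction.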
2.10 Validation by second independent cohort
External validation is required to ascertain the correlation between genetics and subtypes; this procedure verifies the robustness and generalization of the genetic model for clinical practice. Thus, we enrolled another independent cohort that consisted of 414 patients with HCM from the same hospital (Tongji Hospital, China). WES and genotyping were performed in accordance with the procedures described above.
2.11 Statistical analyses
Continuous variables were compared using an unpaired Student’s t-test, while categorical variables were analyzed using the chi-square or Fisher’s exact test. Survival curves were constructed in accordance with the Kaplan–Meier method, and comparisons were performed using the log-rank test. Cox proportional hazard models were used to assess the effects of multiple clinical features on the risk of outcome events. AUC was used to evaluate the performance of the binary classification model. Repeated stratified fivefold cross-validation was used to perform this evaluation. All reported probabilities were two-sided and considered significant at P < 0.05.
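As a brief illustration of the Kaplan–Meier method mentioned above, the product-limit estimate can be computed as in the sketch below. The function name and the toy follow-up times are hypothetical and are not study data.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time of each patient
    events: 1 if the end point occurred at that time, 0 if censored
    returns [(t, S(t))] at each distinct time with at least one event
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical toy follow-up in months; 0 marks a censored patient.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0])
# Survival steps down to 0.8 at month 5, 0.6 at month 8, and 0.3 at month 12.
```

Censored patients leave the risk set without lowering the curve, which is why loss to follow-up reduces precision rather than biasing the estimate under non-informative censoring.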
3 Results
3.1 Consensus clustering identified two HCM subtypes
An unsupervised consensus clustering approach was applied to determine the number of possible subtypes of all the patients with HCM by using echocardiography data. Solutions with two to six subtypes were examined. The two-subtype solution (k = 2) was selected for further analysis because of its better performance and stability (Fig. S2). We further measured the differences in clinical features between the two subtypes. As indicated in Tab.1, more male subjects were found with subtype 1 than with subtype 2 (83.8% versus 63.7%, respectively; P < 0.001). The mean LVPW, LAD, and LVEDD were greater in patients with subtype 1 than in patients with subtype 2 (LVPW: 13.35 mm versus 11.64 mm, respectively, P < 0.001; LAD: 46.22 mm versus 39.72 mm, respectively, P < 0.001; LVEDD: 55.82 mm versus 44.98 mm, respectively, P < 0.001), while patients with subtype 1 exhibited a thinner IVS (15.61 mm versus 17.81 mm, P < 0.001). An apparent reduction in LVEF was consistently observed in subtype 1 (44.67% versus 64.39%, P < 0.001). Both subtypes suffered from LV diastolic dysfunction, while subtype 1 exhibited not only reduced filling function but also damage to LV compliance (E/A ratio: 26.41 versus 2.28, P < 0.001), supporting the reliability of the two-cluster division. Further propensity score matching to adjust for potential bias in baseline characteristics suggested the same findings (Table S3 and Fig. S3).
3.2 Supervised decision tree modeling to enhance clinical utility
On the basis of the two identified subtypes, we further tested whether a simplified classifier with a minimal subset of these echocardiographic variables used in consensus clustering could still assign patients to their corresponding subtype. We first used random forest to measure the importance of each echocardiographic variable. The result suggested that the preceding clustering was largely driven by LVEF, LVEDD, E/A, LAD, LVPW, and IVS (Fig.1). We then applied decision tree modeling by using the subtypes from the preceding clustering as input to create a classifier that comprised the six variables above (Fig.1). The result revealed that the HCM patients could still be stratified into the two subtypes with an AUC of 0.93 (95% confidence interval 0.91–0.95).
3.3 Association of subtypes with clinical outcome
The above findings revealed two distinct subtypes of HCM on the basis of multiple methods. Subsequently, we verified whether the two subtypes were associated with different prognoses. Among the 775 (97.7%) patients included in the final evaluation, with a mean follow-up time of 32.78 ± 27.58 months, we observed higher all-cause mortality in subtype 1 than in subtype 2 (20.2% versus 11.4%, P = 0.002). The 18 patients who were lost to follow-up were excluded from the survival analysis. Further survival analysis (Fig.2) showed that patients with subtype 1 had a higher likelihood of experiencing primary end point events (cardiac mortality: HR 2.68, P < 0.001; cardiac death and heart transplant rate: HR 2.83, P < 0.001; all-cause mortality: HR 2.11, P < 0.001) and developing moderate or severe congestive symptoms (NYHA class III/IV) (HR 2.69, P < 0.001) compared with patients with subtype 2. In accordance with an earlier study [20], age, female sex, NYHA class III/IV symptoms, and history of atrial fibrillation were predictors of cardiac mortality (Table S4). Subtype 1 remained independently associated with a higher risk of cardiovascular death (HR 2.24, P = 0.0015) compared with subtype 2 after using multivariable modeling inclusive of all significant univariate predictors (Table S5 and Fig. S4). When adjusted for LVEF, subtype 1 remained an independent risk factor for NYHA class III/IV (HR 1.48, P = 0.047) (Table S6). In general, subtype 1 patients were associated with poor overall survival probability compared with subtype 2 patients.
3.4 Effects of HCM-associated genes on subtyping and disease prognosis
The distinct clinical characteristics of the two subtypes have prompted us to explore the underlying genetic determinants. We first focused on the evaluation of HCM-associated genes (Table S7). For the majority of HCM-associated genes, the proportion of carriers was not different between the subtypes, except for MYBPC3 and MYH7, whose carriers were significantly enriched in subtype 2 relative to that in subtype 1 (Figs. S5 and S6). We further compared the risk of experiencing cardiac death, and no significant difference was observed between carriers and noncarriers for most of the genes associated with HCM (Fig. S7). Consistently, the overall effect assessment suggested that mutations in these genes could hardly discriminate one subtype from the other (AUC = 0.54) or predict survival at the end of the follow-up period (AUC = 0.62) (Fig. S8).
3.5 Machine learning modeling to identify novel genetic determinants
The weak contribution of known HCM-associated genes to the subtypes and outcomes observed above has prompted speculation that other novel disease-modifying genes may be present in HCM. Given that rare variants have a relatively larger effect size but cannot be effectively captured by single-variant analyses [21–23], we constructed machine learning models based on the accumulated mutation pathogenicity of rare variants at exome-wide gene level to distinguish the subtypes. Figure illustrates the process of searching for an optimal C value (the inverse of regularization strength), which was determined by stratified fivefold cross-validation repeated 1000 times with random shuffling. The optimal C value was set to 0.033, which gave the minimum average log loss. At the given C value, 51 of the 17 033 genes were assigned nonzero weights, with 46 genes exhibiting an increased mutation burden in subtype 1 relative to subtype 2 (Table S8). Subsequently, we constructed models with the 46 genes to verify whether they could accurately predict subtype classification. As shown in Fig.3, the machine learning model based on the 46 genes presented superior predictive power with an average AUC of approximately 0.81. Hence, these genes are probably the most distinguishing features of the subtypes.
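The search for C can be sketched as repeated stratified cross-validation scored by log loss. This is an outline under stated assumptions, not the authors' code: `fit_predict` is a hypothetical callable that would fit the L1-penalized model on the training indices and return predicted probabilities for the test indices, and the function names and default repeat count are illustrative (the study repeated the procedure 1000 times).

```python
import math
import random

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def select_c(candidates, labels, fit_predict, repeats=3, k=5):
    """Return the candidate C with the lowest mean cross-validated log loss.

    fit_predict(C, train_idx, test_idx) is a hypothetical callable returning
    predicted probabilities for test_idx.
    """
    def mean_loss(C):
        losses = []
        for r in range(repeats):  # a new random shuffle per repeat
            folds = stratified_folds(labels, k, seed=r)
            for f in range(k):
                test = folds[f]
                train = [i for g in range(k) if g != f for i in folds[g]]
                probs = fit_predict(C, train, test)
                losses.append(log_loss([labels[i] for i in test], probs))
        return sum(losses) / len(losses)

    return min(candidates, key=mean_loss)
```

Averaging the loss over many reshuffled partitions reduces the dependence of the chosen C on any single split, which matters when the number of genes far exceeds the number of patients.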
Moreover, correlation analyses suggested a positive linear link between LVEDD and probability for subtype 1 predicted by the 46-gene model (R = 0.25, P = 5.8e−13) and a negative link between LVEF and probability for subtype 1 (R = −0.34, P = 2.2e−16) (Fig. S10). A similar trend was observed in the survival analysis, wherein the likelihood of experiencing cardiac death and progression to NYHA class III/IV increased following a rise in probability for subtype 1 (Fig. S11). Combined, these results indicate that the identified genes exerted a stronger effect on the severity and prognosis of HCM relative to known HCM-associated genes.
3.6 Network analyses to unravel underlying pathobiology
To further explore the pathobiology that accounted for subtype, we subsequently mapped the 46 machine-identified genes onto human PPI networks to determine associated biological pathways. Subsequent community detection identified 36 modules that were tightly condensed internally. Expectedly, the GO term annotation for these modules suggested links with the cardiovascular system to a certain extent (Fig. S12). Given the dominant role of LVEF in subtyping, published proteomic expression profiles from HCM patients with reduced LVEF compared with those with preserved LVEF were used to generate individual PPI networks (Table S9). Enrichment analysis showed that the 46-gene set was significantly enriched across the patient networks, except for sample HCMrEF5 (Fig.3, Table S10). In addition, we determined that some of the individual networks were also enriched for the HCM endophenotypes identified in a previous study (Fig.3) [19]. Combined, these results provided insights into the pathobiological complexity of HCM subtypes at the network medicine level.
3.7 Second cohort validation
To validate the correlation of identified genes with phenotypic variability, we enrolled another independent cohort that comprised 414 patients with HCM recruited from Tongji Hospital (Wuhan, China) and diagnosed with the same criteria, and performed WES. The same mutation burden weighting that used rare variants for the 46 genes was followed. The subtype status of these patients was then predicted using the aforementioned genetic model, which was fitted on the first cohort based on the 46 genes. To avoid confusion, the predicted subtype for each individual was labeled as “group” rather than “subtype”. As presented in Tab.2, 101 patients were assigned to Group 1, while the rest were assigned to Group 2. Significant differences still existed between the two groups in terms of IVS, LVEDD, and LVEF. Compared with the characteristics summarized in the first cohort, Group 1 presented increased LVEDD (53.47 mm versus 49.57 mm, P < 0.001) and impaired LVEF (53.00% versus 57.92%, P = 0.002), while Group 2 was characterized by more severe IVS (15.73 mm versus 17.09 mm, P = 0.002). Moreover, we ranked samples into quartiles in accordance with their predicted probabilities for subtype 1 and observed the same progression trends across quartiles (Fig. S13).
To test the clinical utility of the genetic model, we applied the previous decision tree derived from echo-based clustering (Fig.1) to the second cohort and determined the corresponding clinical subtypes, namely, true labels. The predictive power of the genetic model in the second cohort is depicted in Fig. S14, with an AUC of 0.64. Accounting for the effects of traditional risk factors on cardiovascular diseases, we collected 12 other clinical variables of these patients to construct an integrated model. These clinical variables were as follows: sex, age, smoking, alcohol intake, systolic blood pressure, diastolic blood pressure, serum triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, coronary atherosclerosis, and diabetes. By integrating these factors into our genetic model, we achieved a better interpretation of HCM subtyping with significantly increased AUCs of 0.84 in the first cohort and 0.70 in the second cohort. Overall, these results supported the assumption that the 46 genes, although with limited coverage in all the patients, were associated with HCM phenotypic variability. The integration of genetic and nongenetic factors is capable of recognizing patients who tend to suffer from adverse remodeling, albeit only partially.
4 Discussion
Overall, the comprehensive clustering analysis of echocardiography features from 793 HCM cases uncovered two major clinical subtypes, which exhibited distinct manifestation and genetic basis. Patients with subtype 2 presented a form of asymmetric septal hypertrophy and were associated with a stable course. By contrast, posterior free wall involvement, LV systolic dysfunction, and unfavorable outcomes were more common in subtype 1. The subsequent machine learning model construction identified 46 most distinguishing genes with increased mutation burden in subtype 1. Network analysis revealed functional modules and biological pathways involved in subtypes, along with the enrichment of the identified genes in individual PPI networks for HCM patients with reduced LVEF. External validation in a second cohort of 414 cases provided evidence in favor of the correlation between genetics and subtypes. We intended to draw an overall picture of the genetic basis accounting for subtypes at the levels of variant, gene, and network, without subjective choices in any steps.
Previous studies have noted that a lifelong process of LV remodeling and progressive dysfunction occurs in some HCM patients [24]. However, large-scale cohort studies indicated that only a small proportion of patients progressed to this end stage [7], and the source of the substantial heterogeneity that drives such different courses remains unclear. Therefore, we examined the possibility that HCM subtypes exist naturally and differ in genetic context, intending to view this disease as inclusive of its clinical, morphological, and molecular diversities. In contrast with previous studies that were based on natural history observation and subjective division [25], we conducted unsupervised clustering in a large-scale HCM cohort by taking advantage of machine learning approaches, wherein relevant structural and functional data recorded via echocardiography were used as input to the clustering algorithm. Therefore, patients were automatically clustered into several groups in accordance with their similarities in echocardiography features.
Considering the limitations in viewing HCM through the narrow prism of a single sarcomere gene mutation, we comprehensively inspected all genes based on WES data from different hierarchies. In contrast with classical burden tests and SKAT [26], which are based on allele frequency or variance component score tests, we weighted rare variants with pathogenicity instead of regression coefficients and aggregated them into a combined mutation burden for each gene. Given the limited power for detecting genetic susceptibility with a relatively small sample size, we adopted a feature selection algorithm based on a penalized linear classification model that measured the contribution of genes to subtype status. In essence, the objective was to explain and predict the morphological abnormalities of HCM from personal genetics. The minimal subset of 46 genes identified by the L1-penalized regression model achieved the most accurate prediction of HCM subtypes. HCM has been widely regarded as a monogenic disease, in which causal mutation in sarcomere genes is believed to be the prerequisite and a major determinant of the phenotype [27]. In contrast with this hypothesis, a poor predictive performance was observed in the model based on HCM-associated genes. These observations further support the new perspective that HCM clinical phenotype may be defined by the genetic context rather than solely by a single genetic event [28]. Similarly, significant increases in the predictive power for both cohorts after integrating the genetic model with clinical risk factors reflected nongenetic contributions in disease processes. However, a considerable proportion of patients with adverse remodeling in the second cohort could not be captured by the 46-gene model. Such limited coverage emphasized the variability in molecular mechanisms among patients with the same HCM diagnosis. Combined, these points underscore the need to expand the spectrum of determinants and modifying factors in HCM remodeling.
Apart from applying echo-based decision trees to match patients to their corresponding subtypes, personal genetics seems more applicable in offering an early evaluation of HCM progression risk. Genetic testing is recommended for patients fulfilling the diagnostic criteria for HCM due to an increased understanding of the genetic basis of HCM and the rapidly evolving high-throughput sequencing technologies. Our results suggest that patients may benefit from genetic testing in other aspects, not only in the diagnosis of HCM. A widening range for genetic testing should be recommended, because sequencing and analysis should not be limited to HCM-associated genes. Evaluating the risk for different progression based on personal genetics and traditional risk factors is possible and may provide valuable advice for early intervention and disease management.
In the absence of experimental evidence, 46 genes were agnostically and automatically selected and considered subtype-related genes. The machine learning algorithm only considered genes whose increased mutational burden in subtype 1 would contribute to prediction accuracy, which might carry a risk of false positives. With the aim of testing whether these selected genes are involved in the pathogenesis of HCM and determining their functional context, we mapped them onto a human PPI network and identified tightly clustered topological modules linked with subtype 1. Expectedly, the subsequent GO and Kyoto Encyclopedia of Genes and Genomes enrichment analyses for these modules indicated that some modules were directly involved in the cardiovascular system, such as cholesterol metabolism, mitochondrial oxidative metabolism, and sarcomere organization (Fig. S12). We also noted some novel or less reported pathways, such as the mTOR signaling pathway, PI3K-Akt signaling pathway, and aminoacyl-tRNA biosynthesis, which may promote new perspectives for HCM [29, 30]. Previous studies have proposed that individualized PPI networks can provide critical insight into determining patient-specific and clinically relevant HCM pathophenotypic characteristics [19, 31]. Thus, we utilized the information provided by individual networks of LVEF-reduced patients to check the role of these feature genes and relevant endophenotypes that were unique to specific patients. Our results revealed that the 46-gene set was enriched across the individual networks of HCM patients with reduced LVEF and provided further support for the involvement of these genes in HCM. These results also suggested that mutation signatures in these genes were implicated in phenotypic heterogeneity. Further work is required to establish the relationship between these modules and HCM subtypes and to elucidate their exact mechanisms.
5 Study limitations
Our study was based on the echocardiographic features of 793 patients with HCM from a large-scale cohort. Compared with magnetic resonance imaging, echocardiography may be limited in providing detailed information for patients with poor acoustic windows or in detecting LV apical and anterolateral hypertrophy. In addition, our patients were recruited from a single center and an age span existed in the cohort. A higher rate of progression to heart failure and mortality was observed in this study compared with previously published cohorts, which might be explained by the inadequate attention given to HCM in China. Patients with HCM only visit a hospital when evident symptoms emerge. Meanwhile, the links of genes and pathways obtained from machine learning modeling with HCM subtypes should be further confirmed by animal and cytological experiments. In addition, limited proteins were available for proteome analysis, resulting in incomplete individual network construction.
6 Conclusions
This study was designed to explore the potential subtypes of HCM in a large-scale cohort. On the basis of echocardiography features, we propose a new classification scheme that is associated with a distinct genetic context. Machine learning methods based on personal whole-exome data were used to identify HCM subtype-associated genes and to construct a subtype prediction model. These findings may contribute to our understanding of the correlations among phenotypes, genotypes, and prognoses in HCM.