1 Introduction
Thoracic aortic dissection (TAD) is a relatively rare disease, but it is associated with high mortality and poor prognosis. The genetic background is thought to participate in the development of TAD [
1,
2]. With great progress in genetic research in the past several years, many TAD genes, which are generally involved in smooth muscle cell (SMC) construction, adhesion to the extracellular matrix (ECM), and transforming growth factor-β (TGF-β) signaling pathway, have been identified. Variants in these genes can be detected in the majority of TAD families with or without connective tissue disorders, such as Ehlers–Danlos syndrome (EDS), Loeys–Dietz syndrome (LDS), bicuspid aortic valve (BAV), and Marfan syndrome (MFS) [
3,
4]. However, approximately 80% of TAD patients, who are termed sporadic TAD (STAD), do not have a family history of the disease or syndromic features, but they share many risk factors and pathologic mechanisms with heritable TAD (HTAD), including older age, male sex, long-term hypertension, and smoking status [
2,
5].
Currently, great advancements in imaging and novel diagnostic tests have improved the accuracy in diagnosing TAD, but many TAD individuals are still diagnosed too late to obtain the best treatment opportunity, especially in emergency patients. One of the possible explanations is that the disease usually remains asymptomatic until an acute aortic dissection or rupture occurs, indicating that conventional diagnostic strategy combined with family history and auxiliary examination still lacks sensitivity for predicting the disease. Taking advantage of the development of next-generation sequencing and the reduction of sequencing cost, a series of gene panels of sequencing has been applied to TAD patients to detect some novel variants related to STAD and has led to additional knowledge about genetic variants of TAD [
6–
9]. However, most of these studies detected variants in patients only and lacked the frequency of matched healthy control sets, causing difficultly in evaluating the pathogenicity of variant sites. In addition, the power of single-variant test, including genome-wide association study (GWAS), is so limited that more research or diagnostic testing needs to verify the results further.
In our study, we conducted a two-stage case–control study on STAD cohorts to detect the diagnostic yield based on whole exome sequencing (WES) and performed gene-based association tests, namely, the optimal sequence kernel association test (SKAT-O) [
10], to identify the contribution of each risk locus to the pathogenesis of STAD so that we can improve our understanding of the diagnosis, treatment, and even prediction of prognosis.
2 Materials and methods
2.1 Study participants
The 223 STAD patients confirmed by selective computed tomography (CT), CT aortography (CTA), and retrograde aortography were recruited from Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China, between December 2015 and November 2018. In addition, a total of 414 participants in the control group included healthy subjects from communities of Wuhan and patients without aortic diseases from Tongji Hospital. To replicate the results, we enrolled an independent cohort, including 423 cases and 230 non-STAD controls from Tongji Hospital and 504 controls from the 1000 Genomes East Asian population. Aortic disease was excluded from all control sets from Tongji Hospital by at least one imaging examination, such as transthoracic echocardiography. The exclusion criteria in STAD patients were (1) family history of aortic diseases (dissection, aneurysm, and intramural hematoma); (2) definite or probable inherited connective tissue disorders, such as EDS, LDS, BAV, and MFS; and (3) aortic dissection secondary to trauma, iatrogenic injury, infection, aortitis, and other causes. All participants were Han Chinese, and clinical data were collected from electronic medical records. The study was approved by the Institutional Review Board of Tongji Hospital, Tongji Medical College, Huazhong University of Science Technology, and all participants provided written informed consent prior to inclusion. Fig. S1 shows our whole study design.
2.2 WES
High-quality genomic DNA of all participants (DNA concentration>30 ng/µL and OD 260/280>1.80) was extracted from peripheral blood leukocytes using a QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). The exome capture procedure was generated using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, CA, USA) and sequenced on the Illumina NovaSeq platform by Berry Genomics to generate 150 bp paired-end reads with an average sequencing depth above 100×.
The Genome Analysis Toolkit (GATK 3.7) Best Practices workflow was performed for variant discovery analysis. First, using low-quality bases and adaptor sequences trimmed by Trimmomatic [
11], we aligned valid sequencing data on the reference human genome (UCSC hg19) by Burrows–Wheeler Aligner [
12]. Next, Picard was employed to sort and index BAM files and mark duplicated reads. Indel realignment and base quality recalibration were carried out by GATK 3.7, and the final BAM files were used for variant calling and variant quality score recalibration [
13]. Furthermore, we excluded variants that met the following conditions from the analysis: (1) coverage depths<10×, (2) multiallelic variants, (3) significant deviation from the Hardy–Weinberg equilibrium (HWE;
P<1×10
−6), and (4) missing rate >20%.
2.3 Quality control of exome sequencing
Quality control was performed on each variant including variant call rate >95%, minor allele frequency (MAF)>0.05, linkage equilibrium or low levels of linkage disequilibrium, and HWE
P>1×10
−6. Individuals with a heterozygosity rate deviating more than six standard deviations from the mean were excluded. Subsequently, the calculation of cryptic relatedness and creation of genetic relationship matrix were carried out separately in PLINK and Genome-wide Complex Trait Analysis [
14,
15]. After removing one of each pair for related samples with PI_HAT>0.2, we calculated the principal components for population stratification to identify outliers within our population and conducted multidimensional scaling (MDS) for ancestry analysis to compare the consistency between genetic and self-reported information [
16]. The data for the MDS analysis came mainly from four 1000 Genomes populations merged with our samples, including European, Asian, American, and African (Table S1).
2.4 Target sequencing
The
COL3A1 exon sequencing data were generated from next-generation sequencing based on the Ion Torrent platform, as we previously described [
17]. In brief, the
COL3A1 gene was captured and amplified using Ion AmpliSeq
TM Library Kit 2.0 (Life Technologies, USA). Then, libraries were quantitated using a Qubit 2.0 fluorometer (Invitrogen, USA) and combined at equal molar ratios for sequencing.
2.5 Candidate gene and variant profile
We searched the OMIM database, GWAS catalog, and online literature to obtain candidate genes associated with heritable aortic aneurysm and dissection. Oncoplot was customized to visualize exonic variants of those genes by R software (version 3.5.0) and maftools package [
18].
2.6 Variant filtration
Variants were annotated with ANNOVAR. After extracting the region among 29 validated genes, we considered variants as “higher priority” if they met the following criteria: (1) located in an exon with rare frequency variants, with MAF<0.01 in the 1000 Genomes project and the Exome Aggregation Consortium (ExAC) and gnomAD database of the East Asian population; (2) functional variants including missense, frame shift, splicing sites, and nonsense variants; and (3) potential deleterious variants defined as damaging, or possibly damaging, in at least five out of eight databases, including SIFT [
19], PolyPhen-2 [
20], MutationTaster2 [
21], VEST4 (score>0.8) [
22], REVEL (score>0.4) [
23], MetaLR [
24], CADD (score>20) [
25], and GERP++_RS (score>2) [
26].
2.7 Gene-based association analysis and permutation test
To assess the contribution of rare deleterious variants within a genomic locus to STAD risk, we tested for the association of genes with more than one rare variant in our population by SKAT-O method within the SKAT package in R [
10].
We performed 10 000 permutations of phenotype labels to obtain empirical P values, and a suggestive threshold at α = 0.05 of the P value is 2.49 × 10−3 (Fig. S2).
2.8 Sanger sequencing
All deleterious COL3A1 variants identified by WES or target sequencing were confirmed by Sanger sequencing. The primers and PCR conditions are listed in Table S2.
2.9 Further bioinformatics analysis of COL3A1
COL3A1 expression in different tissues and read counts of 299 aortic tissue samples were downloaded from the Genotype-Tissue Expression project release V7 database. We also obtained another microarray profile from the GSE52093 data set from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information. We used R and gganatogram package to illustrate the distribution of
COL3A1 in the body stereoscopically [
27]. Analysis of differentially expressed mRNAs was conducted according to the aorta status after being normalized by limma package [
28]. For all
P values, a false discovery rate (FDR) based on the Benjamini–Hochberg method was applied to correct the statistical significance of multiple testing [
29]. Genes with absolute fold change (log
2FC)≥1 and FDR-adjusted
P values<0.05 were considered as significant. Gene enrichment analysis and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were applied using defined gene sets by R clusterProfiler package [
30]. We also used all genes to perform the gene set enrichment analysis (GSEA) by R GSEABase package [
31], and pathways with absolute normal enrichment score≥1 and
P values<0.05 were considered as significant.
3 Results
3.1 WES and quality control
WES of all 637 samples including 223 STAD cases and 414 controls was conducted on genomic DNA with an average fold coverage of 110-fold on target, and over 95% of target regions had>20× coverage (Table S3).
Subsequently, population structure, genetic relationship, and ancestry analyses were performed for extensive quality control. Briefly, one healthy control was an outlier in heterozygosity, three cases were removed according to the result of principal component analysis, and the rest did not detect distinct subpopulations (Fig. S3a). Three controls were excluded for potential kinship (PI_HAT>0.2), and a heatmap of the kinship matrix with genetic relationship among the discovery cohort suggested low levels of relatedness (Fig. S3b). In addition, 72 867 single nucleotide polymorphisms (SNPs) were identified for ancestry analysis, and the result was in line with the self-reported ancestry of the East Asian population (Fig. S3c).
3.2 Baseline characteristics of patients
Between December 2015 and November 2018, a total of 220 STAD patients comprising130 (59.10%) type A dissection and 90 (40.90%) type B dissection diagnosed by imaging (TTE or CT/CTA) and surgery were enrolled. The detailed characteristics of the STAD population are summarized in Table 1. Among the patients, 175 (79.55%) were male, and the average age at diagnosis was 51.02±10.11 years (range, 22–82 years). Hypertension and cigarette smoking, which are two of the most common risk factors associated with TAD, were observed in 118 (53.63%) and 60 (27.27%) patients, respectively.
Across all individuals, most patients had multiple imaging studies performed and all underwent CT/CTA. More than two thirds of type A patients had a dissection extending from the ascending to the abdominal aorta, and one third had dissection to the common iliac arteries. Eight (8.89%) type B cases were restricted to the abdominal aorta with/without extending to the renal artery and iliac arteries. Notably, the median aortic dimension and ejection fraction were 37.00±11.00 mm and 60.00%±4.00%, respectively. In addition, we found that nearly 67% patients presented dilation of the ascending aorta (more than 35 mm), only five (4.13%) had abnormal left ventricular ejection fraction, and one had an ejection fraction of only 36%.
3.3 Selection of genes for analysis
Syndromic features such as familial thoracic aortic aneurysm and dissection, LDS, and EDS are generally Mendelian (single gene) diseases. The genes responsible may provide insight into the etiology of aortic dissection. Thus, we screened OMIM, GWAS catalog, and PubMed database to summarize genes associated with aortic aneurysm and dissection, and shortlisted 29 validated genes implicated in syndromic and familial aortic aneurysm and dissection (Table S4). Nearly all genes encoded proteins involved in the TGF-β signaling pathway and SMC contraction and adhesion to the ECM, with variants in these genes leading to a compromised intima and aortic dissection.
3.4 Gene-based burden test of the candidate genes
In the discovery cohort, we initially enrolled 220 STAD patients, but seven of them carried pathogenic or likely pathogenic variants based on the ACMG/AMP in the genes for HTAD [
32] and were excluded in the association study of STAD. We identified a total of 1952 variants among the 29 target genes. After variant filtration based on allele frequency (MAF less than 1%) and function (nonsense, frame shift, splice site, and damaging missense), 174 variants remained among 25 genes (
SMAD3,
TGFB3,
BGN, and
COL4A5 genes were excluded) (Figs. 1 and 2A). Only genes with at least two rare deleterious variants were considered for gene-based burden test, which reduced the list to three additional genes (
FOXE3,
SMAD4, and
TGFB2). Subsequently, SKAT-O was performed to evaluate the aggregate effects of rare variants. The results showed that
COL3A1 (
P = 7.35 × 10
−6) had significant association with STAD (
P<2.49 × 10
−3), and
COL3A1 ranked first based on
P value and the number of net patients carrying the rare deleterious variants (Fig. 2B).
On the whole, 7 rare COL3A1 variants were detected in 14 cases and 3 controls, including 7 damaging missense variants. The characteristics of patients with multiple deleterious variants of COL3A1 are shown in Table 2. Among the subjects carrying COL3A1 damaging variants, nine were diagnosed with type A STAD, and all of them exhibited an enlargement of the aortic dimension, the average of which was 41.1 mm. Two of five type B patients showed normal aortic dimension, and both had a history of hypertension.
Considering the false-positive rate in WES, Sanger sequencing was conducted to verify all the rare deleterious variants. Among these variants, most were clustered at the triple-helical region, which is a characteristic feature of all fibrillar collagens. Two were located in the fibrillar collagen C-terminal (COLFI) domain (Fig. 2C). Homology analysis revealed that the sites were highly conserved across species (Fig. S4).
Furthermore, we target-sequenced COL3A1 in an independent cohort, which included 423 cases and 734 controls to replicate this result. With the same criteria of variant filtrations performed, 17 (4.02%) patients carried rare damaging variants, but 22 (3.00%) were detected in controls, including 6 (2.61%) non-STAD patients from Tongji Hospital and 16 (3.17%) subjects from the 1000 Genomes East Asian population. We performed SKAT-O test, and the results suggested that COL3A1 significantly increased the risk of STAD (SKAT-O: P = 0.021).
3.5 COL3A1 expression and function in the aorta
COL3A1 can be found in many tissues as a major structural component. In our study, it was widely expressed throughout the body, especially in the aorta, bladder, and colon (Fig. 3A). To further explore the function of COL3A1 in TAD, we analyzed microarray data (GSE52093) from the GEO database, which included seven dissected ascending aortas and five normal controls. We found that COL3A1 was overexpressed in the dissected aortic tissues after excluding one sample (GSM1259277) from clustering analysis (P = 0.002) (Figs. 3B and S5), and some classical pathways related to TAD could be detected significantly as well (Fig. S6). We further grouped cases by their median expression and used all genes to conduct GSEA based on GO, KEGG, and Reactome pathway databases to investigate the systematic characterization and biological functions associated with COL3A1. The results showed a close association between COL3A1 expression and ECM components, including collagen trimer, collagen formation, and ECM structural proteins (Fig. 3C). In addition, analysis of a larger sample data set from the GTEx database, including 299 aortic tissue samples, confirmed that COL3A1 was related to the ECM pathway (Fig. 3D).
4 Discussion
STAD is a neglected life-threatening cause of sudden chest pain. Our study broadened the knowledge of genetic backgrounds of STAD and supported the theory that genetic diversity is behind its occurrence. We conducted WES on 637 subjects from Chinese populations, including 223 consecutively evaluated STAD patients, and focused on the function of 29 HTAD genes in STAD patients. The results indicated that variants in COL3A1 could increase the risk of STAD significantly and that COL3A1 was the top hit by gene-based association test. Afterward, our results were confirmed in an independent population of 1157 individuals. All variants we detected were predicted to be harmful and located in highly conserved regions, leading to the dysfunction of COL3A1. Transcriptome data from GEO and GTEx databases indicated that COL3A1 not only is widely expressed throughout the body, especially in the aorta, but also functions as a key factor in aortic dissection through the ECM pathway.
One of the main characteristics of STAD is the lack of familial aggregation and symptoms of connective tissue disorder, which results in the inability to detect the disease until the rupture of the aortic wall takes place. We noted that the onset age of TAD in our study population was much younger than that reported in the International Registry of Acute Aortic Dissection (IRAD), regardless of the type of TAD, but was consistent with Sino-RAD analysis. For example, the mean age at diagnosis of type A dissection was 50.72±10.35 years, which was similar to that reported in Sino-RAD (50.5±11.2 years) but lower than that reported in IRAD (61.1±14.1 years) [
33,
34]. The cause of this discrepancy remains unclear, but it might be related to racial genetic differences, insufficient disease awareness, and poorly controlled hypertension [
35].
Although cases of STAD continue to be reported, the genetic predisposition of STAD remains unclear. In a previous study, GWAS was performed on 765 STAD patients, and variants identified at the 15q21.1 locus, where
FBN1 is located, were shown to be significantly associated with STAD [
36]. However, the result from GWAS elucidated only a small proportion of missing heritability for many complex diseases and was limited to detecting rare variants associated with the disease [
37]. In our study,
FBN1 is a potential gene causing STAD (
P = 0.008). Among the seven patients who carried the HTAD-related variants, five of them carried
FBN1 mutations and were excluded in the association study. Thus, it did not reach the power of significance (
P = 2.49 × 10
−3).
Type III collagen, encoded by the
COL3A1 gene, is a major structural component of the aorta vessel wall and consists of a signal peptide and four domains, namely, vWF-C, nonhelical (N-) terminal, triple-helical, and COLFI domains [
38]. Variants in the third domain, particularly glycine substitutions, usually result in vascular EDS (known as EDS type IV), with clinical manifestations of translucent skin, extensive bruising, and rupture of the uterus or arteries [
39,
40]. In our study design, we excluded patients who had definite or probable inherited connective tissue disorders. Thus, although we detected 14 patients carrying
COL3A1 damaging variants, none of them displayed characteristic facial features (e.g., prominent cheekbones and sunken cheeks), positive family history, abnormally thin and pale skin, or other EDS IV clinical features except aortic dissection. Additionally, the
COL3A1 damaging variants identified in the present study have not been reported to be pathogenic or likely pathogenic to EDS IV according to ACMG, Clinvar, and HGMD database, which could explain the absence of distinct facial features in the patients, even though they all presented with aortic dilatation.
We had found that potential pathogenic variants in
COL3A1 were associated with STAD, indicating that
COL3A1 might play an essential role in the pathogenesis of aortic dissection. Thus, further bioinformatics analysis was performed to investigate its function in TAD. The
COL3A1 gene was overexpressed in TAD patients unexpectedly, which might be related to have mutations in other genes and have upregulated
COL3A1 to compensate. In addition, the GSEA result showed that the
COL3A1 gene is associated with the ECM signaling pathway, which was in line with D'hondt’s work [
41].
While some significant results have been shown in our study, the following limitations cannot be omitted. First, although the ratio of case to controls was suitable for the present study, the relatively small sample size after variant filtering resulted in some genes being excluded from analysis, including TGFB3, BGN, and COL4A5. Second, the patients that we selected had no family history of TAD, but family information could further support our conclusion. Lastly, the health conditions of controls from the 1000 Genomes project, such as age, smoking status, and sex, remain unknown, which may mix potential patients who may in fact have early aortic disease.
In our study, we identified a profile of known HTAD genes in STAD and conducted a two-stage case–control study to verify whether variants in COL3A1 could increase the risk of STAD. In addition, based on the GEO and GTEx public database, COL3A1 overexpression may contribute to the pathogenesis of STAD through the ECM signaling pathway in dissected aorta. This result can expand our knowledge of the genetic basis and pathogenesis of STAD, which may further help in providing better genetic counseling to suspected or high-risk patients.