1 Introduction
IgA nephropathy (IgAN) is a common primary glomerulonephritis throughout the world. It is characterized by the deposition of IgA in the mesangial area of glomeruli and various histopathological lesions, including mesangial cell proliferation and accumulation of extracellular matrix [
1]. IgAN shows a slow but persistent clinical course, and approximately 25%–50% of IgAN patients will develop into end-stage renal disease (ESRD) within 20 years after initial diagnosis [
1]. The clinicopathologic patterns and the prognosis of IgAN show wide variability among patients. The gross hematuria often coincides with mucosal infections in either the upper respiratory tract or the gastrointestinal tract during the course of IgAN [
2]. IgAN has led to substantial healthcare burden, and the prevalence of biopsy-proven IgAN shows a marked difference in various regions. Asian populations have the highest prevalence of 40%–50% of primary glomerulonephritis, which in contrast to 20%–30% in Europeans, 10%–20% in United States, and < 5% in Africans [
1,
3,
4]. Although the precise pathogenesis of IgAN remains unclear, genetic and environmental factors are found to contribute to the susceptibility and disease progression of IgAN [
5].
To date, seven genome-wide association studies (GWASs) for IgAN have been conducted, which have reported dozens of susceptibility loci for IgAN [
5–
11]. Interestingly, most of these loci are either associated with inflammatory bowel disease (IBD) or involved in the maintenance of the intestinal epithelial barrier [
8,
11,
12]. Furthermore, loci associated with IgAN are significantly enriched in the KEGG pathways of intestinal immune network for IgA production and asthma, which suggests that IgAN, IBD, and asthma might share some common etiology [
8]. Indeed, the recently proposed concept of mucosa–kidney axis highlights an essential role of dysregulated mucosal immunity in IgAN [
13,
14]. Moreover, most IgA antibodies, especially galactose deficient IgA1, were produced by the mucosal immune system, including the upper airway and the gut [
15]. Notably, Nefecon, which is a novel enteric targeted-release formulation of budesonide (TRF-budesonide) directed to the mucosa-associated lymphoid tissue to modulate immune cell function and mucosa IgA production, has been applied as a specific treatment of IgAN. Consistent with the importance of intestinal mucosal immunity in the pathogenesis of IgAN, the NEFIGAN (NCT01738035) study has found that TRF-budesonide can reduce proteinuria and the risk of future progression to ESRD in IgAN patients [
16]. However, the identified genetic loci together only explain about 11% of the disease risk of IgAN [
11]. Increasing the sample sizes and genotyping density have been proven useful to identify novel susceptibility loci. Recent methodology advances in the integrative analysis; for instance, genetic analysis incorporating pleiotropy and annotation (GPA) [
17] is powerful and cost effective in identifying the susceptibility loci by combining information from existing GWAS data of multiple related traits and functional annotations.
Previous GWASs of IgAN were mostly based on Chinese and had relatively moderate sample sizes [
5–
10], in contrast to other large-scale GWASs in Europeans, such as IBD and asthma [
18,
19]. In this study, we genotyped 1000 IgAN cases and 1000 healthy controls using the Illumina Infinium Omni2.5-8 BeadChip and obtained individual genotyping data from published IgAN GWASs in Han Chinese. We imputed all samples with a large Asian whole-genome sequencing reference panel [
20] and performed a meta-analysis involving 3616 cases and 10 417 controls across 8 669 456 imputed variants after quality controls (QC). Based on the meta-analysis results, we performed enrichment analysis to identify the specific tissues and cell types relevant to the pathogenesis of IgAN. Finally, we applied the GPA method to discover additional IgAN susceptibility loci by leveraging pleiotropy shared with IBD and asthma, as well as genomic functional annotations from relevant cell types. By comprehensive GWAS and post-GWAS analyses, we aimed to advance the understanding of molecular mechanisms underlying the genetic predisposition of IgAN.
2 Materials and methods
2.1 Study subjects and genotyping data in previous studies
A large GWAS meta-analysis was performed for this study. We collected individual-level genotype data from two published IgAN GWASs of Han Chinese population [
9,
10]. For each dataset, we used PLINK [
21] to perform a series of standard QC: excluding monomorphic single nucleotide polymorphisms (SNPs) and SNPs on sex chromosomes, with missing rate > 0.05 or
P value for Hardy–Weinberg equilibrium (HWE) < 10
−6, and excluding samples with > 0.1 missing genotypes. Details for the two datasets are provided below.
(1) Dataset 1. We downloaded the dataset of Gharavi
et al.’s study [
10] from dbGaP (accession numbers: phs000431.v1.p1 and phs000431.v2.p1), which consisted of 1194 cases and 902 controls genotyped on the Illumina Human 610 Quad BeadChip. After lifting over to GRCh37, 486 809 autosomal SNPs were obtained. QC procedures removed one duplicated SNP and 346 SNPs failing in the HWE test, which resulted in 486 462 SNPs for further analyses.
(2) Dataset 2. The second dataset was from our previous study [
9], including 1434 cases and 10 661 controls genotyped on various Illumina GWAS arrays. We excluded 2098 controls genotyped on the Illumina OmniExpress array, given that the number of overlapping sites across all arrays would be substantially reduced if the OmniExpress array data were included. The number of overlapping autosomal SNPs for the remaining samples was 407 694. We excluded additional nine samples due to missing rate > 0.1 and 564 SNPs failing in HWE tests, which resulted in 1434 cases and 8554 controls genotyped on 407 130 SNPs.
2.2 Genotyped study subjects in the current study
We newly genotyped 1000 IgAN cases and 1000 controls using Illumina Infinium Omni2.5-8 BeadChip, which covered 2 368 775 SNPs across the genome after removing duplicated sites (Dataset 3). All IgAN patients were histopathologically diagnosed by kidney biopsy, and patients with secondary IgAN, including Henoch–Schonlein purpura, systemic lupus erythematosus, HIV infection, or other autoimmune diseases, were removed. Baseline demographic and clinical data for IgAN patients were collected at the time of diagnosis. After removing 15 samples due to > 0.1 missing rate, 10 249 SNPs due to > 0.05 missing rate, 3930 SNPs failing in the HWE test, 759 111 monomorphic SNPs, and 34 030 SNPs on sex chromosomes, the final dataset consisted of 995 cases and 990 controls genotyped on 1 561 455 SNPs.
This study was approved by the Institutional Review Board at the Guangdong Provincial People’s Hospital and the First Affiliated Hospital of Sun Yat-sen University according to the Declaration of Helsinki. Informed consent form was obtained from all study subjects.
2.3 Source of IBD and asthma summary statistics
GWAS summary statistics of asthma for 7 488 535 variants were from the GWAS of 14 085 cases and 76 768 controls of European ancestry from the UK Biobank [
19]. GWAS summary statistics of IBD for 11 555 663 variants were downloaded from the GWAS of 38 155 cases and 48 485 controls of European ancestry [
18].
2.4 Principal component and family relatedness analysis
We extracted 192 162 shared SNPs across the three post-QC IgAN datasets and merged the datasets. We then used PLINK [
21] to prune the SNPs by removing SNPs with linkage disequilibrium (LD) (
r2 > 0.2) within 1 Mb, which left 68 920 independent SNPs with MAF > 0.01 for relatedness check and principal component analysis (PCA). No cryptic relatedness was observed in the first two datasets because relatedness had been removed in the previous studies [
8,
10]. For the newly genotyped samples, we used PLINK to calculate pairwise identity-by-descent, and we randomly removed one individual from each pair with individuals with close relatedness (
). This step removed seven cases and 29 controls, which left 988 cases and 961 controls of Dataset 3. Finally, we performed PCA for the remaining unrelated 3616 cases and 10 417 controls from the three datasets [
21]. The top 10 PCs were visually inspected and used to adjust for population structure in the association analyses.
2.5 Genotype imputation
We performed genotype imputation using an Asian whole-genome sequencing reference panel, which consisted of 4441 Singapore Chinese, Malay, and Indian samples from the SG10K project [
20] and 998 East Asian and South Asian samples from the 1000 Genomes Project (1KGP) [
22]. Our previous study suggested that the combined reference panel of SG10K and 1KGP could improve the accuracy of imputation [
20]. We imputed each of the three genotyping datasets separately. After excluding SNPs that had different alleles from the imputation reference panel, we first phased each genotyping dataset with the imputation reference panel using EAGLE2 [
23]. Then, we performed imputation using Minimac4 [
24]. After removing variants with MAF < 0.005,
P value of HWE < 10
−6, and imputation
R2 < 0.3 using bcftools (version 1.10.2) [
25], 8 669 456 biallelic variants (including SNPs and INDELs) shared by three imputed datasets were retained for downstream analyses.
2.6 Association analysis
For each of the three imputed datasets, we performed single-variant association test between IgAN and the imputed dosage using the Wald test of logistic regression implemented in EPACTS. The top 10 PCs were included as covariates to adjust for the potential population structure. We then combined the results using fixed effect inverse-variance-weighted meta-analysis implemented in METAL [
26]. The LD scores were estimated based on reference genotypes of East Asian (EAS) samples in the 1KGP [
22]. SNPs in the major histocompatibility complex (MHC) region (i.e., chromosome 6, 25–35 Mb) were removed for LD score regression (LDSC). The meta-analysis summary statistics were adjusted for the intercept of LDSC to correct the potential uncorrected confounding induced by population stratification [
27]. FUMA [
28] was used to define the genomic risk loci by clumping and obtain functional information of the SNPs in these loci. We identified the independent significant SNPs that had a
P value < 5 × 10
−8 and were independent from each other at
r2 < 0.1. Except for the MHC region, the genomic risk loci were defined by identifying physical regions in LD with these independent significant SNPs in 1 Mb region.
2.7 Identification of cell types relevant to specific traits
We first performed MAGMA gene-property analysis on the FUMA platform [
28] to identify tissue specificity of IgAN for identifying relevant cell types, which was based on the gene expression data from the GTEx project [
29]. The variant-level
P values were transformed into gene-level Z scores using MAGMA. The association of gene-level Z scores with gene expression in each tissue or cell type was tested to identify relevant tissues or cell types. A Bonferroni correction was applied to account for multiple testing. Then, the stratified LDSC was applied to evaluate if the heritability of a trait was enriched in genomic functional regions specific to particular cell types. We downloaded tracks of GenoSkyline-Plus annotations specific to particular cell types [
30], including 66 tracks used in the original GenoSkyline study [
30] and one additional track for the kidney. A binary annotation was created for each cell type, which indicated whether an SNP resided in the tracks of GenoSkyline-Plus annotations specific to particular cell types. We excluded the MHC region and included 53 baseline annotations as suggested in the stratified LDSC model [
31]. The coefficient
P value was selected as a measure of the association of the cell type with the trait. Specifically, the LD scores were calculated based on the genotypes of EAS sample from 1KGP [
22] for IgAN and European (EUR) samples from 1KGP [
22] for IBD and asthma. After identifying cell types relevant to IgAN by LDSC, we further explored the distribution of IgAN signals in different chromatin states [
32] of the relevant cell types. The enrichment was defined as the ratio of the proportion of significant variants in a certain chromatin state track and the proportion of all variants in the corresponding chromatin state track. The Fisher’s exact test for a 2 × 2 table was used for the enrichment test.
2.8 GPA analysis
We extracted 5 033 499 autosomal SNPs shared by the GWAS summary statistics of IBD [
18], asthma [
19], and IgAN. We first selected the independent (
r2 < 0.1) lead SNP in each locus showing suggestive association with any of the three traits (
P < 10
−5). Then, we selected the approximately LD independent SNPs by thinning SNPs to at least 2 kb apart based on the genotypes of EAS sample from 1KGP [
22], and 945 405 SNPs were left for GPA analysis. We incorporated GenoSkyline-Plus annotation tracks of PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells as the relevant cell type for IgAN, IBD, and asthma in the GPA integrative analysis. The arguments of association pattern could be set as “1” and “*,” which indicate the phenotype of interest and non-interest. For pair-wise analyses between IgAN and IBD and between IgAN and asthma, the pleiotropic test for each variant was conducted by setting the pattern as “11.” However, the association test for IgAN was conducted by setting the pattern as “1*.” Thus, the loci identified by the association test included the pleiotropic loci for two traits and unique risk loci for IgAN. Notably, IBD and asthma GWAS summary statistics were based on European ancestry, while IgAN was from Han Chinese population. Thus, heterogeneity of genetic effect might exist. In addition, the summary statistics have been pruned to be approximately independent to represent the signals. Therefore, we sought to identify the risk regions instead of risk variants in the GPA integrative analysis. We identified the independent loci with variants passing the threshold of false discovery rate (FDR) 0.05 and independent from each other at
r2 < 0.1. The LD information was calculated using reference genotypes of EAS samples in 1KGP [
22].
2.9 Colocalization analyses
For the novel loci identified by GPA, we further performed colocalization analysis using coloc [
33] to detect whether a pair of diseases share a causal variant based on a window of 100 kb of the lead variants. A total of five hypotheses were assumed, including the H4 that a causal variant was shared. A pair of diseases were considered to be colocalized if the posterior probability of H4 (PP4) was larger than 0.7. The colocalization results were plotted using the Locus Compare R package [
34].
2.10 Analysis of clinical phenotypes and prognosis of IgAN
The top variants at the seven novel loci were selected to study their associations with the clinical phenotypes. The continuous variables were shown as median (interquartile range), and the category variables were expressed as number (percentage). For the genotype–phenotype association analysis, we used the logistic regression for binary and ordinal variables and linear regression for continuous variables. The Cox proportional hazard model was performed for the survival analysis. In addition, we calculated the polygenic risk score (PRS) of IgAN using the significant and independent variants at the loci identified by GWAS meta-analysis and GPA integrative analysis. The SNP information used in PRS is shown in Table S1. Then, we tested the associations between the clinical phenotypes and PRS of IgAN. All analysis was performed using R version 4.1.3.
2.11 eQTL and functional annotation analysis
The eQTL analysis was based on the GTEx website. The Functional Mapping and Annotation (FUMA [
28]) and Ensembl were also used to annotate the variants at the susceptibility loci. In addition, PolyPhen-2 and SIFT were used to investigate the potential effect of missense mutation on protein function. We also checked the functional annotation according to the CADD, RegulomeDB, and chromatin looping database.
3 Results
3.1 GWAS meta-analysis of IgAN in Han Chinese
We obtained Chinese GWAS data of IgAN from three sources: (1) published study by Gharavi
et al. [
10]; (2) our previous study by Li
et al. [
8]; and (3) newly genotyped samples in the present study. After QC (see details in “Materials and methods” section , Fig. S1), the sample sizes were 1194 cases and 902 controls in Dataset 1, 1434 cases and 8554 controls in Dataset 2, and 988 cases and 961 controls in Dataset 3, with a total of 3616 cases and 10 417 controls in the meta-analysis. PCA found no obvious population stratification between cases and controls (Fig. S2). GWAS analysis with the top 10 principal components was applied to each dataset separately (Fig. S3), followed by inverse variance weighted meta-analysis for the 8 669 456 shared variants (Fig.1) [
26]. The intercept of LDSC was 1.031 (standard error (SE) = 0.008, Fig. S4), which indicates that the population structure was well controlled and the inflation was mostly due to polygenicity [
27].
Fig.1 Manhattan plot of the IgAN GWAS meta-analysis in Han Chinese. P values were adjusted for the intercept of LDSC. The red dash line indicates P = 5 × 10−8, and the gray dash line indicates P = 10−5. Known IgAN loci are labeled in black, while novel loci are labeled in red. |
Full size|PPT slide
We identified six independent loci for IgAN (
Pmeta < 5 × 10
−8, LD
r2 < 0.1), including five reported loci:
HLA gene cluster at the MHC region,
DEFA at 8p23.1,
ITGAX-ITGAM at 16p11.2,
TNFSF13 at 17p13.1, and
HORMAD2 at 22q12.2 (Fig.1, Tab.1). Among non-MHC loci, we identified two novel missense variants at previously reported loci, which are rs2230429 (Pro517Arg,
Pmeta = 1.33 × 10
−8) at
ITGAX and rs11552708 (Gly67Arg,
Pmeta = 7.66 × 10
−7) at
TNFSF13 (Table S2). We also examined the
r2 with two previously reported variants (rs1150612 and rs3803800) using the HaploReg v4.2 database, which showed that the rs2230429 was in strong LD with the rs1150612 (
r2 = 0.93) while the rs11552708 had no LD with the rs3803800. In particular, rs2230429 was predicted to be deleterious by PolyPhen2 [
35] and SIFT [
36] (Table S2). Furthermore, we identified one novel locus at 4p14 with the lead SNP rs11736377 (
Pmeta = 4.17 × 10
−8) located 17 kb upstream of a long noncoding RNA (lncRNA)
LOC101060498 (Fig. S5). In our study, we also checked the previously reported loci of published IgAN GWASs [
5–
11]. We found that 58 of 60 variants were detected in our data and showed consistent effect direction, of which 52 had
P < 0.05 (Table S3). These results suggest that our GWAS is consistent with those in previous studies.
Tab.1 Significant loci associated with IgAN by GWAS meta-analysis |
Locus | Lead varianta | Dataset | AFb | OR (95% CI)c | P value |
4p14LOC101060498 | rs11736377 | Dataset 1 | 0.617 | 0.88 (0.76, 1.00) | 5.49 × 10−2 |
chr4:40301264 | Dataset 2 | 0.593 | 0.77 (0.71, 0.84) | 4.58 × 10−9 |
T/C | Dataset 3 | 0.601 | 0.94 (0.83, 1.07) | 3.64 × 10−1 |
Intergenic | Meta | | 0.83 (0.78, 0.89) | 4.17 × 10−8 |
MHC | rs9270599 | Dataset 1 | 0.554 | 1.41 (1.24, 1.60) | 1.78 × 10−7 |
chr6:32561656 | Dataset 2 | 0.572 | 1.78 (1.56, 2.02) | 1.42 × 10−18 |
A/G | Dataset 3 | 0.631 | 1.47 (1.29, 1.69) | 2.47 × 10−8 |
Upstream | Meta | | 1.55 (1.43, 1.67) | 3.94 × 10−28 |
8p23.1DEFA | rs11137085 | Dataset 1 | 0.543 | 1.35 (1.18, 1.54) | 1.17 × 10−5 |
chr8:6877468 | Dataset 2 | 0.504 | 1.30 (1.19, 1.41) | 6.14 × 10−9 |
C/G | Dataset 3 | 0.505 | 1.39 (1.22, 1.59) | 1.13 × 10−6 |
Upstream | Meta | | 1.33 (1.25, 1.42) | 3.00 × 10−17 |
16p11.2ITGAX-ITGAM | rs13332545 | Dataset 1 | 0.226 | 0.83 (0.72, 0.96) | 1.44 × 10−2 |
chr16:31377390 | Dataset 2 | 0.293 | 0.80 (0.72, 0.89) | 6.26 × 10−5 |
C/T | Dataset 3 | 0.317 | 0.78 (0.68, 0.89) | 3.45 × 10−4 |
Intron | Meta | | 0.80 (0.75, 0.86) | 1.29 × 10−8 |
17p13.1TNFSF13 | rs11078696 | Dataset 1 | 0.680 | 0.84 (0.72, 0.97) | 1.93 × 10−2 |
chr17:7459299 | Dataset 2 | 0.641 | 0.82 (0.74, 0.90) | 2.17 × 10−5 |
T/G | Dataset 3 | 0.632 | 0.72 (0.63, 0.83) | 6.78 × 10−6 |
Intron | Meta | | 0.80 (0.75, 0.86) | 4.63 × 10−10 |
22q12.2HORMAD2 | rs12537 | Dataset 1 | 0.230 | 0.73 (0.63, 0.85) | 3.09 × 10−5 |
chr22:30423460 | Dataset 2 | 0.177 | 0.72 (0.64, 0.81) | 5.94 × 10−8 |
T/C | Dataset 3 | 0.168 | 0.79 (0.66, 0.94) | 6.51 × 10−3 |
3'UTR | Meta | | 0.74 (0.68, 0.80) | 1.37 × 10−12 |
3.2 Identification of relevant tissues and cell types for IgAN
Based on the GWAS summary statistics, we conducted MAGMA gene-property analysis to identify relevant tissues for IgAN [
28]. Compared with gene expression profiles specific to 54 different tissues from the GTEx project [
37], IgAN GWAS signals showed significant enrichment (
P = 5.46 × 10
−6) in genes expressed in the whole blood, rather than the kidney-related tissues (e.g., kidney cortex and kidney medulla,
P > 0.05) (Fig.2). Next, we conducted stratified LDSC to pinpoint specific cell types relevant to IgAN, IBD, and asthma. Among GenoSkyline-Plus annotation tracks specific to 67 different cell types [
30], PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells were the only significant cell type for IgAN after multiple test correction (
P = 7.05 × 10
−5) and ranked the third for IBD (
P = 3.26 × 10
−4) and asthma (
P = 8.09 × 10
−5) (Fig.2). While most of the IgAN-associated variants were intergenic or intronic, they were significantly enriched in regions with regulatory function, including the flanking active transcription start site (TssAFlnk), enhancers (Enh), active transcription start site (TssA), and weak repressed PolyComb (ReprPCWk), among 14 chromatin states in PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells (
P < 0.05/14; Fig. S6). Furthermore, among 28 baseline annotations in the stratified LDSC, the heritability of IgAN was significantly enriched in super enhancer and H3K27ac-annotated regions (
P < 0.05/28; Fig. S7).
Fig.2 Identification of tissues and cell types relevant to specific traits. (A) Relevant tissues of IgAN identified by MAGMA. The x-axis indicates 54 tissues from the GTEx Project, and the y-axis represents –log10(P values) of enrichment test based on tissue-specific gene expression profiles. (B) Relevant cell types of IgAN, IBD, and asthma identified by the stratified LDSC. The x-axis indicates –log10(P values) of the heritability enrichment test, and the y-axis indicates 67 cell types from the GenoSkyline-Plus annotation tracks. Dashed lines indicate the Bonferroni-corrected significance cutoffs (A, 0.05/54 = 0.00093; B, 0.05/67 = 0.00075). |
Full size|PPT slide
3.3 Integrative analysis of IgAN, IBD, and asthma
The GPA method can boost the statistical power to identify IgAN signals by integrating pleiotropic information from GWAS of relevant traits and functional annotation from relevant cell types [
17]. We performed GPA analyses for IgAN and IBD, as well as IgAN and asthma, both with functional annotation from the PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells. In the joint analysis with IBD, we identified eight pleiotropic loci associated with IgAN and IBD, including two novel loci, namely,
PNKD at 2q35 (rs1473901, false discovery rate FDR
11 = 0.013, where the subscript 11 denotes pleiotropic association with both diseases) and
UBE2L3 at 22q11.21 (rs1811069, FDR
11 = 0.043) (Fig.3). Notably, six IgAN GWAS loci, including the newly reported loci
ZMIZ1 in the recent IgAN GWAS [
11], showed significant pleiotropic effects with IBD. Furthermore, two more loci showed significant association specific to IgAN, including
TXNL4B at 16q22.2 (rs77303550, FDR
1* = 0.045, where the subscript 1* denotes association with IgAN) and
LINC00113 at 21q21.3 (rs1811069, FDR
1* = 0.032) (Fig.3).
Fig.3 IgAN risk loci identified by GPA analysis. (A) Manhattan plot of the pleiotropy test between IgAN and IBD. (B) Manhattan plot of the IgAN association test in the joint analysis of IgAN and IBD. (C) Manhattan plot of the pleiotropy test between IgAN and asthma. (D) Manhattan plot of the IgAN association test by the joint analysis of IgAN and asthma. The red dash line indicates the genome-wide significance level at FDR = 0.05. Gene names were labeled in black if they were significant in our IgAN meta-analysis or previous studies, in red if they were novel loci identified by the pleiotropic test, and in orange if they were novel IgAN specific loci. FDR11, false discovery rate of the pleiotropy test; FDR1*, false discovery rate of the association test. |
Full size|PPT slide
In the joint analysis with asthma, we identified three pleiotropic loci associated with IgAN and asthma (Fig.3), including one novel loci as SCAF8 at 6q25.2 (rs10872710, FDR11 = 0.036). In addition, we identified one novel locus specific to IgAN: TRAF3 at 14q32.32 (rs8004192, FDR1* = 0.028) (Fig.3).
In summary, we identified six novel loci for IgAN by integrative analysis with GWAS summary statistics of IBD and asthma (Tab.2). The original GWAS results at the six loci are presented in Fig.4. The three pleiotropic loci showed similar association peaks in both diseases (Fig.4–4C), and they were colocalized with PP4 > 0.7 (Fig. S8A–S8C). Specifically, PNKD at 2q35 and UBE2L3 at 22q11.21 were significant GWAS loci for IBD (Fig.4 and 4B). Although SCAF8 at 6q25.2 did not reach genome-wide significance in the original asthma GWAS, it displayed a noticeable regional peak (Fig.4). Finally, we observed regional peaks for IgAN, but these peaks were less obvious for IBD or asthma at the three novel loci specific to IgAN in the GPA analyses (Fig.4–4F). Moreover, no colocalized signal was observed (Fig. S8D–S8F).
Tab.2 Novel loci identified by GPA integrative analysis for IgAN |
Loci | Type | FDRa | Lead SNP | PIgAN | PIBD | PAsthma | Genes |
2q35 | IgAN and IBD | 0.013 | rs1473901 | 2.05 × 10−5 | 5.35 × 10−8 | 0.017 | PNKD |
22q11.21 | IgAN and IBD | 0.043 | rs1811069 | 1.68 × 10−4 | 6.21 × 10−10 | 0.331 | UBE2L3 |
6q25.2 | IgAN and asthma | 0.036 | rs10872710 | 3.32 × 10−5 | 0.655 | 1.33 × 10−4 | SCAF8 |
16q22.2 | IgAN | 0.045 | rs77303550 | 2.10 × 10−5 | 0.004 | 0.023 | TXNL4B |
21q21.3 | IgAN | 0.032 | rs80271593 | 8.60 × 10−6 | 0.007 | 0.418 | LINC00113 |
14q32.32 | IgAN | 0.028 | rs8004192 | 3.20 × 10−5 | 0.663 | 0.002 | TRAF3 |
Fig.4 Region plots of the original GWAS association P values for the six novel risk loci identified by GPA analysis. (A) Pleiotropic locus at 2q35 for IgAN and IBD. (B) Pleiotropic locus at 22q11.21 for IgAN and IBD. (C) Pleiotropic locus at 6q25.2 for IgAN and asthma. (D) IgAN specific locus at 16q22.2 by joint analysis of IgAN and IBD. (E) IgAN specific locus at 21q21.3 by joint analysis of IgAN and IBD. (F) IgAN specific loci at 14q32.32 by joint analysis of IgAN and asthma. At each locus, the lead variant in GPA analysis is indicated by the purple diamond, while neighboring variants are colored by their LD with the lead variant based on East Asians in the 1000 Genomes Project (color bar shown in panel A). |
Full size|PPT slide
3.4 Association with clinical characteristics and prognosis of IgAN patients
We collected the clinical and pathological characteristics of 2418 IgAN patients in Datasets 2 and 3 (Table S4). Then, we investigated the associations of clinical phenotypes with the lead variants of the seven novel loci and the PRS. We observed that the protective allele C of rs1473901 at PNKD was associated with lower level of serum IgA (P = 0.017), and the protective allele T of rs77303550 at TXNL4B was correlated with milder proteinuria (P = 0.043) (Table S5). In addition, the PRS was found to be closely related with the increased proteinuria (P = 0.012), lower eGFR (P = 0.031), decreased C3 levels (P = 0.029), severe mesangial hypercellularity (P = 5.2 × 10−5), and segmental glomerulosclerosis (P = 0.039), respectively (Table S6). Follow-up data were available for 582 IgAN patients with a median follow-up time of 25.6 months, among whom 73 reached the endpoint of ESRD or doubled the serum creatinine after diagnosis. We applied Cox regression with adjustment for age, gender, hypertension, proteinuria, and eGFR (Tables S7 and S8). We found the correlation of the PRS with prognosis of IgAN (P = 0.011). No SNPs were associated with the prognosis of IgAN.
3.5 Functional investigation of the novel loci and variants
The seven novel loci identified by the present study included five protein coding genes and two long noncoding RNA genes, of which three lead variants were intergenic and four were intronic (Tab.3). The lead variants at four novel loci are expression quantitative trait loci reported by the GTEx Project (Table S9). We also checked the functional annotation by using the CADD, RegulomeDB, and chromatin looping database (Table S10). Moreover, the variants that are highly correlated (r2 > 0.7) with the seven lead SNPs and nominally significant (P < 0.05) at each novel locus were annotated based on the FUMA platform (Table S11). For example, rs1811069 at 22q11 is associated with the expression of UBE2L3 in the whole blood, mucosa, and skin, while rs8004192 at 14q32 is associated with the expression of TRAF3 in the mucosa of esophagus, brain, and thyroid. Notably, UBE2L3 has been reported to associate with several immune diseases and involve in immune response pathways of which alterations can trigger autoimmune diseases. TRAF3 plays an important role in T cell development and T cell dependent immune responses (Tab.3).
Tab.3 Functional annotations of novel loci and associated SNPs |
SNP/Locus | eQTLa | Associated diseasesb | Gene function |
rs11736377, 4p14, intergenic variant close to LOC101060498 | No evidence for effects on gene expression levels | Vitiligo, Graves’ disease, psoriasis, and autoimmune traits | LOC101060498 is chromatin interacted with the adjacent immune-related gene RHOH, which is important for the development, migration, and signaling of T cells [40]. The involvement of RHOH in autoimmune-related diseases might be due to its ability to regulate T helper cell-induced cytokine production, especially in the Th17-cell differentiation [40] |
rs1473901, 2q35, intron variant of PNKD | Associated with expression level of PNKD in blood, thyroid, skin, spleen, and liver | Inflammatory bowel disease, hematological traits (i.e., granulocyte count) | PNKD plays a role in regulation of myofibrillogenesis. The dysfunctional PNKD can reduce glutathione levels in cells and result in increasing oxidative stress levels, which has been linked to inflammation [69] |
rs10872710, 6q25.2, intron variant of SCAF8 | No evidence for effects on gene expression levels | Estimated glomerular filtration rate and diabetic nephropathy | SCAF8 is the anti-terminator protein required to prevent early mRNA termination during transcription. SCAF8 can suppress the use of early, alternative poly(A) sites, which prevents the accumulation of nonfunctional truncated proteins. SCAF8 also acts as a positive regulator of transcript elongation [70] |
rs1811069, 22q11.21, intergenic variant close to UBE2L3 | Associated with expression level of UBE2L3 in blood, mucosa, and muscularis of esophagus and skin | Inflammatory bowel disease, hepatitis B, systemic lupus erythematosus, and rheumatoid arthritis | UBE2L3 encodes a member of the E2 ubiquitin-conjugating enzyme family involved in ubiquitin/proteasome-dependent degradation. The ubiquitin/proteasome pathway is involved in the development of multiple kidney diseases [53] |
rs8004192, 14q32.32, intron variant of TRAF3 | Associated with expression level of TRAF3 in mucosa of esophagus, brain, and thyroid | Hematological traits, serum IgA level, systemic lupus erythematosus, multiple sclerosis, eczema, and asthma | TRAF3 encoding a member of the TNF receptor associated factor protein family participates in the signal transduction of CD40 and plays a role in T cell dependent immune responses and the regulation of antiviral responses [61]. TRAF3 is required for normal signaling by the T cell antigen receptor, and cytoplasmic TRAF3 restrains nuclear factor κB activation in T and B cells [71] |
rs77303550, 16q22.2, intron variant of TXNL4B | Associated with expression level of TXNL4B in cultured fibroblasts | Cardiovascular disease and type II diabetes mellitus | TXNL4B plays an essential role in pre-mRNA splicing and is required in cell cycle progression for S/G(2) transition [72] |
rs80271593, 21q21.3, intergenic variant close to LINC00113 | No evidence for effects on gene expression levels | Chronic renal failure | LINC00113 promotes proliferation, survival, and migration by activating PI3K/Akt/mTOR signaling pathway in atherosclerosis [73]. It is downregulated in clear cell renal cell carcinoma samples compared with normal samples [74] |
4 Discussion
We have conducted a large genome-wide meta-analysis of IgAN in Han Chinese, involving 3616 cases and 10 417 controls. Followed by integrative analysis with GWAS summary statistics of IBD and asthma, we have discovered seven novel loci associated with IgAN, including one at 4p14 identified by meta-analysis and six by the GPA integrative analyses. Moreover, we have pinpointed the PMA-I-stimulated CD4+CD25–IL17+ Th17 cells as the relevant cell type for IgAN. Our results imply that mucosal immunity and inflammation, especially T cell immune response and IL-17 signal pathway, play important roles in the pathogenesis of IgAN.
In the GWAS meta-analysis, we identified a novel locus at 4p14 with the lead SNP rs11736377 residing at 17 kb upstream of the lncRNA gene
LOC101060498. Epigenomic annotations show that rs11736377 is in the active enhancer region of PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells [
32], which suggests that rs11736377 may regulate the expression of genes nearby. While the function of
LOC101060498 has not been fully studied in different tissues and cell types, recent three-dimensional genomic studies suggest that immune-related genes can interact with lncRNAs and form immune gene-priming lncRNAs for robust transcription [
38]. Specifically, the Hi-C data of CD4 T cells [
39] show that
LOC101060498 is chromatin interacted with the adjacent immune-related gene
RHOH, which is involved in the T cell receptor-mediated signal transduction and plays an important role in T cell development and activation [
40]. In addition, the lack of RhoH can stimulate T cell differentiation into Th17 cells in psoriasis-like chronic dermatitis [
41]. RhoH has also been implicated in several other autoimmune diseases, such as psoriasis and systemic lupus erythematosus, by regulating T helper cell-induced cytokine production [
40].
Remarkably, we found that GWAS signals of IgAN were significantly enriched in the functional regions of whole blood and PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells. Interestingly, a recent GWAS study of IgAN also identified that PMA-I-stimulated primary T helper cells were one of the top enriched cell types in IgAN, which is consistent with our results [
11]. While recent studies have highlighted the role of B cells and complement activation in IgAN, the essential role of T cells in the pathogenesis of IgAN has gained increasing attentions [
42]. Th17 cells, which are characterized by the production of several cytokines (IL-17A, IL-17F, IL-22, and GM-CSF) and high CCR6 expression, played an important role in the early response to bacterial pathogens and in autoimmune diseases, such as nephritis and asthma [
42,
43]. As previously reported, the proportion of Th17 cell and the serum level of IL-17A were significantly higher in IgAN patients, and they were also associated with 24h proteinuria [
44–
46]. Furthermore, expression of IL-17A was found in renal tubule of IgAN patients with decreased renal function, severe proteinuria, and serious tubulointerstitial injury [
47]. In addition, IL-17 could downregulate the mRNA expression of C1GALT1 and C1GALT1C1, which led to increased production of galactose deficient IgA1 [
48]. Overall, Th17 cells and the serum level of IL-17 may be the aggravating factors in IgAN.
In addition to IgAN, cell type enrichment analysis also suggested that PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells were relevant to IBD and asthma. In IBD, Th17 cells are proinflammatory by secreting cytokines, recruitment of neutrophils, or transformation into Th1 cells [
49]. In asthma, Th17 cytokines, such as IL-17A and IL-22, can promote neutrophil recruitment in the airways and are involved in airway remodeling and hyperresponsiveness via binding to IL-17RA and IL-17RC in airway smooth muscle cells [
50]. Drugs have been developed to treat IBD and asthma by targeting Th17 cells. For example, ustekinumab targeting Th17 differentiation has been used to treat IBD, and secukinumab targeting IL-17A is in phase II clinical trials for treating asthma (NCT01478360). Consistent with our results, a recent study has reported that abnormal humoral immunity mediated by Th17 cells may be a pathogenic mechanism shared by IgAN and IBD [
51]. Given the potential shared immunopathogenesis related to Th17 cells, treatment targeting the imbalance of the T cells compartment may be desirable for IgAN.
Based on a limited number of significant IgAN GWAS loci, previous studies have reported shared pleiotropic loci between IgAN and many other complex traits and diseases [
8]. By contrast, we performed GPA pleiotropy analysis based on GWAS summary statistics across the whole genome. Three novel loci (i.e.,
PNKD,
UBE2L3, and
SCAF8) of 11 loci showing pleiotropic effect between IgAN and IBD and between IgAN and asthma were identified by pleiotropic analyses. Notably, two pleiotropic loci between IgAN and asthma, namely, S
CAF8 and
OVOL1, were not associated with asthma in the GWAS we used. However, they have been reported to significantly associate with asthma in a recent study with a larger sample size [
52], which supports the validity and effectiveness of the integrative analysis approach we used. Among the pleiotropic loci,
UBE2L3 and
ZMIZ1 have pleiotropic effects on IgAN and IBD.
UBE2L3 encodes a member of the E2 ubiquitin-conjugating enzyme family, which is involved in ubiquitin/proteasome-dependent degradation. The ubiquitin/proteasome pathway has been implicated in the development of multiple kidney diseases [
53]. Proteasome inhibitors, such as bortezomib, have been efficacious in some renal disorders [
54–
56]. Moreover, elevated expression and activity of UBE2L3 might trigger autoimmune diseases by altering immune response pathways, such as the nuclear transcription factor κB signaling pathway, GSK3b/p65 signaling pathway, and p53 signaling pathway [
53].
ZMIZ1 encodes a member of the PIAS (protein inhibitor of activated STAT) family of proteins, which play critical roles in the immune system by influencing T cell development and regulating TGF-β/Smad pathway [
57]. In fact, genetic variants in
UBE2L3 and
ZMIZ1 have been reported to associate with several immune or inflammatory disease [
58–
60]. The GWAS of IgAN by Kiryluk
et al. has also reported
ZMIZ1 and
OVOL1 as the novel loci of IgAN, which provided evidence to support our results [
11]. These pleiotropic analysis results, together with the cell-type specific analysis, highlight the involvement of IL-17 signal pathway and T cell immunity in the pathophysiology of IgAN.
Besides pleiotropic loci, our integrative analyses identified three other novel loci at 14q32.32, 16q22.2, and 21q21.3 with specific associations with IgAN, potentially due to gain in statistical power by incorporating functional annotations. Notably,
TRAF3 at 14q32.32 encodes a TNF receptor-associated factor that participates in the signal transduction of CD40 and in the regulation of T cell dependent immune responses and antiviral responses [
61]. Genetic variants in
TRAF3 have been reported to associate with hematological traits [
62], serum IgA level [
63], and asthma [
64]. The two other loci, namely,
TXNL4B at 16q22.2 and
LINC00113 at 21q21.3, are not directly related to immune functions. By contrast, they have been reported to associate with cardiovascular disease [
65], type 2 diabetes [
66], and chronic renal failure [
67].
In conclusion, through comprehensive GWAS meta-analysis and post-GWAS enrichment and integrative analysis, we have pinpointed PMA-I-stimulated CD4
+CD25
–IL17
+ Th17 cells as the most relevant cell type for IgAN and identified seven novel and biologically relevant loci in association with IgAN (Tab.3). However, some limitations of our study should be considered. First, we did not conduct replication analysis or experimental validation of the novel loci, despite their important roles in T cell immune response and inflammation. In the future, we will conduct further validation through another large independent population. Second, the GWAS summary statistics of IBD and asthma were based on European populations while the IgAN GWASs were based on Chinese population. Although using GWASs from the same population is ideal for GPA analysis, we suggest that most of the underlying causal variants are shared across populations, as demonstrated by several recent multi-ethnic GWASs of complex diseases [
68]. Third, the identification analysis of cell types relevant to specific traits may be restricted by the functional annotations of available tissues and cell types, although multiple immune cell types have been explored in our study. Moreover, we have pruned the variants to be approximately independent in the GPA analysis to minimize the effect of LD difference across populations. Despite these limitations, our findings highlight the involvement of T cell immunity, especially Th17 cells, in the development of IgAN. Our results also provide critical information for future functional studies to advance understanding of the etiology of IgAN.
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}