1 Introduction
Pelvic organ prolapse (POP), characterized by the descent of female pelvic organs (vagina, uterus, bladder, and rectum) into the vaginal cavity, significantly impacts women’s quality of life [1]. Although POP can affect women of all ages, its prevalence increases with advancing age. It affects approximately 40% of postmenopausal women [2] and is the main indication for hysterectomy in these women, accounting for one-sixth of the total number of hysterectomy surgeries in all age groups [3]. Up to 19% of women are at risk of undergoing POP surgery during their lifetime [4,5].
Many risk factors for POP have been identified, including age, body mass index (BMI) [6], parity, and vaginal delivery [1]. Genetic predisposition is also a significant factor, contributing to approximately 43% of POP variance [7]. Several genome-wide association studies (GWASs) have pinpointed loci associated with POP [8–11]. A recent large-scale GWAS, covering 28 086 cases and 546 291 controls of European ancestry, reported independent lead signals and unveiled the first polygenic risk score (PRS) for POP [2]. PRS, a calculated metric that quantifies an individual’s genetic predisposition to a specific disease by summing the effects of disease-related genetic variants, has been widely used as a tool in risk prediction and stratification [12]. However, such research has been predominantly confined to European ancestries, leaving a gap in Asian polygenic studies on POP. Additionally, clinical models that integrate genetic and nongenetic factors to predict the risk of developing POP comprehensively are lacking.
In this study, we developed a PRS for POP in Chinese women based on well-established GWAS summary statistics to assess its transferability and effectiveness in risk stratification when combined with clinical risk factors (age, BMI, and parity). A clinical model was built for the quick assessment of POP risk by integrating PRS with nongenetic features, thereby enhancing its practicality for clinical use.
2 Materials and methods
2.1 Patient recruitment
We initially recruited women on the basis of our previous population-based, cross-sectional, multistage, and multicenter epidemiological survey from February 2014 to March 2016 [13]. This survey encompassed more than 54 000 women from six major geographical regions of the Chinese mainland (North-East, North, East, South Central, North-West, and South-West).
For the discovery cohort, we identified and recruited 576 cases of severe POP from the national survey. These cases were diagnosed with stage III or IV POP by the Pelvic Organ Prolapse-Quantification (POP-Q) system. The controls were selected based on a diagnosis of POP-Q stage 0 and had no history of POP surgery. We finally recruited 623 controls from the Physical Examination Center.
For the external validation cohort, we recruited an independent group of 264 women diagnosed with POP and 200 healthy women from the Chinese mainland from May 2015 to October 2017. The diagnostic criteria were the same as above.
Ethical approval was obtained from the Research Ethical Committee (S-689 and JS-1744) of Peking Union Medical College Hospital. All participants signed informed consent before the collection of their personal information and samples. Demographic characteristics, including race, age, height, weight, residence, parity, and general medical history, were recorded for all women.
2.2 Exclusion criteria
We excluded pregnant women, women who had undergone treatment for pelvic floor disorders, and women diagnosed with gynecological malignancy. We also excluded women with well-known connective tissue disorders that could predispose to tissue weakness, e.g., Marfan syndrome, Ehlers-Danlos syndrome, Steinert syndrome, and rheumatoid arthritis. Women with neurological conditions potentially leading to incontinence, such as multiple sclerosis or stroke [8], were also excluded. Because genetic variants differ between racial and ethnic groups, we restricted recruitment to the majority ethnic group (i.e., all participants were of Han Chinese origin) to maintain genetic homogeneity and minimize the effects of population stratification.
2.3 Genotyping, quality control, and imputation
DNA samples were extracted from peripheral blood. The discovery cohort and the controls of the external validation cohort were genotyped with the Affymetrix Axiom Genome-Wide CHB 1 Array Plate (Affymetrix, San Diego, CA, USA), which was designed to cover common alleles (minor allele frequency (MAF) > 5%) in the Han Chinese population. Cases in the external validation cohort were genotyped with a customized Illumina Infinium Asian Screening Array (Illumina, San Diego, CA, USA), which was built with an East Asian reference panel.
Quality control was conducted with PLINK (v1.90b6.24 64-bit, 6 Jun 2021). Variants with a minor allele frequency of < 0.01, a genotype call rate of < 95%, or a Hardy–Weinberg equilibrium P value of < 1 × 10⁻⁶ were removed. Imputation was performed using Minimac4 software at the Michigan Imputation Server, with 1000 Genomes Phase 3 v5 (GRCh37/hg19) as the reference panel and EAS (East Asian) as the reference population. Imputed variants with an r² value of < 0.3 were removed, as recommended, to exclude poorly imputed variants.
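The variant-level thresholds described above can be sketched as a simple filter. This is only an illustration of the stated cutoffs, not the PLINK procedure itself; the function names and the dictionary layout of the per-variant statistics are assumptions made for this example, and the statistics are taken as precomputed inputs.

```python
# Thresholds taken from the quality-control description in the text.
MAF_MIN = 0.01        # minimum minor allele frequency
CALL_RATE_MIN = 0.95  # minimum genotype call rate
HWE_P_MIN = 1e-6      # minimum Hardy-Weinberg equilibrium P value


def passes_qc(maf, call_rate, hwe_p):
    """Return True if a variant survives all three filters."""
    return maf >= MAF_MIN and call_rate >= CALL_RATE_MIN and hwe_p >= HWE_P_MIN


def filter_variants(variants):
    """Keep variants that pass QC.

    `variants` is a list of dicts with the hypothetical keys
    'id', 'maf', 'call_rate' and 'hwe_p'.
    """
    return [
        v for v in variants
        if passes_qc(v["maf"], v["call_rate"], v["hwe_p"])
    ]


if __name__ == "__main__":
    # Hypothetical variants, one failing each filter in turn.
    demo = [
        {"id": "varA", "maf": 0.20, "call_rate": 0.99, "hwe_p": 0.50},
        {"id": "varB", "maf": 0.005, "call_rate": 0.99, "hwe_p": 0.50},
        {"id": "varC", "maf": 0.20, "call_rate": 0.90, "hwe_p": 0.50},
        {"id": "varD", "maf": 0.20, "call_rate": 0.99, "hwe_p": 1e-8},
    ]
    print([v["id"] for v in filter_variants(demo)])
```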
2.4 PRS calculation
We used the well-established GWAS mentioned above, which is also the largest GWAS meta-analysis for POP, as the source of summary statistics to calculate a disease-specific PRS [2]. We excluded single nucleotide polymorphisms (SNPs) with an allele frequency of less than 1% in the East Asian population (Table S1). Then, we constructed the PRS from the remaining SNPs that were significantly associated with POP (P < 5 × 10⁻⁸) and their effect sizes [2] using PRSice-2 software [14] (v2.3.5). Principal component analyses were conducted separately in the discovery and validation cohorts. Age and the top three principal components (PC1–3) were used as covariates to adjust the PRS (Supplementary Material).
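As a minimal sketch of the idea behind a PRS, the score for one individual can be written as the sum of effect-allele dosages weighted by the GWAS effect sizes. The function name, variant identifiers, and weights below are hypothetical, and PRSice-2 may scale or average this sum differently depending on its settings, so this is not a reproduction of the software's output.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant id -> effect-allele dosage (0 to 2)
    weights: dict mapping variant id -> GWAS effect size (log odds ratio)
    Variants without a genotype for this individual are skipped.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # missing genotype: contributes nothing in this sketch
        score += beta * dosage
    return score


if __name__ == "__main__":
    # Hypothetical weights and genotypes for illustration only.
    weights = {"varA": 0.10, "varB": -0.05, "varC": 0.08}
    person = {"varA": 2, "varB": 1}  # varC is missing
    print(polygenic_risk_score(person, weights))
```

Adjustment for covariates such as age and principal components, as described above, would be applied to the resulting scores in a separate step.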
2.5 Clinical model building and validation
In our discovery cohort, we developed a clinical prediction model using linear regression. The model integrated key variables, including PRS, age, parity, and BMI, and its robustness was assessed through 10-fold cross-validation. We applied a fast backward variable selection method to refine the model, thereby ensuring the inclusion of the most relevant factors. Then, the model’s external validity was assessed using our external validation cohort. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were generated to evaluate the model’s discriminative ability. Additionally, we produced calibration curves using the bootstrapping method to assess the agreement between predicted and observed risk. A nomogram was constructed as a practical tool for visualizing and quantifying individual risk based on these factors.
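For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it directly from that definition; the function name and the toy data are assumptions for illustration, and this pairwise approach is not the R routine used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: predicted risks (higher means more likely to be a case)
    labels: 1 for case, 0 for control
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy predictions: three of the four case-control pairs are ordered correctly.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```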
2.6 Statistical analysis
Statistical analysis and result visualization were performed in R version 4.0.3 (2020-10-10). The Mann–Whitney U test was used to compare the distributions of continuous variables between the case and control groups. The chi-square test was used to test for associations between categorical variables and case or control status. The Wilcoxon rank sum test was employed for subsequent subgroup analysis. Two-tailed P values were reported, and P < 0.05 was defined as statistically significant.
3 Results
3.1 Sample characteristics
In the discovery cohort, we initially recruited 576 cases and 623 controls. After quality control, 523 cases and 512 controls remained (Fig. S1). The average age of the case group was 63.23 ± 11.50 years, with an average BMI of 24.34 ± 3.10 kg/m². The median parity for this group was 2 (range, 1 to 9). For the control group, the average age was 60.54 ± 8.64 years, and the average BMI was 24.01 ± 3.22 kg/m², with a median parity of 1 (range, 0 to 7). To examine the relationship of PRS with classical clinical risk factors (age, BMI, and parity), all participants in the discovery cohort were divided into two groups by age (< 50 years old vs. ≥ 50 years old) and by parity (≤ 1 vs. > 1). For BMI, we classified participants into four groups following the criteria from the Centers for Disease Control and Prevention: underweight (< 18.5 kg/m²), normal (18.5–24.9 kg/m²), overweight (25–29.9 kg/m²), and obese (≥ 30 kg/m²). Demographic details of the discovery cohort and the external validation cohort are provided in Tab.1.
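The BMI grouping above can be expressed as a small helper. This is an illustrative sketch with a hypothetical function name; it treats the category boundaries as half-open intervals (for example, normal is 18.5 up to but not including 25), which is an assumption made here to avoid gaps between the ranges quoted in the text.

```python
def bmi_category(weight_kg, height_m):
    """Classify BMI (kg/m^2) into the four groups used in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # 60 kg at 1.60 m gives a BMI of about 23.4 kg/m^2.
    print(bmi_category(60, 1.60))
```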
3.2 Variant selection
In constructing the PRS, we leveraged the genetic variants identified in the most comprehensive GWAS to date, a meta-analysis encompassing three major studies (Icelandic and UKB cohorts, FinnGen study, and Estonian Biobank (EstBB)) [2]. This extensive analysis included 28 086 women diagnosed with POP and 546 291 female controls of European descent. In that GWAS, 30 genetic variants showed genome-wide significant associations (P < 5 × 10⁻⁸) with POP. These variants are implicated in various biological processes and pathways, particularly those involving the extracellular matrix (ECM) and connective tissue molecular changes, growth and development, urogenital development, and metabolic and cardiovascular health. According to the allele frequencies of the East Asian population recorded in the Genome Aggregation Database (gnomAD; v3.1.2) and the NyuWa database, five variants with allele frequencies lower than 1% were excluded. Among the remaining loci, PRSice-2 software identified the “best-fit” 20 variants for the POP PRS with a P value threshold of 3.6 × 10⁻⁸ (Supplementary Material). Detailed variant information is shown in Table S1.
3.3 Overall PRS prediction performance
Overall, the case group exhibited a significantly higher PRS than the control group (P = 6.9 × 10⁻⁴) (Fig.1). The distribution plot showed a noticeable separation in PRS between cases and controls, indicating a divergent genetic risk profile (Fig.1). When the cohort was stratified into deciles based on PRS values, the odds ratio of developing POP rose from the lowest to the highest decile. Individuals in the top PRS decile had an odds ratio of 1.78 (95% confidence interval (CI): 1.19–2.67), whereas those in the bottom decile had an odds ratio of 0.68 (95% CI: 0.46–1.00) (Fig.1). The top decile showed a 2.6-fold greater risk (P = 8.2 × 10⁻⁴) than the bottom decile. This gradation underscores the role of PRS in delineating risk levels for POP and its potential for predicting individual susceptibility in clinical settings.
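To make the decile comparison concrete, the sketch below computes an odds ratio and a Wald-type 95% CI from a 2 × 2 table of case and control counts in one PRS group versus a reference group. The reference group and the estimation method used in the study are not stated in this section, so the function name, the choice of a Wald interval, and the counts are all assumptions for illustration.

```python
import math


def odds_ratio_ci(cases_group, controls_group, cases_ref, controls_ref, z=1.96):
    """Odds ratio of being a case in a group versus a reference group.

    Returns (odds_ratio, lower, upper) using a Wald interval on the
    log odds ratio. All four counts must be positive.
    """
    odds_ratio = (cases_group * controls_ref) / (controls_group * cases_ref)
    se_log_or = math.sqrt(
        1 / cases_group + 1 / controls_group + 1 / cases_ref + 1 / controls_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 60 cases and 40 controls in one decile,
    # 50 cases and 50 controls in the reference group.
    print(odds_ratio_ci(60, 40, 50, 50))
```

As a quick arithmetic check on the reported figures, dividing the top-decile odds ratio by the bottom-decile one gives 1.78 / 0.68 ≈ 2.6, which is consistent with the 2.6-fold difference quoted above.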
3.4 Subgroup analyses of the correlation between PRS and POP
We performed detailed subgroup analyses to explore the correlation between PRS and POP across groups defined by clinical characteristics (age, BMI, and parity). Among women older than 50 years, the case group showed a higher PRS than the control group (P = 9.6 × 10⁻⁴). Conversely, this association was not statistically significant in women younger than 50 years (P = 0.55) (Fig.2), indicating that the PRS may be related to postmenopausal onset of POP. For parity, we observed a noteworthy association between PRS and disease occurrence in individuals with one or no childbirths (P = 2.6 × 10⁻⁴). This finding suggests that cases with few childbirths have a high PRS, pointing to an elevated genetic predisposition to developing POP among these women. Despite having fewer of the physical injuries to the pelvic area typically associated with vaginal delivery, these individuals still exhibit a pronounced susceptibility to POP, which indicates a strong genetic influence (Fig.2). Our analysis did not reveal any significant PRS difference between the case and control groups across the BMI spectrum (Fig.2).
3.5 Integrated model combining PRS and clinical risk factors
In constructing the risk prediction model for women older than 50 years within the discovery cohort, we selected samples with full clinical information (n = 764). To ascertain the most pertinent factors influencing POP occurrence, we first applied stepwise backward elimination to the preliminary fitted model and removed the least significant variable, BMI (P = 0.60). Age (P = 3.8 × 10⁻³), parity (P = 8.1 × 10⁻¹⁵), and PRS (P = 1.4 × 10⁻³) remained significant independent predictors of POP. Then, these factors (age, parity, and PRS) were incorporated into a generalized linear model. The model’s predictive performance was evaluated using 10-fold cross-validation, yielding an AUC of 0.757 (95% CI: 0.723–0.792), which indicates a good predictive ability (Fig.3). We further assessed the model’s applicability by applying it to the participants in our external validation cohort with full clinical information (n = 329). The model demonstrated robust performance, with an AUC of 0.717 (95% CI: 0.660–0.775), supporting its reliability and effectiveness in a broad clinical context (Fig.3). To our knowledge, this model outperforms the existing risk prediction model combining PRS and clinical risk factors, which has a C-statistic of 0.63 [2].
We constructed a nomogram incorporating three predictive variables: age, parity, and PRS (Fig.3). This nomogram comprises six axes: points, age, parity, PRS, total points, and POP risk. The ‘points’ axis assigns a numerical value to each predictor variable. The values obtained for each predictor are summed to derive the ‘total points’, which corresponds to an estimated risk of POP development on the ‘POP risk’ axis. This risk reflects the individual’s probability of developing POP, thereby providing a user-friendly tool for clinicians to estimate POP risks on the basis of a combination of genetic and clinical factors. The calibration of the integrated model was evaluated with calibration curves, in which the mean absolute error (or calibration error) refers to the average difference between the predicted probabilities and the observed frequencies. In the discovery cohort and external validation cohort, the calibration errors were 0.004 and 0.012, respectively. This observation demonstrated a commendable level of precision and predictive reliability of the nomogram (Fig.3 and 3E).
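The calibration error defined above can be illustrated with a simplified, binned version: participants are sorted by predicted probability, split into equal-sized groups, and the mean predicted probability in each group is compared with the observed frequency of POP. This is a sketch under stated assumptions; the function name, the binning scheme, and the toy data are hypothetical, and the bootstrap-based calibration curves used in the study are not reproduced here.

```python
def calibration_mae(predicted, observed, n_bins=10):
    """Mean absolute difference between predicted and observed risk per bin.

    predicted: predicted probabilities of being a case
    observed: 1 for case, 0 for control
    Bins are equal-sized groups after sorting by predicted probability.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    if n == 0:
        raise ValueError("no observations")
    n_bins = min(n_bins, n)  # guarantees every bin is non-empty
    errors = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_freq = sum(y for _, y in chunk) / len(chunk)
        errors.append(abs(mean_predicted - observed_freq))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy example with two bins, each off by 0.1.
    print(calibration_mae([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2))
```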
4 Discussion
POP represents a prevalent health concern for women worldwide. Thus, reliable methods are necessary for risk prediction and stratification of its development. In this study, we constructed the first PRS model of POP for the Chinese population. Based on this first model, we developed a comprehensive clinical prediction model that integrates genetic and nongenetic risk factors to enhance our understanding of POP susceptibility and provide a possible foundation for early intervention and personalized healthcare strategies for women with an elevated POP risk.
Genetic etiology can affect the structure and function of the connective tissues, muscles, and nerves of the pelvic floor, leading to POP development [15,16]. Several GWASs of POP have identified SNPs in genes associated with the components and metabolism of ECM and steroid hormone receptor gene expression. For instance, Allen-Brady et al. [8] investigated a cohort of 115 European-American women with a familial history of POP, pinpointing six loci with genome-wide significance. In another notable study, Olafsdottir et al. [10] utilized the data from Icelandic populations and the UK Biobank and identified eight POP-associated variants at seven loci, including WNT4, GDF7, EFEMP1, FAT4, IMPDH1, TBX5, and SALL1. Furthermore, the most extensive GWAS to date, which included data from the aforementioned UK Biobank study and two additional cohorts from the FinnGen and EstBB projects [2], replicated eight previously identified variants and highlighted 22 novel variants. In this study, we calculated the PRS in the Chinese population based on these disease-associated SNPs. A primary concern when directly applying GWAS summary statistics from other ancestries is the potential for reduced transferability owing to differences in allele frequencies and linkage disequilibrium patterns among ancestries [17]. However, even without GWAS summary statistics from the same ancestry, certain strategies can help improve confidence in transferability across ancestries. First, the scale of the dataset is crucial. A related study by Jung et al. [18] showed that GWAS summary statistics derived from sufficiently large datasets can greatly enhance the reliability of trans-ancestry PRS applications. Second, functional variants are likely to be transferable. Variants within functional regions tend to be evolutionarily conserved and have a direct impact on disease manifestation. Prioritizing such shared functional variants among ancestries can effectively refine the predictive accuracy of a PRS model. In our study, we used the largest available GWAS dataset because no POP-specific GWAS is available for the Asian population. Then, we selected SNPs that are predominantly associated with functional regions. Our results showed that this PRS model could effectively stratify the relative lifetime risk of developing POP in Chinese women, suggesting that the model transfers from European to Asian populations and expanding the potential global utility of our research.
Besides genetic factors, other clinical characteristics, such as advanced age, elevated BMI, and increased parity, are recognized as primary nongenetic risk factors. Our previous study also identified the interactions between genetic variants and nongenetic risk factors associated with POP severity [19]. In particular, our current study’s subgroup analysis suggested that polygenic factors might have a pronounced association with POP onset after the age of 50, which coincides with the average onset of menopause for Chinese women [20,21]. The onset of menopause marks a substantial hormonal transition in a woman’s life and is characterized by the end of regular menstrual cycles and a gradual decline in estrogen production by the ovaries [22]. Estrogen plays a crucial role in maintaining various aspects of female health, including bone density [23], cardiovascular health [24], and the overall functioning of the reproductive system. As women age past this pivotal threshold, the decline in estrogen’s protective effects, particularly in the presence of genetic predispositions, may exacerbate the risk of developing POP [25]. This convergence of factors highlights the intricate interplay between hormonal changes and genetic influences, which potentially affect women’s life quality during and after menopause [26].
Parity stands out as a crucial nongenetic factor influencing POP, with a reported adjusted risk ratio as high as 10.85 [27,28]. The rationale behind this association lies in the physical stress and strain on the muscles and connective tissues of the pelvic floor during vaginal delivery. Unlike the subtle influence of genetic predispositions, the physical impact of childbirth is immediate and substantial, often leading to the stretching and weakening of supportive pelvic structures [29,30]. This outcome is significant in precipitating the development of POP. Our subgroup analysis revealed a noteworthy finding: in women with one or no childbirths, the PRS of affected individuals was significantly higher than that of healthy individuals. This finding suggests that genetic predisposition may play a pronounced role in POP among women with few childbirths. In other words, the genetic background may explain why some women still develop POP despite having given birth only once or not at all. Nevertheless, as parity increases, its impact becomes increasingly dominant and may overshadow the genetic influences captured by the PRS. Such findings underscore the independent but interconnected roles of parity and PRS in the development of POP, thereby highlighting the complex relationship between genetic predisposition and physical trauma in the pathogenesis of this condition.
This study has some limitations. First, we relied exclusively on GWAS data from European populations, so direct representation from Asian cohorts is lacking. As a result, the transferability and accuracy of our findings warrant cautious interpretation even though the European population-based GWASs encompass a large sample size. Additionally, the sample size of our study is modest. Further validation in large datasets of diverse ancestries is essential to solidify our conclusions. Lastly, the nature of POP as a late-onset condition introduces the possibility of sample selection bias: some control participants may develop POP later in life, particularly those who have a high genetic predisposition but had yet to exhibit symptoms at the time of the study. This scenario can potentially skew the perceived risk factors and their impact. Addressing these limitations in future research will be crucial for enhancing the understanding and prediction accuracy of POP risk across diverse populations.
In conclusion, our study utilized a large-scale GWAS to construct a PRS for POP in Chinese women. A comprehensive risk prediction model combining the PRS with classical risk factors was also developed. This model demonstrated good performance and may help clinicians accurately identify the individuals who can benefit most from early and targeted lifestyle interventions and precise pharmacological treatments.