1. Introduction
Male infertility is a significant global factor affecting couples’ fertility, with its incidence on the rise due to changes in lifestyle and environmental influences. Approximately 10% to 20% of couples face infertility issues, with male factors accounting for up to 50% of these cases [
1]. With the aid of Assisted Reproductive Technology (ART), many infertile couples have been able to fulfill their desire to have children. Despite remarkable advancements in
in vitro fertilization (IVF) techniques, the rate of high-quality embryos remains a critical limiting factor, directly impacting the success rate of IVF and the health of the resulting infants. Therefore, exploring the factors that influence the rate of high-quality embryos is not only scientifically meaningful for understanding and improving male infertility but also holds significant clinical value for enhancing the success rates of assisted reproductive technologies.
In recent years, research has found that in addition to the quality of a woman’s eggs, the quality of a man’s sperm is also one of the key factors determining the rate of high-quality embryos [
2,
3]. Sperm DNA Fragmentation Index (DFI), as an important indicator for assessing the integrity of sperm DNA, has been found to be closely related to the rate of high-quality embryos [
4,
5]. However, due to the influence of experimental design and detection methods, its predictive value for pregnancy outcomes is limited [
5]. Ziouziou
et al. [
2] conducted a narrative review and meta-analysis of literature on sperm DNA fragmentation published over the past 5 years, indicating that an increase in sperm DFI has a negative impact on natural pregnancy and ART outcomes. DFI testing may be particularly important in the assessment of male infertility in cases of varicocele, idiopathic or unexplained infertility, recurrent miscarriage, or previous ART failure. Additionally, factors such as the male partner’s body weight [
6,
7] and mycoplasma infections [
8] may also affect the formation of high-quality embryos. Lifestyle factors such as smoking, alcohol consumption, drug use, environmental exposure, and genetic factors are all significant determinants of male fertility [
9,
10,
11].
Previous studies have covered multiple key factors of male infertility, such as lifestyle, environmental exposure, and genetic factors [
2,
3,
4,
5,
6,
7,
8,
9,
10,
11]. These studies all indicate that male infertility is a complex multifactorial issue that requires a comprehensive consideration of various potential influencing factors to develop effective prevention and treatment strategies. However, these studies also have some shortcomings. Firstly, most studies are observational, making it difficult to establish causal relationships. Secondly, there are differences in the design, samples, and methods of different studies, which limit the comparability and reproducibility of the results. Additionally, few studies have considered the interactive effects of multiple factors, which may limit our understanding of the complex pathophysiological mechanisms of male infertility. Currently, there is a lack of a comprehensive predictive model to assess the risk of male infertility, which restricts our ability to conduct personalized diagnosis and treatment in clinical practice. Therefore, this study aims to construct a comprehensive predictive model to assess the risk of male infertility by systematically analyzing multiple potential influencing factors and their interactive effects. This will help to more accurately identify high-risk populations and provide a scientific basis for personalized diagnosis and treatment, thereby improving the overall success rate of assisted reproductive technology. This not only helps to promote the reproductive health of patients but also provides new perspectives and research directions for related basic scientific research.
2. Materials and Methods
2.1 Patient Selection
A retrospective analysis was conducted on the clinical data of 373 couples who underwent IVF treatment at the Reproductive Medicine Center of Affiliated Hospital of Nantong University from January 1, 2021, to June 30, 2024. The data were collected from medical records. Inclusion criteria: (1) Failure to conceive after 1 year or more of regular, unprotected intercourse with the same partner; (2) Male with normal or abnormal sperm parameters (according to World Health Organization [WHO], 2010 [
12]: concentration, motility, and morphology; azoospermia was excluded) and no andrological history of concern (cryptorchidism, hypogonadotropic hypogonadism, genetic abnormalities such as Klinefelter’s syndrome or Y-chromosome microdeletion, drug abuse, cancer treatment, or other iatrogenic factors). Exclusion criteria: couples with factors affecting egg quality in the female partner were excluded, such as age (only women under 38 years old were selected), polycystic ovary syndrome, diminished ovarian reserve (only patients with Follicle-Stimulating Hormone [FSH] < 10 mIU/mL, Anti-Mullerian Hormone [AMH] > 0.12 ng/mL, and Antral Follicle Count [AFC] > 5 were selected), endometriosis, chromosomal abnormalities, and other factors affecting egg quality. The study protocol was approved by the Ethics Committee of Affiliated Hospital of Nantong University (2025-K037-01).
2.2 IVF/ICSI Procedures
Controlled ovarian stimulation was performed with a gonadotropin-releasing hormone antagonist and recombinant follicle-stimulating hormone/human menopausal gonadotropin. Once the 3 dominant follicles reached a mean diameter of at least 17 mm, we injected 250 µg of recombinant human chorionic gonadotropin (hCG) version SJ20130091 (Ovidrel, Serono, Geneva, Switzerland) and retrieved oocytes under transvaginal ultrasound guidance 36 hours later. We then incubated the oocytes in G-IVF™ version 10136 (Vitrolife, Gothenburg, Sweden) and performed IVF or intracytoplasmic sperm injection (ICSI) 4–6 hours after retrieval. Normal fertilization was determined by the presence of 2 pronuclei 16–20 hours after insemination. Embryonic development was monitored at 48 and 72 hours after retrieval and graded at 72 hours based on the number of cells, level of fragmentation, and cell size variation [
13]. High-quality embryos were defined as grades I and II and were reserved for later embryo transfer. The number of zygotes with 2 pronuclei (2PN), which indicates normal fertilization, was recorded the day after IVF. The number of cleaved embryos with 2 or more blastomeres was recorded 2 days after IVF. Embryo quality was graded 3 days (D3) after IVF and classified as good quality (grades I and II) or poor quality (grades III and IV) [
14].
2.3 Semen Analysis and DNA Fragmentation Assay
On the day of oocyte retrieval, semen samples were collected by masturbation after 2 to 7 days of abstinence, and each sample was incubated at 37 °C to liquefy for a maximum of 60 min. After liquefaction, conventional semen analysis (sperm concentration and motility) was carried out according to the WHO guidelines, 2010 [
12] by using a Computer-Assisted Sperm Analyzer (WLJY-9000, Weili, Beijing, China). Semen samples for fertilization were treated with density-gradient centrifugation (DGC) version 10102 (SpermGrad, Vitrolife, Gothenburg, Sweden) and swim up. This discontinuous density gradient centrifugation consisted of 2 (90% and 45%) 1 mL layers of SpermGrad, and 2 mL of semen was deposited on the 45% layer. The gradient was then centrifuged at 300 g for 20 minutes. After centrifugation, the seminal plasma supernatant was discarded, and the sperm pellet was washed with 1 mL of embryo culture medium version 10136 (Vitrolife, Gothenburg, Sweden). After centrifugation for 10 minutes, the pellet was suspended with 1 mL culture medium for sperm to swim up.
All men underwent a routine sperm chromatin dispersion (SCD) test 1 month prior to IVF procedure. SCD test version 20152401259 (ShenZhen Huakang Co., Ltd., Shenzhen, Guangdong, China) was carried out according to the manufacturer’s instructions as described in our previous study [
15]. In brief, semen samples were diluted with phosphate-buffered saline to a concentration of 5–10 × 10⁶/mL. The prepared spermatozoa were mixed with melted agarose, then pipetted onto pre-coated slides and covered with a coverslip (22 mm × 22 mm). The slide was placed in the refrigerator at 4 °C for 4 min to allow the agarose to solidify with sperm cells embedded. The coverslip was gently removed and the slide was incubated in an acid solution for 7 min, and then in lysis buffer for 20 min. After washing the slide for 3 min with washing buffer, the slide was dehydrated in increasing concentrations of ethanol (70%, 90%, 100%, each for 2 min) and air-dried. After Wright-Giemsa staining, at least 200 spermatozoa per sample were scored for halos under the microscope at 400× magnification. Sperm with a small halo (similar to or smaller than a third of the minor diameter of the nucleus) or no halo were considered to have significant DNA fragmentation. DFI = (sperm with DNA fragmentation / total sperm counted) × 100%.
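The DFI formula above can be written as a small helper. This is a minimal sketch, not the laboratory's software: the function name `dna_fragmentation_index` and the example counts are hypothetical, and the only rules encoded are the ones stated in the text (at least 200 spermatozoa scored, DFI expressed as a percentage).

```python
def dna_fragmentation_index(fragmented: int, total_counted: int, min_count: int = 200) -> float:
    """Return DFI (%) = sperm with DNA fragmentation / total sperm counted x 100."""
    if total_counted < min_count:
        raise ValueError(f"at least {min_count} spermatozoa must be scored")
    if not 0 <= fragmented <= total_counted:
        raise ValueError("fragmented count must lie between 0 and total_counted")
    return 100.0 * fragmented / total_counted


# Hypothetical counts, for illustration only (not data from this study).
example_dfi = dna_fragmentation_index(fragmented=50, total_counted=200)
print(f"DFI = {example_dfi:.1f}%")  # prints "DFI = 25.0%"
```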
2.4 Model Construction and Validation
The dataset was divided into 80% for training and 20% for testing, resulting in the construction of a predictive model for outcomes and a causal effect estimation model. The outcome prediction model utilized the Random Forest algorithm, which was selected after parameter optimization. The causal effect estimation model employed a double machine learning (DML) approach to analyze the rate of high-quality embryos. Model validation included assessments using Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), as well as SHapley Additive exPlanations (SHAP, a unified framework for interpreting machine learning models) methods for analyzing feature importance and creating marginal causal effect plots.
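The split and the three error metrics named above can be illustrated in plain Python. This is a sketch under stated assumptions rather than the study's pipeline: the helper names `train_test_split_80_20` and `regression_metrics` are hypothetical, the seed is arbitrary, and the observed and predicted rates are made-up values. The study itself used scikit-learn, which is not called here.

```python
import math
import random


def train_test_split_80_20(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# 373 couple indices stand in for the records; the split sizes follow from 80/20.
train_rows, test_rows = train_test_split_80_20(range(373))
print(len(train_rows), len(test_rows))  # prints "298 75"

# Hypothetical observed vs predicted high-quality embryo rates (not study data).
metrics = regression_metrics([0.5, 0.75, 0.25, 1.0], [0.5, 0.5, 0.5, 0.5])
print(metrics)
```

RMSE is reported alongside MSE because it is on the same scale as the outcome (a rate between 0 and 1), which makes it easier to compare with MAE.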
2.5 Intervention Strategy Development
Based on the potential outcomes framework and causal effect estimation, personalized intervention strategies were formulated. These strategies target modifiable male factors that were significantly associated with the rate of high-quality embryos in univariate logistic regression, through interventions such as pharmacological treatment for mycoplasma positivity, improvement of sperm quality, and weight reduction, each evaluated in conjunction with clinical practice.
2.6 Statistical Methods, Software, and Tools
Statistical analysis was performed using the SPSS statistical package version 26.0 (IBM Corp., Armonk, NY, USA). Prior to the analysis, all continuous data underwent normality tests, including the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Data that conformed to a normal distribution were expressed as mean ± standard deviation, while data that did not conform to a normal distribution were expressed as quartiles. Inter-group comparisons of continuous data were performed using the Wilcoxon rank-sum test. For categorical data, the chi-square test was used without continuity correction, as all theoretical frequencies in our dataset were greater than 5. The significance level was set at α = 0.05. Variable selection was carried out using univariate and multivariate logistic regression analysis. Model development was based on Python version 3.9 (Python Software Foundation, Wilmington, DE, USA), with the predictive model utilizing the scikit-learn package version 1.0.2 (scikit-learn community, Paris, France) and causal effect estimation employing the EconML package version 0.15.1 (Microsoft Research, Redmond, WA, USA).
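The chi-square test without continuity correction, together with the check that all theoretical (expected) frequencies exceed 5, can be sketched for a 2 × 2 table as follows. The analysis in this study was run in SPSS; the function `chi_square_2x2` and the counts below are hypothetical and serve only to show the calculation. For 1 degree of freedom, the upper-tail p-value equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value, expected), where expected holds the
    theoretical frequencies under independence.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # Chi-square with 1 degree of freedom is a squared standard normal.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, expected


# Hypothetical counts (rows: two groups; columns: positive / negative).
statistic, p_value, expected = chi_square_2x2([[30, 20], [20, 30]])
all_expected_above_5 = min(min(row) for row in expected) > 5
print(f"chi2 = {statistic:.2f}, p = {p_value:.4f}, expected > 5: {all_expected_above_5}")
```

If any expected frequency were 5 or below, the uncorrected test would no longer be appropriate and a corrected or exact test would normally be considered instead.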
3. Results
3.1 The Comparison of Male and Female Characteristics Between the 2 Groups
A total of 373 infertile couples were initially enrolled in this study. Participants were categorized into 2 groups based on their D3 high-quality embryo rate: a control group (high-quality embryo rate < 45%, n = 106) and an observational group (high-quality embryo rate ≥ 45%, n = 267). The grouping criterion was established based on the expert consensus on key quality control indicators in human ART embryology laboratories in China, which is also consistent with that of the Italian Society of Fertility and Sterility [
16]. Significant differences were observed between the 2 groups in several characteristics, including female AFC, sperm DFI, male height, male weight and male mycoplasma status. Among male characteristics, sperm DFI and male weight were lower in the observational group than in the control group. In the group with a higher rate of high-quality embryos, the proportion of mycoplasma-negative individuals was higher (Table
1). This suggests that the presence of mycoplasma infection may have a negative impact on embryo quality, which is an important consideration in the field of assisted reproductive technology.
3.2 Factors Affecting the Rate of High-Quality Embryos
Univariate logistic regression analysis was first performed on the baseline data presented in Table
1 to identify potential factors associated with the high-quality embryo rate. Table
2 shows the factors with p < 0.5 in univariate logistic regression. It can be seen that female AFC, basal LH levels, sperm DFI, male height, male body weight and mycoplasma infection all have significant impacts on the high-quality embryo rate, and all are negatively correlated (Table 2). Taking the high-quality embryo rate as the dependent variable, and incorporating variables with p < 0.1 from the univariate analysis into the multivariate model, a bidirectional stepwise logistic regression was performed. The results revealed that factors significantly affecting the rate of high-quality embryos included mycoplasma infection, basal LH levels and male body weight (Table
3).
3.3 The Process of Formulating Intervention Strategies Based on Causal Effect Estimation
Based on the causal effect estimation, we have developed a personalized intervention strategy for male infertility patients to improve their high-quality embryo rates and determine the most effective intervention. This is a counterfactual prediction problem, where we are interested in knowing the potential increase in high-quality embryo rates if the patient receives a certain intervention. To achieve this, we constructed a predictive model for outcomes and a causal effect estimation model for counterfactual predictions.
First, we preprocessed the collected patient data, including imputing missing values of continuous variables with mean values and excluding outliers. Then, we divided the dataset into a training set (80%) and a test set (20%) for building and validating the outcome prediction model and the causal effect estimation model. We evaluated the predictive model using common regression model evaluation metrics such as MSE, RMSE, and MAE. Each model underwent parameter optimization, testing 3 commonly used machine learning models: Support Vector Machine (SVM), Decision Tree, and Random Forest. Table
4 shows the evaluation results of each model after running 100 times. It can be observed that the Random Forest model outperformed the other 2 models in predictive performance, hence we selected the Random Forest model as the final outcome prediction model (Table
4). The evaluation of the causal effect model mainly involved analyzing model features through the SHAP method and plotting marginal causal effect diagrams to confirm clinical consistency. By analyzing the high-quality embryo rates through the predictive model and the causal effect estimation model, targeted intervention measures were formulated. The causal effect estimation employed a DML method, which is a method for estimating heterogeneous intervention effects. We compared 3 commonly used DML methods, and the results are shown in Table
5. It can be observed that the non-parametric model (NonParam DML) performed the best. Compared to the outcome prediction model, there was a certain improvement in performance, which also confirms that causal models are more stable and reliable. We selected modifiable male factors significantly affecting the high-quality embryo rate from the univariate logistic regression analysis, namely sperm DFI, male body weight, and mycoplasma infection, and constructed causal effect estimation models with these indicators as intervention variables. First, the high-quality embryo rate was predicted based on these indicators, and the need for intervention was determined according to the magnitude of the high-quality embryo rate. When intervention is needed, the causal effects of potential intervention strategies, that is, the changes in the high-quality embryo rate, are assessed one by one based on the causal effect estimation model, providing a reference for clinicians to formulate targeted intervention measures (Fig.
1).
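The core idea behind DML, removing the part of both the intervention variable and the outcome that is explained by covariates and then relating the two sets of residuals, can be sketched in plain Python. This is an illustrative simplification and not the study's model: it estimates a single average effect with linear nuisance models and one covariate, whereas the study used EconML's non-parametric DML to estimate heterogeneous effects. The names `fit_line` and `dml_effect` and the synthetic data are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_effect(x, t, y, n_folds=5):
    """Cross-fitted partialling-out estimate of the average effect of t on y."""
    n = len(x)
    res_t, res_y = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        x_train = [x[i] for i in train_idx]
        # Nuisance models are fitted on the other folds only (cross-fitting).
        a_t, b_t = fit_line(x_train, [t[i] for i in train_idx])
        a_y, b_y = fit_line(x_train, [y[i] for i in train_idx])
        for i in test_idx:
            res_t[i] = t[i] - (a_t + b_t * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    # Final stage: regress outcome residuals on treatment residuals.
    return sum(rt * ry for rt, ry in zip(res_t, res_y)) / sum(rt * rt for rt in res_t)


# Synthetic data with a known effect of 2.0 (illustration only, not study data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
t = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0.0, 0.1) for ti, xi in zip(t, x)]
theta_hat = dml_effect(x, t, y)
print(f"estimated effect: {theta_hat:.2f}")
```

Cross-fitting matters because the residuals for each patient are computed from nuisance models that never saw that patient, which limits the overfitting bias that flexible learners would otherwise pass into the effect estimate.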
3.4 Test Results for Intervention Variable of Mycoplasma
The rate of high-quality embryos is significantly higher in mycoplasma-negative individuals than in mycoplasma-positive individuals (75.0% vs 33.3%,
p < 0.001, Fig.
2A). When the intervention variable is mycoplasma and the outcome variable is the rate of high-quality embryos, the heterogeneity-adjusted covariates include modifiable indicators among male factors, such as sperm concentration before DGC, progressive motility, sperm volume, total sperm count, total motile sperm count, sperm DFI, sperm extraction yield, fertilization technique, and other female factors, as well as other covariates among male factors that cannot be changed in the short term. In the collected data, 23.3% of patients underwent treatment for mycoplasma positivity, meaning their mycoplasma test results changed from positive to negative after intervention, and this group was used to validate the accuracy of the model’s causal effect estimation. The results showed that sperm DFI, the proportion of PR sperm before DGC, and male weight are the 3 factors that have a greater impact on the intervention effect, with sperm DFI and male weight generally having a negative correlation with the intervention effect of mycoplasma, and the proportion of PR sperm before DGC generally having a positive correlation with the intervention effect of mycoplasma (Fig.
2B). Fig.
2C shows the marginal causal average effect for mycoplasma-positive patients in the test set, which are almost all negative values, indicating that when patients change from mycoplasma-positive to negative, the rate of high-quality embryos increases.
3.5 Test Results for Intervention Variable of Sperm DFI
We further analyzed the relationship between sperm DFI and the rate of high-quality embryos and found no clear linear relationship (Fig.
3A). When the intervention variable is sperm DFI, the factors that have a significant impact on the intervention effect include the total sperm before DGC, sperm extraction yield, and male weight (Fig.
3B). Fig.
3C shows the marginal effect for patients in the test set with DFI > 25, indicating that there are differences in the marginal effects among different patients, but the vast majority are negative values, meaning that reducing DFI would increase the rate of high-quality embryos. However, the improvement in the high-quality embryo rate from reducing sperm DFI is minimal, with the maximum increase being only 0.3% (Fig.
3B). This finding is consistent with the results of the multivariate logistic regression analysis, which attenuated the impact of sperm DFI on the high-quality embryo rate.
3.6 Test Results for Intervention Variable of Male Weight
We analyzed the relationship between male weight and the rate of high-quality embryos, and observed no clear linear relationship. However, it was noted that patients with higher body weights tend to have relatively lower rates of high-quality embryos (Fig.
4A). When the intervention variable is male weight, the factors that significantly impact the intervention effect are sperm extraction yield, sperm DFI, and total motile sperm count before DGC (Fig.
4B). Fig.
4C illustrates the marginal causal effect of body weight for patients who are overweight (> 80 kg), showing that for the majority, the effect values are negative, indicating that being overweight reduces the rate of high-quality embryos.
4. Discussion
In IVF cycles, female factors play a significant role in embryo development, particularly the impact of egg quality on embryo development. The influence of male factors and sperm quality on embryo development has been less studied. Pellegrini
et al. [
17] found that, in donor-egg IVF, poor sperm quality is associated with changes in the morphokinetic parameters of embryos on day 3, underscoring the important role of sperm in embryo development. Our study aimed to explore male factors affecting the rate of good embryos and to develop intervention strategies based on causal effect estimation. Through the analysis of data from 373 infertile couples, we found that male factors such as sperm DFI, male height, weight, and mycoplasma infection status are all significantly correlated with the rate of good embryos.
Firstly, we observed that in the group with a higher rate of good embryos, the sperm DFI in males was lower. A lower sperm DFI, an important indicator of sperm genetic quality, is associated with better embryo quality. This is consistent with previous research findings. Jiang
et al. [
18] found that DFI and sperm morphology affect the blastocyst formation rate and the rate of good blastocyst formation on day 6. A meta-analysis by Kaiyal
et al. [
5] also found that the fertilization rate and embryo cleavage rate in the group with high sperm DFI were significantly lower than those in the low DFI group. Our results show that the weight of males in the high-quality embryo group was lower. Liu
et al. [
19] studied 6569 first fresh
in vitro fertilization-embryo transfer (IVF-ET) cycles and found that the number of available embryos and high-quality embryos was significantly reduced in the overweight/obese female group and the mixed overweight/obese male and female group compared to the normal weight group. A recent meta-analysis indicated that male obesity has a negative impact on basic sperm parameters, such as sperm count, concentration, and motility, increasing the incidence of infertility and reducing pregnancy rates [
6]. Although the relationship between male obesity and sperm DNA damage remains controversial in human studies [6, 19], owing to significant heterogeneity between studies and the use of body mass index as an indicator of obesity, evidence from rodent models clearly shows that male obesity increases sperm DNA damage [
20]. A meta-analysis indicated that being overweight and/or obese is associated with lower sperm quality, such as sperm volume, sperm count and concentration, sperm motility, total motility and normal morphology [
21]. Additionally, obese men have a higher proportion of sperm with low mitochondrial membrane potential (MMP), as well as higher proportions of sperm with DNA fragmentation and abnormal morphology [
6]. Together, these findings suggest that obesity may affect pregnancy outcomes through its effect on sperm DFI.
Genital tract mycoplasma can affect the health of fetuses and newborns. However, the sperm optimization process in current ART treatments cannot ensure the eradication of mycoplasma. It is beneficial to avoid using semen contaminated or infected with mycoplasma in ART treatments, and routine screening for mycoplasma before ART treatment is advisable. Our results show that the proportion of mycoplasma-positive men was significantly higher in the low-quality embryo group. In a study of 306 couples undergoing IVF treatment, 32% of semen samples were found to have Ureaplasma urealyticum infection, with similar infection rates at the prostate and seminal vesicle levels, and a significant decrease in pregnancy rates after embryo transfer in the infected group [
22]. In couples infected with mycoplasma who received specific antibiotic treatment to eradicate the infection, improvements in semen quality and better ART outcomes were also observed [
23]. Therefore, routine screening for genital tract mycoplasma can improve semen quality, thereby simplifying IVF procedures and increasing the success rates of embryo implantation and pregnancy [
8].
Secondly, our univariate logistic regression analysis showed that male sperm DFI, male height, weight, and mycoplasma infection were all negatively correlated with the rate of high-quality embryos. The identification of these factors provides potential targets for further intervention. However, further multivariate logistic regression analysis found that while sperm DFI was significant in the univariate analysis, it was not significant in the multivariate analysis. This may be due to collinearity between sperm DFI and other variables (such as mycoplasma positivity), or its effect being diminished when considering other variables. In contrast, mycoplasma infection and male weight were significant in both univariate and multivariate analyses, indicating that they have an independent significant impact on the high-quality embryo rate. In particular, a negative status of mycoplasma infection is associated with a higher rate of high-quality embryos, which further suggests the importance of screening and treatment of infection status in infertility treatment.
Causal effect estimation uses observational data to make counterfactual estimates and to analyze the causal relationship between interventions and outcomes. It seeks to determine which variable is the cause and which is the effect, and how large an effect the cause has. In practical applications, causal effect estimation can help us understand the extent to which specific interventions affect outcomes [
24]. Therefore, based on the above results, we constructed an outcome prediction model and a causal effect estimation model to evaluate the effects of different intervention measures according to Aaron Baum’s method [
25]. By comparing 3 commonly used machine learning models (SVM, Decision Tree, and Random Forest), we found that the Random Forest model outperformed the other 2 models in predictive performance. Therefore, we chose the Random Forest model as the final outcome prediction model. At the same time, we also used the double machine learning (DML) method to estimate heterogeneous intervention effects and identified modifiable male factors that have a significant impact on the rate of good embryos. In medical research, the same treatment may have significantly different effects on populations with different characteristics, that is, there is heterogeneity in treatment effects. Causal forest analysis can effectively address the limitations of traditional analysis methods in testing treatment effect heterogeneity and provide more accurate heterogeneity assessment. Through this analysis, researchers can identify the response of different patient groups to specific treatments, thereby providing more customized treatment plans for patients [
26]. We found through causal forest analysis that sperm DFI, male weight, and mycoplasma infection are modifiable factors that significantly affect the rate of high-quality embryos. This finding provides a basis for targeted interventions for clinicians. In the intervention analysis of mycoplasma infection, we found that the rate of high-quality embryos in the mycoplasma-negative population was significantly higher than that in the positive population. In addition, sperm DFI, the proportion of PR sperm before DGC, and male weight are factors that have a greater impact on the intervention effect. These results indicate that treating mycoplasma infections can increase the rate of high-quality embryos, which is of great significance for clinical practice. In the intervention analysis of sperm DFI, we found that reducing DFI can increase the rate of high-quality embryos. This finding is consistent with existing literature on the relationship between sperm DFI and embryo quality [
27]. However, the effect of reducing sperm DFI on improving the high-quality embryo rate is minimal. This may be because sperm DFI is often tested before ART and may not truly reflect the sperm DFI levels on the day of fertilization. Further analysis of sperm on the day of fertilization is needed to determine the true impact of sperm DFI on the high-quality embryo rate. Moreover, we found that a decrease in male weight is also associated with an increase in the rate of high-quality embryos, further emphasizing the importance of lifestyle interventions in infertility treatment.
5. Conclusion
In summary, our research results emphasize the importance of assessing male factors in infertility treatment. By identifying and intervening in key factors such as sperm DFI, male weight, and mycoplasma infection status, the rate of high-quality embryos can be increased, thereby increasing positive fertility outcomes for infertile couples. However, it should be noted that this study only analyzed a specific group of infertile couples, and further studies are needed in the future to verify the applicability of these findings in other populations, as well as to further explore the intervention strategies of these factors and how they interact with other factors not considered in this study. Our study was conducted at a single center, which may introduce selection bias and limit the generalizability of our results. A multi-center design should be considered to account for potential variations in patient populations and treatment protocols. In conclusion, this study provides clinicians with a process for formulating intervention strategies based on causal effect estimation, which helps to improve the rate of good embryos in infertile patients.
Availability of Data and Materials
The datasets that support the findings of this study are available from the corresponding author (QW) upon reasonable written request.
Scientific Research Project of Shanghai Qingpu District Health Commission (QWJ2022-06)