1. College of Biosystems Engineering and Food Science, National-Local Joint Engineering Laboratory of Intelligent Food Technology and Equipment, Fuli Institute of Food Science, Zhejiang University, Hangzhou 310058, China
2. Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan 314100, China
3. College of Environment & Resource Sciences, Zhejiang University, Hangzhou 310058, China
4. Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China
dhliu@zju.edu.cn
Show less
History+
Received
Accepted
Published Online
2026-03-09
2026-06-03
2026-06-22
PDF
(13413KB)
Abstract
Foodborne diseases are frequently linked to the consumption of fruits and vegetables, with the virulence genes (VGs) carried by Escherichia coli (E. coli) being one of the primary contributors to these illnesses. However, the trends of VGs in E. coli from fruits and vegetables remain poorly understood. This study analyzed the genome sequences of 1084 E. coli from fruits and vegetables collected worldwide between 2000 and 2023. The results showed a significant upward trend in VGs over time, peaking at 83.11 in 2019. The major changes were observed in toxin-producing and delivery system. Moreover, the number of plasmids and VGs increased significantly over time, suggesting a rising risk of transmission of VGs. Human-induced factors, including the increase of CO2 concentration and pesticide use, led to the change of VGs of E. coli, further exacerbating food safety risks. Furthermore, the multiple regression model which can predict VGs showed that reduced CO2 emissions by 20%, VGs could be reduced by 4.85. Among different types of produce, the isolation of E. coli from leafy vegetables exhibited the highest levels of VGs, with spinach (86.86) and lettuce (75.95) showing the highest concentrations. This trend correlates strongly with the frequency of E. coli induced outbreaks, indicating an emerging public health risk.
Foodborne diseases caused by pathogen contamination are among the most critical food safety and public health challenges in the world [1]. Fruits and vegetables, as an integral part of a healthy balanced diet, are increasingly consumed globally due to the promotion of healthier lifestyles [2]. However, these ready-to-eat commodities are often consumed raw or with minimal processing, which limits the effective removal or inactivation of microorganisms effectively and makes them a significant source of foodborne outbreaks [3]. The incidence of foodborne illnesses associated with fruits and vegetables has risen sharply. For example, in the United States, 48 foodborne outbreaks linked to fruits and vegetables were reported between 2016 and 2023, resulting in 13,040 illnesses and 30 deaths, of which 22.82% (n = 11) were attributed to Escherichia coli (E. coli) [4]. A notable incident occurred in October 2024, when E. coli contamination of onions used in McDonald’s hamburgers caused a severe foodborne outbreak across 14 states, infecting 104 individuals, 34 of whom were hospitalized [5]. Such incidents underscore the urgent need to control foodborne pathogens in fruits and vegetables, especially E. coli, to effectively mitigate foodborne illnesses and enhance food safety and public health.
E. coli in fruits and vegetables can spread rapidly through food supply chains, direct contact or environmental pathways. This pathogen is highly diverse, encompassing both commensal and pathogenic variants [6,7] with a wide range of virulence and antimicrobial resistance characteristics [8]. Therefore, studying the pathogenic characteristics of E. coli is essential for developing early warning and prevention systems and preventive measures to reduce the risk of foodborne outbreaks and ensure food safety and public health.
The concept of pathogenic bacteria has evolved significantly since the late 19th century, when Pasteur and Koch first defined them as microorganisms capable of causing diseases in hosts. They attributed disease symptoms to specific macromolecules, such as exotoxins and endotoxins [9]. In 1945, a large-scale outbreak of infantile diarrhea in the United Kingdom led to the first isolation of enteropathogenic Escherichia coli (EPEC) from affected children [10]. Over time, differences in bacterial pathogenicity have received increasing attention. E. coli virulence arises from the interaction of numerous virulence-associated genes, including virulence genes (VGs) located on pathogenicity islands and plasmids, structural VGs (e.g., those encoding fimbriae and flagella), and enterotoxin-encoding genes [11–13]. These VGs enable E. coli to colonize hosts, evade defenses, and cause diseases.
Some VGs are located on mobile genetic elements, such as plasmids, which increases the spread of bacterial resistance, posing challenges for food safety monitoring and disease prevention [14,15]. The number, distribution, and pathogenicity of VGs vary among bacterial species, strains, and sources, influencing the risks they pose to food safety and public health [16]. Bacterial virulence is highly dynamic and evolves in response to environmental pressures [17,18], which can drive the acquisition or loss of VGs [19,20]. Despite these insights, research on the virulence of E. coli from fruits and vegetables remains scare, leaving the dynamics and changes of VGs in this context unclear [21].
To investigate the temporal dynamics of VGs and their corresponding factors, this study focuses on E. coli as a foodborne pathogen isolated from fruits and vegetables. Using whole-genome sequencing, we examine the relationships between VG abundance and variables such as time, geographical region, isolation source, and genotype. Furthermore, we analyze the mobility and functionality of these VGs to assess their associated risks systematically.
2 Methods
2.1 Sample collection
The genomic sequences were downloaded from the National Center for Biotechnology Information (NCBI) Pathogen Detection Database, with the search term “Escherichia coli”, yielding a total of 563,337 isolates. Briefly, genomes were initially retrieved from the NCBI database and filtered based on available metadata fields, including host, isolation source, and sample description, retaining only entries explicitly linked to vegetables and fruits. The TexSmart tool was then applied to parse unstructured metadata and standardize source annotations using a predefined set of produce-related keywords (e.g., “lettuce”, “spinach”, “tomato”). To further minimize potential misclassification, isolates labeled as clinical or human-derived samples, those originating from environmental sources without a clear association with produce, and records with ambiguous or insufficient descriptions (e.g., “unknown” or “sample”) were excluded. In addition, ambiguous entries were manually reviewed where necessary to ensure consistency with our definition of produce-associated strains. A total of 1084 isolates derived from fruits and vegetables (lettuce, spinach, cilantro, cucumber, tomato, parsley, cantaloupe, etc.) were identified. Additionally, although we attempted to collect sequences from other databases (GenBank, CNCB, and EMBL), we did not find additional isolates from different time periods. Consequently, the number of isolates used in each time period were as follows: 136 strains (2000–2004), 57 strains (2005–2009), 222 strains (2010–2014), 241 strains (2015–2019), and 422 strains (2020–2023), with 6 strains lacking specific time information. Leafy vegetables (73.34%) were the predominant source of E. coli isolates, including lettuce (n = 331), spinach (n = 201), cilantro (n = 137), among others. Non-leafy vegetables and fruits included cucumber (n = 101), tomato (n = 95), common ingredients in salads. As one of the largest consumers of ready-to-eat fruits and vegetables, the United States contributed 588 E. coli isolates from 21 states. Among these, California, the highest producer of fruits and vegetables, had the largest number of isolates (n = 206) (Table S5). The raw sequencing data were assembled using Unicycler v0.5.0, an integrated assembler based on de novo assembly, suitable for short-read and hybrid sequencing data [22]. Initially, the raw data underwent quality control and trimming (e.g., removal of low-quality bases and adapter sequences) to ensure high-quality input data [23,24]. Unicycler then used SPAdes v3.15.5 for the preliminary assembly of short-read data. SPAdes v3.15.5 utilizes the de Bruijn graph algorithm to optimize k-mer selection for accurate contig construction, resolving sequence overlaps and repeats through fine-tuned node connections. Unicycler v0.5.0 further processed the SPAdes v3.15.5 output by optimizing the assembly graph, linking contigs, and removing redundant sequences to generate high-quality contigs and comprehensive assembly statistics, laying the foundation for subsequent annotation and analysis [25].
2.2 Phylogenetic analysis
Phylogenetic analysis was performed using MEGA 11[26] and iTOL (Interactive Tree of Life) tools. Initially, multiple sequence alignment of the target sequences was conducted using MEGA 11 software, with the ClustalW algorithm for preliminary alignment. Manual adjustments were made to ensure the accurate alignment of conserved regions, resulting in a high-quality alignment file (.meg format). Based on the alignment results, a phylogenetic tree was constructed using the Neighbor-Joining (NJ) method in MEGA 11. The appropriate evolutionary model (p-distance) was selected to compute the genetic distance matrix, and the robustness of the tree branches was assessed using 1000 bootstrap resamplings for statistical support. Bootstrap resampling was performed by randomly resampling alignment positions with replacement to generate 1000 pseudo-replicate datasets, and bootstrap support values were calculated for each node and mapped onto the final phylogenetic tree. Once the tree was constructed, it was exported in Newick format (.nwk) and uploaded to iTOL for high-resolution visualization and annotation. In iTOL, branches of the phylogenetic tree were color-coded, node labels were added, and specific data were mapped to generate a visually informative and easy-to-interpret tree, providing a clear depiction of the evolutionary relationships among the sequences. The resulting phylogenetic tree revealed substantial genetic diversity within E. coli, although some samples exhibited a high degree of relatedness. For example, 76 samples isolated from cilantro in Germany in 2017 (SAMN12676276-12676349) and 290 samples isolated from tomatoes (n = 79), lettuce (n = 120), and cucumbers (n = 91) in Cambodia in 2022 displayed relatively close genetic relationships. To minimize the influence of samples from a single source on the analysis, the average number of VGs was calculated for each group. The 76 samples from cilantro in Germany (2017) yielded a single average value, while the 290 samples from Cambodia in 2022 were divided by source (tomato, lettuce, and cucumber) to calculate three distinct average values.
2.3 Gene content analysis
Functional annotation and analysis of the 1084 genomes were performed through strict comparisons with multiple authoritative databases to ensure accuracy and high reliability. First, the genome sequences were compared with the E. coli 7-locus multi-locus sequence typing (MLST) database using the BLAST tool. The alignment threshold was set to sequence coverage ≥90% and sequence identity ≥95%, in order to accurately identify the sequence type (ST) of each sample [27]. For plasmid replicon identification, the genome sequences were compared with the PlasmidFinder database using the BLAST tool, with a threshold of sequence coverage ≥90% and sequence identity ≥90%, to determine the plasmid types carried by the strains [28]. Antibiotic resistance genes (ARGs) were identified based on the Comprehensive Antibiotic Resistance Database (CARD), with the alignment threshold set at sequence coverage ≥90% and sequence identity ≥90%, to screen for potential resistance genes [29]. Finally, VGs were identified by comparing the genomes with the Virulence Factor Database (VFDB, version 6.0), using the same alignment thresholds of sequence coverage ≥90% and sequence identity ≥90% [30,31]. All database comparisons were carried out with strict threshold criteria to ensure high-quality annotation results, providing a scientific foundation for subsequent functional analysis and public health risk assessments. Using information from the VFDB, VGs were classified to obtain functional annotations and potential virulence effects of each gene. To accurately differentiate plasmid sequences from chromosomal sequences in the genomes, PlasmidFinder v.2.0.1 was used to classify the contigs of each genome, identifying and annotating plasmid and chromosomal sequences. This step helped clarify the specific locations of resistance and VGs carried by plasmids, providing more precise data for subsequent analysis. The results of the database comparisons were preprocessed using Python v.3.10, converting the presence or absence of VGs and plasmid replicons into a binary matrix format. In the matrix, each genome was marked as 1 (present) or 0 (absent) for each VG or plasmid replicon. Subsequently, R v.4.3.2 was used for visual analysis of the binary matrix to generate prevalence maps of VGs and plasmid replicons by time group. The sources of isolation were incorporated to further investigate the transmission patterns of these factors over time and across different sources. The co-occurrence network of VGs and plasmid replicons in the samples was constructed and visualized using Gephi v0.10.1. Based on the gene annotation results, the co-occurrence frequencies of each VG and plasmid replicon were calculated to generate a co-occurrence matrix. The processed matrix was imported into Gephi in edge list format, and the ForceAtlas algorithm (Fruchterman Reingold) was applied to optimize the network layout to present the relationships between nodes. Nodes representing VGs and plasmid replicons were colored differently, and their size was adjusted based on their degree of connectivity. The edge thickness and transparency were visualized based on the co-occurrence frequency weights, to reveal the potential associations and biological significance between VGs and plasmid replicons.
2.4 Key driver analysis
We assessed the relative importance of key factors influencing E. coli VGs using the variable importance tool in a random forest model [32]. Specifically, a random forest model was built based on 11 relevant factors and E. coli VGs, quantifying the impact of these factors on VGs. The selected factors included: plasmid count, ARGs, fruit and vegetable production (FAOSTAT, FAO), temperature, GDP, population density, pesticide usage, CO2 emissions, water pollution, environmental water pollution, and drinking water pollution. The random forest model was executed in R v.4.3.2 using the randomForest and rfPermute packages, with a random seed set to 42 and other default parameters. To optimize the model, the random forest was initially trained on 70% of the data. The remaining 30% was used as a validation set to assess model accuracy. After parameter optimization, the final model was built using all the data with the following parameters: ntree = 500, importance = TRUE, nrep = 1000. In the random forest model, the percentage increase in mean squared error (%IncMSE) indicates the contribution of each variable to the model’s prediction. The higher the %IncMSE, the greater the relative importance of the variable in the model. To assess the relative importance of each predictor, we used the rfPermute package to perform permutation tests on each predictor, generating the %IncMSE for each decision tree. Statistical significance of the contribution of each predictor was evaluated using random permutation methods to calculate P-values. Variables with significant contributions were marked with statistical significance levels (* indicates P < 0.05, ** indicates P < 0.01, *** indicates P < 0.001). Spearman rank correlation analysis was used to assess the correlations between different variables [33]. Specifically, we examined: (1) the correlation between CO2 emissions and VGs, and (2) the correlation between pesticide usage and VGs. Correlation analysis was performed using the cor.test() function in R, specifying method = "spearman" to compute the Spearman correlation coefficient (ρ) and its corresponding P-value for each variable pair. The Spearman correlation coefficient ranges from -1 to 1, where ρ > 0 indicates a positive correlation, ρ < 0 indicates a negative correlation, and the closer |ρ| is to 1, the stronger the correlation. Statistical significance was evaluated through P-values, and correlations were considered significant when P < 0.05.
2.5 Multiple regression model analysis
First, all the data of independent variables are normalized by min-max (Equation 1). Multiple linear regression model was used to analyze the relationship between plasmid, fruit and vegetable production, GDP, population density, CO2 emissions and output (VGs). First of all, the data containing five input features and one output variable was read and divided into an 80% training set and a 20% validation set. A multiple linear regression model was constructed using PyTorch, and the mean square error (MSE) was used as the loss function, and the stochastic gradient descent (SGD) optimization algorithm was used for training. A total of 1000 iterations were performed during the training process, after each iteration the model performance on the validation set was evaluated and the best model was saved. Finally, the trained linear regression equation was obtained, including the weights and bias terms of each feature. In addition, to further assess model robustness and statistical significance, an ordinary least squares (OLS) regression analysis was performed using the same variables, from which R2 values and P-values were obtained.
where is the original value, and represent the minimum and maximum values of the variable, respectively, and is the normalized value.
3 Results
3.1 Trends in virulence gene variation
This study analyzed a total of 1084 E. coli isolates from the National Center for Biotechnology Information (NCBI) database (as of January 2024). These isolates were obtained from fruits and vegetables across 10 countries, including the United States and Germany (Table S1). The majority of isolates (n = 588) originated from the United States, particularly California, a major hub for fruit and vegetable production. Hosts included lettuce (n = 331), spinach (n = 201), cilantro (n = 137), cucumber (n = 101), and tomato (n = 95). The whole-genome sequences of the 1084 E. coli isolates were annotated using the Pathogenic Bacterial Virulence Factor Database (VFDB), resulting in the identification of 417 distinct VGs. The temporal analysis revealed a significant upward trend in the number of VGs over time, with the highest value (83.11) observed in 2019 (Fig. S3). The isolates were grouped into five periods: 2000–2004 (n = 136), 2005–2009 (n = 57), 2010–2014 (n = 222), 2015–2019 (n = 241), and 2020–2023 (n = 422). To account for sample size disparities, random sampling (n = 57) was performed with 999 repetitions to create a balanced dataset. The results are shown in Fig. 1A. From 2000 to 2019, the number of VGs carried by E. coli significantly increased (P < 0.001), rising from 69.10 to 86.23, representing an increase of 24.79%.
3.2 Categories of virulence genes
The annotated 417 VGs were classified based on VFDB criteria [34] (Table S4) into categories such as effector delivery systems, nutritional/metabolic factors, adherence, exotoxins, immune modulation and invasion. Effector delivery systems accounted for the largest share (163 VGs) (Fig. 1 and Table S3). Temporal trends indicated significant increases in VGs related to effector delivery systems and exotoxins, which were the primary drivers of the overall change of VGs (Fig. 2C and D). An increased abundance of genes gspH, gspG, espX2, and espP (effector delivery systems), as well as stx2B, stx2A, hlyB, and hlyA (exotoxins), was observed (Fig. 2E and F). One key finding was the universal presence of the fur gene, a regulator of iron homeostasis and a critical factor in bacterial survival and pathogenicity. As a virulence gene, fur can significantly enhance the survival and infection potential of pathogenic bacteria by regulating the expression of iron-related metabolic pathways and virulence factors.
3.3 Differences in virulence genes across genotypes
The isolates were characterized using multi-locus sequence typing (MLST) based on seven pairs of housekeeping genes (adk, fumC, gyrB, icd, mdh, purA, and recA). A total of 309 sequence types (STs) were identified, with ST11 (5.35%, n = 58) the most dominant, followed by ST297 (n = 57), ST6186 (n = 38), ST10 (n = 30) and ST58 (n = 26). ST11 was the most prevalent sequence type in this study and corresponds to the well-characterized E. coli O157:H7 lineage, a major foodborne pathogen associated with Shiga toxin production [35]. Notably, more than half of the STs (n = 162) were represented by a single sample, while 73 isolates lacked definitive ST classification (Fig. S1). Significant differences in VGs were observed among genotypes. ST11 isolates carried an average of 138.83 VGs, far exceeding the overall average of all samples (75.47). ST11 isolates were primarily associated with lettuce (n = 24) and spinach (n = 20) and spanned isolation years from 2000 to 2021. However, the prevalence of ST11 decreased over time, while genotypes with fewer VGs, such as ST48 and ST101, showed an increasing trend (Fig. S5).
3.4 Transfer of virulence genes
Genomic sequences were aligned with the plasmid database (PlasmidFinder), and the alignment thresholds were set at ≥90% sequence coverage and ≥90% sequence similarity to identify the type of plasmids carried by the strains. Plasmids were present in 890 samples out of a total of 1084 samples, with an average number of 2.66 plasmids, and the average length of plasmids was 38,343, of which 363 plasmids were ring-forming. The number of plasmids increased significantly over time, from an average of 1.35 in 2000–2004 and to 2.91 in 2015–2019 (Fig. S2A). Among 236 isolates, plasmid-associated VGs were identified, totaling 1185 VGs, with an average of 4.37 per sample. The highest VG count (18) was in an isolate from lettuce by the US FDA in 2021 (Fig. 1). The average number of VGs in plasmids increased from 2.81 in 2000–2004 to 5.26 in 2015–2019, with an increase of 87.19%. There were 12 plasmid types, of which plasmid IncFIB (AP001918) carried the most VGs (n = 34), followed by IncFIA and IncFII (n = 23). The VGs in plasmids were mainly distributed in adherence (34, 35.42%), exotoxins (32, 33.33%), effector delivery systems and nutritional/metabolic factors (11, 11.46%). To further investigate the mobility of VGs, we analyzed the interactions between VGs and plasmids, and the results showed that some plasmids co-occurred with VGs (Fig. 3). Exotoxin genes (hlyA, hlyB, hlyC, and hlyD) frequently co-occurred with plasmid IncFIB, with co-occurrence rates increasing significantly over time (25, 24, 25, and 25 times in the 2000s, respectively; 127, 128, 125, and 128 times in the 2010s, respectively; and the number of co-occurring plasmids and VGs increased in the 2010s). VGs in the effector delivery system frequently co-occurred with IncFIB and IncFII, and in the 2000s, only espP and IncFIB, toxB and IncFII co-occurred each once in the 2000s, and in the 2010s the number of times espP co-occurred with IncFIB, toxB, and IncFII increased to 29 and 14 times, respectively.
3.5 Analysis of virulence genes in different isolates
The number of VGs was consistently higher in leafy vegetables, such as spinach (86.86) and lettuce (76.14), compared to non-leafy vegetables and fruits, including cucumbers (66.25), tomatoes (64.75), cantaloupes (64.44), and strawberries (65.5) (Fig. 2B). From 2010 to 2019, the number of VGs isolated from leafy vegetables showed a significant increase from 83.62 to 89.48 (7.01%), while the total number of samples only rose by 3.32% during the same period. Lettuce samples, in particular, showed a broad temporal and geographical distribution, with a significant upward trend in the number of VGs. Between 2015 and 2019, the average number of VGs in lettuce isolates reached 99.19, representing a 1.42-fold increase compared to 2000–2004. In addition, while the E. coli VGs isolated from non-leafy vegetables and fruits samples initially showed growth, their numbers eventually plateaued. A similar trend was observed for plasmid-borne VGs in leafy vegetable samples, which showed a steady rise from 2.90 to 5.42 over the same period. Co-occurrences of plasmids and VGs also increased over time. For instance, the exotoxin gene hlyA co-occurred with the plasmid IncFIB, with the frequency rising from 24 occurrences in the 2000s to 115 in the 2010s. Similarly, the effector delivery system gene espP co-occurred with IncFIB 22 times in the 2010s as compared to just one occurrence in the 2000s (Fig. S2). Analysis of ST typing in different isolates revealed notable variations in the distribution of ST types across host sources. For lettuce samples, the percentage of ST11 (VGs: 138.83) was 7.25%, significantly higher than its overall percentage of 5.35% in all samples. Similarly, the percentage of ST297 (VGs: 76.78) in lettuce was 6.95%, compared to 5.26% in all samples. Spinach also exhibited a high representation of ST11 (9.95%) and ST297 (6.47%). In contrast, the percentages of ST11 and ST297 in cucumber were low (0.99%), with ST1112 (VGs: 64.00) and ST906 (VGs: 67.36) being the most common at 3.96%. For tomato, ST11 and ST297 accounted for 2.11% and 0, respectively, with ST155 (VGs: 66.14) being the most common at 11.58%.
3.6 Analysis of key drivers
A random forest regression model was employed to evaluate the predictive contributions of multiple factors on the changes of VGs in E. coli. The predictors included human activities (e.g., fruit and vegetable production, temperature, GDP, population density, pesticide usage, CO2 emissions, water pollution, environmental water pollution, and drinking water pollution) and biological factors (e.g., plasmid count, and ARGs). The model was used to assess the relative importance of these variables in explaining the observed variation in VGs. The significance analysis showed the decisive role of human activities, such as CO2 emissions, GDP, and population density, in influencing these changes. Among these variables, CO2 emissions stood out with a %IncMSE of 7.12%, much higher than other factors (Fig. 4). This suggests that CO2 emissions may be a key driver of changes of VGs in E. coli. Further correlation analysis confirmed significant positive relationship between CO2 emissions and the number of VGs (P < 0.05, Fig. 4B). Global migration patterns were also found to be significantly correlated with VG changes (Fig. S6). For leafy vegetable-associated E. coli isolates, pesticide use emerged as a potential critical factor, with an importance score of 3.44%. Biological factors, such as the presence of plasmids, also played a substantial role, accounting for 3.33% of the variation. Additionally, human activities such as GDP, CO2 emissions, fruit and vegetable production, population density, and temperature, were found to significantly impact changes in E. coli VGs. Besides, plasmid was also an important biotic factor of changes of VGs. The correlation analysis between plasmid and VGs indicated that the plasmid IncFIB showed significant correlation with hlyB, hlyD, hlyC, HlyA, espP [36]. The United States, with the largest number of widely distributed and representative isolates, was used as a case study to analyze the relationship between VG changes and foodborne disease outbreaks. The results revealed a significant correlation between the number of VGs and the incidence of foodborne disease outbreaks caused by E. coli (Fig. S7A). This correlation was particularly noticeable for outbreaks linked to lettuce, spinach and other fruits and vegetables (Fig. S7B). Furthermore, a multiple regression model was constructed between the number of virulence genes and each key factor (Equation 2). According to the coefficients of different independent variables, the weight of plasmid number and CO2 emission is the highest.
where is VGs, is the number of normalized plasmids, is the number of normalized fruit and vegetable production, is the number of normalized GDP, is the number of normalized population density, is the number of normalized CO2 emissions.
4 Discussion
This study provided a comprehensive analysis of the trends in E. coli VGs isolated from fruits and vegetables since 2000. It examined the relationship between VGs and time, and isolation sources, as well as the dynamics of plasmid-associated VGs. The results showed a significant growth trend in VGs, with toxin secretion-related genes (e.g., effector delivery systems and toxins) being the primary contributors to this increase. Besides, the number of plasmids carrying VGs increased over time, along with an increase in the types and number of co-occurring plasmids and VGs. Leafy vegetables, in particular, showed the highest levels of E. coli contamination and VG prevalence, with a clear upward trend over the years. These findings highlight leafy vegetables as a high-risk category for contamination.
Bacterial pathogenicity is a complex process requiring the coordinated action of four steps: (1) host invasion, (2) tissue colonization, (3) tissue injury, and (4) host defense evasion [37,38]. A critical element in this process is the effector delivery system, which transports toxins or effector molecules into host cells. This system disrupts host physiological processes and play a vital role in bacterial virulence and pathogenic mechanisms [39]. Effector delivery system proteins can regulate or interfere with a wide range of cellular functions, including gene expression, vesicular trafficking, programmed cell death and cell cycle progression [40,41]. Exotoxins, on the other hand, are directly responsible for cell damage and disease. Common E. coli toxins include Shiga toxin1 (stx1) and Shiga toxin2 (stx2), both of which are associated with severe symptoms like diarrhea and fever, and the genes encoding these toxins are often plasmid-borne [40,42]. Pathogenic bacteria typically rely on the combined action of effector delivery systems and exotoxins [43]. This study also found that genes related to the toxin secretion system are abundant in both chromosomes and plasmids. These two VG categories dominate the observed changes and represent critical elements of E. coli pathogenicity.
Notably, no significant correlation was observed between the number of anti-microbial resistance (AMR) genes and the number of VGs. This can be attributed to the vastly different evolutionary timescales of these traits [43]. While hosts and bacteria have co-evolved over millions of years, leading to adaptations in bacterial virulence to counteract host defenses [19,44], the evolution and spread of resistance are much more recent, mostly in the past 50 years following the introduction of antibiotics [45]. These findings reveal the complex interplay between human activities, bacterial evolution and the genetic characteristics of E. coli, providing an important basis for the development trend of virulence and AMR genes.
Plasmids, small circular DNA molecules capable of autonomous replication in bacterial cells, are key contributors to bacterial VGs [6]. They can confer specific phenotypic traits to host bacteria, significantly enhancing genetic diversity and evolutionary potential [46]. The number of plasmids in the samples increased significantly over time, indicating either an enhanced bacterial adaptability to the environment, or the spread and accumulation of plasmids driven by external factors such as environmental changes [47,48]. In the One health framework, plasmids play a key role in gene transmission, influencing the flow of genetic material across different biological levels [49]. Plasmid-borne VGs may enhance dissemination through horizontal gene transfer in E. coli, promoting the spread of virulence traits in food-associated environments and increasing risks to food safety and public health [50]. Unlike AMR genes, the horizontal transfer of VGs is less frequent, so the proportion of plasmid-carried VGs remains relatively low [51]. However, E. coli strains, particularly the ST11 type, exhibited the highest number of VGs on both chromosomes and plasmids. 84.48% of ST11 samples were derived from vegetables, suggesting a strong host preference among E. coli strains with high VG counts.
In this study, co-occurrences of VGs (e.g., hlyB, hlyD, stcE, etc.) and plasmids (IncFIB) were frequently detected. Previous studies have demonstrated the correlation between plasmid types (IncFIA, IncI1, IncFIB and IncFII) and both VGs and AMR genes [52,53]. These plasmids, known for their broad host range, can move between different environments and habitats, facilitating the mixing of environmental, human, and animal microorganisms and driving the wide spread of VGs [51,54,55]. In particular, wide-area bacteria such as E. coli exist widely in a variety of environments and are key vectors for transhabitat transmission of plasmids containing VGs [49,56]. The abundance of co-occurring VGs in the samples remained consistently high, further indicating that these VGs may co-evolve over time. Moreover, VGs that mediate similar virulence functions usually co-occur, such as hlyB and hlyD/hlyC/hlyA, iucA and iucB/iucC, clbG and clbD/clbD/clbH, etc. It is hypothesized that the emergence of this phenomenon is driven by the long-term stress selection of external environment for a certain type of virulence function [57].
Under strong selection pressure from harsh environments, highly pathogenic bacteria populations may gain a competitive edge over less virulent strains, enabling their growth [58]. This study revealed that the number of plasmids and the number of VGs in plasmids increased over time, indicating that E. coli has continuously improved its competitive advantage and adaptability in the process of evolution [46]. Correlation analysis further showed a significant positive correlation between the number of plasmids and VGs (P < 0.05), suggesting that the rise in VGs complements the increase in plasmid number to a certain extent (Fig. S6). The effective regulation of VGs and the control of their spread lies in the control of plasmids carrying VGs.
Analysis of samples from different sources identified spinach and lettuce as significant reservoirs of E. coli with high VG content (Fig. S1 and Fig. S3). The leafy vegetables were particularly upsetting because 51.7% of the outbreaks associated with agricultural products in developed countries are associated with them [3]. The WHO identified leafy vegetables as one of the most critical commodities in microbial safety, especially with regard to E. coli O157:H7, which is also considered as a prominent food hazard in countries like the United States and Sweden [59, 60]. Leaf vegetables are at high risk due to their frequent direct contact with irrigation water, which often acts as a transmission medium for E. coli. Surface water for irrigation has also been implicated in multiple outbreaks [3,61]. Taking US data as an example. The number of VGs was significantly correlated with outbreaks caused by E. coli (Fig. S7 and Table S5), and the number of VGs from different isolated sources was also significantly correlated with outbreaks caused by E. coli, indicating that the increase in the number of VGs increased the pathogenic risk of E. coli.
The changes in E. coli VGs were associated with CO2 emissions, pesticide use, and population density. The increase in CO2 concentrations can enhance plant photosynthesis, which in turn affects the growth of symbiotic microorganisms, including the change of pseudomonas population, etc [62–64]. These changes can impact the pathogenicity and virulence of the microbial flora. Furthermore, rising CO2 and other greenhouse gases contribute to higher atmospheric temperatures, which can alter the activity of enzymes and the availability of oxygen, thus affecting microbial respiration and other life activities [65]. The use of pesticides not only influences the resistance genes of bacteria, but also may affect the virulence characteristics of bacteria by changing their growth environment and pressure [66,67]. Studies suggest that after pesticide exposure, bacteria may become more virulent, especially in stressful environments, and that they can adapt to this selection pressure by adjusting their virulence factors. This can have serious implications for fruit and vegetable ecosystems and public health, especially in areas with high pesticide use. Therefore, controlling pesticide use and managing agricultural practices are essential to reducing the virulence of pathogens in food production [68]. The multiple regression model can predict VGs based on external influencing factors. The coefficient of the model indicated that environmental factors: CO2 emission and biological factors: plasmid number are the dominant factors determining VGs, which is consistent with the results of the random forest regression model. The OLS regression model indicated that plasmid, fruit and vegetable production, GDP, population density, CO2 emissions collectively accounted for 58.7% of the observed variation in VGs (R2 = 0.587). Among these variables, CO2 emissions showed a significant association with VGs (P < 0.05). Based on the fitted regression equation, a 20% reduction in CO2 emissions was associated with an estimated decrease of 4.85 VGs in E. coli. Factors such as fruit and vegetable production, GDP and population size have little effect on VGs. When fruit and vegetable production is doubled, VGs is only increased by 0.9450. While the model used in this study provides a relatively accurate prediction of VGs, environmental factors interact in complex, nonlinear ways. Further research can improve the model by incorporating a broader range of ecological factors.
The United States has the largest number and distribution of E. coli samples, with the highest concentration of isolates from California. California, being the top producer and importer of fruits and vegetables, accounts for a significant portion of E. coli detections. The genetic diversity of E. coli isolated from California is extensive (Fig. S1). Over time, the number of VGs have gradually risen with increased human activities, population growth, and higher fruit and vegetable consumption, with a decline after 2019 (Fig. S3). This trend correlates with the onset of the COVID-19 pandemic in 2019, which drastically changed people’s lifestyle, reduced travel, and slowed the spread and circulation of bacteria [2,69]. At the same time, heightened awareness of hygiene and protective measures during the pandemic effectively curbed the transmission of microorganisms [70]. The overuse of disinfectants and antibiotics has threatened the survival of microorganisms in an unprecedented way [71]. Virulence-related functions, such as adhesion, invasion, delivery systems and toxin production, all require energy. In response to survival pressures, E. coli may reduce some virulence functions, resulting in a drop in the number of VGs [72,73]. This reflects a strategic adaptation by E. coli to the environment. The change of external living environment is one of the key driving factors for the change of bacterial virulence [57,74], which provides us with important clues for an in-depth understanding of microbial ecology and evolutionary mechanism.
This study does have some limitations. The timing and specific locations of sample collection were not always detailed, and there were years with fewer samples. In addition, a large proportion of samples from Cambodia in 2022 (290 samples, or 26.75%) came from a single BioProject (PRJNA357722), which may limit the diversity of the samples and thus produce certain deviations in the overall trend analysis. Despite these potential biases, this study presents an extensive coverage of sample collection across different time periods, locations, and isolation sources, ensuring significant genomic diversity. Although the Cambodia data may introduce some local biases, its impact on the overall analysis is relatively small. Furthermore, while the study provides in-depth analysis of overall VGs and specific ST types, some representative VGs may not have been fully addressed.
5 Conclusions
In conclusion, this study uncovers a concerning long-term increase in VGs in E. coli from fruits and vegetables worldwide, linking this trend to human factors such as CO2 emissions. These findings provide critical evidence for a hidden yet escalating chain of effects: climate change–microbial evolution–food safety deterioration. This study highlights that strengthening agricultural environmental management, reducing pollution emissions, and optimizing food safety monitoring are critical to reducing the virulence of E. coli.
World Health Organization. WHO Estimates of the Global Burden of Foodborne Diseases: Foodborne Diseases Burden Epidemiology Reference Group 2007–2015. Geneva: World Health Organization, 2015
[2]
Khoury C K, Jarvis A, Jones A D. Trade and its trade-offs in the food system. Nature Food, 2020, 1(11): 665–666
[3]
Machado-Moreira B, Richards K, Brennan F. et al. Microbial contamination of fresh produce: what, where, and how?. Comprehensive Reviews in Food Science and Food Safety, 2019, 18(6): 1727––1750
[4]
U. S. Food, Drug Administration. Public health advisories from investigations of foodborne illness outbreaks. , Available online(accessed 7 May 2026),
[5]
U. S. Food, Drug Administration. Outbreak investigation of E. coli O157: H7: onions (October 2024). , Available online(accessed 7 April 2025),
[6]
Denamur E, Clermont O, Bonacorsi S. et al. The population genetics of pathogenic Escherichia coli. Nature Reviews Microbiology, 2021, 19(1): 37–54
[7]
Horesh G, Blackwell G A, Tonkin-Hill G. et al. A comprehensive and high-quality collection of Escherichia coli genomes and their genes. Microbial Genomics, 2021, 7(2): 000499
[8]
Schreiber H L, Conover M S, Chou W C. et al. Bacterial virulence phenotypes of Escherichia coli and host susceptibility determine risk for urinary tract infections. Science Translational Medicine, 2017, 9(382): eaaf1283
[9]
Johnson D I. Bacterial pathogens and their virulence factors. , 2018,
Kaper J B, Nataro J P, Mobley H L T. Pathogenic Escherichia coli. Nature Reviews Microbiology, 2004, 2(2): 123–140
[12]
Vila J, Sáez-López E, Johnson J R. et al. Escherichia coli: an old friend with new tidings. FEMS Microbiology Reviews, 2016, 40(4): 437–463
[13]
Holmes C L, Anderson M T, Mobley H L T. et al. Pathogenesis of gram-negative bacteremia. Clinical Microbiology Reviews, 2021, 34(2): e00234–20
[14]
Yu Z G, Goodall E C A, Henderson I R. et al. Plasmids can shift bacterial morphological response against antibiotic stress. Advanced Science, 2023, 10(2): 2203260
[15]
Debroas D. Global analysis of the metaplasmidome: ecological drivers and spread of antibiotic resistance genes across ecosystems. Microbiome, 2025, 13(1): 77
[16]
Mather A E, Gilmour M W, Reid S W J. et al. Foodborne bacterial pathogens: genome-based approaches for enduring and emerging threats in a complex and changing world. Nature Reviews Microbiology, 2024, 22(9): 543–555
[17]
Croxen M A, Finlay B B. Molecular mechanisms of Escherichia coli pathogenicity. Nature Reviews Microbiology, 2010, 8(1): 26–38
[18]
Du S C, Lin H J, Luo Q. et al. House dust microbiome differentiation and phage-mediated antibiotic resistance and virulence dissemination in the presence of endocrine-disrupting chemicals and pharmaceuticals. Microbiome, 2025, 13(1): 96
[19]
Diard M, Hardt W D. Evolution of bacterial virulence. FEMS Microbiology Reviews, 2017, 41(5): 679–697
[20]
Cui Y X, Hu J X, Peng S S. et al. Limiting resources define the global pattern of soil microbial carbon use efficiency. Advanced Science, 2024, 11(35): 2308176
[21]
Priest N K, Rudkin J K, Feil E J. et al. From genotype to phenotype: can systems biology be used to predict Staphylococcus aureus virulence?. Nature Reviews Microbiology, 2012, 10(11): 791–797
[22]
Wick R R, Judd L M, Gorrie C L. et al. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Computational Biology, 2017, 13(6): e1005595
[23]
Bankevich A, Nurk S, Antipov D. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 2012, 19(5): 455–477
[24]
Chen S F. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta, 2023, 2(2): e107
[25]
Zhao Y X, Hu J J, Wang J Q. et al. Comammox nitrospira act as key bacteria in weakly acidic soil via potential cobalamin sharing. iMeta, 2025, 4(1): e271
[26]
Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Molecular Biology and Evolution, 2021, 38(7): 3022–3027
[27]
Maiden M C J, Bygraves J A, Feil E. et al. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences of the United States of America, 1998, 95(6): 3140–3145
[28]
Carattoli A, Zankari E, García-Fernández A. et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrobial Agents and Chemotherapy, 2014, 58(7): 3895–3903
[29]
McArthur A G, Waglechner N, Nizam F. et al. The comprehensive antibiotic resistance database. Antimicrobial Agents and Chemotherapy, 2013, 57(7): 3348–3357
[30]
Zhou S Y, Liu B, Zheng D D. et al. VFDB 2025: an integrated resource for exploring anti-virulence compounds. Nucleic Acids Research, 2025, 53(D1): D871–D877
[31]
Olm M R, Bhattacharya N, Crits-Christoph A. et al. Necrotizing enterocolitis is preceded by increased gut bacterial replication, Klebsiella, and fimbriae-encoding bacteria. Science Advances, 2019, 5(12): eaax5727
[32]
Liao H P, Liu C, Zhou S G. et al. Prophage-encoded antibiotic resistance genes are enriched in human-impacted environments. Nature Communications, 2024, 15(1): 8315
[33]
Liao J Q, Guo X D, Weller D L. et al. Nationwide genomic atlas of soil-dwelling Listeria reveals effects of selection and population ecology on pangenome evolution. Nature Microbiology, 2021, 6(8): 1021–1030
[34]
Liu B, Zheng D D, Zhou S Y. et al. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Research, 2022, 50(D1): D912–D917
[35]
Pugh H L, Connor C, Siasat P. et al. E. coli ST11 (O157: H7) does not encode a functional AcrF efflux pump. Microbiology, 2023, 169(4): 001324
[36]
Bethke J H, Davidovich A, Cheng L. et al. Environmental and genetic determinants of plasmid mobility in pathogenic Escherichia coli. Science Advances, 2020, 6(4): eaax3173
[37]
Silva L N, Zimmer K R, Macedo A J. et al. Plant natural products targeting bacterial virulence factors. Chemical Reviews, 2016, 116(16): 9162–9236
[38]
Wang L M, Villafuerte Gálvez J A, Lee C. et al. Understanding host immune responses in Clostridioides difficile infection: implications for pathogenesis and immunotherapy. iMeta, 2024, 3(3): e200
[39]
Büttner D, Bonas U. Getting across—bacterial type III effector proteins on their way to the plant cell. The EMBO Journal, 2002, 21(20): 5313–5322
[40]
Galán J E, Wolf-Watz H. Protein delivery into eukaryotic cells by type III secretion machines. Nature, 2006, 444(7119): 567–573
[41]
Green E R, Mecsas J.. Bacterial secretion systems: an overview.. Microbiology Spectrum, 2016, 4(1):
[42]
Young L S. The role of exotoxins in the pathogenesis of Pseudomonas aeruginosa infections. Journal of Infectious Diseases, 1980, 142(4): 626–630
[43]
Beceiro A, Tomás M, Bou G. Antimicrobial resistance and virulence: a successful or deleterious association in the bacterial world?. Clinical Microbiology Reviews, 2013, 26(2): 185–230
[44]
Didelot X, Walker A S, Peto T E. et al. Within-host evolution of bacterial pathogens. Nature Reviews Microbiology, 2016, 14(3): 150–162
[45]
Larsson D G J, Flach C F. Antibiotic resistance in the environment. Nature Reviews Microbiology, 2022, 20(5): 257–269
[46]
Arnold B J, Huang I T, Hanage W P. Horizontal gene transfer and adaptive evolution in bacteria. Nature Reviews Microbiology, 2022, 20(4): 206–218
[47]
Ares-Arroyo M, Nucci A, Rocha E P C. Expanding the diversity of origin of transfer-containing sequences in mobilizable plasmids. Nature Microbiology, 2024, 9(12): 3240–3253
[48]
Soucy S M, Huang J L, Gogarten J P. Horizontal gene transfer: building the web of life. Nature Reviews Genetics, 2015, 16(8): 472–482
[49]
Castañeda-Barba S, Top E M, Stalder T. Plasmids, a molecular cornerstone of antimicrobial resistance in the One Health era. Nature Reviews Microbiology, 2024, 22(1): 18–32
[50]
Rodríguez-Beltrán J, DelaFuente J, León-Sampedro R. et al. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nature Reviews Microbiology, 2021, 19(6): 347–359
[51]
Liu G, Thomsen L E, Olsen J E. Antimicrobial-induced horizontal transfer of antimicrobial resistance genes in bacteria: a mini-review. Journal of Antimicrobial Chemotherapy, 2022, 77(3): 556–567
[52]
Long J Z, Zhang J F, Xi Y Y. et al. Genomic insights into CRISPR-harboring plasmids in the Klebsiella genus: distribution, backbone structures, antibiotic resistance, and virulence determinant profiles. Antimicrobial Agents and Chemotherapy, 2023, 67(3): e01189–22
[53]
Pilla G, Tang C M. Going around in circles: virulence plasmids in enteric pathogens. Nature Reviews Microbiology, 2018, 16(8): 484–495
[54]
Rillig M C, Ryo M, Lehmann A. et al. The role of multiple global change factors in driving soil functions and microbial biodiversity. Science, 2019, 366(6467): 886–890
[55]
Zhao R N, Hao J, Yang J T. et al. The co-occurrence of antibiotic resistance genes between dogs and their owners in families. iMeta, 2022, 1(2): e21
[56]
Mathers A J, Peirano G, Pitout J D D. The role of epidemic resistance plasmids and international high-risk clones in the spread of multidrug-resistant Enterobacteriaceae. Clinical Microbiology Reviews, 2015, 28(3): 565–591
[57]
Hafner L, Gadin E, Huang L. et al. Differential stress responsiveness determines intraspecies virulence heterogeneity and host adaptation in Listeria monocytogenes. Nature Microbiology, 2024, 9(12): 3345–3361
[58]
Qin S G, Xiao W, Zhou C M. et al. Pseudomonas aeruginosa: pathogenesis, virulence factors, antibiotic resistance, interaction with host, technology advances and emerging therapeutics. Signal Transduction and Targeted Therapy, 2022, 7(1): 199
[59]
World Health Organization. Microbiological hazards in fresh leafy vegetables and herbs: meeting report. Geneva: World Health Organization, 2008
[60]
Eppinger M, Mammel M K, Leclerc J E. et al. Genomic anatomy of Escherichia coli O157: H7 outbreaks. Proceedings of the National Academy of Sciences of the United States of America, 2011, 108(50): 20142–20147
[61]
Belias A M, Sbodio A, Truchado P. et al. Effect of weather on the die-off of Escherichia coli and attenuated Salmonella enterica Serovar Typhimurium on preharvest leafy greens following irrigation with contaminated water. Applied and Environmental Microbiology, 2020, 86(17): e00899–20
[62]
Jiao N Z, Luo T W, Chen Q R. et al. The microbial carbon pump and climate change. Nature Reviews Microbiology, 2024, 22(7): 408–419
[63]
Tarnawski S, Hamelin J, Jossi M. et al. Phenotypic structure of Pseudomonas populations is altered under elevated pCO2 in the rhizosphere of perennial grasses. Soil Biology and Biochemistry, 2006, 38(6): 1193–1201
[64]
Zhao Y X, Liu Z S, Zhang B F. et al. Inter-bacterial mutualism promoted by public goods in a system characterized by deterministic temperature variation. Nature Communications, 2023, 14(1): 5394
[65]
Singh B K, Bardgett R D, Smith P. et al. Microorganisms and climate change: terrestrial feedbacks and mitigation options. Nature Reviews Microbiology, 2010, 8(11): 779–790
[66]
Balagué C E, De Ruiz C S, Rey R. et al. Effect of the herbicide 2, 4-dichlorophenoxyacetic acid on uropathogenic Escherichia coli virulence factors. Toxicology, 2002, 177(2–3): 143–155
[67]
Ye M, Zhang Z Y, Sun M M. et al. Dynamics, gene transfer, and ecological function of intracellular and extracellular DNA in environmental microbiome. iMeta, 2022, 1(3): e34
[68]
Springmann M, Kennard H, Dalin C. et al. International food trade contributes to dietary risks and mortality at global, regional and national levels. Nature Food, 2023, 4(10): 886–893
[69]
Ciotti M, Ciccozzi M, Terrinoni A. et al. The COVID-19 pandemic. Critical Reviews in Clinical Laboratory Sciences, 2020, 57(6): 365–388
[70]
Ko K K K, Chng K R, Nagarajan N. Metagenomics-enabled microbial surveillance. Nature Microbiology, 2022, 7(4): 486–496
[71]
Xiao S Q, Yuan Z M, Huang Y. Disinfectants against SARS-CoV-2: a review. Viruses, 2022, 14(8): 1721
[72]
Pokorzynski N D, Groisman E A. How bacterial pathogens coordinate appetite with virulence. Microbiology and Molecular Biology Reviews, 2023, 87(3): e00198–22
[73]
Toledo-Arana A, Dussurget O, Nikitas G. et al. The Listeria transcriptional landscape from saprophytism to virulence. Nature, 2009, 459(7249): 950–956
[74]
Zhou J N, Ma H M, Zhang L H. Mechanisms of virulence reprogramming in bacterial pathogens. Annual Review of Microbiology, 2023, 77: 561–581
RIGHTS & PERMISSIONS
The Author(s) 2026. This article is published by Higher Education Press.