1 Introduction
Plant roots closely interact with the microbial community surrounding them, creating a close and inter-dependent cross-talk (
Schlaeppi and Bulgarelli, 2015;
Cordovez et al., 2019). Plants secrete different exudates through their root system to ‘attract’ specific assemblages of microbial communities (
Sasse et al., 2018). This ʻselectionʼ enriches specific bacterial populations from the surrounding soil, and different plants share a core root microbiome that is distinct from the bulk soil, despite belonging to different botanical families (
Levy et al., 2018). Specific root microbial community members play key roles in plant development, growth and health through secretion of enzymes and phytohormones, nutrient provision (e.g., nitrogen and phosphorus) and production of metabolites that help defend plants against micro- and macro-pathogens, through direct antagonism or induced resistance (
Tyc et al., 2017;
Sharrar et al., 2020).
Polyketides and non-ribosomal peptides are specific secondary metabolite (SM) families synthesized by large enzyme complexes termed non-ribosomal peptide synthases (NRPSs) and polyketide synthetases (PKSs). These enzymes are typically organized on operon-like structures termed biosynthetic gene clusters (BGCs), which employ a modular arrangement composed of repeating domains that assemble the compound in a step-wise manner (
Jenke-Kodama et al., 2005;
McErlean et al., 2019). Polyketides and non-ribosomal peptides are known for their diverse structure and broad array of functions, which include antimicrobials, pigments and siderophores (
Wang et al., 2014). While many of these compounds have been exploited in medicine, industry and agriculture, we now realize that soil microbiomes contain a diverse array of unknown BGCs, and we have only touched the tip of the iceberg regarding their potential ecological function and applicative potential (
Dror et al., 2020b;
Waschulin et al., 2021).
Most of the research conducted to date on bacterial SM-encoding activity has focused on individual cultured isolates, using customized bioinformatic tools (
Eastman et al., 2014;
de Bruijn et al., 2015;
Nguyen et al., 2018). However, as most soil bacteria cannot be cultivated by existing methods, a huge fraction of bacterial SM remains untapped (
Bodor et al., 2020;
Dror et al., 2020a). To assist in revealing this ‘uncultured fraction’, culture-independent methods and pipelines have been developed and applied to mine BGC from environmental microbiomes. These include functional gene amplicon sequencing and extracting SM-encoding genes from shotgun meta–omics data (
Amos et al., 2015;
Charlop-Powers et al., 2016). By applying these techniques, we recently found that the composition and diversity of NRPS and PKS encoding BGCs are distinct in lettuce vs. tomato roots and that both are significantly different when compared to bulk soil. We also identified specific BGCs ubiquitous to plant roots that may play a role in plant growth and health (
Dror et al., 2020b).
This study exploited in-silico pipelines previously developed in our lab to investigate NRPS and PKS composition, diversity, taxonomy and potential function in soil and root-associated microbial communities from a broad range of geographic regions and plant types. We pinpointed novel SM families unique to each of these environments, which may play important ecological functions and can have future translational potential.
2 Methods
2.1 Shotgun metagenome datasets
Assembled soil and root-associated shotgun metagenomes were searched and downloaded from the MG-RAST and JGI-IMG repositories using the keywords: ‘shotgun metagenome’, ‘terrestrial biome’, ‘root’ and ‘rhizosphere’. The source of each dataset was validated by referring to publications linked to the data. Only well characterized datasets with sufficient metadata and >200 000 contigs were selected for further analyses ( Tab.1). Soil texture and pH for the selected samples were obtained from the website www.SoilGrid.com, using coordinates provided in the repositories or associated manuscripts.
2.2 NRPSs/PKSs mining and annotation
Raw metagenome FASTA files were filtered to include only contigs longer than 300 bp. These were used as input for ORF prediction with Prodigal, and filtered again for ORFs > 80 bp. To avoid redundancy, CD-HIT was applied (using a >97% identity cutoff), and subsequently, ORFs were randomly selected from each dataset using the seqtk sample command to avoid dataset size bias. NRPS/PKS annotation was performed by aligning sampled ORFs to AD or KS domains (for NRPSs and PKSs, respectively) downloaded from the antiSMASH-DB API platform (
Blin et al., 2019) using DIAMOND blastx (>50% identity, >50% query cover, max E value 10
−10). Only alignments longer than 70 bp were used for downstream analyses.
2.3 Taxonomy annotation
Annotated AD and KS genes were taxonomically assigned by running Diamond against the non-redundant (nr) Blast NCBI database (08/2016 version). The lowest common ancestor classification was determined using MEGAN V6.21.12 (
Huson et al., 2016) by taking the top 20 percent of the hits and filtering for a minimum score of 50 and maximum expect value of 0.0. Genes identifiers were then converted to full taxonomy using the mapping feature provided by MEGAN (11.2018 version). A custom Python code was used to aggregate all genes by environment (soil and roots) and according to different taxonomic levels. These outputs were visualized using R.
2.4 Statistics
Statistical analyses were performed using R. Statistical differences in the presence of AD and KS sequences between studied environments (soil vs. roots) were determined using the Student
t-test (following Shapiro-Wilk test for normality). Alpha and beta diversities of samples were calculated using the R package vegan (
Oksanen et al., 2013), with Aitchison transformation to account for compositional data. To assess differences in diversity between environments, principal coordinate analysis (PCoA) and analysis of similarity (ANOSIM) were performed. Differences were considered statistically significant when
P values were lower than 0.05. Python and R scripts can be found on github using the link https://github.com/barakdror/microbial_ecology_R.
3 Results
3.1 Soil and root shotgun metagenome datasets
Over 100 metagenome datasets were initially screened for our analysis, however, following a robust filtering process to prevent methodological biases, only 11 root-associated and 9 soil metagenomes (primarily from North and South America) were selected for in-silico analyses (Tab.1). The root-associated datasets included a variety of host vegetables (tomato, lettuce, cucumber, and potato), cereals (wheat, corn, switchgrass and miscanthus) and trees (poplar).
3.2 NRPSs and PKSs abundance and presence in rhizospheres and soils
We aligned the predicted ORFs within each dataset against a reference database composed of antiSMASH-generated adenylation (AD) and ketosynthase (KS) domain sequences (markers for NRPSs and PKSs, respectively), and subsequently calculated the percentage of AD and KS genes in each dataset (i.e., number of unique AD/KS genes found divided by the total number of ORFs predicted in the dataset). Generally, we saw a significant positive correlation between the relative abundance of AD and KS domains (Fig.1, R=0.85, p-value< 0.01, Pearson correlation). This trend was not related to sample environment (roots vs. soil), as among the top five AD/KS domains-rich datasets, three were root-associated (tomato, wheat and lettuce) and two were soil-associated (Colorado and San Felipe). Moreover, we saw no significant difference in the presence of AD or KS domains in root vs. soil samples when pooled together (Fig.1 and Fig.1, respectively). We also did not find any correlation between AD or KS domain abundance and soil pH or latitude (Figs. S1 and S2).
3.3 Diversity and taxonomy of NRPSs and PKSs in studied samples
No statistical differences were detected in alpha diversity of soil- and root-associated samples for either NRPS or PKS associated genes (Shannon index, Student t-test, p-value>0.05, Fig. S3). However, principal coordinate analysis revealed significant differences in the composition of root-associated NRPS and PKS relative to those from bulk soil samples, based on analysis of AD (ANOSIMR=0.16, p-value<0.05,Fig.2) and KS domains (ANOSIM R=0.21, p-value<0.05,Fig.2), respectively.
We assessed whether the observed difference in NRPSs and PKSs composition between soil and roots derives from an inherent difference in the taxonomy of the bacteria harboring these genes. We taxonomically annotated both the total ORFs sampled within each dataset, and the mined NRPS and PKS within these datasets using the NCBI nr database. Resulting annotations were taxonomically assigned using MEGAN and grouped at different taxonomic levels. At the class level the overall bacterial communities were phylogenetically diverse and soil samples were substantially different than roots. However, the taxonomic affiliation of NRPS and PKS was much narrower, and these communities were primarily annotated to Actinobacteria, Alphaproteobacteria and Betaproteobacteria (Fig.3). In addition, ~35%−40% of the NRPS and PKS in both soil and roots datasets could not be assigned to specific bacterial taxa (compared with ~10%−15% of the total soil and root ORFs). A similar trend was observed at the family level (Fig.3). Most of the NRPSs and PKSs in soil and root samples could not be taxonomically characterized (~70%−75%), though some root-associated NRPS and PKS were associated with Streptomycetaceae and Mycobacteriaceae, whereas certain soil NRPS and PKS were associated with Corynebacteriaceae andBradyrhizobiaceae.
3.4 Commonality and novelty of NRPSs and PKSs in soil vs. root-associated environments
Analysis of the distribution of extracted NRPSs and PKSs between soil- and root-associated datasets, revealed a relatively low fraction of common SM-encoding genes between environments (Fig.4). To identify potentially novel NRPSs and PKSs, we aggregated the soil and root associated AD and KS domains, and divided them into three groups (70%−85%, 85%−97% and >97%), based on identity of amino-acids to antiSMASH-DB reference sequences. Overall, most of the AD and KS hits in both soils and roots were classified as potentially novel (70%−85% identity, Fig.5), whereas a very small fraction (less than 4%) were classified as known (i.e., identified in previous studies, >97% identity to reference sequences).
4 Discussion
The rhizosphere microbiome is a subset of the surrounding soil microbial community, however, the organic oasis created by root exudates and mucilage facilitate proliferation of specific bulk soil populations, whose composition are dictated by specific plant species (
Peiffer et al., 2013;
Kawasaki et al., 2016;
Qiao et al., 2017). Secondary metabolites (SMs) are believed to play a critical role in both soil and root-associated microbiomes (
Tyc et al., 2017), but little is known about the abundance and potential function of SMs in these environments and their distribution in soil vs. root ecosystems. In this study, we explored and characterized soil- and root-associated SM-encoding genes specifically focusing on root-associated SMs with potential impact on host plants.
The composition of bacterial NRPS and PKS differed between soil and roots, implying a distinct SM-encoding capacity within each environment. While the observed pattern may stem from issues related to the complexity of the soil versus the root microbiome and consequently from sampling depth as the rhizosphere microbiome is a derivative of the larger soil microbiome pool, our findings point toward the selective power the host plant may have on the surrounding bacterial community. These results coincide with previous studies that found the rhizosphere microbiome to be enriched in genes related to ecosystem-specific functional processes such as transport, primary metabolism (
Yan et al., 2017;
Lee et al., 2019), nitrogen cycling (
Schmidt et al., 2019), motility, chemotaxis and plant cell-wall degradation (
Ofek-Lalzar et al., 2014).
We also found soil and root microbial communities are rather similar in their potential to encode for NRPS and PKS, regardless of soil pH. However, there was a clear and significant positive correlation between NRPS and PKS ratios in individual samples. This can reflect the presence of SM-rich bacterial taxonomic families within some samples, and their absence in others. This is supported by previous studies that found soil bacterial structure and functional traits to differ between biomes (
Fierer et al., 2009;
Noronha et al., 2017). We found that in root-associated samples, NRPSs and PKSs were commonly associated with
Actinobacteria,
Alphaproteobacteria, Betaproteobacteria,
Gammaproteobacteria and
Bacteroidetes, whereas in soil they were exclusively linked to a higher share of
Actinobacteria and
Alphaproteobacteria. The fact most of the mined NRPSs and PKSs had low identity (70%−85%) when compared to the antiSMASH-DB database, emphasize the huge untapped diversity of these SM families.
Comparative analysis of environmental shotgun metagenomic data such as the one presented here, have the potential to link functional genes and processes to specific ecosystems, and therefore, future research should complement this study and expand the number of datasets used, to quantitatively determine the abundance of selected NRPSs and PKSs in soil vs. root environments. However, as previously discussed (
Eckert et al., 2020;
Vangay et al., 2021), a large fraction of environmental shotgun metagenome data is not deposited into public repositories, and those that are frequently lack details and metadata that is crucial for enabling robust comparative analyses. There is a vital need for repositories to draft submission standards or adopt previously established guidelines (e.g., MIxS) (Yilmaz et al., 2011) that will ensure a minimal set of background data for submission of environmental shotgun metagenome.
5 Conclusions
This study found soil and root microbial communities to differ in their SM-encoding genes at a global scale, regardless of plant host and soil properties. We found this distinction is likely derived from specific taxonomies of the bacterial strains residing in each environment. Given the immense amount of potentially novel sequences that likely encode metabolites with unknown function, future research will aim to pinpoint, characterize, and express root-specific BGCs in order to elucidate their in-vitro and in-planta function. This process also needs to be conducted for BGC that are profuse in soil and root environments that encode for metabolites that are currently only known for clinical activity.