1 Introduction
Pollen and plant macrofossils are the most common proxies for reconstructing past vegetation and exploring past human-environment interactions. Pollen analysis is superior in representing vegetation changes at regional or even broader spatial scales, while plant macrofossil analysis is better suited for local plant composition (
Birks, 2003;
Jørgensen et al., 2012;
Zhou et al., 2020). We can obtain generally reliable past vegetation patterns based on the combination of these two proxies, but pollen analysis does not provide definite evidence for local vegetation changes and human–environment interactions as its taxonomic resolution is typically restricted to genus or even family level (
Madeja et al., 2009;
Zhou et al., 2020), while plant macrofossils could compensate for the limitations of pollen but are usually poorly preserved in environmental archives (
Jørgensen et al., 2012). Hence, a vegetation proxy that is abundant and allows studies at higher taxonomic resolution would be of benefit.
Plant environmental DNA extracted from lacustrine sediment (sedimentary DNA, sedDNA) has been increasingly employed in the investigation of past vegetation, particularly temporal changes in plant diversity (
Anderson-Carpenter et al., 2011;
Jørgensen et al., 2012;
Parducci et al., 2012;
Alsos et al., 2016,
2018;
Huang et al., 2021;
Liu et al., 2021;
Wang et al., 2021). It has the potential to reveal past vegetation changes and human impacts at high taxonomic resolution, which could compensate for the limitations of palynological evidence. However, its reliability has not been fully assessed (
Parducci et al., 2012;
Jia et al., 2021). For instance, it remains unclear how well lacustrine sedDNA represents the vegetation community surrounding the lake, what its source areas are, and how it is transported and preserved in sediments. Systematic research on plant sedDNA extracted from lake surface sediments will help to ensure the reliability of research into past ecology and its driving mechanisms (
Birks and Birks, 2016).
Temporal changes, driving factors, and human impacts on the fragile Tibetan ecosystem have recently become prominent scientific topics because of the increasing intensity of human land use in this cold, arid, and hypoxic environment (
Miehe et al., 2019;
Piao et al., 2019;
Chen et al., 2021). Understanding the long-term change patterns and driving factors of the ecosystem is essential for predicting vegetation trends and for developing sustainable strategies (
Piao et al., 2015;
Abel et al., 2021). To date, pollen analysis of dated sediment samples has been the primary approach for exploring long-term vegetation changes, but uncertainties and contradictions remain because of low taxonomic resolution and the difficulty of identifying and attributing the influence of pollen grains transported over long distances by wind and rivers on the pollen assemblages (
e.g., Zhu et al., 2015;
Cao et al., 2021). For instance, pollen spectra are dominated by a few herbaceous taxa (including Cyperaceae,
Artemisia, Poaceae, and Chenopodiaceae) with low taxonomic resolution, which causes uncertainty in the history of human impacts on vegetation (
e.g., Herzschuh et al., 2010;
Miehe et al., 2014); arboreal pollen grains occur generally in pollen assemblages, with particularly high abundances in samples from the Last Glacial Maximum across the entire Tibetan Plateau (e.g., Qinghai Lake,
Shen et al., 2005; Luanhaizi Lake,
Herzschuh et al., 2006; Tangra Yumco Lake,
Ma et al., 2019; Mabu Co,
Han et al., 2021), which are generally regarded as having been transported by wind over long distances and which strongly influence past vegetation and climate reconstructions. Hence, the use of a plant sedDNA metabarcoding approach will benefit palaeoecological research on the Tibetan Plateau; its feasibility has already been confirmed and is aided by the cold and arid environment (
e.g., Liu et al., 2021;
Jia et al., 2022a).
To improve the understanding of sedDNA as a proxy on the Tibetan Plateau, studies of its relationship to vegetation and comparisons with the pollen-to-vegetation relationship are essential. In this study, we extracted pollen assemblages from 27 lake surface-sediment samples collected from the central-eastern Tibetan Plateau and compared them with previously published plant sedDNA data from the same samples (
Stoof-Leichsenring et al., 2020) to investigate their similarities and differences in representing the surrounding vegetation communities. We aim to determine (i) the source area of lake sedDNA and (ii) how to combine the two approaches when investigating past environmental changes.
2 Study area
Our study area is located on the central-eastern Tibetan Plateau (TP), extending along a transect from 31.6°N to 35.5°N and from 91.9°E to 99.8°E, with an elevation range from 3847 to 5168 m a.s.l. (Fig.1, Tab.1). Climate is mainly controlled by the Asian Summer Monsoon in summer, with relatively warm and wet conditions, and by the westerlies in winter, with cold and dry conditions (
Wang, 2006). The main vegetation type is alpine meadow with steppe and sporadic patches of subalpine shrub (Fig.1). The plant communities of alpine meadow are generally dominated by
Kobresia species (Cyperaceae), with common taxa including Poaceae, Ranunculaceae, Asteraceae,
Polygonum (Polygonaceae),
Potentilla (Rosaceae), Fabaceae, and Caryophyllaceae; the subalpine shrub is generally distributed on the northern slopes of mountains with
Salix oritrepha and
Potentilla fruticosa as the main shrub components, while the herbaceous taxa mentioned above are also common (
Wu, 1995;
Herzschuh et al., 2010; and unpublished vegetation survey).
3 Materials and methods
3.1 Sample collection
We collected 27 modern lake sediment samples from the central-eastern TP in July and August 2018 (Tab.1). All the selected lakes (or pools) are shallow and small with a radius less than 100 m, to reduce the influence of long-distance pollen grains transported by wind or rivers. The 27 samples were collected from the central part of each lake in order to reduce the influence of local plant communities from the lake shore. The samples comprised the top 2 cm of lake sediment for each site.
3.2 Pollen analysis
Approximately 10 g of wet sediment were taken for each sample. Before pollen extraction, a tablet of
Lycopodium spores (27560 grains/tablet) was added to each sample as tracers (
Maher, 1981). Pollen extraction followed the standard acid-alkali-acid procedures described in
Fægri and Iversen (1975), using 10% HCl, 10% KOH, 40% HF, and a 9:1 mixture of acetic anhydride and sulphuric acid successively, followed by sieving through a 7-μm mesh. Pollen grains were identified using a Zeiss microscope at 400× magnification. Pollen identification was based on the relevant literature (
Wang et al., 1995;
Tang et al., 2017) and on the modern pollen reference slides collected from the central and eastern Tibetan Plateau (including 401 common species of alpine meadow;
Cao et al., 2020). At least 500 terrestrial pollen grains were counted for each sample. Percentages of terrestrial pollen taxa were calculated based on their sum. The modern pollen data set is available online (
Cao et al., 2021).
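The percentage calculation described above can be illustrated with a minimal Python sketch. The function names and example counts are hypothetical and are not taken from the study. The text does not state whether pollen concentrations were computed; the second function shows the usual exotic-marker ratio only as an assumption about how the Lycopodium tracer would typically be used.

```python
LYCOPODIUM_PER_TABLET = 27560  # spores per tablet, as stated in the text


def pollen_percentages(counts):
    """Return each terrestrial taxon's percentage of the terrestrial pollen sum."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("terrestrial pollen sum is zero")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def pollen_concentration(pollen_counted, lycopodium_counted, sample_mass_g,
                         tablets=1, spores_per_tablet=LYCOPODIUM_PER_TABLET):
    """Estimate grains per gram from the exotic-marker ratio (assumed formula)."""
    if lycopodium_counted <= 0 or sample_mass_g <= 0:
        raise ValueError("marker count and sample mass must be positive")
    added_spores = tablets * spores_per_tablet
    return pollen_counted * added_spores / (lycopodium_counted * sample_mass_g)


# Hypothetical counts for one sample (not data from the study).
example_counts = {"Cyperaceae": 300, "Poaceae": 100,
                  "Artemisia": 60, "Ranunculaceae": 40}
example_percentages = pollen_percentages(example_counts)
```

Because the percentages are based on the terrestrial pollen sum, aquatic taxa and the Lycopodium marker would be excluded from `counts` before the function is called.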
3.3 Plant sedDNA metabarcoding analysis
Detailed descriptions of the molecular genetic laboratory work, DNA sequencing, bioinformatic assessments, and downstream data filtering are presented in
Jia (2020) and
Stoof-Leichsenring et al. (2020).
In brief, DNA extractions were performed in a dedicated DNA isolation and pre-PCR laboratory at the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research (AWI) in Potsdam, Germany. Extractions were carried out in a UV cabinet (Biosan, USA) that is used only for DNA extraction from modern environmental samples. The DNA isolation laboratory is physically separated from the post-PCR area to prevent contamination of DNA samples with PCR products. Approximately 5–10 g of lake sediment was taken and processed using the DNeasy PowerMax Soil Kit (Qiagen, Germany). One blank sample was included for each batch of 10 samples and processed in the same way as the sediment samples. PCRs were performed with the “
g” and “
h” universal plant primers for the P6 loop region of the chloroplast
trnL (UAA) intron (
Taberlet et al., 2007). PCR products were purified with MinElute PCR Purification Kit (Qiagen, Germany), quantified with Qubit 2.0 fluorometer (Invitrogen, USA), and pooled in equimolar concentrations for next-generation sequencing by the external sequencing service at Fasteris SA, Switzerland.
The bioinformatic assessment was conducted with the OBITools package (
Boyer et al., 2016). Taxonomic assignment was based on the sequence reference database for Arctic and Boreal vascular plants and bryophytes (
Sønstebø et al., 2010;
Willerslev et al., 2014;
Soininen et al., 2015) and the European Molecular Biology Laboratory (EMBL) nucleotide database version 133 (
Kanz et al., 2005; The European Nucleotide Archive (ENA) website). The following taxa were further removed from the data set: (i) taxa with less than 100% identity; (ii) taxa that are not naturally found in China; (iii) taxa that occur only once in the whole data set; (iv) taxa that occur in the extraction blanks and PCR negative template controls with significantly high sequence counts. The sedDNA data set is available in
Stoof-Leichsenring et al. (2020) and
Jia et al. (2022b). The percentages for sequence types were calculated after rarefaction (
Birks and Line, 1992).
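The filtering rules and the rarefaction step can be sketched as follows. This is an illustrative Python sketch, not the OBITools workflow used in the study: the record keys, the blank-read threshold, and the example records are hypothetical, and rarefaction is interpreted here as random subsampling without replacement to a common read depth, which is one common reading of the term.

```python
import numpy as np


def filter_taxa(records, blank_read_threshold=10):
    """Keep sequence types that pass the four removal rules listed in the text.

    Each record is a dict with hypothetical keys: name, identity (0-1),
    native_to_china (bool), n_occurrences, blank_reads.
    """
    kept = []
    for rec in records:
        if rec["identity"] < 1.0:          # (i) less than 100% identity
            continue
        if not rec["native_to_china"]:     # (ii) not naturally found in China
            continue
        if rec["n_occurrences"] <= 1:      # (iii) occurs only once in the data set
            continue
        if rec["blank_reads"] > blank_read_threshold:  # (iv) high counts in controls
            continue
        kept.append(rec)
    return kept


def rarefy(counts, depth, seed=0):
    """Subsample read counts without replacement to a fixed depth."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = np.random.default_rng(seed)
    return rng.multivariate_hypergeometric(counts, depth)


def percentages(counts):
    """Convert counts to percentages of the sample total."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum()


# Hypothetical records, one failing each rule (not data from the study).
example_records = [
    {"name": "seq_type_A", "identity": 1.0, "native_to_china": True, "n_occurrences": 12, "blank_reads": 0},
    {"name": "seq_type_B", "identity": 0.98, "native_to_china": True, "n_occurrences": 8, "blank_reads": 0},
    {"name": "seq_type_C", "identity": 1.0, "native_to_china": False, "n_occurrences": 5, "blank_reads": 0},
    {"name": "seq_type_D", "identity": 1.0, "native_to_china": True, "n_occurrences": 1, "blank_reads": 0},
    {"name": "seq_type_E", "identity": 1.0, "native_to_china": True, "n_occurrences": 9, "blank_reads": 500},
]
```

Rarefying every sample to the same depth before computing percentages keeps differences in sequencing effort from inflating the apparent richness of deeply sequenced samples.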
3.4 Climate data
To obtain modern climatic data for the 27 sampled lakes, the Chinese Meteorological Forcing Data set (CMFD; gridded near-surface meteorological data set) with a temporal resolution of three hours and a spatial resolution of 0.1° was employed (
He et al., 2020). The CMFD is made through the fusion of remote-sensing products, reanalysis data sets, and in situ station data between January 1979 and December 2018 (
He et al., 2020). Geographical distances of each sampled lake to each pixel in the CMFD were calculated based on their longitude/latitude coordinates using the
rdist.earth function in the fields package version 9.6.1 (
Nychka et al., 2019) for R (version 3.6.0;
R Core Team, 2019), and the climatic data of the nearest pixel to a sampled lake were assigned to represent the climatic conditions of that lake. Finally, the mean annual precipitation (
Pann; mm), mean annual temperature (
Tann; °C), and mean temperature of the coldest month (
Mtco; °C) and warmest month (
Mtwa; °C) were calculated for each sampled lake.
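The nearest-pixel assignment can be illustrated in Python. The study used the rdist.earth function in R; the sketch below substitutes a haversine great-circle distance with an assumed mean Earth radius. The function names are hypothetical, and the annual summaries assume that the three-hourly data have already been aggregated to 12 monthly values, a step the text does not describe.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_pixel(lake_lat, lake_lon, grid_lat, grid_lon):
    """Return (index, distance_km) of the grid pixel closest to the lake."""
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float)
    d = great_circle_km(lake_lat, lake_lon, grid_lat, grid_lon)
    idx = int(np.argmin(d))
    return idx, float(d[idx])


def annual_summaries(monthly_temp_c, monthly_precip_mm):
    """Compute Pann, Tann, Mtco, and Mtwa from 12 monthly values."""
    t = np.asarray(monthly_temp_c, dtype=float)
    p = np.asarray(monthly_precip_mm, dtype=float)
    if t.size != 12 or p.size != 12:
        raise ValueError("expected 12 monthly values for each variable")
    return {
        "Pann": float(p.sum()),   # annual precipitation total (mm)
        "Tann": float(t.mean()),  # mean annual temperature (°C)
        "Mtco": float(t.min()),   # mean temperature of the coldest month (°C)
        "Mtwa": float(t.max()),   # mean temperature of the warmest month (°C)
    }
```

With a 0.1° grid, the nearest pixel centre lies within a few kilometres of any lake, so the assigned values describe the regional climate rather than lake-scale conditions.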
3.5 Ordination analysis
To visualize the relationships between modern pollen/sedDNA assemblages and climatic variables, ordination analyses were performed on the percentages of 26 pollen taxa and 69 sedDNA taxa (those present in at least 1 sample and with a maximum relative abundance ≥ 1%), which were square-root transformed to stabilize variances and optimize the signal-to-noise ratio (
Prentice, 1980). Detrended correspondence analysis (DCA;
Hill and Gauch, 1980) showed that the length of the first axis was 1.12 SD (standard deviation units) for the pollen data and 3.23 SD for the sedDNA data, indicating that a linear response model is suitable for the pollen data and still useful for the sedDNA data (
ter Braak and Verdonschot, 1995). Thus, we performed redundancy analysis (RDA) to visualize the distribution of pollen/sedDNA species and sampling sites along the climatic gradients, selecting the minimal adequate model using forward selection and checking the variance inflation factors (VIF) at each step. VIF values higher than 20 indicate that some variables in the model are collinear, so we stopped adding variables once this threshold was exceeded (
ter Braak and Prentice, 1988). These ordinations were performed in
R using the
decorana and
rda functions in the
vegan package version 2.5-4 (
Oksanen et al., 2019).
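The core quantities behind this workflow can be sketched in Python with NumPy. The study itself used the decorana and rda functions of the vegan package in R; the sketch below is not a reimplementation of either. It covers only the taxon selection, the square-root transformation, the share of variance captured by a linear fit on the climatic variables (the constrained fraction in an RDA), and the VIF check. The function names are hypothetical, and DCA, forward selection, and permutation tests are omitted.

```python
import numpy as np


def select_and_transform(pct, min_max_abundance=1.0):
    """Keep taxa whose maximum percentage reaches the threshold, then square-root."""
    pct = np.asarray(pct, dtype=float)
    keep = pct.max(axis=0) >= min_max_abundance
    return np.sqrt(pct[:, keep]), keep


def rda_explained(Y, X):
    """Fraction of the total variance in Y captured by a linear fit on X."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Yc = Y - Y.mean(axis=0)  # centre responses (taxa)
    Xc = X - X.mean(axis=0)  # centre predictors (climate)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef
    return float((fitted ** 2).sum() / (Yc ** 2).sum())


def vif(X):
    """Variance inflation factor of each column of X regressed on the others."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    n_vars = Xc.shape[1]
    if n_vars == 1:
        return np.array([1.0])  # a single predictor cannot be collinear
    out = []
    for j in range(n_vars):
        others = np.delete(Xc, j, axis=1)
        coef, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
        resid = Xc[:, j] - others @ coef
        r2 = 1.0 - (resid ** 2).sum() / (Xc[:, j] ** 2).sum()
        out.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(out)
```

Calling `rda_explained` with a single climatic variable corresponds to the "sole predictor" values reported for the RDA, while calling `vif` after each added variable mirrors the collinearity check against the threshold of 20.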
4 Results
4.1 Modern pollen assemblages
Fifty-three pollen taxa (belonging to 39 families) were identified from the 27 lake surface-sediment samples, including 27 taxa at family level, 25 taxa at genus level, and 1 taxon at species level. Pollen assemblages are dominated by herbaceous pollen taxa, in which Cyperaceae is the dominant taxon (mean 58.0%, maximum 94.4%), while Poaceae (mean 13.5%, maximum 87.7%), Artemisia (mean 6.0%, maximum 23.6%), Ranunculaceae (mean 5.4%, maximum 25.4%), Polygonum (mean 2.0%, maximum 11.7%), Thalictrum (mean 1.8%, maximum 6.1%), and Asteraceae (including three types Aster, Saussurea, and Taraxacum; mean 2.1%, maximum 5.3%) are the common herbaceous taxa. Salix (mean 0.8%, maximum 5.3%) is the major shrub taxon, while arboreal taxa occur with low percentages generally (mean total arboreal percentage 1.4%, maximum 5.8%), mainly comprising Pinus (mean 0.5%, maximum 1.8%), Betula (mean 0.1%, maximum 0.7%), and Alnus (mean 0.1%, maximum 0.6%) (Fig.2). The compositions and their abundances in the pollen data set represent the plant community of the alpine meadow well on the central-eastern Tibetan Plateau.
4.2 Description of sedDNA sequence data
SedDNA metabarcoding of the 27 samples has higher taxonomic resolution than the pollen identification. SedDNA sequences are assigned to 153 unique sequence types, including 60 species, while the others are at family or genus level. These sequence types belong to 37 families, including Poaceae, Cyperaceae, Asteraceae, Chenopodiaceae, Rosaceae, and Apiaceae, which together represent the alpine meadow community well in general (Fig.2 and Fig.3; Tab.2). For instance, the family Cyperaceae (mean 4.5%, maximum 28.3%) is represented by 11 types, including Blysmus compressus, Trichophorum pumilum, Carex atrofusca, Carex maritima, Carex microglochin, Carex parva, and two undifferentiated types each of Carex and Kobresia; of these, one undifferentiated type of Kobresia (mean 1.9%, maximum 11.3%) and Carex maritima (mean 1.2%, maximum 13.6%) occur most frequently (Tab.3). The family Poaceae (mean 2.5%, maximum 28.3%) is represented by 11 types, of which only one is at species level (Stipa breviflora), while most are at subfamily or tribe level (Tab.3). Metabarcoding can distinguish different sequences (e.g., for Apiaceae: Apiaceae_st2, Apioideae_st3, and Apioideae_st4), although it can assign only a few to species (e.g., only Carum carvi for Apiaceae) in our study (Tab.3).
The components are generally consistent with the plant community of alpine meadow on the central-eastern Tibetan Plateau; however, the abundances of the sedDNA sequence types differ from the pollen abundances, sometimes quite strongly (Fig.2 and Fig.3). Pollen assemblages are dominated by Cyperaceae (mean 58.0%, maximum 94.4%), while Asteraceae (mean 26.9%, maximum 82.3%), Polygonaceae (mean 21.8%, maximum 76.2%), Rosaceae (mean 12.1%, maximum 78.7%), and Ranunculaceae (mean 11.7%, maximum 90.0%) are the most common types in the sedDNA assemblages (Fig.3). Cyperaceae and Poaceae have higher percentages in the pollen assemblages than in the sedDNA data set, while Asteraceae, Polygonaceae, Rosaceae, and Ranunculaceae have significantly lower abundances in the pollen assemblages than in the sedDNA data set (Fig.4). In addition, the sedDNA data set has greater variation in the percentages of types among samples. For instance, percentages of Asteraceae range from 0% to 82.3% with a mean of 26.9% (Fig.3), giving a high standard deviation of 23.4%.
4.3 RDA results
Our study area has a Pann gradient from 225 to 689 mm, and cold thermal conditions with low Tann (−7.3°C to 2.3°C) and Mtco (−19.2°C to −7.4°C). RDA reveals that Pann explains more of the pollen assemblage variation (10.8% as a sole predictor) in the data set than the two thermal variables (Tab.4). The RDA biplot shows that the direction of the Pann vector has a smaller angle with the positive direction of Axis 1 (capturing 43.2% of the total inertia in the data set) than with the positive direction of Axis 2 (10.3%), indicating that the major component of Axis 1 is likely to be effective moisture. The RDA separates the pollen taxa into two groups generally: Cyperaceae, Ranunculaceae, Rosaceae, and Salix indicating wet climatic conditions, and Poaceae, Artemisia, and Chenopodiaceae indicating relatively warm and dry climatic conditions (Fig.5).
RDA was performed for the sedDNA data set using the 69 major types and the 22 families separately. Both RDAs reveal that, relative to the pollen assemblages, Pann is less important in explaining the variation in the sedDNA assemblages, while the thermal variables are slightly more important (Tab.5 and Tab.6). The RDA biplot for the 22 sedDNA families shows that Tamaricaceae is located in the positive direction of Pann while Asteraceae occurs in its negative direction, along with Poaceae and Cyperaceae, a pattern that differs from the biplot of the pollen data (Fig.5).
5 Discussion
Relative to pollen assemblages extracted from surface soils, those extracted from lake surface sediments should be only weakly influenced by local plant composition and should represent the regional vegetation well because of the larger pollen source area (
e.g., Jacobson and Bradshaw, 1981;
Jackson, 1990;
Tian et al., 2020;
Liu et al., 2023). In this study, the pollen assemblages obtained from the 27 lake surface sediments have low variation between samples (Fig.2), reflected by the short length of the DCA first axis (1.12 SD). The similar pollen assemblages represent the alpine meadow well, because Cyperaceae is the dominant family in both the vegetation community and the pollen assemblages on the central-eastern Tibetan Plateau (Tab.1).
The components identified by the sedDNA metabarcoding are consistent with the components in pollen assemblages and plant communities generally (Tab.2 and Tab.3;
Wang and Herzschuh, 2011). However, the abundances of sedDNA sequence types have larger variation between samples, as seen by the longer first DCA axis (3.23 SD) than that for the pollen data. This implies that sedDNA assemblages are strongly influenced by local plants surrounding the lakes. Hitherto, the available studies on modern sedDNA conclude that sedDNA assemblages better represent the local plant community than the regional vegetation, which our data support. For instance,
Yoccoz et al. (2012) investigated soil environmental DNA and found that crop DNA sequences which occur in cultivated land were absent from nearby uncultivated plots, indicating that the eDNA reflects the local plant composition. Similarly,
Alsos et al. (2018) found that 73% and 12% of sedDNA taxa from lake surface sediments were recorded in vegetation surveys within 2 m and about 50 m away from the lakeshore, respectively.
Arboreal pollen grains occur in the 27 pollen assemblages at low abundances, while the corresponding trees are absent around the sampled lakes (Fig.2). Arboreal pollen grains occurring in the non-forest areas of the Tibetan Plateau have been confirmed to be transported over long distances by wind (i.e., exogenous pollen grains;
Cao et al., 2021;
Wang et al., 2022), which indicates pollen assemblages extracted from lakes should represent regional vegetation. These exogenous taxa are not identified in the sedDNA metabarcoding, suggesting that the pollen grains have no influence on the sedDNA sequences on the Tibetan Plateau, as has been noted in previous studies (
Alsos et al., 2016;
Niemeyer et al., 2017;
Parducci et al., 2019). In summary, we show that the plant sedDNA assemblages extracted from small lakes represent the local plant components surrounding the lake.
SedDNA metabarcoding can identify plants at higher taxonomic resolution than optical pollen identification, and many sequence types can be assigned to species level (Tab.3). Some sedDNA sequence types cannot be identified to genus or species level, possibly because the vascular plants included in the sequence reference database were mostly collected from Arctic and Boreal environments (
Sønstebø et al., 2010;
Willerslev et al., 2014), but they can be separated into different types such as
Carex_st1,
Carex_st2,
Kobresia_st1,
Kobresia_st2. Hence, we argue that the sedDNA metabarcoding is a powerful approach in reconstructing past biodiversity and in exploring presence or absence of plants, which will only improve in the future as the reference databases are extended.
SedDNA metabarcoding has some limitations, however, such as the bias of PCR amplification, which could cause the abundance of a sedDNA sequence type to not reflect its actual abundance in the community. Taking Cyperaceae as an example, its poor detection in sedDNA could be caused (at least partly) by the long sequence length of the “
g” and “
h” primer for
Carex and
Eriophorum (> 80 bp;
Alsos et al., 2018;
Jia et al., 2022a), potentially explaining its low abundances in sedDNA assemblages from the alpine meadow despite Cyperaceae being the dominant plant. In a future step, two specific primers, targeting the ITS (internal transcribed spacer) region (
Baamrane et al., 2012;
Willerslev et al., 2014), should be tested to determine to what extent they could improve the taxonomic resolution and abundance of Cyperaceae and Poaceae on the Tibetan Plateau (
Jia et al., 2022a). In summary, the strong influence of local plant components and the PCR bias might prevent abundances in sedDNA assemblages from being used directly to reconstruct past plant cover and past climate.
6 Conclusions
The reliability and representation of plant sedDNA extracted from lake surface sediments from the central-eastern Tibetan Plateau are assessed as a way of exploring past vegetation. By comparing sedDNA and pollen assemblages extracted from 27 lake surface-sediment samples, we find that the identified sedDNA sequence types and pollen taxa are generally consistent with plant components of alpine meadow, and sedDNA assemblages have higher taxonomic resolution. Relative to pollen assemblages, sedDNA data are strongly influenced by local plants while rarely affected by exogenous plants, indicating that sedDNA metabarcoding is a robust approach for reconstructing past biodiversity. Because of the strong influences from local plants and PCR bias, the abundances of sedDNA sequence types vary greatly among sampled sites and should be treated cautiously when investigating past vegetation cover and climate.