Introduction
Cassava (
Manihot esculenta) is an extremely important tropical crop ranked in the top three tuber/root crops along with potato and sweet potato. Its tropical origination makes cassava a unique staple food in tropical and subtropical areas. Over 18 Mhm
2 of cassava is grown widely distributed from 30° N to S and feeds over 600 million people. Given the food shortage and demand for biofuels, cassava can make an increasingly important contribution as a highly efficient starch crop, because of its high potential photosynthesis and starch accumulation, and strong adaption to drought and barren soils. It is reported that cultivated cassava has a higher rate of photosynthesis than C
3 plants
[ 1], but its wild progenitor,
M. esculenta ssp.
flabellifolia, and other 20 wild relatives showed C
3 rather than C
4 characteristics
[ 2–
7]. The wild ancestor of cassava is a woody shrub that originated in primeval tropical forests. It entered the savannah over the course of its evolution and was domesticated into cultivated cassava as a perennial shrub that can complete its life cycle in one year and can reproduce both sexually and asexually. Its genome is not large but heterozygous, and the draft genomes of cassava and a wild ancestor have been completed
[ 8]. This will be an essential resource to understand the evolution of the high efficiency of starch accumulation of cassava.
At the level of expressed genes, a cassava cDNA library of bulking root has been reported. This library annotated 2878 unigenes and identified many involved in starch metabolism
[ 9]. In recent years, cDNA amplified fragment length polymorphisms (cDNA-AFLP) and microarrays have been used to screen differentially expressed transcripts in cassava, such as those for resistance to cassava bacterial blight (CBB) and storage root formation
[ 10–
12] and microarrays have been used for study of resistance to CBB, drought/low-temperature
[ 13–
15] and post-harvest physiological deterioration
[ 16]. Today, next-generation high-throughput RNA sequencing technique (RNA-Seq) is broadly used for mining global transcription profiles. The transcriptome in rice endosperm at 3, 6 and 10 d after pollination, revealed several genes involved in the accumulation of substances such as starch/sucrose, whereas protein metabolism was upregulated between 3 and 10 d after pollination
[ 17]. Universal RNA-Seq found
TaIPS1 and 12 candidate genes respond to P
i starvation in wheat
[ 18] and white lupin
[ 19], respectively. In cassava, RNA-Seq was used for analysis of mosaic virus-infected susceptible and tolerant landraces and highlighted differences in resistance, basal defense and cell wall associated genes during infection
[ 20]. Also, transcriptome profiling of low-temperature treated cassava apical shoots showed dynamic responses to cold stress
[ 21]. Several studies reported the high level expression of genes for photosynthesis, sucrose transport and starch metabolism
[ 22–
24], but there was no analysis of the change during evolution. In this study, we focused on comparison of transcriptional profiling between the wild progenitor
M. esculenta ssp.
flabellifolia W14 and cultivated
M. esculenta cv. Arg7. RNA-Seq was used to investigate the global transcriptome in the leaf, stem and storage root. Differentially expressed transcripts and expression patterns were analyzed and for first time revealed some new concepts for the higher photosynthesis and starch accumulation rates in cultivated cassava compared to its wild progenitor.
Materials and methods
Plant materials
The wild progenitor of cultivated cassava, M. esculenta ssp. flabellifolia W14 donated by CIAT, was originally collected from the south rim of Amazon basin. One accession of W14 is characterized by lower storage root yield and low starch content in the storage roots (about 5%); M. esculenta cv. Arg7 came from Argentina, and is a cultivated variety with high yield and higher starch content in the storage roots (about 30%). These two lines were planted in an experimental station in tropical Hainan Island in 2011. During the growth of plants, 8 samples including leaves and immature stems at 150 DAP (days after planting) of W14 and Arg7, storage roots of W14 at 150 DAP and three samples of storage roots of Arg7 at 90, 150 and 210 DAP were obtained for RNA-Seq analysis. At each time point, samples were taken from three different individuals at 10 am and snap frozen in liquid nitrogen before being stored at - 80°C.
RNA extraction and sequencing
Total RNA was extracted from leaves, immature stems and storage roots with the RNAplant Reagent (Tiangen, Beijing, China) and purified with the RNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). The cDNA libraries were prepared for sequencing with the Illumina Hiseq2000 Sequencer following the protocol of Zhong et al.
[ 25] and 9.69 to 68.67 million clean reads of approximately 52 bp each were generated for all samples.
Mapping reads to the reference genome
The cassava genome and gene information were downloaded from Phytozome (http://www.phytozome.net/cassava.php). The sequence map files generated by TopHat were used as input for the Cufflinks software
[ 26], which assembles the alignments in the sequence map file into transcripts. Cufflinks performs this assembly independently from existing gene annotations and constructs a minimum set of transcripts that best describes the RNA-Seq reads, estimating transcript abundance as specific fragments detected per kb of transcript per million mapped reads (FPKM). Cuffmerge was used to merge these eight assemblies with the reference annotation into a single file, which was later used to identify the differentially expressed genes.
Analysis of differential gene expression
The logarithm of the FPKM is normally distributed and homoscedastic in nature (homogeneity of variance). The R package, DEGseq (http://bioconductor.org/packages/release/bioc/html/DEGseq.-html) was used to identify differentially expressed genes with a random sampling model based on the read count for each gene by performing Fisher’s exact test with robust false discovery rate correction to obtain an adjusted P-value between certain test gene groups and the whole genome annotation.
Annotation of assembled transcripts
The predicted functions of the genes were determined with InterProScan
[ 27] by comparing with protein databases, such as Pfam, PRINTS, ProFile, SuperFamily, ProDom and SMARTGOIDs for each gene were obtained from the corresponding InterProScan entry. Thereafter, the WEGO software
[ 28] was used to perform the GO functional classification, with a BLASTP (ftp://ftp.ncbi.nih.gov/blast/executables/blast±/LATEST/) parameter of 1×10
–5 to search for the best hit for the peptide sequence against genes in the KEGG
[ 29] database. Within the significant KEGG category, the enrichment
Re was given by the formula
Re = (
nf/
n)/(
Nf/
N), where
nf is the number of differentially expressed genes within the particular category,
n is the total number of genes within the same category,
Nf is the number of differentially expressed genes in the entire RNA-Seq data set and
N is the total number of genes in the RNA-Seq data set. KEGG pathway network analyses were performed with Cytoscape software (version 2.6.2, http://www.cytoscape.org).
Putative functions for the unique sequences were assigned by performing BLASTX queries against the Arabidopsis protein database (TAIR10_peptide; http://www.arabidopsis.org). The MapMan tool facilitated the classification of transcripts into hierarchical categories in a manner that alleviated the redundancy present in other commonly used ontologies and was used to view the annotated process of interest or metabolic pathway for groups of cassava transcripts.
The R package MINET
[ 30] was used to construct the gene association network. MINET was developed from the small-sample inference framework of graphical Cytoscape. The mutual information between all pairs of variables in the data set is computed according to the estimator argument. Then the algorithm considers the estimated mutual information in order to build the network.
Validation of gene expression by quantitative RT-PCR
Four samples, AL and WL, AR150 and WR150, were used to confirm the RNA-Seq data. First, the samples of total RNA were converted to single-stranded cDNA using oligo-dT and reverse transcriptase. Then, a gene-specific primer pair was designed for each gene, after which PCR amplification was carried out using a SYBR Premix Ex TaqTM kit (Takara, Dalian, China) and a Rotor-Gene 6000 Real-Time PCR Machine (Corbett Robotics, Brisbane, Australia). The β-actin gene was used as a control and all PCR amplifications were repeated three times. The relative expression of the genes between Arg7 and W14 was quantified using the ΔΔCT method. The primer pairs are listed in Appendix A.
Results
RNA-Seq analysis of developing leaf, stem and storage root of cassava
Under field grown conditions, at three key development stages, leaves, stems and storage roots of Arg7 and W14 were collected and used for RNA-Seq analysis. Eight annotated transcripts were named as AL (Arg7 leaf), AS (Arg7 stem), AR90 (Arg7 storage root at 90 DAP), AR150 (Arg7 storage root at 150 DAP), AR210 (Arg7 storage root at 210 DAP), WL (W14 leaf), WS (W14 stem) and WR150 (W14 storage root at 150 DAP). A total of 126.65 million sequence reads were aligned to the draft cassava genome by TopHat and Cufflinks
[ 26] (eight libraries have been deposited in GenBank/SRR (Table 1). There were a total of 41302 transcripts assembled, with an average of about 86% of the reads matched to the draft genome. Among them, 25961 were considered to be transcriptionally active with FPKM≥3 in at least one library. Between Arg7 and W14, the number of transcripts ranged from 39863 vs 37822 in all libraries (AppendixB, Fig. S1).
Comparative gene enrichment analysis between cultivated Arg7 and wild W14
All the transcripts were classified into three main GO categories. Among the assembled transcripts of Arg7 and W14, 10772 and 10735 transcripts, respectively, were assigned to GO categories and they were involved in 360 pathways (Appendix C). Among the GO-categorized transcripts, the main cellular component categories were cell, cell part, organelle, organelle part and macromolecular complex. The primary molecular function categories were binding activity, catalytic activity, transcription regulator activity, transporter activity and structural molecule activity. The main biological process categories were cell process, metabolic process, biological regulation, response to stimulus and pigmentation. In general, for the 47 main pathways, the number of annotated transcripts did not differ markedly between Arg7 and W14. However, there were more annotated transcripts in two pathways (nutrient reservoir activity and locomotion process) in Arg7 than in W14 (Fig. 1).
Global analysis of gene differential expression profiling
The transcript profiles of the cultivated Arg7 and its wild ancestor W14 were compared. The gene expression counts were normalized to the fragments per kb per million reads (FPKM) for each transcript from the eight libraries (five of Arg7 and three of W14). Transcripts with FPKM<3 exhibited low abundance in some tissues and were considered silent, in accordance with previous work on white lupin
[ 19]. Among the 41302 transcripts assembled, 25961 had an FPKM≥3 in at least one library and were considered to be transcriptionally active. A total of 2117 and 1963 transcripts were found to be differentially expressed (above twofold change,
P<0.05, FPKM≥3) in the leaf and stem, respectively, between Arg7 and W14, especially whereas 1875, 3584 and 3967 transcripts showed differential expressions comparing the AR90, AR150, AR210 in Arg7 to WR150 in W14. The distribution of these differentially expressed genes is shown in Figs. 2a–2c. It seems more transcripts were differentially expressed in storage roots between Arg7 to W14. In addition, we identified 1662 and 6577 transcripts that exhibited a species-specific expression pattern in Arg7 and W14, respectively (Fig. 2d). All differently expressed genes in leaf, stem and storage roots between Arg7 and W14 have been assigned to 103, 93 and 119 pathways (AR150 vs WR150), respectively (Appendix D).
Transcripts/genes involved in light response predominantly expressed in cultivated cassava
When we compared the genes transcribed in functional leaves of Arg7 to W14, a total of 2,117 transcripts were differentially expressed between AL and WL; of these 1,514 were more highly expressed in AL than in WL, and conversely 603 were more highly expressed in WL compared to AL. The significantly differentially expressed genes are shown in Appendix B (Fig. S3). A total of 81 transcripts were much more highly expressed and 23 were expressed at a lower level in AL as compared with WL (Fig. 3).
GO annotation revealed that the most of the transcripts significantly expressed in AL are genes involved in photosynthesis. Fifty five of 81 upregulated genes focus on light reactions, photosystem I (PSA) and photosystem II (PSB). The two light-harvesting complex families, namely, LHC-I with six gene members had 2.8- to 42.6-fold expression, and LHC-II with another six members had 2.7- to 7.9-fold expression. Ten PSA subunits had 2.5-fold to 49.1-fold and 14 PSB subunits had 3.1- to 45.1-fold higher expression in Arg7. Furthermore, genes in the electron carrier, electron transporter and ATP synthase families were 5.7-, 4.2- and 3.4-fold more highly expressed in AL than WL. 17 of 84 upregulated transcripts in Arg7 belonged to the Calvin cycle of carbon fixation, such as genes for Rubisco small subunit, which has a regulatory function, but not the Rubisco large subunit with a catalytic function, NAPDH, 3-PGA, GAP and PEPC. These genes were expressed 3–6 fold higher in Arg7 compared with W14 (Fig. 3). The expression level of genes for ATPase III, Rubisco S and PEPC in AL was validated by quantitative reverse transcription PCR (Fig. 4). A few of the genes in the photorespiration pathway had a higher expression level in Arg7, such as one member of the glycolate oxidase family, glycine decarboxylase, hydroxypyruvate reductase and phosphoribulokinase/uridine kinase were expressed 6.9-, 5.5-, 13.5- and 10.3-fold, respectively, in comparison to W14. Only nine genes had significantly lower expression in Arg7 than in W14, being members of photosystem II, such as PSBA, PSBB, PSBC, PSBD, and PSBZ, and two in photosystem I (PSAA, PSAB) and a subunit of cytochrome B6, that ancestral subunits maybe not yet been selected in process of domestication cassava (AppendixB, Fig. S2). These gene expression patterns, involving light response and carbon fixation, are consistent with cultivated cassava being a C
3–C
4 plant in photosynthesis as reported
[ 1,
2].
Transcripts/genes for transportation and sucrose degradation with upregulated expression in developing stem of cassava
A total of 1963 transcripts were differentially expressed between Arg7 (AS) and W14 stems (WS); 1195 of these had significantly higher expression in AS than in WS, whereas 768 genes conversely showed lower expression. Eighteen gene families with 83 members involved in transport were significantly upregulated and another 17 families with 75 members were downregulated in stem of Arg7 compared with in W14. It is notable that the ABC transporters and ATPase relative transporters, potassium channel (AKT1, ATKC1, KAT3), transporters for calcium (calmodulin binding) and other unspecified cations, sugars, peptides, amino acids had 8- to 35.6-fold higher expression in developing stems of Arg7 than W14. Some members of transporter gene families, such as water channels and transport of nitrates (NPT2), phosphate, and amino acids had low expression in stems of Arg7 but were significantly higher in W14 (Fig. 5; Appendix E), indicating that the expression of these genes have been reduced through domestication. In the glycolysis pathway, the process of glucose degradation seems to have been strengthened. Genes for several key rate-limiting enzymes such as PFK that consumes ATP, PGK and PK that both produce ATP, GAPDH producing NADPH reducing power, and ALDO were predominantly expressed in cultivated Arg7 compared with W14 (Fig. 6; Appendix E). Meanwhile, There were 30 two families of transcription factor with 212 members with>3-fold higher in expression in stems of Arg7 than in W14, such as MYB, HSF, AP2-EREBPs, G2 like and unspecified DNA binding proteins (Appendix B, Fig. S3; Appendix E) that may respond to environmental factors and regulate the marked changes in metabolic processes that has occurred during the domestication of cassava.
Transcripts/genes involved in key steps of starch synthesis and energy transport were predominantly expressed in developing storage root of cassava
Among the differential transcripts in developing storage root, we focused on the variation of transcripts/genes for starch metabolism and molecular transport pathways, and the transcription factors involved. Among the three comparative transcriptome results for AR90, AR150 and AR210 vs WR150, the gene expression patterns were similar (Appendix A). It was shown that 17 genes in the starch synthesis pathway had 3- to 40-fold higher expression in AR150 than in WR150. They were only involved in three key restricted steps. (1) Sucrose degradation; the first step in starch synthesis, e.g, genes for six invertases (INVs) and fructose-1,6- bisphosphatase (FBP) with 4- to 11.5-fold greater expression in Arg7 than in W14. (2) Starch synthesis; genes for starch synthesis (GBSS) and glucan phosphorylase (GlgP) had markedly higher expression, 23.8- and 9.1-fold, respectively, in AR150 than WR150. Genes for starch branching enzyme (SBE2.1), glucose-1-phosphate adenylyltransferase (APL3) also had 3- to 3.2-fold higher expression than W14. The high expression of GBSS1 and SBE2.2 in storage root of Arg7 compared to W14 was confirmed by qRT-PCR (Fig. 7). (3) Starch degradation; we found five genes involved in starch hydrolysis had significantly higher expression in AR150 than in WR150, such as the gene for maltose transmembrane transporter (RCP1) with 37.9-fold higher expression than in W14, and the expression of genes for three α-amylases and β-amylase were 6- to 13-fold higher than in W14 (Fig. 8). This indicates that there are robust chemical processes demanding energy and sources materials. For transportation, a total of 109 genes for transporters were significantly upregulated. Predominantly these are involved in energy transport; 13 members of ABC (ATP binding cassette) transporters, which utilize the energy of ATP binding and hydrolysis to energize the translocation of various substrates across membranes, had significantly higher (4- to 32-fold) expression in AR150 than in WR150. In particular, the genes for the H+ transporting pyrophosphatases (AVP1, AVP3) absolutely high expressed in Arg7 and had a 17-fold higher than in W14, indicating an enhanced requirement for transport, and also serving as a sources of inorganic phosphate used in other biological processes. Six genes for p-ATPase and v-ATPase (AHA4) had 18- to 20.5-fold higher expression in Arg7 and genes for 13 mitochondrial solute carriers (misc transporters), and 14 metabolite transporters at the mitochondrial membrane were expressed 5- to 68-fold higher than in W14. These proteins generally catalyze neutral exchanges, but some are electrogenic or involve H+ co-transport, where the energy center has diverse metabolic functions. Among them, NAT7, which belongs to the nucleobase/ascorbate transporter family, was predominant, being 26.8-fold more highly expressed in Arg7 than W14. It is located in the plasma membrane and plasmodesmata, and may be involved in efficient transportation of molecules between cells in storage root of cassava. The function of NAT7 requires more attention. Three genes for water channel proteins (PIP, TIP) also had significantly higher expression. Another 56 upregulated transcripts/genes, including transporters for nitrates (NRT, NTP3), phosphate (PHT4-5), potassium (KUP7), sulfate (SULTR2, SULTR5), amino acids, nucleotides and peptides, and sugars (TMT1, ZIF1). These also include the transporters of unspecified anions (13 members), calcium and other metals (Fig. 9, Appendix A). Meanwhile, 234 members of a total of 13 families of transcription factors had 3- to 278-fold higher expression in storage root of Arg7. These were: ethylene responsive binding protein (EREBPs), bZIPs and ABI3, bHLHs, C2H2s and C2C2s (Zn), MYBs and MYB-related, WRKYs, Homeoboxes, putative DNA binding proteins, unclassified transcription factors and others (Appendix B, Fig. S4). The construction of the co-expressive networks clarified their targets in some special functional groups, although the relationship of these TFs to relevant biological processes is still unknown.
Co-expression networks of genes significant expressed in leaf and storage root of cassava
In the leaf, a co-expressed network composed of 105 photosynthesis genes, 31 transcription factors and 4 miRNAs was constructed. There were 6 sub-networks with hubs of PSAL, PSAG, PSBO1, PETE 1, RBCS1A and PGR5 showing the associated expression profiling of the transcription factors and their regulated pathways (AppendixB, Fig. S5). It is notable that the transcription factor such as MYB, WRKY6, NAC035 and KNAT3 were linked in the sub-networks, indicating that these TFs have important roles in regulating photosynthesis gene expression.
In the storage root, a total of 197 genes enriched in starch metabolism, transportation and transcription regulation were clustered into a network consisting of four nodes or sub-networks with hub genes of starch synthase (GBSS), glucan phosphorylase (GlgP), GBS2.1 and GBS2.2. Seven genes for energy transporters, including ABC, ALDO, pantothenate kinase (PANK2), CXIP2 and UBX domain-containing protein, and 15 genes for transcription factors, including HCF, MYB, AGL44, ERF9, HB54, NAC2, SCL1 and protein binding, have been integrated into the network (AppendixB, Fig. S6). This verified their expression as being positively correlated with the expression of key genes in starch biosynthesis.
Discussion
Enhancing expression of photosynthetic light reactions genes was a vital event in domestication of cassava
It was shown that although there are 1514 genes highly expressed in leaves of the cassava cultivars, the remarkable finding is the genes for light reactions such as photosystem I (PSA) and photosystem II (PSB), the two light-harvesting complex families and other members in the light reaction chains, had significantly higher expression, along with some key genes in Calvin cycle of carbon fixation (RUBP, NAPDH, 3-PGA, GAP and PEPC). Cultivated cassava normally has a high net photosynthesis rate, up to 54 mg·dm
–2·h under certain conditions
[ 31]. It is considered as a C
3-C
4 intermediate plant, whereas the wild progenitor (
M. esculenta ssp.
flabellifolia) exhibits C
3 characteristics
[ 1,
2]. It has been reported that photorespiration-related genes had higher expression in the C
3-C
4 intermediate species
Flaveria ramosissima[ 32] but not in cassava. There have been reports of increased response to light intensity under drought stress
[ 33] and shade mainly induced expression of genes for light reactions, light-harvesting complexes as well as light signaling in cassava leaf
[ 34]. Comparative genome annotation of W14 and KU50 has identified genes enriched for response to stimuli including light, temperature, water and oxygen evolution
[ 24], but our research further illuminated the specific gene and gene families that respond to light which have been selectively enhanced during domestication. These numerous characteristics in light response and carbon fixation support the conclusion that cultivated cassava is a C
3-C
4 plant.
Expression of active sucrose degradation and transport genes indicates robust energy metabolism in stem of cassava
There was significantly higher expression of genes for several key rate-limiting enzymes in the glycolysis pathway (PFK, PGK and PK, GAPDH and ALDO) in stems of domesticated cassava compared with W14, as well as 212 members transcription factor families. According to information from biochemical and transcriptome analysis of stem phloem sap, there is a powerful stem transport capacity which may affect starch accumulation in the storage roots
[ 35]. Cassava stems are used for reproduction and to keep the plant upright, whereas the roots store materials, which differs from most plants where bulbs or tubers are used for material storage and reproduction
[ 21]. This unique feature of cassava is not well understood.
Expreesion of rate-limiting genes in starch synthesis and energy transport are significantly enhanced in storage root of cassava
Based on comparison of three developmental stages of the storage roots of Arg7 with wild W14, we found that the genes specially involved in three key restricted steps, sucrose degradation (INVs, FBP), starch synthesis (GBSS, GlgP, SBE2, APL3) and starch degradation (RCP1, a-AMY, b-AMY) had significantly higher expression in the cultivar than the wild ancestor. Also, robust energy transport activity in developing storage root was observed, predominantly in selected ABC (ATP binding cassette) transporters, H
+ transporting pyrophosphatase (AVP1, AVP3), p-ATPase and v-ATPase (AHA4), mitochondrial solute carriers (misc transporters) and many metabolite transporters at the mitochondrial membrane. Arg7 is a high starch content cultivar with 30% starch in mature fresh storage roots, but W14 has a low starch content at less than 5%. At the genome level, it has been reported that expansion of copy number and alternative splicing were found in genes for some key steps in starch synthesis, such as sucrose transporter (SUTs), sucrose synthases (GBSS) and ADP glucose pyrophosphorylase (APL)
[ 24]. It was recently reported from a proteomics study that the enzymes involved in energy metabolism and binding activity, such as glucan phosphorylase, ADP-glucose pyrophosphorylase, ATPase, fructokinase, malate dehydrogenase and multiple enzymes participated in binding, may have important roles in cassava storage roots
[ 36]. The above evidence and other reports in wheat and other crops
[ 22,
37,
38], strongly support our findings.
Conclusions
Large-scale comparative transcriptome analysis revealed the genes important in leaf, stem and storage root of cultivated cassava by comparison with its wild ancestor. This new understanding focused attention on specific genes for light harvesting and response, together with Calvin cycle genes in leaf, other genes for transport and key rate-limiting enzymes coupling ATP consumption in the glycolysis pathway in stem, and genes for sucrose degradation, amylose synthesis and hydrolysis. The expression of genes involved in three key steps in starch synthesis associated with storage has been enhanced during domestication. The co-expression network in each organ linked important transcription factors to their putative targets, and further confirmed some of important hub genes in photosynthesis, transportation and starch biosynthesis. This advance will contribute to elucidating the mechanism of highly efficient carbohydrates accumulation and also genetic improvement in cassava.
The Author(s) 2016. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)