Introduction
Sweet potato,
Ipomoea batatas, a globally important food crop, has high yielding potential and wide adaptability. It also contains abundant nutritional materials including complex carbohydrates, dietary fibers, minerals, vitamins and various antioxidants such as carotenoids and anthocyanins
[1]. The storage roots of purple-fleshed sweet potato are rich in anthocyanins. Now, high anthocyanin content has become one of the most important breeding goals for sweet potato.
Anthocyanins belong to the flavonoid family. They are important natural colorants widely distributed among flowers, fruits, seeds, leaves and storage roots, and have been implicated in tolerance to biotic and abiotic stresses in plants
[2]. Anthocyanins also have great potential to be utilized in food colorants, nutritional foods and drug development, such as hepatoprotective, antimutagenic, antineoplastic, antihyperglycemic, antiinflammatory and antioxidant agents
[3–
6].
The structural genes and transcription factors involved in the synthesis and metabolism of anthocyanins have been identified in some plant species such as
Arabidopsis, maize, petunia and snapdragon
[7]. Transcription factors of the MYB, basic helix–loop–helix (bHLH) and WD40 classes can form a MBW complex which binds to promoters and induces transcription of the phenylpropanoid
[8] and anthocyanin
[9] biosynthetic pathway genes, including those encoding phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), flavonoid 3′5′-hydroxylase (F3′5′H), dihydroflavonol reductase (DFR), leucoanthocyanidin dioxygenase/anthocyanidin synthase (LDOX
/ANS
) and UDPglucose-flavonoid 3-
O-glucosyltransferase (UFGT). Among these, PAL, C4H and 4CL catalyze the primary steps in the biosynthesis of phenylpropanoids, which convert phenylalanine to a wide variety of phenolic compounds including flavonoids. CHS, CHI, F3H, F3′H, F3′5′H, DFR, ANS/LDOX and UFGT are key enzymes controlling the metabolism of flavonoids, of which DFR, ANS/LDOX and UFGT are more anthocyanin-specific enzymes
[10]. So far, several anthocyanins biosynthesis-associated genes including
IbCHI,
IbF3′H,
IbDFR,
IbANS,
IbMYB1,
IbWD40 and
IbMADs10 have been isolated from purple-fleshed sweet potato
[11–
17]. Their overexpression or downregulation was found to significantly affect anthocyanin levels. Although the anthocyanin biosynthetic pathway is well characterized, the molecular mechanisms regulating flux through the pathway are still unclear.
Sweet potato is an autohexaploid (2
n = 6x= 90) with an estimated genome of 2.4 Gb
[18]. Due to the complexity of the genetics and the lack of genome resources, the breeding of sweet potato has been constrained. Therefore, transcriptome sequencing has become an efficient way to discover important genes and transcription factors in sweet potato. Transcriptome sequencing of sweet potato has offered important transcriptional data resources to study flower development, storage root formation, carotenoid biosynthesis, and cSSR and SNP marker development of this crop
[19–
23]. Also, there have been several reports related to transcriptome sequencing of purple-fleshed sweet potato. Xie et al.
[24] performed RNA sequencing on the tuberous roots of the purple sweet potato and the results were compared with the RNA database of the yellow sweet potato. Ma et al.
[25] characterized the root transcriptomes of a mutant of purple sweet potato and its wild type by high-throughput RNA sequencing.
In this study, we conducted a transcriptome sequencing of the purple-fleshed sweet potato cv. Jingshu 6 and its mutant JS6-5 with high anthocyanin content on the Illumina HiSeqTM 2500 platform. We analyzed the differentially expressed genes (DEGs) involved in anthocyanin biosynthesis and provided insights into the molecular mechanism of anthocyanin biosynthesis.
Materials and methods
Plant materials
The purple-fleshed sweet potato cv. Jingshu 6 and its high anthocyanin mutant JS6-5 were used in this study (Fig. S1). Jingshu 6 is a commercial cultivar with anthocyanin content of 27.02 mg per 100 g FW (fresh weight), whereas in JS6-5, the content of anthocyanin is up to 79.80 mg per 100 g. Three growing tuberous roots (diameter 30–35 mm) of Jingshu 6 and JS6-5 were harvested 110 d after planting. The samples were immediately frozen in liquid nitrogen and stored at -80°C.
RNA extraction
We used the RNAprep Pure Plant Kit (Tiangen Biotech, Beijing, China) to extract total RNA from the storage roots of Jingshu 6 and JS6-5. After DNase treatment, the purified RNA concentrations were quantified using a Nanodrop spectrophotometer (Thermo Nanodrop Technologies, Wilmington, DE, USA). The quality of the total RNA was subsequently examined using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).
cDNA library construction and Illumina sequencing
Poly (A) mRNA was enriched from the total RNA using magnetic beads with Oligo (dT). Poly (A) mRNA was subsequently fragmented by an RNA fragmentation kit (Ambion, Austin, TX, USA). The fragmented RNA was transcribed into first-strand cDNA using reverse transcriptase and random hexamer primers. The second-strand cDNA was synthesized using DNA polymerase I and RNase H (Invitrogen, Carlsbad, CA, USA). After end repair and the addition of a poly (A) tail, suitable length fragments were isolated and connected to the sequencing adaptors. The fragments were sequenced on Illumina HiSeq™ 2500 platform.
De novo assembly of sequences
To acquire high quality reads, the raw reads were processed by removing adaptor sequences and low quality reads with unidentified bases. The clean reads were
de novo assembled using the Trinity software which combined three components: Inchworm, Chrysalis and Butterfly
[26]. Firstly, clean reads produced from the two libraries were assembled together into contigs using the Inchworm program
[26]. Secondly, based on the paired-end sequences information, the contigs were linked into sets of connected parts using the Chrysalis program followed by transcript construction using the Butterfly program
[26]. Finally, the transcripts were clustered and processed with multiple sequence alignment tool BLAT according to their nucleotide identity
[26]. We took the longest transcript of each cluster units as the unigene. The unigenes formed into the non-redundant database used for annotation.
Functional annotation and classification of non-redundant unigenes
To confirm the functional annotation of the unigenes, they were aligned to various database including the NCBI non-redundant (Nr) database, the protein family (Pfam) database, the Swiss-Prot protein database, the Cluster of Orthologous Groups (COG) database, the euKaryotic Clusters of Orthologous Groups (KOG) database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database by using BLASTX (E<1×10
−5). Using the Blast2GO program, we obtained gene ontology (GO) annotation of the unigenes according to the BLASTX results against the Nr databases. Each unigene was named based on the highest BLAST score. The WEGO software was used for executing function classification of all GO-annotated unigenes
[27]. The GO classification provides an ontology of defined terms related to gene product properties, such as cellular component, molecular function, and biological process. It enhances our understanding of gene classification in a species-independent manner and provides an overview of gene functions of the species
[28].
Analysis of differentially expressed genes
Gene expression levels in Jingshu 6 and JS6-5 were normalized by calculating FPKM (fragments per kilobase of transcript per million fragments mapped)
[26]. The FPKM method minimizes the influence of sequencing depth and gene length when estimating gene expression levels. The EBSeq software was used to identify differentially expressed genes between the two samples. With the standard of Benjamini–Hochberg false discovery rate, the results of all the statistical tests were corrected by multiple testing. The unigenes were deemed significantly differentially expressed at adjusted
P<0.01 with at least a twofold change in FPKM.
Expression analysis of genes involved in anthocyanin biosynthesis by qRT-PCR
Total RNA of the storage roots of Jingshu 6 and JS6-5 was reverse-transcribed using the Quantscript Reverse Transcriptase Kit (Tiangen Biotech, Beijing, China). The cDNA solution was used as template to validate gene expression. The specific primers for
PAL, C4H, 4CL, CHI, F3H, LDOX, UFGT,
MYB1 and 12 randomly picked genes, and the sweet potato
b-actin gene used as internal control are listed in Table S1. We performed the PCR amplifications by ABI PRISM 7500 (software for 7500 and 7500 Fast Real-Time PCR Systems, V2.0.1, Foster City, CA, USA) using SYBR
®Premix Ex Taq™ II (TaKaRa Code No. RR820A, Takara Biomedical Technology (Beijing) Co., Ltd.). The comparative CT method (2
–ΔΔCT method) was used to quantify gene expression
[29].
SSR detection
Using the MISA software ((MIcroSAtellite identification tool), six types of SSRs including mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeats were detected among the unigenes with length>1000 bp.
Results
Transcriptome sequencing and de novo assembly
In this study, 23693146 raw reads with a total of 3659735905 nt and an average GC content of 48.7% were produced from tuberous roots of Jingshu 6, and 28751513 raw reads totalling of 4272814403 nt and an average GC content of 48.6% from tuberous roots of JS6-5. After removing adaptor sequences and discarding low quality reads, 22873364 (96.5%) and 27955097 (97.2%) high quality reads were obtained from Jingshu 6 and JS6-5, respectively. Further assembly of the high quality reads yielded 1910521 contigs with mean length of 59.61 bp. These contigs were assembled into 35592 unigenes with a mean length of 697.22 bp (Table 1). The length distributions of contigs and unigenes are shown in Fig. 1.
Function annotation and classification of non-redundant unigenes
For functional annotation, the sequences of 35592 assembled unigenes were searched in the Nr, Swiss-Prot, Pfam, KOG, GO, COG and KEGG databases. According to the BLASTX results, a total of 28183 (79.2%) unigenes were annotated with putative functions based on hits from at least one database. Among them, 28072 (78.9%) of the putative proteins showed similarity to sequences in the Nr database. Also, 18669 (52.5%), 17247 (48.5%), 16247 (45.7%), 14669 (41.2%), 7924 (22.3%) and 6113 (17.2%) of the unigenes had functional annotation in the Swiss-Prot, Pfam, KOG, GO, COG and KEGG databases, respectively (Table 2). In particular, the predicted result of the Nr database revealed that 5731 (16.1%) and 5431 (15.3%) unigenes showed significant homology with sequences of Nicotiana sylvestris and N. tomentosiformis, respectively (Fig. 2).
Analysis and functional classification of DEGs
In all, 35592 unigenes had detectable levels of expression in Jingshu 6 and JS6-5. Using the IDEG6 software, 1566 genes were differentially expressed between Jingshu 6 and JS6-5 (Fig. 3). Among them, 994 were upregulated and 572 were downregulated in JS6-5 compared with Jingshu 6.
In total, 1436 (91.7%) of the 1566 DEGs were annotated. The numbers of DEGs annotated in the Nr, Swiss-Prot, Pfam, GO, KOG, COG and KEGG databases were 1433, 1130, 1073, 847, 762, 491 and 329, respectively.
According to the GO function and significant enrichment analyses (Fig. 4), a total of 847 DEGs were classified into three categories including 53 functional groups. The most common assignments in the cellular component category were cell part (320 DEGs) and cell (319 DEGs) subcategories. In the molecular function category, the majority of DEGs were grouped into the catalytic activity (513 DEGs) and binding (364 DEGs) subcategories. Among all related molecular function, there were five terms related to anthocyanin biosynthesis including phenylalanine ammonia lyase activity (GO:0045548), 4-coumarate CoA ligase activity (GO:0016207), chalcone isomerase activity (GO:0045430), flavonoid 3′,5′-hydroxylase activity (GO:0033772) and flavonoid 3′-monooxygenase activity (GO:0016711). DEGs in the biological process category were primarily sorted into the metabolic process (596 DEGs) and cellular process (430 DEGs). Among all related biological processes, there were six terms related to anthocyanin biosynthesis including L-phenylalanine catabolic process (GO:0006559), cinnamic acid biosynthetic process (GO:0009800), anthocyanin-containing compound biosynthetic process (GO:0009718), anthocyanin accumulation in tissues in response to UV light (GO:0043481), positive regulation of flavonoid biosynthetic process (GO:0009963) and flavonoid biosynthetic process (GO:0009813).
The 491 DEGs were annotated with the COG database and subdivided into 25 COG classifications (Fig. 5). The largest group is the cluster for general functional prediction (137 DEGs), followed by amino acid transport and metabolism (66 DEGs), carbohydrate transport and metabolism (58 DEGs) and secondary metabolite biosynthesis (57 DEGs).
A total of 117 relevant metabolic pathways were identified by comparing the 35592 unigenes with the KEGG database. Among them, 329 DEGs were assigned to the 84 pathways. The most noticeable groups were phenylpropanoid biosynthesis (ko00940) and phenylalanine metabolism (ko00360) which are related to anthocyanin biosynthesis (Fig. 6).
Detection of genes related to anthocyanin biosynthesis
According to the results of the GO, COG, KEGG pathway enrichment, 14 DEGs were directly involved in anthocyanin biosynthesis (Table 3). Among them, we identified five genes related to PAL activity, three genes related to C4H, two genes related to 4CL and four genes related to CHI, F3H, LDOX and UFGT. Nine DEGs encoding the cytochrome P450 family, N-hydroxycinnamoyl/benzoyltransferase (HCBT), ATP-citrate lyase (ACL) and transmembrane ascorbate ferrireductase were also found to be related to anthocyanin biosynthesis. All the genes above were significantly upregulated in JS6-5 compared with Jingshu 6 (Table 3). It is noteworthy that 24 important transcription factors, including MYB, bHLH, MADs, NAC, AP2/ERF and bZIP were significantly upregulated in JS6-5 compared with Jingshu 6 (Table 3).
Expression analysis of genes involved in anthocyanin biosynthesis by qRT-PCR
To verify the transcriptome data, we used qRT-PCR to analyze the expression levels of 12 randomly selected genes, such as c33937.graph_c0, c41003.graph_c0, c31758.graph_c0, c37779.graph_c0, c36415.graph_c0, c38523.graph_c0, c57881.graph_c0, c56053.graph_c0, c54151.graph_c0, c5336.graph_c02, c40801.graph_c0 and c53825.graph_c0 in the two materials (Table S2). All these genes were significantly upregulated in JS6-5 and the results were highly concordant with the RNA-seq results (Fig. S2).
In addition, qRT-PCR results showed that the expression level of anthocyanin biosynthetic pathway genes including PAL (c17852.graph_c0), C4H (c36515.graph_c0), 4CL (c40730.graph_c0), CHI (c32008.graph_c0), F3H (c39514.graph_c0), LDOX (c38710.graph_c0), UFGT (c40432.graph_c0) and MYB1 (c30141.graph_c0) were also significantly upregulated in JS6-5 and the results were highly concordant with the RNA-seq results (Fig. 7).
SSR markers
Using the MISA software to search for potential SSRs from 7547 unigenes, 1810 of them contained SSR markers and 415 had more than one SSR. In total, 2349 SSRs were identified including 894 mono-, 686 di-, 714 tri-, 41 tetra-, nine penta- and five hexa-nucleotide repeats.
Discussion
High-throughput transcriptome sequencing technology can generate large amounts of genome wide transcription data. Transcriptome analysis is an effective method to identify the candidate genes involved in complex biosynthetic pathways, especially in non-model plant for which reference genome sequences are unavailable
[26]. To date, several sweet potato transcriptomes have been sequenced
[19–
23]. Only two reports were related to transcriptome sequencing of purple-fleshed sweet potato
[24,
25]. In this study, we characterized the root transcriptomes of the purple-fleshed sweet potato cv. Jingshu 6 and its mutant JS6-5 with high anthocyanin content on the Illumina HiSeq
TM 2500 sequencing platform. A total of 22873364 and 27955097 high quality reads were produced from Jingshu 6 and JS6-5, respectively. These reads were assembled into 35592 unigenes. There were 1566 DEGs between JS6-5 and Jingshu 6. Among them, 994 were upregulated and 572 were downregulated in JS6-5 compared with Jingshu 6. This study provides the genomic resources for discovering the candidate genes and enabling further research on the molecular mechanism of anthocyanin biosynthesis in sweet potato.
The results showed that 14 DEGs including
PAL (5 unigenes),
C4H (3 unigenes),
4CL (2 unigenes),
CHI (1 unigene),
F3H (1 unigene),
LDOX (1 unigene) and
UFGT (1 unigene) encoded the enzymes that directly participated in anthocyanin biosynthesis (Fig. 8). In addition, 9 DEGs encoding the cytochrome P450 family, HCBT, ACL and transmembrane ascorbate ferrireductase were also found to be related to anthocyanin biosynthesis (Table 3). Among them, the four cytochrome P450 enzymes were annotated for flavonoid biosynthesis. HCBT can produce anthramides working on acylated anthocyanins
[30]. ACL catalyzes citrate to acetyl-CoA, and it plays important roles in the biosynthesis of flavonoids
[31]. Transmembrane ascorbate ferrireductase could be the link between iron levels and tissue ascorbate
[32]. Ascorbate influences anthocyanin accumulation during high light acclimation
[33]. The 23 genes above were all significantly upregulated in JS6-5 compared with Jingshu 6. This study provides the candidate genes involved in anthocyanin biosynthesis, most of which have not previously been reported in sweet potato.
Anthocyanin biosynthesis in plants is governed by a regulatory network that consists of the MBW complex and other transcription factors. In
Arabidopsis, the MBW complex associated with anthocyanin biosynthesis is formed by an R2R3 MYB transcription factor (PAP1, PAP2, MYB113 or MYB114), a bHLH transcription factor (TT8, GL3 or EGL3) and the WD40-repeat protein TTG1
[8]. In addition, some members of the MADs, NAC, AP2/ERF and bZIP families have been reported to have important roles in regulating anthocyanin biosynthesis. For example, MADs box genes have been reported to positively regulate anthocyanin accumulation in sweet potato and bilberry fruits
[17,
34]. In peach,
PpNAC1 activates the expression of
PpMYB10.1 resulting in anthocyanin accumulation
[35].
Arabidopsis NAC transcription factor (ANAC078) acts as a positive regulator to enhance anthocyanin accumulation under high light acclimation
[36]. The ERF/AP2 transcription factor PyERF3 is found to interact with PyMYB114 and PybHLH3 to coregulate anthocyanin biosynthesis
[37]. In
Arabidopsis, HY5 (a bZIP protein) positively regulates anthocyanin biosynthesis by transcriptional activation of the PAP1 transcription factor
[38]. In this study, 24 important transcription factors, including MYB, bHLH, MADs, NAC, AP2/ERF and bZIP were significantly upregulated in JS6-5 compared with Jingshu 6 (Table 3). Among them, a MYB transcription factor (c30141.graph_c0) was highly homologous to
IbMYB1 which was identified as a key positive regulator of anthocyanin biosynthesis
[15]. There are few reports of others likely to regulate anthocyanin biosynthesis in sweet potato and the most likely ones are shown in Fig. 8. Thus, this study provides important information for discovering and isolating the transcription factors and further illuminating the molecular mechanisms of anthocyanin biosynthesis in sweet potato.
The genome of the sweet potato is still unavailable now and molecular genetic markers are important for sweet potato breeding and genetic linkage maps. Due to their functionality, abundance, high polymorphism and excellent reproducibility, SSRs are useful resources for genome analysis
[39] and in this study, a total of 2349 potential SSRs were identified from 7547 unigenes.
Conclusions
A total of 35592 unigenes were obtained from the purple-fleshed sweet potato using high-throughput sequencing technology. There were 1566 DEGs between Jingshu 6 and JS6-5. Among them, 23 differentially expressed genes and 24 important transcription factors might be involved in anthocyanin biosynthesis of sweet potato. In addition, 2349 SSRs were identified. This study not only provides the candidate genes but also provides insights into the molecular mechanism of anthocyanin biosynthesis in sweet potato.
The Author(s) 2018. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)