Introduction
Human erythropoiesis is a precisely controlled process that involves successive differentiation from hematopoietic stem cells (HSCs) into erythroblast cells and subsequently into mature red blood cells (RBCs). During this process, cells undergo tremendous morphological and molecular changes [
1,
2]. The complexity of erythropoiesis has been studied for years. In addition to experimental studies, high-throughput sequencing and data analyses were found to be advantageous in supplying landscape in erythropoiesis studies.
Global gene expression analyses of human erythroid differentiation have been featured in several studies. Hundreds of differently expressed genes were identified as potential erythropoiesis regulating genes [
3–
5]. Besides the dynamic expression of genes, non-coding RNAs presented a fluctuating expression pattern during erythroid development [
6–
9]. Of these genes, microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) displayed the most fluctuations.
In recent years, several groups have utilized miRNA arrays to show that the expression patterns of approximately 70 miRNAs are strongly associated with erythropoiesis [
10–
12]. These results supported the notion that hematopoietic stem cells or progenitor cells conduct complex miRNA regulation, which is essential for normal erythropoiesis. Recent studies have shown the important roles of miRNA in erythroid development. For example, ectopic overexpression of miR-191 was found to block erythroid enucleation, but had minor effects on proliferation and differentiation, whereas downregulation of miR-191 was essential for erythroid chromatin condensation and enucleation [
13]. Another study showed that mice lacking miR-451 displayed a reduction in hematocrit, an erythroid differentiation defect, and ineffective erythropoiesis in response to oxidative stress [
14,
15]. However, owing to limitations of the miRNA array, global miRNA expression profiles during human erythroid differentiation remains unclear.
lncRNA is another group of noncoding RNAs. This group is greater than 200 nucleotides in length and exists in many cell types and species [
16]. lncRNAs play a key role during normal and malignant hematopoiesis [
17,
18]. However, the role of lncRNAs in erythropoiesis has not been extensively studied [
19]. Recently, Alvarez-Dominguez
et al. reported the dynamic expression pattern of lncRNAs throughout erythropoiesis and possible erythroid-specific lncRNAs [
7].
By using RNA-Seq and miRNA-Seq data, we provide a comprehensive view of global mRNA, miRNA and lncRNA expression landscape, potential interactions, and further regulations during erythropoiesis.
Materials and methods
Cell culture and erythroid cell isolation
After informed consents were obtained, cord blood from healthy volunteers was collected. The Research Ethics Committee of Beijing Institute of Transfusion Medicine approved all the studies of hematopoietic stem and differentiated cells. Mononuclear cells were obtained by centrifugation on the lymphoprep density gradient and enriched for HSCs by positive selection using anti-CD34-tagged magnetic beads, which used mini-MACS columns (Miltenyi Biotech, Auburn, CA, USA), according to manufacturer protocol. Mature red blood cells were recovered from the packed red blood cell fraction at the bottom [
20]. Methods for erythroid differentiation were described previously [
21,
22]. Based on the expression of cell surface markers GPA and CD71, day 14 cultured erythroblasts were isolated by FACS sorting (Fig. S1). Four different populations were sorted, namely, P2, which is roughly equivalent to CFU-E, P3, which mainly contains pro-erythroblasts, P4, which mainly contains intermediate erythroblasts, and P5, which is roughly equivalent to late erythroblasts [
23,
24]. Further, cell morphology of each stage was observed on stained cytospin slides by microscope. As shown in supplemental Fig. S1, P2 to P5 were consistent with differentiation from CFU-E to late erythroblast. Total RNA from each sample was extracted by TRIzol reagent (Invitrogen, Carlsbad, CA, USA) according to manufacturer instructions. The K562 human erythroid leukemia cell line (ATCC, Manassas, VA, USA) was cultured in RPMI-1640 medium (Gibco, Grand Island, NY, USA) with 10% fetal bovine serum (FBS: Hyclone, Logan, UT, USA) at 37 °C in a 5% CO
2 incubator. Hemin (Sigma, St. Louis, MO, USA) (0.05 mmol/L) was used to induce K562 toward erythroid differentiation.
Library preparation and RNA sequencing
The integrity of RNA samples was determined using 1.2% agarose gel electrophoresis, followed by removal of the residual genomic DNA with RNase-free DNaseI (Ambion, Austin, TX, USA). mRNA and small RNA libraries were constructed using the Illumina mRNA-Seq and miRNA-Seq library preparation kits, respectively. The size distribution and concentration of the libraries were determined by using Agilent Bioanalyzer DNA 2000 chip (Agilent Technologies, Santa Clara, CA, USA) followed by sequencing on the Illumina HiSeq2000 Genome Analyzer platform. The RNA-Seq library was sequenced with 2 × 100-bp in pair-end mode by 100-bp lengths, and the miRNA library was sequenced in single-end mode by 100-bp lengths.
Data processing
Data analysis
Quality control analysis of the FASTQ format sequencing raw data was performed using FastQC software [
25]. Human hg19 reference genome was downloaded from IGenomes. FASTQ format sequencing reads were mapped to the hg19 reference genome using Tophat version 2.0.9 software with the default parameters [
26,
27]. The Cuffdiff tool in Cufflinks software package was used to identify differentially expressed genes using default parameters [
28,
29]. Transcript abundance was measured as FPKM (fragments per kilobase of exon per million) for RNA-Seq data. Genes with FPKM>1 in at least one sample was set as expressed. For differential expression analysis, the cutoff for significant fold change was>2 and
P value<0.05. Genes that were differentially expressed between one and its next stage were merged together for clustering. K-means clustering was performed using ConsensusClusterPlus, R package [
30]. Enrichments of biological processes, transcription factors, and hallmark analyses were performed with Gene Set Enrichment Analysis (GSEA) [
31]. A network was built based on the GeneMANIA database [
32] and displayed using Cytoscape software [
33].
lncRNA identification
To identify known lncRNAs, TopHat, Cufflinks, and Cuffdiff were used and annotated with the GENCODE long noncoding RNA GTF file downloaded from the GENCODE website [
34]. To identify novel lncRNAs, a two-step mapping strategy was applied to obtain and combine splice junctions from all samples to annotate mapped reads. The first round of mapping was used as above, novel splice junctions were identified in each sample, and were then combined to obtain a master set of combined novel junctions. The second round of mapping was performed with the master set of junctions supplied with option “-j” and option “-no-novel-juncs” enabled. BAM output from the second round mapping was used as an input to Cufflinks for transcript assembly with “-g” option and with parameters “-min-isoform-fraction=0.0,” “-min-frags-per-transfrag=1” and “-upper-quartile-norm” enabled. After obtaining a set of assembled transcripts, we excluded transcripts according to the following filters:
(1) Size selection. Transcripts shorter than 200 bases were excluded.
(2) Coverage selection. Cufflinks estimated the read coverage of each transcript. We eliminated transcripts with coverage below 3.
(3) Filter of known non-lncRNA annotations. Transcripts with an exon overlapping with transcripts from coding genes, microRNA, rRNA, tRNA, snRNA, snoRNA, or pseudogene annotated by Refseq and Ensemble were eliminated.
(4) Coding potential calculation. We estimated the coding potential for each transcript using PfamScan [
35], GetORF [
36], and BLASTX [
37]. Pfam is a database for classifying proteins. PfamScan was developed to identify transcripts containing protein-coding domains cataloged in the Pfam database. Transcripts translated in all six reading frames into amino acid sequences were provided as input protein sequences to PfamScan. Transcripts with protein-coding domains were reported by PfamScan as significant (default E-value cutoff is 0.001). EMBOSS GetORF was used in the Galaxy platform with the following parameters: minimum nucleotide ORF size: 300; maximum nucleotide ORF size: 10 000 000; what to output: translation of regions between STOP codons; all start codons to code for Methionine: NO; circular sequence: NO; find ORFs in the reverse complement: NO; and number of flanking nucleotides to output: 0. BLASTX was used to search non-redundant protein databases with translated transcript sequences. BLASTX computed evidence of peptide homology for each transcript with a significant threshold of E-value<0.000 000 6.
miRNA analysis
For each sample, miRNA sequence reads were first analyzed by using miRDeep2 software to obtain basic miRNA expression reads [
38]. The miRDeep2 used Bowtie to map the reads to the human genome hg19, and discarded reads that were mapped more than five times to the genome, as well as allowed for mapping with one mismatch to the precursors in the second step. MirSystem database was used to obtain miRNAs targeting to differentially expressed genes [
39].
Heatmaps and calculation of correlations
Unsupervised hierarchical clustering of genome-wide expression profiles (FPKM>1 in at least one sample) was performed by using R software to determine relationships, if any, among samples. The distance ( ) between genes and samples was measured by the Pearson correlation coefficient ( ), where . The Pearson correlation coefficient function is
where
and where is the number of samples, and are observed values of the two variables, and and are mean values of the two variables. We integrated data of mRNA and miRNA expression, as well as target prediction data to determine their relationship with respect to their expression levels using R software.
Quantitative real-time PCR
Total RNA was isolated by using TRIzol (Invitrogen) according to manufacturer protocol. cDNA was synthesized from total RNA by First Stand Synthesis Kit (Thermo Scientific, Waltham, MA, USA). Real-time PCR was performed on a BioRad CFX with SYBR Green/Rox (Thermo Scientific) according to manufacturer instructions. The sequences of primers used in the study are presented in supplemental Table S1.
Results
Similar expression patterns of genes, miRNAs, and lncRNAs
Quantitative real-time PCR of several erythroid genes was performed to validate the relative expression values of sequencing data (Fig. 1). The results indicated good consistency of RNA-seq data and qPCR results. Expression patterns were identified by normalized read counts for coding genes, miRNAs, and lncRNAs during human erythroid differentiation. Similar expression patterns are shown in Fig. 2A, in which the greatest number of RNAs was expressed with FPKM/RPKM that ranged from 0 to 5. Approximately 85.8% of genes were detected with FPKM lower than 30 during erythroid differentiation, whereas approximately 75.6% of miRNAs and around 97.7% of lncRNAs were detected. Interestingly, an average of 16% of miRNAs was expressed with more than 100 RPKM. The overall expression of lncRNAs remained below 30 FPKM, and an average of 7 lncRNAs was expressed over 30 FPKM in each differentiation stage (Fig. 2B).
A heatmap for expressed genes (FPKM>1) indicated significant changes during early erythroid differentiation (HSC to P2) and terminal maturation (P5 to RBC). However, gene expression variations were subtle during the middle differentiation stage (Fig. 2C). Distinct from genes, miRNAs and lncRNAs presented clustered expression based on cell types (Fig. 2D and 2E). Approximately 43.4% of miRNAs and around 66.8% for lncRNAs were expressed in less than three types of differentiation stage, in contrast with genes (approximately 29.8%). Notably, around 41.4% of lncRNAs were detected as expressed in only one differentiation stage. This finding suggests that noncoding RNAs, especially lncRNAs, present a stage-specific signature, which could be utilized as promising biomarkers for cell identification.
Gene expression patterns during erythroid differentiation and key factors
To obtain a further view of gene expression patterns during erythroid differentiation, K-means clustering was performed in an unsupervised and unbiased manner using ConsensusClusterPlus to identify statistically significant subtypes. According to the consensus cumulative distribution function (CDF) and δ area plot (supplemental Fig. S2A and S2B), the differentially expressed genes were divided into four subtypes as clustered, namely, subtype 1 (1507 genes), subtype 2 (506 genes), subtype 3 (1451 genes), and subtype 4 (11 genes).
Gene ontology (GO) analyses of genes within the four subtypes were performed to investigate the associated functions and potential biological processes during erythroid differentiation (Fig. 3). Genes that were highly expressed during differentiation and significantly downregulated at the maturation stage were classified as subtype 1, with enriched GO terms “regulation of developmental process” and “regulation of cell cycle.” Genes that were highly expressed in HSC but decreased at subsequent stages were classified as subtype 2, which enriched in GO terms associated with “regulation of catalytic activity” and “apoptosis.” In subtype 3, with gene expressions that gradually decreased during primary differentiation but increased during maturation, GO terms in functions with “negative regulation of developmental process” and “protein secretion” were significantly enriched. Genes in which expressions were altered more than once during differentiation were classified as subtype 4, with no GO terms enriched due to the small subtype size.
Hematopoiesis is a precise process controlled by the combined effects of transcription factors that permit cell differentiation and proliferation [
40]. To study dynamic changes across differentiation stages, we identified expressed transcription factors within the four subtypes. Transcription factors in the Krüppel-like factor (KLF) family were detected during all differentiation stages and were further classified into distinct subtypes. Subtype 1 consisted of DNA binding transcription factors
GATA2 and
GATA5, which encoded proteins involved in the development of embryonic cells and hematopoietic cell lineages [
41,
42]. Additionally, core-binding factors
RUNX1 and
RUNX2 were identified in subtype 2. These results implied that certain groups of transcription factors are responsible for the corresponding differentiation period, and they are likely to cooperate with each other to drive the overall process. Similarly, hemoglobin genes
HBA1,
HBA2,
HBB, and
HBG1 were found in subtype 1, whereas another hemoglobin gene
HBM and α hemoglobin stabilizing protein gene (
AHSP) were found in subtype 3. GO terms for
HBM were associated with iron ion and oxygen binding, which are essential for RBCs carrying iron and oxygen. This finding suggests that distinct hemoglobin genes function in various differentiation stages, and that
HBM and
AHSP are likely to participate in maturation.
Multiple signaling pathways were activated during erythroid differentiation
To investigate pathways activated during differentiation, we mapped differentially expressed genes in the four subtypes to KEGG pathways using GSEA. Tumor suppressor gene
TP53 is known for its role in balancing development and differentiation in cells across species [
43]. However, its role in the maturation of red blood cells is still unknown. P53 signaling pathway was enriched in genes with decreased expression during differentiation in subtype 1 (Fig. 4A and 4B). Expression data show that with the downregulation of
TP53 and its downstream genes, P53 signaling pathway was significantly inhibited during maturation. We analyzed genes with high expression in HSC but decreased expression during differentiation in subtype 2, and the chemokine signaling pathway was apparent. All known chemokines, except for CCL13 and CCL18, are expressed in HSC, but these decreased during differentiation (Fig. 4C). This result provides evidence that chemokines are possible essential factors for maintenance of stemness.
Comprehensive analysis of miRNAs during erythroid differentiation and maturation
Several miRNAs play roles in erythroid differentiation [
9]. However, a comprehensive understanding of miRNA functions during differentiation is lacking. Thus, an integrated analysis of miRNA and gene expression data was performed to investigate miRNAs and their target genes during differentiation. Here, Mirsystem database was used to summarize miRNA targets. By integrating expression data, we calculated Pearson correlation coefficient between expressed miRNAs and differentially expressed genes within the four subtypes. We obtained 4 miRNAs potentially regulating 35 genes in subtype 1, 6 miRNAs potentially regulating 1 gene in subtype 2, 50 miRNAs potentially regulating 256 genes in subtype 3, and 1 miRNA potentially regulating 1 gene in subtype 4 (
P value≤0.01; see supplemental Table S2).
With the greatest number of miRNA-gene pairs, subtype 3 consisted of 29 genes involved in “metabolism of heme (a cofactor consisting of iron and porphyrin)” and “reactive to oxygen species,” of which 21 genes were possibly regulated by hsa-miR-532-3p (Fig. 5). We observed that genes in subtype 1 significantly correlated with miRNAs, including 14 genes participating in “nucleotide binding” and “response to DNA damage stimulus” that were mainly regulated by hsa-miR-107 and hsa-miR-548h-5p. With the upregulation of the two miRNAs in RBCs, their target genes in subtype 1 were significantly repressed. These results suggest that miRNAs are likely to play key roles during erythroblast enucleation and maturation.
Potential cis-regulating of lncRNAs in erythroid differentiation
By screening the global human lncRNAs transcription profiles of sorted cells during erythroid differentiation, we identified 1034 known lncRNAs (FPKM≥1, ranging from 160 to 206 in 6 samples). As lncRNAs showed clustered expression based on cell types, we selected 60 lncRNAs expressed at more than one differentiation stage with FPKM≥10 for further study (see supplemental Table S3).
Focusing on differentially expressed genes within the four subtypes, 35 090 pairs of lncRNAs-gene potential interactions were identified using Pearson correlation coefficient of≥0.8 or≤-0.8 and P value≤0.05 (supplemental Table S4 and Fig. S3A). As Pearson correlation coefficient helps estimate positive or negative correlations, significantly correlated lncRNA-gene interactions might indicate a simple regulation manner that lncRNAs bind to the transcription region to activate or inhibit gene expression. Interestingly, 34 lncRNAs frequently correlated with 125 genes associated with “hemopoiesis, leucocyte activation” and “DNA repair” in subtype 3, where genes were highly expressed in HSCs and RBCs (Fig. 6A). This finding indicates that lncRNAs potentially play roles during erythroid differentiation.
Previous reports have shown that lncRNAs were physically involved in transcription regulation by cis-regulating nearby genes in human [
44]. Based on this knowledge, we estimated physical locations of expressed lncRNAs and genes in the same chromosome within 50 kb up- and down-stream. Together with the Pearson correlation coefficient described above, 19 pairs of lncRNA-gene interactions with promising regulatory relationships were identified during the erythroid differentiation (supplemental Table S5). We validated lncRNA
RP11-326C3.2 in the K562 cell line using PCR, showing that the location between
NLRP6 and
ATHL1 (Fig. 6B and 6C). The expression of
RP11-326C3.2 and Acid Trehalase-Like 1 (
ATHL1) were significantly increased during maturation. As the lncRNA was just 4701 bp upstream of
ATHL1, this close relationship inferred their potential co-expression.
In addition, we identified 83 novel lncRNAs expressed during differentiation (see Materials and methods, Supplemental Table S6). However, only 13 lncRNAs were expressed at more than one stage of differentiation with FPKM≥1. Based on the strategy above, eight pairs of novel lncRNA-gene interactions were identified, but these were not significant (supplemental Fig. S3C). Considering the close location proximities of lncRNAs on chromosome 5 (44777323-44778184 for CUFF.113706.1, 44777314-44778042 for CUFF.99342.1, and 44777330-44778142 for CUFF.86667.1, detected from P2, P3, and P4 samples, respectively), we assumed these interactions were in fact one lncRNA expressed at three stages.
Dynamic network associated with the maturation of red blood cells
Our results depicted comprehensive functions of miRNAs in regulating maturation of RBCs and lncRNAs, which were positively or negatively associated with the upregulated coding genes in RBCs. To display all potential regulation interactions directly, we used GeneMANIA and Cytoscape to build a regulation network of functional RNAs during the maturation of RBCs (Fig. 7). Genes associated with “reacting to oxygen species” and “heme metabolism” were upregulated in RBCs, whereas those associated with “nucleotide binding” and “response to DNA damage stimulus” were downregulated. As one miRNA can bind to the 3′-untranslated regions of more than one coding genes, inhibition of hsa-miR-532-3p expression served the function of upregulating target genes during erythrocyte development. Notably, lncRNAs
JHDM1D-AS1 (antisense RNA for JHDM1D),
AC087294.2, and
RP11-66D17.5 were frequently associated with genes in the network. The three lncRNAs and coding genes were distantly located from each other. This distance suggested trans-regulation or post-transcriptional regulation during development. Other studies have demonstrated competitions between lncRNAs and miRNAs, which acted as competing endogenous RNAs (ceRNAs) [
45,
46]. BLAST analyses (at least 8 bp of matched base pairs) and DIANA LncBase prediction v2 (supplemental Tables S7 and S8) further revealed that lncRNA
TOPORS-AS1 could compete with miRNAs hsa-miR-532-3p, hsa-miR-191-5p, and hsa-miR-652-3p.
Discussion
Stage specific complexity of erythroid differentiation has been widely studied for decades. Erythropoiesis is a complex process that begins at the level of pluripotent hematopoietic stem cells and terminates with the enucleation to form red blood cells. This step requires multiple signaling pathways to cooperate to ensure a sufficient and smooth process. Diverse cell morphologies can be observed during each stage of hematopoiesis. By detecting cell surface markers
CD71 and
CD235a, the process is divided into six stages, namely HSC, P2, P3, P4, P5, and RBC without nucleus [
23,
24]. Studies on normal differentiation of erythrocytes may help in understanding the fluent development process and providing knowledge on hematopoiesis during abnormal development. In the current study, by comparing expression profiles from the six stages of differentiation, we discovered potential dynamic interactions of mRNAs and noncoding RNAs. The expression trends of RNAs were similar yet distinct, and erythrocyte differentiation relied mainly on the activation of coding genes regulated by a relatively small number of noncoding RNAs. As miRNAs and lncRNAs presented stronger cell-type- or tissue-specific features than coding genes [
47], they could be promising markers for sorting cells.
Transcription factors and hemoglobin are essential for erythroid maturation. To survive after enucleation, RBCs have to accumulate abundant transcription products during the cell cycle. The transcription factor E2F was described as a transcription regulator of histones [
48]. E2F activator, especially
E2F2, was optimally expressed in RBCs to preserve cell activities [
49]. The increase of progressive hemoglobin stabilizing protein (
AHSP) gene expression indicated its vital role in the maturation of the red blood cells. Camila
et al. (2004) reasoned that AHSP may assist the assembly of free globin chains into mature hemoglobin [
50].
Based on these discoveries, we aimed to build a complex network presenting interactions between coding genes and noncoding RNAs that control the maturation of RBCs. Noncoding RNA regulation of mRNAs has been studied for many years, including the degradation effect of miRNAs on target genes, and recent discussions on lncRNA regulation of coding genes by various means. Bianchi
et al. (2012) demonstrated that miRNAs were involved in erythroid differentiation, and that they functioned in the proliferation of early erythroid cells, expression of fetal g-globin genes and enucleation [
9]. Several miRNAs were co-regulated by transcription factors, such as
GATA1 and
GATA2, to regulate erythroid and megakaryocytic differentiation [
51–
53]. Grabher
et al. (2011) discovered that miRNAs regulated the cell fate decision between erythroid and megakaryocytic lineages
in vivo [
54]. Based on the knowledge that miRNAs are essential for erythrocyte differentiation, our study took advantage of the RNA-Seq technologies to integrate mRNA and miRNA expression data. We identified 29 genes as promising miRNA targets that involved in metabolism of heme, erythroblast differentiation, and reactive to oxygen species. hsa-let-7e-5p, hsa-miR-133a-3p, hsa-miR-193a-5p, hsa-miR-30c-5p, hsa-miR-30c-5p, hsa-miR-502-3p, hsa-miR-502-5p, hsa-miR-532-3p, and hsa-miR-652-3p were found as potential regulators for maturation, and approximately 72.4% of associated genes were statistical targets of hsa-miR-532-3p. This miRNA has not been studied in HSCs or blood cells, but was reported to associate with Crohn’s disease and many cancers [
55,
56]. In addition, Rudnicki
et al. validated that hsa-miR-532-3p effected the expressions of target transcripts involved in apoptosis pathways [
57]. Wang
et al. demonstrated that hsa-miR-532-3p directly targeted apoptosis repressor with caspase recruitment domain (ARC) and participated in Doxorubicin-induced mitochondrial fission and apoptosis [
58]. Moreover, hsa-miR-532-3p negatively correlated with
UBE2B and
UBE2H. Given that hsa-miR-532-3p was closely associated with genes encoding ubiquitin-conjugating enzymes, and showed a relationship with genes functioning in heme metabolism and reaction to oxygen species as discussed above. Hence, hsa-miR-532-3p is expected to play roles in regulating the maturation of RBCs.
Recent studies have revealed that lncRNAs, such as
alncRNA-EC7, enhances
SLC4A1 and that
LincRNA-EPS is required for the maturation of RBC [
7]. Results of the present study implied that expression of lncRNAs significantly correlated with genes in subtype 3, and possibly controlled hematopoiesis and DNA repair during erythrocyte maturation. Using methods discussed previously [
8], we further identified 82 novel lncRNAs that showed highly confident non-coding potential. Our results provide further information on functions of noncoding RNAs during hematopoiesis and provide resources for the lncRNA database.
To summarize, we integrated coding gene and noncoding RNA expression data derived from six stages of erythrocyte differentiation. We presented potential correlations between the three types of RNAs and the promising regulations that occurred during maturation of RBCs. As the discoveries of noncoding RNAs have contributed to the studies of erythroid terminal differentiation as well as erythrocyte disorders, the current study provided basic knowledge for human health and diseases. We are hopeful that miRNAs and lncRNAs could be utilized for artificially induced hematopoiesis and new therapeutic strategies against blood diseases in the future. Our findings highlight the advantages of high-throughput RNA sequencing to identify genes, miRNAs, and lncRNAs and to quantify their expression levels.
Higher Education Press and Springer-Verlag Berlin Heidelberg