Introduction
Embryonic stem cells (ESCs) can be derived from the inner cell mass of the preimplantation blastocyst and maintained
in vitro under defined culture conditions without losing their pluripotency and self-renewal capacities (
Evans and Kaufman, 1981;
Martin, 1981;
Bongso et al., 1994;
Thomson et al., 1998). Pluripotency is defined as the potential to give rise to any cell type of the embryo proper, whereas self-renewal refers to the ability of a cell to propagate indefinitely without losing its pluripotent properties. Pluripotency maintenance as well as the exit and re-acquisition of pluripotency by differentiation and somatic cell reprogramming, respectively, require the interplay of transcriptional regulators, epigenetic modifiers, and extracellular signaling pathways (
Boyer et al., 2006;
Buganim et al., 2013). Tremendous efforts have been directed toward studying chromatin-binding proteins such as DNA-binding transcription factors and chromatin-modifying proteins (
Chambers and Tomlinson, 2009;
Kashyap et al., 2009;
Hanna et al., 2010;
Apostolou and Hochedlinger, 2013;
Saunders et al., 2013;
Radzisheuskaya and Silva, 2014). However, much less is known about the roles of RNA-binding proteins (RBPs) in pluripotency, differentiation, and reprogramming. RBPs participate in every step of RNA biology, from transcription, splicing, and polyadenylation to RNA modification, transport, translation, and turnover. Moreover, RBPs can function as bridging factors between RNA molecules and protein complexes. Recent technical developments for studying RBP properties and partners (
Ule et al., 2005;
Darnell, 2010;
Li et al., 2014) have facilitated the discovery of new RBPs and opened up new avenues for understanding their biological functions. This review focuses on RBPs that play roles in ESC maintenance, differentiation, and somatic cell reprogramming in humans and mice.
Characterization of RNA-binding proteins
RNA-binding domains
Historically, RBPs were named as such because they possessed canonical RNA-binding domains for direct and specific interactions with their RNA targets. The specificity of these interactions can be sequence- and/or structure-mediated, giving rise to different modes of recognition. Post-translational modifications of RBPs can also alter their RNA-binding affinity, function, and localization, adding further layers of complexity (reviewed in
Glisovic et al., 2008). The main canonical RNA-binding domains are discussed below and are summarized in Table 1.
RNA-recognition motif (RRM)
The RNA-recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein (RNP) domain, is the most abundant RNA-binding domain (present in 0.5%–1% of human genes) (
Venter et al., 2001) and by far the most extensively studied one in higher vertebrates (
Maris et al., 2005). This domain has been shown to interact not only with RNA but also with DNA and protein partners, and it is often found as multiple repeats within a single protein. A single RRM binds 2–6 nucleotides, whereas multiple copies of the domain allow the recognition of longer and more complex RNA targets, thus enhancing the affinity and specificity of RNA binding (
Maris et al., 2005).
Serine-arginine rich splicing factors (SR)
Serine-arginine (SR) rich splicing factors are a conserved family of RBPs essential for cell survival and exon-intron boundary recognition during spliceosome assembly (
Manley and Tacke, 1996). SR proteins are not only involved in the regulation of constitutive and alternative splicing, but also in regulating a wider range of processes, from transcription to translation (
Zhong et al., 2009). Examples of SR proteins are Son, a large and poorly characterized protein, and Srsf3; their functions are discussed later in this review.
K-homology domain (KH)
The hnRNP K-homology (KH) domain is approximately 70 amino acids long and is found in proteins with different functions such as splicing, transcriptional regulation, and translational control (
Valverde et al., 2008). A single KH domain recognizes four nucleotides with rather weak affinity, but multiple KH domains within an RBP can act synergistically.
RGG box
RGG domains, first discovered in some hnRNPs, consist of several Arg-Gly-Gly repeats (
Kiledjian and Dreyfuss, 1992;
Dreyfuss et al., 1993). RGG motifs can bind to their target RNAs directly or indirectly through other proteins (
Godin and Varani, 2007). The arginines of RGG motifs are often dimethylated, a modification that is important for regulating RNA-binding activity (
Rajyaguru and Parker, 2012). The best known examples of RGG proteins are members of the hnRNP family of proteins such as hnRNPA1, hnRNPK, and hnRNPU.
Double-stranded RNA-binding domain (dsRBD)
Double-stranded RNA-binding domains (dsRBDs) were first described for their ability to recognize RNA by its three-dimensional shape rather than by its sequence (
Stefl et al., 2005). Typical dsRBDs are about 70 amino acids long, have a conserved structure, and are involved in various processes such as RNA interference, localization, processing, translational control, and editing (
Chang and Ramos, 2005). This module is often combined with auxiliary functional domains to allow for specialized functions. For example, Adar2 and PKR have similar dsRBDs, but their distinct activities are conferred by a deaminase domain and a kinase domain, respectively (
Valente and Nishikura, 2005;
García et al., 2007).
Piwi/Argonaute/Zwille (PAZ) domain
The PAZ domain is named after the proteins Piwi, Argonaute, and Zwille, which are involved in post-transcriptional gene silencing. The PAZ domain is composed of two subdomains, one of which is well characterized as a single-stranded nucleic acid binding domain (
Yan et al., 2003). The post-transcriptional gene silencing mediated by PAZ-containing RBPs is achieved by binding of the PAZ domain to the two-base 3′ overhangs that characterize small interfering RNAs (siRNAs) (
Borozdin et al., 2004;
Simon et al., 2011;
Tian et al., 2011).
RNA helicase DEAD-box
RNA helicases are abundant enzymes that use ATP to bind or remodel RNA and RNA-protein complexes (RNPs), and they are found in all organisms. The largest helicase family is the DEAD-box family, named after its characteristic Asp-Glu-Ala-Asp (DEAD) motif. DEAD-box proteins have essential roles in cellular RNA metabolism, including transcription, pre-mRNA splicing, ribosome biogenesis, nucleocytoplasmic transport, translation, and RNA decay (reviewed in
Cordin et al., 2006;
Linder and Jankowsky, 2011).
RNA-binding zinc finger (ZnF)
The zinc finger (ZnF) is a classical dsDNA-binding domain, but it can also interact with RNA (
Pelham and Brown, 1980;
Amarasinghe et al., 2000;
Teplova and Patel, 2008). ZnFs can be found alone or in tandem in RBPs, and also in combination with other types of RNA-binding domains. Zinc finger-containing RBPs are classified according to the amino acids that coordinate the Zn²⁺ ion, e.g. CCHH, CCCH, and CCCC, with the CCHH type being the most frequent. Examples of ZnF proteins involved in RNA processing that are reviewed here are Mbnl1 (
Teplova and Patel, 2008) and Lin28 (
Hagan et al., 2009).
PUF RNA-binding repeats
The PUF family of proteins (named after Pumilio and FBF) is involved in the regulation of embryogenesis, development, and differentiation in most higher eukaryotes (
Quenault et al., 2011). PUF proteins contain a PUM-HD-type RNA-binding domain composed of eight repeats of around 36 amino acids, each of which binds a single nucleotide in the 3′ untranslated region (UTR) of a target mRNA (
Edwards et al., 2001;
Wang et al., 2001,
2002). The specificity, high affinity, and modular design of the PUF RNA-binding domain make it attractive for protein engineering: fusing a PUF domain to a desired effector domain directs that effector specifically to a given RNA of interest (
Wang et al., 2002;
Filipovska et al., 2011). Pum1, a member of the PUF family that was recently identified as a pluripotency regulator, is discussed later in this review.
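The modular logic behind PUF engineering can be illustrated with a short Python sketch. It assumes the antiparallel arrangement reported for Pumilio-type PUF domains, in which repeat 1 (counting from the N-terminus) contacts the 3′-most base of an 8-nt target and repeat 8 contacts the 5′-most base. The function name `puf_repeat_targets` is hypothetical, and the sketch deliberately says nothing about which amino acids a repeat would need in order to recognize its assigned base.

```python
from typing import Dict


def puf_repeat_targets(target_rna: str, n_repeats: int = 8) -> Dict[int, str]:
    """Map each PUF repeat (numbered from the N-terminus) to the base it contacts.

    Assumes the antiparallel arrangement of Pumilio-type PUF domains:
    repeat 1 contacts the 3'-most base, repeat n_repeats the 5'-most base.
    """
    rna = target_rna.upper().replace("T", "U")  # accept DNA-style input
    if len(rna) != n_repeats:
        raise ValueError(f"expected a {n_repeats}-nt target, got {len(rna)} nt")
    if set(rna) - set("ACGU"):
        raise ValueError("target may contain only A, C, G and U")
    # Repeat i (1-based) pairs with 0-based position n_repeats - i of the 5'->3' string.
    return {i: rna[n_repeats - i] for i in range(1, n_repeats + 1)}


if __name__ == "__main__":
    # Canonical Pumilio-binding element, written 5'->3'.
    for repeat, base in puf_repeat_targets("UGUAAAUA").items():
        print(f"repeat {repeat}: {base}")
```

For the target UGUAAAUA, repeat 1 is assigned the 3′-terminal A and repeat 8 the 5′-terminal U; redesigning a PUF domain for a new target amounts to changing the specificity of the individual repeats in this order.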
Non-canonical RNA-binding domains
Some proteins that do not possess canonical RNA-binding domains have nevertheless been shown to bind RNA and to function together with it. The best-studied example is the Polycomb protein Ezh2. Ezh2 is one of the catalytic subunits of Polycomb Repressive Complex 2 (PRC2), which is involved in epigenetic repression through chromatin compaction. Several studies have shown that Ezh2 interacts with various RNA species that affect its function, either by recruiting it to its chromatin targets or by impeding its access to them (
Zhao et al., 2010;
Davidovich et al., 2013). Other proteins with non-canonical RNA-binding domains that function in pluripotency and/or differentiation include Lsd1, Suz12, and Wdr5, which are discussed later in this review.
Several high-throughput analyses in ESCs (
Baltz et al., 2012;
Castello et al., 2012;
Kwon et al., 2013) have identified RNA-binding candidates that lack all of the canonical, well-studied RNA-binding motifs described above. It remains to be determined whether these proteins are bona fide RBPs or whether their binding to RNA is mediated by intermediate proteins. Additional domain-mapping experiments are needed to identify previously uncharacterized RNA-binding domains. The increasing number of proteins shown to interact with RNA suggests that RNA binding is a more widespread property than previously expected, and highlights the importance not only of RBPs but also of regulatory RNAs.
Approaches for studying RBPs
Understanding the physical association of RNA and protein
in vivo requires techniques that take into account the biochemistry of both molecules. Accordingly, a series of crosslinking procedures followed by immunoprecipitation have been developed. The first reported protocol was RNA immunoprecipitation (RIP), which allowed the identification of RNA species associated with RNA-binding proteins (
Niranjanakumari et al., 2002). By using formaldehyde as a reversible crosslinking agent combined with high-stringency IP conditions, specific RNAs associated with bait proteins could be identified. However, the sticky nature of RNA necessitated many controls, and only a few targets identified by RIP could actually be validated. A breakthrough in mapping RNA–protein interactions
in vivo arrived when CLIP (cross-linking and immunoprecipitation) strategies were developed (
Ule et al., 2003;
Ule et al., 2005;
Jensen and Darnell, 2008). The main difference from RIP is that tissues or cultured cells are crosslinked by 254 nm UV-C irradiation, which specifically generates covalent bonds between RNA and RNA or between RNA and protein (
Zwieb and Brimacombe, 1978;
Brimacombe et al., 1988) without inducing protein–protein crosslinks. Such crosslinks form only between molecules in direct contact (on the order of angstroms), and because they are covalent, highly stringent washes can be used, reducing the background and increasing the specificity of the co-immunoprecipitated RNAs. HITS-CLIP (high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation, also called CLIP-seq) adds partial RNA digestion (typically to fragments of 20–100 nt) and ligation of RNA linkers, followed by reverse transcription and high-throughput sequencing, to identify the exact sites at which an RBP binds its target RNAs (reviewed in
Darnell, 2010). Some RBPs crosslink inefficiently with UV, which makes them difficult to analyze by CLIP. In such cases, an RNA–protein immunoprecipitation in tandem (RIPiT) protocol is recommended (
Singh et al., 2014).
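After sequencing, a first computational step in HITS-CLIP analysis is usually to strip the ligated 3′ linker from each read and keep only inserts in the expected fragment size range. The Python sketch below illustrates that step under simplifying assumptions: exact (mismatch-free) linker matching, a placeholder linker sequence (`ADAPTER` is hypothetical; the sequence actually ligated must be substituted), and the 20–100 nt fragment range. Real pipelines use dedicated, error-tolerant trimming tools, followed by genome alignment and identification of binding clusters.

```python
from typing import Iterable, Iterator

# Hypothetical placeholder for the 3' RNA linker; substitute the sequence actually ligated.
ADAPTER = "TGGAATTCTCGG"


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Remove a full 3' linker, or a linker prefix at the very end of the read."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The read may end partway into the linker; try the longest prefix first.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no linker detected


def clip_inserts(reads: Iterable[str], min_len: int = 20, max_len: int = 100) -> Iterator[str]:
    """Yield trimmed inserts whose length falls in the digested-fragment range."""
    for read in reads:
        insert = trim_adapter(read)
        if min_len <= len(insert) <= max_len:
            yield insert
```

The `min_overlap` parameter trades sensitivity for specificity: shorter partial matches trim more true linker remnants but also clip genuine inserts that happen to end in a linker-like sequence.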
A newer technique was developed to overcome the low efficiency of crosslinking by 254 nm UV-C. PAR-CLIP (photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation) relies on the incorporation of photoreactive ribonucleoside analogs, namely 4-thiouridine (4-SU) and 6-thioguanosine (6-SG), into nascent RNA transcripts in living cells (
Hafner et al., 2010b). Irradiating the cells with 365 nm UV light then efficiently crosslinks the photolabeled RNAs to their interacting RBPs. The precise crosslinking position can be determined from characteristic mutations that appear when the co-immunoprecipitated (co-IPed) RNAs are sequenced: 4-SU gives rise to T-to-C conversions in the cDNA, whereas 6-SG gives rise to G-to-A mutations. Moreover, these mutations are a hallmark of crosslinked RNAs and can be used to distinguish specifically co-IPed RNAs from background sequences derived from abundant cellular RNAs.
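The use of diagnostic mutations to locate crosslink sites can be sketched in Python. In PAR-CLIP data, 4-SU crosslink sites typically appear as T-to-C conversions in the sequenced cDNA and 6-SG sites as G-to-A conversions. The sketch assumes ungapped, sense-strand alignments of reads to a known reference sequence (real data require a proper aligner and strand handling); the function names and the `min_reads` cutoff are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def conversion_positions(
    ref: str, read: str, start: int, conversion: Tuple[str, str] = ("T", "C")
) -> List[int]:
    """Reference coordinates where an aligned read shows the diagnostic conversion.

    Assumes an ungapped alignment of a sense-strand read starting at 0-based
    reference coordinate `start`. Use ("T", "C") for 4-SU and ("G", "A") for 6-SG.
    """
    ref_base, read_base = conversion
    window = ref[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    return [
        start + i
        for i, (r, q) in enumerate(zip(window, read))
        if r == ref_base and q == read_base
    ]


def crosslink_sites(
    ref: str,
    alignments: Iterable[Tuple[str, int]],
    min_reads: int = 2,
    conversion: Tuple[str, str] = ("T", "C"),
) -> Dict[int, int]:
    """Number of reads converted at each position, keeping positions seen in >= min_reads reads."""
    counts: Counter = Counter()
    for read, start in alignments:
        counts.update(conversion_positions(ref, read, start, conversion))
    return {pos: n for pos, n in sorted(counts.items()) if n >= min_reads}
```

Requiring the same conversion in several independent reads is one simple way to separate crosslink sites from sequencing errors; background reads from abundant cellular RNAs, which were never crosslinked, rarely carry the conversion.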
The preferred RNA sequences of RBPs can be analyzed systematically by RNAcompete (
Ray et al., 2009). This method involves an
in vitro competitive binding reaction in which an RBP is incubated with a high molar excess of a complex pool of RNAs, followed by affinity recovery of the protein and microarray analysis of the bound RNAs. Many human RBPs have been tested by RNAcompete, allowing the identification of their consensus RNA-binding sites (
Ray et al., 2013). This technique is very helpful for predicting the function of an RBP when combined with RNA-seq after knockdown of that RBP. For instance, an RBP whose consensus binding motif maps to the 3′ UTRs of RNAs, and whose depletion leads to downregulation of those RNA targets, would be expected to bind and stabilize its targets. Conversely, a protein whose consensus binding motif maps to the boundaries of alternative exons, and whose depletion changes the inclusion/exclusion ratio of those exons, would likely be an alternative splicing regulator.
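The reasoning above can be written as a toy decision rule. In this Python sketch, the input fields, the region labels (`"3UTR"`, `"alt_exon_boundary"`), and both cutoffs are hypothetical choices made for illustration; they are not thresholds taken from RNAcompete or from any published pipeline.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class TargetEvidence:
    motif_region: str          # where the consensus motif maps, e.g. "3UTR" or "alt_exon_boundary"
    log2fc_after_kd: float     # change in target RNA level after RBP knockdown (RNA-seq)
    delta_psi_after_kd: float  # change in exon inclusion after knockdown (0.0 if not applicable)


def predict_rbp_role(
    targets: Sequence[TargetEvidence], fc_cutoff: float = -0.5, psi_cutoff: float = 0.1
) -> str:
    """Toy rule: 3' UTR motifs + target downregulation -> stabilization;
    exon-boundary motifs + altered exon inclusion -> alternative splicing."""
    utr = [t for t in targets if t.motif_region == "3UTR"]
    exon = [t for t in targets if t.motif_region == "alt_exon_boundary"]
    if utr and median(t.log2fc_after_kd for t in utr) <= fc_cutoff:
        return "mRNA stabilization"
    if exon and median(abs(t.delta_psi_after_kd) for t in exon) >= psi_cutoff:
        return "alternative splicing regulation"
    return "undetermined"
```

Using the median over all motif-containing targets, rather than individual genes, keeps a few indirectly affected transcripts from dominating the call.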
A different approach is needed if the main goal is to identify the proteins bound to a given RNA species or group of RNAs (
Castello et al., 2012). In that case, after UV crosslinking, a tagged oligonucleotide complementary to the RNA of interest (for example, oligo(dT) for polyadenylated RNAs) is used as bait to capture the RNA together with all the RBPs covalently bound to it. The co-purified proteins are then identified by mass spectrometry.
Technical developments in the last decade focusing on RBPs and their interacting RNAs have led to a rapid increase in the number of published studies on RBP function in various biological processes. Table 2 summarizes the published high-throughput studies of the target RNAs of selected RBPs reviewed in this paper.
Functions of RBPs in ESC maintenance, differentiation, and reprogramming
RBPs with important roles in ESC maintenance, differentiation, and somatic cell reprogramming can be divided into four categories according to their functions. Most known RBPs fall into the first and second categories, which correspond to co- and post-transcriptional regulation of mRNAs, respectively. The first includes alternative splicing and alternative polyadenylation, whereas the second comprises microRNA (miR) recognition and pairing, regulation of RNA stability, and RNA turnover. The third category includes RBPs involved in the nuclear retention of RNAs or in their export to the cytoplasm. Lastly, the fourth category, which mainly comprises long non-coding RNA (lncRNA)-interacting RBPs, contains epigenetic regulators with histone-modifying functions. All of these categories are discussed in detail below and summarized in Fig. 1 and Table 3.
RBPs with co-transcriptional regulatory function
During and after transcription, nascent pre-mRNAs mature into functional mRNAs. This maturation includes constitutive splicing and polyadenylation, which remove introns and add the poly(A) tail, respectively. In many cases, alternative splicing and/or alternative polyadenylation also take place, affecting the final mRNA sequence, structure, and/or stability.
Alternative splicing
More than half of human and mouse genes express multiple mRNAs through alternative splicing (AS) (
Lander et al., 2001;
Modrek et al., 2001;
Johnson et al., 2003). AS is the process by which a single primary transcript gives rise to multiple mature mRNAs, most commonly through the alternative inclusion or exclusion (“skipping”) of individual exons (
Altschul et al., 1990;
Florea et al., 1998;
Hubbard et al., 2002). In the past few years, several studies have examined the regulation of AS during differentiation and reprogramming of mouse and human ESCs (
Pritsker et al., 2005;
Sugnet et al., 2006;
Yeo et al., 2007; Kunarso et al., 2008;
Salomonis et al., 2009;
Salomonis et al., 2010;
Wu et al., 2010;
Gabut et al., 2011;
Ohta et al., 2013b). Pluripotency-associated transcripts such as
Dnmt3b, Nanog, Sall4, and
Oct4 are themselves subject to AS (
Atlasi et al., 2008;
Gopalakrishnan et al., 2009;
Rao et al., 2010;
Das et al., 2011;
Tsai et al., 2014). The importance of AS regulation for maintaining the stem cell state and for differentiation is exemplified by stage-specific splicing factors such as Rbfox2, Mbnl1, and Son.
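Changes in exon inclusion or exclusion, such as those driven by stage-specific splicing factors, are commonly quantified as the "percent spliced in" (PSI) of an exon. The Python sketch below computes PSI for a cassette exon from splice-junction read counts, assuming that inclusion is supported by two junctions and skipping by one, so inclusion counts are halved. The function names and the read counts in the example are hypothetical; dedicated tools additionally model read length and mappability.

```python
from typing import Tuple


def percent_spliced_in(inclusion_reads: int, skipping_reads: int) -> float:
    """PSI of a cassette exon from splice-junction read counts.

    Inclusion is supported by two junctions (upstream exon-cassette and
    cassette-downstream exon) but skipping by only one, so inclusion is halved.
    """
    inclusion = inclusion_reads / 2
    total = inclusion + skipping_reads
    if total == 0:
        raise ValueError("no junction reads cover this exon")
    return inclusion / total


def delta_psi(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """Change in PSI between two conditions, e.g. ESCs vs. differentiated cells."""
    return percent_spliced_in(*after) - percent_spliced_in(*before)


if __name__ == "__main__":
    # Invented counts: (inclusion junction reads, skipping junction reads).
    print(percent_spliced_in(80, 10))
    print(delta_psi((80, 10), (20, 40)))
```

With these invented counts, PSI falls from 0.8 to 0.2, i.e. a delta PSI of about -0.6, the kind of switch expected for an ESC-specific exon that is repressed upon differentiation.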
Rbfox2 (also known as Fox2 and Rbm9) was first described in mammals as a tissue-specific RBP of muscle and neuronal cells that is involved in splicing regulation (
Underwood et al., 2005). More recently, it has been shown to be required for human ESC (hESC) viability, and its RNA targets in hESCs have been determined by CLIP-seq (
Yeo et al., 2009). Rbfox2 binds and regulates AS of transcripts coding for RBPs such as Lin28 and Rbfox2 itself, nuclear mRNA splicing factors, and serine/threonine kinases, indicative of a role for Rbfox2 in regulating RNA metabolism and signaling pathways in hESCs. Together with Mbnl1, Rbfox2 has also been implicated in regulating exon inclusion/exclusion during differentiation and reprogramming, influencing the structure of proteins involved in membrane dynamics, cell adhesion, migration, and polarity (
Venables et al., 2013b). Nonsense-mediated decay (NMD), a post-transcriptional surveillance pathway, is often coupled to AS: it degrades transcripts containing premature termination codons and thereby prevents the production of truncated proteins (
Brogna and Wen, 2009). In mouse ESCs (mESCs), Rbfox2 has also been implicated in regulating the expression of RBP genes (including its own) through AS-NMD, resulting in the autoregulation of splicing networks (
Jangi et al., 2014). Given its role in regulating mesenchymal- and epithelial-specific splicing events (
Venables et al., 2013a;
Braeutigam et al., 2014) and the requirement for a mesenchymal-to-epithelial transition (MET) during the early stage of reprogramming (
Li et al., 2010;
Samavarchi-Tehrani et al., 2010), Rbfox2 may also play a role in somatic cell reprogramming.
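Whether a premature termination codon introduced by AS is likely to trigger NMD is often estimated with the "50-nucleotide rule": in mammalian cells, a stop codon lying more than about 50–55 nt upstream of the last exon–exon junction usually elicits NMD. The Python sketch below applies that heuristic to a spliced transcript. Coordinates are 0-based positions in the mature mRNA, distance is measured from the first base of the stop codon (an assumed convention), the helper names are hypothetical, and real NMD prediction has known exceptions.

```python
from typing import Optional, Sequence

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna: str, cds_start: int) -> Optional[int]:
    """0-based position of the first stop codon in frame with the start codon."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def predicts_nmd(stop_pos: int, exon_junctions: Sequence[int], threshold: int = 50) -> bool:
    """50-nt rule: NMD is predicted if the stop codon lies more than `threshold` nt
    upstream of the last exon-exon junction (positions in the spliced mRNA)."""
    if not exon_junctions:
        return False  # no exon-exon junction: the junction-based rule does not apply
    return max(exon_junctions) - stop_pos > threshold
```

For a hypothetical isoform whose skipped exon shifts the reading frame so that the first in-frame stop codon starts at position 100, with exon–exon junctions at positions 300 and 500, `predicts_nmd(100, [300, 500])` returns `True`; a normal stop codon in the last exon, downstream of every junction, returns `False`.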
As stated above, Mbnl1 is another splicing factor that has been implicated in coordinating the inclusion and exclusion of differentiation-specific exons (
Venables et al., 2013b). Mbnl1 represses ESC-specific alternative splicing events in differentiated cells, and its best-characterized target is the ES-specific exon of Foxp1 (Foxp1-ES). The low expression of Mbnl1 in ESCs and its upregulation in somatic cells are consistent with specific roles of Foxp1-ES in ESC maintenance and somatic cell reprogramming (
Gabut et al., 2011;
Han et al., 2013).
Son was identified in an RNAi screen as essential for maintaining hESC identity (
Chia et al., 2010). Son is a DNA- and RNA-binding protein that has been previously implicated in maintaining genome stability and regulating RNA splicing for effective cell cycle progression in HeLa cells (
Ahn et al., 2011). In hESCs, Son regulates the splicing of many transcripts, including those of the pluripotency-related genes Oct4, Prdm14, Med24, and E4f1 (
Lu et al., 2013). Moreover, depletion of Son is detrimental to somatic cell reprogramming. Another Oct4 splicing regulator in hESCs is Tip110, which regulates the Oct4A splicing variant both
in vitro and
in vivo, and is necessary for hESC maintenance (
Liu et al., 2012;
Liu et al., 2013). Oct4A has been shown to be the Oct4 variant responsible for maintenance of pluripotency (
Tsai et al., 2014). Son and Tip110 regulate AS of pluripotency transcripts in a gene-specific manner. In line with such gene-specific regulation, mouse Nanog mRNA, but not Oct4 or Sox2 mRNA, was recently found to be bound by another RBP, Rbm47, although the functional relevance of this interaction remains to be elucidated (
Yeganeh et al., 2013).
Other splicing factors that have been implicated in regulating somatic cell reprogramming are the two RNA-binding proteins U2af1 and Srsf3 (
Ohta et al., 2013b). U2af1 (also known as U2af35) is an auxiliary component of the U2 snRNP and participates in the exon definition step during the splicing process (
Zhang et al., 1992). Mutations in U2af1 have been implicated in splicing defects in different cancer types (
Yoshida et al., 2011;
Imielinski et al., 2012;
Brooks et al., 2014). Srsf3, on the other hand, is not only a general and alternative splicing factor but has also been shown to be essential for RNA polyadenylation, RNA export, and protein translation (
Zahler et al., 1992;
Lou et al., 1998;
Huang et al., 2003;
Bedard et al., 2007). Knockdown of these RBPs reduces the efficiency of mouse embryonic fibroblast (MEF) reprogramming without affecting the phenotype, morphology, or proliferation of MEFs (
Ohta et al., 2013a).
In mouse ESCs, the RBP Fus was identified as an Oct4 interactor by Lufkin and collaborators (
Cheong et al., 2011). Fus is implicated in RNA metabolism, including splicing and miRNA processing (
Ishigaki et al., 2012;
Morlando et al., 2012). Although Fus has been shown to regulate gastrulation in frogs (
Dichmann and Harland, 2012), its function in RNA regulation and the nature of its partnership with Oct4 in ESCs are currently unknown.
Additional RBPs have been shown to function in regulating AS in pluripotency and differentiation (
Yeo et al., 2009;
Han et al., 2013;
Ohta et al., 2013b;
Venables et al., 2013b). Nevertheless, their interplay with the pluripotency network remains elusive. Further studies are needed to determine the relationship between splicing factors and pluripotency factors and their functional contributions to ESC maintenance, somatic cell reprogramming, and differentiation of ESCs.
Alternative polyadenylation
Alternative polyadenylation (APA) of mRNAs is very common (about 70% of mammalian genes produce alternatively polyadenylated mRNAs) and plays an important role in post-transcriptional gene regulation (
Elkon et al., 2013;
Tian and Manley, 2013). APA gives rise to different protein C-termini or 3′ UTRs, which may affect the protein output of gene expression by altering the stability, translation, and/or intracellular localization of mRNAs. APA is highly regulated in ESCs, during somatic cell reprogramming, and during development (
Flavell et al., 2008;
Ji et al., 2009;
Ji and Tian, 2009;
Shepard et al., 2011;
Boutet et al., 2012). In contrast to more differentiated cells, ESCs favor proximal polyadenylation sites and produce shorter 3′ UTRs, which affects their regulation by microRNA and RBP pathways (
Sandberg et al., 2008;
Mueller et al., 2013). However, the mechanisms underlying these APA changes remain poorly understood.
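How proximal versus distal poly(A) site usage changes 3′ UTR length, and which regulatory sites a shorter UTR loses, can be sketched in Python. The sketch assumes that the two most common polyadenylation signal hexamers, AAUAAA and AUUAAA, mark candidate sites and that cleavage occurs a fixed 20 nt downstream of the hexamer (in reality it falls roughly 10–30 nt downstream and depends on additional sequence elements). `CLEAVAGE_OFFSET` and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

PAS_HEXAMERS = ("AAUAAA", "AUUAAA")  # the two most common poly(A) signals
CLEAVAGE_OFFSET = 20  # assumed fixed distance from the end of the hexamer to the cleavage site


def find_all(motif: str, seq: str) -> List[int]:
    """Start positions of (possibly overlapping) exact matches of motif in seq."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def pas_positions(utr: str) -> List[int]:
    return sorted({p for hexamer in PAS_HEXAMERS for p in find_all(hexamer, utr)})


def isoform_utr_lengths(utr: str) -> Optional[Tuple[int, int]]:
    """3' UTR length if the proximal or the distal poly(A) signal is used."""
    sites = pas_positions(utr)
    if not sites:
        return None

    def cleaved_length(pas_start: int) -> int:
        return min(pas_start + len(PAS_HEXAMERS[0]) + CLEAVAGE_OFFSET, len(utr))

    return cleaved_length(sites[0]), cleaved_length(sites[-1])


def motifs_lost_with_proximal(utr: str, motif: str) -> List[int]:
    """Motif occurrences retained in the long isoform but cut off in the short one."""
    lengths = isoform_utr_lengths(utr)
    if lengths is None:
        return []
    short_len, long_len = lengths
    return [p for p in find_all(motif, utr) if short_len < p + len(motif) <= long_len]
```

Passing a miRNA seed-match or RBP motif as `motif` shows which sites an ESC-like preference for proximal poly(A) sites would remove from a transcript, and hence which post-transcriptional controls it would escape.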
A very recent report has identified the first factor involved in pluripotency-specific APA (
Lackford et al., 2014). Fip1 is an essential mRNA 3′ processing factor that was identified in two genome-wide RNAi screens as a potential self-renewal factor in mouse ESCs (
Ding et al., 2009;
Hu et al., 2009). Lackford and colleagues showed that Fip1 depletion leads to partial differentiation of mESCs and inhibition of MEF reprogramming. Fip1-mediated maintenance of ESC-specific APA profiles is required to promote optimal expression of pluripotency and self-renewal factors, linking for the first time APA regulation to cell fate determination.
RBPs with post-transcriptional regulatory function
Gene expression in eukaryotic cells involves several steps of RNA processing that culminate in the translation of RNA into protein. Apart from the rates of transcription, pre-mRNA splicing, and polyadenylation, mRNA levels can be further regulated by stabilization through interaction with RBPs, or by degradation via the NMD pathway or through microRNA interactions. All these steps, together with the regulation of mRNA export to the cytoplasm, determine the steady-state amount of mRNAs and, as a result, the amount of protein available to the cell.
RBPs regulating mRNA stability
In the mouse, the cytoplasmic protein Unr has been shown to play an essential role during development, as homozygous disruption of the
Unr gene is embryonic lethal (
Boussadia et al., 1997). Nevertheless,
Unr-/- ESCs can be obtained, although they show a propensity for spontaneous differentiation into primitive endoderm (
Elatmani et al., 2011). Unr acts downstream of Nanog, contributing to ESC maintenance through the binding and destabilization of the
Gata6 mRNA (
Elatmani et al., 2011).
La is another RNA-binding protein shown to be important for preimplantation development and ESC derivation (
Park et al., 2006). Although La has been implicated in many RNA-related pathways and is well established as protecting tRNAs and small RNAs from degradation (reviewed in
Wolin and Cedervall, 2002), its targets in ESCs have not been determined. By contrast, the La-related protein Larp7 has recently been assigned a role in ESC maintenance (
Dai et al., 2014). Larp7 binds and stabilizes
Lin28 mRNA by recruiting the poly(A) polymerase Star-PAP, safeguarding ESCs from entering a primed-differentiation state.
From a genome-wide screen for genes that are downregulated during differentiation of germline cell-derived pluripotent stem cells (GPSCs) (
Fagoonee et al., 2010), three RBPs, including Esrp1, were identified. Esrp1, also known as Rbm35a, is a tumor suppressor for colorectal cancer that acts through binding to the 5′ UTRs of mRNAs and inhibiting their translation (
Ivanov et al., 2007). More recently, Esrp1 has been shown to act as a negative regulator of self-renewal, contributing to the balance between self-renewal and commitment to differentiation (
Fagoonee et al., 2013). Through its binding to
Oct4 and
Sox2 mRNAs, Esrp1 fine-tunes their polysomal loading and thereby their translation efficiency, which explains why knockdown of
Esrp1 results in increased self-renewal and enhanced MEF reprogramming.
Dazl is an RBP involved in mRNA stability in mESCs (
Xu et al., 2013). Dazl is expressed in mouse germ and embryonic stem cells, but has opposite functions in these two cell types. Whereas it functions as a translational stimulator in germ cells (
Tsui et al., 2000), its role in ESCs is mainly repressive (
Xu et al., 2013). Through binding to the 3′ UTR of
Oct4,
Sox2, and
Mvh mRNAs, Dazl represses translation and decreases the steady-state levels of these transcripts in mESCs, balancing stem cell maintenance and differentiation.
Pum1 is another RBP that inhibits translation and promotes degradation of its target mRNAs through binding at a highly conserved eight-nucleotide motif located in their 3′ UTRs (
Galgano et al., 2008). The naive pluripotency factor transcripts for
Tfcp2l1,
Sox2,
Tbx3, and
Esrrb are among the targets of Pum1 in mESCs (
Leeb et al., 2014). Dazl, Pum1, and Esrp1 share a function in regulating the translation of pluripotency-related transcripts such as
Oct4 and
Sox2, and they may therefore be regulated by the same pathways in ESCs and/or during differentiation. Further studies are needed to understand how these post-transcriptional regulators are controlled by the pluripotency network.
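Scanning a 3′ UTR for candidate Pum1 sites can be sketched with a regular expression. The sketch assumes the consensus UGUAHAUA (IUPAC H = A, C, or U), the eight-nucleotide Pumilio response element commonly reported for human PUM proteins; a motif match alone does not demonstrate binding, and `find_pum_sites` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed consensus Pumilio response element UGUAHAUA (IUPAC H = A, C or U).
# The lookahead lets overlapping sites be reported.
PRE_PATTERN = re.compile(r"(?=(UGUA[ACU]AUA))")


def find_pum_sites(utr: str) -> List[Tuple[int, str]]:
    """Return (0-based position, matched sequence) for each candidate site."""
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(), m.group(1)) for m in PRE_PATTERN.finditer(seq)]
```

Running such a scan over the 3′ UTRs of transcripts like Tfcp2l1, Sox2, Tbx3, and Esrrb would list candidate sites to be confirmed by CLIP-type data or reporter assays.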
RBPs involved in microRNA regulation
MicroRNAs (miRs) are small (20–25 nt) non-coding RNAs that promote the specific destabilization and degradation of their target mRNAs by base-pairing to sites in their 5′ or, more commonly, 3′ UTRs (
Bartel, 2009;
Lytle et al., 2007). Several miR families have been implicated in the control of ESC maintenance, differentiation, and somatic cell reprogramming (
Marson et al., 2008;
Sinkkonen et al., 2008;
Tay et al., 2008;
Judson et al., 2009;
Melton et al., 2010;
Tiscornia and Izpisúa Belmonte, 2010;
Anokye-Danso et al., 2011). miR biogenesis consists of successive maturation steps starting from the primary transcript (pri-miRNA), each of which can be regulated (
Kim et al., 2008). First, pri-miRNAs are processed by the dsRNA-binding protein Dgcr8 (Pasha) and the RNase III-type enzyme Drosha, giving rise to precursor miRNAs (pre-miRNAs). The pre-miRNAs are then cleaved by Dicer, another RNase III-type enzyme, to give rise to the functional mature miRNAs (see Fig. 1) (
Krol et al., 2010). Dicer-deficient mouse embryos die around embryonic day 7.5 with severe morphological abnormalities (
Bernstein et al., 2003). Nevertheless,
Dicer and
Dgcr8 null mESCs can be maintained, albeit with defects in proliferation and differentiation (
Kanellopoulou et al., 2005;
Murchison et al., 2005;
Wang et al., 2007). This phenotype can be explained by the failure of miRNA maturation in the absence of either RBP, which leads to insufficient downregulation, and thus accumulation, of transcripts involved in cell cycle control and pluripotency regulation. The proliferation and differentiation defects can be rescued by overexpression of the mature pluripotency-associated
miR-290 and
miR-302 and differentiation-promoting
let-7 miRNA, respectively (
Hutvágner et al., 2001;
Wang et al., 2007;
Kim and Choi, 2012).
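Target recognition by a miRNA relies mainly on pairing of its seed region (nucleotides 2–8) with the target. The Python sketch below finds canonical seed-matched sites (8mer, 7mer-m8, and 7mer-A1) for a miRNA in a 3′ UTR sequence, using let-7a as the example. It ignores non-canonical sites, site context, and conservation, which dedicated target-prediction tools take into account, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    return rna.translate(_COMPLEMENT)[::-1]


def find_all(motif: str, seq: str) -> List[int]:
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def seed_sites(mirna: str, utr: str) -> List[Tuple[int, str]]:
    """Canonical seed-matched sites of a miRNA (5'->3') in a target sequence (5'->3')."""
    site_7m8 = revcomp(mirna[1:8])        # pairs with miRNA nucleotides 2-8
    site_8mer = site_7m8 + "A"            # ... plus an A opposite nucleotide 1
    site_7a1 = revcomp(mirna[1:7]) + "A"  # pairs with nucleotides 2-7, plus the A
    sites = {p: "8mer" for p in find_all(site_8mer, utr)}
    for p in find_all(site_7m8, utr):
        sites.setdefault(p, "7mer-m8")
    for p in find_all(site_7a1, utr):
        # A 7mer-A1 match starting one base after an 8mer is part of that 8mer.
        if sites.get(p - 1) != "8mer":
            sites.setdefault(p, "7mer-A1")
    return sorted(sites.items())


LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a-5p (identical in human and mouse)
```

For let-7a, the 7mer-m8 site is CUACCUC; a 3′ UTR carrying CUACCUCA is scored as a single 8mer rather than as overlapping 7mer sites.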
Mature miRNAs and other silencing RNAs exert their function in the RNA-silencing pathway with the help of RBPs such as the Argonaute family of proteins (
Hutvagner and Simard, 2008). The Argonaute family can be divided into two subfamilies: Piwi and Ago. In humans, the latter consists of Ago1, Ago2, Ago3, and Ago4, which are widely expressed and associate with miRNAs and siRNAs. Conversely, members of the Piwi subfamily are mainly restricted to the germline, where they associate with piwi-interacting RNAs (piRNAs).
Ago2-/- embryos arrest shortly after implantation (
Morita et al., 2007) and
Ago2-/- mESCs have defects in self-renewal and differentiation due to defective miRNA function (
Shekar et al., 2011). Although it was previously thought that miRNAs and siRNAs were needed for guiding Ago proteins to their target mRNAs, Leung and colleagues showed by HITS-CLIP that in mESCs, Ago2 is also able to bind its targets in the absence of miRNAs (
Leung et al., 2011).
Lin28 is an RBP that regulates the precursors of the
let-7 family by promoting their degradation through uridylation (
Heo et al., 2008;
Heo et al., 2009;
Hagan et al., 2009;
Thornton et al., 2012). By reducing mature
let-7 levels, Lin28 contributes to maintenance of pluripotency and prevents differentiation of ESCs (
Schulman et al., 2005;
Melton et al., 2010). Lin28 has also been shown to promote reprogramming of human somatic cells (
Yu et al., 2007). Besides the regulatory effect of Lin28 on
let-7 biogenesis, Lin28 also has a
let-7-independent function (
Balzer et al., 2010). RIP and HITS-CLIP studies in human and mouse ESCs have shown that Lin28 binds exons and 5′ and 3′ UTRs of mRNAs and thereby regulates their expression. As a result, Lin28 not only controls miRNA maturation but also regulates many other genes encoding cell cycle regulators, secretory proteins, other RBPs (including Lin28 itself), and even the pluripotency factor Oct4, by affecting their translational efficiency (
Qiu et al., 2010;
Peng et al., 2011;
Cho et al., 2012;
Wilbert et al., 2012;
Hafner et al., 2013).
In the cytoplasm of hESCs, Lin28 interacts with L1td1 in P-bodies, a subcellular compartment where RNAs are directed either for translation or for silencing through miRNAs (
Närvä et al., 2012). L1td1 (or Ecat11) is another RBP whose expression is rapidly downregulated upon differentiation (
Wong et al., 2011). L1td1 is required for the maintenance of self-renewal in human ESCs, in contrast to mouse ESCs, where it is dispensable (
Iwabuchi et al., 2011;
Närvä et al., 2012). Although its exact function, alone or together with Lin28, remains to be determined, the association of L1td1 with proteins in P-bodies suggests a potential role in translational regulation.
The polypyrimidine-tract binding protein (PTB) is a known splicing regulator that has recently been shown to also regulate microRNA functions (
Xue et al., 2013). PTB can either activate or repress gene expression by binding to the 3′ UTR of its targets, together with or in proximity to Ago2. PTB influences the secondary structure of 3′ UTRs to enhance or inhibit miRNA recognition, and it can also compete with Ago2 for binding to their common targets. PTB is naturally downregulated during neuronal differentiation. Mimicking this, PTB knockdown alone is sufficient to transdifferentiate several cell types into neuronal-like cells and functional neurons. It does so by affecting the stability of transcripts encoding key neuronal genes and components of the REST complex (
Xu et al., 2013). This was the first study to show that an RBP can cause transdifferentiation on its own, highlighting the importance of RBP-mediated regulation in controlling cell identity.
RBPs regulating nuclear retention and export of RNAs
Once RNAs are transcribed and processed, many of them must be exported to the cytoplasm to exert their functions. In ESCs and during differentiation, both nuclear retention and export are mediated and regulated by RBPs.
Paraspeckle proteins (PSPs) are widely expressed RBPs that are located in subnuclear structures known as paraspeckles (
Fox et al., 2002;
Fox and Lamond, 2010). Paraspeckles are nuclear domains formed by RBPs organized around the scaffolding lncRNA
Neat1 (
Clemson et al., 2009;
Souquere et al., 2010). Human ESCs do not possess paraspeckles. However, when they are induced to differentiate into trophectoderm (TE), increased
Neat1 expression leads to the formation of paraspeckles, which bind and retain transcripts containing inverted repeated Alu elements (IRAlus) in their 3′ UTRs (
Prasanth et al., 2005;
Chen and Carmichael, 2009). One such IRAlus-containing transcript is the pluripotency-associated
Lin28 mRNA, which is efficiently exported to the cytoplasm in hESCs but is retained in the nucleus upon TE differentiation, resulting in reduced Lin28 protein levels. In mice,
Neat1 is expressed in ESCs and is further upregulated during differentiation (
Bond and Fox, 2009;
Sunwoo et al., 2009). Although paraspeckles have been shown to be dispensable for development (
Nakagawa et al., 2011), some PSPs have been implicated in pluripotency and differentiation through transcriptional, co-transcriptional and export regulation (
Hata et al., 2008;
Park et al., 2013). Determining the repertoire of PSP-interacting RNAs in ESCs will be necessary to clarify the function of these proteins in pluripotency and differentiation.
The THO complex is a highly conserved protein complex that couples RNA splicing and export from the nucleus to the cytoplasm (
Katahira, 2012). Thoc2 and Thoc5 are RBPs within the THO complex that were identified as potential self-renewal regulators in mESCs in genome-wide RNAi screens (
Ivanova et al., 2006;
Ding et al., 2009;
Subramanian et al., 2009;
Chia et al., 2010). THO knockout mice are embryonic lethal (
Wang et al., 2006) and
Thoc2 or
Thoc5 knockdown in mESCs leads to loss of self-renewal (
Wang et al., 2013). The importance of THO for self-renewal lies in its function in exporting pluripotency transcripts such as
Nanog,
Sox2, and
Klf4 from the nucleus to the cytoplasm. In line with this finding,
Thoc2 or
Thoc5 knockdown abrogates somatic cell reprogramming and inner cell mass specification, highlighting the importance of RBP-controlled nuclear export regulation in pluripotency and reprogramming (
Saunders and Wang, 2014).
RBPs with functions in epigenetic regulation
RBPs can potentially bind any RNA species. Whereas RBPs involved in co-transcriptional and post-transcriptional regulation mainly associate with mRNAs and siRNAs/miRNAs, those involved in epigenetic regulation preferentially interact with non-coding and long non-coding RNAs (ncRNAs/lncRNAs). LncRNAs are non-coding RNAs longer than 200 nucleotides with diverse functions (
Wang and Chang, 2011). LncRNAs can function
in cis (e.g.
Kcnq1ot1) or
in trans (e.g.
HOTAIR) relative to the locus from which they are transcribed (
Wang and Chang, 2011;
Moran et al., 2012), for example as scaffolds that bring together remodeling complexes acting on the same chromatin domain (
Wang and Chang, 2011). Other ncRNAs work
in cis as decoys to prevent epigenetic factors from binding to the loci from which they arise. Several lncRNAs are transcriptionally regulated in ESCs, and roles for some of them in maintaining the pluripotent state have been reported (
Dinger et al., 2008;
Sheik Mohamed et al., 2010;
Guttman et al., 2011). RBP interaction with lncRNAs can result in both chromatin activation and repression, the latter being the most extensively studied (
Wang and Chang, 2011).
Epigenetic repression
The polycomb repressive complex 2 (PRC2) catalyzes the di- and trimethylation of H3K27 to mediate epigenetic silencing (
Cao et al., 2002;
Kuzmichev et al., 2002;
Kirmizis et al., 2004). PRC2 is dispensable for maintenance of ESC pluripotency but is necessary for ESC differentiation and embryonic development (
Pasini et al., 2007;
Chamberlain et al., 2008). Polycomb proteins are chromatin modifiers that lack DNA-binding domains and therefore depend on accessory proteins and ncRNAs to reach their targets (
Margueron and Reinberg, 2011). Both the catalytic subunit of PRC2, Ezh2, and the core subunit Suz12 have been shown to interact with RNAs. The first study to identify PRC2-interacting RNAs by RIP-seq found that between 10% and 25% of the ESC transcriptome was pulled down by Ezh2 immunoprecipitation (
Zhao et al., 2010). The large number of transcripts pulled down raised concerns about the specificity of these interactions. Nevertheless, recent studies support this finding and propose a new model. In this model, PRC2 binds not only to lncRNAs that mediate silencing but also to nascent RNAs at actively transcribed loci. There, these RNAs would act as decoys for PRC2 binding, keeping the loci in a transcriptionally active state (
Davidovich et al., 2013). A similar phenomenon has been described for the DNA methyltransferase Dnmt1, supporting the idea that RNA-mediated decoys could be a more general mechanism of co-transcriptional regulation than previously expected (
Di Ruscio et al., 2013).
X chromosome inactivation (XCI) is a dosage-compensation process that occurs in female cells during early development and can also be observed
in vitro upon female ESC differentiation (
Augui et al., 2011;
Schulz and Heard, 2013). Moreover, it has recently been shown that keeping both X chromosomes active is a barrier to ESC differentiation (
Schulz et al., 2014). XCI requires the recruitment of PRC2, mediated by the lncRNA
Xist, to the chromosome that will be silenced (
Zhao et al., 2008). Ezh2 and Suz12 are thought to be responsible for mediating
Xist-PRC2 interaction, although they do not possess canonical RNA-binding domains (
Zhao et al., 2008;
Kaneko et al., 2010;
Kanhere et al., 2010). However, a direct interaction between PRC2 and
Xist has recently been disputed on the basis of super-resolution microscopy (
Cerase et al., 2014). Further experiments are needed to distinguish between the proposed models and to reconcile these seemingly contradictory results.
Jarid2 is a PRC2 cofactor shown to be critical for PRC2 binding to its target genes and thereby for the regulation of ESC differentiation (
Peng et al., 2009;
Pasini et al., 2010;
Shen et al., 2009). Two recent studies have identified Jarid2 as another
Xist-interacting RBP that may promote PRC2 recruitment during XCI (
da Rocha et al., 2014;
Kaneko et al., 2014).
Ezh2 and Suz12 also bind other lncRNAs with critical functions during embryonic development, such as
HOTAIR and
Kcnq1ot1, which mediate
HOXD silencing and genomic imprinting, respectively (
Rinn et al., 2007;
Kaneko et al., 2010;
Tsai et al., 2010) (for a more detailed review, see
Brockdorff, 2013).
HOTAIR is also bound by the histone demethylase Lsd1, another epigenetic modifier with important functions in ESC differentiation through its repressive H3K4 demethylase activity (
Adamo et al., 2011;
Whyte et al., 2012).
HOTAIR may act
in trans as a bridge between Lsd1 and PRC2, enabling them to cooperate in gene repression (
Tsai et al., 2010).
HOTAIR expression is induced during differentiation and contributes to self-renewal of adult stem cells. It is also required for epithelial-to-mesenchymal transition (EMT) and metastasis in cancer cell lines (
Gupta et al., 2010;
Pádua Alves et al., 2013).
Epigenetic activation
Potential roles of lncRNA-binding RBPs in gene activation are poorly understood. Wdr5 was the first RBP shown to link lncRNAs to gene activation in ESCs. Wdr5 is a core subunit of the MLL1-4 and SET1A/1B complexes, which deposit histone H3 lysine 4 (H3K4) methylation, a mark of active gene expression. Binding to lncRNAs recruits Wdr5 to several loci, where it facilitates transcription (
Wang et al., 2011;
Gomez et al., 2013). In human foreskin fibroblasts, Wdr5 binding to
HOTTIP lncRNA recruits the MLL complexes to the 5′
HOXA locus, generating broad domains of H3K4me3 and activating transcription (
Wang et al., 2011). The mutual regulation of
HOTTIP and Wdr5 generates a positive feedback loop that maintains the activity of the locus. Wdr5 contributes to ESC self-renewal by maintaining active chromatin and plays an important role in the generation of induced pluripotent stem cells from differentiated somatic cells (
Ang et al., 2011). A recent high-throughput study has detected over a thousand Wdr5-bound RNA molecules in mESCs, including mRNAs, lncRNAs, pri-miRNAs, and small nucleolar RNAs (
Yang et al., 2014). Some of the lncRNAs bound by Wdr5 were themselves known to be required for ESC pluripotency and differentiation (
Guttman et al., 2011), which may partly explain Wdr5’s function in ESC maintenance. Thus, Wdr5 links lncRNAs to epigenetic remodelers to regulate transcription in ESCs.
Conclusions and perspectives
Even though RBPs have been extensively studied for decades, only recently have they been implicated in ESC maintenance, differentiation, and somatic cell reprogramming. The development of specific techniques for characterizing RNA-protein interactions has led to the identification and characterization of an increasing number of RBPs and their binding targets. In ESCs, more than five hundred RBPs, almost half of them with previously uncharacterized functions, have been described (
Kwon et al., 2013). The major challenge is to distinguish
bona fide RBPs and target RNAs from false positives arising from indirect protein associations and the “sticky” nature of RNAs. UV crosslinking-based methods such as CLIP and PAR-CLIP should help overcome these problems, because they capture direct RNA-protein contacts and thus enrich for specific RBP targets. The presence of a canonical RBD also supports a potential RBP function for a given protein, although such a domain is not always required for RNA-protein interaction. Domain mapping and
in vitro binding assays (such as RNA electrophoretic mobility shift assays, REMSA) should help to clarify whether non-canonical RBPs bind RNA directly or interact with it indirectly.
After pre-mRNA transcription, RBPs play important roles in every subsequent step of RNA maturation, transport, and stabilization before the mRNA is finally translated into protein. RBPs were historically believed to play merely housekeeping roles in RNA metabolism, and indeed most studies on RBPs focused on their function in constitutive splicing or translational regulation. However, a growing number of studies now demonstrate that RBPs not only play constitutive roles but are also important for maintaining a specific cell state or identity. A good example is the heterogeneous nuclear ribonucleoproteins (hnRNPs). These form a diverse family of RBPs with well-known functions in preventing pre-mRNA from folding into secondary structures, in mRNA splicing, and in mRNA export to the cytoplasm (
Dreyfuss et al., 1993). More recently, however, some hnRNPs have been shown to be essential for early mouse development and stem cell maintenance. Examples include hnRNPI/PTB, discussed earlier in this review (
Shibayama et al., 2009), hnRNPA2 (
Ji and Tulin, 2012), and hnRNPU/AUF1 (
Choi et al., 2011). It is now evident that RBPs do not merely act as adaptors between RNAs and other proteins but are actively involved in maintaining a defined cell state.
Every step of RNA maturation can be subject to RBP regulation, which provides a fast way to respond to external signaling cues.
In vivo, pluripotent cells exist as a transient population that arises from the totipotent morula and gives rise to every cell type of the embryo.
In vitro, ESCs can be maintained indefinitely under defined culture conditions, albeit in a metastable transcriptional state that results in heterogeneous populations of cells coexisting in a dish (
Chambers et al., 2007;
Hayashi et al., 2008;
Toyooka et al., 2008;
Tanaka, 2009). Both
in vivo and
in vitro, the transition between pluripotent and primed states requires fast-responding mechanisms. RBPs control co-transcriptional processing, nuclear export, mRNA stabilization or destabilization, and microRNA-mediated degradation. This control is likely important for setting the final steady-state levels of proteins needed for differentiation or reprogramming during these transitory states. As discussed earlier in this review, Oct4 levels must be tightly regulated for ESC maintenance and commitment (
Niwa et al., 2000;
Karwacki-Neisius et al., 2013;
Radzisheuskaya et al., 2013), and they are fine-tuned post-transcriptionally by many RBPs. These control, for example, Oct4 alternative splicing (Son, Fus, and Tip110) or translational efficiency (Esrp1, Dazl). This regulation is transcript-specific, however, as other core pluripotency factors such as Esrrb, Sox2, and Nanog are regulated by different RBPs. It remains to be determined how these RBPs are incorporated into pluripotency networks and how those networks interact with the transcriptional regulatory machinery.
RBPs use RNAs as platforms to recruit protein complexes that modify either the RNA itself or nearby proteins, thereby serving as adaptors between RNAs and proteins. Many RBPs with potential roles in ESC maintenance, differentiation, and/or somatic cell reprogramming have already been identified (summarized in Fig. 1 and Table 3). The high prevalence of RBPs in the ESC proteome (
Kwon et al., 2013) suggests that many more remain to be discovered. Recent technical improvements in RNA immunoprecipitation protocols make it feasible to identify new RBPs involved in ESC maintenance or differentiation in several ways. First, RNA-IP coupled to mass spectrometry can identify, for subsequent validation, the proteins bound to specific RNAs, such as mRNAs or ncRNAs with important roles in ESC maintenance or differentiation. Second, genome-wide datasets of proteins differentially expressed during reprogramming or differentiation are valuable sources for selecting and characterizing RBPs with potentially important functions in stem cells. Furthermore, published interactomes of the core pluripotency protein network in ESCs include RBPs that could play an important role in coupling pluripotency maintenance with RNA metabolism. Ultimately, integrating RBP function into the broader picture of pluripotency and differentiation networks will lead to a much better understanding of the molecular mechanisms underlying pluripotency and reprogramming, as well as the directed differentiation of pluripotent cells.
Higher Education Press and Springer-Verlag Berlin Heidelberg