1 INTRODUCTION
A variety of circular RNAs can be produced via distinct mechanisms: direct ligation of single-stranded RNA to form circular RNA genomes, derivation from processed rRNAs as intermediates, processing from self-splicing introns as unstable circular transcripts, and so on (reviewed in [1–3]). Among them, at least two types of circular RNAs are processed from nuclear (m)RNA precursors through the spliceosomal pathway [4], including circular intronic RNAs (ciRNAs) [5] from excised introns and circular RNAs (circRNAs) [6–10] from back-spliced exon(s) (reviewed in [1, 4]). By taking advantage of non-polyadenylated transcriptome enrichment [11–13] and specific computational approaches that identify reads mapped to back-splice junction sites in a reversed genomic orientation (Figure 1A) [10, 15, 16] (reviewed in [1, 2, 17]), a large number of circRNAs have recently been rediscovered from thousands of gene loci in various cell lines/tissues and across different species [6–8, 10, 18–21]. Increasing lines of evidence suggest that circRNAs could play important roles in the regulation of gene expression through different mechanisms of action (reviewed in [1]). These results thus expand our understanding of the complexity and diversity of eukaryotic circular RNAs.
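The junction-read criterion mentioned above (a read whose two pieces map in reversed genomic order) can be illustrated with a minimal sketch. It is not the algorithm of any cited pipeline: it assumes a read has already been split into two aligned pieces given in transcript 5′-to-3′ order, and the `Segment` record and `is_back_splice` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical, simplified record)."""
    chrom: str
    strand: str      # strand of the transcript: '+' or '-'
    ref_start: int   # 0-based genomic start of the aligned piece
    ref_end: int     # genomic end, exclusive


def is_back_splice(five_prime: Segment, three_prime: Segment) -> bool:
    """Return True if the two pieces map in reversed genomic order.

    `five_prime` is the piece nearer the 5' end of the transcript-sense read.
    In canonical splicing on the '+' strand it lies upstream of `three_prime`;
    in back-splicing the order along the genome is reversed.
    """
    if five_prime.chrom != three_prime.chrom or five_prime.strand != three_prime.strand:
        return False  # assumption: fusion or trans-spliced candidates are ignored
    if five_prime.strand == "+":
        return three_prime.ref_end <= five_prime.ref_start
    # On the '-' strand, transcript order runs towards lower coordinates.
    return five_prime.ref_end <= three_prime.ref_start


# Downstream exon end joined to upstream exon start: a back-splice candidate.
print(is_back_splice(Segment("chr1", "+", 300, 350), Segment("chr1", "+", 100, 150)))
```

Real pipelines additionally check splice-site motifs, annotated exon boundaries and alignment quality; those filters are deliberately left out here.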
It has been demonstrated that the biogenesis of circRNAs by back-splicing requires the canonical spliceosomal machinery [22, 23]. Unlike canonical splicing, back-splicing ligates a downstream 5′ splice site to an upstream 3′ splice site in a reversed order, a reaction believed to be catalyzed inefficiently by the spliceosome (reviewed in [1, 2]). Recent studies aimed at uncovering the mechanisms that regulate circRNA biogenesis have shown that both cis-elements (mainly flanking intronic sequences) and trans-factors (mainly RNA binding proteins, RBPs) can promote back-splicing (reviewed in [1, 2]). Specifically, genome-wide analyses have revealed a positive correlation between across-intron RNA pairing and circRNA formation [6, 10]. Furthermore, efficient back-splicing is linked to fast RNA polymerase II elongation [24], and back-splicing largely occurs post-transcriptionally with low efficiency [24, 25]. Here, we highlight recent progress on the regulation of circRNA biogenesis, focusing on our current understanding of how cis complementary sequences, especially Alus in human, regulate circRNA formation.
2 GENERAL REGULATORY ROLE OF COMPLEMENTARY SEQUENCES ON circRNA FORMATION
Back-splicing is presumably catalyzed by the same spliceosomal machinery as canonical splicing [22, 23] (reviewed in [1]); accordingly, back-splicing has been found to compete with canonical splicing [10, 22] (reviewed in [2]). Although back-splicing is generally unfavorable, circRNA biogenesis can be significantly facilitated by orientation-opposite complementary sequences that juxtapose the introns flanking the circularized exon(s) [6, 10, 25]. These orientation-opposite sequences can be either repetitive sequences, such as Alu in human, or non-repetitive but complementary sequences [10, 18]. In theory, RNA pairing formed by orientation-opposite complementary sequences, as short as 30 to 40 nucleotides in length [25], can bring the downstream donor and upstream acceptor sites close together to promote spliceosome assembly for back-splicing (Figure 1B, bottom) (reviewed in [2]); however, detailed biochemical evidence is still missing. The abundant repetitive elements, especially Alu sequences in human, contribute the most to enhancing circRNA biogenesis [10, 14, 18]. Over one million copies of Alu sequences have been found in the human genome, and about half of them are located in intronic regions [10, 26]. Owing to their sequence similarity, orientation-opposite Alus within a certain distance have the potential to form inverted repeated Alu sequences (IRAlus) [27], and when located across flanking introns, IRAlus could promote circRNA production [10].
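To make the idea of orientation-opposite complementary sequences concrete, the following sketch looks for the longest perfectly complementary stretch between two flanking introns. It is a simplified illustration under stated assumptions: only perfect Watson-Crick matches are counted (no mismatches, bulges or G-U wobble pairs), the function names are hypothetical, and the default threshold of 30 nucleotides is taken from the lower bound quoted above [25].

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def longest_pairing_stretch(upstream_intron: str, downstream_intron: str) -> int:
    """Length of the longest perfectly complementary stretch between two introns.

    A stretch in the upstream intron can pair with the downstream intron if it
    also occurs in the reverse complement of the downstream intron, so this is
    a longest-common-substring search (simple O(n*m) dynamic programming).
    """
    a = upstream_intron.upper()
    b = reverse_complement(downstream_intron)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and a[i - 1] != "N":
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def may_pair(upstream_intron: str, downstream_intron: str, min_len: int = 30) -> bool:
    """Assumed rule of thumb: a perfect stretch of at least `min_len` nt can pair."""
    return longest_pairing_stretch(upstream_intron, downstream_intron) >= min_len
```

For whole introns the quadratic search is slow; a practical analysis would more likely align annotated repeats or use a dedicated sequence aligner.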
Computational evaluation of the pairing capacity of orientation-opposite complementary sequences across circRNA-flanking introns shows that SINEs (short interspersed nuclear repetitive DNA elements), especially Alu elements in human, contribute the most to circRNA formation [18]. More specifically, among all types of complementary sequences, the strongest RNA pairing capacity comes from IRAlus in 93.3% of human circRNA-flanking introns, and only in a very small portion does it come from other non-Alu repetitive sequences or non-repetitive but complementary sequences [18]. Despite their higher pairing capacity in both human and mouse, non-repetitive but complementary sequences are only sparsely present in circRNA-flanking introns, indicating at best a limited role in circRNA formation [18].
In addition to genome-wide annotation showing the association of circRNA expression with complementary sequences [6, 10], direct experimental evidence also demonstrates that orientation-opposite complementary sequences can efficiently enhance circRNA formation. First, in expression vectors, IRAlus mimicking endogenous conditions were shown to be required for circRNA expression [10]. Short complementary sequences (30 to 40 nucleotides in length) were capable of promoting circRNA biogenesis from expression vectors [25], but stronger pairing capacity from longer sequences could considerably enhance circRNA production [10]. Second, under endogenous conditions, disruption of intronic RNA pairing by CRISPR-Cas9-mediated genome editing depleted circRNA expression, while the linear mRNA counterpart remained largely unchanged [24]. Finally, distal intronic sequences from different genes could also be juxtaposed after gene fusion to form RNA pairing that flanks the breakpoint of fusion genes, leading to the formation of aberrant fusion-circRNAs in cancer cells [28].
In addition to cis intronic complementary sequences, several protein factors have been reported to regulate circRNA biogenesis, either positively [22, 29] or negatively [19, 20]. It should be noted that cis-elements and trans-factors can also function in a combinatorial manner to control circRNA biogenesis [30]. Since hundreds of RBPs have been recently identified [31, 32], it will be of great interest to identify other trans-factors and to determine how they act in combination with cis-elements to regulate circRNA biogenesis.
3 COMPETITION OF BACK-SPLICING AND CANONICAL SPLICING
Back-splicing can compete with canonical splicing [10, 22]. Orientation-opposite complementary sequences across the two separate introns that flank circle-forming exons are effective, but may not be sufficient, in boosting back-splicing for circRNA formation [10] (reviewed in [2]). In fact, similar RNA pairing can also form within individual introns of the same gene locus; this competes with RNA pairing across flanking introns, leading to a choice between canonical splicing and back-splicing (Figure 1B, top) [10]. When additional RNA pairing was introduced within an individual intron in an expression vector, canonical splicing was observed to outcompete back-splicing, reducing circRNA biogenesis [16]. Recently, a quantitative computational method was developed to evaluate the RNA pairing capacity of complementary sequences across given circRNA-flanking introns; it takes into account many factors, including the strength of sequence pairing and competition with other similar complementary sequences [18].
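The competition between across-intron and within-intron pairing can be illustrated with a toy counting heuristic. This is not the quantitative method of [18], which also weighs pairing strength; it only counts inverted (opposite-strand) repeat pairs, and the `Repeat` record and `count_inverted_pairs` function are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Repeat:
    """A repeat element (e.g. an Alu) in an intron flanking the circle-forming exon(s)."""
    intron: str   # "upstream" or "downstream" of the circle-forming exon(s)
    strand: str   # '+' or '-'


def count_inverted_pairs(repeats: Iterable[Repeat]) -> Tuple[int, int]:
    """Count opposite-strand repeat pairs across introns and within one intron.

    Across-intron pairs are the ones expected to favour back-splicing;
    within-intron pairs are the competitors expected to favour canonical splicing.
    """
    across = within = 0
    for a, b in combinations(list(repeats), 2):
        if a.strand == b.strand:
            continue  # same orientation: cannot form an inverted repeat
        if a.intron == b.intron:
            within += 1
        else:
            across += 1
    return across, within


repeats = [Repeat("upstream", "+"), Repeat("downstream", "-"), Repeat("upstream", "-")]
print(count_inverted_pairs(repeats))  # (across, within) pair counts
```

A fuller model would weight each pair by its pairing strength and by the distance between the two repeats, which this count ignores.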
4 COMPETITION OF ALTERNATIVE INVERTED REPEATED Alu PAIRING ON ALTERNATIVE BACK-SPLICING SELECTION IN HUMAN
Multiple circRNAs can be generated from a single gene locus, through either alternative back-splicing or alternative splicing [10, 16]. Alternative back-splice site selection is correlated with the existence of multiple RNA pairs that bracket different circle-forming exons [16]. Specifically, across-intron RNA pairing that flanks the proximal back-splice sites leads to proximal back-splice site selection, whereas across-intron RNA pairing that flanks the distal back-splice sites leads to distal back-splice site selection (Figure 1C) [16]. Alternative back-splicing is more common in human than in other examined non-primate species, largely due to the abundance of primate-specific Alu sequences in the human genome [14, 18]. Genome-wide analysis suggested that over 70% of highly expressed circRNAs with alternative back-splicing contained alternative RNA pairing across both proximal and distal back-splice sites in human [16]. Importantly, the competition of alternative RNA pairing leading to alternative back-splice site selection could be recapitulated in expression vectors [16].
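Alternative back-splice site usage within one locus can be recognized from a list of detected back-splice junctions by grouping junctions that share one site but differ in the other. The sketch below assumes each junction is given as a (donor, acceptor) pair of genomic coordinates from a single gene; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def alternative_back_splice_sites(
    junctions: Iterable[Tuple[int, int]],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Group back-splice junctions of one locus that share a splice site.

    Each junction is a (donor, acceptor) coordinate pair. Returns two mappings:
    acceptor -> alternative donors, and donor -> alternative acceptors, keeping
    only sites that are joined to more than one partner.
    """
    donors_by_acceptor = defaultdict(set)
    acceptors_by_donor = defaultdict(set)
    for donor, acceptor in junctions:
        donors_by_acceptor[acceptor].add(donor)
        acceptors_by_donor[donor].add(acceptor)
    alt_donors = {a: sorted(d) for a, d in donors_by_acceptor.items() if len(d) > 1}
    alt_acceptors = {d: sorted(a) for d, a in acceptors_by_donor.items() if len(a) > 1}
    return alt_donors, alt_acceptors


# Three junctions: acceptor 100 is used with two donors, donor 800 with two acceptors.
alt_donors, alt_acceptors = alternative_back_splice_sites([(500, 100), (800, 100), (800, 300)])
print(alt_donors, alt_acceptors)
```

Whether the proximal or the distal partner is preferred would then be judged from the junction read counts, which this sketch does not model.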
Although they share the same genomic background, i.e., the same Alu sequences in introns, the expression of alternatively back-spliced circRNAs varies widely among the examined human cell lines/tissues [16]. This suggests additional layers of regulation of alternative back-splicing. Very recently, using genome-wide siRNA screening and an efficient circRNA expression reporter, we identified a series of protein factors as key regulators of circRNA biogenesis [33], including those involved in immune responses. Some protein factors regulate circRNA biogenesis in a combinatorial manner, notably by associating with intronic Alu elements [33, 34]. Interestingly, a number of previously unannotated exons were identified in circRNAs, but not in their linear RNA counterparts [16]. This suggests that the canonical spliceosomal machinery might recognize and catalyze back-splicing differently from canonical splicing. Nevertheless, the finding of widespread alternative back-splicing increases our knowledge of circRNA biogenesis and its regulation.
5 CONCLUDING REMARKS
By identifying reads mapped to back-splice junction sites, circRNAs have been detected genome-wide in various cells/tissues, and their abundance is estimated from back-splice junction reads. Yet no direct bioinformatic comparison of expression has been established between circRNAs and their corresponding linear RNAs, because they share most of their genomic sequences and are quantified by different strategies. On the one hand, linear RNAs are quantified by normalized reads aligned to full-length genomic regions; on the other hand, circRNAs are quantified solely by reads spanning back-splice junction sites (Figure 1A). Nevertheless, biochemical evidence suggests that the expression of circRNAs is roughly 5%–10% of that of their linear RNAs, based on qPCR analysis of selected cases [8]. In the future, it will be of great interest to develop new pipeline(s) to directly compare the abundance of circRNAs and their corresponding linear RNAs genome-wide.
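One simple way to put the two RNA species on a common scale is to compare junction reads only: back-splice junction reads against canonical junction reads that use the same splice site. The sketch below illustrates that arithmetic; it is an assumption made for illustration, not an established pipeline from the cited work, and the function name and inputs are hypothetical.

```python
def circular_fraction(back_splice_reads: int, linear_junction_reads: int) -> float:
    """Fraction of junction reads at a splice site that support the circRNA.

    Both counts are reads spanning a junction that uses the same splice site,
    so neither is normalized over the full transcript length.
    """
    if back_splice_reads < 0 or linear_junction_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = back_splice_reads + linear_junction_reads
    if total == 0:
        raise ValueError("no junction reads at this site")
    return back_splice_reads / total


# Hypothetical counts: 5 back-splice reads and 95 canonical junction reads.
print(circular_fraction(5, 95))
```

Such a ratio is still affected by library preparation, for example poly(A) selection removes circRNAs, so it would suit only total or rRNA-depleted RNA-seq data.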
Global analyses have also revealed complex roles of cis intronic complementary sequences in circRNA biogenesis [10, 22]. Among these complementary sequences, SINEs (short interspersed nuclear repetitive DNA elements), especially Alus in human, contribute the most to circRNA formation [18]. The existence of primate-specific Alus also results in complex alternative back-splice site selection in human [18]. It should be noted, though, that some circRNAs, such as CDR1as [7], are processed without the regulation of complementary sequences. Interestingly, trans-factors have also been suggested to act together with cis complementary sequences in circRNA formation [33, 34]. Finally, how circRNAs are exported and degraded remains to be addressed. A full understanding of the life cycle of circRNAs and its regulation will provide a molecular basis for elucidating their functions.
Higher Education Press and Springer-Verlag Berlin Heidelberg