INTRODUCTION
Non-coding RNAs, including ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and spliceosomal small nuclear RNAs (snRNAs), are extensively modified with more than 100 chemically distinct nucleotides that promote their activities in translation and splicing. The recent development of transcriptome-wide approaches for the detection of RNA modifications led to the identification of N6-methyladenosine (m6A) [1,2], pseudouridine (Ψ) [3–6], 5-methylcytidine (m5C) [7], 5-hydroxymethylcytidine (hm5C) [8], N1-methyladenosine (m1A) [9,10], 2′-O-methylation (Nm) [11] and N6,2′-O-dimethyladenosine (m6Am) [12,13] as marks of varying abundance on messenger RNAs (mRNAs) (Figure 1A). While excitement about their regulatory potential has led to a race to identify new modifications, our understanding of the functions of these modifications in mRNA processing remains limited. As such, future research efforts are likely to be devoted to the functional characterization of individual modifications.
So far, the modifications identified in mRNA have been implicated in regulating diverse steps of the mRNA life cycle including splicing, 3′ end processing, export, translation, mRNA stability and decay [14–17]. However, most RNA modifications have been identified in poly(A)+ selected mRNA and their roles in nuclear processing are unknown. Largely indirect evidence suggests that various modifications are added to pre-mRNA in the nucleus, where they could function in pre-mRNA processing. In this review, we will discuss what is known and unknown about the prevalence of RNA modifications in pre-mRNA and their roles in splicing. We will also present our view on the key areas of future investigation to determine the functions of RNA modifications in pre-mRNA. m6A was the first modification to be mapped transcriptome-wide, and thus there is a richer body of work to contextualize the function of m6A in pre-mRNA processing. We will therefore frame our discussion around m6A as an example to illuminate mechanisms and infer principles of how diverse RNA modifications may affect pre-mRNA processing.
SEQUENCING-BASED METHODS FOR IDENTIFICATION OF RNA MODIFICATIONS
Methods to identify RNA modifications can be grouped into two broad categories: antibody-based enrichment of the RNAs that contain the modification and nucleotide identification from reverse transcriptase stops. The first methods developed for sequencing-based detection of m6A, m1A and hm5C used antibodies that were specific for the respective RNA modifications to immunoprecipitate RNA fragments and sequence the modification-containing RNAs [1,2,8–10]. These methods produced the first transcriptome-wide RNA modification maps but lacked precise single-nucleotide resolution. Newer versions of antibody-based detection methods include the addition of a UV crosslinking step to covalently link antibodies to modified RNA; subsequent analysis of crosslink-induced mutations precisely identifies the modification site [12,18]. Similarly, certain reverse transcriptases that mis-incorporate when they encounter the modified nucleotide have been combined with antibody enrichment to identify modified RNA sites [19,20].
Chemical reactivity that is specific for a given RNA modification has also been exploited to identify modified sites by sequencing. For example, pseudouridine (Ψ) can be selectively modified with the chemical N-cyclohexyl-N′-beta-(4-methylmorpholinium) ethylcarbodiimide p-tosylate (CMCT). The bulky covalent CMC-pseudouridine adducts block reverse transcriptase and allow for sequencing-based detection of pseudouridines [3–6]. Bisulfite sequencing has been used to identify m5C sites in RNA based on the fact that this methylation makes cytosine resistant to chemical deamination to uracil in the presence of bisulfite [7]. Other methods for detecting m5C have relied on covalent capture of trapped substrates of the m5C methyltransferase NSUN2. Immunoprecipitation of covalently-bound substrates allows for site-specific detection of m5C-induced reverse transcriptase stops or transversions in the case of substrates captured by covalent trapping with the cytidine analog 5-azacytidine [21,22]. Variations of these techniques using modification-specific antibodies, distinct chemical reactivities and capture of enzymatic substrates are likely to be adapted for genome-wide identification of other mRNA modifications in diverse RNA populations.
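The logic of bisulfite-based m5C detection described above can be illustrated with a minimal sketch. It assumes idealized chemistry (every unmodified cytosine is deaminated, every m5C is protected) and reads that are already aligned to the reference and of equal length; the function names, the toy sequence and the `min_fraction` threshold are all hypothetical and chosen only for illustration.

```python
def bisulfite_convert(rna, methylated_positions):
    """Simulate idealized bisulfite treatment of an RNA string.

    Unmodified C is deaminated and reads as U; C at a position listed in
    methylated_positions (0-based) is treated as m5C and stays C.
    """
    converted = []
    for i, base in enumerate(rna):
        if base == "C" and i not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


def call_candidate_m5c(reference, reads, min_fraction=0.5):
    """Return reference C positions that remain C in >= min_fraction of reads.

    Assumes reads are aligned to the reference and have the same length.
    The threshold is an arbitrary illustrative choice.
    """
    candidates = []
    for i, base in enumerate(reference):
        if base != "C" or not reads:
            continue
        retained = sum(1 for read in reads if read[i] == "C")
        if retained / len(reads) >= min_fraction:
            candidates.append(i)
    return candidates


# Toy example: position 4 is methylated in three of four molecules.
reference = "GACUCCAGC"
reads = [bisulfite_convert(reference, {4}) for _ in range(3)]
reads.append(bisulfite_convert(reference, set()))
print(call_candidate_m5c(reference, reads))
```

In real data a retained C is only a candidate site: incomplete conversion leaves unmodified cytosines as C as well, so a call of this kind cannot by itself distinguish m5C from a non-conversion event.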
EVIDENCE FOR NUCLEAR LOCALIZATION OF RNA-MODIFYING ENZYMES
To aid in the functional characterization of modifications in mRNA, it is important to know when the mark is deposited during the mRNA lifecycle. This information can significantly narrow the investigation to elucidate the functional effects of mRNA modifications. For example, if a modification is added co-transcriptionally, prior to splicing, it has the potential to regulate any step of mRNA processing whether nuclear or cytoplasmic, including pre-mRNA splicing. In contrast, if a modification is added in the nucleoplasm after release from chromatin it will likely have little effect on splicing [23,24], but may participate in nuclear export or nuclear RNA decay. Similarly, if an enzyme that adds or removes RNA modifications (called writers and erasers, respectively) is present in the cytoplasm then it is most likely to regulate cytoplasmic processing such as translation and mRNA stability. Determining the localizations of the modification writers, erasers and readers (proteins that recognize specific modifications) will therefore shed light on the function of a given modification (Figure 1B). Furthermore, this information is likely to reveal whether the fate of a mature mRNA in the cytoplasm can be quickly altered by a modification or if new transcription is required to incorporate the modification and activate a distinct gene expression program.
The m6A methyltransferase complex is nuclear
m6A is present at RRACH motifs throughout coding sequences, but is dramatically enriched in last exons, which often contain stop codons [1,2,18]. The mechanism underlying this biased distribution is not understood and may involve restrictions on the activity of either the methyltransferase, a demethylase, or both. The mammalian m6A methyltransferase complex consists of the catalytic subunit methyltransferase like 3 (METTL3), methyltransferase like 14 (METTL14), and a regulatory subunit, Wilms’ tumor 1-associating protein (WTAP) [25,26]. WTAP is nuclear and has been previously studied as a splicing factor in human and Drosophila cells [27,28]. The entire methyltransferase complex co-localizes with nuclear speckles, which are nuclear bodies that store splicing factors [25,26]. Analysis of the binding sites of the m6A methyltransferase complex by photoactivatable ribonucleoside-enhanced crosslinking-immunoprecipitation (PAR-CLIP) and sequencing revealed that the binding motif for individual components of the methyltransferase complex matched the consensus RRACH where m6A is found [25]. Altogether, these initial observations suggested that m6A was added to pre-mRNA in the nucleus.
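The RRACH consensus can be written out with the standard IUPAC nucleotide codes (R = A or G; H = A, C or U). The following is a minimal sketch of how candidate motif occurrences might be located in a sequence; the function name and the example sequence are hypothetical, and reporting the central A as the candidate methylated position is an assumption of the sketch. A motif match only marks a candidate, since it says nothing about whether the site is actually methylated.

```python
import re

# RRACH with IUPAC codes expanded: R = [AG], H = [ACU].
# The lookahead makes overlapping occurrences visible to finditer.
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")


def find_rrach_sites(rna):
    """Return (index_of_central_A, motif) for every RRACH match.

    Indices are 0-based. DNA-style input is accepted by mapping T to U.
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 2, m.group(1)) for m in RRACH.finditer(rna)]


# Hypothetical sequence containing two motif occurrences.
print(find_rrach_sites("CCGGACUAAGACAUU"))
```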
Analysis of transcriptome-wide binding of RNA-modifying enzymes can provide insight into whether a modification is likely to be added to introns or pre-mRNA. For example, METTL3 was shown by overexpression and PAR-CLIP analysis of total RNA to bind primarily to exons, while ~30% of the identified binding sites were in introns [25]. METTL3 binding to introns was at least suggestive of m6A modification of pre-mRNA.
Recent studies found that METTL3 associates with Pol II in human cells, providing a link between transcription and m6A addition (Figure 1B) [29]. Similarly, components of the m6A methyltransferase complex in Drosophila co-localize with Pol II at sites of transcription [30]. In human cells, expression of a slow Pol II mutant increased the levels of total m6A-containing poly(A)+ mRNA that could be immunoprecipitated with an m6A antibody. Analysis of individual mRNAs revealed increased m6A levels when Pol II elongation was slow, providing evidence of coupling between Pol II-mediated transcription and mRNA methylation [29]. One other human m6A methyltransferase, METTL16, was recently identified as the U6 snRNA methyltransferase with MAT2A as its only known direct mRNA methylation substrate. Consistent with its role in regulating MAT2A splicing [31] and in methylating the U6 snRNA [31,32], METTL16 was found to localize to the nucleus (Figure 1B) [33].
Pseudouridine synthases are active in the nucleus
Pseudouridine was found to be relatively uniformly distributed in mRNA with the majority of identified sites within coding regions [3–6]. Pseudouridylation in mRNA has so far been demonstrated to be catalyzed primarily by the standalone pseudouridine synthases (PUS) that also pseudouridylate tRNAs. In yeast, most mRNA pseudouridines have been genetically assigned to two conserved Pus proteins, Pus1 and Pus7, which are nuclear in growing cells. Pus7 was shown to re-localize to the cytoplasm during heat shock coincident with increased mRNA pseudouridylation [4]. Seven out of nine of the known yeast Pus proteins have been proposed to pseudouridylate mRNA targets leading to the presence of this modification in diverse sequence contexts [3–5]. In addition, yeast CBF5 has also been shown to be genetically required for pseudouridylation of a subset of mRNAs [4]. CBF5 encodes the catalytic subunit of the box H/ACA snoRNA-guided pseudouridine synthase that targets nascent pre-rRNA in the nucleolus [34,35].
Human cells from patients with dyskeratosis congenita, which harbor mutations in the human ortholog of Cbf5, likewise have decreased pseudouridylation signal at some mRNA pseudouridine sites, implying that the human enzyme also pseudouridylates mRNAs [4]. Of the human standalone PUS enzymes, only PUS1, PUS7 and TRUB1 have thus far been demonstrated to pseudouridylate mRNA targets [6,36]. All three of these PUS proteins have been shown to at least partially localize to the nucleus or have nuclear isoforms (Figure 1B) [36–38]. PUS7 has been further shown to be chromatin-associated and, more specifically, associated with active Pol II promoters and enhancers [39], suggesting it acts on nascent pre-mRNA. Furthermore, nuclear resident non-coding RNAs such as MALAT1 are pseudouridylated in human cells, demonstrating that human pseudouridine synthases are active in the nucleus where they could act on pre-mRNA [3].
m5C and hm5C
As with the pseudouridine synthases, the canonical function of the known mRNA m5C methyltransferase NSUN2 [7,21,22] is in tRNA methylation. The initial m5C profiling by bisulfite sequencing reported thousands of putative m5C sites in mRNA [7]. Low-throughput experiments suggested that NSUN2 was the m5C methyltransferase for a few m5C-containing mRNAs. A more recent study found that depletion of NSUN2 globally reduced the m5C to C ratio in poly(A)-selected and rRNA-depleted mRNA based on mass spectrometry analysis [40]. Furthermore, the signal for about half of the m5C sites identified in the same study by bisulfite sequencing was modestly reduced upon depletion of NSUN2. Studies that sought to identify m5C targets of NSUN2 by covalent capture of substrates found far fewer, around three hundred, m5C sites in mRNA [22]. The discrepancy in these results could be explained by increased specificity of the catalytic capture method for direct targets, in which case there should be unidentified m5C methyltransferases whose activity is reduced by depletion of NSUN2. Alternatively, a signal for m5C can represent non-conversion events or other cytidine modifications (e.g., hm5C) that are also resistant to bisulfite-based identification [41]. Although the extent of mRNA m5C remains controversial, the documented m5C mRNA methyltransferase NSUN2 is nuclear and could deposit m5C in nascent RNA (Figure 1B) [38,42].
m5C can be oxidized to form a derivative modification, 5-hydroxymethylcytidine (hm5C), by the ten-eleven translocation (Tet) enzymes. hm5C was mapped in Drosophila by antibody-based enrichment of hm5C-modified RNA. Over 80% of identified sites that were enriched in UC-rich regions were reduced upon knockdown of Drosophila TET (dTET) [8]. All three of the human TET enzymes can convert m5C to hm5C in RNA, as determined from bulk measurements [43,44]. Consistent with their role in DNA hydroxymethylation, the TET proteins reside in the nucleus (Figure 1B) [38,42].
Most of the mRNA-modifying enzymes characterized to date reside within the nucleus, where they have the potential to interact with nascent pre-mRNA. However, the exact timing and location of mRNA modification has rarely been determined. In principle, analysis of the RNA binding sites of RNA-modifying enzymes by methods such as PAR-CLIP in combination with sub-cellular fractionation could address this question. In practice, such analysis of RNA binding targets of enzymes should be interpreted with caution. Except in the case of mutant proteins engineered to trap catalytic intermediates, enzymes are likely to interact transiently with their substrates and no longer interact once the substrate has been modified. This was shown for METTL16, which methylates an adenosine in one of its substrates in SAM-replete conditions. When SAM is depleted there is no methyl donor available and the interaction between METTL16 and its substrate is stabilized [31]. Therefore, interactions captured by typical CLIP assays may represent weaker substrates or non-substrates. Performing RNA binding analysis on catalytically dead mutants or in the absence of cofactors necessary for modification may better capture the substrate-binding landscape of RNA modification enzymes.
CO-TRANSCRIPTIONAL PRE-mRNA MODIFICATIONS
As noted above, many modifications have been profiled exclusively in poly(A)+ mRNA (pseudouridine, m1A, m5C, m6Am), so potential sites in introns have been overlooked (Table 1). Even profiling of total RNA (e.g., by antibody-based enrichment) does not give a complete picture of the extent of pre-mRNA modification. Because total RNA is a mixture of mature mRNA, pre-mRNA, and mRNAs containing retained introns, intronic reads may not represent introns that come from pre-mRNA. Moreover, the much greater abundance of mature mRNA likely leads to reduced detection of modifications in introns. Notably, profiling m5C in purified nuclear polyadenylated RNA suggested a much higher prevalence of intronic m5C than similar studies of (predominantly cytoplasmic) poly(A)+ mRNA (~60% vs. ~20%) [48]. Thus, estimates of the fraction of intronic modifications made by total RNA sequencing should be regarded as lower bounds.
While considerable progress has been made in profiling diverse modifications in mRNA, most of the studies have been performed on mature poly(A)+ mRNA, precluding the identification of these modifications in introns and in pre-mRNA. Thus, defining the landscape of pre-mRNA modifications will illuminate whether it is feasible for any modification to have a direct effect on nuclear processing of pre-mRNA. Adapting nucleotide modification profiling techniques to sequence nascent RNA could reveal whether a modification is added co-transcriptionally to pre-mRNA. One promising approach to identify modifications in nascent pre-mRNA is to sequence chromatin-associated RNA. Using non-ionic detergent and 1 mol/L urea, it is possible to separate the soluble, loosely associated RNA and protein components from the insoluble chromatin fraction. Nascent RNA remains tethered to Pol II ternary complexes and thus remains chromatin bound (based on Ref. [50]). Recently, several groups have coupled this fractionation protocol to high-throughput RNA sequencing to enrich for unspliced, intron-containing pre-mRNA and quantify the extent of co-transcriptional splicing [23,24,51–53].
So far only m6A has been definitively shown to be added to pre-mRNA co-transcriptionally (Table 1). A recent study performed single-nucleotide resolution m6A profiling to identify m6A sites from three cellular fractions in HeLa cells: chromatin-associated, nucleoplasmic and cytoplasmic RNA. Surprisingly, although ~75% of chromatin-associated pre-mRNA reads are intronic and most pre-mRNAs in this fraction are incompletely spliced [23,45], ~90% of pre-mRNA m6As reside in exons rather than introns [45]. This distribution cannot be explained by the occurrence of the RRACH motif recognized by METTL3, since the motif occurs more frequently in introns than in exons. Nevertheless, although intronic m6As account for only ~10% of total m6A peaks identified in pre-mRNA, over two thousand m6A sites have been reported to reside in introns and may have important functions [45,46]. Analysis of the m6A landscape in each of the three fractions revealed ~90% overlap in detected m6A peaks, suggesting limited dynamics of the m6A mark after pre-mRNA release from chromatin [45]. Thus, any regulation of m6A levels by methyltransferases and demethylases is likely almost exclusively co-transcriptional under basal conditions, consistent with their localization. It is still possible that dynamic methylation and demethylation occurs co-transcriptionally in the conditions profiled, but that the steady state levels of total m6A are the same in the cytoplasm. Whether pre-mRNA methylation is more dynamic in other biological conditions remains to be seen.
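To make the notion of peak overlap between fractions concrete, the following minimal sketch computes the fraction of sites in one fraction that are also detected in another. It assumes sites are reported at single-nucleotide resolution as (chromosome, position, strand) tuples so that exact matching is meaningful; the coordinates are invented for illustration and are not taken from the cited study, which may have defined overlap differently.

```python
def overlap_fraction(sites_a, sites_b):
    """Fraction of sites in sites_a that are also present in sites_b.

    Sites are hashable identifiers, here (chromosome, position, strand).
    Returns 0.0 when sites_a is empty.
    """
    a, b = set(sites_a), set(sites_b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical site lists for two cellular fractions.
chromatin = {("chr1", 1050, "+"), ("chr1", 2210, "+"),
             ("chr2", 87, "-"), ("chr3", 640, "+")}
cytoplasm = {("chr1", 1050, "+"), ("chr1", 2210, "+"),
             ("chr2", 87, "-"), ("chr5", 12, "+")}

print(overlap_fraction(chromatin, cytoplasm))
```

Note that the measure is asymmetric: the fraction of chromatin-associated sites recovered in the cytoplasm need not equal the fraction of cytoplasmic sites found on chromatin, so the direction of the comparison should be stated.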
An outstanding mechanistic question is how cells achieve biased distribution of a modification over genomic features. m6A is largely confined within exon boundaries, which could be achieved if the m6A methyltransferase complex were recruited during exon definition prior to splicing, potentially by cross-exon interactions between spliceosomal components. However, METTL3 does not appear to associate with components of the U1 and U2 snRNPs [45]. METTL3 does associate with Pol II [29], which is thought to transcribe exonic and intronic regions at different rates [54]. Therefore, one potential explanation for the exon-restricted distribution of m6A is that METTL3 travels along the gene body with Pol II, and slower transcription through exons than through introns increases the dwell time of METTL3 in the exon enough to allow catalysis. Alternatively, the specificity of m6A in exons in pre-mRNA may be imparted by demethylases. If demethylase activity were greater in introns, then preferential removal of intronic m6As could lead to the observed exonic bias at steady state. In fact, one m6A demethylase, the fat mass and obesity-associated protein (FTO), was reported to bind primarily to introns, and these binding sites correlate with known sites of m6A modification [55]. Therefore, the balance of the expression of the methyltransferase and demethylase could determine the level of net modification in exons versus introns in different cell types and under different conditions.
Two m6A demethylases have been identified to date, FTO and ALKBH5 [47,56]. Recently, FTO was found to preferentially demethylate m6A in the context of m6Am as the first nucleotide of the cap both in vitro and in bulk total mRNA assays [13]. Taken at face value, this result suggests limited potential for regulation of FTO to affect internal intronic methylation levels. However, the bulk assays are likely insensitive to changes in a small set of internal m6As that could be demethylated by FTO, as has been shown for a subset of internal m6A residues in acute myeloid leukemia cells [57]. Going forward, studies on the spatiotemporal relationship between Pol II elongation, spliceosome assembly and modification should clarify the mechanisms by which pre-mRNA modifications are added and removed and how these processes are interconnected.
SPLICING AND SPLICING REGULATION
The spliceosome assembles de novo on the pre-mRNA through interactions with splice site sequences that define the exon/intron boundaries to catalyze intron removal and exon joining. Splicing is achieved by binding of the snRNPs to the splice sites, an interaction that is primarily mediated by base pairing between the snRNA component of snRNPs and the splice site sequences [58]. In mammals these interactions between the snRNPs and the splice site sequences are relatively weak, allowing for regulation. Alternative splicing is the differential inclusion of exons or introns, or a portion of these, into the mature mRNA. Assembly of the spliceosome at splice sites is regulated by auxiliary factors (splicing factors) recruited to regulatory sequences within an alternative exon or its flanking introns to either promote or repress the snRNP-pre-mRNA interactions. Two major splicing regulator protein families are the serine/arginine rich (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs) [59,60]. SR proteins have been traditionally thought to bind to exons to enhance splice site usage and exon inclusion, whereas hnRNPs typically induce exon skipping. It is now clear that SR proteins can also repress and hnRNP proteins can also enhance exon inclusion depending on their binding position on the pre-mRNA [61–63]. Many aspects of the splicing mechanism and alternative splicing have recently been reviewed in greater detail [62,64]. Here we will focus on specific splicing events that have been demonstrated to be, or are likely to be, affected by pre-mRNA modifications.
Modifications in the snRNAs influence splicing
The first evidence that RNA modifications function in splicing came from the study of modifications in the snRNAs. All five of the snRNAs contain modified nucleotides including extensive pseudouridylation and 2′-O-methylation as well as one N6-methyladenosine in U6 snRNA. Pseudouridylation and 2′-O-methylation of the snRNAs are catalyzed primarily by the box H/ACA and box C/D snoRNPs, respectively. snRNA modifications are found in important functional regions and affect splicing activity as exemplified by studies of U2 snRNA. Pseudouridylation of U2 snRNA in the region that base pairs with the pre-mRNA branch point sequence is required for splicing of one tested pre-mRNA substrate in Xenopus oocytes [65]. Consistent with a stimulatory effect on splicing, these pseudouridines were found to stabilize pre-mRNA-snRNA interaction by solution NMR [66]. Endogenous pseudouridylation at two of the positions of the yeast U2 snRNA in the branch site recognition region contributes to pre-mRNA splicing by altering the RNA secondary structure of the branchpoint-interacting stem loop of the U2 snRNA to facilitate the binding of the essential splicing factor Prp5 [67]. In human HeLa cell nuclear extracts depleted of U2 snRNA and reconstituted with modification-defective U2 snRNAs, U2 snRNA lacking any one of the three endogenous pseudouridines found in the first 20 nucleotides decreased splicing efficiency, and U2 snRNA lacking all three pseudouridines abrogated splicing of a pre-mRNA substrate [68]. Likewise, four out of five 2′-O-methylated nucleotides in U2 snRNA were shown to be essential for splicing of the same substrate in HeLa nuclear extract [68].
While U2 snRNA is the most heavily modified of the snRNAs, the other snRNAs all contain modified nucleotides. Pseudouridines in the U1 and U5 snRNAs are also clustered around the regions that base pair with the pre-mRNA, which are important for splicing [69]. The pseudouridines and 2′-O-methylated nucleotides in U4 and U6 snRNAs occur in regions where these two snRNAs base pair with each other and could contribute to duplex stabilization. Finally, m6A is present in the U6 snRNA in the region that base pairs with the 5′ splice site [70,71]. The m6A nucleotide in the U6 snRNA has been shown to base pair with a pre-mRNA substrate, and that adenosine is critical for splicing catalysis. The mechanism by which the m6A in U6 snRNA affects splicing is unknown; however, it could influence the strength of base pairing with 5′ splice sites or be important for the extensive structural rearrangements in the U6 snRNA during spliceosome assembly and catalysis [72,73].
Some modifications in the snRNAs are induced by changing cellular conditions to influence splicing. The yeast U2 snRNA has stress-inducible pseudouridines: one pseudouridine is added by Pus7 in response to heat shock and nutrient deprivation and the other is added by the box H/ACA snoRNP Cbf5 in response to nutrient deprivation through the mTOR signaling pathway [67,74]. Both pseudouridines are in stem II of U2 snRNA, a region where conformational changes are necessary for splicing catalysis. Recently, it was shown that these pseudouridines alter the conformational dynamics of stem II in a distinct manner [75]. The cellular functions for these inducible modifications remain to be determined, but these results suggest that pseudouridines may be employed to tune snRNA activity and impact splicing. Similarly, an inducible pseudouridine in the U6 snRNA is necessary for efficient splicing of suboptimal introns and is required for filamentous growth in yeast [76]. The evidence presented above supports an important function for snRNA modifications in splicing. The extent to which snRNA modifications influence interactions with individual pre-mRNAs and alternative splicing is still an open question.
Evidence that pre-mRNA modifications function in alternative splicing
Since the first identification of m6A locations transcriptome-wide, many studies have reported widespread changes in alternative splicing and isoform expression following depletion of the m6A methyltransferase METTL3 and the demethylase FTO [1,45–47,55,77–79]. However, only ~20%–30% of METTL3- and FTO-sensitive alternatively spliced exons have been demonstrated to contain m6A [1,18,46,47,55,78]. This is not surprising given that manipulation of the methyltransferases and demethylases is likely to have a plethora of indirect effects.
The extent to which m6A directly affects splicing regulation is a subject of debate, since the reported number of alternative splicing events that are regulated by m6A levels (i.e., are affected by METTL3 knockdown/knockout) and also contain m6A varies substantially (from one to several hundred) among published studies [1,18,46,47,55,78]. These discrepancies may exist for a number of reasons. Different cell lines are likely to have distinct pre-mRNA m6A methylation patterns and thus varying effects on m6A-dependent splicing regulation. Another factor that should be considered when generalizing from splicing analysis of RNA-seq data is that algorithms to detect alternative splicing can differ substantially in the number of splicing events detected, and produce different results with varying user-defined parameters. Some are more sensitive than others to outliers in the data; some are better at detecting complex splicing events and/or rely more or less on annotations of known isoforms [80]. Thus, apparent discrepancies in the prevalence of m6A-sensitive splicing events may reflect data analysis and arbitrary cutoffs rather than biological or experimental differences. Validating the splicing changes detected by each algorithm by RT-PCR, to establish a validation rate for the specified cutoff and datasets, could increase confidence in the data analysis and the conclusions derived from it. Nevertheless, multiple studies point to a limited, but perhaps direct, role for m6A in alternative splicing regulation.
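The sensitivity of event counts to user-defined cutoffs can be seen in a minimal sketch. It assumes exon inclusion is quantified as percent spliced in (PSI, the fraction of reads supporting inclusion), a common metric that is not named in the text above, and that an exon is called "changed" when the absolute PSI difference between control and knockdown meets a threshold. The read counts, function names and thresholds are hypothetical, and real tools add statistical testing that is omitted here.

```python
def psi(inclusion_reads, skipping_reads):
    """Percent spliced in as a fraction; None when there is no coverage."""
    total = inclusion_reads + skipping_reads
    return inclusion_reads / total if total else None


def count_changed_exons(events, min_delta_psi):
    """Count exons whose |PSI(knockdown) - PSI(control)| >= min_delta_psi.

    Each event is ((incl_ctrl, skip_ctrl), (incl_kd, skip_kd)).
    Events lacking coverage in either condition are skipped.
    """
    changed = 0
    for control, knockdown in events:
        psi_control, psi_knockdown = psi(*control), psi(*knockdown)
        if psi_control is None or psi_knockdown is None:
            continue
        if abs(psi_knockdown - psi_control) >= min_delta_psi:
            changed += 1
    return changed


# Hypothetical read counts for four cassette exons.
events = [
    ((80, 20), (50, 50)),  # delta PSI about 0.30
    ((60, 40), (48, 52)),  # delta PSI about 0.12
    ((30, 70), (36, 64)),  # delta PSI about 0.06
    ((90, 10), (88, 12)),  # delta PSI about 0.02
]

for cutoff in (0.05, 0.10, 0.20):
    print(cutoff, count_changed_exons(events, cutoff))
```

With the same underlying data, the number of "regulated" exons falls from three to one as the cutoff is raised from 0.05 to 0.20, which is why reported counts are hard to compare across studies that use different thresholds.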
Importantly, most of the differential splicing analysis described above has been limited to internal cassette exons and has not included alternative last exons or analysis of the effect of intronic m6As on splicing. Given the enrichment of m6A over last exons, m6A-sensitive alternative last exon usage should be investigated in more detail. Similarly, analysis of the effect of intronic m6As on constitutive and alternative splicing of their neighboring exons may reveal functions for intronic m6As in pre-mRNA splicing regulation, as has been shown in a recent study [46].
Broad assertions about the widespread nature of alternative splicing regulation by mRNA modifications have come from analysis of poly(A)+ mRNA-seq following knockdown or knockout of the “writers” and “erasers”. It is difficult to determine the role of pre-mRNA modifications from such experiments because of the high probability of indirect effects. What is needed is further investigation of individual pre-mRNA modifications in mechanistic detail. This will establish paradigms for modification-mediated regulation, which can then be used to identify co-regulated events that share sensitivity to multiple factors (e.g., splicing factor availability and/or binding, position of the modification and sequence context) from high-throughput genomic data sets.
In vivo mechanistic studies have proved to be challenging given that knockouts and knockdowns of RNA modification enzymes can lead to a plethora of indirect effects that can confound results. This is especially true for the mRNA modifications that co-opt the tRNA-modifying machinery (such as m1A and pseudouridine), because genetically manipulating these enzymes is likely to cause widespread effects downstream of perturbing the biogenesis of functional tRNA and other non-coding RNAs. Orthogonal approaches are urgently needed for site-specifically blocking or adding a modification to determine its ability to induce or rescue the molecular phenotypes identified from knockdown/knockout experiments. Along these lines, Yu and colleagues have engineered box H/ACA snoRNAs to guide site-specific pseudouridylation of targets of interest, albeit with limited efficiency [81,82]. Other approaches to direct RNA modification writers or erasers to specific sites or subsets of sites would significantly advance the field.
Minigene reporters are useful tools for dissecting splicing regulatory mechanisms and can readily be adapted to test the functional impact of a single modification on pre-mRNA splicing. These reporters, which contain the region of a regulated splicing event including the exon and flanking introns, must be tested to verify that they recapitulate the endogenous gene’s modification pattern and splicing sensitivity to genetic manipulation of RNA modification enzymes. Then, minigene mutagenesis to inhibit modification can be carried out to distinguish direct effects of an RNA modification on splicing from indirect effects of globally perturbing modification writers and erasers. If a local RNA modification directly influences splicing, the pre-mRNA processing effect observed upon modification enzyme knockout/knockdown will be abolished in the context of mutations to the modified nucleotide or mutations in the sequence or structure that is required for modification by the RNA-modifying enzyme (e.g., the RRACH motif for m6A or the hairpin loop for TRUB1-dependent pseudouridylation [36]). Mutations of the modified nucleotide that alter splicing could reflect either the need for modification or the requirement for the nucleotide (modified or unmodified) as part of a regulatory sequence. Thus, similar results from multiple mutant minigene constructs would provide strong evidence that it is indeed loss of modification at the given pre-mRNA site that has a functional consequence for splicing. Furthermore, advances in genome editing with CRISPR/Cas9 now allow testing such mutations in the endogenous context.
In vitro splicing of reporters in splicing-competent nuclear extracts can serve as a complementary approach to minigene mutagenesis to study the effects of RNA modifications on splicing. Site-specific incorporation of a modified nucleotide into a splicing substrate can readily be achieved in vitro [83], and allows direct testing of the consequence of a modified nucleotide in splicing without knowing the corresponding RNA-modifying enzyme that installs the modification. However, this approach may not be feasible for all pre-mRNA substrates since splicing is inefficient in vitro and does not always recapitulate endogenous regulation. Pre-mRNA modifications that are found to be required for splicing regulation could influence splice site selection by diverse mechanisms.
Molecular mechanisms by which pre-mRNA modifications regulate splicing
Pre-mRNA modifications have the potential to impact splicing by three main mechanisms: by altering RNA-RNA interactions, by modulating RNA-protein interactions, and, indirectly, by influencing pre-mRNA secondary structure (Figure 2). Pre-mRNA modifications that are deposited within the splice site sequences could influence RNA-RNA interactions between the snRNAs and the pre-mRNA (Figure 2A) and thereby regulate splicing directly, by stabilizing or destabilizing base-pairing between splice sites and spliceosomal snRNAs (as discussed for snRNAs above). m5C has been shown to have structural stabilization properties when present in tRNA [84]. Similarly, pseudouridine [85, 86], hm5C [87] and Nm [88, 89] have all been shown by thermal denaturation experiments to stabilize RNA duplexes by 1–2 kcal/mol as compared to their unmodified counterparts, while m6A has been shown to destabilize RNA duplexes by a similar magnitude [90, 91]. Regulated modifications in the splice sites could influence alternative splice site selection by increasing or decreasing the stability of base pairing interactions between the pre-mRNA and the snRNAs. Given the strict RRACH sequence requirement for METTL3-mediated m6A addition, it is unlikely that many m6As are found at the splice site sequences (5′ splice site GURAGU, branch site YNYURAC, 3′ splice site YAG consensus) under any conditions [45]. However, other modifications such as pseudouridine are added in diverse sequence contexts and as such could be added to the splice site sequences [3]. In one study, 2′-O-methylation of the branch site adenosine inhibited splicing of an intron of the adenovirus pre-mRNA, as expected since the 2′ hydroxyl of the adenosine is required for the first step of splicing catalysis [92]. If cryptic sites were available, these were used when the canonical branch point adenosine was methylated. Thus, 2′-O-methylation of an endogenous branch site could inhibit splicing or modulate branch site selection, although mammalian branch site identification remains challenging [93–95]. Nm has been identified as a moderately abundant modification in purified human mRNA by mass spectrometry, and a sequencing-based method to map Nm found 16% of putative Nm sites in introns [11]. Further validation of intronic Nm sites is needed, as the enrichment of a primer sequence motif among these candidate Nm sites suggests a pervasive artefact due to mispriming during reverse transcription [96].
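To put the 1–2 kcal/mol range quoted above in perspective, the sketch below converts a free-energy difference into the corresponding fold change in the duplex equilibrium constant, using the standard thermodynamic relation K_modified / K_unmodified = exp(ΔΔG / RT). The temperature of 37 °C and the function name are assumptions chosen for illustration, and the relation applies to an idealized two-state duplex equilibrium rather than to any specific splice site.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_in_equilibrium_constant(ddg_kcal_per_mol, temp_celsius=37.0):
    """Fold change in duplex equilibrium constant for a free-energy difference.

    ddg_kcal_per_mol is the magnitude of the stabilization (or destabilization)
    in kcal/mol; the returned value is exp(ddG / RT).
    """
    temp_kelvin = temp_celsius + 273.15
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))


low = fold_change_in_equilibrium_constant(1.0)
high = fold_change_in_equilibrium_constant(2.0)
```

Under these assumptions, 1 kcal/mol corresponds to roughly a 5-fold change and 2 kcal/mol to roughly a 26-fold change in the equilibrium constant at 37 °C, so a single modified nucleotide could in principle shift the balance between competing pre-mRNA-snRNA pairings.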
Pre-mRNA modifications could also influence splicing by directly affecting binding of various RNA-binding proteins to their RNA targets (Figure 2B). In the case of m6A, dedicated readers such as the YTH domain family of proteins directly and specifically interact with the methylated adenosine. For example, YTHDC1, the nuclear reader of m6A, preferentially interacts with m6A-containing pre-mRNA (as discussed below). In addition, modifications in pre-mRNA could more subtly alter the affinity of known splicing factors to their binding sites on pre-mRNA. In this manner, controlled deposition of a modification in exonic or intronic splicing enhancer or silencer elements could co-regulate splicing of a subset of targets. Alternatively, modifications that alter the stability of RNA duplexes could indirectly alter splice site accessibility and/or splicing factor binding by changing pre-mRNA secondary structure, as has been demonstrated for certain intronic m6A modification sites (see below) (Figure 2C). Although the exact magnitude of direct splicing regulation by m6A remains to be refined, in individual cases that have been studied in mechanistic detail m6A does function to regulate pre-mRNA splicing for important biological processes. In the following sections, we will review the known mechanisms of m6A-dependent splicing regulation through altered direct and indirect RNA-protein interactions.
METTL16 acts as an m6A reader to promote intron splicing
The primary SAM synthetase in human cells is encoded by the gene MAT2A, and SAM serves as the primary methyl donor for most methylation reactions in the cell. In SAM-replete conditions, METTL16, the U6 snRNA methyltransferase, transiently interacts with and specifically methylates an adenosine in a hairpin in the 3′-UTR of the MAT2A gene, leading to retention of the upstream intron (Figure 2D) [31]. The intron-retained isoform of MAT2A is degraded in the nucleus and consequently MAT2A mRNA and protein levels decrease [97]. In contrast, when SAM levels are low, slowed catalysis caused by lack of the methyl donor increases METTL16 occupancy on the hairpin, which promotes splicing of the intron. The positive local effect of METTL16 on splicing was verified by tethering of METTL16 to the 3′-UTR hairpin, which was sufficient to enhance splicing of the upstream intron. Mutational studies provided further support for this regulatory mechanism: mutation of the methylated adenosine or the METTL16 recognition sequence of the hairpin reduced METTL16 binding, abrogated methylation of the hairpin, and decreased METTL16-induced splicing. In addition, METTL16 was shown to directly methylate the MAT2A hairpin in vitro.
METTL16 depletion reduced m6A content at more than two thousand sites in poly(A)+ mRNA, most of which were in presumably retained introns. Some of these changes are likely to be an indirect consequence of altered SAM levels in METTL16 knockdown cells, downstream of perturbing MAT2A splicing and expression. Indeed, knockdown or overexpression of MAT2A modulated bulk m6A levels in mRNA similarly, and direct METTL16 binding was not detected at several tested candidate METTL16 m6A sites. Regardless, METTL16 may have additional methylation targets in pre-mRNA, and one study reported METTL16 binding primarily to introns by CLIP-seq [32]. Interestingly, METTL16 acts as both a writer and reader of m6A in the case of MAT2A [98–100]. This METTL16-mediated feedback loop that controls SAM synthetase expression in response to SAM levels provides an example of a signal-induced regulation of m6A deposition that in turn directly regulates pre-mRNA splicing and mRNA expression. Most studies to date have focused on basal levels of mRNA modifications and not regulated cases. This example should motivate studies to search for instances in which mRNA modification enzymes are regulated in response to extracellular signals to alter pre-mRNA processing.
YTHDC1 recruits SRSF3 to promote exon inclusion and antagonize SRSF10
YTHDC1, also known as YT521-B, was the inaugural member of the YTH family of proteins and a known splicing factor [98–100]. It was first identified as a splicing factor interacting protein from yeast two-hybrid and co-immunoprecipitation experiments [98, 101]. It interacts with numerous splicing factors including SRSF2, TRA2A, TRA2B, SRSF3, SRSF10, SAM68 and hnRNPG [78, 98, 101]. The YTH domain of YTHDC1 was identified and characterized to bind single stranded RNA and found to have homologs across eukaryotes [100]. Later, YTHDC1 was shown to bind methylated RNAs preferentially [102]. Consistent with the view that YTHDC1 is an m6A binding protein in vivo, PAR-CLIP of YTHDC1 showed a significant overlap between its binding sites and known m6A sites identified in HeLa cells, most of which were distributed in exons [102]. YTHDC1 influences alternative splicing both indirectly, through interacting with other splicing factors independent of its YTH domain [101, 103], and by direct interaction with pre-mRNA targets dependent on its YTH domain [100]. Specifically, overexpression of YTHDC1 modulates splice site selection of alternative exons of pre-mRNAs and deletion of the YTH domain abrogates this effect.
YTHDC1 interacts with exons in pre-mRNA to recruit SRSF3 to methylated pre-mRNAs and promote exon inclusion. SRSF3 binding to total RNA was reduced following YTHDC1 or METTL3 knockdown, suggesting that its binding to RNA is sensitive to m6A content. Binding of SRSF3 to a methylated oligo in vitro was enhanced by addition of recombinant YTHDC1. From RNA-seq and PAR-CLIP data it was inferred that SRSF3 and YTHDC1 bind and co-regulate alternative splicing events at least in part by antagonizing SRSF10 binding to internal cassette exons [78]. A handful of cassette exons in three pre-mRNAs (SP4, ZNF638 and ALG11) were validated as co-regulated by METTL3, SRSF3 and YTHDC1, and antagonized by SRSF10 (Figure 2D). m6A binding by YTHDC1 is required to promote exon inclusion since a YTH domain mutant does not rescue exon skipping that results from YTHDC1 depletion. Finally, a minigene of ZNF638 recapitulated endogenous co-regulation by YTHDC1, SRSF3 and SRSF10. Importantly, mutations to the predicted binding site of each protein on the alternative exon, including the RRACH motif, had the predicted effect of reducing exon inclusion (YTHDC1 and SRSF3) or promoting exon inclusion (SRSF10).
A similar mechanism, involving cooperativity between YTHDC1 and SRSF3 and antagonism by SRSF10, was found to influence splicing of pre-mRNAs during lytic replication of Kaposi’s sarcoma-associated herpesvirus (KSHV) [104]. Several stimuli that induce lytic replication in various cell lines increase m6A content. FTO and METTL3 knockdown increased and decreased, respectively, lytic-induced m6A levels of the viral pre-mRNA encoding replication transcription activator (RTA), a key lytic protein. Chemical inhibition of m6A formation with 3-deazaadenosine (DAA) inhibited lytic-induced splicing of the RTA intron, decreased RTA protein expression and attenuated lytic replication. m6A is present in both the regulated intron and the downstream exon of RTA, and mutation of three of the methylated RRACH motifs, two in the intron and one in the exon, in a minigene context inhibited intron splicing. UV crosslinking followed by immunoprecipitation and qRT-PCR of wildtype RTA pre-mRNA compared to m6A-deficient mutant RTA demonstrated that lytic-induced m6A addition and interactions with YTHDC1 and SRSF3 were abolished in each of the m6A mutants while SRSF10 interaction was enhanced. Thus, YTHDC1 and SRSF3 have been implicated as positive factors in both host and viral splicing and may represent a common mode of m6A-dependent splicing regulation.
YT521-B, Drosophila YTHDC1, promotes female-specific splicing of Sex-lethal
YT521-B is the Drosophila ortholog of human YTHDC1, and it localizes exclusively to the nucleus [101]. mRNA m6A methylation is widespread in Drosophila and Drosophila has orthologs of all identified members of the human N6-methyladenosine methyltransferase complex: inducer of meiosis (Ime4, homologous to METTL3), karyogamy protein (KAR4, METTL14), female-lethal (fl(2)d, WTAP), Virilizer (Vir, KIAA1429) and Spenito (Nito, RBM15/15B) [30, 105, 106]. Fl(2)d, Vir, and Spenito were previously characterized splicing factors known to be required for splicing-mediated sex determination in Drosophila [107–113]. Specifically, these proteins promote female-specific splicing of Sex-lethal, the master regulator of sex determination in Drosophila [107, 111]. The male-specific isoform of Sex-lethal (Sxl) results from inclusion of a cassette exon that introduces a premature termination codon and thereby decreases Sex-lethal expression. In females, binding of Sex-lethal itself to the introns flanking the male cassette exon represses exon inclusion in a feedback loop.
The sex-specific phenotypes of mutants affecting the methyltransferase complex implicated m6A in splicing regulation of Sxl. Ime4 (METTL3) mutant female flies are flightless, display male specific features and develop ovarian tumors suggesting a link between the putative methyltransferase and sex determination [30, 105]. Ime4 and Sxl interact genetically leading to reduced survival of female progeny and YT521-B mutant flies phenocopy these methyltransferase complex mutants [30, 105, 106]. Ime4 mutants showed reduced overall levels of m6A as expected, and displayed a concomitant shift toward the male-specific, exon-included isoform of Sxl [105, 106]. Similarly, Kar4 and YT521-B mutants promoted exon inclusion [105, 106].
The effect of the methyltransferase mutants on Sxl splicing is likely to be direct. The introns flanking the Sxl cassette exons were found to contain m6A near the Sxl binding sites [30, 105]. The m6A binding protein YT521-B binds to the m6A-containing introns of Sex-lethal, and overexpression of YT521-B is sufficient to repress inclusion of the Sxl male-specific exon (Figure 2D) [30, 105]. Furthermore, m6A was found to be widespread in Drosophila based on m6A-seq and miCLIP experiments that revealed a distribution similar to human cells and enrichments of the RRACH motif at m6A peaks [105, 106]. Depletion of Ime4 and concomitant loss of m6A led to changes in alternative splicing of additional genes, the majority of which were co-regulated by YT521-B [30, 106]. Thus it is likely that YT521-B regulates m6A-dependent alternative splicing of additional genes in Drosophila.
hnRNPC and hnRNPG binding to m6A-sensitive structures influence splicing
Heterogeneous nuclear ribonucleoprotein C (hnRNP C) is an abundant nuclear protein member of the hnRNP family of splicing factors that binds to polyuridine sequences primarily in introns to influence pre-mRNA splicing. One mechanism by which hnRNP C represses exon inclusion is by competing with the core splicing factor U2AF65, which binds to the uridine-rich polypyrimidine tract and promotes 3′ splice site recognition [114]. hnRNP C was identified as a protein that preferentially interacts with a hairpin in the abundant nuclear resident ncRNA MALAT1 in an m6A-dependent manner [46]. Structure probing of this MALAT1 hairpin revealed that the methylated adenosine is present in the stem opposite a poly(U) stretch that serves as the binding site for hnRNP C (Figure 2D). RNA duplex destabilization by m6A was proposed to act as a structural switch: when an adenosine near a poly(U) element is unmethylated, it forms a stable base pair with the hnRNP C binding motif, sequestering the site in a stem and precluding hnRNP C binding. Methylation of the adenosine destabilizes the stem and allows hnRNP C binding. This finding prompted the search for m6A-switches transcriptome wide. Extending the previously reported enrichment of hnRNP C binding in introns [114], PAR-CLIP of hnRNP C followed by m6A-sequencing of bound RNA revealed that hnRNP C binds primarily to intronic m6A sites that resemble the MALAT1 m6A switches [46]. Approximately two thousand hnRNP C-bound m6A sites were no longer bound by hnRNP C following METTL3 and METTL14 knockdown, suggesting this is a widespread mechanism. Finally, a subset of alternatively spliced exons were shown to have neighboring intronic m6A switches that were bound by hnRNP C and co-regulated by hnRNP C, METTL3 and METTL14 (Figure 2D) [46].
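The logic of the proposed m6A switch can be made concrete with a toy two-state model, in which the stem that sequesters the poly(U) site is either closed or open and the fraction of open molecules follows from the folding free energy. The sketch below is only an illustration: the stem stability of -2.0 kcal/mol and the 1.5 kcal/mol destabilization by m6A are hypothetical values, not measurements for the MALAT1 hairpin, and the two-state assumption ignores alternative structures and protein-induced unfolding.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_open(dg_fold_kcal, temp_celsius=37.0):
    """Fraction of molecules with the stem unpaired in a two-state model.

    dg_fold_kcal is the folding free energy of the stem in kcal/mol
    (negative values mean the closed stem is favored).
    """
    rt = R_KCAL * (temp_celsius + 273.15)
    k_fold = math.exp(-dg_fold_kcal / rt)  # ratio [closed] / [open]
    return 1.0 / (1.0 + k_fold)


STEM_DG = -2.0          # hypothetical stability of the unmethylated stem
M6A_PENALTY = 1.5       # hypothetical destabilization caused by m6A

unmethylated = fraction_open(STEM_DG)
methylated = fraction_open(STEM_DG + M6A_PENALTY)
```

With these assumed numbers the accessible fraction rises from about 4% to about 31% upon methylation, which shows how a modest change in duplex stability could translate into a several-fold change in the availability of a single-stranded protein binding site.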
Similarly, heterogeneous nuclear ribonucleoprotein G (hnRNP G) was identified as a protein that preferentially binds m6A-modified MALAT1 [115]. As in the case of hnRNP C, m6A methylation increases the accessibility of the hnRNP G binding site. Interestingly, binding of hnRNP G to RNA in this context appeared to be mediated by a low complexity domain and not its canonical RNA recognition motif (RRM). As with hnRNP C, transcriptome-wide analysis revealed widespread m6A sites that regulate hnRNP G binding, and hnRNP G co-regulated numerous alternative splicing events with METTL3 and METTL14. Overall, these findings suggest that m6A may have widespread indirect effects on splicing factor binding by altering pre-mRNA structure and consequently binding site accessibility. One additional study proposed that another hnRNP, hnRNP A2/B1, binds m6A directly. The authors identified an RGAC motif, similar to the RRACH METTL3 consensus sequence, as enriched in hnRNP A2/B1 binding sites. hnRNP A2/B1 associated with m6A-containing nuclear RNA, with some overlap between specific hnRNP A2/B1 binding events and m6A sites [79]. A more recent study showed by gel shift and isothermal titration calorimetry that hnRNP A2/B1 does not have increased affinity for m6A-containing RNA in vitro [116]. Furthermore, the crystal structure of the hnRNP A2/B1 RRMs in complex with RNA reported in that study did not reveal the aromatic cage-like surface that is required to directly contact m6A in the m6A “readers” YTHDC1 and YTHDF1 [116]. Therefore, hnRNP A2/B1 binding is likely indirectly affected by m6A’s ability to open local structure.
SRSF2 and FTO co-regulate alternative splicing
One study reported that SRSF2 binding at or around m6A sites increased upon FTO knockdown [47]. A panel of target genes that included the adipogenesis factor RUNX1T1 showed an increase in exon inclusion following FTO knockdown, which was correlated with increased m6A content and increased SRSF2 binding. FTO knockdown resulted in a decrease in the short, exon-skipped isoform of RUNX1T1, an isoform that was proposed to promote adipogenesis based on isoform-specific overexpression studies. Further analysis is necessary to determine if these alternative splicing events are indeed m6A dependent and not indirectly influenced by FTO depletion.
Potential “readers” of other modifications relevant to splicing
Whether mRNA modifications other than m6A are present in pre-mRNA and regulate pre-mRNA splicing is currently unknown. However, other modifications have already been shown to impact protein binding in endogenous or artificial contexts. For instance, m5C has one nuclear reader, ALYREF, that was shown to facilitate mRNA export from the nucleus [40]. ALYREF has not been implicated in splicing, but establishes precedent for other m5C-specific nuclear readers. Although no nuclear reader has been identified for Nm in mRNA, the presence of Nm has likewise been shown to affect protein interactions with mRNA, including that of the immune sensor RIG-I, demonstrating the capacity of Nm to influence protein binding [117]. Because m1A is incompatible with Watson-Crick base pairing, it has the potential to indirectly affect splicing factor binding via changes in RNA secondary structure.
Pseudouridine has been shown to affect multiple RNA-protein interactions, including those of well-characterized splicing factors. Artificial pseudouridylation of a single uridine at either of two positions in the polypyrimidine tract of an adenoviral pre-mRNA was sufficient to abolish in vitro binding of the core-splicing factor U2AF65 and inhibit splicing of the pseudouridylated intron [83]. This striking effect of pseudouridine was attributed to its rigidifying of the polypyrimidine tract since locked nucleic acid substitutions at the same positions, which constrained the flexibility of the RNA backbone, led to similar inhibition of U2AF65 binding and defective splicing [83]. Likewise, binding of the splicing factor MBNL1 was significantly affected when pseudouridines were artificially added to its recognition motif in pre-mRNA [118]. In another example, pseudouridine indirectly influenced PRP5 binding to the U2 snRNA by stabilizing a structure, the branchpoint-interacting stem loop [67], a mechanism that could also happen in mRNA. Additionally, binding of the cytoplasmic RNA-binding protein Pumilio 2 to its UGUAR binding motif was modestly decreased when the second uridine was pseudouridine [119]. UGΨAR is relatively abundant among endogenously pseudouridylated mRNAs due to pseudouridylation by PUS7 in the same context [3]. Together, these examples strongly suggest that if pseudouridine, m5C or Nm are identified in the core splicing recognition signals, or in splicing regulatory sequences, they could significantly impact splicing factor binding and splicing outcome.
CONCLUSIONS
Most mRNA modifying enzymes discovered to date localize to the nucleus where they may encounter nascent pre-mRNA. Diverse pre-mRNA modifications are therefore likely to add an additional layer of regulation to pre-mRNA processing as has been demonstrated from studies on m6A. In terms of splicing regulation, pre-mRNA modifications may influence which cis-regulatory elements are bound by a given trans-acting splicing factor and thereby mediate distinct regulation of a subset of targets to allow for fine tuning of splicing regulation against a backdrop of global regulation of splicing factor levels or activity. In this manner pre-mRNA targets of multiple splicing factors could be coordinately regulated and functionally linked in different cell types or under different growth conditions where the activity and/or the balance of RNA modifying enzymes may be different.
Our understanding of the function of pre-mRNA modifications in splicing is currently limited to examples of m6A-mediated splicing regulation. Whether other modifications are added to nascent pre-mRNA and, if so, whether and how they function in splicing regulation is mostly unknown. Moving forward, efforts to determine if and when a particular RNA modification is added to nascent pre-mRNA using approaches such as sequencing of chromatin-associated RNA [23, 24, 51–53] or metabolic labeling of newly transcribed RNA [49, 120–122] will be a critical step towards illuminating the potential functions of additional modifications in pre-mRNA processing. In addition, the development of new methods to site-specifically manipulate RNA modifications including m6A would significantly advance the field by making it possible to distinguish direct versus indirect effects of global perturbations of RNA modifying enzymes on pre-mRNA splicing and nuclear processing. Given the demonstrated ability of modified nucleotides to affect diverse RNA-protein interactions, future studies to identify time and mechanism of addition of RNA modifications to pre-mRNA and to determine the function of individual modifications are likely to reveal new mechanisms by which nuclear pre-mRNA processing is controlled.
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature