INTRODUCTION
RNA-binding proteins (RBPs) play important roles in the post-transcriptional regulation of RNAs [
1]. For example, RBP Nova regulates alternative splicing and poly(A) site usage [
2]. It is essential to study these regulatory roles of RBPs because the abnormal expression of RBP-encoding genes is associated with human diseases [
3–
6]. Recently, cross-linking and immunoprecipitation followed by sequencing (CLIP-seq) have been widely used to study RNA-protein interaction at high resolution [
2,
7–
10].
Correspondingly, a large number of computational approaches have been developed for analyzing CLIP-seq data [
11–
20], which can be classified into two major categories. The first class of computational tools are designed to detect RBP binding peaks or differential binding between two conditions [
12,
15,
16,
21]. The second category of computational tools are developed to identify the RBPs’ functional targets based on the binding peaks [
22–
24]. These computational tools help elucidate not only the biological functions of RBPs but also the underlying mechanisms.
Even though recent studies tried to dissect the functional roles of hundreds of human RBPs by associating their binding features with different functional assays, few studies have systematically examined the relationship between the position-dependent binding of RBPs and their functions. The availability of a large-scale collection of eCLIP data in the integrated ENCyclopedia of DNA Elements in the human genome (ENCODE) makes it possible to systematically analyze the association between RBPs’ positional binding features and their functional roles.
In this paper, we study the position-dependent functional bindings of RBPs by applying a new approach named Protein-RNA Association Strength (PRAS) [
22] to the eCLIP data of RNA decay regulators in K562 and HepG2 cells. We demonstrate that the translation termination site on the transcript is essential for RBP binding to achieve their RNA decay related functions. By comparing the RBP functional targets in the two aforementioned cell types, we report the cell type-specific RBP regulation. These analyses results shed light on the study of the mechanisms of RBPs’ functions.
RESULTS
Binding to the translation termination site is important to most human RBPs in the RNA decay regulation
We first aimed to study the role of the binding positions of RBPs in achieving their biological functions. Recently, a computational method named PRAS was developed for predicting functional targets of RBPs, which integrates the intensities and positions of the binding peaks of RBPs in the prediction. PRAS requires a reference site on a transcript as input, which is regarded as the most important binding site for the RBP regulation and should be chosen from 4 candidates: transcription start site (TSS), translation initiation site (TIS), translation termination site (TTS), and transcription end site (TES). By comparing the performance of the four reference sites in PRAS, we were able to evaluate the importance of the reference sites for the RBP to achieve its function. The study design, data structure and analysis procedures are summarized in Fig. 1. Specifically, we analyzed the eCLIP data [
25] in two human cell types, K562 and HepG2, from the ENCODE Consortium. Following the similar procedure as in our previous study [
22], we applied PRAS to the RBPs that are related to the RNA decay function [
26]. To quantify the RBP’s function in RNA decay, we collected the differential expression (DE) analysis results by DESeq [
27] between the wild-type and RBP knockdown (KD) conditions from ENCODE for the corresponding RBPs in the two aforementioned cell types. We extracted the DE genes (adjusted
p-value<= 0.05) and their log fold change (LFC) of the expression values in the RBP KD condition over wild-type. The genes with positive or negative LFCs were regarded as degraded and stabilized targets of the corresponding RBP, respectively. We then applied PRAS to the eCLIP data of these target genes using the 4 reference site candidates, respectively. After obtaining the PRAS scores, we did the ROC analysis between the RBP-stabilized and RBP-degraded genes for each RBP. To set a comparison with other existing methods, we included the performance of the two other methods mentioned in our previous study [
22]: peak per kilobase (PPK) in Wang
et al. [
24], which calculated the score as the number of significant CLIP peaks per kilobase; and the expressRNA procedure of Rot
et al. [
23], which sums the number of reads in CLIP peaks within 200 nt upstream and downstream flanking the polyadenylation sites. We noted the PRAS with TSS, TIS, TTS and TES as the reference site as PRAS-TSS, PRAS-TIS, PRAS-TTS, and PRAS-TES, respectively. For each RBP, we assessed the performance of the methods by calculating the area under the curve (AUC) of the ROC curves. For example, PRAS-TTS obtained the highest AUC (0.702) among the compared methods in differentiating the stabilized and degraded targets of AU RNA binding methylglutaconyl-CoA hydratase (AUH) (Fig. 2A), a human RBP that is related to the mRNA decay via binding to the AU-rich elements (AREs). We found that the performance of PRAS on the prediction of AUH regulated targets varies a lot with different reference sites (Fig. 2A), among which PRAS-TSS (AUC= 0.552) and PRAS-TIS (AUC= 0.57) have much worse performance than PRAS-TTS. These results not only indicate that PRAS can capture the functional targets with the appropriate reference site but also suggest the translation termination site is important for the function of AUH in regulating RNA decay.
By applying the similar analysis to other known RNA decay regulators, we found PRAS-TTS scores achieved significantly higher AUC compared to PRAS-TSS, PRAS-TIS, and PRAS-TES, with the Wilcoxon signed-rank test’s
p-value equal to 7.5e–7, 3.8e–3, and 1.1e–3, respectively. However, the largest value of AUC was not always achieved by using TTS as the reference site (Fig. 2B), which suggests that the RBPs achieved their functions by using different reference sites. By assigning the reference site with the largest AUC value among the PRAS scores to each RBP, we found nearly half of the RBPs (44%) belongs to the class with TTS as the reference site (Fig. 3A). In addition, we examined the reference site usages for the stabilization and degradation functions of RBPs. Following the procedure of Wagnon
et al. [
28], we further separated the RBPs to two groups: RBPs with stronger change in expression for the stabilized targets than degraded targets are regarded as “stabilization” group, and RBPs with stronger change in expression for the degraded targets than stabilized ones are regarded as the “degradation” group. By comparing the reference site usage in these two groups, we found that the “stabilization” group prefer TES and TTS as the reference site, while the “degradation” group prefer TTS and TIS (Fig. 3B). On one hand, the common usage of the TTS indicates that this position is important to most human RBPs in the RNA decay regulation. On the other hand, the difference between the “stabilization” and “degradation” group of RBPs in their choice of reference site indicates the two directions of regulation is associated with different positions on the transcript. To further investigate the binding around the reference sites for the two regulation groups, we performed the meta-gene analysis on the eCLIP data of the RBPs. We calculated the eCLIP enrichment as
as suggested by Van Nostrand
et al. [
26], where
is the IP read count and
is the input read count from the eCLIP experiments [
26]. In this way, a positive and a negative peak intensity represents the enriched and the depleted eCLIP signal compared to the input experiment, respectively. We found that both the “stabilization” and “degradation” RBPs have an enrichment of binding close to the TTS compared to the other reference sites (Fig. 3C). However, the “degradation” RBPs achieve the strongest binding at the TTS while the “stabilization” RBPs’ binding peak is around 200nt downstream to the TTS. Even though RBPs in the “stabilization” group prefer TES as the reference site, the binding intensities around TES is not comparable to those around TTS. In addition, the binding intensities of the “degradation” RBPs around TIS is much lower than those around TTS, regardless of the fact that TIS is the second choice as the reference site for “degradation” RBPs. All these results suggest that the TTS is essential for RBP’s binding to regulate the RNA fate.
We sought to test whether the pattern of the reference site usage holds in the ROC analysis between RBP-stabilized vs. RBP-degraded targets selected based on different rigorousness. Specifically, we first set a sequence of cutoffs as the quantiles (10%, 30%, and 50%) of the distribution of the absolute value of the expression LFCs. Then, for each cutoff, we extracted a subset of genes whose absolute expression LFC is larger or equal to the cutoff. Third, for each subset of potential RBP targets, we calculated the AUC in the ROC analysis of the PRAS scores with all four reference sites. Finally, we assigned the reference site with the largest AUC value to the corresponding RBP. We observed that the proportion of RBPs with TTS as the reference site is relatively consistent (~50%) across different expression LFC cutoff (Supplementary Fig. S1). The results suggest that TTS that is selected by PRAS for most RNA decay regulators is essential to the RBPs’ binding to achieve their functions.
We then evaluated the strength of the RBPs’ regulation by examining the absolute value of expression LFC in the RBP KD condition over wild-type of their top ranked targets. We found that the top 500 targets of the “TTS” group have significantly higher absolute value of expression LFC than those of the “TSS” group with the Kolmogorov–Smirnov test (KS-test) p-value less than 2.2e–16 (Fig. 4A). Surprisingly, we found that the “TES” group targets also have significantly higher absolute value of expression LFC than those of the “TSS” groups with the KS-test p-value less than 2.2e–16. Since TTS and TES are the boundaries of the 3’ UTR of RNA, these results indicate the essential association between the 3’UTR of transcripts and the regulation of their expression level by RBPs. We then selected four representative examples with the largest AUC value from each reference site group, respectively. By examining the difference in the absolute gene expression LFC between the top 500 and bottom 500 ranked target genes by their corresponding PRAS scores, we found that AUH in the TTS group obtained larger difference between the top and bottom ranked genes (Fig. 4D, KS test p-value equal to 0), compared to YBX3 in the TSS group (Fig. 4B, KS test p-value equal to 2.9e–6), FMR1 in the TIS group (Fig. 4C, KS test p-value equal to 7.1e–4) and PABPC4 in the TES group (Fig. 4E, KS test p-value equal to 2.9e–8). The two examples from the TTS and TES groups again show larger difference between top and bottom ranked targets than TSS and TIS groups, which confirms the importance of the 3’UTR for RBPs’ binding in RNA decay regulation.
Cell type specific regulation is a typical characteristic of RBP
We then examined the difference in the RBPs’ regulation between two cell types, K562 and HepG2. First, we found that the majority of the RBPs (62%) have different reference site usage in their RNA decay regulation (Supplementary Fig. S2). To study the RBP’s regulation difference in the target genes between two cell types, we analyzed the overlap of the top 500 ranked targets between the two cell types for each RBP. We found that the regulated group of genes are significantly overlapping with each other between K562 and HepG2 for all the 21 studied RBPs, with the adjusted Fisher’s Exact test
p-values plotted in Fig. 5A. This result indicates that the RBPs tend to target the similar group of genes in different cell types. However, by comparing the LFC of expression value in the KD condition between the top 500 targets in K562 and those in HepG2, we found large difference between the two cell types in the RNA decay regulation by the RBPs. Specifically, 13 out of the 21 studied RBPs have shown significant difference in the gene expression LFC between the top 500 ranked targets in K562 and those in HepG2 (Fig. 5B), which suggests that the stabilization and degradation function can vary across cell types even though the RBPs can target the similar group of genes. For example, Staphylococcal Nuclease and tudor Domain containing 1 (SND1) have significantly overlapping functional targets in the two cell types, with the Fisher’s Exact test
p-value equal to 4.9e-182 (Fig. 5A). We found that 336 out of the top 500 targets (67.2%) of SND1 in HepG2 obtained negative LFC value in gene expression between the SND1 KD and wild-type condition, which sugggests that the targets were stabilized by SND1 in the wild-type condition. However, SND1 degraded most of its targets (309 out of the top 500, 61.8%) in K562, in which the expression LFC is significantly different comparing to that in HepG2, with the Wilcoxon rank sum test
p-value equal to 1.57e-29 (Fig. 5C). Another RBP, Protein Quaking (QKI) that is associated with RNA stability [
29], showed opposite pattern to SND1 between the two cell types (Fig. 5B), where the top 500 ranked targets in HepG2 were significantly degraded comparing to those in K562, with the Wilcoxon rank sum test
p-value equal to 0 (Fig. 5D). The results of using the top 200 ranked targets provided the similar pattern as those of top 500 targets (Supplementary Fig. S3). These results suggest that the regulation of RBPs being studied follows a cell type specific pattern.
To further characterize the potential causes of the cell type specific RBP regulation, we examined the binding profiles in the two cell types for each RBP. We extracted the DE genes (adjusted p-value<= 0.05) of SND1 and divided them as the SND1-stabilized and SND1-degraded targets based on the direction of the gene expression LFC. We did the meta-gene analysis of the SND1 eCLIP enrichment (computed as above) for both stabilized and degraded targets in the two cell types (Fig. 6). We found that in HepG2, SND1-stabilized genes and SND1-degraded genes had different binding strength: while the SND1-stabilized genes were strongly enriched in coding sequence (CDS) region (Fig. 6A), the SND1-degraded genes were only weakly enriched in CDS (Fig. 6B). In addition, we found that SND1-stabilized targets showed distinctly stronger enrichment on CDS in HepG2 compared to that in K562 (Fig. 6A). However, the difference between the two cell types was subtle for SND1-degraded targets (Fig. 6B). These results of the example RBP, SND1, suggest that the different regulation between HepG2 and K562 by the same RBP is associated with the difference of the binding strength between the two cell types.
CONCLUSIONS AND DISCUSSIONS
In this study, through evaluating the performance of PRAS with different reference sites, we demonstrated that TTS, followed by TES and TIS, plays relatively more important roles for RNA decay regulation of RBPs. The comparison in RBP regulation between K562 and HepG2 cells informed us that the cell type-specific binding strength and functions are typical features for RBPs. These findings not only extended our knowledge in the regulation of RBPs but also made it possible to investigate either a single RBP or multiple RBPs’ binding patterns. Our study also threw light on the RBP regulatory networks in the future by showing that the position-specific binding of RBPs are closely related to their functions.
The fact that many RBPs bind to the 3’UTR to play an important role in regulating RNA stability has been reported in a wide range of studies. For example, a KH domain-containing ARE-binding protein (KSRP), is demonstrated to be an essential factor for mRNA decay via the 3’UTR [
30]. The underlying mechanisms were disclosed as that the ability of KSRP to promote mRNA decay correlates with the ability of binding to the ARE [
30]. Another RBP named HuR has been reported to bind mainly to the 3’UTR and to regulate the degradation of thousands of target genes [
31]. It was found that the KD of HuR triggers the upregulation of miR-7 that is a microRNA promoting mRNA decay [
31]. It is demonstrated in another study that UPF1 can sense the length of 3’UTR and accelerate the decay of mRNAs with relatively longer 3’UTRs [
32]. It was also shown that the translational readthrough around the translation termination site affects the binding of UPF1, which results in the inhibition of mRNA decay [
32]. Our large-scale analysis not only confirms the importance of 3’UTR for RBPs’ binding and regulation of mRNA decay, but also provides a refined list of candidates for further experimental validation.
The RBP binding and regulation differences between K562 and HepG2 suggest the cell type-specific mechanisms of mRNA stability regulation. Admittedly, other potential factors should also be taken into consideration. For instance, the RBPs’ regulatory activity and their targets could be modulated through cell type-specific gene expression [
26]. The difference of gene expression level across cell types can result in the variation of IP efficiency as well [
26]. In addition, the RBP knockdown efficiency may also differ, which makes the differential gene expression analysis less comparable across cell lines. However, by connecting the cell type-specific function with cell type-specific binding preferences, our analysis provides a refined list of RBP candidates for further experimental validation to understand the mechanisms.
MATERIALS AND METHODS
Data collection
We collected the eCLIP data and differential expression analysis results from the ENCODE Consortium [
33,
34]. We included the accession numbers in the Supplementary Materials 1 and Supplementary Materials 2 for eCLIP data and RNA-seq data, respectively.
PRAS
PRAS is a newly developed tool for eCLIP-seq analysis, which integrates the intensities and positions of the binding peaks of RBPs for functional mRNA targets prediction [
22]. Specifically, PRAS has three major steps to perform the functional targets prediction. First, PRAS merges the significant cross-linking sites that are within a small interval of each other (20 nt is the default setting). If the binding peaks are provided, PRAS will use them directly. Second, PRAS assign a reference site for the RBP score calculation. The reference site can be provided by the user is there is existing knowledge about the RBP; or it is automatically calculated by PRAS based on the binding intensities of the peaks. Third, PRAS scores the genes based on the intensities and distances to the reference site of the peaks. The gene list ranked by the PRAS score will be generated.
RBPs’ functional targets prediction
We applied PRAS to predict the functional targets of RBPs. Four reference sites were used: TSS, TIS, TTS, and TES. We set
as suggested by Lin
et al. [
22]. We only considered mature mRNAs but not the pre-mRNAs throughout the analysis.
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature