ShapeShifter: a novel approach for identifying and quantifying stable lariat intronic species in RNAseq data
Allison J Taggart, William G Fairbrother
Background: Most intronic lariats are rapidly turned over after splicing. However, recent research suggests that some introns may have additional post-splicing functions. Current bioinformatics methods for identifying lariats require a sequencing read that traverses the lariat branchpoint. This approach yields precise branchpoint sequence and position information, but it is limited in its ability to quantify the abundance of stabilized lariat species in a given RNAseq sample. New bioinformatic tools are needed to address these emerging biological questions.
Methods: We used an unsupervised machine learning approach on sequencing reads from publicly available ENCODE data to learn to identify and quantify lariats based on RNAseq read coverage shape.
Results: We developed ShapeShifter, a novel approach for identifying and quantifying stable lariat species in RNAseq datasets. We learned a characteristic “lariat” curve from ENCODE RNAseq data and used it to estimate intron abundances from read coverage. Using this method, we discovered new stable introns in these samples that were not detected by the older, branchpoint-traversing read method.
Conclusions: ShapeShifter provides a robust approach to detecting and quantifying stable lariat species.
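The abstract does not specify which unsupervised algorithm ShapeShifter uses, so the following is only a minimal sketch of the general idea of grouping introns by read coverage shape. It assumes per-base coverage vectors are already available for each intron, rescales each to a fixed number of bins so that introns of different lengths and expression levels are comparable, and then applies plain k-means as a stand-in clustering step. The function names `rebin_coverage` and `kmeans_shapes`, the bin count, and the synthetic input profiles are all hypothetical and are not taken from the paper.

```python
import numpy as np

def rebin_coverage(coverage, n_bins=20):
    """Average per-base coverage into n_bins bins and scale the result to sum to 1.

    Hypothetical helper: removes the effect of intron length and read depth
    so that only the shape of the coverage profile remains.
    """
    cov = np.asarray(coverage, dtype=float)
    if cov.size < n_bins:
        raise ValueError("intron is shorter than the number of bins")
    # np.array_split tolerates lengths that are not a multiple of n_bins
    binned = np.array([chunk.mean() for chunk in np.array_split(cov, n_bins)])
    total = binned.sum()
    return binned / total if total > 0 else binned

def kmeans_shapes(profiles, k=2, n_iter=50):
    """Cluster shape profiles with k-means (a stand-in, not the authors' method)."""
    X = np.asarray(profiles, dtype=float)
    # Deterministic farthest-point initialisation: start from the first profile,
    # then repeatedly add the profile farthest from all chosen centroids.
    chosen = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(X[np.argmax(d)])
    centroids = np.array(chosen)

    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every profile to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic, made-up coverage vectors purely for illustration:
# three sloped profiles and three flat ones, at different depths and lengths.
sloped = [np.linspace(10, 1, 200) * depth for depth in (1, 3, 7)]
flat = [np.full(150, depth) for depth in (2.0, 5.0, 9.0)]
profiles = [rebin_coverage(c) for c in sloped + flat]
labels, centroids = kmeans_shapes(profiles, k=2)
```

In a sketch like this, a learned centroid plays the role of a characteristic curve, and an intron's abundance could then be estimated by fitting its observed coverage to that curve. How ShapeShifter actually performs that step is not described in the abstract.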