Observations on potential novel transcripts from RNA-Seq data
Chao YE, Linxi LIU, Xi WANG, Xuegong ZHANG
With the rapid development of next-generation deep sequencing technologies, sequencing cDNA reverse-transcribed from RNA molecules (RNA-Seq) has become a key approach for studying gene expression and transcriptomes. Because RNA-Seq does not rely on annotations of known genes, it offers the opportunity to discover transcripts that are not annotated in current databases. Studying the distribution of RNA-Seq signals, and taking a systematic view of the potential new transcripts those signals reveal, is an important step toward understanding transcriptomes.
RNA-Seq / novel transcripts / next generation sequencing / bioinformatics