NPEST: a nonparametric method and a database for transcription start site prediction

Tatiana Tatarinova , Alona Kryshchenko , Martin Triska , Mehedi Hassan , Denis Murphy , Michael Neely , Alan Schumitzky

Quant. Biol., 2013, Vol. 1, Issue (4): 261-271. DOI: 10.1007/s40484-013-0022-2
RESEARCH ARTICLE


Abstract

In this paper we present NPEST, a novel tool for analyzing expressed sequence tag (EST) distributions and predicting transcription start sites (TSSs). The method estimates the unknown probability distribution of ESTs using a maximum likelihood (ML) approach, and this estimate is then used to predict TSS positions. Accurate identification of TSSs is an important genomics task because the position of regulatory elements relative to the TSS can strongly affect gene regulation, and the performance of promoter motif-finding methods depends on correct TSS identification. Our probabilistic approach extends recognition to multiple TSSs per locus, which may help improve the understanding of alternative splicing mechanisms. We present an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16,520 loci and developed a database of TSSs, which is now publicly available at www.glacombio.net/NPEST.
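
The idea of estimating an EST position distribution by maximum likelihood and reading TSS candidates from it can be illustrated with a minimal sketch. This is not the NPEST implementation: the Gaussian kernel, the fixed width `sigma`, the fixed candidate grid, the EM weight update, and the names `npml_weights` and `predict_tss` are all assumptions chosen for illustration. The grid is assumed to span the observed positions.

```python
import math


def npml_weights(positions, grid, sigma=5.0, n_iter=200):
    """Estimate mixing weights on a fixed grid of candidate sites by EM.

    Assumption (not from the paper): each EST 5' end is drawn from a
    Gaussian of fixed width `sigma` centred on one of the grid points.
    """
    def density(x, mu):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    # like[i][k] = density of observation i under candidate site k
    like = [[density(x, mu) for mu in grid] for x in positions]
    weights = [1.0 / len(grid)] * len(grid)
    for _ in range(n_iter):
        new = [0.0] * len(grid)
        for row in like:
            # Mixture density of this observation under the current weights
            mix = sum(w * f for w, f in zip(weights, row))
            for k, f in enumerate(row):
                # Posterior share of candidate site k for this observation
                new[k] += weights[k] * f / mix
        weights = [v / len(positions) for v in new]
    return weights


def predict_tss(positions, grid, min_weight=0.05, **kwargs):
    """Return (site, weight) pairs whose estimated weight is at least min_weight."""
    weights = npml_weights(positions, grid, **kwargs)
    return [(mu, w) for mu, w in zip(grid, weights) if w >= min_weight]


# Hypothetical EST 5' end positions forming two clusters at one locus
ests = [98, 99, 100, 100, 100, 101, 102, 158, 160, 160, 161, 162]
candidates = list(range(80, 181, 10))
sites = predict_tss(ests, candidates)
```

In this toy setting the weights concentrate on the grid points nearest the two clusters, so a single locus can yield more than one predicted site, each with an estimated share of the ESTs.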

Keywords

transcription start site (TSS) / nonparametric maximum likelihood

Cite this article

Tatiana Tatarinova, Alona Kryshchenko, Martin Triska, Mehedi Hassan, Denis Murphy, Michael Neely, Alan Schumitzky. NPEST: a nonparametric method and a database for transcription start site prediction. Quant. Biol., 2013, 1(4): 261-271. DOI: 10.1007/s40484-013-0022-2



RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
