Determination of specificity influencing residues for key transcription factor families

Ronak Y. Patel, Christian Garde, Gary D. Stormo

Quant. Biol. ›› 2015, Vol. 3 ›› Issue (3) : 115-123. DOI: 10.1007/s40484-015-0045-y
RESEARCH ARTICLE

Abstract

Transcription factors (TFs) are major modulators of transcription and of subsequent cellular processes. The binding of TFs to specific regulatory elements is governed by their specificity. Given the gap between the number of TFs with known sequence and those with known specificity, frameworks for predicting specificity are highly desirable. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures such as mutual information (MI) fail to delineate specificity influencing residues (SIRs) from alignments because of the constraints imposed by the three-dimensional structure of the protein: structural restraints on the evolution of the amino-acid sequence lead to the identification of false SIRs. In this manuscript we extend three methods (direct information, PSICOV and adjusted mutual information), which have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts, to identify SIRs from joint alignments of amino acids and specificity. We predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared the results to those obtained with MI. Using various measures, we show that the three methods perform comparably to one another and better than MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package, available along with the alignments at http://stormo.wustl.edu/SpecPred.
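
To make the baseline concrete, the following is a minimal Python sketch of how plain MI could score each column of a protein alignment against a discrete specificity label. It is not the authors' R package and does not implement direct information, PSICOV or adjusted mutual information; the function names (`mutual_information`, `rank_columns`), the toy alignment and the specificity labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of residues in the column
    count_y = Counter(ys)            # marginal counts of specificity labels
    count_xy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_columns(alignment, labels):
    """Score every alignment column against the labels, highest MI first."""
    columns = list(zip(*alignment))  # transpose rows (proteins) into columns
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(enumerate(scores), key=lambda item: -item[1])


# Hypothetical toy data: four aligned protein fragments and an assumed
# specificity class for each one.
toy_alignment = ["AKR", "AKS", "AQR", "AQS"]
toy_labels = ["TAAT", "TAAT", "TGAT", "TGAT"]

ranking = rank_columns(toy_alignment, toy_labels)
```

In this toy case only the middle column varies together with the label, so it ranks first. On real alignments, as the abstract notes, columns that co-vary with the label only indirectly, through structural coupling to a true SIR, can also receive high MI, which is the problem the three extended methods aim to address.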

Keywords

protein-DNA interactions / residue co-variance / motifs / co-evolution / feature selection / direct information / specificity determinants

Cite this article

Ronak Y. Patel, Christian Garde, Gary D. Stormo. Determination of specificity influencing residues for key transcription factor families. Quant. Biol., 2015, 3(3): 115‒123 https://doi.org/10.1007/s40484-015-0045-y

ACKNOWLEDGEMENTS

This work was supported by NIH grant HG000249 to GDS. We also thank Chris Workman (DTU Systems Biology) and the Otto Monsted Foundation (J. NO. 13-70-1193) for the support of CG during his visit to Washington University.

COMPLIANCE WITH ETHICS GUIDELINES

Ronak Y. Patel, Christian Garde and Gary D. Stormo declare they have no conflict of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg