The statistical practice of the GTEx Project: from single to multiple tissues
Xu Liao, Xiaoran Chai, Xingjie Shi, Lin S. Chen, Jin Liu
Background: The Genotype-Tissue Expression (GTEx) Project has collected genetic and transcriptome profiles from a wide spectrum of tissues in nearly 1,000 deceased individuals. This provides an opportunity to study the regulatory roles of genetic variants in transcriptome activities from both cross-tissue and tissue-specific perspectives. Moreover, transcriptome activities (e.g., transcript abundance and alternative splicing) can be viewed as mediators through which genotype alters phenotype. Knowing the transcriptome status associated with a genotype, researchers can better understand the biological and molecular mechanisms through which genetic risk variants act in complex traits.
Results: In this article, we first explore the genetic architecture of gene expression traits and then review recent methods for quantitative trait locus (QTL) analysis and co-expression network analysis. To illustrate how associations between genotype and transcriptome status can be used, we briefly review methods that integrate expression/splicing QTL information, either directly or indirectly, into genome-wide association studies (GWASs).
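The core step of a single-tissue eQTL analysis can be sketched as a marginal scan: for one gene, regress expression on each SNP's allele dosage and report the slope, its t statistic and a two-sided p-value. This is a minimal sketch under stated assumptions: expression is taken as already normalized with covariates (e.g., hidden factors, genotype principal components) regressed out, the p-value uses a normal approximation to the t distribution, and the function name `marginal_eqtl_scan` and the toy data are hypothetical. Dedicated tools such as Matrix eQTL and FastQTL compute exact t-based p-values and, in FastQTL's case, permutation-based gene-level significance.

```python
import math
from statistics import NormalDist


def marginal_eqtl_scan(expression, genotypes):
    """Marginal (one-SNP-at-a-time) eQTL scan for a single gene.

    expression: n expression values, assumed already normalized with
        covariates regressed out.
    genotypes: dict mapping a SNP id to n allele dosages (0, 1 or 2).
    Returns a dict mapping SNP id -> (beta, t_stat, p_value); the two-sided
    p-value uses a normal approximation to the t distribution.
    """
    n = len(expression)
    y_bar = sum(expression) / n
    results = {}
    for snp, g in genotypes.items():
        g_bar = sum(g) / n
        sxx = sum((gi - g_bar) ** 2 for gi in g)
        if sxx == 0:
            continue  # monomorphic SNP: effect is not estimable
        sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, expression))
        beta = sxy / sxx                      # least-squares slope
        alpha = y_bar - beta * g_bar          # intercept
        rss = sum((yi - alpha - beta * gi) ** 2 for gi, yi in zip(g, expression))
        se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
        if se == 0:
            continue  # degenerate perfect fit; skipped in this sketch
        t_stat = beta / se
        p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
        results[snp] = (beta, t_stat, p_value)
    return results


# Toy example: rsA tracks expression, rsB does not.
expr = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8]
geno = {"rsA": [0, 0, 1, 1, 2, 2], "rsB": [1, 0, 1, 0, 1, 0]}
for snp, (beta, t, p) in marginal_eqtl_scan(expr, geno).items():
    print(f"{snp}: beta={beta:.3f} t={t:.2f} p={p:.3g}")
```

With six samples the normal approximation is crude; it is used here only to keep the sketch dependency-free. In a multi-tissue setting the same scan would be repeated per tissue before any joint modelling.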
Conclusions: The GTEx Project provides the largest, and a highly useful, resource for investigating the associations between genotype and transcriptome status. Integrating results from the GTEx Project with existing GWASs further advances our understanding of the role that gene expression changes play in bridging genetic variants and complex traits.
In genetics, extensive efforts have been made to investigate the associations between genetic variants and disease traits. However, we still lack knowledge of the underlying biological mechanisms through which genetic factors affect phenotypic outcomes. The Genotype-Tissue Expression (GTEx) Project offers several angles on this question, including quantitative trait loci, alternative splicing patterns, and tissue-specific effects of genetic variants. In this article, we provide a comprehensive review of its methods and results, and suggest several downstream analysis methods (e.g., transcriptome-wide association studies (TWAS) and co-expression network analysis) through which we can probe deeper into the regulatory mechanisms triggered by genetic factors.
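The two-stage logic behind TWAS, which treats expression as a mediator between genotype and trait, can be sketched as follows. Stage 1 learns SNP weights that predict one gene's expression from its cis-genotypes in a GTEx-like reference panel; stage 2 imputes expression in a GWAS cohort and tests the imputed values against the trait. This is a minimal sketch under loudly stated assumptions: individual-level genotypes are available in both samples, ridge regression stands in for the prediction model (PrediXcan, for instance, uses elastic net), p-values use a normal approximation, and all function names, the penalty `lam` and the simulated effect sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def center(values):
    """Return a mean-centered copy of a list of numbers."""
    m = fmean(values)
    return [v - m for v in values]


def solve(a, b):
    """Solve the square system a x = b (Gauss-Jordan, partial pivoting)."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]


def fit_expression_weights(ref_snps, ref_expression, lam=1.0):
    """Stage 1 (hypothetical helper): ridge weights predicting one gene's expression.

    ref_snps: p SNP columns, each holding n_ref allele dosages.
    ref_expression: n_ref expression values for the gene.
    """
    xs = [center(col) for col in ref_snps]
    y = center(ref_expression)
    p = len(xs)
    # Solve (X'X + lam * I) w = X'y on centered data.
    xtx = [[sum(a * b for a, b in zip(xs[i], xs[j])) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(xs[i], y)) for i in range(p)]
    return solve(xtx, xty)


def twas_test(weights, gwas_snps, trait):
    """Stage 2 (hypothetical helper): impute expression, regress the trait on it.

    Returns (beta, z, p) with a normal-approximation two-sided p-value.
    """
    n = len(trait)
    pred = [sum(w * col[i] for w, col in zip(weights, gwas_snps)) for i in range(n)]
    x, y = center(pred), center(trait)
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(x, y))
    z = beta / math.sqrt(rss / (n - 2) / sxx)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, z, p


def dosages(rng, n, maf=0.3):
    """Simulate n allele dosages (0, 1 or 2) under Hardy-Weinberg equilibrium."""
    return [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    true_w = [1.0, 0.5, 0.0]  # hypothetical cis-SNP effects on expression
    ref = [dosages(rng, 200) for _ in true_w]
    expr = [sum(w * s[i] for w, s in zip(true_w, ref)) + rng.gauss(0, 0.5)
            for i in range(200)]
    w_hat = fit_expression_weights(ref, expr)
    gwas = [dosages(rng, 500) for _ in true_w]
    trait = [0.8 * sum(w * s[i] for w, s in zip(true_w, gwas)) + rng.gauss(0, 1)
             for i in range(500)]
    print("weights:", [round(w, 2) for w in w_hat])
    print("TWAS beta=%.2f z=%.1f p=%.2g" % twas_test(w_hat, gwas, trait))
```

Because stage 2 treats the imputed expression as fixed, uncertainty in the stage-1 weights is ignored in this sketch.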
the Genotype-Tissue Expression Project / quantitative trait loci (QTL) / transcriptome-wide association studies / genome-wide association studies