Polygenic risk scores: effect estimation and model optimization
Zijie Zhao, Jie Song, Tuo Wang, Qiongshi Lu
Background: The polygenic risk score (PRS), derived from summary statistics of genome-wide association studies (GWAS), is a useful tool for inferring an individual’s genetic risk for health outcomes and has gained increasing popularity in human genetics research. In its simplest form, PRS is both computationally efficient and easily accessible, yet its predictive performance remains moderate for diseases and traits.
Results: We provide an overview of recent advances in statistical methods that improve PRS performance by incorporating information from linkage disequilibrium, functional annotation, and pleiotropy. We also introduce model validation methods that fine-tune PRS using GWAS summary statistics.
Conclusion: In this review, we showcase methodological advances and current limitations of PRS, and discuss several emerging issues in risk prediction research.
The proliferation of well-powered genome-wide association studies (GWAS) has facilitated the rapid development of polygenic risk scores (PRS). Many post-GWAS PRS methods have been introduced to directly address the mediocre prediction accuracy of traditional PRS built upon marginal effect estimates from GWAS. This review first summarizes PRS methods that draw on different biological concepts, including linkage disequilibrium (LD), functional annotation, and pleiotropy, to better quantify SNP effects. We then introduce recent PRS frameworks that enable model optimization using summary statistics. Finally, we point out current pitfalls of risk prediction research. We expect methods that address these challenges to emerge in the near future.
Keywords: GWAS / polygenic risk score / summary statistics / model selection
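To make the "traditional PRS built upon marginal estimates" concrete, the following is a minimal sketch of the simplest form of the score: a weighted sum of each individual's allele counts, with the marginal GWAS effect estimates as weights. The function name `simple_prs` and the toy genotypes and effect sizes are hypothetical and chosen only for illustration. The sketch does not implement any of the LD-, annotation-, or pleiotropy-aware methods summarized in this review.

```python
import numpy as np


def simple_prs(genotypes, betas):
    """Return one score per individual: sum over SNPs of allele count x effect size.

    genotypes: (n_individuals, n_snps) array of effect-allele counts (0, 1, or 2).
    betas:     (n_snps,) array of marginal GWAS effect estimates.
    Both inputs are assumed to be aligned to the same SNPs and effect alleles.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if genotypes.ndim != 2 or genotypes.shape[1] != betas.shape[0]:
        raise ValueError("genotypes must be (n_individuals, n_snps) and match betas")
    return genotypes @ betas


# Toy data (hypothetical): two individuals, three SNPs.
toy_genotypes = np.array([[0, 1, 2],
                          [2, 0, 1]])
toy_betas = np.array([0.10, -0.20, 0.05])

toy_scores = simple_prs(toy_genotypes, toy_betas)
print(toy_scores)
```

Because each SNP enters with its marginal estimate and no adjustment for correlation among SNPs, this form is cheap to compute, which is consistent with the efficiency noted above; the methods covered in the review aim to replace these weights with better effect estimates.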