Polygenic risk scores: effect estimation and model optimization

Zijie Zhao, Jie Song, Tuo Wang, Qiongshi Lu

Quant. Biol., 2021, Vol. 9, Issue 2: 133-140. DOI: 10.15302/J-QB-021-0238
REVIEW

Abstract

Background: A polygenic risk score (PRS) derived from summary statistics of genome-wide association studies (GWAS) is a useful tool for inferring an individual’s genetic risk for health outcomes and has gained increasing popularity in human genetics research. In its simplest form, PRS is both computationally efficient and easily accessible, yet its predictive performance remains moderate for many diseases and traits.
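In its simplest form, a PRS is a weighted sum of an individual's effect-allele counts, with weights taken from GWAS marginal effect estimates and, often, only SNPs below a p-value threshold retained. The sketch below is an illustrative assumption, not the method of any specific tool discussed in the review: `simple_prs` and its arguments are hypothetical names, and it deliberately skips the LD clumping step that pipelines such as PLINK or PRSice-2 normally apply before thresholding.

```python
from typing import List, Optional


def simple_prs(
    dosages: List[List[float]],
    betas: List[float],
    pvalues: Optional[List[float]] = None,
    p_threshold: float = 1.0,
) -> List[float]:
    """Hypothetical helper: PRS_i = sum_j beta_j * x_ij over SNPs with p_j <= p_threshold.

    dosages[i][j] is the count (0, 1, 2) of the effect allele of SNP j in person i,
    assumed to be coded against the same effect allele as betas[j].
    No LD clumping is done here, so correlated SNPs are all counted.
    """
    if pvalues is None:
        pvalues = [0.0] * len(betas)  # no p-values supplied: keep every SNP
    keep = [j for j, p in enumerate(pvalues) if p <= p_threshold]
    return [sum(betas[j] * row[j] for j in keep) for row in dosages]


if __name__ == "__main__":
    dosages = [[0, 1, 2], [2, 0, 1]]
    betas = [0.1, -0.2, 0.3]
    pvalues = [1e-8, 0.5, 1e-3]
    print(simple_prs(dosages, betas))                             # all SNPs
    print(simple_prs(dosages, betas, pvalues, p_threshold=0.01))  # SNPs 0 and 2 only
```

The p-value threshold is the main tuning parameter of this kind of score; it is typically chosen by comparing prediction accuracy across several thresholds in a validation sample.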

Results: We provide an overview of recent advances in statistical methods to improve PRS’s performance by incorporating information from linkage disequilibrium, functional annotation, and pleiotropy. We also introduce model validation methods that fine-tune PRS using GWAS summary statistics.
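As one illustration of how LD information can be folded into effect estimates, the infinitesimal-model estimator used in LDpred (often called LDpred-inf) shrinks marginal effects through the LD matrix: β_inf ≈ (D + M/(N·h²)·I)⁻¹ β̃, where β̃ are marginal effects on standardized genotypes, D is the SNP correlation (LD) matrix for a region, N is the GWAS sample size, and h² is the heritability spread over M SNPs. The sketch below assumes these inputs are already available and uses a plain Gaussian-elimination solver to stay dependency-free; the function names are hypothetical, and real implementations work region by region with reference-panel LD.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs are not modified
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ld_shrunk_effects(
    marginal: List[float], ld: Matrix, n_gwas: float, h2: float, m_snps: int
) -> List[float]:
    """Infinitesimal-model posterior mean (LDpred-inf style): (D + M/(N*h2) * I)^-1 @ marginal.

    marginal: marginal effects on standardized genotypes; ld: SNP correlation matrix D.
    """
    k = len(marginal)
    ridge = m_snps / (n_gwas * h2)  # larger GWAS or higher h2 -> less shrinkage
    a = [[ld[i][j] + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve(a, marginal)


if __name__ == "__main__":
    # Two SNPs in LD (r = 0.8); only SNP 0 is causal, so SNP 1 looks associated marginally.
    ld = [[1.0, 0.8], [0.8, 1.0]]
    marginal = [0.05, 0.04]
    print(ld_shrunk_effects(marginal, ld, n_gwas=1e9, h2=0.5, m_snps=2))
```

In this toy case the LD-aware estimate moves nearly all of SNP 1's marginal signal back onto SNP 0, whereas a score built on marginal effects would count the same signal twice.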

Conclusion: In this review, we showcase methodological advances and current limitations of PRS, and discuss several emerging issues in risk prediction research.

Author summary

The proliferation of large, well-powered genome-wide association studies (GWASs) has facilitated the rapid development of polygenic risk scores (PRS). Many post-GWAS PRS methods have been introduced to directly address the mediocre prediction accuracy of traditional PRS built on marginal effect estimates from GWAS. This review first summarizes PRS methods that draw on different biological concepts, including linkage disequilibrium (LD), functional annotation, and pleiotropy, to better quantify SNP effects. We then introduce recent PRS frameworks that enable model optimization using summary statistics. Finally, we point out current pitfalls in risk prediction research. We expect new methods that address these challenges to emerge in the near future.
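Model optimization with summary statistics is possible because a PRS's predictive accuracy can be written in terms of summary-level quantities. If genotypes X and phenotype y are standardized in a validation sample, the correlation between the score Xw and y is wᵀr / √(wᵀDw), where r holds the marginal SNP-phenotype correlations from a validation GWAS and D is the LD matrix. The sketch below uses that identity to pick the best of several candidate weight vectors. It is a simplified, assumed illustration rather than the procedure of any specific method discussed in the review: it treats reference-panel LD as if it were the validation-sample LD, converts z-scores with the small-effect approximation r ≈ z/√n, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def z_to_corr(z: List[float], n: int) -> List[float]:
    """Approximate marginal SNP-phenotype correlations from GWAS z-scores (r ~ z / sqrt(n), small effects)."""
    return [zi / math.sqrt(n) for zi in z]


def summary_r2(weights: List[float], corr: List[float], ld: Matrix) -> float:
    """Predictive R^2 of PRS = X @ weights: (w'r)^2 / (w'Dw), assuming standardized X and y."""
    k = len(weights)
    cov = sum(w * r for w, r in zip(weights, corr))  # Cov(PRS, y)
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))  # Var(PRS)
    return cov * cov / var if var > 0 else 0.0


def pick_best(candidates: Dict[str, List[float]], corr: List[float], ld: Matrix) -> str:
    """Return the name of the candidate weight vector with the highest summary-level R^2."""
    return max(candidates, key=lambda name: summary_r2(candidates[name], corr, ld))


if __name__ == "__main__":
    ld = [[1.0, 0.5], [0.5, 1.0]]
    corr = z_to_corr([5.0, 2.5], n=2500)  # validation GWAS z-scores -> approx. [0.1, 0.05]
    candidates = {"snp0_only": [1.0, 0.0], "both": [1.0, 1.0], "opposite": [1.0, -1.0]}
    for name, w in candidates.items():
        print(name, summary_r2(w, corr, ld))
    print("best:", pick_best(candidates, corr, ld))
```

Because R² is unchanged when w is rescaled, only the relative weights matter; in practice the candidates would be scores built under different p-value thresholds or tuning parameters.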

Keywords

GWAS / polygenic risk score / summary statistics / model selection

Cite this article

Zijie Zhao, Jie Song, Tuo Wang, Qiongshi Lu. Polygenic risk scores: effect estimation and model optimization. Quant. Biol., 2021, 9(2): 133‒140 https://doi.org/10.15302/J-QB-021-0238

ACKNOWLEDGEMENTS

We acknowledge research support from the University of Wisconsin-Madison Office of the Chancellor and the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Zijie Zhao, Jie Song, Tuo Wang, and Qiongshi Lu declare that they have no conflicts of interest or financial conflicts to disclose.
This article is a review article and does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2021 Higher Education Press