1 INTRODUCTION
Genetic prediction of complex disease is a major goal in human genetics research. Accurate stratification of genetic risk requires a quantitative understanding of the genetic architecture underlying the trait of interest. The success of genome-wide association studies (GWAS) in the past decade has shed important light on the etiology of numerous complex diseases. Common single-nucleotide polymorphisms (SNPs) typically have weak to moderate individual effects on the phenotypic outcome. However, when the effects of hundreds of thousands of SNPs are aggregated, they explain a substantial proportion of the phenotypic variability [1–3]. By aggregating the risk burden of numerous SNPs, including those that fail to surpass the genome-wide significance threshold, polygenic risk scores (PRSs) have been shown to yield greater predictive performance for various diseases and traits. Compared with genetic risk models built upon statistical learning methods that require individual-level genotype and phenotype data [4–6], PRS methods directly model publicly available GWAS summary statistics and enjoy superior computational efficiency and broader applicability. Consequently, construction and evaluation of PRSs have become a routine follow-up analysis for many recent GWASs [7–9].
Despite these successes, PRSs derived from summary-level data still have only moderate predictive capability. In particular, simple PRSs based on independent genetic markers and marginal effect estimates may not fully capture the genetic architecture of complex traits or provide accurate prediction results [10]. To improve the predictive power of PRSs derived from GWAS summary statistics, it is crucial to select the best set of genetic markers and estimate their effect sizes accurately [11]. Despite recent advances in biobank-scale GWASs, it remains challenging to precisely identify causal variants and quantify variants' true effects, in part due to the presence of linkage disequilibrium (LD).
In this review, we introduce recent advancements in PRS methodology. We summarize commonly used PRS approaches (Table 1), including simple yet popular methods and recent, more sophisticated models, with a focus on post-GWAS estimation of SNP weights and selection of optimal tuning parameters. We also discuss methods that achieve both objectives simultaneously. Specifically, we first focus on PRS frameworks that integrate GWAS data with LD, functional genome annotations, and pleiotropy information to estimate variant effects. Then, we introduce recent methods for PRS model fine-tuning. Finally, we discuss the limitations of PRS applications and lay out some future directions for PRS research.
2 THE BASICS OF PRS MODELS
A PRS is a weighted sum of effect allele counts at a set of pre-selected genetic markers:
$$S = \sum_{i \in R} w_i X_i,$$
where $S$ denotes the PRS, $X_i$ indicates the $i$th SNP in the dataset, $w_i$ is the weight assigned to the $i$th SNP, and $R$ is the set of SNPs included in the model. The standard PRS approach uses marginal association coefficients obtained from GWAS as weights and applies arbitrary thresholds on the association strength of genetic variants (e.g., $p < 5\times10^{-8}$ or no threshold at all). It is not always ideal to include all SNPs in the prediction model, especially when the GWAS is underpowered and coefficient estimates are noisy [24]. PRSs with stringent p-value cutoffs may outperform other models if few causal variants exist for the phenotype of interest or if the GWAS does not have sufficient statistical power. However, PRSs based on a genome-wide significance threshold have been shown to underperform on polygenic traits [1]. In general, SNPs that are strongly associated with the phenotype should be prioritized, but properly selecting the thresholds remains an empirical challenge in real data applications.
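To make the construction concrete, the sketch below computes a simple thresholded PRS from a genotype matrix and GWAS marginal statistics; all variable and function names are illustrative rather than from any specific software.

```python
import numpy as np

def simple_prs(genotypes, betas, pvals, p_threshold=5e-8):
    """Weighted sum of allele counts over SNPs passing a p-value cutoff.

    genotypes : (n_samples, m_snps) matrix of effect allele counts (0/1/2)
    betas     : (m_snps,) marginal GWAS effect estimates used as weights
    pvals     : (m_snps,) marginal GWAS association p-values
    """
    keep = pvals < p_threshold          # the SNP set R
    return genotypes[:, keep] @ betas[keep]

# toy example with random data
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 500)).astype(float)
beta_hat = rng.normal(0, 0.05, size=500)
p = rng.uniform(0, 1, size=500)
scores = simple_prs(X, beta_hat, p, p_threshold=0.05)
```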
Due to pervasive LD in the genome, SNPs at the same genomic locus may be strongly correlated. Therefore, it is a common approach to prune SNPs in the data so that only independent predictors are included in the model. LD-pruning is an iterative algorithm that removes SNPs having strong correlations with an 'index SNP' in each designated LD block. In this case, the 'index SNP' is randomly chosen. Additionally, a threshold (e.g., for the LD strength $r^2$) needs to be selected to decide whether a pair of SNPs are in LD. A similar but more popular method is LD-clumping (also known as informed LD-pruning). This approach incorporates information from GWAS associations and always selects the most significant marker within an LD range to be the 'index SNP'. Combined with p-value thresholding, these approaches are referred to as "pruning + thresholding" (P+T) and "clumping + thresholding" (C+T) and can be implemented using software tools such as PLINK [28] and PRSice-2 [12].
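A rough sketch of the greedy clumping idea described above (not the exact PLINK or PRSice-2 implementation) iterates over SNPs ordered by significance and discards neighbors in high LD with each selected index SNP:

```python
import numpy as np

def ld_clump(pvals, ld_r2, r2_threshold=0.1):
    """Greedy LD-clumping over a single locus.

    pvals : (m,) GWAS p-values for the SNPs at the locus
    ld_r2 : (m, m) pairwise LD (r^2) matrix for the same SNPs
    Returns indices of retained 'index SNPs'.
    """
    remaining = set(range(len(pvals)))
    kept = []
    for snp in np.argsort(pvals):           # most significant first
        if snp not in remaining:
            continue
        kept.append(int(snp))
        # drop all SNPs in strong LD with the index SNP (including itself)
        remaining -= {j for j in remaining if ld_r2[snp, j] >= r2_threshold}
    return kept
```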
A common application of PRS is to predict individual trait values or disease outcomes in an external dataset independent from the training GWAS samples. It is common practice to regress sample phenotypes on PRS values in the testing data and report $R^2$ and the area under the ROC curve (AUC) to quantify the predictive performance for continuous and dichotomized outcomes, respectively. Note that there does not exist a universal "best" subset of SNPs that always achieves the highest prediction accuracy. Depending on the true underlying genetic architecture of the trait being studied and the quality of the GWAS data, the LD and p-value thresholds need to be optimized by comparing PRSs' predictive power on independent samples.
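A minimal evaluation sketch along these lines, using scikit-learn; the function evaluate_prs and its arguments are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_auc_score

def evaluate_prs(prs, y_continuous=None, y_binary=None):
    """Report R^2 for a continuous trait and/or AUC for a binary trait."""
    results = {}
    if y_continuous is not None:
        reg = LinearRegression().fit(prs.reshape(-1, 1), y_continuous)
        results["R2"] = reg.score(prs.reshape(-1, 1), y_continuous)
    if y_binary is not None:
        results["AUC"] = roc_auc_score(y_binary, prs)
    return results
```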
3 PENALIZED REGRESSION MODELS BASED ON GWAS SUMMARY STATISTICS
PRS models that clump SNPs and use marginal GWAS weights may not achieve optimal prediction accuracy when there are multiple causal SNPs at a single locus or when the causal SNPs are not included in the data. Several methods that incorporate LD into the PRS model have been shown to improve predictive performance.
LDpred is one of the first summary-statistics-based PRS frameworks that explicitly incorporates LD [13]. Following recent advances in polygenic modeling and heritability estimation, LDpred was built upon a random effects model. It uses an empirical Bayesian approach to estimate SNP effects. More specifically, the SNP weights can be denoted as
$$w = E\left(\beta \mid \hat{\beta}, V\right),$$
where $\beta$ denotes the vector of SNP causal effects on the phenotype, $\hat{\beta}$ is the vector of marginal effect estimates obtained from GWAS summary statistics, $V$ is the LD matrix, which in practice is typically estimated from an external reference panel, and $m$ is the number of SNPs in the GWAS. Combining marginal GWAS coefficients and LD, LDpred re-estimates the effects of all SNPs using the posterior expectation without pruning any SNPs. Of note, this framework has been adopted by multiple PRS methods developed later [14, 15]. The main difference among these approaches is the choice of prior for $\beta$.
LDpred models SNP effects with two types of priors: an infinitesimal prior that assumes equal contribution from all SNPs in the data, i.e.,
$$\beta_i \sim N\!\left(0, \frac{h^2}{m}\right),$$
where $h^2$ denotes the heritability of the phenotype; or a non-infinitesimal (point-normal) prior that assumes non-zero effects for only a proportion of causal SNPs, i.e.,
$$\beta_i \sim \begin{cases} N\!\left(0, \dfrac{h^2}{m p}\right) & \text{with probability } p, \\[4pt] \delta_0 & \text{with probability } 1-p, \end{cases}$$
where $\delta_0$ is a point mass at 0 and $p$ is the proportion of causal SNPs. Here, $p$ is treated as a tuning parameter and needs to be selected using an independent validation dataset. LDpred enjoys the flexibility of being able to handle both sparse and polygenic genetic architectures. Under the non-infinitesimal model, LDpred uses a Gibbs sampler to estimate the posterior expectation $E(\beta \mid \hat{\beta}, V)$ in each LD block. Under the infinitesimal model, where all SNPs are considered causal, the posterior expectation has a closed-form solution.
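Under the infinitesimal model and standardized genotypes, the closed form is approximately $E(\beta \mid \hat{\beta}, V) = \big(\tfrac{m}{N h^2} I + V\big)^{-1}\hat{\beta}$ [13], where $N$ is the GWAS sample size. A minimal NumPy sketch of this re-weighting for one LD block (variable names are illustrative, not from the LDpred software):

```python
import numpy as np

def ldpred_inf_weights(beta_hat, ld, n, h2, m):
    """Closed-form posterior mean of SNP effects under the LDpred
    infinitesimal model for one LD block (standardized genotypes assumed).

    beta_hat : (k,) marginal effect estimates for SNPs in the block
    ld       : (k, k) LD (correlation) matrix for the block
    n        : GWAS sample size
    h2       : trait heritability
    m        : total number of SNPs genome-wide
    """
    k = len(beta_hat)
    shrink = (m / (n * h2)) * np.eye(k)
    return np.linalg.solve(shrink + ld, beta_hat)
```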
With a similar choice of post-GWAS estimator for SNP effects, PRS-CS is a recently developed Bayesian high-dimensional PRS framework that employs a continuous shrinkage prior on $\beta$ [14]. Unlike the normal or point-normal priors implemented in LDpred, the prior in PRS-CS is a continuous distribution that is jointly determined by a global scaling parameter and a marker-specific local shrinkage parameter:
$$\beta_i \sim N\!\left(0, \frac{\sigma^2}{N}\,\phi\,\psi_i\right),$$
where $\phi$ is the global shrinkage parameter, $\psi_i$ is the SNP-specific local shrinkage factor, $\sigma^2$ is the residual variance, and $N$ is the sample size. While the global scaling factor represents the uniform shrinkage applied to all markers, the local shrinkage parameter is unique for each variant and can be drawn from an appropriate absolutely continuous distribution (e.g., a gamma-gamma distribution):
$$\psi_i \sim \mathrm{G}(a, \delta_i), \qquad \delta_i \sim \mathrm{G}(b, 1),$$
where $\mathrm{G}$ denotes the gamma distribution and $\delta_i$ is the unknown scale parameter of the distribution for the local shrinkage factor at each SNP. Such a prior design enables the algorithm to shrink noisy estimates towards zero while maintaining the large effects of SNPs demonstrating stronger signals, making PRS-CS adaptive to diverse genetic architectures. The optimal global shrinkage factor can be obtained either through external validation on an independent dataset, or estimated through a full Bayesian approach that assigns a half-Cauchy prior to the global scaling parameter.
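To build intuition for how the gamma-gamma prior produces SNP-specific shrinkage, the sketch below draws per-SNP prior variances $\tfrac{\sigma^2}{N}\phi\psi_i$; the parameter values and the scale parameterization of the gamma draws are illustrative assumptions, and this is not the PRS-CS software.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_prior_variances(m, a=1.0, b=0.5, phi=0.01, sigma2=1.0, n=50000):
    """Draw per-SNP prior variances sigma2/N * phi * psi_i under a
    gamma-gamma continuous shrinkage prior (illustrative parameter values)."""
    delta = rng.gamma(shape=b, scale=1.0, size=m)   # delta_i ~ G(b, 1)
    psi = rng.gamma(shape=a, scale=delta)           # psi_i | delta_i ~ G(a, delta_i)
    return sigma2 / n * phi * psi

variances = draw_prior_variances(m=1000)
# heavy right tail: most SNPs get a tiny prior variance (strong shrinkage),
# while a few retain a large prior variance (little shrinkage)
print(np.percentile(variances, [50, 90, 99]))
```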
We note that frequentist methods have also been developed to fit penalized regression models on GWAS summary statistics. These methods (e.g., lassosum [16] and PANPRS [17]) provide a different perspective on re-estimating SNP effects using marginal association results as inputs. A traditional penalized regression model (e.g., lasso) requires both the phenotype and genotype data measured on each individual to optimize the objective function and estimate regression coefficients. However, it can be shown that the penalized loss function can be iteratively optimized using only the inner product of genotypes and the inner product of genotypes and phenotypes:
$$\frac{1}{2}\,\beta^\top X^\top X \beta - \beta^\top X^\top y + \lambda \|\beta\|_1,$$
where $X$ and $y$ are the standardized genotypes and phenotypes and $\lambda$ is the tuning parameter. Notably, the inner product of genotypes $X^\top X$ can be obtained from the LD correlation matrix, and the inner product of genotypes and phenotypes $X^\top y$ can be obtained from marginal association results. Regarding the selection of the tuning parameter $\lambda$ that controls the penalty strength, lassosum implements a "pseudo-validation" approach that replaces the validation genotypes and phenotypes in the PRS-phenotype correlation with terms that can be obtained from GWAS summary statistics:
$$\mathrm{cor}\!\left(X^{*}\beta,\, y^{*}\right) = \frac{\beta^\top X^{*\top} H y^{*}}{\sqrt{\beta^\top X^{*\top} H X^{*}\beta}\,\sqrt{y^{*\top} H y^{*}}},$$
where $X^{*}$ and $y^{*}$ are the genotypes and phenotypes of the testing samples, and $H$ is the mean-centering matrix. Built upon a similar idea, PANPRS is a newer method that also trains penalized regression using GWAS summary statistics. Compared with lassosum, PANPRS can model not only quantitative but also binary traits and is capable of integrating external annotation information.
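As a minimal sketch of this idea, the coordinate-descent lasso below is driven entirely by an LD matrix (standing in for $X^\top X / n$) and SNP-phenotype correlations (standing in for $X^\top y / n$); it is a generic illustration under standardized genotypes and phenotypes, not the exact lassosum or PANPRS algorithm.

```python
import numpy as np

def summary_lasso(ld, r, lam, n_iter=100):
    """Coordinate-descent lasso using only summary-level inputs.

    ld  : (m, m) LD correlation matrix (proxy for X'X / n)
    r   : (m,) SNP-phenotype correlations (proxy for X'y / n)
    lam : L1 penalty strength
    """
    m = len(r)
    beta = np.zeros(m)
    for _ in range(n_iter):
        for j in range(m):
            # partial residual correlation for SNP j
            rho = r[j] - ld[j] @ beta + ld[j, j] * beta[j]
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / ld[j, j]
    return beta
```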
4 INCORPORATING FUNCTIONAL ANNOTATIONS IN PRS MODELS
Another major limitation of genetic prediction methods is the lack of biological interpretation. A plethora of transcriptomic and epigenomic annotation data have been generated and made available by large consortia including GTEx [29], ENCODE [30], and the Roadmap Epigenomics Project [31]. Integrating these data with GWAS has provided insights into functional variant fine-mapping, heritability enrichment, and risk gene prioritization [32]. Methods have also been developed to incorporate functional annotation data in PRS models.
Most Bayesian shrinkage PRS methods use noninformative priors for SNP effect sizes. Leveraging advances in statistical methods that partition heritability by functional annotation [33, 34], AnnoPred [15] uses an empirically estimated informative prior to prioritize SNP predictors in the PRS:
$$\beta_i \sim N\!\left(0, h_i^2\right).$$
Here, $h_i^2$ is the per-SNP heritability estimate in an annotation-stratified heritability model. That is,
$$h_i^2 = \sum_{j} \tau_j\, \mathbb{1}\{\text{SNP } i \in A_j\},$$
where $A_j$ denotes the $j$th annotation category in the analysis and $\tau_j$ is the corresponding regression coefficient from stratified LD score regression, which quantifies the per-SNP heritability for variants in the $j$th annotation [33]. Similar to LDpred, AnnoPred also uses the posterior expectation of SNP effects, i.e., $E(\beta \mid \hat{\beta}, V)$, as weights in the PRS. This approach accounts for the trait-specific genetic architecture by empirically and adaptively prioritizing functional SNPs with greater impacts. We also note that functional annotations have been similarly incorporated into the penalty terms of frequentist approaches such as PANPRS [17]. Using continuous traits as an example, when modeling functional annotations, PANPRS obtains the penalized regression coefficients by minimizing
$$\frac{1}{2}\,\beta^\top X^\top X \beta - \beta^\top X^\top y + \sum_{i}\left(\lambda_0 + \sum_{s} \lambda_s Z_{is}\right)|\beta_i|,$$
where $\lambda_0$ is the baseline penalty, $\lambda_s$ is the penalty for the $s$th annotation category, and $Z_{is}$ is a binary variable that equals 1 when the $i$th SNP is not annotated in the $s$th annotation category and 0 otherwise. In this way, SNPs that are annotated in more categories of functional annotation receive a smaller penalty and are therefore weighted more heavily in the final PRS.
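A small sketch mirroring the two annotation formulas above: computing per-SNP heritability from annotation membership and stratified LD score regression coefficients, and computing annotation-dependent penalties. All names and numerical values are made up for illustration; this is not the AnnoPred or PANPRS code.

```python
import numpy as np

def per_snp_h2(annot, tau):
    """Per-SNP heritability h_i^2 = sum_j tau_j * 1{SNP i in A_j} from a
    binary annotation matrix and stratified LD score regression coefficients."""
    return annot @ tau

def annotation_penalties(annot, lam0, lam_s):
    """Per-SNP L1 penalties lam0 + sum_s lam_s * Z_is, where Z_is = 1
    when SNP i is NOT annotated in category s (illustrative form)."""
    not_annotated = 1 - annot
    return lam0 + not_annotated @ lam_s

annot = np.array([[1, 0, 1],
                  [0, 0, 0],
                  [1, 1, 1]])          # 3 SNPs x 3 annotation categories
tau = np.array([1e-5, 5e-6, 2e-5])     # per-annotation heritability coefficients
print(per_snp_h2(annot, tau))          # SNPs in more categories get larger prior variance
print(annotation_penalties(annot, 0.01, np.array([0.02, 0.02, 0.02])))
```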
5 INTEGRATING MULTIPLE PHENOTYPES
Genetic correlation analysis and pleiotropic association mapping have revealed concordant genetic associations across many traits [35–37]. The same genetic variant may yield correlated effects on several diseases or traits. Aggregating association results from multiple genetically correlated GWASs may increase the effective sample size, improve the estimation of SNP effects, and enhance the prediction accuracy of PRS. We introduce several methods that jointly model summary statistics from multiple GWASs in genetic risk prediction.
PleioPred [18] is a multi-trait extension of both LDpred [13] and AnnoPred [15]. PleioPred models each SNP's effects on two different traits with a bivariate normal distribution, i.e.,
$$\begin{pmatrix}\beta_{i1}\\ \beta_{i2}\end{pmatrix} \sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix} h_{i1}^2 & \rho\, h_{i1} h_{i2} \\ \rho\, h_{i1} h_{i2} & h_{i2}^2 \end{pmatrix}\right),$$
where $\beta_{i1}$ and $\beta_{i2}$ denote the $i$th SNP's effects on the two traits, $h_{i1}^2$ and $h_{i2}^2$ are the per-SNP heritabilities for the two traits, and $\rho$ is the genetic correlation. Conditioning on marginal association statistics and an external LD reference panel, the method provides an empirical Bayesian estimator of SNP effects for both traits. The PRSs are constructed as $S_1 = \sum_{i} E(\beta_{i1} \mid \hat{\beta}_1, \hat{\beta}_2, V)\, X_i$ and $S_2 = \sum_{i} E(\beta_{i2} \mid \hat{\beta}_1, \hat{\beta}_2, V)\, X_i$, respectively. Like AnnoPred, PleioPred can be generalized to non-infinitesimal models, and it upweights the effects of SNPs in annotation regions with strong heritability enrichment. Under the infinitesimal assumption, the genetic correlation can be estimated using existing methods [36, 37] and the SNP effects have a closed-form solution. A Gibbs sampler approach is used to estimate effects under the non-infinitesimal model.
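To illustrate the empirical Bayes idea for the simplest case of a single unlinked SNP (a sketch assuming standardized genotypes and phenotypes, not the PleioPred implementation), the posterior mean of the two effects is the prior covariance times the inverse of the prior-plus-sampling covariance applied to the marginal estimates:

```python
import numpy as np

def bivariate_posterior_mean(beta_hat, h2_snp, rho, n):
    """Empirical Bayes posterior mean of a single unlinked SNP's effects
    on two traits (illustrative; standardized genotypes/phenotypes assumed).

    beta_hat : (2,) marginal estimates for the two traits
    h2_snp   : (2,) per-SNP heritabilities for the two traits
    rho      : genetic correlation between the traits
    n        : (2,) GWAS sample sizes for the two traits
    """
    cross = rho * np.sqrt(h2_snp[0] * h2_snp[1])
    cov_prior = np.array([[h2_snp[0], cross],
                          [cross, h2_snp[1]]])
    cov_noise = np.diag(1.0 / np.asarray(n))        # sampling variance of beta_hat
    return cov_prior @ np.linalg.solve(cov_prior + cov_noise, beta_hat)

print(bivariate_posterior_mean(np.array([0.02, 0.01]),
                               np.array([1e-4, 1e-4]), rho=0.6,
                               n=np.array([200000, 150000])))
```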
Since PleioPred, multiple methods based on similar ideas have been developed (e.g., wMT-SBLUP [19] and CTPR [20]), showing various degrees of improvement. MTAG is another approach designed for multi-trait GWAS meta-analysis [21]. For a given SNP, MTAG models its true effects on a number of traits with the random effects model
$$\beta_i = (\beta_{i1}, \ldots, \beta_{iT})^\top \sim N(0, \Omega),$$
where $\beta_i$ are the random effects of the $i$th SNP on the $T$ traits. Then, MTAG solves for the generalized method of moments estimator of the effect on trait $t$:
$$\hat{\beta}_{it}^{\text{MTAG}} = \frac{\left(\dfrac{\omega_t}{\omega_{tt}}\right)^{\!\top}\left(\Omega - \dfrac{\omega_t \omega_t^\top}{\omega_{tt}} + \Sigma_i\right)^{-1}\hat{\beta}_i}{\left(\dfrac{\omega_t}{\omega_{tt}}\right)^{\!\top}\left(\Omega - \dfrac{\omega_t \omega_t^\top}{\omega_{tt}} + \Sigma_i\right)^{-1}\dfrac{\omega_t}{\omega_{tt}}},$$
where $\omega_t$ and $\omega_{tt}$ are the $t$th column and diagonal element of $\Omega$, the covariance matrix of $\beta_i$, and $\Sigma_i$ is the sampling covariance matrix of the marginal estimates $\hat{\beta}_i$. Although genetic prediction is not its primary purpose, effect sizes estimated by MTAG can be used to construct PRSs. Due to the improved precision of the effect estimates, these PRSs generally outperform single-trait PRS approaches (a small numerical sketch of this estimator is given at the end of this section). Finally, a recent approach, GenomicSEM [22], leverages pairwise genetic correlations to quantify the genetic basis of the underlying psychopathological factor shared by multiple psychiatric traits. By borrowing information from multiple GWASs, the PRS for the latent factor outperforms predictive models constructed using each GWAS separately.
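Returning to the MTAG estimator reconstructed above, a minimal NumPy sketch for a single SNP is shown below; the covariance matrices and effect values are made-up inputs and this is not the MTAG software.

```python
import numpy as np

def mtag_estimate(beta_hat, omega, sigma, t):
    """MTAG-style estimate of a SNP's effect on trait t from multi-trait
    marginal estimates (illustrative reimplementation of the formula above).

    beta_hat : (T,) marginal GWAS estimates for one SNP across T traits
    omega    : (T, T) covariance matrix of true effects across traits
    sigma    : (T, T) sampling covariance matrix of beta_hat
    t        : index of the target trait
    """
    w_t = omega[:, t] / omega[t, t]
    middle = omega - np.outer(omega[:, t], omega[:, t]) / omega[t, t] + sigma
    middle_inv = np.linalg.inv(middle)
    return (w_t @ middle_inv @ beta_hat) / (w_t @ middle_inv @ w_t)

omega = np.array([[1.0, 0.6], [0.6, 1.0]]) * 1e-4   # effect covariance across 2 traits
sigma = np.diag([1 / 50000, 1 / 80000])             # sampling variances (roughly 1/N)
beta_hat = np.array([0.015, 0.010])
print(mtag_estimate(beta_hat, omega, sigma, t=0))
```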
6 MODEL TUNING
Most existing PRS methods include tuning parameters. Examples include the p-value and LD thresholds in a C+T PRS, the proportion of causal variants in LDpred, and the penalty strength in penalized regressions. These parameters add flexibility to the models. When properly selected, they effectively improve the predictive performance of PRS. When individual-level data are available, it is straightforward to select the optimal tuning parameters through cross-validation [38]. For a k-fold cross-validation, we equally partition the entire dataset into k folds. Each time, we fix the tuning parameter values and hold out one fold of data as the validation set to evaluate PRS performance. We use the remaining (k−1) folds as the training set to conduct GWAS and estimate SNP weights. After repeating this procedure k times, we obtain the optimal PRS model by comparing the average predictive performance across different values of the tuning parameters.
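A schematic of this cross-validation loop for tuning the p-value threshold of a simple PRS, using scikit-learn for fold splitting; the marginal GWAS step is reduced to per-SNP regressions on standardized data, and all names are illustrative:

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold

def marginal_gwas(X, y):
    """Per-SNP marginal regression of y on each standardized SNP."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    n = len(y)
    beta = Xs.T @ ys / n                       # marginal effect estimates
    z = beta * np.sqrt(n)                      # approximate z-scores
    pvals = 2 * stats.norm.sf(np.abs(z))
    return beta, pvals

def tune_p_threshold(X, y, thresholds, k=5):
    """Pick the p-value cutoff with the best average validation R^2."""
    scores = {t: [] for t in thresholds}
    for train, valid in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        beta, pvals = marginal_gwas(X[train], y[train])
        for t in thresholds:
            keep = pvals < t
            if keep.any():
                prs = X[valid][:, keep] @ beta[keep]
                r = np.corrcoef(prs, y[valid])[0, 1]
                scores[t].append(0.0 if np.isnan(r) else r ** 2)
            else:
                scores[t].append(0.0)
    return max(thresholds, key=lambda t: np.mean(scores[t]))
```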
Despite its broad applicability, cross-validation is almost never used to fine-tune PRS models in practice, since individual-level data are rarely available for the full GWAS sample. Instead, a standard approach is to use summary statistics from a large GWAS to train PRS models under different tuning parameter values, and to use an external dataset independent from the GWAS to select the tuning parameters that yield the best performance. However, in practice, most datasets with sufficient and easily accessible samples are likely to have already been included in the large GWAS. Even if such a dataset is available, researchers may prefer to use it as the testing dataset to report the optimized PRS's predictive performance, rather than holding it out as a validation dataset for model selection. To address this challenge, SummaryAUC is a new approach designed for assessing a PRS's prediction accuracy using summary-level data as the validation set [23]. For a case-control GWAS, SummaryAUC estimates the area under the ROC curve (AUC) as the probability that a randomly selected case has a higher PRS than a randomly selected control. When individual-level PRSs can be calculated, this corresponds to a simple two-sample Z statistic. When individual-level validation samples are not available, SummaryAUC approximates the standardized Z score using summary statistics and minor allele frequencies from a GWAS. This method advances the field by removing the need for individual-level validation data when evaluating the performance of PRSs on binary traits.
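The probability interpretation of the AUC used above can be checked directly when individual-level PRSs are available. The sketch below compares the pairwise-probability estimate with a normal-approximation version based on a standardized difference in mean PRS; this is a generic illustration, not the SummaryAUC approximation itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
prs_cases = rng.normal(0.3, 1.0, size=500)
prs_controls = rng.normal(0.0, 1.0, size=2000)

# AUC as P(randomly chosen case has a higher PRS than a randomly chosen control)
auc_pairwise = np.mean(prs_cases[:, None] > prs_controls[None, :])

# normal approximation based on a standardized difference in mean PRS
z = (prs_cases.mean() - prs_controls.mean()) / np.sqrt(
    prs_cases.var(ddof=1) + prs_controls.var(ddof=1))
auc_normal = stats.norm.cdf(z)

print(auc_pairwise, auc_normal)
```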
However, SummaryAUC does not fully resolve the challenges in model tuning. In many cases, even an independent summary statistics dataset may not be available. PUMAS is a new method that can fine-tune PRS models using only GWAS summary statistics [24]. This method uses a resampling approach to create down-sampled training and validation GWAS summary statistics from the input GWAS, and then applies a procedure similar to cross-validation to fine-tune PRS models. PUMAS provides a way to approximate the predictive $R^2$ from regressing the phenotypes of the validation dataset on the PRS, without accessing individual-level genetic and phenotypic data. PUMAS can be applied not only to traditional PRSs, where the tuning parameter is simply the association p-value cutoff, but also to more sophisticated PRS frameworks such as LDpred that use GWAS summary statistics as input. With this approach, it has become possible to systematically benchmark and optimize PRS models for various traits using publicly available GWAS summary data.
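To convey the flavor of such summary-statistics resampling, the toy sketch below perturbs full-GWAS effect estimates to mimic a training subsample, under an independent-SNP assumption in which the conditional variance of the subsample estimate is $se^2(N/N_{\text{train}} - 1)$. This is only an illustration of the general resampling idea, not the PUMAS algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

def subsample_sumstats(beta_hat, se, n_full, n_train):
    """Draw pseudo training-subsample effect estimates from full-GWAS
    summary statistics (independent SNPs assumed).

    Conditional on the full-sample estimate, the training estimate is
    approximately N(beta_hat, se^2 * (n_full / n_train - 1)).
    """
    extra_sd = se * np.sqrt(n_full / n_train - 1.0)
    return beta_hat + rng.normal(0.0, extra_sd)

beta_hat = np.array([0.02, -0.01, 0.005])
se = np.array([0.004, 0.004, 0.004])
beta_train = subsample_sumstats(beta_hat, se, n_full=100000, n_train=75000)
print(beta_train)
```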
7 PRS COMPUTATION IN PRACTICE
There are other summary-data-based PRS methods that we did not cover in this review (e.g., SBayesR [25], sBLUP [26], and DBSLMM [27]). These PRS frameworks generalize regression models that typically require access to individual-level genotype and phenotype data so that they can use GWAS summary statistics as inputs and construct PRSs accordingly. The abundance of PRS methods requires researchers not only to optimize the PRS within a single model framework but also to benchmark the most predictive PRS model across different phenotypes. A recent study conducted a systematic comparison of the most predictive PRSs from 15 PRS methods across 25 disease phenotypes in the UK Biobank cohort and found that complicated PRS methods do not necessarily outperform simple PRS models in terms of AUC [39]. Besides prediction accuracy, computational complexity is another important aspect of PRS application in practice. Compared with more sophisticated methods such as LDpred or lassosum, C+T is the most scalable PRS method for biobank-scale datasets and requires less computational time and memory [27]. In practice, researchers may need to consider the tradeoff between predictive performance and computational burden when selecting PRS algorithms [4].
8 DISCUSSION
PRS methods and applications have flourished in recent years, largely owing to the fast-growing sample sizes of biobank cohorts and widely available GWAS summary statistics. These methods provide a practical alternative for building genetic prediction models when individual-level data cannot be directly accessed. In general, PRS methods based on GWAS summary statistics often have a substantially lower computational burden compared with models that need to be trained on individual-level data; meanwhile, as GWAS sample sizes continue to grow, PRS methods have also achieved comparable prediction accuracy for some traits. In this review, we have discussed recent advances in PRS approaches with a focus on methods that estimate SNP effects and optimize tuning parameters. These methods have laid the groundwork for developing accurate and robust genetic prediction models for a variety of diseases and traits and may have broad applications in disease diagnosis and precision medicine.
However, existing PRS methods still have limitations. First, the sensitivity and specificity of PRS remain too low to be immediately useful for clinical intervention in most diseases. Despite the methodological advances, the improvement in prediction accuracy for most PRS approaches is usually incremental compared with a simple PRS based on marginal GWAS effects. However, a recent study has convincingly demonstrated that PRSs can be used to identify individuals with substantially elevated risk for coronary artery disease and a few other phenotypes despite the low AUC [40], which hints at a need to develop better metrics to quantify the performance of PRS. Second, even if a PRS is predictive, its effect may be mediated through the environment and needs to be interpreted with caution. Since a person's genotypes are correlated with the genotypes of other family members (e.g., parents), which are subsequently correlated with the family environment, effect estimates obtained from a GWAS are mixtures of both direct and indirect genetic effects [41, 42]. A recent study has pointed out that the predictive performance of PRS is substantially reduced for many traits when the GWAS is conducted on sibling pairs, which controls for the shared environment [43]. Blindly trusting the PRS without understanding the underlying etiology may lead to bias and misinterpretation of results in PRS applications. Another major limitation of current PRS methods is the lack of portability. It has been extensively discussed in multiple studies that existing PRSs cannot accurately predict the disease risk of individuals from populations that differ from the GWAS cohorts, possibly due to differences in LD patterns, causal effect sizes, allele frequencies, and environmental mediators across populations [44–47]. This has become a major hurdle in PRS applications, especially because major GWAS cohorts lack diversity [48]: over 70% of GWAS samples came from three countries, namely the United States, the United Kingdom, and Iceland. While the number of participants of European ancestry has increased exponentially over the years, the proportion of other ethnicities in GWAS samples has declined since 2014 [44]. Methods have been developed to incorporate GWAS data from multiple populations to improve the trans-ethnic portability of predictive performance [49], but few approaches have achieved improvements using summary statistics alone [50]. Overcoming the poor portability of PRS will greatly benefit risk prediction research. We anticipate emerging PRS methodologies in the near future that offer novel insights and solutions to these challenges.