Identification of genomic regions distorting population structure inference in diverse continental groups

Qiuxuan Liu, Degang Wu, Chaolong Wang

Quant. Biol. 2022, Vol. 10, Issue 3: 287-298. DOI: 10.15302/J-QB-022-0303
RESEARCH ARTICLE


Abstract

Background: Inference of population structure is crucial for studies of human evolutionary history and for genome-wide association studies. While several genomic regions have been reported to distort population structure analysis of European populations, no systematic analysis has been performed on non-European continental groups or with the latest human genome assembly.

Methods: Using high-coverage whole-genome sequencing data from the 1000 Genomes Project for four major continental groups (Europe, East Asia, South Asia, and Africa), we developed a statistical framework and systematically detected, for each continental group, genomic regions with unusual contributions to the inference of population structure.
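The abstract does not spell out the statistical framework, so the following is only an illustrative sketch of one way a PCA-based scan for regions with unusual contributions might look. It assumes a plain individuals-by-SNPs genotype matrix, scores non-overlapping SNP windows by their mean squared loading on the top principal component, and flags outlier windows with a median/MAD z-score. The function names, the window size, the cutoff, and the simulated data are all hypothetical and are not taken from the paper or from the UnRegFinder code.

```python
import numpy as np


def standardize(G):
    """Center each SNP (column) and scale it to unit variance."""
    G = np.asarray(G, dtype=float)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs simply stay at zero after centering
    return (G - G.mean(axis=0)) / sd


def window_scores(G, n_pcs=1, window=50):
    """Mean squared SNP loading on the top PCs, per non-overlapping window."""
    Z = standardize(G)
    # Rows of Vt are unit-length SNP loading vectors, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    sq = (Vt[:n_pcs] ** 2).sum(axis=0)
    n_win = sq.size // window  # SNPs past the last full window are ignored
    return sq[: n_win * window].reshape(n_win, window).mean(axis=1)


def flag_windows(scores, z_cut=5.0):
    """Indices of windows whose robust z-score (median/MAD) exceeds z_cut."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    if mad == 0:
        return np.array([], dtype=int)
    z = (scores - med) / (1.4826 * mad)
    return np.flatnonzero(z > z_cut)


# Toy data (an assumption for illustration): 200 individuals, 1000 independent
# SNPs, plus one block of 50 SNPs in perfect LD, a crude stand-in for an
# inversion-like region.
rng = np.random.default_rng(0)
n, m = 200, 1000
freqs = rng.uniform(0.1, 0.9, size=m)
G = rng.binomial(2, freqs, size=(n, m))
shared = rng.binomial(2, 0.3, size=n)
G[:, 400:450] = shared[:, None]

scores = window_scores(G, n_pcs=1, window=50)
flagged = flag_windows(scores)
```

In this toy setting the block of linked SNPs dominates the first principal component, so the window covering it (index 8) receives by far the largest score. Real data would additionally need LD pruning, relatedness filtering, and a calibrated significance threshold, none of which are attempted here.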

Results: We identified and characterized 27 unusual genomic regions mapped to GRCh38, including 13 regions around centromeres, 2 with chromosomal inversions, 8 under natural selection, and 4 with unknown causes. Excluding these regions would result in a more interpretable population structure inferred by principal components analysis and ADMIXTURE analysis.
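As a rough illustration of what excluding a list of regions could look like in practice, the sketch below builds a boolean mask that drops SNPs falling inside any listed interval before PCA or ADMIXTURE is run. The helper name `keep_mask` and the coordinates are placeholders chosen for this example; they are not the 27 regions reported in the paper.

```python
import numpy as np


def keep_mask(chrom, pos, regions):
    """Return True for SNPs that lie outside every (chrom, start, end) region.

    Intervals are treated as closed: start <= pos <= end.
    """
    chrom = np.asarray(chrom)
    pos = np.asarray(pos)
    keep = np.ones(pos.shape[0], dtype=bool)
    for c, start, end in regions:
        inside = (chrom == c) & (pos >= start) & (pos <= end)
        keep &= ~inside
    return keep


# Placeholder SNP positions and placeholder regions (not from the paper).
chrom = np.array(["1", "1", "1", "2", "2"])
pos = np.array([100, 5000, 9000, 5000, 7000])
regions = [("1", 4000, 6000), ("2", 6500, 8000)]

mask = keep_mask(chrom, pos, regions)
# mask is [True, False, True, True, False]; a genotype matrix G with one
# column per SNP could then be filtered as G[:, mask].
```

Matching on both chromosome and position matters here: the SNP at position 5000 on chromosome 2 is kept even though the same coordinate on chromosome 1 is excluded.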

Conclusions: Unusual genomic patterns in certain regions can distort the inference of population structure. Our compiled list of these unusual regions will be useful for many population-genetic studies, including those of non-European populations.

Availability: The code to reproduce our results is available on GitHub (/dwuab/UnRegFinder).

Author summary

We propose a systematic analysis framework based on principal component analysis (PCA) to identify genomic regions that distort the inference of population structure. Based on whole-genome sequencing data from the 1000 Genomes Project for four major continental groups with no recent admixture, we compile a list of 27 unusual genomic regions and demonstrate that excluding them leads to more interpretable population structure results. We recommend routinely removing these regions in analyses of population structure to avoid artifactual results.

Keywords

population genetics / population structure / linkage disequilibrium / principal component analysis / natural selection

Cite this article

Qiuxuan Liu, Degang Wu, Chaolong Wang. Identification of genomic regions distorting population structure inference in diverse continental groups. Quant. Biol., 2022, 10(3): 287‒298 https://doi.org/10.15302/J-QB-022-0303


SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.15302/J-QB-022-0303.

ACKNOWLEDGEMENTS

This study was funded by the National Natural Science Foundation of China (No. 81973148).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Qiuxuan Liu, Degang Wu, and Chaolong Wang declare that they have no conflict of interest or financial conflicts to disclose. This article does not contain any studies with human or animal subjects performed by any of the authors.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2022 The Author(s). Published by Higher Education Press.