SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model

Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu

Quant. Biol. 2024, Vol. 12, Issue 2: 197-204. DOI: 10.1002/qub2.42

RESEARCH ARTICLE


Abstract

Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to unveil the missing heritability of complex diseases. Simulation data are crucial for evaluating epistasis detection tools in genome-wide association studies (GWAS). Existing simulators typically suffer from two limitations: they lack support for high-order epistasis models involving multiple single nucleotide polymorphisms (SNPs), and they cannot generate simulation SNP data independently. In this study, we propose SimHOEPI, a simulator that calculates penetrance tables of high-order epistasis models based on either prevalence or heritability and uses a resampling strategy to generate simulation data independently. Highlights of SimHOEPI are the preservation of realistic minor allele frequencies in the sampled data, the accurate calculation and embedding of high-order epistasis models, and acceptable simulation time. A series of experiments was carried out to verify these properties from different aspects. The experimental results show that SimHOEPI can generate simulation SNP data independently with high-order epistasis models, suggesting that it might be an alternative simulator for GWAS.
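To make the link between a penetrance table and the two quantities named above more concrete, the following minimal sketch computes the prevalence and heritability of a toy third-order model. It is not SimHOEPI's implementation: the function names, the Hardy-Weinberg and independent-loci assumptions, the commonly used variance-ratio definition of heritability, and the threshold-style penetrance values are all illustrative assumptions.

```python
from itertools import product


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for genotypes coded 0, 1, 2
    (number of minor-allele copies). HWE is an assumption of this sketch."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def combo_freqs(mafs):
    """Joint frequency of every genotype combination across the SNPs,
    assuming the loci are independent (no linkage disequilibrium)."""
    per_locus = [genotype_freqs(m) for m in mafs]
    freqs = {}
    for combo in product(range(3), repeat=len(mafs)):
        p = 1.0
        for locus, genotype in enumerate(combo):
            p *= per_locus[locus][genotype]
        freqs[combo] = p
    return freqs


def prevalence(penetrance, freqs):
    """P(D) = sum over genotype combinations of P(g) * P(D | g)."""
    return sum(freqs[c] * penetrance[c] for c in freqs)


def heritability(penetrance, freqs):
    """Variance of penetrance across genotype combinations,
    divided by P(D) * (1 - P(D))."""
    k = prevalence(penetrance, freqs)
    var = sum(freqs[c] * (penetrance[c] - k) ** 2 for c in freqs)
    return var / (k * (1 - k))


# Hypothetical third-order model: higher risk once a combination
# carries at least three minor alleles in total.
mafs = [0.3, 0.3, 0.3]
freqs = combo_freqs(mafs)
penetrance = {c: (0.5 if sum(c) >= 3 else 0.1) for c in freqs}

print("prevalence:", prevalence(penetrance, freqs))
print("heritability:", heritability(penetrance, freqs))
```

A table with 3^k entries is needed for a model of order k, which is why the dictionary above has 27 entries for three SNPs. A simulator that works from a target prevalence or heritability would solve for penetrance values that satisfy these same equations, rather than evaluate them for fixed values as done here.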

Keywords

high-order epistasis model / penetrance table / resampling strategy / simulation / single nucleotide polymorphisms

Cite this article

Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu. SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model. Quant. Biol., 2024, 12(2): 197-204. https://doi.org/10.1002/qub2.42


RIGHTS & PERMISSIONS

2024 The Authors. Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.