
SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model
Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu
Quant. Biol. 2024, Vol. 12, Issue 2: 197-204.
Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to explain the missing heritability of complex diseases. Simulation data are crucial for evaluating epistasis detection tools in genome-wide association studies (GWAS). Existing simulators typically suffer from two limitations: they lack support for high-order epistasis models involving multiple single nucleotide polymorphisms (SNPs), and they cannot generate simulated SNP data independently. In this study, we propose SimHOEPI, a simulator that calculates penetrance tables of high-order epistasis models from either prevalence or heritability and uses a resampling strategy to generate simulation data independently. The highlights of SimHOEPI are the preservation of realistic minor allele frequencies in the sampled data, the accurate calculation and embedding of high-order epistasis models, and an acceptable simulation time. A series of experiments was carried out to verify these properties from different aspects. The experimental results show that SimHOEPI can independently generate simulated SNP data with high-order epistasis models, suggesting that it may serve as an alternative simulator for GWAS.
Keywords: high-order epistasis model / penetrance table / resampling strategy / simulation / single nucleotide polymorphisms
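To make the link between a penetrance table, prevalence, and heritability concrete, the following is a minimal sketch and not SimHOEPI's actual implementation. It assumes independent SNPs in Hardy-Weinberg equilibrium and the common definitions of prevalence (the frequency-weighted mean penetrance) and heritability (the frequency-weighted variance of penetrance divided by K(1 - K), where K is the prevalence). The function names, the genotype coding 0/1/2 (copies of the minor allele), and the simple accept-by-penetrance resampling loop are all illustrative assumptions.

```python
import random
from itertools import product


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the minor allele."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def joint_genotype_freqs(mafs):
    """Joint frequency of each genotype combination (assumes independent SNPs)."""
    per_snp = [genotype_freqs(m) for m in mafs]
    freqs = {}
    for combo in product(range(3), repeat=len(mafs)):
        f = 1.0
        for snp, g in enumerate(combo):
            f *= per_snp[snp][g]
        freqs[combo] = f
    return freqs


def prevalence(penetrance, freqs):
    """K = sum over genotype combinations of frequency * penetrance."""
    return sum(freqs[g] * penetrance[g] for g in freqs)


def heritability(penetrance, freqs):
    """h^2 = sum of frequency * (penetrance - K)^2, divided by K * (1 - K)."""
    k = prevalence(penetrance, freqs)
    var = sum(freqs[g] * (penetrance[g] - k) ** 2 for g in freqs)
    return var / (k * (1.0 - k))


def resample(pool, causal, penetrance, n_cases, n_controls, seed=0):
    """Illustrative resampling: draw rows with replacement from a genotype pool
    and label each draw as case or control according to the penetrance of its
    genotype at the causal SNP indices. Assumes the pool can yield both labels;
    otherwise the loop would not terminate."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        row = rng.choice(pool)
        g = tuple(row[i] for i in causal)
        if rng.random() < penetrance[g]:
            if len(cases) < n_cases:
                cases.append(row)
        elif len(controls) < n_controls:
            controls.append(row)
    return cases, controls
```

As a usage illustration, a hypothetical third-order threshold model can be built with `freqs = joint_genotype_freqs([0.2, 0.3, 0.4])` and `penetrance = {g: (0.5 if sum(g) >= 3 else 0.1) for g in freqs}`, after which `prevalence(penetrance, freqs)` and `heritability(penetrance, freqs)` report the two quantities that the abstract says can drive the penetrance-table calculation.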