SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model

Yahan Li , Xinrui Cai , Junliang Shang , Yuanyuan Zhang , Jin-Xing Liu

Quant. Biol. 2024, Vol. 12, Issue 2: 197-204.

DOI: 10.1002/qub2.42
RESEARCH ARTICLE


Abstract

Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to explain the missing heritability of complex diseases. Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies (GWAS). Existing simulators typically suffer from two limitations: they lack support for high-order epistasis models involving multiple single nucleotide polymorphisms (SNPs), and they cannot generate simulated SNP data independently. In this study, we propose SimHOEPI, a simulator that calculates penetrance tables of high-order epistasis models from either a specified prevalence or a specified heritability, and that uses a resampling strategy to generate simulated data independently. The highlights of SimHOEPI are the preservation of realistic minor allele frequencies in the sampled data, the accurate calculation and embedding of high-order epistasis models, and an acceptable simulation time. A series of experiments was carried out to verify these properties from different aspects. The experimental results show that SimHOEPI can independently generate simulated SNP data with high-order epistasis models, suggesting that it may serve as an alternative simulator for GWAS.
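To make the terms in the abstract concrete, the following is a minimal sketch of how the prevalence and heritability of a penetrance table are commonly defined, and of a simple case-control resampling loop. It is not SimHOEPI's implementation: the function names are hypothetical, the genotype frequencies assume Hardy-Weinberg equilibrium and independence among the model SNPs, and the resampling step is a generic accept-until-quota scheme chosen for illustration.

```python
import itertools
import random


def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for genotypes 0, 1, 2 (minor allele count)."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def joint_freqs(mafs):
    """Joint frequency of every genotype combination of the model SNPs.

    Assumption: the SNPs are independent (no linkage disequilibrium).
    """
    per_snp = [genotype_freqs(m) for m in mafs]
    table = {}
    for combo in itertools.product(range(3), repeat=len(mafs)):
        freq = 1.0
        for snp, genotype in enumerate(combo):
            freq *= per_snp[snp][genotype]
        table[combo] = freq
    return table


def prevalence(freqs, penetrance):
    """P(D) = sum over genotype combinations of frequency * penetrance."""
    return sum(freqs[g] * penetrance[g] for g in freqs)


def heritability(freqs, penetrance):
    """h^2 = sum_g f(g) * (p(g) - P(D))^2 / (P(D) * (1 - P(D)))."""
    p_d = prevalence(freqs, penetrance)
    variance = sum(freqs[g] * (penetrance[g] - p_d) ** 2 for g in freqs)
    return variance / (p_d * (1 - p_d))


def resample(pool, penetrance, model_snps, n_cases, n_controls, seed=0):
    """Draw individuals with replacement and label them via the penetrance table.

    pool: list of genotype tuples (one tuple per real individual).
    model_snps: indices of the SNPs that carry the epistasis model.
    Note: the loop does not terminate if the table can never produce
    a case (or a control) for the genotypes present in the pool.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        person = rng.choice(pool)
        combo = tuple(person[i] for i in model_snps)
        is_case = rng.random() < penetrance[combo]
        if is_case and len(cases) < n_cases:
            cases.append(person)
        elif not is_case and len(controls) < n_controls:
            controls.append(person)
    return cases, controls
```

Because the individuals are drawn from an existing genotype pool rather than generated from scratch, the allele frequencies of the non-model SNPs in the output tend to track those of the pool, which is the general motivation for resampling-based simulators. A table of order k has 3^k entries, so the loops above grow quickly with the number of model SNPs.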

Keywords

high-order epistasis model / penetrance table / resampling strategy / simulation / single nucleotide polymorphisms

Cite this article

Yahan Li, Xinrui Cai, Junliang Shang, Yuanyuan Zhang, Jin-Xing Liu. SimHOEPI: A resampling simulator for generating single nucleotide polymorphism data with a high-order epistasis model. Quant. Biol., 2024, 12(2): 197-204. DOI: 10.1002/qub2.42



RIGHTS & PERMISSIONS

2024 The Authors. Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.
