Kinship classification from a machine learning perspective: a pilot study based on genotyping data

Fanzhang Lei, Xiaolian Wu, Qinglin Liu, Tong Xie, Bofeng Zhu

Journal of Translational Genetics and Genomics, 2026, Vol. 10, Issue 2: 119-139. DOI: 10.20517/jtgg.2025.109
Original Article

Abstract

Aim: Kinship analysis of trace and degraded biological samples has long been a challenge in forensic practice. With shorter amplicons and no stutter peaks, insertion/deletion polymorphisms (InDels) significantly improve kinship analyses between deceased individuals and their potential living relatives. However, there is still room for improvement in identifying 2nd-degree and more distant relationships. To address this issue, this study proposes a kinship analysis workflow based on machine learning (ML) models.

Methods: Using multiple kinship parameters, including identity-by-state (IBS) scores, k coefficients, the proportion of identity-by-descent (IBD), and likelihood ratio (LR) values, this pilot study applied a recently validated InDel locus to develop a preliminary ML workflow for forensic multiclass kinship classification.
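To make the IBS parameter concrete, the following is a minimal sketch of an IBS score for biallelic InDel genotypes. It assumes genotypes are coded as the number of insertion alleles (0, 1, or 2) and that the score is the sum of shared alleles over loci; the abstract does not give the exact definition used in the study, and the function name `ibs_score` and the example genotypes are hypothetical.

```python
from typing import Sequence


def ibs_score(g1: Sequence[int], g2: Sequence[int]) -> int:
    """Sum of alleles shared identical-by-state across biallelic loci.

    Assumption: each genotype is coded as the count of insertion
    alleles (0, 1 or 2), so a pair shares 2 - |g1 - g2| alleles
    at a locus.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same loci")
    total = 0
    for a, b in zip(g1, g2):
        if a not in (0, 1, 2) or b not in (0, 1, 2):
            raise ValueError("genotypes must be coded as 0, 1 or 2")
        total += 2 - abs(a - b)
    return total


# Hypothetical five-locus genotypes for two individuals.
person_a = [0, 1, 2, 1, 0]
person_b = [0, 2, 0, 1, 1]
print(ibs_score(person_a, person_b))  # per-locus sharing: 2, 1, 0, 2, 1
```

A score of this kind could serve as one feature column alongside the k coefficients, IBD proportion, and LR values when training a classifier.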

Results: In the binary classification of 2nd-degree relatives versus unrelated pairs, the LR cutoff-threshold workflow and the ML workflow achieved a similar accuracy of 0.9194. However, the ML workflow had a conclusiveness rate (CR) of 1.0, compared with 0.7066 for the LR workflow. In the multiclass task, the LR-based workflow had macro F1 scores of 0.6955 and 0.5212 and CRs of 0.7375 and 0.7046 for the single- and dual-threshold methods, respectively. In contrast, the optimal model-feature combination in the ML-based workflow (XGBoost-IBD+LR) classified all samples conclusively, with a macro F1 score of 0.9020.
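The two metrics reported above can be illustrated with a short sketch. It assumes that CR is the fraction of pairs receiving a conclusive call, that a dual-threshold LR rule leaves an inconclusive zone between its two cutoffs, and that macro F1 is computed over conclusive calls only; none of these details is stated in the abstract. The cutoff values and the example data are hypothetical.

```python
from typing import List, Optional, Sequence


def classify_by_lr(log10_lr: float, lower: float, upper: float) -> Optional[str]:
    """Dual-threshold call; None marks the inconclusive zone."""
    if log10_lr >= upper:
        return "related"
    if log10_lr <= lower:
        return "unrelated"
    return None


def conclusiveness_rate(calls: Sequence[Optional[str]]) -> float:
    """Fraction of pairs that received a conclusive call."""
    return sum(c is not None for c in calls) / len(calls)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores: List[float] = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical log10(LR) values and true labels for six pairs.
log10_lrs = [2.5, 0.3, -1.8, 1.2, -0.4, -2.1]
truth = ["related", "related", "unrelated", "related", "unrelated", "unrelated"]

# Hypothetical cutoffs: log10(LR) <= -1 is unrelated, >= 1 is related.
calls = [classify_by_lr(x, lower=-1.0, upper=1.0) for x in log10_lrs]
cr = conclusiveness_rate(calls)

# Score only the pairs with a conclusive call (an assumption).
kept = [(t, c) for t, c in zip(truth, calls) if c is not None]
f1 = macro_f1([t for t, _ in kept], [c for _, c in kept])
print(f"CR = {cr:.4f}, macro F1 on conclusive calls = {f1:.4f}")
```

In this toy example two of the six pairs fall between the cutoffs, so the threshold rule is accurate where it commits but leaves a third of the pairs unresolved. That is the kind of trade-off the CR figures above describe.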

Conclusion: By combining multiple kinship parameters, the ML workflow improved the efficiency of kinship analysis with the InDel genotyping system. It is intended to provide a more flexible and efficient solution for large-scale database screening.

Keywords

Insertion/Deletion polymorphism / capillary electrophoresis / kinship classification / machine learning / population genetics

Cite this article

Fanzhang Lei, Xiaolian Wu, Qinglin Liu, Tong Xie, Bofeng Zhu. Kinship classification from a machine learning perspective: a pilot study based on genotyping data. Journal of Translational Genetics and Genomics, 2026, 10(2): 119-139. DOI: 10.20517/jtgg.2025.109

