Applications of speech analysis in diseases’ assessment, prediction and diagnosis: a scoping review

Xi Xu; Ying Zhang; Qiufei Niu; Nianjiao Long; Jianqiang Li; Linna Zhao; Jian Yin; Jijiang Yang

doi:10.20517/ais.2025.70

Artificial Intelligence Surgery, 2026, Vol. 6, Issue 1: 114-49. DOI: 10.20517/ais.2025.70

Review



Abstract

Background: Speech production is a coordinated physiological process and a vital digital biomarker for health assessment. Recent advances in artificial intelligence (AI), particularly in representation learning, have substantially expanded the application of speech analysis across diverse clinical domains.

Methods: This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR). Five major bibliographic databases were systematically searched for studies published between 2015 and 2025. Eligible studies applied AI-driven speech analysis for clinical diagnosis or monitoring, while those lacking quantitative evaluation or sufficient methodological detail were excluded.

Results: A total of 124 studies were analyzed, covering neurological, psychiatric, and respiratory disorders. The field has transitioned from traditional machine learning with handcrafted features to deep learning and foundation models. Parkinson’s disease, Alzheimer’s disease, depression, and coronavirus disease 2019 (COVID-19) are the most frequently investigated conditions. The included studies were charted and synthesized to map disease coverage, methodological trends, and clinical application scenarios.

Conclusion: Speech analysis offers a non-invasive approach for early disease detection and remote monitoring in telemedicine. To support clinical translation, future research should prioritize model robustness and interpretability across diverse clinical populations.

Keywords

Speech analysis / disease phenotyping / neurological disorders / psychiatric disorders / respiratory disorders

Cite this article


Xi Xu, Ying Zhang, Qiufei Niu, Nianjiao Long, Jianqiang Li, Linna Zhao, Jian Yin, Jijiang Yang. Applications of speech analysis in diseases’ assessment, prediction and diagnosis: a scoping review. Artificial Intelligence Surgery, 2026, 6(1): 114-49. DOI: 10.20517/ais.2025.70

