ViT-LPATA: a vision transformer model for autism detection in children using facial images

Li Deng; Wenqiu Zhu; Yingbo Wu

doi:10.1007/s11801-026-5110-4

Optoelectronics Letters, 2026, Vol. 22, Issue 6: 379-384. DOI: 10.1007/s11801-026-5110-4

Abstract

To address the difficulty of recognizing subtle differences in facial biomarkers in children with autism, this work proposes the vision transformer with learnable positional encoding and adaptive token aggregation (ViT-LPATA), a predictive model for autism that combines a learnable positional encoding enhancement (LPEE) module with an adaptive token aggregation (ATA) module. The LPEE module dynamically captures facial geometric deformation features, and the ATA module strengthens the feature representation of pathological regions, so that the model establishes precise mappings of biomarker differences. Experiments on a publicly available autism facial dataset showed that ViT-LPATA achieved 99.2% accuracy and an area under the curve (AUC) of 0.940.
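The abstract does not say how LPEE and ATA are implemented, so the following is only a rough sketch of the two general ideas their names suggest: a trainable position vector added to each patch token, and a score-weighted pooling of tokens into one feature vector. It is not the authors' architecture. All function names, shapes, and the linear scoring rule are assumptions made for illustration, and the "learnable" parameters are plain lists that a real model would train.

```python
import math
import random

def add_positional_encoding(tokens, pos_embed):
    """Add one position vector to each token, element-wise.
    pos_embed stands in for a trainable parameter (assumption)."""
    return [[t + p for t, p in zip(tok, pos)]
            for tok, pos in zip(tokens, pos_embed)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_tokens(tokens, score_weights):
    """Score each token with a linear projection (hypothetical choice),
    then return the softmax-weighted average of the tokens."""
    scores = [sum(w * x for w, x in zip(score_weights, tok)) for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    pooled = [sum(wt * tok[d] for wt, tok in zip(weights, tokens))
              for d in range(dim)]
    return pooled, weights

random.seed(0)
num_tokens, dim = 4, 3
# Stand-in patch tokens; a real model would embed image patches instead.
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
# Zero-initialised here; training would update these values.
pos_embed = [[0.0] * dim for _ in range(num_tokens)]
score_weights = [random.gauss(0, 1) for _ in range(dim)]

encoded = add_positional_encoding(tokens, pos_embed)
pooled, weights = aggregate_tokens(encoded, score_weights)
```

In this sketch, tokens with higher scores contribute more to `pooled`, which is one simple way a model could emphasise informative facial regions. With all-zero scoring weights the pooling reduces to a plain average of the tokens.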
