AI-enabled intelligent cockpit proactive affective interaction: middle-level feature fusion dual-branch deep learning network for driver emotion recognition

Ying-Zhang Wu, Wen-Bo Li, Yu-Jing Liu, Guan-Zhong Zeng, Cheng-Mou Li, Hua-Min Jin, Shen Li, Gang Guo

Advances in Manufacturing, 2025, 13(3): 525-538. DOI: 10.1007/s40436-024-00519-8

Abstract

Advances in artificial intelligence (AI) technology are propelling the rapid development of automotive intelligent cockpits. Actively perceiving the driver's emotional state has a significant bearing on road traffic safety, so driver emotion recognition technology is crucial for ensuring driving safety in the advanced driver assistance system (ADAS) of the automotive intelligent cockpit, and ongoing advances in AI offer a compelling avenue for implementing such proactive affective interaction. This study introduces the multimodal driver emotion recognition network (MDERNet), a dual-branch deep learning network that temporally fuses driver facial expression features and driving behavior features for non-contact driver emotion recognition. The proposed model was validated on the publicly available CK+, RAVDESS, DEAP, and PPB-Emo datasets, covering both discrete and dimensional emotion recognition. The results indicate that the proposed model achieves advanced recognition performance, and ablation experiments confirm the contribution of each model component. The proposed method serves as a fundamental reference for multimodal feature fusion in driver emotion recognition and contributes to the advancement of ADAS within automotive intelligent cockpits.
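
To make the fusion scheme described above concrete, the following is a minimal PyTorch sketch of a dual-branch network with middle-level feature fusion: one branch encodes per-frame facial-expression features, the other encodes per-timestep driving-behavior features, the two feature streams are concatenated at a middle layer, and a recurrent module fuses them over time before emotion classification. The class name DualBranchFusionNet, both branch designs, the layer sizes, and the LSTM temporal model are illustrative assumptions, not the published MDERNet configuration.

# Hypothetical sketch of middle-level (feature-level) fusion; not the published MDERNet.
import torch
import torch.nn as nn

class DualBranchFusionNet(nn.Module):
    def __init__(self, behavior_dim=8, num_classes=7):
        super().__init__()
        # Branch 1: per-frame facial expression features from face crops.
        self.face_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),           # -> (N, 32)
        )
        # Branch 2: per-timestep driving behavior signals (e.g., speed, steering).
        self.behavior_branch = nn.Sequential(
            nn.Linear(behavior_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),                    # -> (N, 32)
        )
        # Middle-level fusion: concatenate branch features, then model time.
        self.temporal = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, face_seq, behavior_seq):
        # face_seq: (B, T, 3, H, W); behavior_seq: (B, T, behavior_dim)
        B, T = face_seq.shape[:2]
        face_feat = self.face_branch(face_seq.flatten(0, 1)).view(B, T, -1)
        behavior_feat = self.behavior_branch(behavior_seq)
        fused = torch.cat([face_feat, behavior_feat], dim=-1)  # (B, T, 64)
        _, (h_n, _) = self.temporal(fused)
        return self.classifier(h_n[-1])                        # emotion logits

# Toy usage: 2 clips of 16 frames at 64x64 with 8 behavior channels.
logits = DualBranchFusionNet()(torch.randn(2, 16, 3, 64, 64), torch.randn(2, 16, 8))
print(logits.shape)  # torch.Size([2, 7])

Fusing the two modalities at this intermediate feature level, rather than at the raw-input or decision level, is the usual sense of middle-level (feature-level) fusion named in the title.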

Keywords

Driver emotion / Artificial intelligence (AI) / Facial expression / Driving behavior / Intelligent cockpit

Cite this article

Ying-Zhang Wu, Wen-Bo Li, Yu-Jing Liu, Guan-Zhong Zeng, Cheng-Mou Li, Hua-Min Jin, Shen Li, Gang Guo. AI-enabled intelligent cockpit proactive affective interaction: middle-level feature fusion dual-branch deep learning network for driver emotion recognition. Advances in Manufacturing, 2025, 13(3): 525-538. DOI: 10.1007/s40436-024-00519-8

Funding

National Natural Science Foundation of China (52302497)

RIGHTS & PERMISSIONS

Shanghai University and Periodicals Agency of Shanghai University and Springer-Verlag GmbH Germany, part of Springer Nature
