AI-enabled intelligent cockpit proactive affective interaction: middle-level feature fusion dual-branch deep learning network for driver emotion recognition
Ying-Zhang Wu , Wen-Bo Li , Yu-Jing Liu , Guan-Zhong Zeng , Cheng-Mou Li , Hua-Min Jin , Shen Li , Gang Guo
Advances in Manufacturing, 2025, Vol. 13, Issue 3: 525–538.
Advances in artificial intelligence (AI) technology are propelling the rapid development of automotive intelligent cockpits. Because driver emotions significantly affect road traffic safety, their active perception is crucial: driver emotion recognition technology underpins driving safety in the advanced driver assistance systems (ADAS) of the automotive intelligent cockpit, and ongoing advances in AI offer a compelling avenue for implementing proactive affective interaction. This study introduced the multimodal driver emotion recognition network (MDERNet), a dual-branch deep learning network that temporally fuses driver facial expression features and driving behavior features for non-contact driver emotion recognition. The proposed model was validated on publicly available datasets, including CK+, RAVDESS, DEAP, and PPB-Emo, for both discrete and dimensional emotion recognition. The results indicated that the proposed model achieved advanced recognition performance, and ablation experiments confirmed the contribution of each model component. The proposed method serves as a reference for multimodal feature fusion in driver emotion recognition and contributes to the advancement of ADAS within automotive intelligent cockpits.
Driver emotion / Artificial intelligence (AI) / Facial expression / Driving behavior / Intelligent cockpit
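The middle-level fusion idea described in the abstract can be illustrated with a minimal sketch: each modality (facial expression, driving behavior) is first embedded by its own branch, the per-frame embeddings are concatenated at the feature level (rather than fusing raw inputs or final decisions), pooled over time, and classified. All dimensions, weights, and the simple linear-plus-ReLU branches below are illustrative assumptions, not the authors' actual MDERNet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w, b):
    """One feature-extraction branch: linear projection + ReLU (stand-in
    for a real per-modality backbone such as a CNN)."""
    return np.maximum(x @ w + b, 0.0)

# Hypothetical dimensions: 128-d facial-expression features and 16-d
# driving-behavior features per frame, each embedded into a 64-d space,
# 7 discrete emotion classes.
d_face, d_behav, d_mid, n_classes = 128, 16, 64, 7

w_face = rng.standard_normal((d_face, d_mid)) * 0.05
b_face = np.zeros(d_mid)
w_behav = rng.standard_normal((d_behav, d_mid)) * 0.05
b_behav = np.zeros(d_mid)
w_out = rng.standard_normal((2 * d_mid, n_classes)) * 0.05
b_out = np.zeros(n_classes)

def dual_branch_forward(face_seq, behav_seq):
    """Dual-branch forward pass with middle-level feature fusion:
    embed each modality separately, concatenate the per-frame embeddings
    (the 'middle-level' fusion point), average over time, then classify."""
    h_face = branch(face_seq, w_face, b_face)        # (T, d_mid)
    h_behav = branch(behav_seq, w_behav, b_behav)    # (T, d_mid)
    fused = np.concatenate([h_face, h_behav], axis=-1)  # (T, 2*d_mid)
    pooled = fused.mean(axis=0)                      # temporal pooling
    logits = pooled @ w_out + b_out
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()                               # class probabilities

T = 30  # frames in one driving clip
probs = dual_branch_forward(rng.standard_normal((T, d_face)),
                            rng.standard_normal((T, d_behav)))
```

Fusing at the feature level lets the classifier exploit cross-modal correlations that decision-level fusion would discard, while still allowing each branch a backbone suited to its modality.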
© Shanghai University and Periodicals Agency of Shanghai University and Springer-Verlag GmbH Germany, part of Springer Nature