Multimodal fusion recognition for digital twin

Tianzhe Zhou, Xuguang Zhang, Bing Kang, Mingkai Chen

2024, Vol. 10, Issue (2): 337-346. DOI: 10.1016/j.dcan.2022.10.009

Research article

Abstract

The digital twin is a concept that transcends reality: it provides reverse feedback from the real physical space to the virtual digital space, and great prospects are held for this emerging technology. To upgrade the digital twin industrial chain, it is urgent to introduce more modalities, such as vision, haptics, hearing, and smell, into the virtual digital space, helping physical entities and virtual objects form a closer connection. Perceptual understanding and object recognition have therefore become pressing topics in digital twin research. Existing surface material classification schemes typically perform recognition with machine learning or deep learning on a single modality, ignoring the complementarity between modalities. To overcome this limitation, we propose a multimodal fusion network that combines two modalities, visual and haptic, for surface material recognition. On the one hand, the network exploits the potential correlations between modalities to mine modal semantics deeply and complete the data mapping. On the other hand, the network is extensible and can serve as a universal architecture that incorporates additional modalities. Experiments show that the proposed multimodal fusion network achieves 99.42% classification accuracy while reducing complexity.
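As a rough sketch of the feature-level fusion idea described in the abstract (this is not the authors' architecture; the visual backbone, haptic feature dimension, fusion strategy, and class count below are placeholder assumptions made for illustration), a two-branch visual-haptic classifier in PyTorch might look like the following:

```python
# Minimal sketch of a two-branch visual-haptic fusion classifier.
# NOT the paper's implementation: backbone choice, feature sizes, and
# the number of surface-material classes (69 here) are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class VisualHapticFusionNet(nn.Module):
    def __init__(self, num_classes: int = 69, haptic_dim: int = 1024):
        super().__init__()
        # Visual branch: a CNN backbone with its final classifier removed.
        # In practice one would load pretrained weights (transfer learning).
        backbone = models.resnet18()
        self.visual_branch = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)

        # Haptic branch: a small MLP over a flattened haptic feature vector
        # (e.g., spectral features of acceleration traces).
        self.haptic_branch = nn.Sequential(
            nn.Linear(haptic_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.ReLU(inplace=True),
        )

        # Fusion head: concatenate both embeddings, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(512 + 256, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, image: torch.Tensor, haptic: torch.Tensor) -> torch.Tensor:
        v = self.visual_branch(image).flatten(1)   # (B, 512) visual embedding
        h = self.haptic_branch(haptic)             # (B, 256) haptic embedding
        fused = torch.cat([v, h], dim=1)           # feature-level fusion
        return self.classifier(fused)


if __name__ == "__main__":
    model = VisualHapticFusionNet()
    img = torch.randn(2, 3, 224, 224)   # dummy image batch
    hap = torch.randn(2, 1024)          # dummy haptic feature batch
    print(model(img, hap).shape)        # torch.Size([2, 69])
```

Concatenation of branch embeddings is only one possible fusion point; decision-level or hybrid fusion would replace the fusion head while leaving the two branches unchanged, which is also how further modalities could be attached.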

Keywords

Digital twin / Multimodal fusion / Object recognition / Deep learning / Transfer learning

Cite this article

Tianzhe Zhou, Xuguang Zhang, Bing Kang, Mingkai Chen. Multimodal fusion recognition for digital twin. Digital Communications and Networks, 2024, 10(2): 337-346. DOI: 10.1016/j.dcan.2022.10.009


