Multi-classifier information fusion for human activity recognition in healthcare facilities

Da HU, Mengjun WANG, Shuai LI

Front. Eng ›› 2025, Vol. 12 ›› Issue (1): 99–116. DOI: 10.1007/s42524-024-4074-y
Information Management and Information Systems
RESEARCH ARTICLE


Abstract

In healthcare facilities such as hospitals, pathogen transmission can lead to infectious disease outbreaks, highlighting the need for effective disinfection protocols. Disinfection robots offer a promising solution, but their deployment is often hindered by an inability to accurately recognize human activities within these environments. Although numerous studies have addressed Human Activity Recognition (HAR), few have utilized scene graph features that capture the relationships between objects in a scene. To address this gap, our study proposes a novel hybrid multi-classifier information fusion method that combines scene graph analysis with visual feature extraction for enhanced HAR in healthcare settings. We first extract scene graphs, complete with node and edge attributes, from images and use a graph classification network with a graph attention mechanism for activity recognition. Concurrently, we employ Swin Transformer and convolutional neural network models to extract visual features from the same images. The outputs of these three models are then integrated using a hybrid information fusion approach based on Dempster-Shafer theory and a weighted majority vote. Our method is evaluated on a newly compiled hospital activity dataset consisting of 5,770 images across 25 activity categories. The results demonstrate an accuracy of 90.59%, a recall of 90.16%, and a precision of 90.31%, outperforming existing HAR methods and showing the approach's potential for practical applications in healthcare environments.
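To make the fusion step concrete, below is a minimal sketch, not the authors' implementation. It treats each model's softmax vector as a basic belief assignment over singleton hypotheses (the activity classes), in which case Dempster's rule reduces to an element-wise product renormalized by the non-conflicting mass, and it uses a weighted majority vote as the second fusion stage. The classifier weights and the policy for reconciling the two stages are illustrative assumptions.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Combine two mass vectors over singleton hypotheses (activity classes)
    with Dempster's rule. With singleton-only focal elements, the rule
    reduces to an element-wise product normalized by (1 - K), where K is
    the conflict mass between the two sources."""
    joint = m1 * m2
    k = 1.0 - joint.sum()  # mass assigned to conflicting (empty) intersections
    if np.isclose(k, 1.0):
        raise ValueError("Total conflict: sources are incompatible.")
    return joint / (1.0 - k)

def weighted_majority_vote(probs, weights):
    """Each classifier casts its weight for its argmax class; the class
    with the largest weighted tally wins."""
    tally = np.zeros(probs[0].shape[0])
    for p, w in zip(probs, weights):
        tally[np.argmax(p)] += w
    return int(np.argmax(tally))

def hybrid_fusion(probs, weights):
    """Fuse classifier outputs with Dempster-Shafer combination, falling
    back to the weighted majority vote when the two stages disagree.
    The switching policy here is a placeholder assumption."""
    fused = probs[0]
    for p in probs[1:]:
        fused = dempster_combine(fused, p)
    ds_choice = int(np.argmax(fused))
    vote_choice = weighted_majority_vote(probs, weights)
    return ds_choice if ds_choice == vote_choice else vote_choice

# Toy softmax outputs standing in for the graph, Swin Transformer,
# and CNN branches over three hypothetical activity classes.
gat  = np.array([0.70, 0.20, 0.10])
swin = np.array([0.60, 0.30, 0.10])
cnn  = np.array([0.30, 0.50, 0.20])
print(hybrid_fusion([gat, swin, cnn], weights=[1.0, 1.0, 0.8]))  # -> 0
```

In this toy case the graph and Swin branches agree on class 0, so both fusion stages select it; in practice the vote weights would typically be set from each classifier's validation performance.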


Keywords

human activity classification / scene graph / graph neural network / multi-classifier fusion / healthcare facility

Cite this article

Da HU, Mengjun WANG, Shuai LI. Multi-classifier information fusion for human activity recognition in healthcare facilities. Front. Eng, 2025, 12(1): 99–116. DOI: 10.1007/s42524-024-4074-y



RIGHTS & PERMISSIONS

Higher Education Press
