A human mesh-centered approach to action recognition in the operating room

Benjamin Liu, Gilles Soenens, Joshua Villarreal, Jeffrey Jopling, Isabelle Van Herzeele, Anita Rau, Serena Yeung-Levy

Artificial Intelligence Surgery ›› 2024, Vol. 4 ›› Issue (2): 92-108. DOI: 10.20517/ais.2024.19

Original Article

Abstract

Aim: Video review programs in hospitals play a crucial role in optimizing operating room workflows. In scenarios where split seconds can change the outcome of a surgery, the potential of such programs to improve safety and efficiency is profound. Realizing this potential, however, requires systematic and automated analysis of human actions; existing approaches rely predominantly on manual review, which is labor-intensive, inconsistent, and difficult to scale. Here, we present an AI-based approach to systematically analyze the behavior and actions of individuals in operating room (OR) videos.

Methods: We designed a novel framework for human mesh recovery from long-duration surgical videos by integrating existing human detection, tracking, and mesh recovery models. We then trained an action recognition model to predict surgical actions from the predicted temporal mesh sequences. To train and evaluate our approach, we annotated an in-house dataset of 864 five-second clips from simulated surgical videos with their corresponding actions.
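
To make the described framework concrete, below is a minimal structural sketch of such a detection-tracking-mesh-recovery pipeline in Python. All component names and signatures (detect, track, recover_mesh, classify, Track) are illustrative placeholders supplied by the caller, not the authors' implementation; in the paper these roles are filled by existing human detection, tracking, and mesh recovery models and a trained action recognition model.

```python
# Hypothetical sketch of a mesh-centered action recognition pipeline:
# per-frame person detection, cross-frame identity tracking, per-track
# mesh recovery, and action classification on the temporal mesh sequence.
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

import numpy as np


@dataclass
class Track:
    """One person's identity persisted across frames (placeholder type)."""
    person_id: int
    boxes: List[np.ndarray]  # one (4,) xyxy bounding box per frame


def run_pipeline(
    frames: Sequence[np.ndarray],
    detect: Callable[[np.ndarray], List[np.ndarray]],        # frame -> person boxes
    track: Callable[[List[List[np.ndarray]]], List[Track]],  # per-frame boxes -> tracks
    recover_mesh: Callable[[np.ndarray, np.ndarray], np.ndarray],  # (frame, box) -> mesh params
    classify: Callable[[np.ndarray], str],                   # (T, D) mesh sequence -> action
) -> Dict[int, str]:
    """Predict one action label per tracked person in a video clip."""
    # Stage 1: detect every person in every frame.
    per_frame_boxes = [detect(f) for f in frames]
    # Stage 2: link detections across frames into identity tracks.
    tracks = track(per_frame_boxes)
    # Stages 3-4: recover a mesh per frame, then classify the sequence.
    actions: Dict[int, str] = {}
    for t in tracks:
        mesh_seq = np.stack(
            [recover_mesh(frames[i], box) for i, box in enumerate(t.boxes)]
        )
        actions[t.person_id] = classify(mesh_seq)
    return actions
```

Passing the four stages in as callables keeps the skeleton agnostic to which specific detection, tracking, and mesh recovery models are composed, which mirrors the modular integration the Methods describe.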

Results: Our best model achieves an F1 score of 0.81 and an area under the precision-recall curve (AUPRC) of 0.85, demonstrating that human mesh sequences can be successfully used to recover surgical actions from operating room videos. Model ablation studies suggest that action recognition performance is enhanced by composing human mesh representations with lower-arm, pelvic, and cranial joints.
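
As a concrete illustration of the reported metrics, the sketch below computes a macro-averaged F1 score and AUPRC with scikit-learn on toy predictions. The class set, labels, probabilities, and macro averaging are all assumptions for illustration, since the abstract does not specify the evaluation protocol; average precision serves here as the usual estimator of the area under the precision-recall curve.

```python
# Minimal sketch: computing F1 and AUPRC for a multi-class action
# recognition model with scikit-learn. All values below are toy data,
# and macro averaging is an assumption, not the paper's protocol.
import numpy as np
from sklearn.metrics import average_precision_score, f1_score
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2]                      # hypothetical action classes
y_true = np.array([0, 2, 1, 1, 0, 2])    # ground-truth label per clip
y_prob = np.array([                      # predicted class probabilities
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
])
y_pred = y_prob.argmax(axis=1)

f1 = f1_score(y_true, y_pred, average="macro")
# Average precision estimates the area under the precision-recall curve,
# computed one-vs-rest per class and then macro-averaged.
auprc = average_precision_score(
    label_binarize(y_true, classes=classes), y_prob, average="macro"
)
print(f"F1 = {f1:.2f}, AUPRC = {auprc:.2f}")
```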

Conclusion: Our work presents promising opportunities for OR video review programs to study human behavior in a systematic, scalable manner.

Keywords

Action recognition / human mesh recovery / operating room / surgery / artificial intelligence / computer vision / deep learning

Cite this article

Benjamin Liu, Gilles Soenens, Joshua Villarreal, Jeffrey Jopling, Isabelle Van Herzeele, Anita Rau, Serena Yeung-Levy. A human mesh-centered approach to action recognition in the operating room. Artificial Intelligence Surgery, 2024, 4(2): 92-108. DOI: 10.20517/ais.2024.19
