Surgical computer vision for intraoperative decision-support: a scoping review on performance metrics and readiness for real-time deployment
Jayvee Buchanan, Saxon Connor, John Pearson, Bruce Carey-Smith, Tim Eglinton
Artificial Intelligence Surgery, 2026, Vol. 6, Issue 1: 150-70.
Background: Real-time computer vision-based artificial intelligence (CV-AI) systems for surgical video analysis are rapidly advancing. Current evaluation strategies and clinical-readiness reporting, however, remain inconsistent. This scoping review mapped contemporary CV-AI task domains, performance metrics, and evidence of readiness for real-time intraoperative deployment within general surgery.
Methods: This study followed the Joanna Briggs Institute methodology for scoping reviews and was reported in accordance with PRISMA-ScR. Eligible studies were identified through a systematic literature search of the MEDLINE, Embase, PubMed, and Scopus databases. All studies published between 1 June 2015 and 1 June 2025 were eligible.
Results: A total of 490 articles were screened, with 113 studies meeting the inclusion criteria after full-text review. Retrospective feasibility analyses predominated, with only 13 studies (12%) evaluating real-time intraoperative integration. Five task domains were identified (phase recognition, anatomy identification, action-event recognition, instrument tracking, and skill assessment). Forty-one unique performance metrics were reported, with predominant use of discrimination-style summary measures (e.g., accuracy, recall, F1 score) and comparatively sparse reporting of class imbalance, boundary-aware measures (e.g., Hausdorff distance), or real-time workflow factors (e.g., latency/stability, interface design, surgeon feedback). External validation was described in 13 (12%) studies. Nine studies (8%) referenced artificial intelligence-specific reporting frameworks.
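To illustrate why the distinction between discrimination-style summary measures and class-imbalance or boundary-aware reporting matters, the following is a minimal Python sketch rather than a method taken from any included study. The function names (`binary_metrics`, `hausdorff_distance`) and all toy data are hypothetical and chosen only for illustration. It assumes binary frame-level labels and small 2D contour point sets.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Hypothetical toy data: 100 frames, of which only 5 contain a rare event.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a degenerate model that never predicts the event
metrics = binary_metrics(y_true, y_pred)

# Hypothetical contours: the prediction misses one outlying boundary point.
pred_contour = [(0, 0), (1, 0), (2, 0)]
true_contour = [(0, 0), (1, 0), (2, 3)]
hd = hausdorff_distance(pred_contour, true_contour)

print(metrics)
print(hd)
```

In this toy case the degenerate model reaches 95% accuracy while its recall and F1 score are both zero, and a single missed boundary point sets the Hausdorff distance on its own. Both effects are hidden when only an overall accuracy figure is reported.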
Conclusion: Surgical CV-AI is advancing technically but remains predominantly at an early feasibility stage. Variability in current metric application and limited real-time clinical evaluation constrain comparability, applicability, and widespread adoption. Standardised metrics, evaluation frameworks, prospective clinical trials, and collaborative end-user engagement are critical to translating conceptual promise into reliable real-time decision-support tools that support surgeon judgement and integrate seamlessly into routine operative workflows.
Keywords: Artificial intelligence / computer vision / deep learning / surgical video analysis / intraoperative decision support / minimally invasive surgery / laparoscopic surgery