Diffusion-augmented YOLO26-Swin cascaded framework with hybrid SHAP-CAM for autonomous power grid inspection
Stefano Frizzo Stefenon, João Pedro Matos-Carvalho, Viviana Cocco Mariani, Leandro dos Santos Coelho, Kin-Choong Yow
Autonomous Intelligent Systems, 2026, Vol. 6, Issue 1: 13
Deep learning-based autonomous inspection of power grid insulators is challenged by data imbalance and model opacity. This paper presents an end-to-end solution integrating data synthesis, detection, classification, and explainability. First, a conditional diffusion model generates realistic synthetic fault images to balance the dataset. Second, a two-stage architecture called YOLO26-Swin, based on You Only Look Once version 26 (YOLO26) extra-large and Shifted windows (Swin)-V2-B and fine-tuned with Bayesian optimization, performs insulator detection followed by fault classification. Finally, a novel SHapley Additive exPlanations with Class Activation Mapping (SHAP-CAM) method provides visual explanations of the model's predictions. In the reported experiments, the framework achieves an F1-score of 0.98149 and a mean Average Precision (mAP)@[0.5] of 0.98951, exceeding the leading detection and classification models used for comparison. This work highlights the efficacy of diffusion models for data augmentation in critical infrastructure and advances the interpretability of vision-based inspection systems.
Bayesian optimization / Diffusion models / Generative artificial intelligence / Explainable artificial intelligence
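The detect-then-classify cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the detector and classifier here are hypothetical stand-in functions (`stub_detect`, `stub_classify`), the class labels "faulty" and "normal" and the `min_score` threshold are assumptions made for illustration, and images are plain nested lists so the example needs no external libraries. In practice the two callables would wrap a trained detector and a trained classifier. The `f1_score` helper uses the standard definition from true positives, false positives, and false negatives.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


@dataclass
class Finding:
    box: Box          # where the first stage located an insulator
    det_score: float  # first-stage detection confidence
    label: str        # second-stage class for the cropped region


def crop(image: Image, box: Box) -> Image:
    """Cut the region inside `box` out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def cascade(
    image: Image,
    detect: Callable[[Image], List[Tuple[Box, float]]],
    classify: Callable[[Image], str],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Finding]:
    """Stage 1 proposes boxes; stage 2 classifies each confident crop."""
    findings = []
    for box, score in detect(image):
        if score < min_score:
            continue  # drop low-confidence detections before classification
        findings.append(Finding(box, score, classify(crop(image, box))))
    return findings


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stub_detect(image: Image) -> List[Tuple[Box, float]]:
    # Hypothetical detector output: two fixed candidate boxes with scores.
    return [((0, 0, 2, 2), 0.9), ((2, 2, 4, 4), 0.3)]


def stub_classify(patch: Image) -> str:
    # Hypothetical rule: dark crops are "faulty". Assumes a non-empty crop.
    pixels = [p for row in patch for p in row]
    return "faulty" if sum(pixels) / len(pixels) < 100 else "normal"


if __name__ == "__main__":
    # 4x4 toy image: dark top-left quadrant, bright elsewhere.
    toy = [[50, 50, 200, 200],
           [50, 50, 200, 200],
           [200, 200, 200, 200],
           [200, 200, 200, 200]]
    for finding in cascade(toy, stub_detect, stub_classify):
        print(finding)
```

Passing the two stages in as callables keeps the cascade logic independent of any particular model, so either stage can be swapped or tuned separately.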