Fusion-oriented registration of UAV-acquired RGB–infrared images for maritime target detection and segmentation
Zhenyi Li, Xiaogang Yang, Tianxu Zhao, Shengke Wang
Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue 1: 11
This study introduces a novel target-aware registration framework for infrared and visible images acquired by unmanned aerial vehicles (UAVs), designed to improve feature-matching accuracy and robustness. The proposed method first segments the images to remove irrelevant background regions, retaining only the target objects that require registration and thereby suppressing background interference during matching. The segmentation stage is guided by bounding boxes generated by a target-detection model, which improves the accuracy and stability of the segmentation results. In addition, we propose a new strategy for evaluating infrared–visible registration performance: both the original and registered images are segmented, and the mean intersection over union (mIoU) is computed between the segmented regions and the original bounding boxes. We further incorporate image-fusion metrics from downstream post-registration tasks to provide a more comprehensive assessment of registration quality. Extensive experiments demonstrate that the proposed method outperforms existing approaches in both registration accuracy and stability, providing a robust solution for infrared–visible image alignment.
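The mIoU-based evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' released code: the helper names (`box_to_mask`, `mask_iou`, `registration_miou`) and the assumption that each target contributes one binary segmentation mask paired with one detector box are ours.

```python
import numpy as np

def box_to_mask(box, shape):
    """Rasterize an (x1, y1, x2, y2) bounding box into a binary mask."""
    mask = np.zeros(shape, dtype=bool)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = True
    return mask

def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def registration_miou(seg_masks, boxes, image_shape):
    """Mean IoU between per-target segmentation masks (e.g., from the
    registered image) and the original detector bounding boxes."""
    ious = [mask_iou(m, box_to_mask(b, image_shape))
            for m, b in zip(seg_masks, boxes)]
    return float(np.mean(ious)) if ious else 0.0
```

A well-registered image pair should yield segmentation masks that still fall inside the original boxes, driving the score toward 1; misalignment shifts the masks out of the boxes and lowers it.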
UAV multimodal registration / Maritime surveillance / Target-aware segmentation / Image fusion / Deep learning