A guided approach for cross-view geolocalization estimation with land cover semantic segmentation

Nathan A.Z. Xavier, Elcio H. Shiguemori, Marcos R.O.A. Maximo, Mubarak Shah

Biomimetic Intelligence and Robotics ›› 2025, Vol. 5 ›› Issue (2) : 100208 DOI: 10.1016/j.birob.2024.100208

Research Article


Abstract

Geolocalization is a crucial process that leverages environmental information and contextual data to accurately identify a position. In particular, cross-view geolocalization utilizes images from different perspectives, such as satellite and ground-level images, and is relevant for applications such as robot navigation and autonomous vehicles. In this research, we propose a methodology that integrates cross-view geolocalization estimation with a land cover semantic segmentation map. Our solution achieves performance comparable to state-of-the-art methods while exhibiting greater stability and consistency regardless of the street-view location or the dataset used. Additionally, our method generates a focused discrete probability distribution that acts as a heatmap, which effectively filters out incorrect and unlikely regions and enhances the reliability of our estimations. Code is available at https://github.com/nathanxavier/CVSegGuide.
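The "focused discrete probability distribution that acts as a heatmap" can be sketched as follows. This is an illustrative toy, not the paper's implementation: per-cell matching scores over a satellite-image grid are turned into a probability distribution with a softmax, and the least likely cells are zeroed out so that only a compact high-probability region survives. The function names, grid size, and `keep_mass` threshold are all hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flattened score array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def location_heatmap(scores, keep_mass=0.9):
    """Convert per-cell matching scores into a discrete probability
    distribution over the grid, then zero out the least likely cells,
    keeping only the smallest set of cells whose cumulative probability
    reaches `keep_mass`, and renormalize."""
    probs = softmax(scores.ravel()).reshape(scores.shape)
    flat = np.sort(probs.ravel())[::-1]          # probabilities, descending
    cum = np.cumsum(flat)
    cutoff = flat[np.searchsorted(cum, keep_mass)]
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()

# Toy 4x4 grid of matching scores with one strong peak: the cell
# where the ground-level view matches the satellite patch best.
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 0.1, (4, 4))
scores[1, 2] = 5.0
hm = location_heatmap(scores, keep_mass=0.9)
```

Filtering the distribution this way is what lets the heatmap discard unlikely regions outright instead of spreading small amounts of probability mass over the whole map.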

Keywords

Cross-view geolocalization / Semantic segmentation / Satellite and ground image fusion / Simultaneous localization and mapping (SLAM)

Cite this article

Nathan A.Z. Xavier, Elcio H. Shiguemori, Marcos R.O.A. Maximo, Mubarak Shah. A guided approach for cross-view geolocalization estimation with land cover semantic segmentation. Biomimetic Intelligence and Robotics, 2025, 5(2): 100208. DOI: 10.1016/j.birob.2024.100208


CRediT authorship contribution statement

Nathan A.Z. Xavier: Writing - original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Conceptualization. Elcio H. Shiguemori: Writing - review & editing, Supervision. Marcos R.O.A. Maximo: Writing - review & editing, Supervision. Mubarak Shah: Writing - review & editing, Supervision, Resources, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) (88887.929508/2023-00 and 88887.937224/2024-00).

Marcos Maximo is partially funded by the National Research Council of Brazil (CNPq) (307525/2022-8).

This research was developed within the IDeepS project, which is supported by the Laboratório Nacional de Computação Científica (LNCC/MCTI, Brazil) via resources of the SDumont supercomputer (http://sdumont.lncc.br).

Appendix. Supplementary material

This appendix contains supplemental material that provides additional details on the neural network architecture described in Section 4.

Tables A.1 and A.2 present the architecture for the segmentation and heatmap prediction blocks, respectively, within the FeatUp backbone model. Similarly, Tables A.3 and A.4 provide the corresponding architectural details for the MST model.

Each of these tables includes information on the layer type, input and output shapes, and the number of learnable parameters.
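As a rough illustration of how the per-layer parameter counts reported in such tables arise, the formulas below compute the learnable parameters of convolutional and fully connected layers. The layer shapes are hypothetical examples, not taken from the paper's architecture.

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Learnable parameters of a 2D convolution: one k x k kernel per
    (input channel, output channel) pair, plus an optional bias per
    output channel."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)

def linear_params(in_f, out_f, bias=True):
    """Learnable parameters of a fully connected layer: a weight per
    (input, output) pair, plus an optional bias per output."""
    return in_f * out_f + (out_f if bias else 0)

# Hypothetical rows in the style of Tables A.1-A.4:
# (layer description, parameter count)
layers = [
    ("Conv2d 256->128, 3x3", conv2d_params(256, 128, 3)),  # 295,040
    ("Conv2d 128->64, 3x3",  conv2d_params(128, 64, 3)),   # 73,792
    ("Linear 64->19",        linear_params(64, 19)),       # 1,235
]
total = sum(n for _, n in layers)
```

Summing these per-layer counts is how the overall number of learnable parameters of a prediction block is obtained.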

