Monocular visual positioning systems are valued for their low cost and straightforward calibration. However, the lack of real-scale information and the complexity of initialization processes limit their application in scenarios requiring accurate absolute positioning. Existing solutions present trade-offs: markerless approaches depend on environmental priors (e.g., fixed camera height constraints) for scale recovery, while marker-based methods typically necessitate that target patterns remain within the camera’s field of view throughout the process. To address these challenges, we propose a 3D target-assisted initialization method that enables scale recovery with just two target images. This modular approach can be seamlessly integrated into monocular simultaneous localization and mapping (SLAM) frameworks. We validated our proposed initialization method through integration with ORB-SLAM3 and semi-direct visual odometry (SVO). Experimental results demonstrated that our method provides real-scale information without compromising real-time performance, making it suitable for applications such as indoor navigation and industrial robot localization, where accurate absolute positioning is essential.
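The core idea of target-assisted scale recovery can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it only assumes that solving PnP against the 3D target's known geometry yields metric camera positions at two keyframes, which are then compared with the corresponding unscaled SLAM positions (function and variable names are hypothetical):

```python
import numpy as np

def recover_scale(t_metric_1, t_metric_2, t_slam_1, t_slam_2):
    """Estimate the metric scale factor for a monocular SLAM trajectory.

    t_metric_*: camera positions (metres) recovered from the 3D target
                (e.g., via PnP on its known geometry) at two keyframes.
    t_slam_*:   the same two camera positions in SLAM's unscaled frame.
    """
    metric_baseline = np.linalg.norm(np.asarray(t_metric_2, dtype=float)
                                     - np.asarray(t_metric_1, dtype=float))
    slam_baseline = np.linalg.norm(np.asarray(t_slam_2, dtype=float)
                                   - np.asarray(t_slam_1, dtype=float))
    if slam_baseline < 1e-9:
        # Keyframes with (near-)zero parallax cannot fix the scale.
        raise ValueError("degenerate baseline: keyframes too close")
    return metric_baseline / slam_baseline

# Example: the target-derived poses say the camera moved 0.5 m,
# while the SLAM trajectory reports a unit-norm translation.
s = recover_scale([0, 0, 0], [0.5, 0, 0], [0, 0, 0], [1, 0, 0])
# Multiplying SLAM translations and map points by s yields metric output.
```

In practice two target images are the minimum: each fixes one metric camera position, and the pair fixes one metric baseline from which the global scale follows.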
Acknowledgement
This work was supported by the Science and Technology Program Project of Tianjin (No. 24ZXZSSS00300).
Declaration of conflicting interests
The authors have no conflicts of interest related to this publication.