Composite-mask GAN based on refined optical flow and disparity map for SLAM visual odometry

Yuehui Ji, Jingwei Jiang, Junjie Liu, Yu Song, Qiang Gao

Optoelectronics Letters, 2025, 21(12): 730-736. DOI: 10.1007/s11801-025-3160-7

Abstract

Although deep learning methods have been widely applied to SLAM visual odometry (VO) over the past decade with impressive improvements, their accuracy remains limited in complex dynamic environments. In this paper, a composite mask-based generative adversarial network (CMGAN) is introduced to predict camera motion and stereo depth maps. Specifically, a perceptual generator is constructed to obtain the disparity map and optical flow between two neighboring frames. Then, an iterative pose refinement strategy is proposed to improve the accuracy of pose estimation. Finally, a composite mask is embedded in the discriminator to sense structural deformation in the synthesized virtual image, thereby strengthening the structural constraints of the network model, improving the accuracy of camera pose estimation, and reducing drift in the VO. Detailed quantitative and qualitative evaluations on the KITTI dataset show that the proposed framework outperforms existing conventional, supervised, and unsupervised deep VO methods, yielding better results in both pose estimation and depth estimation.
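To make the described pipeline concrete, the following is a minimal PyTorch sketch of its two adversarial components: a generator that maps a pair of neighboring frames to a disparity map and an optical-flow field, and a discriminator whose patch scores are weighted by a composite mask so that structurally deformed regions of the synthesized view carry more weight. The class names, layer sizes, and placeholder inputs are illustrative assumptions only, not the authors' architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceptualGenerator(nn.Module):
    """Toy stand-in for the perceptual generator: maps two stacked
    neighboring RGB frames to a disparity map and an optical-flow field."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.disparity_head = nn.Conv2d(32, 1, 3, padding=1)  # 1-channel disparity
        self.flow_head = nn.Conv2d(32, 2, 3, padding=1)       # 2-channel (u, v) flow

    def forward(self, frame_pair):
        feats = self.encoder(frame_pair)
        return self.disparity_head(feats), self.flow_head(feats)

class MaskedDiscriminator(nn.Module):
    """Patch discriminator whose scores are modulated by a composite mask,
    emphasizing regions where the synthesized view is structurally deformed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, image, mask):
        scores = self.net(image)                            # patch realism scores
        mask = F.interpolate(mask, size=scores.shape[-2:])  # match score resolution
        return (scores * mask).mean()                       # mask-weighted score

# One forward pass on dummy data (batch of 2, 128x416 KITTI-like frames).
gen, disc = PerceptualGenerator(), MaskedDiscriminator()
pair = torch.randn(2, 6, 128, 416)   # two RGB frames stacked channel-wise
disparity, flow = gen(pair)
synth = torch.randn(2, 3, 128, 416)  # placeholder synthesized virtual view
mask = torch.ones(2, 1, 128, 416)    # placeholder composite mask
print(disparity.shape, flow.shape, disc(synth, mask).item())

In a full training loop, the mask-weighted discriminator score would feed the adversarial loss alongside photometric and pose terms; the iterative pose refinement step described in the abstract is omitted from this sketch.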


Cite this article

Yuehui Ji, Jingwei Jiang, Junjie Liu, Yu Song, Qiang Gao. Composite-mask GAN based on refined optical flow and disparity map for SLAM visual odometry. Optoelectronics Letters, 2025, 21(12): 730-736. DOI: 10.1007/s11801-025-3160-7



RIGHTS & PERMISSIONS

Tianjin University of Technology
