Unsupervised monocular depth estimation with aggregating image features and wavelet SSIM (Structural SIMilarity) loss

Bingen Li, Hao Zhang, Zhuping Wang, Chun Liu, Huaicheng Yan, Lingling Hu

Intelligence & Robotics, 2021, Vol. 1, Issue 1: 84-98. DOI: 10.20517/ir.2021.06
Research Article

Abstract

Unsupervised learning has been shown to be effective for image depth prediction. However, its accuracy is limited by moving objects, whose motion is uncertain, and by the lack of other suitable constraints. This paper focuses on improving the accuracy of depth prediction without increasing the computational burden of the depth network. Aggregated residual transformations are embedded in the depth network to extract high-dimensional image features, so that a more accurate mapping between feature maps and depth maps can be built without extra computational cost. In addition, the 2D discrete wavelet transform is applied to the structural similarity (SSIM) loss: it divides the entire image into patches and captures high-quality image information, which effectively reduces the photometric loss. Finally, the effectiveness of the proposed method is demonstrated: the trained model improves the performance of the depth network on the KITTI dataset and decreases the domain gap on the Make3D dataset.
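To make the idea of a wavelet-based SSIM loss more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a Haar wavelet, equal weights for all sub-bands, SSIM statistics computed over each whole sub-band rather than over sliding windows, and the usual SSIM stabilising constants for inputs scaled to roughly [0, 1]. The function names `haar_dwt2`, `global_ssim`, and `wavelet_ssim_loss` are hypothetical.

```python
import numpy as np


def haar_dwt2(img):
    """One level of the orthonormal 2D Haar DWT: returns (LL, LH, HL, HH)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even height and width
    a, b = img[0::2, 0::2], img[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-array statistics (simplification: no sliding window)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def wavelet_ssim_loss(x, y, levels=2):
    """1 - mean SSIM over the sub-bands of a multi-level Haar decomposition.

    Assumption: every sub-band gets the same weight; a weighted sum is
    equally possible.
    """
    scores = []
    for _ in range(levels):
        if min(x.shape) < 2:  # too small to decompose further
            break
        bands_x, bands_y = haar_dwt2(x), haar_dwt2(y)
        scores.extend(global_ssim(p, q) for p, q in zip(bands_x, bands_y))
        x, y = bands_x[0], bands_y[0]  # next level decomposes the LL band
    if not scores:
        raise ValueError("inputs must be at least 2x2")
    return 1.0 - float(np.mean(scores))
```

In a training setup, `x` would be a target frame and `y` the frame reconstructed from the predicted depth, so the loss approaches zero as the reconstruction matches the target in every sub-band.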

Keywords

Unsupervised depth estimation / computational complexity / aggregated residual transformations / 2D discrete wavelet transform

Cite this article

Bingen Li, Hao Zhang, Zhuping Wang, Chun Liu, Huaicheng Yan, Lingling Hu. Unsupervised monocular depth estimation with aggregating image features and wavelet SSIM (Structural SIMilarity) loss. Intelligence & Robotics, 2021, 1(1): 84-98. DOI: 10.20517/ir.2021.06


