Long-term reprojection loss for self-supervised monocular depth estimation in endoscopic surgery

Xiaowei Shi, Beilei Cui, Matthew J. Clarkson, Mobarakol Islam

Artificial Intelligence Surgery 2024;4(3):247-57. DOI: 10.20517/ais.2024.17

Original Article


Abstract

Aim: Depth information plays a key role in enhancing perception and interaction in image-guided surgery. However, depth is difficult to obtain in monocular endoscopic surgery because reliable cues for perceiving it are lacking. Although reprojection loss-based self-supervised learning techniques exist for estimating depth and pose, they do not efficiently exploit temporal information from adjacent frames to handle occlusion in surgery.

Methods: We design a long-term reprojection loss (LT-RL) technique for self-supervised monocular depth estimation that integrates longer temporal sequences into the reprojection loss, improving perception and addressing occlusion artifacts in image-guided laparoscopic and robotic surgery. Specifically, we exploit four temporally adjacent source frames, two before and two after the target frame, whereas conventional reprojection loss uses only the two immediate neighbors. Pixels that are visible in the target frame but occluded in the two immediately adjacent frames would otherwise produce inaccurate depth; with four adjacent frames, such pixels have a higher chance of being visible in at least one source frame when the minimum reprojection loss is computed.
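To make the construction concrete, below is a minimal PyTorch sketch of a per-pixel minimum reprojection loss taken over four warped source frames rather than two. The helper names, the simplified L1-only photometric error (the full loss would typically also blend in an SSIM term), and the assumption that the source frames have already been warped into the target view using the predicted depth and poses are illustrative, not the authors' implementation.

```python
import torch

def photometric_error(pred, target):
    """Simplified per-pixel photometric error (L1 only).

    A Monodepth2-style loss would instead blend
    alpha * (1 - SSIM(pred, target)) / 2 + (1 - alpha) * L1;
    SSIM is omitted here for brevity.
    """
    # pred, target: (B, 3, H, W) -> error map (B, 1, H, W)
    return (pred - target).abs().mean(dim=1, keepdim=True)

def long_term_min_reprojection_loss(target, warped_sources):
    """Per-pixel minimum reprojection loss over long-range neighbors.

    target:         (B, 3, H, W) target frame I_t.
    warped_sources: list of (B, 3, H, W) source frames
                    {I_{t-2}, I_{t-1}, I_{t+1}, I_{t+2}} already warped
                    into the target view via predicted depth and poses.

    Taking the per-pixel minimum over four temporal neighbors instead
    of two lets a pixel occluded in the immediate neighbors still find
    a visible correspondence in a farther frame.
    """
    errors = torch.cat(
        [photometric_error(w, target) for w in warped_sources], dim=1
    )                                   # (B, 4, H, W)
    min_error, _ = errors.min(dim=1)    # per-pixel minimum over sources
    return min_error.mean()

# Usage with random stand-ins for the warped frames:
# target = torch.rand(1, 3, 256, 320)
# warped = [torch.rand(1, 3, 256, 320) for _ in range(4)]
# loss = long_term_min_reprojection_loss(target, warped)
```

Because the change is confined to the loss, this construction adds no trainable parameters and can be attached to any depth/pose network that produces the warped source frames.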

Results: We validate LT-RL on the benchmark surgical datasets Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) and Hamlyn, comparing its performance with other state-of-the-art depth estimation methods. The experimental results show that our proposed technique yields a 2%-4% improvement in root-mean-squared error (RMSE) over vanilla reprojection-loss baselines.

Conclusion: Our LT-RL self-supervised depth and pose estimation technique is a simple yet effective way to tackle occlusion artifacts in monocular surgical video. It adds no trainable parameters, so it can be integrated flexibly with any network architecture while significantly improving performance.

Keywords

Monocular depth estimation / self-supervised learning / reprojection loss / robotic surgery

Cite this article

Xiaowei Shi, Beilei Cui, Matthew J. Clarkson, Mobarakol Islam. Long-term reprojection loss for self-supervised monocular depth estimation in endoscopic surgery. Artificial Intelligence Surgery 2024;4(3):247-57. DOI: 10.20517/ais.2024.17


