Real-time lightweight self-supervised monocular depth estimation
Tianxiang YANG, Lingjun MENG, Hong JIN, Wenjie FENG, Xinhao LIU
Journal of Measurement Science and Instrumentation, 2026, Vol. 17, Issue 2: 278-296.
Monocular depth estimation aims to predict scene depth from a single RGB image, but many models remain too computationally intensive for real-time inference on resource-constrained edge devices. This paper presents a lightweight self-supervised monocular depth estimation network that balances accuracy and efficiency through targeted encoder–decoder design. The encoder combines decomposable large-kernel convolutions with local depthwise convolutions to capture both long-range context and local details at low computational overhead. The decoder uses cross-scale feature differences as guidance to dynamically fuse multi-scale features, enhancing detail recovery and geometric consistency under lightweight constraints. In addition, a temporal soft fusion reprojection loss is employed to better exploit the complementary information of forward and backward frames, improving the robustness of self-supervised training. The model contains 3.0 M parameters and requires 3.5 GFLOPs of computation. On KITTI, it achieves Abs Rel=0.105 and δ1=0.892. On Make3D, it achieves Abs Rel=0.308 in a zero-shot setting. On a Rockchip RK3588S, a hybrid-quantized multi-thread implementation runs at 67 frames/s. The results demonstrate that the proposed method achieves a favorable accuracy–efficiency balance on edge devices, making it suitable for real-time monocular depth estimation tasks.
monocular depth estimation / deep learning / self-supervised learning / large-kernel attention / differential-driven dynamic fusion / lightweight network / RK3588S
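The Abs Rel and δ1 figures quoted in the abstract are standard depth-evaluation metrics. The sketch below illustrates how they are commonly computed, assuming the usual definitions (mean of |pred - gt| / gt, and the fraction of pixels with max(pred/gt, gt/pred) < 1.25). The function names `abs_rel` and `delta_accuracy` and the toy depth values are hypothetical, chosen for illustration only; the paper's actual evaluation protocol (depth cap, crop, median scaling) is not given here and is not reproduced.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth (gt > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose ratio max(pred/gt, gt/pred) is below threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Toy example with made-up depths in metres (not data from the paper).
gt_depth = [10.0, 20.0, 40.0, 80.0]
pred_depth = [11.0, 18.0, 40.0, 120.0]

print("Abs Rel:", abs_rel(pred_depth, gt_depth))
print("delta1 :", delta_accuracy(pred_depth, gt_depth))
```

On the toy values, three of the four predictions fall within the 1.25 ratio, so δ1 is 0.75, while the single large error on the farthest pixel pulls Abs Rel up to about 0.175. Lower Abs Rel and higher δ1 indicate better accuracy.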