CAD-NeRF: learning NeRFs from uncalibrated few-view images by CAD model retrieval

Xin WEN, Xuening ZHU, Renjiao YI, Zhifeng WANG, Chenyang ZHU, Kai XU

Front. Comput. Sci., 2025, 19(10): 1910706. DOI: 10.1007/s11704-024-40417-7
Image and Graphics
RESEARCH ARTICLE



Abstract

Reconstruction from multi-view images is a longstanding problem in 3D vision, where neural radiance fields (NeRFs) have shown great potential, producing realistic renderings of novel views. However, most NeRF methods require accurate camera poses, a large number of input images, or both. Reconstructing a NeRF from few-view images without poses is challenging and highly ill-posed. To address this problem, we propose CAD-NeRF, a method that reconstructs NeRFs from fewer than 10 images without any known poses. Specifically, we build a mini library of CAD models from ShapeNet and render them from many random views. Given sparse-view input images, we retrieve from the library a model of similar shape, which serves as the density supervision and provides pose initializations. We further propose a multi-view pose retrieval method that avoids pose conflicts among views, a problem not addressed by previous uncalibrated NeRF methods. The geometry of the object is then trained under the CAD guidance, with the deformation of the density field and the camera poses optimized jointly; texture and density are subsequently trained and fine-tuned as well. All training phases are self-supervised. Comprehensive evaluations on synthetic and real images show that CAD-NeRF learns accurate densities under large deformations from the retrieved CAD models, demonstrating its generalization ability.
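The retrieval stage described above can be pictured as a nearest-neighbor search over pre-rendered library views, assigning each input view a distinct library pose so that no two views claim conflicting poses. The following is a minimal illustrative sketch only, not the authors' implementation: the silhouette-IoU similarity, the greedy per-view assignment, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary silhouette masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def retrieve_model_and_poses(input_masks, library):
    """
    input_masks: list of binary masks, one per sparse input view.
    library: dict mapping model_id -> list of (pose, rendered_mask)
             pairs rendered from many random viewpoints of a CAD model.
    Returns the best-matching model id and one library pose per input
    view. Each library view may be claimed by at most one input view,
    a crude stand-in for multi-view pose retrieval without conflicts.
    """
    best_id, best_score, best_poses = None, -1.0, None
    for model_id, views in library.items():
        used, poses, score = set(), [], 0.0
        for mask in input_masks:
            # Pick the highest-IoU library view not already assigned.
            cands = [(iou(mask, m), j, p)
                     for j, (p, m) in enumerate(views) if j not in used]
            s, j, p = max(cands)
            used.add(j)
            poses.append(p)
            score += s
        if score > best_score:
            best_id, best_score, best_poses = model_id, score, poses
    return best_id, best_poses
```

The retrieved poses would then serve only as initializations; they are refined jointly with the density-field deformation during training.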


Keywords

Sparse-view NeRFs / CAD model retrieval

Cite this article

Xin WEN, Xuening ZHU, Renjiao YI, Zhifeng WANG, Chenyang ZHU, Kai XU. CAD-NeRF: learning NeRFs from uncalibrated few-view images by CAD model retrieval. Front. Comput. Sci., 2025, 19(10): 1910706 https://doi.org/10.1007/s11704-024-40417-7

Xin Wen received her BE degree from Chongqing University of Posts and Telecommunications, China, and her MS degree from the University of Chinese Academy of Sciences, China. She is a PhD student at the National University of Defense Technology, China. Her research interests include image processing, medical image analysis, and 3D reconstruction

Xuening Zhu received her BE degree from Dalian University of Technology, China in 2021. She is now pursuing a PhD degree at the National University of Defense Technology, China. Her research interests include 3D vision, image-based rendering, and inverse rendering

Renjiao Yi received her Bachelor’s degree from the National University of Defense Technology, China in 2013 and her PhD from Simon Fraser University, Canada in 2019. She is currently an Associate Professor at the National University of Defense Technology, China. She is interested in 3D vision and computer graphics, including inverse rendering, image relighting, and scene reconstruction

Zhifeng Wang is currently a Master student at the College of Computer Science, National University of Defense Technology, China. His primary research interests include low-level vision, 3D vision, and medical image analysis

Chenyang Zhu is an Associate Professor at the School of Computer Science, National University of Defense Technology (NUDT), China. He received his Bachelor’s and Master’s degrees from NUDT, China in 2011 and 2013 respectively, and completed his PhD program at Simon Fraser University, Canada. He is interested in computer graphics, 3D vision, and robotics

Kai Xu (Senior Member, IEEE) received his PhD degree in computer science from the National University of Defense Technology (NUDT), China in 2011. From 2008 to 2010, he worked as a visiting PhD at the GrUVi Laboratory, Simon Fraser University, Canada. He is currently a Professor at the School of Computer Science, NUDT. He is also an Adjunct Professor at Simon Fraser University, Canada. His current research interests include 3D vision and embodied intelligence


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62325221, 62132021, and 62372457), the Young Elite Scientists Sponsorship Program by CAST, China (Grant No. 2023QNRC001), the Natural Science Foundation of Hunan Province, China (Grant Nos. 2021RC3071 and 2022RC1104), and the National University of Defense Technology Research Grants, China (Grant No. ZK22-52).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press