CAD-NeRF: learning NeRFs from uncalibrated few-view images by CAD model retrieval

Xin WEN, Xuening ZHU, Renjiao YI, Zhifeng WANG, Chenyang ZHU, Kai XU

Front. Comput. Sci., 2025, 19(10): 1910706. DOI: 10.1007/s11704-024-40417-7
Image and Graphics
RESEARCH ARTICLE



Abstract

Reconstruction from multi-view images is a longstanding problem in 3D vision, where neural radiance fields (NeRFs) have shown great potential, producing realistic renderings of novel views. However, most NeRF methods require accurate camera poses, a large number of input images, or both. Reconstructing a NeRF from few-view images without poses is challenging and highly ill-posed. To address this problem, we propose CAD-NeRF, a method that reconstructs NeRFs from fewer than 10 images without any known poses. Specifically, we build a mini library of CAD models from ShapeNet and render them from many random views. Given sparse-view input images, we retrieve from the library a model of similar shape, which serves as the density supervision and provides pose initializations. We further propose a multi-view pose retrieval method that avoids pose conflicts among views, a problem not addressed by previous uncalibrated NeRF methods. The geometry of the object is then trained under the CAD guidance, with the deformation of the density field and the camera poses optimized jointly; texture and density are subsequently trained and fine-tuned as well. All training phases are self-supervised. Comprehensive evaluations on synthetic and real images show that CAD-NeRF learns accurate densities under large deformations from the retrieved CAD models, demonstrating its generalization ability.
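The retrieval stage described above can be pictured as a nearest-neighbor search over pre-rendered library views, assigning each input view a distinct library pose so that no two views claim conflicting poses. The following is a minimal illustrative sketch only, not the authors' implementation: the silhouette-IoU similarity, the greedy per-view assignment, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary silhouette masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def retrieve_model_and_poses(input_masks, library):
    """
    input_masks: list of binary masks, one per sparse input view.
    library: dict mapping model_id -> list of (pose, rendered_mask)
             pairs rendered from many random viewpoints of a CAD model.
    Returns the best-matching model id and one library pose per input
    view. Each library view may be claimed by at most one input view,
    a crude stand-in for multi-view pose retrieval without conflicts.
    """
    best_id, best_score, best_poses = None, -1.0, None
    for model_id, views in library.items():
        used, poses, score = set(), [], 0.0
        for mask in input_masks:
            # Pick the highest-IoU library view not already assigned.
            cands = [(iou(mask, m), j, p)
                     for j, (p, m) in enumerate(views) if j not in used]
            s, j, p = max(cands)
            used.add(j)
            poses.append(p)
            score += s
        if score > best_score:
            best_id, best_score, best_poses = model_id, score, poses
    return best_id, best_poses
```

The retrieved poses would then serve only as initializations; they are refined jointly with the density-field deformation during training.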


Keywords

Sparse-view NeRFs / CAD model retrieval

Cite this article

Xin WEN, Xuening ZHU, Renjiao YI, Zhifeng WANG, Chenyang ZHU, Kai XU. CAD-NeRF: learning NeRFs from uncalibrated few-view images by CAD model retrieval. Front. Comput. Sci., 2025, 19(10): 1910706 https://doi.org/10.1007/s11704-024-40417-7

Xin Wen received her BE degree from Chongqing University of Posts and Telecommunications, China, and her MS degree from the University of Chinese Academy of Sciences, China. She is a PhD student at the National University of Defense Technology, China. Her research interests include image processing, medical image analysis, and 3D reconstruction

Xuening Zhu received her BE degree from Dalian University of Technology, China in 2021. She is now pursuing a PhD degree at the National University of Defense Technology, China. Her research interests include 3D vision, image-based rendering, and inverse rendering

Renjiao Yi received her Bachelor’s degree from the National University of Defense Technology, China in 2013 and her PhD from Simon Fraser University, Canada in 2019. She is currently an Associate Professor at the National University of Defense Technology, China. She is interested in 3D vision and computer graphics, including inverse rendering, image relighting, and scene reconstruction

Zhifeng Wang is currently a Master student at the College of Computer Science, National University of Defense Technology, China. His primary research interests include low-level vision, 3D vision, and medical image analysis

Chenyang Zhu is an Associate Professor at the School of Computer Science, National University of Defense Technology (NUDT), China. He received his Bachelor’s and Master’s degrees from NUDT, China in 2011 and 2013 respectively, and completed his PhD program at Simon Fraser University, Canada. He is interested in computer graphics, 3D vision, and robotics

Kai Xu (Senior Member, IEEE) received his PhD degree in computer science from the National University of Defense Technology (NUDT), China in 2011. From 2008 to 2010, he worked as a visiting PhD at the GrUVi Laboratory, Simon Fraser University, Canada. He is currently a Professor at the School of Computer Science, NUDT. He is also an Adjunct Professor at Simon Fraser University, Canada. His current research interests include 3D vision and embodied intelligence


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62325221, 62132021, and 62372457), the Young Elite Scientists Sponsorship Program by CAST, China (Grant No. 2023QNRC001), the Natural Science Foundation of Hunan Province, China (Grant Nos. 2021RC3071 and 2022RC1104), and the National University of Defense Technology Research Grants, China (Grant No. ZK22-52).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2025 Higher Education Press