HFA-Transformer: hierarchical feature aggregation based Transformer for robust point cloud registration

Haiying XIA, Anran LEI, Lineng CHEN, Liping NONG, Shuxiang SONG

Front. Comput. Sci., 2027, Vol. 21, Issue 4: 2104706. DOI: 10.1007/s11704-025-50289-0
Image and Graphics
RESEARCH ARTICLE

Abstract

The coarse-to-fine feature matching paradigm has proven highly effective in point cloud registration. This paradigm progressively propagates feature correspondences from the coarse level to the fine level through hierarchical feature extraction. However, it is limited by the low discriminability of coarse-level features, a consequence of insufficient modeling of global geometric structures, which results in unreliable initial correspondences. Furthermore, reliance on single-level features leads to the irreversible loss of fine-grained information, especially in low-overlap scenarios. These limitations make it difficult to maintain global geometric consistency and lead to a high incidence of feature mismatches. To address them, we propose the HFA-Transformer, a novel Hierarchical Feature Aggregation Transformer framework with two key innovations: (1) a feature enhancement mechanism that jointly encodes spatial and channel-wise characteristics of point clouds, enriching the global feature representation; (2) a Hierarchical Feature Aggregation Module that integrates hierarchical features to refine coarse-level correspondence estimation. Extensive experiments on both indoor and outdoor benchmarks validate the superior performance and robustness of the proposed HFA-Transformer.
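For readers unfamiliar with the paradigm, the coarse-to-fine correspondence propagation the abstract refers to can be sketched in a few lines. This is a generic, hypothetical illustration (the function name, data layout, and cosine-similarity matcher are all assumptions of this sketch), not the HFA-Transformer implementation itself:

```python
import numpy as np

def coarse_to_fine_matches(src_coarse, tgt_coarse, src_fine, tgt_fine,
                           src_patch, tgt_patch):
    """Toy coarse-to-fine matcher (illustrative only).

    src_coarse, tgt_coarse: (Nc, D), (Mc, D) superpoint descriptors.
    src_fine, tgt_fine:     (Nf, D), (Mf, D) dense point descriptors.
    src_patch, tgt_patch:   (Nf,), (Mf,) index of the superpoint each
                            fine point belongs to.
    Returns a list of (src_index, tgt_index) fine-level correspondences.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # Step 1: coarse matching via cosine similarity between superpoints.
    sim = normalize(src_coarse) @ normalize(tgt_coarse).T
    coarse_pairs = [(i, int(np.argmax(sim[i]))) for i in range(sim.shape[0])]

    # Step 2: propagate each coarse match down to the fine points inside
    # the two matched patches; match those fine points the same way.
    sf, tf = normalize(src_fine), normalize(tgt_fine)
    matches = []
    for ci, cj in coarse_pairs:
        si = np.flatnonzero(src_patch == ci)
        tj = np.flatnonzero(tgt_patch == cj)
        if si.size == 0 or tj.size == 0:
            continue
        local = sf[si] @ tf[tj].T
        for r in range(local.shape[0]):
            matches.append((int(si[r]), int(tj[int(np.argmax(local[r]))])))
    return matches
```

A practical pipeline adds much more on top of this skeleton, e.g., optimal-transport matching at the coarse level, mutual-consistency checks at the fine level, and a robust estimator over the resulting correspondences; the paper's contributions target exactly the first stage, where a poor coarse match dooms every fine point in the patch.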


Keywords

point cloud registration / coarse-to-fine paradigm / feature enhancement / correspondence matching / Transformer / hierarchical features

Cite this article

Haiying XIA, Anran LEI, Lineng CHEN, Liping NONG, Shuxiang SONG. HFA-Transformer: hierarchical feature aggregation based Transformer for robust point cloud registration. Front. Comput. Sci., 2027, 21(4): 2104706. DOI: 10.1007/s11704-025-50289-0



RIGHTS & PERMISSIONS

Higher Education Press
