HFA-Transformer: hierarchical feature aggregation based Transformer for robust point cloud registration
Haiying XIA, Anran LEI, Lineng CHEN, Liping NONG, Shuxiang SONG
Front. Comput. Sci., 2027, Vol. 21, Issue 4: 2104706
The coarse-to-fine feature matching paradigm has proven highly effective in point cloud registration. It extracts features hierarchically and progressively propagates correspondences from the coarse level to the fine level. However, coarse-level features have low discriminability because global geometric structure is insufficiently modeled, which makes the initial correspondences unreliable. Furthermore, relying on single-level features causes an irreversible loss of fine-grained information, especially in low-overlap scenarios. Together, these limitations make it difficult to maintain global geometric consistency and lead to a high incidence of feature mismatches. To address them, we propose the HFA-Transformer, a novel Hierarchical Feature Aggregation Transformer framework with two key innovations: (1) a feature enhancement mechanism that jointly encodes spatial and channel-wise characteristics of point clouds, enriching the global feature representation; (2) a Hierarchical Feature Aggregation Module that integrates hierarchical features to refine coarse-level correspondence estimation. Extensive experiments on both indoor and outdoor benchmarks validate the superior performance and robustness of the proposed HFA-Transformer.
point cloud registration / coarse-to-fine paradigm / feature enhancement / correspondence matching / Transformer / hierarchical features
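The generic coarse-to-fine paradigm described in the abstract can be illustrated with a minimal sketch. This is not the HFA-Transformer itself: mutual nearest-neighbor matching on raw dot products stands in for the learned matching stage, and every name here (`mutual_nearest`, `coarse_to_fine`, `patch_src`, `patch_tgt`) is hypothetical. It only shows how coarse matches restrict where fine matches are searched, and why an unreliable coarse match cannot be recovered at the fine level.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    """Dot product, used here as a stand-in feature similarity."""
    return sum(x * y for x, y in zip(a, b))


def mutual_nearest(src: List[Vec], tgt: List[Vec]) -> List[Tuple[int, int]]:
    """Return pairs (i, j) where src[i] and tgt[j] are each other's best match."""
    if not src or not tgt:
        return []
    best_tgt = [max(range(len(tgt)), key=lambda j: dot(s, tgt[j])) for s in src]
    best_src = [max(range(len(src)), key=lambda i: dot(src[i], t)) for t in tgt]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


def coarse_to_fine(
    coarse_src: List[Vec],
    coarse_tgt: List[Vec],
    fine_src: List[Vec],
    fine_tgt: List[Vec],
    patch_src: List[List[int]],
    patch_tgt: List[List[int]],
) -> List[Tuple[int, int]]:
    """Propagate coarse matches to the fine level.

    patch_src[i] lists the fine-point indices grouped under coarse node i
    (an assumed grouping; patch_tgt is the same for the target cloud).
    Fine matching is only attempted inside patches whose coarse nodes matched.
    """
    matches: List[Tuple[int, int]] = []
    for ci, cj in mutual_nearest(coarse_src, coarse_tgt):
        ids_s, ids_t = patch_src[ci], patch_tgt[cj]
        local = mutual_nearest(
            [fine_src[k] for k in ids_s], [fine_tgt[k] for k in ids_t]
        )
        matches.extend((ids_s[a], ids_t[b]) for a, b in local)
    return matches


if __name__ == "__main__":
    # Toy data: two coarse nodes per cloud, each owning two fine points.
    coarse_src = [[1, 0], [0, 1]]
    coarse_tgt = [[0, 1], [1, 0]]
    fine_src = [[1, 0], [0, 1], [1, 1], [1, -1]]
    fine_tgt = [[1, -1], [1, 1], [0, 1], [1, 0]]
    patches = [[0, 1], [2, 3]]
    print(coarse_to_fine(coarse_src, coarse_tgt, fine_src, fine_tgt, patches, patches))
    # prints [(0, 3), (1, 2), (2, 1), (3, 0)]
```

Because fine candidates are drawn only from patches whose coarse nodes matched, a wrong coarse pair excludes the correct fine correspondences entirely. That is the failure mode the abstract attributes to weakly discriminative coarse-level features.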
Higher Education Press