Point-voxel dual transformer for LiDAR 3D object detection

Jigang Tong; Fanhang Yang; Sen Yang; Shengzhi Du

doi:10.1007/s11801-025-3134-9

Optoelectronics Letters, 2025, Vol. 21, Issue 9: 547-554. DOI: 10.1007/s11801-025-3134-9

Abstract

This paper presents the point-voxel dual transformer (PV-DT3D), a two-stage, transformer-based framework for light detection and ranging (LiDAR) three-dimensional (3D) object detection. PV-DT3D uses point-voxel fusion features for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. After the region proposal network (RPN) generates proposals, the encoded keypoints internal to the proposals are fed into a dual transformer encoder-decoder architecture. PV-DT3D combines a point-wise transformer with a channel-wise architecture to capture contextual information along both the spatial and channel dimensions. Experiments on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
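
The difference between attending over the spatial dimension and attending over the channel dimension can be illustrated with a minimal NumPy sketch. This is not the PV-DT3D implementation: the function names, the absence of learned query/key/value projections, and the scaling factors are all simplifying assumptions made here for illustration. Given keypoint features of shape (N, C), point-wise attention builds an N x N weight matrix that mixes points, while channel-wise attention builds a C x C weight matrix that mixes feature channels.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pointwise_attention(feats):
    """Hypothetical point-wise self-attention over N keypoints.

    feats: array of shape (N, C). No learned projections are used
    (an assumption for brevity); queries, keys and values are the
    raw features.
    """
    n, c = feats.shape
    scores = feats @ feats.T / np.sqrt(c)   # (N, N): similarity between points
    weights = softmax(scores, axis=-1)      # each row sums to 1 over points
    return weights @ feats, weights         # output shape (N, C)


def channelwise_attention(feats):
    """Hypothetical channel-wise self-attention over C feature channels.

    The same computation is applied to the transposed features, so the
    attention weights relate channels to channels instead of points to points.
    """
    n, c = feats.shape
    scores = feats.T @ feats / np.sqrt(n)   # (C, C): similarity between channels
    weights = softmax(scores, axis=-1)      # each row sums to 1 over channels
    return feats @ weights.T, weights       # output shape (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keypoint_feats = rng.normal(size=(8, 4))  # 8 keypoints, 4 channels (toy sizes)
    out_p, w_p = pointwise_attention(keypoint_feats)
    out_c, w_c = channelwise_attention(keypoint_feats)
    print(out_p.shape, w_p.shape)
    print(out_c.shape, w_c.shape)
```

Both functions return features of the same shape as the input, so in a sketch like this the two outputs could be combined (for example, summed or concatenated) before a detection head. How PV-DT3D actually fuses the two branches is not stated in the abstract.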
