Semantics-aware transformer for 3D reconstruction from binocular images

Xin Jia, Shourui Yang, Diyi Guan

Optoelectronics Letters ›› 2022, Vol. 18 ›› Issue (5) : 293-299. DOI: 10.1007/s11801-022-2055-0

Abstract

Existing multi-view three-dimensional (3D) reconstruction methods capture only a single type of feature from each input view, failing to obtain the fine-grained semantics needed to reconstruct complex shapes. They also rarely explore the semantic association between input views, which leads to rough 3D shapes. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder; it takes two red, green and blue (RGB) images as input and outputs a dense point cloud with rich details. Each view transformer encoder learns a multi-level feature, which helps characterize the fine-grained semantics of its input view. The point cloud transformer decoder derives a semantically-associated feature by aligning the semantics of the two input views, thereby describing the semantic association between them. The decoder then generates a sparse point cloud from the semantically-associated feature and finally enriches it to produce a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms the state-of-the-art methods.
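For concreteness, the following is a minimal PyTorch sketch of the pipeline described above: two parallel view encoders, cross-attention that aligns the semantics of the two views, sparse point generation, and sparse-to-dense enrichment. All module names, dimensions, patch sizes, and point counts are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ViewEncoder(nn.Module):
    """One of two parallel view transformer encoders: turns an RGB view into a
    sequence of patch tokens refined by self-attention (the paper's multi-level
    feature learning is simplified here to a single token resolution)."""

    def __init__(self, dim=256, depth=4, heads=8, num_patches=196):
        super().__init__()
        # 224x224 image -> 14x14 = 196 patch tokens (assumed input size)
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img):                                        # img: (B, 3, 224, 224)
        tokens = self.patch_embed(img).flatten(2).transpose(1, 2)  # (B, 196, dim)
        return self.encoder(tokens + self.pos)                     # (B, 196, dim)


class PointCloudDecoder(nn.Module):
    """Point cloud transformer decoder: aligns the semantics of the two views
    with cross-attention, predicts a sparse cloud, then enriches it."""

    def __init__(self, dim=256, heads=8, n_sparse=256, up_ratio=8):
        super().__init__()
        self.view_align = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.point_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.point_queries = nn.Parameter(torch.randn(1, n_sparse, dim))
        self.to_xyz = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))
        self.refine = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(),
                                    nn.Linear(dim, 3 * up_ratio))
        self.up_ratio = up_ratio

    def forward(self, feat_left, feat_right):
        # Semantically-associated feature: left-view tokens attend to right-view tokens.
        assoc, _ = self.view_align(feat_left, feat_right, feat_right)   # (B, N, dim)
        # Learnable point queries attend to the associated feature.
        q = self.point_queries.expand(assoc.size(0), -1, -1)
        point_feat, _ = self.point_attn(q, assoc, assoc)                # (B, n_sparse, dim)
        sparse = self.to_xyz(point_feat)                                # (B, n_sparse, 3)
        # Enrich the sparse cloud: predict up_ratio offsets around each sparse point.
        offsets = self.refine(torch.cat([point_feat, sparse], dim=-1))  # (B, n_sparse, 3*r)
        dense = sparse.unsqueeze(2) + offsets.view(sparse.size(0), -1, self.up_ratio, 3)
        return sparse, dense.reshape(sparse.size(0), -1, 3)             # (B, n_sparse*r, 3)


class SATF(nn.Module):
    """Two parallel view encoders feeding one point cloud transformer decoder."""

    def __init__(self):
        super().__init__()
        self.enc_left, self.enc_right = ViewEncoder(), ViewEncoder()
        self.decoder = PointCloudDecoder()

    def forward(self, left, right):                 # two RGB views of the same object
        return self.decoder(self.enc_left(left), self.enc_right(right))


if __name__ == "__main__":
    left = torch.randn(2, 3, 224, 224)
    right = torch.randn(2, 3, 224, 224)
    sparse, dense = SATF()(left, right)
    print(sparse.shape, dense.shape)  # torch.Size([2, 256, 3]) torch.Size([2, 2048, 3])
```

In this sketch the sparse cloud (256 points) is produced directly from the cross-view feature, and the dense cloud (2048 points) is obtained by predicting eight coordinate offsets around each sparse point; the actual SATF may use different sampling, losses, and upsampling strategies.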

Cite this article

Xin Jia, Shourui Yang, Diyi Guan. Semantics-aware transformer for 3D reconstruction from binocular images. Optoelectronics Letters, 2022, 18(5): 293‒299 https://doi.org/10.1007/s11801-022-2055-0
