Semantics-aware transformer for 3D reconstruction from binocular images

Xin Jia, Shourui Yang, Diyi Guan

Optoelectronics Letters, 2022, 18(5): 293-299. DOI: 10.1007/s11801-022-2055-0

Abstract

Existing multi-view three-dimensional (3D) reconstruction methods can only capture a single type of feature from each input view, failing to obtain the fine-grained semantics needed to reconstruct complex shapes. They also rarely explore the semantic association between input views, leading to rough 3D shapes. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder; it takes two red, green and blue (RGB) images as input and outputs a dense point cloud with rich details. Each view transformer encoder learns a multi-level feature, facilitating the characterization of fine-grained semantics in its input view. The point cloud transformer decoder derives a semantically-associated feature by aligning the semantics of the two input views, thereby describing the semantic association between them. It then generates a sparse point cloud from this semantically-associated feature. Finally, the decoder enriches the sparse point cloud to produce a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms state-of-the-art methods.
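
The two-encoder / one-decoder pipeline named in the abstract can be pictured with a minimal PyTorch sketch, purely as an illustration of the data flow: two parallel transformer encoders embed each binocular view, cross-attention stands in for the semantic alignment between the views, and a simple per-point expansion stands in for the sparse-to-dense step. All module names, layer counts, patch sizes and point counts below are assumptions made for the sketch, not the authors' implementation.

    # Illustrative PyTorch sketch of the SATF data flow described in the abstract.
    # Every dimension and module choice here is an assumption, not the paper's design.
    import torch
    import torch.nn as nn

    class ViewEncoder(nn.Module):
        """One of the two parallel view transformer encoders (illustrative only)."""
        def __init__(self, dim=256, depth=4):
            super().__init__()
            # Toy patch embedding: flattened 16x16 RGB patches -> tokens.
            self.embed = nn.Linear(16 * 16 * 3, dim)
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

        def forward(self, patches):                      # patches: (B, N, 768)
            return self.encoder(self.embed(patches))     # (B, N, dim)

    class PointCloudDecoder(nn.Module):
        """Aligns the two view features, then decodes a sparse and a dense point cloud."""
        def __init__(self, dim=256, up_ratio=8):
            super().__init__()
            # Cross-attention stands in for the semantic alignment between views.
            self.cross = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            self.to_sparse = nn.Linear(dim, 3)               # one 3D point per token
            self.expand = nn.Linear(dim + 3, up_ratio * 3)   # naive densification step

        def forward(self, feat_a, feat_b):
            aligned, _ = self.cross(feat_a, feat_b, feat_b)  # (B, N, dim)
            sparse = self.to_sparse(aligned)                 # (B, N, 3)
            offsets = self.expand(torch.cat([aligned, sparse], dim=-1))
            dense = sparse.unsqueeze(2) + offsets.view(*sparse.shape[:2], -1, 3)
            return sparse, dense.flatten(1, 2)               # (B, N, 3), (B, N*up_ratio, 3)

    if __name__ == "__main__":
        enc_a, enc_b, dec = ViewEncoder(), ViewEncoder(), PointCloudDecoder()
        view_a = torch.randn(2, 196, 768)                    # fake left-view patch tokens
        view_b = torch.randn(2, 196, 768)                    # fake right-view patch tokens
        sparse, dense = dec(enc_a(view_a), enc_b(view_b))
        print(sparse.shape, dense.shape)                     # (2, 196, 3) (2, 1568, 3)

In the paper's actual design the multi-level features and the densification stage are more elaborate; this sketch only mirrors the stages named in the abstract.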

Cite this article

Xin Jia, Shourui Yang, Diyi Guan. Semantics-aware transformer for 3D reconstruction from binocular images. Optoelectronics Letters, 2022, 18(5): 293-299. DOI: 10.1007/s11801-022-2055-0
