Semantics-aware transformer for 3D reconstruction from binocular images
Xin Jia , Shourui Yang , Diyi Guan
Optoelectronics Letters ›› 2022, Vol. 18 ›› Issue (5) : 293 -299.
Semantics-aware transformer for 3D reconstruction from binocular images
Existing multi-view three-dimensional (3D) reconstruction methods can only capture single type of feature from input view, failing to obtain fine-grained semantics for reconstructing the complex shapes. They rarely explore the semantic association between input views, leading to a rough 3D shape. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder, and takes two red, green and blue (RGB) images as input and outputs a dense point cloud with richer details. Each view transformer encoder can learn a multi-level feature, facilitating characterizing fine-grained semantics from input view. The point cloud transformer decoder explores a semantically-associated feature by aligning the semantics of two input views, which describes the semantic association between views. Furthermore, it can generate a sparse point cloud using the semantically-associated feature. At last, the decoder enriches the sparse point cloud for producing a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms the state-of-the-art methods.
| [1] |
|
| [2] |
|
| [3] |
|
| [4] |
|
| [5] |
|
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
|
| [10] |
|
| [11] |
|
| [12] |
|
| [13] |
|
| [14] |
|
| [15] |
|
| [16] |
|
| [17] |
|
| [18] |
|
| [19] |
CHANG A X, FUNKHOUSER T, GUIBAS L, et al. Shapenet: an information-rich 3D model repository[EB/OL]. (2015-12-09) [2022-01-22]. http://arxiv.org/abs/1512.03012. |
| [20] |
|
| [21] |
|
| [22] |
|
/
| 〈 |
|
〉 |