Semantics-aware transformer for 3D reconstruction from binocular images
Author affiliations: Engineering Research Center of Learning-Based Intelligent System and Key Laboratory of Computer Vision and System of Ministry of Education, Tianjin University of Technology, Tianjin 300384, China; Zhejiang University of Technology, Hangzhou 310014, China
Publication: Optoelectronics Letters (光电子快报(英文版))
Year/Volume/Issue: 2022, Vol. 18, No. 5
Pages: 293-299
Subject classification: 08 [Engineering], 080203 [Engineering - Mechanical Design and Theory], 0802 [Engineering - Mechanical Engineering], 0702 [Science - Physics]
Funding: supported by the National Key R&D Program of China (No. 2018YFB1305200) and the National Natural Science Foundation of China (Nos. 61906134, 62020106004, 92048301, and 61925201)
Abstract: Existing multi-view three-dimensional (3D) reconstruction methods can only capture a single type of feature from the input views, failing to obtain the fine-grained semantics needed to reconstruct complex shapes. They also rarely explore the semantic association between input views, leading to rough 3D shapes. To address these challenges, we propose a semantics-aware transformer (SATF) for 3D reconstruction. It is composed of two parallel view transformer encoders and a point cloud transformer decoder; it takes two red, green and blue (RGB) images as input and outputs a dense point cloud with richer details. Each view transformer encoder learns a multi-level feature, which facilitates characterizing fine-grained semantics from its input view. The point cloud transformer decoder derives a semantically-associated feature by aligning the semantics of the two input views, thereby describing the semantic association between them. From this semantically-associated feature, the decoder generates a sparse point cloud, which it then enriches to produce a dense point cloud with richer details. Extensive experiments on the ShapeNet dataset show that our SATF outperforms state-of-the-art methods.
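The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of how such a two-encoder/one-decoder pipeline could be wired together, assuming standard patch-token encoders, learned point queries with cross-attention for aligning the two views, and a simple offset-based densification head; these internals (module names, dimensions, up_ratio) are illustrative assumptions, not the paper's actual implementation.

# A minimal sketch of the SATF pipeline outlined in the abstract.
# All module internals, dimensions, and the upsampling head are assumptions.
import torch
import torch.nn as nn

class ViewTransformerEncoder(nn.Module):
    """Encodes one RGB view into a sequence of patch-token features."""
    def __init__(self, dim=256, depth=4, heads=8, patch=16):
        super().__init__()
        # Patchify the image into tokens (assumes 224x224 RGB input).
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img):                                     # img: (B, 3, H, W)
        tokens = self.patchify(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens)

class PointCloudTransformerDecoder(nn.Module):
    """Aligns the two view features via cross-attention, predicts a sparse
    point cloud, then enriches it into a dense one (assumed offset head)."""
    def __init__(self, dim=256, n_sparse=256, up_ratio=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_sparse, dim))  # learned point queries
        self.align = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.to_xyz = nn.Linear(dim, 3)                   # sparse point coordinates
        self.densify = nn.Linear(dim + 3, 3 * up_ratio)   # per-point offset predictor

    def forward(self, feat_a, feat_b):
        B = feat_a.size(0)
        joint = torch.cat([feat_a, feat_b], dim=1)        # tokens from both views
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        # Cross-attention over both views stands in for the abstract's
        # "semantically-associated feature" that aligns view semantics.
        assoc, _ = self.align(q, joint, joint)
        sparse = self.to_xyz(assoc)                       # (B, n_sparse, 3)
        offsets = self.densify(torch.cat([assoc, sparse], dim=-1))
        dense = sparse.unsqueeze(2) + offsets.view(B, sparse.size(1), -1, 3)
        return sparse, dense.reshape(B, -1, 3)            # (B, n_sparse * up_ratio, 3)

# Usage: two binocular RGB views in, sparse and dense point clouds out.
enc_left, enc_right = ViewTransformerEncoder(), ViewTransformerEncoder()
decoder = PointCloudTransformerDecoder()
left, right = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
sparse_pc, dense_pc = decoder(enc_left(left), enc_right(right))

The two encoders are instantiated separately to mirror the "two parallel view transformer encoders" of the abstract; whether the paper shares weights between them is not stated here.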