Hyperbolic cosine transformer for LiDAR 3D object detection
Author affiliations: Tianjin Key Laboratory for Control Theory & Applications in Complicated Systems and Intelligent Robot Laboratory, Tianjin University of Technology; Department of Electrical Engineering, Tshwane University of Technology
Published in: Optoelectronics Letters
Year/Volume/Issue: 2025
Subject classification: 080904 [Engineering - Electromagnetic Field and Microwave Technology]; 0810 [Engineering - Information and Communication Engineering]; 0809 [Engineering - Electronic Science and Technology (degrees awardable in engineering or science)]; 08 [Engineering]; 081105 [Engineering - Navigation, Guidance and Control]; 081001 [Engineering - Communication and Information Systems]; 081002 [Engineering - Signal and Information Processing]; 0825 [Engineering - Aeronautical and Astronautical Science and Technology]; 0811 [Engineering - Control Science and Engineering]
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62103298) and the South African National Research Foundation (Grant Nos. 132797 and 137951)
Abstract: Recently, the Transformer has achieved great success in computer vision. However, its use in 3D object detection from point clouds is constrained because the space and time complexity of attention grows quadratically with the large number of points. Previous point-wise methods suffer from high time consumption and limited receptive fields for capturing information among points. To address these limitations, we propose cosh-attention, which reduces the space and time complexity from quadratic to linear in the number of points. In cosh-attention, the conventional softmax operator is replaced by a non-negative ReLU activation and a hyperbolic-cosine-based operator with a re-weighting mechanism. Based on this key component, we present a two-stage hyperbolic cosine transformer (ChTR3D) for 3D object detection from point clouds, which refines proposals by applying cosh-attention with linear computational complexity to encode rich contextual relationships among points. Extensive experiments on the widely used KITTI dataset and the Waymo Open Dataset demonstrate that, compared with vanilla attention, cosh-attention significantly improves inference speed with competitive performance. The results also show that ChTR3D is the fastest among two-stage state-of-the-art methods that use point-level features to refine proposals.
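To make the linear-complexity claim concrete, the following is a minimal PyTorch sketch of one plausible form of the cosh-attention described in the abstract. The ReLU feature map and the kernel factorization (aggregating keys and values before projecting with queries, so the N x N score matrix is never formed) follow the abstract; the position-based cosh re-weighting, its scale constant `c`, and the normalization are illustrative assumptions, not the paper's published design. The decomposition relies on the identity cosh(a - b) = cosh(a)cosh(b) - sinh(a)sinh(b).

```python
# Sketch of cosh-attention: ReLU feature map + cosh re-weighting, O(N) cost.
# The exact re-weighting formula and constant `c` are assumptions.
import torch
import torch.nn.functional as F


def cosh_attention(q, k, v, c=0.01, eps=1e-6):
    """q, k: (B, N, D), v: (B, N, Dv). Returns (B, N, Dv) in O(N * D * Dv)."""
    B, N, D = q.shape
    phi_q = F.relu(q)  # non-negative feature map replacing softmax
    phi_k = F.relu(k)

    # Decomposed position re-weighting:
    #   score_ij = (phi_q_i . phi_k_j) * cosh(c * (i - j))
    #            = (phi_q_i * cosh(ci)) . (phi_k_j * cosh(cj))
    #            - (phi_q_i * sinh(ci)) . (phi_k_j * sinh(cj))
    pos = c * torch.arange(N, dtype=q.dtype, device=q.device)  # (N,)
    qc = phi_q * torch.cosh(pos)[None, :, None]
    qs = phi_q * torch.sinh(pos)[None, :, None]
    kc = phi_k * torch.cosh(pos)[None, :, None]
    ks = phi_k * torch.sinh(pos)[None, :, None]

    # Linear-attention factorization: build (D, Dv) summaries of keys/values
    # once, then project with the queries -- no N x N matrix materialized.
    kv_c = torch.einsum('bnd,bne->bde', kc, v)
    kv_s = torch.einsum('bnd,bne->bde', ks, v)
    num = (torch.einsum('bnd,bde->bne', qc, kv_c)
           - torch.einsum('bnd,bde->bne', qs, kv_s))

    # Normalizer: the same decomposition applied to a vector of ones.
    z = (torch.einsum('bnd,bd->bn', qc, kc.sum(dim=1))
         - torch.einsum('bnd,bd->bn', qs, ks.sum(dim=1)))
    return num / (z[..., None] + eps)


if __name__ == '__main__':
    q, k, v = (torch.randn(2, 256, 64) for _ in range(3))
    print(cosh_attention(q, k, v).shape)  # torch.Size([2, 256, 64])
```

Because cosh(c * (i - j)) >= 1 and the ReLU feature maps are non-negative, every implied score is non-negative, so the normalizer stays positive; this mirrors how linearized-attention designs preserve the attention-weight interpretation without softmax.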