STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition

Authors: Zhi-Wei Xu; Xiao-Jun Wu; Josef Kittler

Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK

Publication: International Journal of Automation and Computing

Year/Volume/Issue: 2021, Vol. 18, No. 5

Pages: 718-730

Subject classification: 12 [Management]; 1201 [Management - Management Science and Engineering (degrees conferrable in Management or Engineering)]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 080203 [Engineering - Mechanical Design and Theory]; 0835 [Engineering - Software Engineering]; 0802 [Engineering - Mechanical Engineering]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]

Funding: Supported by the National Natural Science Foundation of China (Nos. U1836218, 62020106012, 61672265 and 61902153), the 111 Project of the Ministry of Education of China (No. B12018), the EPSRC Programme FACER2VM (No. EP/N007743/1), and the EPSRC/MURI/Dstl Project (No. EP/R013616/1).

Keywords: action recognition; spatiotemporal relation; multi-branch fusion; long-term representation; video classification

Abstract: Learning comprehensive spatiotemporal features is crucial for human action recognition. Existing methods tend to model spatiotemporal feature blocks in an integrate-separate-integrate form, such as the appearance-and-relation network (ARTNet) and the spatiotemporal and motion network (STM). However, as such blocks stack up, the rear part of the network becomes poorly interpretable. To avoid this problem, we propose a novel architecture called the spatiotemporal relation network (STRNet), which can learn explicit appearance and motion information and, especially, temporal relation information. Specifically, STRNet is constructed from three branches, which separate the features into 1) an appearance pathway, to obtain spatial semantics, 2) a motion pathway, to reinforce the spatiotemporal feature representation, and 3) a relation pathway, to capture the temporal relation details of successive frames and to explore long-term representation dependencies. In addition, STRNet does not simply merge the multi-branch information; instead, we apply a flexible and effective strategy to fuse the complementary information from the multiple pathways. We evaluate our network on four major action recognition benchmarks: Kinetics-400, UCF-101, HMDB-51, and Something-Something v1. STRNet achieves state-of-the-art results on UCF-101 and HMDB-51, and accuracy comparable to state-of-the-art methods on Something-Something v1 and Kinetics-400.
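As a reading aid, below is a minimal, hypothetical PyTorch sketch of the triple-branch layout the abstract describes: an appearance pathway (spatial-only convolution), a motion pathway (short-window spatiotemporal convolution), a relation pathway (temporal-only convolution over successive frames), and a learned fusion of the three. The class name, channel sizes, kernel choices, and the 1x1x1 fusion convolution are all assumptions for illustration; the paper's actual layer design and fusion strategy are not given in this record.

```python
# Hypothetical sketch of a triple-branch spatiotemporal network, loosely
# following the abstract's description of STRNet. NOT the authors' code:
# all layer shapes and the fusion scheme below are illustrative assumptions.
import torch
import torch.nn as nn


class TripleStreamSketch(nn.Module):
    def __init__(self, num_classes: int = 400, channels: int = 64):
        super().__init__()
        # Appearance pathway: per-frame 2D-style convolution (kernel 1 in time)
        # for spatial semantics.
        self.appearance = nn.Conv3d(3, channels, kernel_size=(1, 7, 7),
                                    stride=(1, 2, 2), padding=(0, 3, 3))
        # Motion pathway: 3D convolution over a short temporal window to
        # reinforce the spatiotemporal representation.
        self.motion = nn.Conv3d(3, channels, kernel_size=(3, 7, 7),
                                stride=(1, 2, 2), padding=(1, 3, 3))
        # Relation pathway: temporal-only convolution relating successive
        # frames at each spatial location.
        self.relation = nn.Conv3d(3, channels, kernel_size=(5, 1, 1),
                                  stride=(1, 2, 2), padding=(2, 0, 0))
        # Learned fusion of the concatenated pathways; a stand-in for the
        # paper's "flexible and effective" multi-branch fusion strategy.
        self.fuse = nn.Conv3d(3 * channels, channels, kernel_size=1)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
            nn.Flatten(),
            nn.Linear(channels, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, frames, height, width)
        feats = torch.cat(
            [self.appearance(x), self.motion(x), self.relation(x)], dim=1)
        return self.head(self.fuse(feats))


# Example: a batch of 2 clips, each 8 frames of 112x112 RGB.
clip = torch.randn(2, 3, 8, 112, 112)
logits = TripleStreamSketch()(clip)
print(logits.shape)  # torch.Size([2, 400])
```

The 1x1x1 fusion convolution here merely stands in for whatever fusion the paper applies; plain concatenation or summation of the pathways would be the obvious simpler baselines to compare against.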
