
Conditional selection with CNN augmented transformer for multimodal affective analysis

Authors: Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo

Affiliations: College of Computer and Data Science, Fuzhou University, Fuzhou, China; College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China; Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, China; Digital Fujian Institute of Big Data Security Technology, Fujian Normal University, Fuzhou, China; Department of Sciences and Informatics, Muroran Institute of Technology, Muroran, Japan

Publication: CAAI Transactions on Intelligence Technology

Year/Volume/Issue: 2024, Vol. 9, No. 4

Pages: 917-931

Subject Classification: 08 [Engineering]; 0835 [Engineering - Software Engineering]; 081202 [Engineering - Computer Software and Theory]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]

Funding: National Key Research and Development Plan of China, Grant/Award Number: 2021YFB3600503; National Natural Science Foundation of China, Grant/Award Numbers: 62276065, U21A20472

Keywords: affective computing; data fusion; information fusion; multimodal approaches

Abstract: The attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language with its nonverbal context information. One is to generate sparse attention coefficients associated with the acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representations to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also directly complementary to sentiment words. Experimental results show that the authors' method achieves state-of-the-art performance on several multimodal affective analysis datasets.
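The record does not include the paper's implementation, but the two mechanisms named in the abstract lend themselves to a minimal sketch. The PyTorch code below is an assumption-laden illustration, not the authors' Conditional Transformer Fusion Network: all module names, dimensions, and hyperparameters are hypothetical, and the sparse attention coefficients described in the abstract are approximated here with ordinary softmax cross-attention.

```python
# Minimal sketch (not the authors' released code) of the two ideas in the
# abstract: (1) a transformer block whose input path is augmented with a 1D
# convolution to pick up subtle local patterns in acoustic/visual sequences,
# and (2) cross-modal attention in which language features act as the query
# condition that selects salient nonverbal features. All names are hypothetical.
import torch
import torch.nn as nn


class ConvAugmentedEncoderLayer(nn.Module):
    """Transformer encoder layer preceded by a temporal CNN (hypothetical)."""

    def __init__(self, d_model: int = 128, n_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        # Temporal convolution; padding preserves the sequence length.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2)
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects (batch, d_model, seq_len).
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        # Residual: global self-attention over locally enhanced features.
        return self.encoder(x + local)


class ConditionalCrossAttention(nn.Module):
    """Language-conditioned attention over a nonverbal stream (hypothetical)."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # Queries come from the language stream, so attention weights are
        # computed conditioned on sentiment-bearing words; keys/values are the
        # acoustic or visual features being selected.
        selected, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        # Fuse the complementary cross-modal features back into the text stream.
        return self.norm(text + selected)


if __name__ == "__main__":
    text = torch.randn(2, 20, 128)   # e.g. 20 word-level language features
    audio = torch.randn(2, 50, 128)  # e.g. 50 frame-level acoustic features
    audio = ConvAugmentedEncoderLayer()(audio)
    fused = ConditionalCrossAttention()(text, audio)
    print(fused.shape)  # torch.Size([2, 20, 128])
```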
