Improved Blending Attention Mechanism in Visual Question Answering
Author affiliations: School of Automation, University of Electronic Science and Technology of China, Chengdu 610054, China; College of Resource and Environment Engineering, Guizhou University, Guiyang 550025, China; School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325000, China; School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu 611731, China; Department of Geography and Anthropology, Louisiana State University, Baton Rouge, LA 70803, USA
Publication: Computer Systems Science & Engineering (in English)
Year, volume, issue: 2023, Vol. 47, No. 10
Pages: 1149-1161
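Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]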
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003)
Subjects: Visual question answering; spatial attention mechanism; channel attention mechanism; image feature processing; text feature extraction
Abstract: Visual question answering (VQA) has attracted growing attention in computer vision and natural language *** are committed to studying how to better integrate image features and text features to achieve better results in VQA *** of all features may cause information redundancy and heavy computational *** mechanism is a wise way to solve this ***, using a single attention mechanism may cause incomplete concern of *** paper improves the attention mechanism method and proposes a hybrid attention mechanism that combines the spatial attention mechanism method and the channel attention mechanism *** the case that the attention mechanism will cause the loss of the original features, a small portion of image features were added as *** the attention mechanism of text features, a self-attention mechanism was introduced, and the internal structural features of sentences were strengthened to improve the overall *** results show that attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
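
The abstract names three ingredients: spatial attention, channel attention, and a small share of the original image features added back as compensation. The sketch below is only an illustration of how such pieces can fit together, not the paper's architecture. The function name `hybrid_attention`, the compensation weight `alpha`, the dot-product scoring, and the parameter-free sigmoid gate are all assumptions made for this example; real channel attention usually uses learned layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_attention(regions, query, alpha=0.1):
    """Toy hybrid attention over image region features.

    regions: list of R region vectors, each with C channels.
    query:   C-dimensional question vector (assumed same size as a region).
    alpha:   hypothetical weight of the compensation term.
    """
    n_regions = len(regions)
    n_channels = len(query)

    # Spatial attention: one weight per region, from a query-region dot product.
    scores = [sum(r[c] * query[c] for c in range(n_channels)) for r in regions]
    spatial_w = softmax(scores)

    # Channel attention: one gate per channel, from that channel's mean over regions.
    channel_mean = [sum(r[c] for r in regions) / n_regions
                    for c in range(n_channels)]
    channel_w = [sigmoid(m) for m in channel_mean]

    # Attended feature: spatially weighted sum of regions, gated per channel.
    attended = [
        channel_w[c] * sum(spatial_w[i] * regions[i][c] for i in range(n_regions))
        for c in range(n_channels)
    ]

    # Compensation: add back a small share of the unattended mean feature.
    out = [attended[c] + alpha * channel_mean[c] for c in range(n_channels)]
    return out, spatial_w, channel_w


# Two regions with three channels each; the query matches the first region best.
regions = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 1.0]
out, spatial_w, channel_w = hybrid_attention(regions, query)
```

Setting `alpha=0` switches the compensation term off, which makes it easy to compare the attended feature with and without the added original features.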