Multi-level fusion with deep neural networks for multimodal sentiment classification
Author affiliations: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Science, Yanshan University, Qinhuangdao 066004, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
Published in: The Journal of China Universities of Posts and Telecommunications
Year/Volume/Issue: 2022, Vol. 29, No. 3
Pages: 25-33
Subject classification: 0710 [Science - Biology]; 08 [Engineering]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 080203 [Engineering - Mechanical Design and Theory]; 0802 [Engineering - Mechanical Engineering]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]
Funding: Supported in part by the National Key Research and Development (R&D) Program of China (2018YFB1403003).
Keywords: multimodal fusion; sentiment analysis; deep learning
Abstract: The task of multimodal sentiment classification aims to associate multimodal information, such as images and texts, with appropriate sentiment polarities. Multiple levels of features in the visual and textual modalities can affect human sentiment. However, most existing methods treat these levels of features independently, without an effective method for feature fusion. In this paper, we propose a multi-level fusion classification (MFC) model that predicts sentiment polarity by fusing features from different levels and exploiting the dependencies among them. The proposed architecture leverages convolutional neural networks (CNNs) with multiple layers to extract levels of features in the image and text modalities. Considering the dependencies between the low-level and high-level features, a bi-directional (Bi) recurrent neural network (RNN) is adopted to integrate the features learned from different layers of the CNNs. In addition, a conflict detection module is incorporated to address conflicts between modalities. Experiments on the Flickr dataset demonstrate that the MFC method achieves performance comparable to that of strong baseline methods.
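The fusion step described in the abstract, i.e., feeding per-level features extracted from the image and text CNNs through a bi-directional RNN before classification, can be sketched in plain NumPy. This is a toy illustration under assumed dimensions and randomly initialized weights, not the authors' implementation; all names, sizes, and the omission of the conflict detection module are simplifications made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(seq, Wx, Wh, b):
    """Run a vanilla RNN over a sequence of level features; return the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

def bi_rnn_fuse(level_feats, hidden=8, seed=1):
    """Fuse per-level features with a forward and a backward RNN pass (hypothetical sizes)."""
    w = np.random.default_rng(seed)
    d = level_feats[0].shape[0]
    Wx_f, Wh_f = w.normal(0, 0.1, (hidden, d)), w.normal(0, 0.1, (hidden, hidden))
    Wx_b, Wh_b = w.normal(0, 0.1, (hidden, d)), w.normal(0, 0.1, (hidden, hidden))
    b = np.zeros(hidden)
    h_fwd = rnn_pass(level_feats, Wx_f, Wh_f, b)        # low-level -> high-level
    h_bwd = rnn_pass(level_feats[::-1], Wx_b, Wh_b, b)  # high-level -> low-level
    return np.concatenate([h_fwd, h_bwd])

# Toy example: 3 CNN levels; at each level, a 4-dim image feature and a
# 4-dim text feature are concatenated into one 8-dim per-level vector.
levels = [np.concatenate([rng.normal(size=4), rng.normal(size=4)]) for _ in range(3)]

fused = bi_rnn_fuse(levels)                  # shape (16,): forward + backward states
W_out = rng.normal(0, 0.1, (2, fused.shape[0]))
logits = W_out @ fused                       # 2-way sentiment polarity scores
```

Treating the level index as the "time" axis of the Bi-RNN is what lets the model capture dependencies between low-level and high-level features in both directions, which per-level classifiers trained independently would miss.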