Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval

Authors: Xue-Yang Qin, Li-Shuang Li, Jing-Yao Tang, Fei Hao, Mei-Ling Ge, Guang-Yao Pang (秦雪洋, 李丽双, 唐婧尧, 郝飞, 盖枚岭, 庞光垚)

Affiliations: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China; School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; School of Computer Engineering, Weifang University, Weifang 261061, China; Guangxi Colleges and Universities Key Laboratory of Intelligent Industry Software, Wuzhou University, Wuzhou 543002, China

Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)

Year, volume, issue: 2024, Vol. 39, No. 4

Pages: 811-826

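Subject classification: 1205 [Management: Library, Information and Archives Management]; 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]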

Funding: Supported by the National Natural Science Foundation of China under Grant No. 62076048.

Keywords: image-text retrieval; cross-modal retrieval; multi-task learning; graph convolutional network

Abstract: Image-text retrieval aims to capture the semantic correspondence between images and texts, which serves as a foundation and crucial component in multi-modal recommendations, search systems, and online *** mainstream methods primarily focus on modeling the association of image-text pairs while neglecting the advantageous impact of multi-task learning on image-text retrieval. To this end, a multi-task visual semantic embedding network (MVSEN) is proposed for image-text ***, we design two auxiliary tasks, including text-text matching and multi-label classification, for semantic constraints to improve the generalization and robustness of visual semantic embedding from a training ***, we present an intra- and inter-modality interaction scheme to learn discriminative visual and textual feature representations by facilitating information flow within and between ***, we utilize multi-layer graph convolutional networks in a cascading manner to infer the correlation of image-text *** results show that MVSEN outperforms state-of-the-art methods on two publicly available datasets, Flickr30K and MSCOCO, with rSum improvements of 8.2% and 3.0%, respectively.
