
Fine-Grained Features for Image Captioning

Authors: Mengyue Shao; Jie Feng; Jie Wu; Haixiang Zhang; Yayu Zheng

Affiliations: Zhejiang Sci-Tech University, Hangzhou 310020, China; Zhejiang University of Technology, Hangzhou 310020, China

Published in: Computers, Materials & Continua

Year/Volume/Issue: 2023, Vol. 75, No. 6

Pages: 4697-4712


Subject Classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awarded in Engineering or Science)]

Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant 6150140, and in part by the Youth Innovation Project (21032158-Y) of Zhejiang Sci-Tech University

Keywords: image captioning; region features; fine-grained features; fusion

Abstract: Image captioning involves two different major modalities (image and sentence) and converts a given image into language that adheres to its visual semantics. Almost all methods first extract image features to reduce the difficulty of visual-semantic embedding and then use a caption model to generate fluent sentences. A Convolutional Neural Network (CNN) is often used to extract image features in image captioning, and the use of object detection networks to extract region features has achieved great success. However, the region features retrieved by this method are object-level and, because of the detection model's limitations, do not capture fine-grained details. We offer an approach that addresses this issue and generates more accurate captions by fusing fine-grained features with region features. First, we extract fine-grained features using a panoramic segmentation network. Second, we propose two fusion methods and compare their fusion results. The X-Linear Attention Network (X-LAN) serves as the foundation for both fusion methods. According to experimental findings on the COCO dataset, the two-branch fusion approach performs better. Notably, on the COCO Karpathy test split, CIDEr is increased up to 134.3% in comparison to the baseline, highlighting the effectiveness and viability of our method.
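The abstract describes fusing object-level region features with fine-grained (segmentation-derived) features via two branches. The following is a minimal sketch of what a two-branch fusion might look like, not the authors' actual X-LAN implementation: the function name, the use of plain NumPy, and the mean-pooled-query attention are all illustrative assumptions, with each branch attended separately and the resulting context vectors merged afterwards.

```python
import numpy as np

def two_branch_fusion(region_feats, fine_feats):
    """Hypothetical two-branch fusion (illustrative, not the paper's X-LAN code).

    region_feats: (num_regions, dim) object-level features from a detector.
    fine_feats:   (num_segments, dim) fine-grained features from segmentation.
    Each branch is attended independently; the two context vectors are
    then summed, instead of concatenating all features before one attention pass.
    """
    def attend(feats):
        # Mean-pooled query attends over its own branch's features.
        query = feats.mean(axis=0)
        scores = feats @ query / np.sqrt(feats.shape[1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ feats  # context vector of shape (dim,)

    return attend(region_feats) + attend(fine_feats)

rng = np.random.default_rng(0)
fused = two_branch_fusion(rng.standard_normal((36, 512)),
                          rng.standard_normal((20, 512)))
print(fused.shape)  # (512,)
```

Keeping the branches separate until after attention lets each modality weight its own features before merging, which is one plausible reading of why a two-branch design could outperform early concatenation.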
