Relevant Visual Semantic Context-Aware Attention-Based Dialog
Author affiliations: National Advanced IPv6 Centre, Universiti Sains Malaysia, Penang, Malaysia; Lee Kong Chian Faculty of Engineering and Science (LKCFES), Universiti Tunku Abdul Rahman, Sungai Long, Selangor, Malaysia
Publication: Computers, Materials & Continua
Year/Volume/Issue: 2023, Vol. 76, No. 8
Pages: 2337-2354
Core indexing:
Subject classification: 08 [Engineering]; 080203 [Engineering - Mechanical Design and Theory]; 0802 [Engineering - Mechanical Engineering]
Keywords: Visual dialog; context-aware; relevant history; computer vision; natural language processing
Abstract: The existing dataset for visual dialog comprises multiple rounds of questions and a diverse range of image ***, it faces challenges in overcoming visual semantic limitations, particularly in obtaining sufficient context from visual and textual aspects of *** paper proposes a new visual dialog dataset called Diverse History-Dialog (DS-Dialog) to address the visual semantic limitations faced by the existing ***-Dialog groups relevant histories based on their respective Microsoft Common Objects in Context (MSCOCO) image categories and consolidates them for each ***, each MSCOCO image category consists of top relevant histories extracted based on the semantic relationships between the original image caption and historical *** relevant histories are consolidated for each image, and DS-Dialog enhances the current dataset by adding new context-aware relevant history to provide more visual semantic context for each *** new dataset is generated through several stages, including image semantic feature extraction, keyphrase extraction, relevant question extraction, and relevant history dialog *** DS-Dialog dataset contains about 2.6 million question-answer pairs, where 1.3 million pairs correspond to existing VisDial’s question-answer pairs, and the remaining 1.3 million pairs include a maximum of 5 image features for each VisDial image, with each image comprising 10-round relevant question-answer ***, a novel adaptive relevant history selection is proposed to resolve missing visual semantic information for each ***-Dialog is used to benchmark the performance of previous visual dialog models and achieves better performance than previous ***, the proposed DS-Dialog model achieves an 8% higher mean reciprocal rank (MRR), 11% higher R@1%, 6% higher R@5%, 5% higher R@10%, and 8% higher normalized discounted cumulative gain (NDCG) compared to ***-Dialog also achieves approximately 1 point improvement o