Search Results - Nantong Library

Visuals to Text: A Comprehensive Review on Automatic Image Captioning

IEEE/CAA Journal of Automatica Sinica, 2022, Vol. 9, No. 8, pp. 1339-1365

Authors: Yue Ming, Nannan Hu, Chunxiao Fan, Fan Feng, Jiangwan Zhou, Hui Yu. Affiliations: Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Creative Technologies, University of Portsmouth, Portsmouth PO1 2DJ, UK

Image captioning refers to automatic generation of descriptive texts according to the visual content of *** is a technique integrating multiple disciplines including the computer vision (CV), natural language processing (NLP) and artificial *** recent years, substantial research efforts have been devoted to generate image caption with impressive *** summarize the recent advances in image captioning, we present a comprehensive review on image captioning, covering both traditional methods and recent deep learning-based ***, we first briefly review the early traditional works based on the retrieval and *** deep learning-based image captioning researches are focused, which is categorized into the encoder-decoder framework, attention mechanism and training strategies on the basis of model structures and training manners for a detailed *** that, we summarize the publicly available datasets, evaluation metrics and those proposed for specific requirements, and then compare the state-of-the-art methods on the MS COCO ***, we provide some discussions on open challenges and future research directions.

Keywords: Artificial intelligence; attention mechanism; encoder-decoder framework; image captioning; multi-modal understanding; training strategies

Source: VIP Journal Database; Tongfang Journal Database


How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges


Machine Intelligence Research, 2023, Vol. 20, No. 5, pp. 605-613

Authors: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc Van Gool. Affiliations: Computer Vision Lab (CVL), ETH Zurich, Zurich 8001, Switzerland; College of Engineering, Computing & Cybernetics, Australian National University, Canberra 8105, Australia; Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi 999041, UAE

Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational ***, Bard has recently been updated to handle visual inputs alongside text prompts during *** Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text *** exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language ***, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, under-water and remote sensing data to comprehensively evaluate Bard's *** primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future *** expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual *** project is released on https://***/htqin/GoogleBard-VisUnderstand.

Keywords: Google Bard; multi-modal understanding; visual comprehension; large language models; conversational AI; chatbot

Source: VIP Journal Database

