Multimodal Pretraining from Monolingual to Multilingual

Authors: Liang Zhang, Ludan Ruan, Anwen Hu, Qin Jin

Affiliation: School of Information, Renmin University of China, Beijing 100872, China

Publication: Machine Intelligence Research

Year/Volume/Issue: 2023, Vol. 20, No. 2

Pages: 220-232

Subject classification: 12 [Management]; 1201 [Management Science and Engineering]; 081104 [Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Software Engineering]; 0811 [Control Science and Engineering]; 0812 [Computer Science and Technology]

Funding: Supported by the National Natural Science Foundation of China (No. 62072462), the National Key R&D Program of China (No. 2020AAA0108600), and the Large-scale Pretraining Program 468 of Beijing Academy of Artificial Intelligence (BAAI)

Keywords: multilingual pretraining, multimodal pretraining, cross-lingual transfer, multilingual generation, cross-modal retrieval

Abstract: Multimodal pretraining has made convincing achievements in various downstream tasks in recent years. However, since the majority of the existing works construct models based on English, their applications are limited by language. In this work, we address this issue by developing models with multimodal and multilingual capabilities. We explore two types of methods to extend multimodal pretraining models from monolingual to multilingual. Specifically, we propose a pretraining-based model named multilingual multimodal pretraining (MLMM), and two generalization-based models named multilingual CLIP (M-CLIP) and multilingual acquisition (MLA). In addition, we further extend the generalization-based models to incorporate the audio modality and develop the multilingual CLIP for vision, language, and audio (CLIP4VLA). Our models achieve state-of-the-art performances on multilingual vision-text retrieval, visual question answering, and image captioning benchmarks. Based on the experimental results, we discuss the pros and cons of the two types of models and their potential practical applications.
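
To make the generalization-based recipe described in the abstract concrete, the sketch below illustrates the core idea behind models such as M-CLIP and MLA: a pretrained monolingual (English) text encoder is kept frozen, and a multilingual text encoder is trained on parallel captions so that translations land in the same embedding space. This is a minimal, self-contained toy, not the authors' implementation; the stand-in encoders, dimensions, and cosine-alignment loss are all illustrative assumptions.

```python
# Toy sketch (assumed, not the paper's code) of generalization-based
# multilingual extension: freeze an English text encoder, train a
# multilingual encoder to match its embeddings on parallel captions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 512    # assumed shared embedding dimension
VOCAB = 30000    # toy vocabulary size

class ToyTextEncoder(nn.Module):
    """Mean-pooled bag-of-embeddings stand-in for a real text tower
    (e.g., CLIP's transformer or a multilingual encoder)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB_DIM)
        self.proj = nn.Linear(EMB_DIM, EMB_DIM)

    def forward(self, token_ids):                   # (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # (batch, EMB_DIM)
        return F.normalize(self.proj(pooled), dim=-1)

teacher = ToyTextEncoder()   # frozen "English" encoder (anchor space)
student = ToyTextEncoder()   # trainable multilingual encoder
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

# Parallel (English, other-language) caption pairs; random ids as a toy.
en_ids = torch.randint(0, VOCAB, (8, 16))
xx_ids = torch.randint(0, VOCAB, (8, 16))

for step in range(3):
    with torch.no_grad():
        target = teacher(en_ids)       # fixed English embeddings
    pred = student(xx_ids)             # multilingual embeddings
    loss = 1.0 - F.cosine_similarity(pred, target).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: alignment loss = {loss.item():.4f}")
```

Because the original vision-text model stays frozen in this setup, the aligned multilingual encoder can, in spirit, be dropped in for cross-lingual retrieval without retraining the image tower; the same recipe extends to an additional audio tower, as in CLIP4VLA.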
