
Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features

Authors: Wenzhang Liu; Lu Dong; Dan Niu; Changyin Sun

Affiliations: School of Artificial Intelligence, Anhui University, Hefei 230039, China, and also with the Peng Cheng Laboratory, Shenzhen 518055, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; School of Automation, Southeast University, Nanjing 210096, China; School of Automation, Southeast University, Nanjing 210096, China, and also with the Peng Cheng Laboratory, Shenzhen 518055, China

Published in: IEEE/CAA Journal of Automatica Sinica

Year/Volume/Issue: 2022, Vol. 9, No. 9

Pages: 1673-1686

Subject classification: 12 [Management]; 1201 [Management Science and Engineering]; 081104 [Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Software Engineering]; 0811 [Control Science and Engineering]; 0812 [Computer Science and Technology]

Funding: the National Key R&D Program of China (2021ZD0112700, 2018AAA0101400); the National Natural Science Foundation of China (62173251, 61921004, U1713209); the Natural Science Foundation of Jiangsu Province of China (BK20202006)

Keywords: knowledge transfer; multi-agent systems; reinforcement learning; successor features

Abstract: In multi-agent reinforcement learning (MARL), the behaviors of each agent can influence the learning of the others, and the agents have to search an exponentially enlarged joint-action space. It is therefore challenging for multi-agent teams to explore the environment; they may settle on suboptimal policies and fail to solve some complex tasks. To improve exploration efficiency as well as performance on MARL tasks, in this paper we propose a new approach that transfers knowledge across tasks. Different from traditional MARL algorithms, we first assume that the reward functions can be computed as linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as basic knowledge for further transfer. Finally, once the weights for the target tasks are available, it becomes easier to obtain a well-performing policy for exploring the target domain, and the learning process of the agents on target tasks is sped up by making full use of the previously learned basic knowledge. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experimental results demonstrate improved performance compared with state-of-the-art MARL algorithms.
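The reward decomposition described in the abstract — rewards as linear combinations of a shared feature function and task-specific weights — underlies the standard successor-features transfer scheme. The following is a minimal illustrative sketch, not the authors' implementation: all shapes, names (`psi_source`, `gpi_action`, `w_target`), and random values are assumptions, and the source-task successor features are stand-ins for quantities that would normally be learned by pre-training.

```python
import numpy as np

# Assumption: r(s, a) = phi(s, a) . w, where phi is a shared feature
# function and w is a task-specific weight vector.
rng = np.random.default_rng(0)

n_actions, d = 4, 8  # action count and feature dimension (illustrative)

# Stand-ins for pre-trained successor features of two source tasks:
# psi[a] approximates the expected discounted sum of features phi
# under that task's policy after taking action a.
psi_source = [rng.normal(size=(n_actions, d)) for _ in range(2)]

def gpi_action(psi_list, w_target):
    """Generalized policy improvement: score every source policy's
    successor features under the target weights, then act greedily
    over the best score per action."""
    q = np.stack([psi @ w_target for psi in psi_list])  # (n_tasks, n_actions)
    return int(np.argmax(q.max(axis=0)))

w_target = rng.normal(size=d)  # target-task weights, assumed given
action = gpi_action(psi_source, w_target)
print(action)
```

Because the Q-values of every source task factor as `psi @ w_target`, evaluating a new task only requires its weight vector, which is what makes the pre-trained knowledge cheap to reuse for exploration in the target domain.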
