Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments

Authors: Yan Zheng; Jian-Ye Hao; Zong-Zhang Zhang; Zhao-Peng Meng; Xiao-Tian Hao

Affiliations: College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)

Year, volume, issue: 2020, Vol. 35, No. 2

Pages: 268-280


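Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees may be conferred in Management or Engineering)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]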

Funding: The work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119; the Special Program of Artificial Intelligence of Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100; the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150; the Science and Technology Program of Tianjin of China under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170; and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432.

Acknowledgments: We thank our industrial research partner Netease, Inc., especially the Fuxi AI Laboratory of Leihuo Business Groups, for their discussion and support with the experiments.

Keywords: deep reinforcement learning; multiagent system; weighted double estimator; lenient reinforcement learning; cooperative Markov game

Abstract: Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most of the existing MA-DRL algorithms, however, are still inefficient when faced with the non-stationarity due to agents changing behavior consistently in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework, named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and the deep neural network, WDDQN can not only reduce the bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) regarding the averaged reward and the convergence speed and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
