Sample-Efficient Deep Reinforcement Learning with Directed Associative Graph

Authors: Dujia Yang; Xiaowei Qin; Xiaodong Xu; Chensheng Li; Guo Wei

Affiliations: University of Science and Technology of China, Hefei 230026, China; CAS Key Laboratory of Wireless-Optical Communications, Hefei 230027, China

Publication: China Communications (中国通信, English edition)

Year/Volume/Issue: 2021, Vol. 18, No. 6

Pages: 100-113

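Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees conferrable in Management or Engineering)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees conferrable in Engineering or Science)]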

Funding: This work is supported by the National Key Research and Development Program of China (2018YFA0701603) and the Natural Science Foundation of Anhui Province (2008085MF213).

Keywords: directed associative graph; sample efficiency; deep reinforcement learning

Abstract: Reinforcement learning can be modeled as Markov decision process *** consequence, the interaction samples as well as the connection relation between them are two main types of information for ***, most of recent works on deep reinforcement learning treat samples independently either in their own episode or between *** this paper, in order to utilize more sample information, we propose another learning system based on directed associative graph (DAG). The DAG is built on all trajectories in real time, which includes the whole connection relation of all samples among all *** planning with directed edges on DAG, we offer another perspective to estimate state-action pair, especially for the unknowns to deep neural network (DNN) as well as episodic memory (EM). Mixed loss function is generated by the three learning systems (DNN, EM and DAG) to improve the efficiency of the parameter update in the proposed *** show that our algorithm is significantly better than the state-of-the-art algorithm in performance and sample efficiency on testing ***, the convergence of our algorithm is proved in the appendix and its long-term performance as well as the effects of DAG are verified.
