
Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling


Authors: Yu Zhao; Joohyun Lee; Wei Chen

Affiliations: Department of Electrical and Electronic Engineering, Hanyang University, Ansan 15588, South Korea; Department of Electronic Engineering and Beijing National Research Center for Information Science and Technology, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Publication: China Communications

Year/Volume/Issue: 2021, Vol. 18, No. 6

Pages: 12-23


Subject classification: 080904 [Engineering - Electromagnetic Field and Microwave Technology]; 12 [Management]; 0809 [Engineering - Electronic Science and Technology (Engineering or Science degree)]; 08 [Engineering]; 0810 [Engineering - Information and Communication Engineering]; 1201 [Management - Management Science and Engineering (Management or Engineering degree)]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 080402 [Engineering - Measuring and Testing Technologies and Instruments]; 0804 [Engineering - Instrument Science and Technology]; 081001 [Engineering - Communication and Information Systems]; 0835 [Engineering - Software Engineering]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (Engineering or Science degree)]

Funding: This work was supported by the research fund of Hanyang University (HY-2019-N) and the National Key Research & Development Program (2018YFA0701601).

Keywords: reinforcement learning for average rewards; infinite-horizon Markov decision process; upper confidence bound; queue scheduling

Abstract: This paper proposes a Reinforcement Learning (RL) algorithm to find an optimal scheduling policy that minimizes the delay for a given energy constraint in a communication system where the environment, such as traffic arrival rates, is not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon Constrained Markov Decision Process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique to solve it. Then, we propose a variant of Q-learning, Q-greedyUCB, which combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP problem. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal policy. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of the cumulative regret. We also show that our algorithm can learn and adapt to changes of the environment, so as to obtain an optimal scheduling strategy under a given power constraint for the new environment.
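The abstract describes Q-greedyUCB as a tabular Q-learning variant whose action selection combines ε-greedy exploration with a UCB exploration bonus. A minimal illustrative sketch of that combination is below. It is not the authors' algorithm: the function names, the bonus constant `c`, the ε schedule, and the use of a discounted update in place of the paper's average-reward formulation are all assumptions made for illustration.

```python
import math
import random

def select_action(Q, N, s, t, c=1.0, eps=0.1):
    """Pick an action in state s by Q-value plus a UCB bonus,
    with an epsilon-greedy random fallback (illustrative mix only)."""
    if random.random() < eps:  # epsilon-greedy component: explore uniformly
        return random.randrange(len(Q[s]))
    # UCB component: untried actions get an infinite bonus so each
    # (state, action) pair is visited at least once
    scores = [
        Q[s][a] + (c * math.sqrt(math.log(t + 1) / N[s][a])
                   if N[s][a] > 0 else float("inf"))
        for a in range(len(Q[s]))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def q_learning_ucb(step_fn, n_states, n_actions,
                   steps=2000, alpha=0.1, gamma=0.9):
    """Tabular Q-learning driven by the UCB/epsilon-greedy selector above.
    step_fn(s, a) -> (next_state, reward) is a user-supplied environment;
    in the paper's setting the reward would fold in the Lagrangian energy
    penalty, which is omitted here for brevity."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    N = [[0] * n_actions for _ in range(n_states)]  # visit counts for UCB
    s = 0
    for t in range(steps):
        a = select_action(Q, N, s, t)
        s2, r = step_fn(s, a)
        N[s][a] += 1
        # standard discounted Q-learning update (the paper instead uses
        # an average-reward formulation for the infinite-horizon CMDP)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
    return Q
```

The UCB bonus shrinks as a (state, action) pair is visited more often, so exploration concentrates on actions whose value estimates are still uncertain, which is the intuition behind the regret improvement the abstract reports over plain ε-greedy.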
