Distributional Reinforcement Learning with Quantum Neural Networks

Authors: Wei Hu, James Hu

Affiliations: Department of Computer Science, Houghton College, Houghton, NY, USA; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA

Publication: Intelligent Control and Automation

Year/Volume/Issue: 2019, Vol. 10, No. 2

Pages: 63-78

Keywords: Continuous-Variable Quantum Computers; Quantum Reinforcement Learning; Distributional Reinforcement Learning; Quantile Regression; Distributional Q Learning; Grid World Environment; MDP Chain Environment

Abstract: Traditional reinforcement learning (RL) trains an agent to learn an optimal policy using the expected value of the return, i.e., of the cumulative random rewards. However, recent research indicates that learning the distribution over returns has distinct advantages over learning only their expected value, as seen in a range of RL tasks. This shift from the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach to RL. Our work implements quantile regression (QR) distributional Q learning with a quantum neural network. The quantum network is evaluated in a grid world environment with different numbers of quantiles, illustrating their influence on the algorithm's learning. It is also compared to standard quantum Q learning in a Markov Decision Process (MDP) chain, which demonstrates that quantum QR distributional Q learning can explore the environment more efficiently than standard quantum Q learning. Efficient exploration and balancing exploitation against exploration are major challenges in RL. Previous work has shown that more informative actions can be taken from a distributional perspective. Our findings suggest another cause for its success: the enhanced performance of distributional RL can be partially attributed to its superior ability to explore the environment efficiently.
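
The abstract describes quantile regression distributional Q learning, whose classical update rule is compact enough to sketch. Below is a minimal NumPy version of the QR Q-learning update (in the style of QR-DQN, Dabney et al., 2018), offered only as an illustration: the tabular stand-in for the paper's quantum neural network, the environment size, and all hyperparameters are assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal classical sketch of the quantile-regression (QR) distributional
# Q-learning update that the paper implements with a quantum neural network.
# A NumPy table stands in for the quantum network; environment size,
# learning rate, and quantile count are illustrative assumptions.

N_QUANTILES = 8                                       # quantiles per (state, action)
TAU = (np.arange(N_QUANTILES) + 0.5) / N_QUANTILES    # quantile midpoints tau_i
GAMMA, LR = 0.99, 0.1                                 # discount and step size

n_states, n_actions = 5, 2                            # e.g. a small MDP chain
# theta[s, a] holds N_QUANTILES estimated quantiles of the return Z(s, a)
theta = np.zeros((n_states, n_actions, N_QUANTILES))

def q_values(s):
    """Expected return of each action: the mean of its quantile estimates."""
    return theta[s].mean(axis=1)

def qr_update(s, a, r, s_next, done):
    """One quantile-regression TD update for the visited pair (s, a)."""
    a_star = int(np.argmax(q_values(s_next)))         # greedy next action
    # Target distribution: r + gamma * Z(s', a*), or just r at terminal states
    target = r + (0.0 if done else GAMMA) * theta[s_next, a_star]
    # Pairwise TD errors u_ij between each target sample and each quantile
    u = target[None, :] - theta[s, a][:, None]        # shape (N, N)
    # QR loss gradient: tau_i - 1{u < 0}, averaged over the target samples
    theta[s, a] += LR * (TAU[:, None] - (u < 0)).mean(axis=1)
```

Averaging the learned quantiles recovers ordinary Q-values, so an epsilon-greedy policy can sit on top exactly as in standard Q learning; the extra information carried by the quantiles is what the abstract credits for the more efficient exploration.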
