咨询与建议

看过本文的还看了

相关文献

该作者的其他文献

文献详情 >Greedy feature replacement for... 收藏

Greedy feature replacement for online value function approximation

Greedy feature replacement for online value function approximation

作     者:Feng-fei ZHAO Zheng QIN Zhuo SHAO Jun FANG Bo-yan REN 

作者机构:Department of Computer Science and Technology Tsinghua University School of Software Tsinghua University Department of Physics and State Key Laboratory of Low-Dimensional Quantum Physics Tsinghua University 

出 版 物:《Journal of Zhejiang University-Science C(Computers and Electronics)》 (浙江大学学报C辑(计算机与电子(英文版))

年 卷 期:2014年第15卷第3期

页      面:223-231页

核心收录:

学科分类:12[管理学] 1201[管理学-管理科学与工程(可授管理学、工学学位)] 081104[工学-模式识别与智能系统] 08[工学] 0835[工学-软件工程] 0811[工学-控制科学与工程] 0812[工学-计算机科学与技术(可授工学、理学学位)] 

基  金:Project supported by the 12th Five-Year Defense Exploration Project of China(No.041202005) the Ph.D.Program Foundation of the Ministry of Education of China(No.20120002130007) 

主  题:Reinforcement learning Function approximation Feature dependency Online expansion Feature replacement 

摘      要:Reinforcement learning(RL) in real-world problems requires function approximations that depend on selecting the appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement(GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference(TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems.

读者评论 与其他读者分享你的观点

用户名:未登录
我的评分