Greedy feature replacement for online value function approximation
Greedy feature replacement for online value function approximation作者机构:Department of Computer Science and Technology Tsinghua University School of Software Tsinghua University Department of Physics and State Key Laboratory of Low-Dimensional Quantum Physics Tsinghua University
出 版 物:《Journal of Zhejiang University-Science C(Computers and Electronics)》 (浙江大学学报C辑(计算机与电子(英文版))
年 卷 期:2014年第15卷第3期
页 面:223-231页
核心收录:
学科分类:12[管理学] 1201[管理学-管理科学与工程(可授管理学、工学学位)] 081104[工学-模式识别与智能系统] 08[工学] 0835[工学-软件工程] 0811[工学-控制科学与工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:Project supported by the 12th Five-Year Defense Exploration Project of China(No.041202005) the Ph.D.Program Foundation of the Ministry of Education of China(No.20120002130007)
主 题:Reinforcement learning Function approximation Feature dependency Online expansion Feature replacement
摘 要:Reinforcement learning(RL) in real-world problems requires function approximations that depend on selecting the appropriate feature representations. Representational expansion techniques can make linear approximators represent value functions more effectively; however, most of these techniques function well only for low dimensional problems. In this paper, we present the greedy feature replacement(GFR), a novel online expansion technique, for value-based RL algorithms that use binary features. Given a simple initial representation, the feature representation is expanded incrementally. New feature dependencies are added automatically to the current representation and conjunctive features are used to replace current features greedily. The virtual temporal difference(TD) error is recorded for each conjunctive feature to judge whether the replacement can improve the approximation. Correctness guarantees and computational complexity analysis are provided for GFR. Experimental results in two domains show that GFR achieves much faster learning and has the capability to handle large-scale problems.