A Sample-Efficient Actor-Critic Algorithm for Recommendation Diversification
Author affiliation: Department of Computer Science and Technology, Tsinghua University
Publication: Chinese Journal of Electronics (English edition of 电子学报)
Year/Volume/Issue: 2020, Vol. 29, No. 1
Pages: 89-96
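Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees in management or engineering may be conferred)]; 081203 [Engineering: Computer Application Technology]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science may be conferred)]
Funding: Supported by the Tsinghua University Initiative Scientific Research Program (No. 20161080066)
Topics: Recommender system; Diversity; Reinforcement learning; Actor-critic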
Abstract: Diversifying recommendation results benefits users by satisfying their existing interests while also exploring novel information needs. A recently proposed Monte-Carlo based reinforcement learning method suffers from sample inefficiency and large variance, and can fail to perform well in large action spaces. We propose a novel actor-critic reinforcement learning algorithm for recommendation diversification to address these problems. The actor acts as the ranking policy, while the introduced critic predicts the expected future rewards of each candidate action. The critic target is updated by the full Bellman equation, and the actor network is optimized using the expected gradient over the whole action space. To further stabilize and improve performance, we also add a policy-filtered critic supervision loss. Experiments on the MovieLens dataset demonstrate the effectiveness of our approach over multiple competitive methods.
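
Illustrative sketch (not taken from the paper): the abstract names two ingredients, a critic target built from the Bellman equation and an actor gradient taken in expectation over the whole action space. The NumPy snippet below shows one common way such quantities are computed for a single state with a softmax policy over a small, discrete candidate set. It assumes that "full Bellman equation" means a one-step target that averages the critic over all next actions under the current policy; the paper's actual formulation, network architecture, and policy-filtered supervision loss are not reproduced here. All function names and parameters are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def critic_target(reward, gamma, next_probs, next_q, done=False):
    """One-step Bellman expectation target (an assumed reading of the abstract).

    target = r + gamma * sum_a' pi(a'|s') * Q(s', a')
    The expectation runs over every next action instead of a single sample.
    """
    if done:
        return float(reward)
    return float(reward + gamma * np.dot(next_probs, next_q))


def expected_actor_gradient(logits, q_values):
    """Gradient of J = sum_a pi(a|s) * Q(s, a) with respect to the logits.

    For a softmax policy: dJ/dlogit_k = pi_k * (Q_k - sum_a pi_a * Q_a).
    Every action contributes, so no action sampling is required.
    """
    probs = softmax(logits)
    baseline = np.dot(probs, q_values)
    return probs * (q_values - baseline)


if __name__ == "__main__":
    # Hypothetical toy values: three candidate items, uniform initial policy.
    logits = np.zeros(3)
    q_values = np.array([1.0, 2.0, 3.0])

    grad = expected_actor_gradient(logits, q_values)
    target = critic_target(reward=1.0, gamma=0.9,
                           next_probs=softmax(logits), next_q=q_values)

    # A gradient-ascent step shifts probability mass toward higher-Q items.
    new_probs = softmax(logits + 0.5 * grad)
    print(grad, target, new_probs)
```

Because the gradient sums over all candidate actions rather than relying on sampled trajectories, its variance does not depend on which action happened to be drawn, which is the general motivation for expected-gradient updates compared with Monte-Carlo policy gradients.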