In this article, we study a multi-step interactive recommendation problem for explicit-feedback recommender systems. Different from the existing works, we propose a novel user-specific deep reinforcement learning approach to the problem. Specifically, we first formulate the problem of interactive recommendation for each target user as a Markov decision process (MDP). We then derive a multi-MDP reinforcement learning task for all involved users. To model the possible relationships (including similarities and differences) between different users’ MDPs, we construct user-specific latent states by using matrix factorization. After that, we propose a user-specific deep Q-learning (UDQN) method to estimate optimal policies based on the constructed user-specific latent states. Furthermore, we propose Biased UDQN (BUDQN) to explicitly model user-specific information by employing an additional bias parameter when estimating the Q-values for different users. Finally, we validate the effectiveness of our approach by comprehensive experimental results and analysis.
|Journal||ACM Transactions on Knowledge Discovery from Data|
|Publication status||Published - 1 Oct 2019|