Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation

Luo Ji, Qi Qin, Bingqing Han, Hongxia Yang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

22 Citations (Scopus)

Abstract

Recommender systems play a crucial role in modern E-commerce platforms. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. To alleviate the cold-start issue, most existing methods introduce content and contextual information as auxiliary information. Nevertheless, these methods assume that recommended items behave steadily over time, whereas in a typical E-commerce scenario, items generally perform very differently over their life period. In such a situation, it is beneficial to consider the long-term return from the item perspective, which is usually ignored in conventional methods. Reinforcement learning (RL) naturally fits such a long-term optimization problem: the recommender can identify high-potential items and proactively allocate more user impressions to boost their growth, thereby improving the multi-period cumulative gains. Inspired by this idea, we model the process as a Partially Observable and Controllable Markov Decision Process (POC-MDP) and propose an actor-critic RL framework (RL-LTV) to incorporate item lifetime values (LTV) into the recommendation. In RL-LTV, the critic studies the historical trajectories of items and predicts the future LTV of fresh items, while the actor suggests a score-based policy that maximizes the expected future LTV. Scores suggested by the actor are then combined with classical ranking scores in a dual-rank framework, so that the recommendation is balanced against the LTV consideration. On one of the largest E-commerce platforms, our method outperforms the strong live baseline with relative improvements of 8.67% and 18.03% on the IPV and GMV of cold-start items, respectively.
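The dual-rank step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the actor's scores are combined with the classical ranking scores, so the linear blend below is purely an assumption, and the names `dual_rank`, `ltv_weight`, and the sample items and scores are hypothetical.

```python
from typing import Dict, List


def dual_rank(
    ranking_scores: Dict[str, float],
    ltv_scores: Dict[str, float],
    ltv_weight: float = 0.3,
) -> List[str]:
    """Order items by a weighted blend of ranking score and LTV score.

    Assumption: a linear blend. The abstract does not specify the
    combination rule, so this is only one plausible choice.
    """
    if not 0.0 <= ltv_weight <= 1.0:
        raise ValueError("ltv_weight must be in [0, 1]")
    # Items without an LTV score contribute 0.0 on the LTV side.
    blended = {
        item: (1.0 - ltv_weight) * score + ltv_weight * ltv_scores.get(item, 0.0)
        for item, score in ranking_scores.items()
    }
    # Highest blended score first.
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical scores: a mature item with a strong ranking score and a
# fresh item with a weaker ranking score but a high LTV score.
ranking = {"mature_item": 0.9, "fresh_item": 0.6}
ltv = {"mature_item": 0.1, "fresh_item": 0.95}

order_no_ltv = dual_rank(ranking, ltv, ltv_weight=0.0)
order_with_ltv = dual_rank(ranking, ltv, ltv_weight=0.5)
```

With `ltv_weight=0.0` the order follows the classical ranking score alone; raising the weight lets the fresh item with a high LTV score move ahead, which is the balancing effect the abstract describes.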

Original language: English
Title of host publication: CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 782-791
Number of pages: 10
ISBN (Electronic): 9781450384469
Publication status: Published - 30 Oct 2021
Externally published: Yes
Event: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021 - Virtual, Online, Australia
Duration: 1 Nov 2021 to 5 Nov 2021

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
Country/Territory: Australia
City: Virtual, Online
Period: 1/11/21 to 5/11/21

Keywords

  • actor-critic model
  • cold-start recommendation
  • lifetime value
  • poc-mdp
  • reinforcement learning

ASJC Scopus subject areas

  • General Business, Management and Accounting
  • General Decision Sciences
