TY - GEN
T1 - Dynamic contextual multi Arm bandits in display advertisement
AU - Yang, Hongxia
AU - Lu, Quan
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12
Y1 - 2016/12
AB - We model the ad selection task as a multi-armed bandit problem. Standard assumptions in the multi-armed bandit (MAB) setting are that samples drawn from each arm are independent and identically distributed, that rewards (conversion rates in our scenario) are stationary, and that reward feedback is immediate. Although the payoff function of an arm is allowed to evolve over time, the evolution is assumed to be slow. Display ads, on the other hand, are regularly created while others are removed from circulation. This can occur when budgets run out, campaign goals change, or holiday seasons end, or because of many other latent factors beyond the control of the ad selection system. Another major challenge is that the set of available ads is often extremely large, but standard multi-armed bandit strategies converge with linear time complexity, which cannot accommodate these frequent dynamic changes. Given these challenges and the restrictions of the original MAB, we propose a novel dynamic contextual MAB that tightly integrates dynamic conversion rate prediction, contextual learning, and arm overlapping modeling in a principled framework. In addition, we propose an accompanying meta-analysis framework that allows us to conclude experiments in a more statistically robust manner. We demonstrate on a world-leading demand-side platform (DSP) that our framework can effectively discriminate premium arms and significantly outperform some standard variations of MAB in these settings.
KW - Contextual
KW - Display advertisement
KW - Dynamic
KW - Meta analyses
KW - Multi arm bandits
UR - https://www.scopus.com/pages/publications/85014520499
U2 - 10.1109/ICDM.2016.0177
DO - 10.1109/ICDM.2016.0177
M3 - Conference article published in proceeding or book
AN - SCOPUS:85014520499
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1305
EP - 1310
BT - Proceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
A2 - Bonchi, Francesco
A2 - Domingo-Ferrer, Josep
A2 - Baeza-Yates, Ricardo
A2 - Zhou, Zhi-Hua
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Data Mining, ICDM 2016
Y2 - 12 December 2016 through 15 December 2016
ER -