Technical Note—Dynamic Pricing and Learning with Discounting

Zhichao Feng, Milind Dawande, Ganesh Janakiraman, Anyan Qi

Research output: Journal article publicationJournal articleAcademic researchpeer-review

Abstract

In many practical settings, learning algorithms can take a substantial amount of time to converge, thereby raising the need to understand the role of discounting in learning. We illustrate the impact of discounting on the performance of learning algorithms by examining two classic and representative dynamic-pricing and learning problems studied in Broder and Rusmevichientong (BR) [Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980] and Keskin and Zeevi (KZ) [Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167]. In both settings, a seller sells a product with unlimited inventory over T periods. The seller initially does not know the parameters of the general choice model in BR (respectively, the linear demand curve in KZ). Given a discount factor ρ, the retailer’s objective is to determine a pricing policy to maximize the expected discounted revenue over T periods. In both settings, we establish lower bounds on the regret under any policy and show limiting bounds of [Formula presented] and [Formula presented] when T→∞ and ρ→1, respectively. In the model of BR with discounting, we propose an asymptotically tight learning policy and show that the regret under our policy as well that under the MLE-CYCLE policy in BR is [Formula presented] (respectively, [Formula presented] when T→∞ (respectively, ρ→1). In the model of KZ with discounting, we present sufficient conditions for a learning policy to guarantee asymptotic optimality and show that the regret under any policy satisfying these conditions is [Formula presented] (respectively, [Formula presented] when T→∞ (respectively, ρ→1). We show that three different policies—namely, the two variants of the greedy iterated least squares policy in KZ and a different policy that we propose—achieve this upper bound on the regret. We numerically examine the behavior of the regret under our policies as well as those in BR and KZ in the presence of discounting. We also analyze a setting in which the discount factor per period is a function of the number of decision periods in the planning horizon.

Original languageEnglish
Pages (from-to)481-492
Number of pages12
JournalOperations Research
Volume72
Issue number2
DOIs
Publication statusPublished - 1 Mar 2024

Keywords

  • discounting
  • dynamic pricing
  • learning
  • regret minimization

ASJC Scopus subject areas

  • Computer Science Applications
  • Management Science and Operations Research

Fingerprint

Dive into the research topics of 'Technical Note—Dynamic Pricing and Learning with Discounting'. Together they form a unique fingerprint.

Cite this