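Seminar title: Earning and Learning with Varying Cost
Speaker: Professor Guangwu Liu, City University of Hong Kong
Time: Wednesday, November 24, 2021, 10:00
Online platform: Tencent Meeting, ID 215 378 957
Organizers: Research group of the National Natural Science Foundation of China Major Program "Theory and Application of Enterprise Operations and Service Innovation Management"; Research Center for Operations and Supply Chain; Research Center for Service Science and Service Management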
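About the speaker:
Dr. Guangwu Liu is a Professor in the Department of Management Sciences, College of Business, City University of Hong Kong. He graduated from the Department of Mathematics at Tsinghua University in 2005 and received his Ph.D. in Industrial Engineering and Logistics Management from the Hong Kong University of Science and Technology in 2009. His research interests include stochastic simulation, machine learning, financial engineering, and risk management. Professor Liu has published a number of papers in leading journals in management science and operations research, including Management Science, Operations Research, INFORMS Journal on Computing, Production and Operations Management, Naval Research Logistics, and ACM Transactions on Modeling and Computer Simulation. He currently serves as an Associate Editor of Naval Research Logistics and of the Asia-Pacific Journal of Operational Research. His honors include the 2012 Outstanding Simulation Publication Award from the INFORMS Simulation Society and the Early Career Award from the Hong Kong Research Grants Council.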
Abstract:
We study a dynamic pricing problem in which the observed cost varies from one selling period to the next, and the demand function is unknown and depends only on the price. Motivated by the classical upper confidence bound (UCB) algorithm for the multi-armed bandit problem, we propose a UCB-like policy to select the price. When the cost is a continuous random variable, the profit of the optimal price can, as the cost varies, be arbitrarily close to that of the second-best price, which makes it very difficult to choose correctly. In this setting, we show that the expected cumulative regret of our policy grows on the order of (log T)^2. When the cost takes values in a finite set and every price is optimal for at least one cost value, we show that the expected cumulative regret is upper bounded by a constant for any T. This result suggests that a suboptimal price is selected in only a finite number of periods, so the trade-off between earning and learning vanishes and learning is no longer necessary beyond a certain period.
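To make the setting concrete, here is a minimal simulation sketch of a UCB-style pricing rule with a cost that changes every period. It is not the policy analyzed in the talk: the function name `ucb_like_pricing`, the Bernoulli demand model, the finite price grid, the uniformly drawn cost, and the bonus term sqrt(2 log t / n) are all illustrative assumptions.

```python
import math
import random


def ucb_like_pricing(prices, demand_prob, costs, horizon, seed=0):
    """Simulate a UCB-style pricing rule (illustrative assumption, not the talk's policy).

    prices       -- finite grid of candidate prices
    demand_prob  -- true purchase probability at each price (hidden from the rule)
    costs        -- finite set of cost values, one drawn uniformly each period
    Returns (expected cumulative regret, number of times each price was offered).
    """
    rng = random.Random(seed)
    n = [0] * len(prices)      # times each price has been offered
    sales = [0] * len(prices)  # purchases observed at each price
    regret = 0.0

    for t in range(1, horizon + 1):
        c = rng.choice(costs)  # cost is observed before the price is chosen

        def index(i):
            if n[i] == 0:
                return float("inf")  # offer every price at least once
            margin = prices[i] - c
            mean = sales[i] / n[i]
            bonus = math.sqrt(2 * math.log(t) / n[i])
            # Optimistic profit: inflate demand when the margin is positive,
            # deflate it (floored at 0) when the margin is negative.
            if margin >= 0:
                return margin * (mean + bonus)
            return margin * max(mean - bonus, 0.0)

        i = max(range(len(prices)), key=index)
        sold = rng.random() < demand_prob[i]  # Bernoulli demand (assumption)
        n[i] += 1
        sales[i] += int(sold)

        # Regret is measured against the best price for this period's cost.
        expected = [(p - c) * q for p, q in zip(prices, demand_prob)]
        regret += max(expected) - expected[i]

    return regret, n
```

For example, `ucb_like_pricing([4.0, 6.0, 8.0], [0.9, 0.6, 0.3], [1.0, 3.0, 5.0], 5000)` runs the rule for 5000 periods. Because demand depends only on the price, every observation at a price is informative for every cost value, which is the feature that separates this setting from a standard contextual bandit.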