"Optimal Modeling" Seminar Announcement (4/20)

Date & Time: Wednesday, April 20, 2016, 17:00 - 18:00

Venue: Room 534, 5th Floor, Engineering Building 14, Hongo Campus, The University of Tokyo

Speaker: Michael Jong Kim
https://www.mie.utoronto.ca/faculty/profile.php?id=154

Title:
Approximate Learning Trajectories for Bayesian Bandits

Abstract:
It is known that the optimal policy for a multi-armed bandit
problem is the Gittins index policy. For bandit problems
with Bayesian learning (Bayesian bandits), however, computing
the Gittins index is intractable. In this paper, we
introduce the concept of an approximate learning trajectory
as a new approach to approximating the dynamics of future
learning. The approach is based on the Bernstein-von Mises
Theorem (Bayesian Central Limit Theorem), and we show how it
can be used to simplify the dynamic programming equations
associated with Bayesian bandits, which allows for an
efficient computation of the Gittins index. We prove that
under the approximate learning trajectory, the approximate
Gittins index policy is asymptotically optimal in that it
approaches the true Gittins index policy in certain limiting
regimes. We also show how the approximate learning
trajectory leads to a new insight into the structure of the
Gittins index for Bayesian bandits. Specifically, we show
that the approximate Gittins index is equal to the sum of
three terms: a myopic reward, an exploration-boost, and a
learning rate adjustment. The first two terms can be thought
of as the Bayesian counterpart to the well-known upper
confidence bound from the bandit literature, while the third
term is new. This is joint work with Andrew Lim.
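
Background sketch (not part of the speaker's abstract): the short Python
example below illustrates the generic "myopic reward + exploration boost"
structure of an index policy, using a standard Bayesian-UCB-style index for
a Gaussian bandit with known noise variance. It is only a rough analogue of
the decomposition mentioned above, not the approximate Gittins index from
the talk; the function name and parameters (bayes_ucb_index, beta,
noise_var) are illustrative assumptions.

import numpy as np

def bayes_ucb_index(prior_mean, prior_var, rewards, noise_var=1.0, beta=2.0):
    """Illustrative Bayesian-UCB-style index for one Gaussian arm.

    Index = posterior mean (myopic term) + beta * posterior std (exploration
    term). A generic sketch, not the approximate Gittins index of the talk.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    # Conjugate Gaussian posterior update with known noise variance.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + rewards.sum() / noise_var)
    return post_mean + beta * np.sqrt(post_var)

# Usage: at each step, pull the arm with the largest index.
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]
history = [[] for _ in true_means]
for t in range(100):
    indices = [bayes_ucb_index(0.0, 1.0, h) for h in history]
    arm = int(np.argmax(indices))
    history[arm].append(rng.normal(true_means[arm], 1.0))
print([len(h) for h in history])  # pulls should concentrate on the best arm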