### abstract ###
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of SYMBOL trials so as to maximize the total payoff of the chosen strategies.
While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement.
The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions.
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
We refer to this problem as the Lipschitz MAB problem.
We present a solution for the multi-armed bandit problem in this setting.
That is, for every metric space SYMBOL we define an isometry invariant SYMBOL which bounds from below the performance of Lipschitz MAB algorithms for SYMBOL, and we present an algorithm which comes arbitrarily close to meeting this bound.
Furthermore, our technique gives even better results for benign payoff functions.
### introduction ###
In a multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of SYMBOL trials so as to maximize the total payoff of the chosen strategies.
These problems are the principal theoretical tool for modeling the exploration/exploitation tradeoffs inherent in sequential decision-making under uncertainty.
Studied intensively for the last three decades~CITATION, bandit problems are having an increasingly visible impact on computer science because of their diverse applications, including online auctions, adaptive routing, and the theory of learning in games.
The performance of a multi-armed bandit algorithm is often evaluated in terms of its regret, defined as the gap between the expected payoff of the algorithm and that of an optimal strategy.
While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with exponentially or infinitely large strategy sets are still a topic of very active investigation~CITATION.
Absent any assumptions about the strategies and their payoffs, bandit problems with large strategy sets allow for no non-trivial solutions: any multi-armed bandit algorithm performs as badly, on some inputs, as random guessing.
But in most applications it is natural to assume a structured class of payoff functions, which often enables the design of efficient learning algorithms~CITATION.
In this paper, we consider a broad and natural class of problems in which the structure is induced by a metric on the space of strategies.
While bandit problems have been studied in a few specific metric spaces (such as a one-dimensional interval)~CITATION, the case of general metric spaces has not been treated before, despite being an extremely natural setting for bandit problems.
As a motivating example, consider the problem faced by a website choosing from a database of thousands of banner ads to display to users, with the aim of maximizing the click-through rate of the ads displayed by matching ads to users' characterizations and to the web content they are currently viewing.
Independently experimenting with each advertisement is infeasible, or at least highly inefficient, since the number of ads is too large.
Instead, the advertisements are usually organized into a taxonomy based on metadata (such as the category of product being advertised) which allows a similarity measure to be defined.
The website can then attempt to optimize its learning algorithm by generalizing from experiments with one ad to make inferences about the performance of similar ads~CITATION.
Abstractly, we have a bandit problem of the following form: there is a strategy set SYMBOL, with an unknown payoff function SYMBOL satisfying a set of predefined constraints of the form SYMBOL for some SYMBOL and SYMBOL.
In each period the algorithm chooses a point SYMBOL and observes an independent random sample from a payoff distribution whose expectation is SYMBOL.
A moment's thought reveals that this abstract problem can be regarded as a bandit problem in a metric space.
Specifically, if SYMBOL is defined to be the infimum, over all finite sequences SYMBOL in SYMBOL, of the quantity SYMBOL, then SYMBOL is a metric and the constraints SYMBOL may be summarized by stating that SYMBOL is a Lipschitz function (of Lipschitz constant SYMBOL) on the metric space SYMBOL.
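The infimum over chains in this definition is a shortest-path distance: each constraint acts as an undirected edge whose length is its bound, and the metric is the length of the shortest chain. The minimal Python sketch below assumes a finite strategy set and constraints given as `(x, y, delta)` triples (a hypothetical input format, not taken from the source), and computes all pairwise distances with the Floyd-Warshall algorithm. Pairs joined by no chain of constraints stay at distance infinity.

```python
import math


def induced_metric(points, constraints):
    """Chain-infimum metric: d(x, y) = min over chains x = z0, ..., zk = y of
    sum(delta(z_i, z_{i+1})), computed as all-pairs shortest paths (Floyd-Warshall).

    `constraints` is a list of (x, y, delta) triples encoding |mu(x) - mu(y)| <= delta.
    """
    idx = {p: i for i, p in enumerate(points)}
    n = len(points)
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for x, y, delta in constraints:
        i, j = idx[x], idx[y]
        # Constraints are symmetric, so each one is an undirected edge.
        d[i][j] = d[j][i] = min(d[i][j], delta)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return {(p, q): d[idx[p]][idx[q]] for p in points for q in points}


# Hypothetical three-strategy example.
D = induced_metric(["a", "b", "c"], [("a", "b", 0.2), ("b", "c", 0.3), ("a", "c", 1.0)])
print(D[("a", "c")])  # the chain a-b-c gives 0.2 + 0.3
```

Here the direct constraint of 1.0 between a and c is looser than the chain through b, so the induced distance tightens to 0.5; in the advertising example the strategies would be ads and the bounds would come from the taxonomy.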
We refer to this problem as the Lipschitz MAB problem on SYMBOL, and we refer to the ordered triple SYMBOL as an instance of the Lipschitz MAB problem.

\xhdr{Prior work} While our work is the first to treat the Lipschitz MAB problem in general metric spaces, special cases of the problem are implicit in prior work on the continuum-armed bandit problem~CITATION, which corresponds to the space SYMBOL under the metric SYMBOL, SYMBOL, and in the experimental work on ``bandits for taxonomies''~CITATION, which corresponds to the case in which SYMBOL is a tree metric.
Before describing our results in greater detail, it is helpful to put them in context by recounting the nearly optimal bounds for the one-dimensional continuum-armed bandit problem, a problem first formulated by R.~Agrawal in 1995~CITATION and recently solved (up to logarithmic factors) by various authors~CITATION.
In the following theorem and throughout this paper, the regret of a multi-armed bandit algorithm SYMBOL running on an instance SYMBOL is defined to be the function SYMBOL which measures the difference between its expected payoff at time SYMBOL and the quantity SYMBOL.
The latter quantity is the expected payoff of always playing a strategy SYMBOL if such a strategy exists.
Theorem. For any SYMBOL, there is an algorithm SYMBOL for the Lipschitz MAB problem on SYMBOL whose regret on any instance SYMBOL satisfies SYMBOL. For any SYMBOL, there does not exist an algorithm SYMBOL for the Lipschitz MAB problem on SYMBOL which satisfies SYMBOL for every SYMBOL and every instance SYMBOL.
In fact, if the time horizon SYMBOL is known in advance, the upper bound in the theorem can be achieved by an extremely na\"{i}ve algorithm which simply uses an optimal SYMBOL-armed bandit algorithm (such as the \textsc{ucb1} algorithm~CITATION) to choose strategies from the set SYMBOL, for a suitable choice of the parameter SYMBOL.
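To make the na\"{i}ve algorithm concrete, here is a minimal Python sketch under stated assumptions: Bernoulli payoffs with mean mu(x), the mesh {1/K, 2/K, ..., 1} (the source elides the exact set), and the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)). The helper `pseudo_regret` sums the per-play gaps to a given optimum `mu_star`, a realized analogue of the regret defined above. The tuning K = ceil((T / ln T)^(1/3)) in the usage lines is a common choice for the unit interval and is an assumption here, not a value stated in this section.

```python
import math
import random


def naive_mesh_ucb1(mu, T, K, seed=0):
    """Run UCB1 on the fixed mesh {1/K, 2/K, ..., 1} (assumed) with Bernoulli(mu(x)) rewards."""
    rng = random.Random(seed)
    mesh = [(i + 1) / K for i in range(K)]
    counts, sums = [0] * K, [0.0] * K
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # initialization: play every mesh point once
        else:
            # UCB1 index: empirical mean plus sqrt(2 ln t / n_i).
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < mu(mesh[a]) else 0.0
    return mesh, counts


def pseudo_regret(mu, mu_star, mesh, counts):
    """Sum over plays of mu_star - mu(x): a realized analogue of the regret."""
    return sum(c * (mu_star - mu(x)) for x, c in zip(mesh, counts))


def mu(x):
    return 1 - abs(x - 0.7)  # hypothetical 1-Lipschitz payoff, maximum 1 at x = 0.7


T = 5000
K = math.ceil((T / math.log(T)) ** (1 / 3))  # assumed tuning; evaluates to 9 here
mesh, counts = naive_mesh_ucb1(mu, T, K)
print(K, pseudo_regret(mu, 1.0, mesh, counts))
```

Once K is chosen the mesh is never refined, however much the observed payoffs reveal about where the maximum lies.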
While the regret bound in Theorem~ is essentially optimal for the Lipschitz MAB problem in SYMBOL, it is strikingly odd that it is achieved by such a simple algorithm.
In particular, the algorithm approximates the strategy set by a fixed mesh SYMBOL and does not refine this mesh as it gains information about the location of the optimal strategy.
Moreover, the metric contains seemingly useful proximity information, but the algorithm ignores this information after choosing its initial mesh.
Is this really the best algorithm?
A closer examination of the lower bound proof raises further reasons for suspicion: it is based on a contrived, highly singular payoff function SYMBOL that alternates between being constant on some distance scales and being very steep on other (much smaller) distance scales, to create a multi-scale ``needle in haystack'' phenomenon which nearly obliterates the usefulness of the proximity information contained in the metric SYMBOL.
Can we expect algorithms to do better when the payoff function is more benign?
For the Lipschitz MAB problem on SYMBOL, the question was answered affirmatively in~CITATION for some classes of instances, with algorithms that are tuned to the specific classes.

\xhdr{Our results and techniques} In this paper we consider the Lipschitz MAB problem on arbitrary metric spaces.
We are concerned with the following two main questions motivated by the discussion above:
(i) What is the best possible bound on regret for a given metric space?
(ii) Can one take advantage of benign payoff functions?
In this paper we give a complete solution to (i), by describing for every metric space SYMBOL a family of algorithms which come arbitrarily close to achieving the best possible regret bound for SYMBOL.
We also give a satisfactory answer to (ii); our solution is arbitrarily close to optimal in terms of the zooming dimension defined below.
In fact, our algorithm for (i) is an extension of the algorithmic technique used to solve (ii).

Our main technical contribution is a new algorithm, the zooming algorithm, that combines the upper confidence bound technique used in earlier bandit algorithms such as \textsc{ucb1} with a novel adaptive refinement step that uses past history to zoom in on regions near the apparent maxima of SYMBOL and to explore a denser mesh of strategies in these regions.
This algorithm is a key ingredient in our design of an optimal bandit algorithm for every metric space SYMBOL.
Moreover, we show that the zooming algorithm can perform significantly better on benign problem instances.
That is, for every instance SYMBOL we define a parameter called the zooming dimension, and use it to bound the algorithm's performance in a way that is often significantly stronger than the corresponding per-metric bound.
Note that the zooming algorithm is self-tuning, i.e., it achieves this bound without requiring prior knowledge of the zooming dimension.
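The interplay of the confidence bounds and the adaptive refinement step can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes the strategy set [0, 1] with metric |x - y|, Bernoulli rewards, a known horizon T, a confidence radius sqrt(2 ln T / (1 + n(v))) whose constants are chosen only for illustration, and a finite grid of candidate points standing in for a covering oracle. A strategy is activated whenever some candidate lies outside every confidence ball, and each round plays the active strategy with the largest optimistic index (empirical mean plus twice the radius).

```python
import math
import random


def zooming(mu, T, grid_size=201, seed=0):
    """Simplified zooming sketch on [0, 1] with metric |x - y| and Bernoulli rewards.

    Assumptions: a finite candidate grid replaces the covering oracle, and the
    confidence radius uses a known horizon T with illustrative constants.
    """
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    plays, sums = {}, {}  # per active strategy: number of plays, total reward

    def radius(v):
        # Confidence radius shrinks as strategy v is played more often.
        return math.sqrt(2 * math.log(T) / (1 + plays[v]))

    def index(v):
        mean = sums[v] / plays[v] if plays[v] else 0.0
        return mean + 2 * radius(v)  # optimistic estimate of mu(v)

    total = 0.0
    for _ in range(T):
        # Activation rule: activate one candidate not covered by any confidence ball.
        radii = {v: radius(v) for v in plays}
        for x in grid:
            if all(abs(x - v) >= rv for v, rv in radii.items()):
                plays[x], sums[x] = 0, 0.0
                break
        # Selection rule: play the active strategy with the largest index.
        v = max(plays, key=index)
        reward = 1.0 if rng.random() < mu(v) else 0.0
        plays[v] += 1
        sums[v] += reward
        total += reward
    return total, plays


def mu(x):
    return 1 - abs(x - 0.7)  # hypothetical 1-Lipschitz payoff


total, plays = zooming(mu, T=1000)
print(len(plays), "active strategies;", total, "total reward")
```

Because confidence balls shrink only around frequently played strategies, new strategies tend to be activated near the apparent maxima, so the effective mesh is refined adaptively instead of being fixed in advance.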
To state our theorem on the per-metric optimal solution for (i), we need to sketch a few definitions which arise naturally as one tries to extend the lower bound from~CITATION to general metric spaces.
Let us say that a subset SYMBOL in a metric space SYMBOL has covering dimension SYMBOL if it can be covered by SYMBOL sets of diameter SYMBOL for all SYMBOL.
A point SYMBOL has local covering dimension SYMBOL if it has an open neighborhood of covering dimension SYMBOL.
The space SYMBOL has max-min-covering dimension SYMBOL if it has no subspace whose local covering dimension is uniformly bounded below by a number greater than SYMBOL.

In general, SYMBOL is bounded above by the covering dimension of SYMBOL.
For metric spaces which are highly homogeneous (in the sense that any two $\epsilon$-balls are isometric to one another), the two dimensions are equal, and the upper bound in the theorem can be achieved using a generalization of the na\"{i}ve algorithm described earlier.
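The covering dimension can be probed numerically on a finite sample. The minimal Python sketch below is a heuristic, not part of the paper's method: it greedily builds an r-net, whose open r-balls cover the sample and have diameter at most 2r, and reads off a dimension estimate from how the net size grows between two scales. As an example of a highly homogeneous space it uses the flat unit torus, on which any two balls of equal radius are isometric; the sample size and the two scales are arbitrary choices.

```python
import math
import random


def torus_dist(p, q):
    """Euclidean distance on the flat unit torus [0, 1)^2 (wrap-around in each coordinate)."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))


def greedy_net_size(points, dist, r):
    """Greedily pick centers at pairwise distance >= r; the open r-balls around the
    centers then cover every sample point, and each ball has diameter at most 2r."""
    centers = []
    for p in points:
        if all(dist(p, c) >= r for c in centers):
            centers.append(p)
    return len(centers)


def estimate_covering_dim(points, dist, r_big, r_small):
    """Heuristic: slope of log(cover size) against log(1/r) between two scales."""
    n_big = greedy_net_size(points, dist, r_big)
    n_small = greedy_net_size(points, dist, r_small)
    return math.log(n_small / n_big) / math.log(r_big / r_small)


rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(3000)]
print(estimate_covering_dim(sample, torus_dist, r_big=0.2, r_small=0.05))
```

On this uniform sample the estimate should land roughly at 2, the covering dimension of the torus; at coarse scales or with sparse samples it is only a rough indication.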
The difficulty in Theorem~ lies in dealing with inhomogeneities in the metric space.
It is important to treat the problem at this level of generality, because some of the most natural applications of the Lipschitz MAB problem, e.g., the web advertising problem described earlier, are based on highly inhomogeneous metric spaces. (That is, in web taxonomies, it is unreasonable to expect different categories at the same level of a topic hierarchy to have roughly the same number of descendants.)

The algorithm in Theorem~ combines the zooming algorithm described earlier with a delicate transfinite construction over closed subsets consisting of ``fat points'' whose local covering dimension exceeds a given threshold SYMBOL.
For the lower bound, we craft a new dimensionality notion, the max-min-covering dimension introduced above, which captures the inhomogeneity of a metric space, and we connect this notion with the transfinite construction that underlies the algorithm.

For ``benign'' input instances we provide a better performance guarantee for the zooming algorithm.
The lower bounds in Theorems~ and~ are based on contrived, highly singular, ``needle in haystack'' instances in which the set of near-optimal strategies is astronomically larger than the set of precisely optimal strategies.
Accordingly, we quantify the tractability of a problem instance in terms of the number of near-optimal strategies.
We define the zooming dimension of an instance SYMBOL as the smallest SYMBOL such that the following covering property holds: for every SYMBOL we require only SYMBOL sets of diameter SYMBOL to cover the set of strategies whose payoff falls short of the maximum by an amount between SYMBOL and SYMBOL.
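The covering property in this definition can likewise be probed on a one-dimensional example. The hypothetical Python sketch below assumes that the band of near-optimal strategies is {x : r < Delta(x) <= 2r}, where Delta(x) is the shortfall from the maximum, and that the covering sets have diameter r/8; these constants are assumptions standing in for the elided ones. On a fine grid of [0, 1], intervals of length r/8 are placed greedily (which is optimal in one dimension), and the growth of their number between two scales gives a heuristic estimate of the zooming dimension.

```python
import math


def greedy_interval_cover(points, length):
    """Minimum number of intervals of the given length covering sorted 1-D points."""
    count, i = 0, 0
    while i < len(points):
        start = points[i]  # leftmost uncovered point opens a new interval
        count += 1
        while i < len(points) and points[i] <= start + length:
            i += 1
    return count


def zooming_cover_count(mu, r, n=100_000):
    """Cover the band {x : r < max(mu) - mu(x) <= 2r} by intervals of length r/8 (assumed)."""
    xs = [i / n for i in range(n + 1)]  # fine grid of [0, 1]
    best = max(mu(x) for x in xs)
    band = [x for x in xs if r < best - mu(x) <= 2 * r]  # stays sorted
    return greedy_interval_cover(band, r / 8)


def estimate_zooming_dim(mu, r_big=1e-2, r_small=1e-3):
    """Heuristic slope of log(cover count) against log(1/r) between two scales."""
    big = zooming_cover_count(mu, r_big)
    small = zooming_cover_count(mu, r_small)
    return math.log(small / big) / math.log(r_big / r_small)


def tent(x):
    return 1 - abs(x - 0.5)


def smooth(x):
    return 1 - (x - 0.5) ** 2


print(estimate_zooming_dim(tent), estimate_zooming_dim(smooth))
```

For the tent payoff the band has total length about 2r, so the count stays roughly constant and the estimate is near 0; for the smooth payoff the band has length of order sqrt(r) and the estimate is near 1/2. Both are below the covering dimension 1 of the unit interval.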
The zooming dimension can be significantly smaller than the max-min-covering dimension.
Let us illustrate this point with two examples (where for simplicity the max-min-covering dimension is equal to the covering dimension).
For the first example, consider a metric space consisting of a high-dimensional part and a low-dimensional part.
For concreteness, consider a rooted tree SYMBOL with two top-level branches SYMBOL and SYMBOL which are complete infinite SYMBOL-ary trees, SYMBOL.
Assign edge weights in SYMBOL that are exponentially decreasing with distance to the root, and let SYMBOL be the resulting shortest-path metric on the leaf set SYMBOL.
If there is a unique optimal strategy that lies in the low-dimensional part SYMBOL, then the zooming dimension is bounded above by the covering dimension of SYMBOL, whereas the ``global'' covering dimension is that of SYMBOL.
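The gap between the two branches can be made concrete. The Python sketch below assumes that the edge entering depth i has weight 2^(-i), one possible choice of exponentially decreasing weights, and represents each leaf by a truncated address whose first symbol names the top-level branch. Since the distance between two leaves depends only on where their addresses first differ, truncation does not change distances between leaves whose truncated addresses differ. Leaves sharing their first m symbols form a set of diameter at most 2^(1-m), so counting distinct prefixes counts covering sets at that scale. The function names are illustrative.

```python
import itertools


def leaf_distance(u, v):
    """Shortest-path distance between two leaves given as equal-length address tuples.

    Assumes the edge entering depth i has weight 2**-i, so leaves whose addresses
    first differ at position m are at distance 2 * sum_{i > m} 2**-i = 2**(1 - m).
    """
    if u == v:
        return 0.0
    m = 0
    while u[m] == v[m]:
        m += 1
    return 2.0 ** (1 - m)


def branch_leaves(branch, arity, depth):
    """All leaf addresses of one top-level branch, truncated at the given depth."""
    return [(branch,) + tail for tail in itertools.product(range(arity), repeat=depth)]


def cover_count(leaves, m):
    """Leaves sharing their first m symbols form a set of diameter at most 2**(1 - m)."""
    return len({leaf[:m] for leaf in leaves})


low = branch_leaves("low", arity=2, depth=8)    # binary branch
high = branch_leaves("high", arity=8, depth=4)  # 8-ary branch
print(cover_count(low, 6) / cover_count(low, 5))    # ratio equals the arity
print(cover_count(high, 4) / cover_count(high, 3))  # ratio equals the arity
```

Going from m to m + 1 halves the diameter and multiplies the number of sets by the arity, so under these weights the covering dimension of a complete k-ary branch is log2 k: 1 for the binary branch and 3 for the 8-ary one.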
In the second example, let SYMBOL be a homogeneous high-dimensional metric, e.g., the Euclidean metric on the unit SYMBOL-cube, and let the payoff function be SYMBOL for some subset SYMBOL.
Then the zooming dimension is equal to the covering dimension of SYMBOL; e.g., it is SYMBOL if SYMBOL is a finite point set.

\xhdr{Discussion} In stating the theorems above, we have been imprecise about specifying the model of computation.
In particular, we have ignored the thorny issue of how to provide an algorithm with an input containing a metric space which may have an infinite number of points.
The simplest way to interpret our theorems is to ignore implementation details and interpret ``algorithm'' to mean an abstract decision rule, i.e., a (possibly randomized) function mapping a history of past observations SYMBOL to a strategy SYMBOL which is played in the current period.
All of our theorems are valid under this interpretation, but they can also be made into precise algorithmic results provided that the algorithm is given appropriate oracle access to the metric space.
In most cases, our algorithms require only a covering oracle which takes a finite collection of open balls and either declares that they cover SYMBOL or outputs an uncovered point.
We refer to this setting as the \standardMAB.
For example, the zooming algorithm uses only a covering oracle for SYMBOL, and requires only one oracle query per round (with at most SYMBOL balls in round SYMBOL).
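For the unit interval with metric |x - y|, a covering oracle of this kind is easy to implement. The sketch below is an illustration under that assumption; the function name and the (center, radius) input format are hypothetical. It sweeps from 0 to the right, always jumping to the farthest right endpoint among the open balls that contain the current point, and returns the first point no ball contains, or None if the balls cover [0, 1].

```python
def interval_covering_oracle(balls):
    """Covering oracle for [0, 1] with metric |x - y| (illustrative).

    `balls` is a list of (center, radius) pairs describing open balls
    (center - radius, center + radius). Returns None if they cover [0, 1],
    otherwise a point of [0, 1] that no ball contains.
    """
    x = 0.0  # invariant: every point of [0, x) is covered
    while True:
        # Farthest right endpoint among the open balls that contain x.
        best = max((c + r for c, r in balls if c - r < x < c + r), default=None)
        if best is None:
            return x  # no ball contains x
        if best > 1.0:
            return None  # [x, 1] lies inside one ball, so [0, 1] is covered
        x = best  # each step moves x to a larger right endpoint, so the loop ends


print(interval_covering_oracle([(0.25, 0.3), (0.75, 0.3)]))  # None
print(interval_covering_oracle([(0.2, 0.25), (0.8, 0.25)]))  # a point near 0.45
```

In a zooming-style algorithm, one plausible use is to pass the confidence balls of the active strategies and treat any returned point as a candidate for activation.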
However, the per-metric optimal algorithm in Theorem~ uses more complicated oracles, and we defer the definition of these oracles to Section~.

While our definitions and results so far have been tailored for the Lipschitz MAB problem on infinite metrics, some of them can be extended to the finite case as well.
In particular, for the zooming algorithm we obtain sharp results (that are meaningful for both finite and infinite metrics) using a more precise, non-asymptotic version of the zooming dimension.
Extending the notions in Theorem~ to the finite case is an open question.

\xhdr{Extensions} We provide a number of extensions in which we elaborate on our analysis of the zooming algorithm.
First, we provide sharper bounds for several examples in which the reward from playing each strategy SYMBOL is SYMBOL plus an independent noise of a known and ``benign'' shape.
Second, we upgrade the zooming algorithm so that it satisfies the guarantee in Theorem~ and enjoys a better guarantee if the maximal reward is exactly 1.
Third, we apply this result to a version where SYMBOL for some target set SYMBOL which is not revealed to the algorithm.
Fourth, we relax some assumptions in the analysis of the zooming algorithm, and use this generalization to analyze the version in which SYMBOL for some known function SYMBOL.
Finally, we extend our analysis from reward distributions supported on SYMBOL to those with unbounded support and finite absolute third moment.

\xhdr{Follow-up work} For metric spaces whose max-min-covering dimension is exactly 0, this paper provides an upper bound SYMBOL for any SYMBOL, but no matching lower bound.
Characterizing the optimal regret for such metric spaces remained an open question.
Following the publication of the conference version, this question has been settled in~CITATION, revealing the following dichotomy: for every metric space, the optimal regret of a Lipschitz MAB algorithm is either bounded above by any SYMBOL, or bounded below by any SYMBOL, depending on whether the completion of the metric space is compact and countable.
