
3.8 Bibliographic Remarks

3.9 Exercises

Exercise 3.1. (Lower Bound on the Regret of Deterministic Forecasters) Let $D$ be a finite non-empty set with $|D| \ge 2$ and let $\mathcal{L} = \{\ell^{(y)} : y \in D\}$ be the set of loss functions, where $\ell^{(y)}(u) = \mathbb{I}\{u = y\}$. Show that the worst-case regret of any deterministic algorithm in a Regret Minimization Game with decision set $D$, set of loss functions $\mathcal{L}$, and time horizon $n$ is at least $n(1 - 1/|D|)$.

Hint: Find a loss sequence such that $\hat{L}_n = n$ and $\min_{i \in D} L_n(i) = \lfloor n/|D| \rfloor$.
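As a sanity check of the construction in the hint, here is a minimal sketch in Python (the function and variable names are illustrative, not from the text): since the forecaster is deterministic, the environment can simulate it and match its choice in every round, charging it a unit loss, while by the pigeonhole principle the least frequently played decision suffers loss at most $\lfloor n/|D| \rfloor$.

    # Sketch of the lower-bound construction (Exercise 3.1): the environment
    # simulates the deterministic `forecaster` and always matches its choice.
    def adversarial_regret(forecaster, D, n):
        D = list(D)
        history = []                      # loss functions revealed so far
        play_counts = {d: 0 for d in D}
        for t in range(n):
            u = forecaster(history)       # deterministic, hence predictable
            loss = {d: int(d == u) for d in D}   # the loss function ell^(u)
            history.append(loss)
            play_counts[u] += 1
        forecaster_loss = n               # the forecaster is charged every round
        best_fixed = min(play_counts.values())   # <= n // len(D) by pigeonhole
        return forecaster_loss - best_fixed      # >= n - n // len(D)

For instance, against a forecaster that always plays the same decision, any other fixed decision suffers zero loss and the regret is the full $n$.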

Exercise 3.2. (Worst-Case Regret for Deterministic Environments) Prove inequality (3.1).

Exercise 3.3. (Lower Bound in the Minimax Theorem) Let $A$ and $B$ be two sets. Show that for any function $f : A \times B \to \mathbb{R}$,
$$\sup_{b \in B} \inf_{a \in A} f(a, b) \le \inf_{a \in A} \sup_{b \in B} f(a, b) .$$
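For finite $A$ and $B$ the inequality can be verified directly by tabulating $f$; a minimal numerical sketch (the random table is an arbitrary example):

    import numpy as np

    # Weak duality (Exercise 3.3): rows index a in A, columns index b in B.
    rng = np.random.default_rng(0)
    f = rng.standard_normal((5, 7))

    sup_inf = f.min(axis=0).max()   # sup_b inf_a f(a, b)
    inf_sup = f.max(axis=1).min()   # inf_a sup_b f(a, b)
    assert sup_inf <= inf_sup       # holds for every table f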

Exercise 3.4. (Maximum of a Linear Functional) Let $\Delta_d$ be the $d$-dimensional probability simplex.

(a) Show that for any vector $v \in \mathbb{R}^d$,
$$\max_{q \in \Delta_d} q^\top v = \max_{1 \le i \le d} v_i = \max_{1 \le i \le d} e_i^\top v \qquad \text{and} \qquad \min_{p \in \Delta_d} p^\top v = \min_{1 \le i \le d} v_i = \min_{1 \le i \le d} e_i^\top v .$$

(b) Let $M$ be any $a \times b$ matrix with real entries. Show that for any vector $p \in \mathbb{R}^a$,
$$\max_{q \in \Delta_b} p^\top M q = \max_{1 \le j \le b} p^\top M e_j ,$$
and that for any vector $q \in \mathbb{R}^b$,
$$\min_{p \in \Delta_a} p^\top M q = \min_{1 \le i \le a} e_i^\top M q .$$

(c) Consider any Regret Minimization Game with a finite decision set and a finite set of loss functions. Show that the worst-case expected regret, $\sup_E \mathbb{E}\big[ R_n^{A,E} \big]$, is the same regardless of whether the supremum is taken over all deterministic environments or over all environments.

(d) Let $M$ be any $a \times b$ matrix with real entries. Prove that
$$\min_{p \in \Delta_a} \max_{q \in \Delta_b} p^\top M q = \min_{p \in \Delta_a} \max_{1 \le j \le b} p^\top M e_j \qquad \text{and} \qquad \max_{q \in \Delta_b} \min_{p \in \Delta_a} p^\top M q = \max_{q \in \Delta_b} \min_{1 \le i \le a} e_i^\top M q .$$

Exercise 3.5. (A Weaker Notion of Regret) Prove inequality (3.2).

Hint: Use the fact that if $(X_p)_p$ is a family of random variables, then $\mathbb{E}[\inf_p X_p] \le \inf_p \mathbb{E}[X_p]$.
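The inequality in the hint is easy to check numerically on a finite family; a minimal sketch (the sizes are arbitrary):

    import numpy as np

    # E[inf_p X_p] <= inf_p E[X_p]: rows are draws of the family (X_p)_p.
    # The row-wise min is pointwise below every column, hence also in mean.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10_000, 4))
    assert X.min(axis=1).mean() <= X.mean(axis=0).min()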

*** Exercise 3.6. (Adaptive Environments Are Powerful) Show that adaptive environments are more powerful than oblivious environments. Specifically, specify $(D, \mathcal{L})$ such that if $V_n^{\text{oblivious}}$ is the minimax regret against oblivious environments and $V_n^{\text{adaptive}}$ is the minimax regret against adaptive environments on a horizon $n$, then $V_n^{\text{oblivious}} = o(V_n^{\text{adaptive}})$ as $n \to \infty$.


Part II

Expert Framework


Why should we care? A: It fixes the basic ideas and the basic results. A simple framework with simple tools, yet they go a long way: the algorithms have found a wide range of uses even beyond online learning.

How should this be read?


Chapter 4

Mistake Bounds for the Zero-One Loss

You wake up in the morning and you want to decide what to wear for the day. You open your browser and check the forecast. At your disposal are the forecasts of $N$ websites: these days, weather prediction websites are growing like mushrooms after the rain, and each of them seems to be an expert in predicting the weather.

Whose prediction should you trust? Should it be the national weather station, your favourite newspaper's forecast, or the forecast of an independent forecaster whom you came across a couple of weeks ago? Instead of building the best predictor in the world, you may settle on a “modest goal”: to predict almost as well as the best website, at least in the long run.

Denoting by $D$ the set of possible forecasts (e.g., $D = \{\text{'sunshine'}, \text{'rain'}, \text{'snow'}\}$), by $f_{t,i} \in D$ the forecast of the $i$th website on the morning of the $t$th day, by $p_t \in D$ the prediction for the same day, and by $y_t \in D$ the actual weather, we arrive at the Prediction With Expert Advice problem:

Prediction With Expert Advice with Zero-One Loss

In round $t = 1, 2, \ldots$:

• Receive the predictions $f_{t,1}, f_{t,2}, \ldots, f_{t,N} \in D$ of the experts.
• Predict $p_t \in D$ based on past information.
• Receive an outcome $y_t \in D$.
• Incur the loss $\mathbb{I}\{p_t \neq y_t\}$.

Note that the forecaster suffers no loss if the prediction matches the actual outcome, while in the opposite case the forecaster suffers a unit loss. The case when different forecast mistakes can receive different costs, which can even change from round to round, will be considered in the next two chapters.

In the Prediction With Expert Advice problem with zero-one loss, the goal of the forecaster is to asymptotically achieve the average performance of the best expert, that is, to minimize the regret defined as
$$R_n = \sum_{t=1}^{n} \mathbb{I}\{p_t \neq y_t\} - \min_{1 \le i \le N} \sum_{t=1}^{n} \mathbb{I}\{f_{t,i} \neq y_t\} .$$
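To fix ideas, here is a minimal simulation sketch of the protocol and of the regret above (the function names are illustrative, and the majority-vote forecaster at the end is only a placeholder, not an algorithm analyzed in this chapter):

    from collections import Counter

    # Sketch of the Prediction With Expert Advice protocol with zero-one loss.
    # expert_forecasts[t][i] is f_{t,i}, outcomes[t] is y_t; `predict` maps the
    # current expert advice and the past rounds to a prediction p_t.
    def run_protocol(expert_forecasts, outcomes, predict):
        n_experts = len(expert_forecasts[0])
        forecaster_loss = 0
        expert_losses = [0] * n_experts
        history = []
        for f_t, y_t in zip(expert_forecasts, outcomes):
            p_t = predict(f_t, history)         # based on past information only
            forecaster_loss += int(p_t != y_t)  # the loss I{p_t != y_t}
            for i in range(n_experts):
                expert_losses[i] += int(f_t[i] != y_t)
            history.append((f_t, y_t))
        return forecaster_loss - min(expert_losses)  # the regret R_n

    # A placeholder forecaster: follow the majority of the current advice.
    def majority_vote(f_t, history):
        return Counter(f_t).most_common(1)[0][0]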

This regret definition is not the same as the one we used in the definition of a Regret Minimization Game, where the comparison was to the best fixed prediction in hindsight, and, in a strict sense, the Prediction With Expert Advice problem cannot be stated as a Regret Minimization Game. However, we can redefine the problem such that the decision set is $D' = \{1, \ldots, N\}$ and the loss of a prediction $i \in D'$ in round $t$ is defined as $\ell_t(i) = \mathbb{I}\{f_{t,i} \neq y_t\}$. That is, predicting $i \in D'$ means that the forecaster mimics the decision of expert $i$. Notice that this is a restricted setting compared to our original one: in the Prediction With Expert Advice problem the forecaster is allowed to select some $p_t \in D$ that was not selected by any of the experts, while such a choice is impossible in the new setup. However, in a worst-case sense no forecaster can gain anything by predicting a $p_t \notin \{f_{t,1}, \ldots, f_{t,N}\}$, since the environment can always select $y_t \in \{f_{t,1}, \ldots, f_{t,N}\}$, maximizing the forecaster's loss in the given round while not hurting the experts' performance compared to a choice of $y_t \notin \{f_{t,1}, \ldots, f_{t,N}\}$. Thus, in a worst-case sense, the Prediction With Expert Advice problem can be equivalently reformulated as a Regret Minimization Game with decision set $D'$, loss function set $\mathcal{L} = \{\ell_t : D' \to \{0, 1\}\}$, and regret

$$R_n = \sum_{t=1}^{n} \ell_t(p_t) - \min_{1 \le i \le N} \sum_{t=1}^{n} \ell_t(i) .$$
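In code, the reduction is just a change of representation: each round's expert advice and outcome induce a loss vector over $D' = \{1, \ldots, N\}$. A minimal sketch:

    # Reduction to a Regret Minimization Game: predicting i in D' means
    # mimicking expert i, whose loss in round t is ell_t(i) = I{f_{t,i} != y_t}.
    def induced_loss_vector(f_t, y_t):
        return [int(f != y_t) for f in f_t]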

The problem defined this way is a strict generalization of the matching pennies problem discussed in Chapter 3 (see also Exercise 3.1). There we showed that no deterministic algorithm can achieve sublinear worst-case regret if no additional assumptions are made.

One way to overcome this limitation is to assume that there is an expert who makes no mistakes during the game. Note that this implies that the regret of the forecaster is the same as its cumulative loss. In Section 4.1, we give a deterministic algorithm that achieves minimax optimal regret under this assumption. In Section 4.2, we give a deterministic algorithm and a similar randomized algorithm, and bound the number of mistakes they make. A deterministic algorithm with sublinear regret is given in Chapter 5 under some assumptions on the loss functions and the prediction set, while randomized algorithms with sublinear regret are given in Chapter 6.
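While the algorithm of Section 4.1 is not restated here, a classical deterministic strategy under this "perfect expert" assumption is the halving (majority vote over consistent experts) rule: every mistake eliminates at least half of the experts that have made no error so far, so at most $\log_2 N$ mistakes are made. A minimal sketch, assuming the inputs are formatted as in the simulation above:

    from collections import Counter

    # Halving algorithm sketch: assumes at least one expert never errs,
    # so the set of consistent experts is never empty.
    def halving(expert_forecasts, outcomes):
        n_experts = len(expert_forecasts[0])
        alive = set(range(n_experts))          # experts with no mistake so far
        mistakes = 0
        for f_t, y_t in zip(expert_forecasts, outcomes):
            votes = Counter(f_t[i] for i in alive)
            p_t = votes.most_common(1)[0][0]   # majority vote of alive experts
            mistakes += int(p_t != y_t)        # a mistake halves `alive`
            alive = {i for i in alive if f_t[i] == y_t}
        return mistakes                        # at most log2(n_experts)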