
happens, $c_t$ is slightly smaller than $1/t$: $c_t = 1/(t + \ln n - \ln(t + O(\ln n)))$ (note that $c_t$ depends on the horizon). The regret of this forecaster can be bounded by $\ln n - \ln\ln n + O(\ln\ln n/\ln n)$ (cf. Exercise 2.14; we have rescaled their bound to match the scaling that we use in this chapter). As noted by Takimoto and Warmuth (2000), the term $-\ln\ln n$ in the minimax regret is curious. We guess that the constant factor difference of $1/2$ between this bound and our lower bound in Theorem 2.3 comes from the fact that we consider the simultaneous-move version of the game.

The environment's minimax strategy is also interesting. The minimax-optimal environment plays $y_t = -\Delta_t/\|\Delta_t\|$, where $\Delta_t = p_t - \tilde{p}_t$ is the difference between the forecaster's actual prediction $p_t$ and $\tilde{p}_t = c_t \sum_{s=1}^{t-1} y_s$, and where $0/\|0\|$ is defined to be $(1, 0, \ldots, 0)^\top$.
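
As a concrete illustration, here is a minimal sketch of this environment strategy in Python/NumPy; the function name and the representation of the history as a list of past plays are our own choices, and $c_t$ is assumed to be supplied from the outside (e.g., from the recursion of Exercise 2.14).

\begin{verbatim}
import numpy as np

def minimax_environment_play(p_t, past_ys, c_t):
    """Minimax-optimal environment move: y_t = -Delta_t / ||Delta_t||, where
    Delta_t = p_t - c_t * (y_1 + ... + y_{t-1}) and 0/||0|| := (1, 0, ..., 0)."""
    d = p_t.shape[0]
    p_tilde = c_t * np.sum(past_ys, axis=0) if len(past_ys) > 0 else np.zeros(d)
    delta = p_t - p_tilde
    norm = np.linalg.norm(delta)
    if norm == 0.0:                     # convention: 0/||0|| = (1, 0, ..., 0)
        e1 = np.zeros(d)
        e1[0] = 1.0
        return e1
    return -delta / norm
\end{verbatim}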

Online density estimation for the case of full multivariate Gaussians is considered by Dasgupta and Hsu (2007). For fixed covariance matrices, the online log-likelihood maximization problem is more or less equivalent to the Shooting Game. When the covariance is also to be learned, the problem is ill-defined. This is easiest to see in the one-dimensional case: if the variance to be learned is not bounded away from zero, the losses can be unbounded.

2.5 Exercises

Exercise 2.1. (Lower Bound on the Cumulated Loss)

(a) Prove that for any point $p_t \in \mathbb{R}^d$ there exists a point $y_t \in \mathbb{R}^d$ such that $\|y_t\| \le 1$ and $\|p_t - y_t\| \ge 1$.

(b) Conclude that for any deterministic prediction strategy for the Shooting Game there exists a sequence $y_1, y_2, \ldots, y_n$ such that the total loss $\sum_{t=1}^n \|p_t - y_t\|^2$ after $n$ rounds is at least $n$.

(c) This leaves open the question of whether there exist randomized strategies that achieve a smaller loss. Extending the previous result, show that there exists a constant $c > 0$ such that no matter what prediction strategy is used (randomized or not), the total expected loss after $n$ rounds is at least $cn$. Argue that there exists a simple strategy for the forecaster that achieves a constant multiple of the payoff of the minimax forecaster in the cumulated loss-minimization game. Thus, similarly to what we have seen in the previous chapter, minimax strategies for cumulated loss minimization are uninteresting.

Exercise 2.2. (Steiner's Lemma) Let $y_1, y_2, \ldots, y_n$ be points in $\mathbb{R}^d$. Prove that
\[
\operatorname*{argmin}_{u \in \mathbb{R}^d} \sum_{t=1}^n \|u - y_t\|^2 = \frac{1}{n} \sum_{t=1}^n y_t \,.
\]
Sometimes, this simple result is called Steiner's lemma.

Hint: Calculate the gradient of $L_n(u) = \sum_{t=1}^n \|u - y_t\|^2$ and find the point at which it is zero.
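
For orientation, here is a sketch of the calculation the hint suggests (this worked step is ours, not part of the original exercise text): the gradient is
\[
\nabla L_n(u) = \sum_{t=1}^n 2(u - y_t) = 2\Big(n u - \sum_{t=1}^n y_t\Big),
\]
which vanishes exactly at $u = \frac{1}{n}\sum_{t=1}^n y_t$; since $L_n$ is a strictly convex quadratic, this stationary point is its unique minimizer.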


Exercise 2.3. (Harmonic Numbers) The sum $\sum_{t=1}^n \frac{1}{t}$ is called the $n$-th harmonic number. It is useful to know upper and lower bounds for it.

(a) Prove that
\[
\sum_{t=1}^n \frac{1}{t} \le 1 + \int_1^n \frac{1}{t}\, dt = 1 + \ln n \,.
\]
With a much bigger effort, it is possible to prove the marginally stronger inequality
\[
\sum_{t=1}^n \frac{1}{t} < \gamma + \ln(n+1) \,,
\]
where $\gamma \approx 0.5772156649\ldots$ is the Euler-Mascheroni constant.

(b) Prove that
\[
\sum_{t=1}^n \frac{1}{t} > \int_1^{n+1} \frac{1}{t}\, dt = \ln(n+1) \,.
\]
Hint: Draw a picture.
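
As a quick numerical sanity check of these bounds (not part of the exercise; the snippet and its variable names are ours):

\begin{verbatim}
import numpy as np

EULER_GAMMA = 0.5772156649

for n in [1, 10, 100, 10**4, 10**6]:
    H_n = np.sum(1.0 / np.arange(1, n + 1))
    print(f"n={n:>7}  ln(n+1)={np.log(n + 1):.4f}  H_n={H_n:.4f}  "
          f"1+ln(n)={1 + np.log(n):.4f}  "
          f"gamma+ln(n+1)={EULER_GAMMA + np.log(n + 1):.4f}")
\end{verbatim}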

Exercise 2.4. (Lower Bounds for the Regret of Arbitrary Deterministic Algorithms: Part I)

(a) Consider an arbitrary deterministic online algorithm $A$ for the Shooting Game making predictions in the unit ball of $\mathbb{R}^d$. Show that for any $n \ge 1$ there exists a sequence $y_1, y_2, \ldots, y_n$ such that $A$ has non-positive regret.

(b) Consider an arbitrary deterministic online algorithm $A$ for the Shooting Game. Show that for any $n \ge 1$ there exists a sequence $y_1, y_2, \ldots, y_n$ in the unit ball of $\mathbb{R}^d$ such that $A$ has non-negative regret.

Hint: Since $A$ is deterministic, its prediction at time step $t$ can be computed beforehand, and so $y_t$ can be selected to achieve the desired goal.

Exercise 2.5. (Lower Bounds for the Regret of FTL) Consider the Follow The Leader Algorithm for the Shooting Game.

(a) Prove that for any sequence $y_1, y_2, \ldots$ and any number of rounds $n$, the regret of the algorithm is non-negative.

(b) According to Theorem 2.2, the regret of FTL on any sequence $y_1, y_2, \ldots, y_n$ in the unit ball of $\mathbb{R}^d$ is $O(\log n)$. Show that this upper bound is tight. More precisely, for any $n \ge 1$, construct a sequence $y_1, y_2, \ldots, y_n$ in the unit ball such that FTL has $\Omega(\log n)$ regret on this sequence.

Hint: Consider the sequence $y_t = (-1)^t v$, where $v$ is an arbitrary unit vector. First solve the case when $n$ is even. The inequality from Exercise 2.3(b) might be useful.
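
The following is a minimal simulation sketch of FTL on the alternating sequence from the hint (our own code; we take $p_1 = 0$ and use Steiner's lemma from Exercise 2.2 to compute the loss of the best fixed point):

\begin{verbatim}
import numpy as np

def ftl_regret(ys):
    """Regret of Follow The Leader on the Shooting Game for the sequence ys
    (an n x d array). FTL predicts the mean of the past points; p_1 = 0."""
    n, d = ys.shape
    cum, loss = np.zeros(d), 0.0
    for t in range(n):
        p_t = cum / t if t > 0 else np.zeros(d)
        loss += np.sum((p_t - ys[t]) ** 2)
        cum += ys[t]
    best = np.sum((ys - ys.mean(axis=0)) ** 2)   # best fixed point = the mean
    return loss - best

v = np.array([1.0])                              # an arbitrary unit vector (d = 1)
for n in [10, 100, 1000, 10000]:
    ys = np.array([(-1.0) ** t * v for t in range(1, n + 1)])
    print(n, ftl_regret(ys), np.log(n))          # regret grows like log n
\end{verbatim}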


Exercise 2.6. (Lower Bound for FTL on i.i.d. Sequences) Consider the proof of Theorem 2.3. Finish the calculations by assuming that $Y_1, Y_2, \ldots, Y_n$ is an i.i.d. zero-mean sequence. Compare the result with the lower bound of Theorem 2.3 and explain in intuitive terms what happened.

Exercise 2.7. (Upper Bound for FTL on i.i.d. Sequences) Derive an upper bound on the expected regret of FTL on i.i.d. sequences. Express the bound in terms of $a = \mathbb{E}[\|Y_t\|^2]$ and $\mu = \mathbb{E}[Y_t]$.

** Exercise 2.8. (i.i.d. Sequences: Part III) In Exercise 2.6 you were asked to lower bound the regret of FTL on i.i.d. sequences. In Exercise 2.7, you were asked to derive an upper bound on the regret of FTL on i.i.d. sequences. If you did everything correctly, the lower and upper bounds will not match up. Close the gap between them (up to a constant factor). Do you think that we need correlated stochastic environments to force all algorithms to suffer $\Omega(\log n)$ expected regret on the Shooting Game?

Exercise 2.9. Prove Eq. (2.3).

Exercise 2.10. Prove Eq. (2.4).

Hint: Introduce $c_{t,s} = \mathbb{E}[\langle Y_t, Y_s\rangle]$, $1 \le s, t \le n$, and build a recursion. For building the recursion, consider $s < t$ and $s = t$ separately. Then draw a table and recognize the pattern; you can calculate the diagonals of $(c_{t,s})$ first, then the first column, then the second, third, etc.

Exercise 2.11. (Shooting at a Box) Consider the Shooting Game when the environment's choices are restricted to lie inside the box $[-1,1]^d$ instead of the unit ball. Modify the proofs of the upper and lower bounds (Theorems 2.2 and 2.3). What changes in the bounds? Are the bounds still independent of the dimension $d$? Explain (intuitively) why.

Exercise 2.12. (Shooting at a Convex Set) As a further generalization of Exercise 2.11, consider any non-empty closed convex subset $K$ of the $d$-dimensional Euclidean space and assume that the environment's choices are restricted to lie inside $K$. Answer the same questions as in Exercise 2.11.

** Exercise 2.13. (Shrinkage Estimator) Consider $p_t = \frac{1}{t-1+a}(y_1 + \cdots + y_{t-1})$. Prove that for $a = 1$ the regret of this algorithm is at most $1 + \ln(n)$. Extend the result to an arbitrary value of $a > 0$. How would you choose $a$ given $n$?

Hint: Follow Azoury and Warmuth (2001). Alternatively, for $y_t \in \mathbb{R}$ (the one-dimensional version), look at the proof of Theorem 19.6 in Chapter 19, where you could choose $x_t = 1$ (using the notation of that chapter).

Exercise 2.14. (Nonstationary Shrinkage Estimator) (Takimoto and Warmuth (2000)) Consider the forecaster strategy $p_t = c_t \sum_{s=1}^{t-1} y_s$, where $c_t$ is defined by the recursion $c_n = 1/n$, $c_{t-1} = c_t + c_t^2$ ($1 \le t \le n$). Show that the regret of this forecaster is at most $\sum_{t=1}^n c_t \|y_t\|^2$ and thus it can be bounded by $\ln n - \ln\ln n + O(\ln\ln n/\ln n)$. Explain intuitively how the original authors may have come up with the sequence $(c_t)$.
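
A small numerical sketch of this recursion may help with the last question (the code and its layout are ours; it compares $c_t$ with $1/t$ and $\sum_t c_t$ with the stated bound):

\begin{verbatim}
import numpy as np

def shrinkage_coefficients(n):
    """Backward recursion c_n = 1/n, c_{t-1} = c_t + c_t^2; returns c[1..n]."""
    c = np.zeros(n + 1)          # index 0 unused
    c[n] = 1.0 / n
    for t in range(n, 1, -1):
        c[t - 1] = c[t] + c[t] ** 2
    return c

n = 1000
c = shrinkage_coefficients(n)
for t in [1, 10, 100, 1000]:
    print(t, c[t], 1.0 / t)      # c_t is slightly smaller than 1/t
print("sum of c_t:", c[1:].sum(),
      " ln n - ln ln n:", np.log(n) - np.log(np.log(n)))
\end{verbatim}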


Exercise 2.15. (Simulate the Minimax Players) Write code that simulates the minimax players of Takimoto and Warmuth (2000) against a number of different forecaster and environment strategies in the alternating-move version of the game (cf. Section 2.4 for a description of the players and this version of the game). Run simulations, produce graphs summarizing the results, and observe the behavior of the players. What happens when the minimax players play against each other?
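
A possible starting point for such a simulation is sketched below; the function names and the two example players are our own, and the minimax players of Section 2.4 can be plugged in through the same interface.

\begin{verbatim}
import numpy as np

def play_game(n, forecaster, environment):
    """Alternating-move Shooting Game: in round t the forecaster announces p_t,
    then the environment, seeing p_t, responds with y_t. Returns the regret."""
    ys, loss = [], 0.0
    for t in range(1, n + 1):
        p_t = forecaster(t, ys)
        y_t = environment(t, ys, p_t)
        loss += np.sum((p_t - y_t) ** 2)
        ys.append(y_t)
    ys = np.array(ys)
    best = np.sum((ys - ys.mean(axis=0)) ** 2)   # best fixed point in hindsight
    return loss - best

def ftl(t, ys, d=2):
    # Follow The Leader: predict the mean of the past points (p_1 = 0).
    return np.mean(ys, axis=0) if ys else np.zeros(d)

def random_sphere_env(t, ys, p_t, rng=np.random.default_rng(0), d=2):
    # Example environment: a uniformly random point on the unit sphere.
    v = rng.normal(size=d)
    return v / np.linalg.norm(v)

print(play_game(1000, ftl, random_sphere_env))
\end{verbatim}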

Exercise 2.16. (Lower Bounds for the Regret of Arbitrary Deterministic Algorithms: Part II) (Takimoto and Warmuth (2000)) Show that the regret of any deterministic forecaster is lower bounded by $\ln n - \ln\ln n + O(\ln\ln n/\ln n)$.

Hint: Consider the minimax environment described in Section 2.4.

Exercise 2.17. (Gaussian Density Estimation) A $d$-dimensional normal distribution is specified by its mean $\mu \in \mathbb{R}^d$ and a positive definite covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Its density function is
\[
f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{d/2}\det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right).
\]

(a) Let $I$ be the $d \times d$ identity matrix. Show that the negative log-likelihood of $f_{\mu,I}$ given observation $x$ is
\[
-\ln(f_{\mu,I}(x)) = \frac{1}{2}\|x-\mu\|^2 + \frac{d}{2}\ln(2\pi)\,.
\]

(b) Consider the density estimation problem from Section 1.2.7 with the class of $d$-dimensional Gaussian densities $\mathcal{D} = \{f_{\mu,I} : \|\mu\| \le \sqrt{2}\}$. Show that the problem is equivalent to the Shooting Game in the sense that for any algorithm for one of the two problems there exists an algorithm with the same worst-case regret for the other problem.
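
A quick numerical check of the identity in part (a) (our own snippet; it assumes SciPy is available and uses scipy.stats.multivariate_normal as an independent reference):

\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal

d = 3
rng = np.random.default_rng(0)
mu, x = rng.normal(size=d), rng.normal(size=d)

nll_formula = 0.5 * np.sum((x - mu) ** 2) + 0.5 * d * np.log(2 * np.pi)
nll_scipy = -multivariate_normal.logpdf(x, mean=mu, cov=np.eye(d))
print(nll_formula, nll_scipy)    # the two values agree
\end{verbatim}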

Exercise 2.18. (Shooting with a Distorted Distance Measure) Fix a positive definite matrix $Q \in \mathbb{R}^{d \times d}$. Assume that the accuracy of prediction is measured using the squared weighted norm $\|x\|_Q^2 = \langle x, Qx\rangle$, while the environment is restricted to choose its points from a non-empty closed convex set $K \subset \mathbb{R}^d$. Modify the proofs of the upper and lower bounds in Theorems 2.2 and 2.3. What changes in the bounds? Explain why.

** Exercise 2.19. (Minimax Players for the Shooting Game) Construct the minimax players for the version of the Shooting Game introduced in this chapter (i.e., when the forecaster and the environment choose their moves simultaneously in every round). Derive the minimax regret.

Hint: Use backward induction to figure out the optimal action-value function $V_{n,t} : M_1(\mathbb{R}^d) \times H_t \to \mathbb{R}$, where $M_1(\mathbb{R}^d)$ is the space of probability distributions over $\mathbb{R}^d$,
\[
H_t = \left\{ (p_1, y_1, \ldots, p_{t-1}, y_{t-1}) \,:\, p_s \in \mathbb{R}^d,\ y_s \in B,\ 1 \le s \le t-1 \right\}
\]
is the set of possible histories at the beginning of round $t$, and $V_{n,t}(\mu, (p_1, y_1, \ldots, p_{t-1}, y_{t-1}))$ is defined to be the expected regret of the forecaster for the remaining $n - t$ rounds provided that the players played $(p_1, y_1, \ldots, p_{t-1}, y_{t-1})$ in the first $t-1$ rounds, the forecaster's choice for round $t$ is $P_t \sim \mu$, and in the remaining rounds both the forecaster and the environment play optimally. The optimal decision in round $t$ is then
\[
\mu_t = \operatorname*{arg\,min}_{\mu \in M_1(\mathbb{R}^d)} V_{n,t}(\mu, (p_1, y_1, \ldots, p_{t-1}, y_{t-1})) \,.
\]


Chapter 3

Full Information Regret Minimization Games

We informally defined the rules of a Regret Minimization Game in Chapter 1. In this chapter, we define them formally. In order to do that, we will need to define what a strategy is, both for the forecaster and for the environment. A pair of strategies, one for each player, specifies what happens in every round of the game. This allows us to define the notion of worst-case regret of a forecaster and the minimax regret of a Regret Minimization Game. We will consider the so-called full information setting, in which, at the end of each round, the forecaster can observe the loss function selected by the environment, and the environment can observe the prediction of the forecaster.

We allow both players to randomize. This will be important in Chapter 6, where randomization will allow us to construct a forecaster with sublinear regret for the Prediction With Expert Advice problem. Allowing environments to randomize will let us make connections to statistical learning theory in Chapter 23. Random environments are also useful for proving lower bounds on the regret, even for problems where we care about deterministic environments, as was already demonstrated in Chapter 2.

In this chapter, we will also investigate the relationship between various restricted strategy sets. We define deterministic strategies and look at their relation to randomized strategies. We also define the class of oblivious environments, which do not react to the forecaster's predictions. Oblivious environments are seemingly weaker than adaptive environments, which do react to the forecaster's predictions. However, as it turns out, the restriction to oblivious strategies does not affect the minimax regret of a Regret Minimization Game, a property that certainly does not hold in other classes of games.

For simplicity, the statements in this chapter are proven for finite action and loss spaces. This is done merely to avoid certain technical issues (e.g., measurability and topological questions). The results extend to the case when the action and loss spaces are not too big, i.e., when the action space is a compact, convex subset of a Euclidean space and the losses can be described by finite-dimensional parameter vectors, also belonging to a compact subset of a Euclidean space.
