

In document Online learning: Algorithms for Big Data (pages 170-180)

DRAFT


12.11 Exercises

Exercise 12.1. (Basics of Convex Sets) Prove the following: The full space E is convex, as is the empty set, ∅. The intersection of any number of convex sets is convex, but the union of convex sets is not necessarily convex.

Exercise 12.2. (Perspective Map) The perspective map P : R^{d+1} → R^d with domain dom(P) = R^d × (0, +∞) is defined by P(x, t) = x/t. Show that if C ⊂ dom(P) is convex then P(C) := {P(z) : z ∈ C} is also convex. Further, if C ⊂ R^d is convex then the inverse image of C under P, i.e., P^{-1}(C) := {z ∈ dom(P) : P(z) ∈ C}, is also convex. The name comes from that one can view P as the action of a pinhole camera, where the scene pictured is dom(P) and x_{d+1} is the distance of the point x = (x_1, …, x_{d+1}) from the camera plane. The image plane of the camera is the set of points whose distance from the camera plane is −1. The pinhole that lets the light pass through the camera plane is at the origin.

Exercise 12.3. Let f : E → R be convex. Show that if there exists a point x in the domain of f where f(x) = −∞, then f ≡ −∞ on int dom(f).

Exercise 12.4. (Continuity of Convex Functions) Let f :E →(−∞,+∞] be convex.

Show that int dom(f) = cont(f). Note that the proof strongly exploits that E is finite dimensional.

Exercise 12.5. (Continuity vs. Lower Semicontinuity) Show that any proper, convex, l.s.c. function over the real line is continuous on the closure of its domain, i.e., for any (xn) ⊂ dom f, xn → x implies that f(xn) → f(x). However, the same no longer holds on the two-dimensional plane, i.e., when E = R².

Hint: For the second part, consider the function f(x, y) = (y²/x) I{x > 0} + ι_A(x, y), where A = {(x, y) : x > 0 or (x, y) = (0, 0)}, and show that f has the required properties, yet it is not continuous at (0, 0) relative to its domain.
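The hint's counterexample can be probed numerically. Below is a small sketch (assuming the standard form of this example, f(x, y) = y²/x for x > 0, f(0, 0) = 0, and +∞ elsewhere): along the parabola x = y² the function values stay at 1 even though the points converge to (0, 0), where f vanishes.

```python
# Sketch of the hint's counterexample (standard form assumed):
# f(x, y) = y**2 / x for x > 0, f(0, 0) = 0, +infinity elsewhere.
# It is proper, convex and l.s.c., yet along x = y**2 -> (0, 0)
# the values stay near 1 while f(0, 0) = 0.
def f(x, y):
    if x > 0:
        return y * y / x
    if x == 0 and y == 0:
        return 0.0
    return float("inf")

vals = [f(1.0 / n ** 2, 1.0 / n) for n in range(1, 101)]
print(max(abs(v - 1.0) for v in vals), f(0.0, 0.0))
```

So f(xn) → 1 ≠ 0 = f(0, 0) even though xn → (0, 0) within dom f.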

Exercise 12.6. (Lower Semicontinuity I.) Let f : E → R be l.s.c. and convex. Show the following:


1. If −∞ ∈ ran(f) then ran(f) ⊂ {−∞, +∞}.

2. If f is proper then R ∩ ran(f) is convex. Give a function with the said properties for which ran(f) itself is not convex.

3. Give an example of a convex function g : E → R such that ran(g) = {−∞, 0, +∞}.

Exercise 12.7. (Lower Semicontinuity II.) Let f1, f2 : E → R be l.s.c. Show that f1 + f2 : E → R is also l.s.c. Further, if fα : E → R is l.s.c. for each α ∈ A, where A ≠ ∅, then sup_{α∈A} fα : E → R is also l.s.c.

Exercise 12.8. (Continuity of Convex Functions II) Let f : E → (−∞,+∞] be convex. True or false? ri(dom(f)) = cont(f). Prove your claim.

Exercise 12.9. (Equivalent Definitions of Convexity) Prove Proposition 12.3.

Exercise 12.10. (Construction of Convex Functions) Prove the following statements:

(a) The sum of finitely many convex functions is also convex.

(b) If f : E → R, g : F → R are convex and V = E ⊕ F is the direct sum of E and F (i.e., V = {(x, y) : x ∈ E, y ∈ F}, with componentwise addition of vectors, and likewise multiplication by reals), then the direct sum map f ⊕ g : (x, y) ↦ f(x) + g(y) defined on V is convex. A function on a direct sum space that is itself the direct sum of some functions is also called separable.

(c) The supremum of any number of convex functions is also convex: x ↦ sup_{i∈I} fi(x) is convex, where I is any index set and, for each i ∈ I, fi : E → R is convex.

(d) Let E, F be Euclidean spaces, α : E → F affine, f :F → (−∞,+∞]. Then f ◦α (so that (f ◦α)(x) =f(α(x))) is convex if f is convex.

(e) If f : E → (−∞, ∞] is convex and m : (−∞, ∞] → (−∞, +∞] is monotone increasing and convex, then m ∘ f is convex. In particular, if f is convex and α ≥ 0, then z ↦ αf(z) is also convex.

(f) The function (x, t) ↦ t g(x/t), defined for t > 0, is convex if and only if g is a convex function. The function (x, t) ↦ t g(x/t) is called the "perspective" of the function g.

(g) Show that (x, t) ↦ ‖x‖²/t is convex on E × (0, +∞).

(h) Let f : R → (−∞,+∞] and g : E → R be convex. Let C = conv(R ∩ ran g), and suppose that C ⊂ dom(f) and that f is increasing on C (i.e., f(t) ≤ f(s) whenever t ≤ s both belong to C). Extend f to f̃ : (−∞,+∞] → (−∞,+∞] by setting f̃(+∞) = +∞. Show that f̃ ∘ g is convex.

(i) Formulate and prove a statement analogous to that in Part (h) for the case when g is concave and f is decreasing.


Hint: For Part (f), show that the epigraph of (x, t) ↦ t g(x/t) is the inverse image of epi(g) under the perspective-type mapping that sends (x, t, s) ∈ E × (0, ∞) × R to (x, s)/t. Then use the result of Exercise 12.2.

For Part (g), use Part (f).
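Part (g) lends itself to a quick randomized midpoint test (a numerical sanity check under sampled points, not a proof): the convexity inequality should hold for every sampled pair and mixing weight.

```python
import random

# Randomized convexity check for h(x, t) = ||x||_2**2 / t on R^3 x (0, inf).
random.seed(0)

def h(x, t):
    return sum(xi * xi for xi in x) / t

ok = True
for _ in range(10000):
    x1 = [random.uniform(-5, 5) for _ in range(3)]
    x2 = [random.uniform(-5, 5) for _ in range(3)]
    t1, t2 = random.uniform(0.1, 5.0), random.uniform(0.1, 5.0)
    lam = random.random()
    xm = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    tm = lam * t1 + (1 - lam) * t2
    # h(lam*z1 + (1-lam)*z2) <= lam*h(z1) + (1-lam)*h(z2) up to rounding
    ok = ok and h(xm, tm) <= lam * h(x1, t1) + (1 - lam) * h(x2, t2) + 1e-9
print(ok)  # True
```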

Exercise 12.11. (Minimum of Convex Functions) Show that the following functions are convex:

1. x ↦ inf_{y∈C} f(x, y), where f : E × F → R is convex and C ⊂ F is convex, provided that for some x ∈ E, inf_{y∈C} f(x, y) > −∞.

2. x ↦ xᵀ(A − B C† Bᵀ)x, x ∈ Rᵈ, where A, C are symmetric matrices and

M = [ A  B ; Bᵀ  C ] ⪰ 0.

Here, C† denotes the pseudo-inverse of the matrix C. When C is invertible, C† = C⁻¹ and A − B C⁻¹ Bᵀ is called the Schur complement of C in M. The result shows that the (generalized) Schur complement of C is positive semidefinite whenever M is positive semidefinite.

3. x ↦ dist‖·‖(x, C), x ∈ E, where C ⊂ E is convex. Here dist‖·‖(x, C) = inf_{u∈C} ‖u − x‖ is the distance of x to C.

4. x ↦ inf{f(x, z) : z ∈ C(x)}, where f : E × F → R is convex, for each x ∈ E, C(x) ⊂ F, and the set {(x, z) : z ∈ C(x)} is convex. If x is viewed as a parameter in the family of minimization problems min_z f(x, z) s.t. z ∈ C(x), then the result says that the optimal value is convex as a function of x.
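The Schur-complement claim in Part 2 can be sanity-checked numerically (a sketch with arbitrarily chosen sizes, not a proof): for random PSD block matrices, the generalized Schur complement should never have a significantly negative eigenvalue.

```python
import numpy as np

# Check: if M = [[A, B], [B^T, C]] is PSD, then A - B C^+ B^T is PSD,
# where C^+ is the Moore-Penrose pseudo-inverse of C.
rng = np.random.default_rng(0)
min_eigs = []
for _ in range(200):
    d, k = 4, 3
    R = rng.standard_normal((d + k, d + k - 1))
    M = R @ R.T                          # random PSD (rank-deficient) matrix
    A, B, C = M[:d, :d], M[:d, d:], M[d:, d:]
    S = A - B @ np.linalg.pinv(C) @ B.T  # generalized Schur complement
    min_eigs.append(np.linalg.eigvalsh(S).min())
print(min(min_eigs) >= -1e-8)  # True: all Schur complements are PSD
```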

Exercise 12.12. (Functions that Are Convex) Show that the following are convex functions:

(a) x ↦ max(⟨α1, x⟩ + c1, …, ⟨αk, x⟩ + ck), x ∈ E.

(b) x ↦ Σ_{i=1}^r x_[i], where x ∈ E = Rᵈ, r ≤ d, and for x = (x1, …, xd), (x_[1], …, x_[d]) denotes the components of x sorted in nonincreasing order.

(c) x ↦ Σ_{i=1}^r wi x_[i], where x ∈ E and w1 ≥ w2 ≥ … ≥ wr ≥ 0.

(d) x ↦ sup_{y∈C} ⟨x, y⟩, where C ⊂ E.

(e) λ ↦ −inf_{f∈F} (L(f) + λg(f)), λ ∈ R, where L, g : F → R (i.e., λ ↦ inf_{f∈F} (L(f) + λg(f)) is concave).

(f) X ↦ λmax(X) (= ‖X‖2), where X is a symmetric matrix.
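Part (f) is easy to probe numerically with a randomized midpoint test (a sanity check under random symmetric matrices, not a proof):

```python
import numpy as np

# Randomized midpoint test: X -> lambda_max(X) is convex on symmetric matrices.
rng = np.random.default_rng(1)

def lmax(X):
    return np.linalg.eigvalsh(X).max()

ok = True
for _ in range(1000):
    A = rng.standard_normal((4, 4)); A = A + A.T
    B = rng.standard_normal((4, 4)); B = B + B.T
    lam = float(rng.random())
    lhs = lmax(lam * A + (1 - lam) * B)
    ok = ok and lhs <= lam * lmax(A) + (1 - lam) * lmax(B) + 1e-9
print(ok)  # True
```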

Exercise 12.13. (Recognizing Convex Functions) Prove the following statements:

(a) A function f :E →R is convex if t 7→f(x+tv) (t∈R) is convex for any x, v ∈E.


(b) Suppose that f : E → R is differentiable (i.e., dom(f) is open and the gradient ∇f of f exists at each point of dom(f)). Then f is convex if and only if dom(f) is convex and f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ holds for all x, y ∈ dom(f).

(c) Suppose that f : E → R is differentiable. Then f is strictly convex if and only if dom(f) is convex and f(y) > f(x) + ⟨∇f(x), y − x⟩ holds for all x, y ∈ dom(f) with x ≠ y.

(d) Suppose that f : E → R is twice differentiable (i.e., ∇²f, the Hessian of f, exists at each point of dom(f)). Then f is convex if and only if dom(f) is convex and ∇²f(x) ⪰ 0 for all x ∈ dom(f).

(e) Show that the requirement that dom(f) is convex cannot be dropped from the above statements.

(f) Show that the following functions are convex: x ↦ e^{ax}, a ∈ R, x ∈ R; x ↦ xᵃ, a ≥ 1 or a ≤ 0, x > 0; x ↦ −xᵃ, 0 ≤ a ≤ 1, x > 0; x ↦ |x|ᵖ, p ≥ 1, x ∈ R; x ↦ log(1/x), x > 0; x ↦ x log x, x ≥ 0 (by convention, 0 · log 0 = 0); x ↦ ‖x‖, where ‖·‖ is any norm on Rᵈ, x ∈ Rᵈ; x ↦ max(x1, …, xd), x = (x1, …, xd) ∈ Rᵈ; x ↦ ln(e^{x1} + … + e^{xd}), x ∈ Rᵈ; x ↦ −(x1 ⋯ xd)^{1/d}, x ∈ (0, ∞)ᵈ; X ↦ −log det X, X positive definite.
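A few of the one-dimensional functions in Part (f) can be spot-checked with a randomized midpoint test (a sketch over sampled points on the stated domains, not a proof):

```python
import math
import random

# Midpoint convexity spot-check for x*log(x), log(1/x), and |x|^3.
random.seed(0)
cases = [
    (lambda x: x * math.log(x) if x > 0 else 0.0, lambda: random.uniform(0.0, 5.0)),
    (lambda x: math.log(1.0 / x), lambda: random.uniform(0.1, 5.0)),
    (lambda x: abs(x) ** 3, lambda: random.uniform(-5.0, 5.0)),  # |x|^p, p = 3
]
ok = True
for func, draw in cases:
    for _ in range(5000):
        x, y, lam = draw(), draw(), random.random()
        mid = func(lam * x + (1 - lam) * y)
        ok = ok and mid <= lam * func(x) + (1 - lam) * func(y) + 1e-9
print(ok)  # True
```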

Exercise 12.14. (Tangent Hyperplanes I) Remember that we say that the graphs of two functions f1, f2 : E → R are tangent at x ∈ dom(f1) ∩ dom(f2) if f1(x) = f2(x) and (f1(x + u) − f2(x + u))/‖u‖ → 0 as u → 0. Now, take any point x in the interior of the domain of a proper convex function f : E → R. Let Lθ,x(u) = f(x) + ⟨θ, u − x⟩ be the affine linear function whose graph intersects the graph of f at (x, f(x)). Show that if θ is such that the graph of Lθ,x is tangent to the graph of f at x, then f must lie entirely above the graph of Lθ,x (i.e., epi(f) ⊂ epi(Lθ,x)); in other words, graph(Lθ,x) is a supporting hyperplane to epi(f) at (x, f(x)). (Remember that a hyperplane {x ∈ E : ⟨θ, x⟩ + t = 0} of a Euclidean space E is called a supporting hyperplane of a set C ⊂ E at a point x0 ∈ C if for any x ∈ C, ⟨θ, x⟩ + t ≤ 0 and ⟨θ, x0⟩ + t = 0.)

Exercise 12.15. (Tangent Hyperplanes II) Let f : E → R be a proper convex function and x ∈ int dom(f). Show the following:

(a) If f is differentiable at x then, choosing θ = ∇f(x), the derivative of f at x, the function Lθ,x(u) = ⟨θ, u − x⟩ + f(x) becomes tangent to f at x (i.e., the graphs of f and Lθ,x are tangent to each other).

(b) The epigraph of f lies entirely above the graph of Lθ,x; therefore (12.2) holds.

Exercise 12.16. (Subdifferential of Improper Convex Functions) Show that if f is convex but is not proper then ∂ f(x) = E everywhere.


Exercise 12.17. (The Monotonicity of the Subdifferential Map) A function f : R → R is monotone increasing if (f(x) − f(y))(x − y) ≥ 0 for any x, y ∈ R. Similarly, we call a function f : E → E monotone (we drop "increasing") if

⟨f(x) − f(y), x − y⟩ ≥ 0

for any x, y ∈ E. The definition can also be extended to set-valued maps: A : E → 2^E is monotone if

⟨θ − φ, x − y⟩ ≥ 0

for any x, y ∈ E, θ ∈ A(x), φ ∈ A(y). If f : R → R is a differentiable function then f is convex if and only if f′ is monotone (increasing). Show that if f : E → R is convex then ∂f is a monotone map.

Exercise 12.18. Prove (b) ⇒ (c) and (c)⇒ (a) of Theorem 12.10.

Exercise 12.19. Consider Theorem 12.11. Show that (12.5) implies that x is a minimizer of f over K, using only the definition of subdifferentials.

Exercise 12.20. (Equivalence of Lower Semicontinuity and Closedness) Show that a function f : Rᵈ → R is lower semicontinuous if and only if f is closed.

Exercise 12.21. (Basic Convex Functions) Answer the following questions:

(a) Give an example of a function f :R→R that is convex but is not closed.

(b) Does there exist a function f : E → R with E = R that is convex but not closed and whose domain is closed with a nonempty interior?

(c) How about if E =R2?

** Exercise 12.22. (Fenchel–Moreau I) Prove Theorem 12.13.

Hint: Use a geometric argument based on the known fact that if C1, C2 are closed convex sets with no overlap then they can be strictly separated by a hyperplane (i.e., there exists a closed halfspace H such that C1 ⊂ int(H) and C2 ⊂ E \ H).

Exercise 12.23. (When is the Conjugate Everywhere Defined?) Prove Theorem 12.24.

Hint: To prove the implication (iii) ⇒ (ii), it is worthwhile to read about dual norms and the discussion surrounding Proposition 12.34.

Exercise 12.24. (Elementary Properties of Conjugation) Let f : E → R. Prove the following statements:

(i) f*(0) = −inf f(E) (= −inf{f(x) : x ∈ E}).

(ii) −∞ ∈ f*(E) ⇔ f ≡ +∞ ⇔ f* ≡ −∞.


(iii) Let θ ∈ E. Then

f*(θ) = sup_{x ∈ dom(f)} (⟨θ, x⟩ − f(x)) = sup_{(x,t) ∈ epi(f)} (⟨θ, x⟩ − t).

(iv) f* is convex.

(v) Let g : E → R. Then f ≤ g implies that g* ≤ f* and f** ≤ g**.

(vi) f* = f***.

Hint: For Part (iv) use Part (iii).

Exercise 12.25. (Fenchel-Young Inequality) Prove that if f : E → (−∞,+∞] is proper then for any v, θ ∈ E, f*(θ) + f(v) ≥ ⟨θ, v⟩.

Exercise 12.26. (Self-Conjugacy) Let f : E → R. Prove that f = f* if and only if f(x) = ½‖x‖₂², x ∈ E.

Hint: To show that f = f* gives f = ½‖·‖₂², first show the general statement that for two functions g, h : E → R, g ≤ h implies h* ≤ g*. Next, show that f is proper. Then use the Fenchel-Young inequality and the general statement.
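On a finite grid one can approximate the conjugate by brute force and watch self-conjugacy appear for f(x) = ½x² (a numerical illustration with an arbitrary grid resolution, not a proof):

```python
# Brute-force conjugate on a 1-D grid: f*(theta) ~ max_x theta*x - f(x).
# For f(x) = x**2 / 2, the conjugate is again theta**2 / 2.
xs = [i / 100.0 for i in range(-500, 501)]

def conjugate(f, theta):
    return max(theta * x - f(x) for x in xs)

f = lambda x: 0.5 * x * x
thetas = [-3.0, -1.0, 0.0, 0.5, 2.0]
errs = [abs(conjugate(f, th) - 0.5 * th * th) for th in thetas]
print(max(errs))  # 0.0: these theta values lie on the grid
```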

Exercise 12.27. (Fenchel-Moreau II) Prove Theorem 12.16.

Exercise 12.28. (Elementary Properties of Conjugation II) Let f : E → (−∞,+∞], g : E → R. Prove the following statements:

(i) For any α > 0, (αf)* = αf*(·/α).

(ii) For any α > 0, (αf(·/α))* = αf*.

(iii) For any β ∈ R, (f + β)* = f* − β.

(iv) Suppose that f is an l.s.c., proper, convex function. Then f ≤ g is equivalent to g* ≤ f*.

(v) Suppose that both f and g are l.s.c., proper, convex functions. Then f = g if and only if f* = g*.

Exercise 12.29. (Infimal Convolutions) Let f, g : E → (−∞,+∞]. The infimal convolution of f and g is defined by

(f □ g)(x) = inf_{u ∈ E} (f(u) + g(x − u)).

Prove the following statements:

(i) (f □ g)* = f* + g*.

(ii) Suppose that f and g are proper. Then (f + g)* ≤ f* □ g*.


(iii) (min(f, g))* = max(f*, g*).

(iv) (max(f, g))* ≤ min(f*, g*).
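A classical instance worth computing: the infimal convolution of ½x² and |x| is the Huber function. A brute-force minimization over a grid (a sketch with an arbitrarily chosen resolution) reproduces it:

```python
# Brute-force infimal convolution on a grid:
# (f [] g)(x) ~ min_u f(u) + g(x - u).
# With f(x) = x**2 / 2 and g(x) = |x|, the result is the Huber function.
us = [i / 1000.0 for i in range(-5000, 5001)]

def infconv(f, g, x):
    return min(f(u) + g(x - u) for u in us)

f = lambda x: 0.5 * x * x
g = abs

def huber(x):
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5

errs = [abs(infconv(f, g, x) - huber(x)) for x in (-3.0, -0.7, 0.0, 0.4, 2.5)]
print(max(errs) < 1e-9)  # True: the minimizers lie on the grid
```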

Exercise 12.30. (Seminorms Are Convex) Show that any seminorm is convex.

Exercise 12.31. (p-norms) Show that the claims made in Example 12.25 are true:

(a) For p≥1, p-norms are indeed norms.

(b) The max-norm is a norm.

(c) For any v ∈ E, lim_{p→∞} ‖v‖_p = ‖v‖_∞.

Hint: For Part (c), consider lower and upper bounding ‖v‖_p, for a fixed value of p, by a constant times ‖v‖_∞.
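The limit in Part (c) is visible numerically; a small sketch with an arbitrary test vector:

```python
# ||v||_p decreases toward the max-norm as p grows.
v = [3.0, -4.0, 1.0]

def pnorm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

vals = [pnorm(v, p) for p in (1, 2, 8, 32, 128)]
vmax = max(abs(x) for x in v)
print(vals[0], vals[-1], vmax)  # 8.0, then ~4.0, versus max-norm 4.0
```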

Exercise 12.32. (Schatten p-norms) Consider the Schatten p-norms defined in Example 12.26. Prove the following:

(a) The map λ :Sm →Rm is well-defined.

(b) The Schatten p-norms are indeed norms.

Exercise 12.33. (Total Variation) Let N ⊂ {1, …, d}² and consider the total-variation seminorm induced by N on E = Rᵈ: ‖x‖TV = Σ_{(i,j)∈N} |xi − xj|. Show the following:

(i) ‖·‖TV is a seminorm.

(ii) Let F = {x ∈ E : ⟨1, x⟩ = 0}, where 1 = (1, …, 1) is the vector whose components are all equal to one. Then F is a vector space. Give sufficient and necessary conditions on N that make ‖·‖TV a norm on F. Clearly, ‖·‖TV will be a norm on F if ‖x‖TV = 0 and x ∈ F imply that x = 0.

Exercise 12.34. (Norm Equivalence) Prove the following statements:

(a) On a finite dimensional Euclidean space any two norms are equivalent.

(b) Calculate sup_{v≠0} ‖v‖_p/‖v‖_q for (p, q) ∈ {1, 2, ∞} × {1, 2, ∞}.

(c) Calculate sup_{v≠0} ‖v‖_p/‖v‖_q for (p, q) ∈ [1, ∞) × [1, ∞).
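For Part (b), when p ≤ q the supremum of ‖v‖_p/‖v‖_q in Rᵈ is d^{1/p − 1/q}, attained by the all-ones vector (and the ratio is at most 1 for p ≥ q, attained by a standard basis vector). A quick check of the candidate maximizers against this formula, for an arbitrarily chosen d = 5:

```python
# Ratios ||1||_p / ||1||_q at the all-ones vector, versus d**(1/p - 1/q).
d = 5

def pnorm(v, p):
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

ratios = {}
for p, q in [(1, 2), (1, float("inf")), (2, float("inf"))]:
    ones = [1.0] * d
    ratios[(p, q)] = pnorm(ones, p) / pnorm(ones, q)
print(ratios)  # (1,2) and (2,inf) give sqrt(5) ~ 2.236; (1,inf) gives 5.0
```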


Exercise 12.35. (Lack of Norm Equivalence in ℓp-spaces) This is an example showing that in infinite dimensional spaces norm equivalence does not hold. Let V be the vector space of infinite sequences x = (x1, x2, …) such that xi ∈ R, i = 1, 2, …. Addition and multiplication are defined componentwise: for x, y ∈ V and c ∈ R, x + y = (x1 + y1, x2 + y2, …) and cx = (cx1, cx2, …). With these operations, it is easily seen that V is indeed a vector space. Now, for x ∈ V and p ≥ 1, let ‖x‖_p := (Σ_{i=1}^∞ |xi|ᵖ)^{1/p} and let ℓp = {x ∈ V : ‖x‖_p < +∞}.

(a) Prove that ℓp is a subspace of V and ‖·‖_p is a norm on ℓp.

(b) Show that if q ≥ p then for any x ∈ V, ‖x‖_q ≤ ‖x‖_p. Thus, ℓp ⊂ ℓq.

(c) Show that ‖·‖_q is a norm on ℓp for any q ≥ p.

(d) Now, assuming that q > p, show that it is not possible to reverse the inequality in Part (b), even if we allow a constant blow-up of the upper bound and even if we take only vectors in ℓp (the smaller space): there does not exist a constant C ∈ R such that for any x ∈ ℓp, ‖x‖_p ≤ C‖x‖_q.

(e) Conclude that for any p ≥ 1, there exist norms defined on the vector space ℓp which are not equivalent.

Exercise 12.36. (Dual Norms Are Norms) Show that the dual ‖·‖_* of any norm ‖·‖ is again a norm.

Exercise 12.37. (Hölder's Inequality) Show that Hölder's inequality holds for any pair of dual norms (‖·‖, ‖·‖_*).
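A randomized sanity check of the inequality for one dual pair of p-norms (p = 3, q = 3/2, so 1/p + 1/q = 1; the dimensions and ranges below are arbitrary choices):

```python
import random

# Check <x, y> <= ||x||_p * ||y||_q for the dual pair p = 3, q = 3/2.
random.seed(0)

def pnorm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

p, q = 3.0, 1.5
ok = True
for _ in range(10000):
    x = [random.uniform(-1, 1) for _ in range(4)]
    y = [random.uniform(-1, 1) for _ in range(4)]
    inner = sum(a * b for a, b in zip(x, y))
    ok = ok and inner <= pnorm(x, p) * pnorm(y, q) + 1e-12
print(ok)  # True
```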

Exercise 12.38. (Subdifferentials of Norms) Show that for x ≠ 0,

∂‖x‖ = {θ ∈ E : ⟨θ, x⟩ = ‖x‖, ‖θ‖_* = 1}.

Also, show the set graphically when E = R², ‖·‖ is a p-norm with p ∈ {1, 2, ∞}, and x ∈ E travels on the surface of the unit ball of ‖·‖ (show the sets for significant, interesting points x such that ‖x‖ = 1).

Exercise 12.39. (Some Dual Norms) Show that the claims in Example 12.29 are true.

Exercise 12.40. (Subdifferentials of ½‖·‖²) Let ‖·‖ be an arbitrary norm on E. Show that

∂(½‖v‖²) = {θ ∈ E : ⟨θ, v⟩ = ½‖v‖² + ½‖θ‖_*²}.

Hint: Use (12.7) and Proposition 12.30.

Exercise 12.41. Prove Proposition 12.32.

Exercise 12.42. (Minimizing Sequences) Let f : E → R be proper, closed, and convex, and let (xn) ⊂ dom f.


(i) Show that if xn→x∈int domf then f(xn)→f(x) as n→ ∞.

(ii) Show that the condition x ∈ int dom f cannot be removed: in fact, there exists a proper, closed, convex function f and a sequence (xn) ⊂ dom f such that xn → x ∈ Argmin f and yet f(xn) ↛ f(x).

Hint: For Part (ii), consider the function defined in Exercise 12.5.

Exercise 12.43. Prove Proposition 12.37.

Exercise 12.44. (Strong Convexity Implies Strict Convexity) Prove Proposition 12.39.

Hint: Let x, y ∈ Rᵈ, x ≠ y, 0 < α < 1, z = αx + (1 − α)y, and use the strong convexity of f at z twice.

Exercise 12.45. (Strict Convexity of Norms) Show the following:

(a) The norm ‖·‖₂ is not strictly convex, but v ↦ ‖v‖₂² is strictly convex.

(b) A norm ‖·‖ is called strictly convex when for any x, y ∈ Rᵈ with x ≠ y and ‖x‖ = ‖y‖ = 1, we have ‖x + y‖ < 2. Show that a norm is strictly convex if and only if the map v ↦ ‖v‖² is strictly convex.

(c) Show that for p > 1, v ↦ ‖v‖_p² is strictly convex.

(d) Show that for p = 1, v ↦ ‖v‖₁² is not strictly convex over Rᵈ when d > 1.
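Part (d) has a two-line witness: v ↦ ‖v‖₁² is affine along the segment joining (1, 0) and (0, 1), so it cannot be strictly convex.

```python
# Midpoint value of ||v||_1**2 along [(1,0), (0,1)] equals the average
# of the endpoint values, ruling out strict convexity for d > 1.
def l1sq(v):
    return sum(abs(t) for t in v) ** 2

x, y = (1.0, 0.0), (0.0, 1.0)
mid = l1sq((0.5, 0.5))
avg = 0.5 * l1sq(x) + 0.5 * l1sq(y)
print(mid, avg)  # 1.0 1.0
```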

Exercise 12.46. (Lack of Strong Convexity) Let A = {x ∈ Rᵈ : ‖x‖₁ ≤ 1} be the 1-ball, d > 1. Define F : A → R by F(v) = ‖v‖₁², v ∈ A. Show that F is not strongly convex, no matter which norm one chooses in the definition of strong convexity.


Chapter 13

Continuous Exponential Weights

In this chapter we consider a generalization of the Exponential Weights Algorithm which, somewhat casually, we will call the Continuous Exponential Weights Algorithm, as it is applicable to problems involving continuously many competitors. The problem is as follows: Denote by E the set of competitors and by D the decision set. At the beginning of round t, the forecaster receives the function ft : E → D, with ft(e) representing the advice of expert e ∈ E for round t. Then, the forecaster and the environment simultaneously choose a prediction pt ∈ D and a loss function ℓt ∈ L ⊂ {f : f : D → R}, respectively. The forecaster's learning effectiveness is measured using the regret:

Rn = Σ_{t=1}^n ℓt(pt) − inf_{e ∈ E} Σ_{t=1}^n ℓt(ft(e)).

Online Regret Minimization with Expert Advice

Parameters: A decision set D, a set L of real-valued functions with domain D, and an expert set E.

In round t= 1,2, . . .:

• Forecaster receives the expert advice ft : E → D.

• Forecaster chooses a prediction pt ∈ D.

• Environment, simultaneously, chooses a loss function ℓt ∈ L.

• Forecaster receives ℓt and environment receives pt.

Regret of forecaster after n rounds:

Rn = Σ_{t=1}^n ℓt(pt) − inf_{e ∈ E} Σ_{t=1}^n ℓt(ft(e)).


An important special case is when D ⊂ Rᵈ (more generally, D is a subset of a vector space) and when all the loss functions in L are convex. In this case, we call the above problem online convex optimization, or just a "convex problem".

One application of this framework arises in supervised learning. Recall that in this setting, in each round a context xt ∈ X is first received. Then the forecaster needs to produce a prediction ŷt ∈ Ŷ, which is evaluated against an outcome yt ∈ Y using a loss function ℓ : Y × Ŷ → R (cf. Section 1.2.6), the goal being to predict almost as well as the best predictor in a given set of predictors

E ⊂ {e : e : X → Ŷ}.

In particular, the learning effectiveness of a forecaster producing the predictions (ŷt; t = 1, 2, …) is measured using

Rn = Σ_{t=1}^n ℓ(yt, ŷt) − inf_{e ∈ E} Σ_{t=1}^n ℓ(yt, e(xt)).

We see that by defining L = {ℓ(y, ·) : y ∈ Y}, the advice of expert e in round t to be ft(e) = e(xt), and the loss to be ℓt(ŷ) = ℓ(yt, ŷ), the problem fits our framework nicely.

As a concrete example, we may consider linear prediction problems with the quadratic loss:

D = Ŷ ⊂ R convex, X ⊂ Rᵈ convex,
E ⊂ {e : e : X → D, e(x) = ⟨x, θ⟩, θ ∈ Θ}, Θ ⊂ Rᵈ convex,
ℓ(y, ŷ) = (1/2)(y − ŷ)².

When the forecaster is restricted to follow the advice of some of the predictors, i.e., ŷt is restricted to be an element of Ŷt = {e(xt) : e ∈ E}, an alternative way to map this problem to the framework is to choose D = E, L = {g : E → R : g(e) = ℓ(y, e(x)), y ∈ Y, x ∈ X}, ft(e) = e, and ℓt(e) = ℓ(yt, e(xt)). Because ft is the identity map, an equivalent way of describing this is to say that the experts are constant. This case corresponds exactly to Regret Minimization Games, introduced in Chapter 1.
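To make the reduction concrete, here is a minimal toy simulation (the expert parameters, the target, and the naive "follow expert 0" forecaster are all illustrative assumptions, not part of the text): constant linear experts, quadratic loss, and the regret computed exactly as defined above.

```python
import random

# Toy instance of the reduction: supervised learning with quadratic loss
# cast as prediction with expert advice, the experts being the linear
# predictors e(x) = x * theta for a few fixed (hypothetical) thetas.
random.seed(0)
thetas = [0.0, 0.5, 1.0]          # hypothetical expert parameters

def loss(y, yhat):
    return 0.5 * (y - yhat) ** 2  # quadratic loss from the example

total = 0.0                        # cumulative loss of the forecaster
cum = [0.0] * len(thetas)          # cumulative losses of the experts
for t in range(100):
    x = random.uniform(-1.0, 1.0)            # context x_t
    advice = [th * x for th in thetas]       # f_t(e) = e(x_t)
    p = advice[0]                            # naive forecaster: follow expert 0
    y = 0.8 * x                              # outcome from a fixed target
    total += loss(y, p)
    for i, a in enumerate(advice):
        cum[i] += loss(y, a)
R_n = total - min(cum)             # regret against the best expert in hindsight
print(R_n >= 0.0)  # True: following one expert never beats the best expert
```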

When Ŷ is convex, one can also consider the case when ŷt is restricted to the convex hull of Ŷt, giving slightly more power to the forecaster. Essentially the same power increase can be achieved by allowing the forecaster to randomize, in which case Ŷ does not need to be convex.

In learning theory, the analogues of the two cases, when ŷt is restricted to lie in Ŷt or not, are respectively called proper and improper learning. Thus, in this sense, proper learning corresponds to standard regret minimization, while improper learning corresponds to prediction with expert advice. In statistical learning theory, improper learning is known to be strictly more powerful than proper learning.
