
DRAFT


Condition (12.5) is called a variational inequality because it has to hold for a whole range of values of $y$. That (12.5) implies that $x$ is a constrained minimum follows from the definition of subdifferentials (cf. Exercise 12.19).

With this, we can now turn to the proof:

Proof. First, we prove the following helpful statement (with the later choice $g = \iota_K$ in mind):

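Claim 12.12. Let $f, g : E \to [-\infty, +\infty]$ be proper, convex functions such that for any $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$, $0 \in \partial(f+g)(x)$ if and only if $0 \in \partial f(x) + \partial g(x)$. Then $x \in \mathrm{Argmin}_u\, f(u) + g(u)$ if and only if $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$ and there exists $\theta \in \partial f(x)$ such that for any $y \in E$,

$$ g(y) + \langle \theta, y - x \rangle \ge g(x). $$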

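The claim is proved through a series of equivalences. First, by Theorem 12.10, $x \in \mathrm{Argmin}_u\, f(u) + g(u)$ is equivalent to $0 \in \partial(f+g)(x)$. Next, $0 \in \partial(f+g)(x)$ is equivalent to [$0 \in \partial(f+g)(x)$ and $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$], which, by our assumption, is equivalent to [$0 \in \partial f(x) + \partial g(x)$ and $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$]. This latter condition holds if and only if $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$ and there exists $\theta \in \partial f(x)$ such that $-\theta \in \partial g(x)$. By the definition of the subdifferential, this is equivalent to $x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$ and the existence of some $\theta \in \partial f(x)$ such that for any $y \in E$, $g(y) \ge g(x) + \langle -\theta, y - x \rangle$. This finishes the proof of the claim.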

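Now, use the claim with $g = \iota_K$. Our condition is that $\mathrm{ri}(\mathrm{dom}(f))$ and $\mathrm{ri}(K)$ have a nonempty intersection. Hence Theorem 12.9(i) is applicable and gives $\partial(f + \iota_K) = \partial f + \partial \iota_K$.

Thus, the condition of the claim is satisfied. Hence, $x \in \mathrm{Argmin}_{u \in K} f(u)$ if and only if $x \in \mathrm{dom}(f) \cap K$ and there exists $\theta \in \partial f(x)$ such that $\iota_K(y) + \langle \theta, y - x \rangle \ge \iota_K(x)$ holds for any $y \in E$. Since $\iota_K(x) = 0$, this inequality holds trivially for $y \notin K$, where $\iota_K(y) = +\infty$. For $y \in K$, it reduces to $\langle \theta, y - x \rangle \ge 0$. The latter condition is therefore equivalent to $x \in \mathrm{dom}(f) \cap K$ and the existence of some $\theta \in \partial f(x)$ such that $\langle \theta, y - x \rangle \ge 0$ for all $y \in K$, finishing the proof.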
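To make the variational inequality concrete, here is a small numerical sanity check. It uses a hypothetical instance that is not from the text: $f(u) = \frac12\|u - a\|_2^2$ on $E = \mathbb{R}^3$ and the box $K = [0,1]^3$. For this instance, the constrained minimizer is $a$ clipped coordinatewise and $\partial f(x) = \{x - a\}$. Since $\mathrm{dom}(f) = \mathbb{R}^3$, the relative-interior condition above holds. The helper names `box_minimizer` and `variational_gap` are illustrative only. The sketch samples points $y \in K$ and checks that $\langle \theta, y - x \rangle \ge 0$. It then shows that a point which is not the minimizer violates the inequality for some $y \in K$.

```python
import random

def box_minimizer(a, lo=0.0, hi=1.0):
    """Minimizer of f(u) = 0.5 * ||u - a||^2 over the box K = [lo, hi]^d: clip a coordinatewise."""
    return [min(max(ai, lo), hi) for ai in a]

def variational_gap(theta, x, y):
    """<theta, y - x>; the variational inequality requires this to be >= 0 for every y in K."""
    return sum(t * (yi - xi) for t, yi, xi in zip(theta, y, x))

random.seed(0)
a = [1.7, -0.4, 0.3]
x = box_minimizer(a)                          # [1.0, 0.0, 0.3]
theta = [xi - ai for xi, ai in zip(x, a)]     # gradient of f at x, the only element of the subdifferential
samples = ([random.uniform(0.0, 1.0) for _ in a] for _ in range(10_000))
worst = min(variational_gap(theta, x, y) for y in samples)
print(worst >= -1e-12)  # True: no sampled y in K violates the inequality

# A point that is not the constrained minimizer fails the test for some y in K (here y = x).
z = [0.5, 0.5, 0.5]
grad_z = [zi - ai for zi, ai in zip(z, a)]
print(variational_gap(grad_z, z, x) < 0)  # True
```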

12.4 Duality

While duality can be motivated in many ways, one possible motivation is to obtain an alternative characterization of the convexity of functions. Take a proper, convex function $f : E \to [-\infty, +\infty]$. If $g$ is an affine minorant of $f$ (an affine function that lower bounds $f$ everywhere), then by definition $\mathrm{epi}(f) \subset \mathrm{epi}(g)$. Intersecting the epigraphs of any collection of affine functions always results in a convex, closed set. Call a function closed if its epigraph is closed. Closedness and lower semicontinuity are in fact equivalent (cf. Exercise 12.20), and in this book we use the two terms interchangeably.

Clearly, there is a one-to-one correspondence between “epigraph-like sets” and functions.

Further, the epigraph of a closed, convex function can be obtained as the intersection of the epigraphs of its affine minorants. Can this process of taking the intersection of the epigraphs of affine functions give other functions? The following theorem provides a negative answer:

Theorem 12.13 (Fenchel-Moreau). The epigraph of a function is the intersection of the epigraphs of its affine minorants if and only if the function itself is convex and closed.


158 CHAPTER 12. BACKGROUND ON CONVEXITY

In the theorem, we can of course remove the condition that the affine functions must be minorants: any affine function whose epigraph contains $\mathrm{epi}(f)$ is automatically a minorant of $f$. A condition that is often seen in the literature, and that is equivalent to the function being closed, is lower semicontinuity (l.s.c.). This theorem is our first duality result, in the sense that it furnishes a dual description of convex, closed functions in terms of their affine minorants.

To get an algebraic version of the above theorem, fix a proper function $f : E \to [-\infty, +\infty]$ and $x \in \mathrm{dom}(f)$ ($f$ is not necessarily convex or closed). Then, for any $\theta \in \partial f(x)$, $v \mapsto f(x) + \langle \theta, v - x \rangle$ is an affine minorant of $f$ whose graph touches the epigraph of $f$ at $(x, f(x))$. These “tight” affine minorants are in fact in one-to-one correspondence with the elements of $\partial f(x)$. Now,

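$$
\begin{aligned}
\partial f(x) &= \{\theta \in E : f(v) \ge f(x) + \langle \theta, v - x \rangle \text{ for all } v \in E\} \\
&= \{\theta \in E : \langle \theta, x \rangle - f(x) \ge \langle \theta, v \rangle - f(v) \text{ for all } v \in E\} \\
&= \Big\{\theta \in E : \langle \theta, x \rangle - f(x) \ge \sup_{v \in E} \{\langle \theta, v \rangle - f(v)\}\Big\} \\
&\overset{(1)}{=} \Big\{\theta \in E : \langle \theta, x \rangle - f(x) = \sup_{v \in E} \{\langle \theta, v \rangle - f(v)\}\Big\} \\
&\overset{(2)}{=} \Big\{\theta \in E : x \in \operatorname{Argmax}_{v \in E} \{\langle \theta, v \rangle - f(v)\}\Big\}.
\end{aligned}
$$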

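Here, (1) holds because choosing $v = x$ shows that the supremum is at least $\langle \theta, x \rangle - f(x)$. Step (2) holds for the same reason: the supremum equals the value of the objective at $v = x$ exactly when $x$ is a maximizer. This motivates the following definition: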

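Definition 12.14 (Conjugate). Let $f : E \to [-\infty, +\infty]$. The conjugate (or Fenchel conjugate, or Legendre-Fenchel transform) of $f$ is the function $f^* : E \to [-\infty, +\infty]$ defined by

$$ f^*(\theta) = \sup_{v \in E} \{\langle \theta, v \rangle - f(v)\}. \tag{12.6} $$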

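With this definition, we thus get that

$$ \partial f(x) = \{\theta \in E : \langle \theta, x \rangle - f(x) = f^*(\theta)\}, \tag{12.7} $$

i.e., $\partial f(x)$ is the set of “slopes” $\theta$ such that $v \mapsto f(x) + \langle \theta, v - x \rangle$ is a “tight” affine minorant of the function $f$ at $x$.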
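Both (12.6) and (12.7) can be checked numerically. The Python sketch below approximates the supremum in (12.6) by a maximum over a finite grid, a crude "discrete Legendre transform"; the grid range and size are arbitrary assumptions. It compares the result with the standard closed-form conjugate of $f(v) = e^v$ on $E = \mathbb{R}$, stated here without derivation: $f^*(\theta) = \theta\log\theta - \theta$ for $\theta > 0$, $f^*(0) = 0$, and $f^*(\theta) = +\infty$ for $\theta < 0$. Since $\partial f(x) = \{e^x\}$, (12.7) predicts equality for $\theta = e^x$. For any other slope, there should be a strict gap. The function names are illustrative.

```python
import math

def conjugate_on_grid(f, theta, lo=-10.0, hi=10.0, n=20_001):
    """Approximate f*(theta) = sup_v theta*v - f(v) by a maximum over n evenly spaced v in [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(theta * (lo + i * step) - f(lo + i * step) for i in range(n))

def exp_conjugate(theta):
    """Closed-form conjugate of f(v) = exp(v)."""
    if theta > 0:
        return theta * math.log(theta) - theta
    return 0.0 if theta == 0 else math.inf

for t in (0.5, 1.0, 2.0, 5.0):
    print(t, conjugate_on_grid(math.exp, t), exp_conjugate(t))

# (12.7): theta is in the subdifferential at x exactly when <theta, x> - f(x) reaches f*(theta).
x = 0.7
theta = math.exp(x)  # f is differentiable, so its subdifferential at x is {exp(x)}
print(math.isclose(theta * x - math.exp(x), exp_conjugate(theta)))  # True
print(1.0 * x - math.exp(x) < exp_conjugate(1.0))  # True: slope 1 is not in the subdifferential at 0.7
```

The grid maximum can only underestimate the supremum. It also misses maximizers outside `[lo, hi]`, so for $\theta$ close to $0$ (where the maximizer $v = \log\theta$ lies far to the left) the range would have to be widened.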

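Alternatively, one may ask for the smallest value of $c \in \mathbb{R}$ such that $u \mapsto \langle \theta, u \rangle - c$ lower bounds $f$ everywhere. We find that this value is

$$ \inf\{c \in \mathbb{R} : \langle \theta, u \rangle - c \le f(u) \text{ for all } u \in E\} = \inf\Big\{c \in \mathbb{R} : \sup_{u \in E} \{\langle \theta, u \rangle - f(u)\} \le c\Big\} = \sup_{u \in E} \{\langle \theta, u \rangle - f(u)\} = f^*(\theta). $$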

Thus, for any $\theta \in E$ with $f^*(\theta)$ finite, $u \mapsto \langle \theta, u \rangle - f^*(\theta)$ is the tightest affine minorant of $f$ with slope $\theta$. Being a pointwise supremum of affine, therefore convex, functions of $\theta$, $f^*$ is always convex. This and some further elementary properties are recorded in the next result, whose proof is left as an exercise (cf. Exercises 12.24 and 12.25).


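Proposition 12.15 (Basic Properties of the Conjugate). For any function $f : E \to [-\infty, +\infty]$, $f^*$ is convex. Further, the following hold:

(i) $\mathrm{dom}(f) = \emptyset$ if and only if $f^* \equiv -\infty$;

(ii) if $-\infty \in \mathrm{ran}(f)$ then $f^* \equiv +\infty$;

(iii) $-\infty \in \mathrm{ran}(f^*)$ if and only if $f \equiv +\infty$, if and only if $f^* \equiv -\infty$;

(iv) if $f$ is proper then

$$ f^*(\theta) + f(v) \ge \langle \theta, v \rangle \quad \text{for all } v, \theta \in E, \tag{12.8} $$

with equality if and only if $\theta \in \partial f(v)$.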
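As a quick check of (12.8), the sketch below uses the one-dimensional pair $f(v) = |v|^p/p$ and $f^*(\theta) = |\theta|^q/q$ with $p > 1$ and $1/p + 1/q = 1$. This is a standard conjugate pair, assumed here rather than derived in the text; for it, (12.8) is the classical Young inequality. The function name `young_gap` and the choice $p = 3$ are illustrative. Random pairs should give a nonnegative gap. The gap should vanish when $\theta = f'(v) = \mathrm{sign}(v)|v|^{p-1}$, the only element of $\partial f(v)$.

```python
import random

def young_gap(v, theta, p):
    """f(v) + f*(theta) - theta*v for f(v) = |v|^p / p, whose conjugate is |theta|^q / q with 1/p + 1/q = 1."""
    q = p / (p - 1)
    return abs(v) ** p / p + abs(theta) ** q / q - theta * v

random.seed(1)
p = 3.0
gaps = [young_gap(random.uniform(-5, 5), random.uniform(-5, 5), p) for _ in range(10_000)]
print(min(gaps) >= -1e-12)  # True: inequality (12.8) holds on every sample

v = -1.5
theta = -abs(v) ** (p - 1)  # f'(v) = sign(v) * |v|^(p-1), with sign(v) = -1 here
print(abs(young_gap(v, theta, p)) < 1e-12)  # True: equality, since theta is in the subdifferential at v
```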

Inequality (12.8) is known as the Fenchel-Young inequality. Rearranging it gives that for any $v, \theta \in E$,

$$ f(v) \ge \langle \theta, v \rangle - f^*(\theta). $$

Since this holds for any $\theta$, we can take the supremum with respect to $\theta$, hoping to recover $f$:

$$ f(v) \ge \sup_{\theta \in E} \{\langle \theta, v \rangle - f^*(\theta)\}. $$

Notice that the right-hand side is just the conjugate of the function $f^*$, evaluated at $v$. Denoting this biconjugate by $f^{**}(v)$ (that is, $f^{**} = (f^*)^*$), we thus get

$$ f(v) \ge f^{**}(v) \quad \text{for all } v \in E. $$

Given this and Theorem 12.13, the next result comes as no surprise:

Theorem 12.16 (Fenchel–Moreau II). Let $f : E \to [-\infty, +\infty]$ be proper. Then $f$ is a closed, convex function if and only if $f^{**} = f$. Furthermore, $f^*$ is also proper under either of these two equivalent conditions.

The proof is left to Exercise 12.27. As an immediate corollary, we get the following:

Corollary 12.17 (Closedness of the Conjugate). Suppose that $f : E \to [-\infty, +\infty]$ is proper, closed and convex. Then $f^*$ is also proper, closed and convex.

Proof. We already saw in Proposition 12.15 that $f^*$ is convex. Applying Theorem 12.16 to $f$ shows that $f^*$ is proper. Denote by $f^{***}$ the conjugate of the biconjugate. Then $f^{***} = f^*$ with no conditions on $f$ (in fact, Exercise 12.24 asks you to prove this). Thus $f^*$ is proper and $(f^*)^{**} = f^{***} = f^*$. Applying Theorem 12.16 again, this time to $f^*$, shows that $f^*$ must be closed.

As we saw, convex functions are very well suited for numerical methods that use local information about the function to be optimized. When this function is not convex, one may attempt to minimize a convex “surrogate” of it instead. A natural choice is to minimize the largest closed, convex lower bound of the function. If $f : E \to [-\infty, +\infty]$ is the function of interest, this lower bound, denoted by $\breve{f}$, is defined as the function whose epigraph is the intersection of the epigraphs of all closed, convex minorants $g$ of $f$. In other words,

$$ \breve{f}(x) = \sup\{g(x) : g \le f \text{ and } g : E \to [-\infty, +\infty] \text{ is closed, convex}\}. $$

Since the intersection of closed, convex sets is closed and convex, $\breve{f}$ is closed and convex. From the definition, it also follows that $\breve{f} \le f$ and that if $f \le g$ then $\breve{f} \le g$. Finally, if $f$ is closed and convex then $\breve{f} = f$. The next result connects $\breve{f}$ and conjugation:


Theorem 12.18 (Largest Convex Lower Bound). Let $f : E \to (-\infty, +\infty]$. If $f$ has at least one affine minorant (equivalently, $\mathrm{dom}(f^*) \ne \emptyset$), then $\breve{f} = f^{**}$; otherwise, $f^{**} \equiv -\infty$.

Essentially, the theorem gives an analytic method for constructing $\breve{f}$. The condition that $f$ has at least one affine minorant is always met if $f$ is lower bounded by some constant.

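Proof. If $f \equiv +\infty$, then $\breve{f} = f$ and $f^{**} \equiv +\infty$, as seen from the definitions; thus $f^{**} = \breve{f}$.

Suppose now that $f \not\equiv +\infty$. If $\mathrm{dom}(f^*) = \emptyset$ then, again from the definition, $f^{**} \equiv -\infty$. Otherwise, $f$ has an affine minorant, which immediately gives that $-\infty \notin \mathrm{ran}(\breve{f})$. Together with $\breve{f} \le f \not\equiv +\infty$, this shows that $\breve{f}$ is proper. Since $\breve{f}$ is closed and convex, Theorem 12.16 gives that $(\breve{f})^{**} = \breve{f}$. Further, it is not hard to see that $(\breve{f})^* = f^*$. Conjugating both sides gives $(\breve{f})^{**} = f^{**}$, and hence $\breve{f} = f^{**}$.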
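The identity $\breve{f} = f^{**}$ can be tried out numerically by conjugating twice on grids. The sketch below is an illustration under stated assumptions, not the book's method. It replaces $E = \mathbb{R}$ by a finite grid of points $x$ and a finite grid of slopes $\theta$. It therefore computes the biconjugate of $f$ restricted to the grid, which matches the lower convex hull of the sampled points as long as the slope grid is fine and wide enough. Two test functions are used, and both are hypothetical choices:

- the nonconvex double well $f(x) = \min\{(x-1)^2, (x+1)^2\}$, whose largest closed, convex lower bound is $0$ on $[-1, 1]$ and $(|x| - 1)^2$ elsewhere;
- the closed, convex function $|x|$, for which Theorem 12.16 predicts that conjugating twice returns $f$ itself.

```python
def legendre_on_grid(points, values, slopes):
    """Discrete conjugate: for each slope t, the max over grid points p of t*p - value(p)."""
    return [max(t * p - v for p, v in zip(points, values)) for t in slopes]

def double_well(x):
    # Hypothetical nonconvex example with minima at x = -1 and x = 1.
    return min((x - 1) ** 2, (x + 1) ** 2)

def double_well_envelope(x):
    # Its largest closed, convex lower bound, worked out by hand.
    return 0.0 if abs(x) <= 1 else (abs(x) - 1) ** 2

xs = [-3 + 0.02 * i for i in range(301)]       # grid on [-3, 3]
thetas = [-6 + 0.01 * j for j in range(1201)]  # slopes in [-6, 6]

# Nonconvex case (Theorem 12.18): the biconjugate is the convex envelope.
f_star = legendre_on_grid(xs, [double_well(x) for x in xs], thetas)
f_bistar = legendre_on_grid(thetas, f_star, xs)
print(max(abs(b - double_well_envelope(x)) for x, b in zip(xs, f_bistar)) < 1e-6)  # True

# Closed, convex case (Theorem 12.16): conjugating twice gives back f.
abs_vals = [abs(x) for x in xs]
abs_bistar = legendre_on_grid(thetas, legendre_on_grid(xs, abs_vals, thetas), xs)
print(max(abs(b - v) for b, v in zip(abs_bistar, abs_vals)) < 1e-9)  # True
```

The slope grid has to reach the largest slope of the envelope on the $x$-grid ($4$ at $x = \pm 3$ here). Otherwise the computed biconjugate falls below the envelope near the ends of the interval.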

A number of further properties of conjugation are left to be proven as exercises, cf. Exercises 12.28 and 12.29.

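Above we have seen that the Fenchel-Young inequality holds with equality if and only if $\theta \in \partial f(v)$ (provided that $f : E \to (-\infty, +\infty]$ is proper). Given the symmetry of the Fenchel-Young inequality, a natural question is whether equality is also equivalent to $v \in \partial f^*(\theta)$. A partial answer is provided by the next result:

Proposition 12.19. Let $f : E \to [-\infty, +\infty]$ be proper and let $x, \theta \in E$. Then $f(x) + f^*(\theta) = \langle \theta, x \rangle$ implies that $x \in \partial f^*(\theta)$.

Proof. Applying the Fenchel-Young inequality to $f^*$ and using $f^{**} \le f$, we get $\langle \theta, x \rangle \le f^*(\theta) + f^{**}(x) \le f^*(\theta) + f(x) = \langle \theta, x \rangle$. Hence, $\langle \theta, x \rangle = f^{**}(x) + f^*(\theta)$. By the equality condition of the Fenchel-Young inequality applied to $f^*$, it follows that $x \in \partial f^*(\theta)$.

Note that the result immediately gives also that $\theta \in \partial f(x)$ implies $x \in \partial f^*(\theta)$. The converse also holds when $f$ is convex and closed:

Theorem 12.20. Let $f : E \to [-\infty, +\infty]$ be a proper, closed, convex function and let $x, \theta \in E$. Then $x \in \partial f^*(\theta)$ if and only if $\theta \in \partial f(x)$.

Proof. We only need to prove that if $x \in \partial f^*(\theta)$ then $\theta \in \partial f(x)$. Since $f^*$ is proper (Corollary 12.17), the equality case of the Fenchel-Young inequality for $f^*$ turns $x \in \partial f^*(\theta)$ into $\langle \theta, x \rangle = f^*(\theta) + f^{**}(x)$. By Theorem 12.16, $f^{**} = f$. Hence, $\langle \theta, x \rangle = f^*(\theta) + f(x)$. As we have seen, this holds if and only if $\theta \in \partial f(x)$, thus finishing the proof.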

A shorter form for the result of the theorem is that $(\partial f)^{-1} = \partial f^*$ (and, of course, $(\partial f^*)^{-1} = \partial f$). Here, for a set-valued function $s : A \to 2^B$, its inverse $s^{-1} : B \to 2^A$ is defined by $s^{-1}(b) = \{a \in A : b \in s(a)\}$.
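The definition of $s^{-1}$ is easy to mirror in code for maps whose values are closed intervals. The Python sketch below assumes the standard conjugate pair $f(x) = |x|$ and $f^* = \iota_{[-1,1]}$ on $E = \mathbb{R}$, which is not derived in the text. It encodes $\partial f(x)$ and $\partial f^*(\theta)$ (the normal cone of $[-1,1]$) by hand as intervals `(lo, hi)`, with `None` standing for the empty set. It then checks $(\partial f)^{-1}(\theta) = \partial f^*(\theta)$ on a finite grid of $x$ values. All helper names are hypothetical. Restricting to a grid is only a way to compare two sets in finite time.

```python
import math

def subdiff_abs(x):
    """Subdifferential of f(x) = |x| at x, as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def subdiff_indicator(theta):
    """Subdifferential of f* = indicator of [-1, 1] at theta (its normal cone), or None if empty."""
    if abs(theta) > 1:
        return None
    lo = -math.inf if theta == -1 else 0.0
    hi = math.inf if theta == 1 else 0.0
    return (lo, hi)

def contains(interval, value):
    # interval is (lo, hi) for a closed interval, or None for the empty set
    return interval is not None and interval[0] <= value <= interval[1]

def inverse_on_grid(s, grid, b):
    """s^{-1}(b) = {a : b in s(a)}, restricted to a finite grid of candidate points a."""
    return [a for a in grid if contains(s(a), b)]

xs = [i / 4 for i in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
for theta in (-2.0, -1.0, -0.25, 0.5, 1.0, 3.0):
    lhs = inverse_on_grid(subdiff_abs, xs, theta)                   # inverse of the subdifferential of f
    rhs = [x for x in xs if contains(subdiff_indicator(theta), x)]  # subdifferential of f* at theta
    print(theta, lhs == rhs)  # True for every theta
```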

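At the price of some additional assumptions, this last result can be pushed a little further for differentiable functions. Specifically, assume that $f$ is differentiable on $\mathrm{int}\,\mathrm{dom}(f)$ (i.e., for any $x \in \mathrm{int}\,\mathrm{dom}(f)$, $\partial f(x) = \{\nabla f(x)\}$). Also assume that $\mathrm{dom}(\partial f) = \mathrm{int}\,\mathrm{dom}(f)$. Then,

$$ (\partial f)^{-1}(\theta) = \{x : \theta \in \partial f(x)\} \overset{(*)}{=} \{x : \theta = \nabla f(x)\} = (\nabla f)^{-1}(\theta), $$

where $(*)$ used that $\partial f(x) = \emptyset$ when $x \notin \mathrm{int}\,\mathrm{dom}(f)$. On the other hand, $(\partial f)^{-1}(\theta) = \partial f^*(\theta)$. Now, assume that $f^*$ is also differentiable over the interior of its domain and that $\mathrm{dom}(\partial f^*) = \mathrm{int}\,\mathrm{dom}(f^*)$. Then, for $\theta \in \mathrm{int}\,\mathrm{dom}(f^*)$, $(\nabla f)^{-1}(\theta) = \partial f^*(\theta) = \{\nabla f^*(\theta)\}$ is a singleton. Hence, $(\nabla f)^{-1}$ can be viewed as an $\mathrm{int}\,\mathrm{dom}(f^*) \to E$ mapping. Furthermore, the image of this map, being an inverse of $\nabla f$, is $\mathrm{dom}(\nabla f) = \mathrm{int}\,\mathrm{dom}(f)$. This gives rise to the following statement: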

Proposition 12.21. Assume that $f : E \to [-\infty, +\infty]$ is a proper, closed, convex function that also satisfies the following properties:

(i) $f$ is differentiable on $\mathrm{int}\,\mathrm{dom}(f)$ and $\mathrm{dom}(\partial f) = \mathrm{int}\,\mathrm{dom}(f)$.

(ii) $f^*$ is differentiable on $\mathrm{int}\,\mathrm{dom}(f^*)$ and $\mathrm{dom}(\partial f^*) = \mathrm{int}\,\mathrm{dom}(f^*)$.

Then, $\nabla f : \mathrm{int}\,\mathrm{dom}(f) \to \mathrm{int}\,\mathrm{dom}(f^*)$ is a bijection and its inverse satisfies $(\nabla f)^{-1} = \nabla f^*$.

Definition 12.22 (Legendre Functions). Let $f : E \to [-\infty, +\infty]$ be a proper, closed, convex function. We say that $f$ is Legendre if it satisfies the conditions of the previous proposition.

The next theorem provides an equivalent set of conditions for $f$ to be Legendre. In fact, historically, these were the conditions used to define Legendre functions. For a set $C$, denote by $\mathrm{bdry}(C)$ the boundary of $C$: $\mathrm{bdry}(C) = \bar{C} \setminus \mathrm{int}(C)$, where $\bar{C}$ is the closure of $C$.

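Theorem 12.23. Let $f : E \to [-\infty, +\infty]$ be a proper, closed, convex function, where $E = \mathbb{R}^d$ for some integer $d > 0$. Then $f$ is Legendre if and only if the following conditions hold:

(i) $\mathrm{int}\,\mathrm{dom}(f) \ne \emptyset$, $f$ is differentiable on $\mathrm{int}\,\mathrm{dom}(f)$, and $\|\nabla f(x_n)\|_2 \to +\infty$ whenever $(x_n)$ is a sequence in $\mathrm{int}\,\mathrm{dom}(f)$ with $x_n \to x \in \mathrm{bdry}(\mathrm{dom}(f))$.

(ii) $f$ is strictly convex on every convex subset of $\mathrm{dom}(\partial f)$.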

Functions satisfying condition (i) of the previous result are called essentially smooth, and functions satisfying condition (ii) are called essentially strictly convex. In fact, $f$ is essentially smooth if and only if $f$ satisfies condition (i) of Proposition 12.21. Likewise, $f$ is essentially strictly convex if and only if $f$ satisfies condition (ii) of Proposition 12.21. One could therefore also take these as the definitions.
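A standard example of a Legendre function, stated here without proof, is the unnormalized negative entropy $f(x) = \sum_i (x_i \log x_i - x_i)$ on $\mathrm{dom}(f) = [0, \infty)^d$ (with $0 \log 0 = 0$). Its conjugate is $f^*(\theta) = \sum_i e^{\theta_i}$ on $E = \mathbb{R}^d$. Here $\nabla f(x) = (\log x_i)_i$ maps $\mathrm{int}\,\mathrm{dom}(f) = (0, \infty)^d$ onto $\mathrm{int}\,\mathrm{dom}(f^*) = \mathbb{R}^d$. Proposition 12.21 then predicts that $\nabla f^* = (\nabla f)^{-1}$, i.e., the coordinatewise exponential. The Python sketch below, with illustrative function names, checks both compositions at a few points. It also shows the gradient norm blowing up near the boundary of $\mathrm{dom}(f)$, as condition (i) of Theorem 12.23 requires.

```python
import math

def grad_f(x):
    """Gradient (log x_i)_i of f(x) = sum_i x_i*log(x_i) - x_i, valid on the open orthant (0, inf)^d."""
    return [math.log(xi) for xi in x]

def grad_f_star(theta):
    """Gradient (exp(theta_i))_i of the conjugate f*(theta) = sum_i exp(theta_i)."""
    return [math.exp(t) for t in theta]

x = [0.2, 1.0, 3.5]
theta = [-1.3, 0.0, 2.2]
# grad f* undoes grad f on int dom(f), and grad f undoes grad f* on int dom(f*) = R^d.
print(all(math.isclose(a, b) for a, b in zip(grad_f_star(grad_f(x)), x)))  # True
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(grad_f(grad_f_star(theta)), theta)))  # True

# Essential smoothness: the gradient norm grows without bound as x approaches the boundary of [0, inf)^d.
for eps in (1e-1, 1e-4, 1e-8):
    print(eps, math.hypot(*grad_f([eps, 1.0, 1.0])))
```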

As we saw, for proper, closed, convex functions, $(\partial f)^{-1} = \partial f^*$ and vice versa. A natural question is when $\mathrm{dom}(\partial f^*) = E$ holds. A partial answer is provided by the following theorem:

Theorem 12.24. Let $f : E \to [-\infty, +\infty]$ be proper, closed, convex. Consider the following:

(i) $\mathrm{dom}(\partial f^*) = E$.

(ii) $\mathrm{dom}(f^*) = E$.

(iii) $\liminf_{\|x\|_2 \to \infty} f(x) / \|x\|_2 = \infty$.

Then, (iii) implies (i) and (ii), which are equivalent.

Functions with the property stated in (iii) are called supercoercive (we will meet coercive functions later).

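From the theorem and Fenchel–Moreau II (Theorem 12.16), it follows immediately that $\liminf_{\|\theta\|_2 \to \infty} f^*(\theta) / \|\theta\|_2 = \infty$ implies $\mathrm{dom}(f) = \mathrm{dom}(\partial f) = E$. To see this, apply the theorem to $f^*$, which is proper, closed and convex by Corollary 12.17, and use $f^{**} = f$.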
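The role of supercoercivity in Theorem 12.24 shows up in two one-dimensional examples, assumed here rather than taken from the text. The function $f(x) = x^2/2$ is supercoercive, and $f^*(\theta) = \theta^2/2$ is finite everywhere. By contrast, $f(x) = |x|$ grows only linearly, and its conjugate $\iota_{[-1,1]}$ has $\mathrm{dom}(f^*) = [-1, 1] \ne \mathbb{R}$. The Python sketch below evaluates the supremum in (12.6) over grids on $[-R, R]$ for growing $R$; the helper names and the grid size are illustrative choices. At $\theta = 2$, the value stays near $2 = f^*(2)$ for the quadratic. For the absolute value, it grows like $R$, signalling $f^*(2) = +\infty$.

```python
def truncated_conjugate(f, theta, radius, n=20_001):
    """Max of theta*v - f(v) over n evenly spaced v in [-radius, radius]; a lower bound on f*(theta)."""
    step = 2 * radius / (n - 1)
    return max(theta * (-radius + i * step) - f(-radius + i * step) for i in range(n))

def half_square(v):
    return 0.5 * v * v

for radius in (10.0, 100.0, 1000.0):
    print(radius,
          truncated_conjugate(half_square, 2.0, radius),  # stays close to f*(2) = 2
          truncated_conjugate(abs, 2.0, radius))          # grows like radius: f*(2) = +inf for |x|
```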


12.5 Norms, Dual Norms and the Polar

In document Online learning: Algorithms for Big Data (pages 157-162)