Convexity - Online learning: Algorithms for Big Data

DRAFT

Chapter 12 Background on Convexity

The purpose of this chapter is to give a short review of the concepts necessary for later chapters. We expect the reader to be familiar with the most basic concepts of convexity, such as the definition of convex sets, the various definitions of convex functions, and their equivalence. Besides these, we discuss subdifferentials of convex functions, the first-order optimality conditions for the minimization of convex functions, and conjugation/duality and their significance. Readers who are familiar with these concepts may still find it useful to read the chapter (with them in mind, we tried to keep it concise) to familiarize themselves with the conventions we use. Bibliographic references for the statements that are not proved are given at the end of the chapter.

12.1 Convexity


Figure 12.1: The set $C_1 \subset \mathbb{R}^2$ shown on the left (panel (a)) is convex, while the set $C_2$ shown on the right (panel (b), with two points $x$ and $y$ marked) is not.

For the remainder of the chapter, let $E = \mathbb{R}^d$ be a Euclidean space.

Definition 12.1 (Convexity). A set $C \subset E$ is convex if for any two points $x, y \in C$, the segment

$$[x, y] \doteq \{\lambda x + (1-\lambda) y : 0 \le \lambda \le 1\}$$

with endpoints $x$ and $y$ is contained in $C$.

The definition is illustrated in Fig. 12.1. Note that the full space $E$ is convex, as is the empty set $\emptyset$. The intersection of any number of convex sets is convex, but the union of convex sets is not necessarily convex; for example, the union of the disjoint intervals $[0, 1]$ and $[2, 3]$ is not.
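Definition 12.1 can be probed numerically. The sketch below is a sampling check, not a proof: it assumes a set is given as a membership predicate (the helper `segment_violation` and the two example discs are hypothetical) and searches sampled segments $[x, y]$ for a point that leaves the set. On this example, the intersection of two overlapping discs yields no violation, while their union does.

```python
from itertools import product

TOL = 1e-12  # small slack so that floating-point rounding on a boundary is not reported

def segment_violation(contains, points, steps=20):
    """Search for x, y in the set and a point z of [x, y] outside it.

    Returns (x, y, z) for the first violation found, or None. Finding no
    violation is evidence of convexity, not a proof.
    """
    inside = [p for p in points if contains(p)]
    for x, y in product(inside, repeat=2):
        for k in range(steps + 1):
            lam = k / steps
            z = tuple(lam * a + (1 - lam) * b for a, b in zip(x, y))
            if not contains(z):
                return x, y, z
    return None

def disc1(p):
    return p[0] ** 2 + p[1] ** 2 <= 1 + TOL

def disc2(p):
    return (p[0] - 1.5) ** 2 + p[1] ** 2 <= 1 + TOL

def intersection(p):
    return disc1(p) and disc2(p)

def union(p):
    return disc1(p) or disc2(p)

grid = [(i / 4, j / 4) for i in range(-4, 11) for j in range(-4, 5)]
print(segment_violation(intersection, grid))  # None: no violation found
print(segment_violation(union, grid))         # a triple (x, y, z) with z outside the union
```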

In what follows we will consider functions $f$ that map $E$ to the extended reals, $\overline{\mathbb{R}} \doteq \mathbb{R} \cup \{-\infty, +\infty\}$, and we define the domain of $f$, $\mathrm{dom}(f)$, as the set of those points in $E$ where $f$ takes on a value other than $+\infty$:

$$\mathrm{dom}(f) = \{x \in E : f(x) < +\infty\}.$$

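Multiplication and addition are extended to $\overline{\mathbb{R}}$ as follows: $x \cdot \infty = \infty$ and $x + \infty = \infty$ for any $x \in \overline{\mathbb{R}}$, while $(-\infty) \cdot (-\infty) = +\infty$ and $x \cdot (-\infty) = -\infty$, $x + (-\infty) = -\infty$ for $x \in \mathbb{R}$. In particular, $+\infty - (+\infty) = +\infty$ and $+\infty \cdot (-\infty) = +\infty$, which may look unusual but will be extremely useful for presenting our results concisely. We also extend comparisons over the reals to $\overline{\mathbb{R}}$ in the natural way: $-\infty \le x \le +\infty$ for any $x \in \overline{\mathbb{R}}$ and $-\infty < x < +\infty$ for any $x \in \mathbb{R}$.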
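Plain floating-point arithmetic does not follow these conventions: IEEE 754, which Python floats implement, defines $+\infty - (+\infty)$ as NaN. The following minimal sketch (the helper name `ext_add` is hypothetical) implements only the addition rule stated above, in which $+\infty$ wins; we additionally take $(-\infty) + (-\infty) = -\infty$. Multiplication could be handled in the same explicit way but is not shown.

```python
import math

INF = math.inf

def ext_add(x: float, y: float) -> float:
    """Addition on the extended reals in which +inf wins over -inf.

    x + (+inf) = +inf for every extended real x (including -inf),
    x + (-inf) = -inf for real x, and (-inf) + (-inf) = -inf.
    """
    if x == INF or y == INF:
        return INF
    if x == -INF or y == -INF:
        return -INF
    return x + y

print(INF - INF)           # nan under IEEE 754
print(ext_add(INF, -INF))  # inf: the convention +inf - (+inf) = +inf
print(ext_add(1.0, -INF))  # -inf
```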

Allowing $f$ to take on the value $+\infty$ is a classic “trick” of convex analysis that saves us from having to specify the domain of $f$ separately, thus shortening the presentation. While any value outside of $\mathbb{R}$ could have served to designate points that do not belong to the domain of $f$, the choice of $+\infty$ is especially handy, since it allows one to state constrained minimization problems concisely: minimizing a function that equals $+\infty$ outside a set $C$ amounts to minimizing it over $C$. Allowing functions to take on the value $-\infty$ is harder to defend; basically, it comes “with the package” once $+\infty$ is allowed, because we want to be able to deal with the negatives of functions, too. A function $f$ is called proper if its domain is nonempty and its range $\mathrm{ran}(f)$ excludes $-\infty$; otherwise it is improper.
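To make the constrained-minimization remark concrete: if $I_C$ denotes the function that is $0$ on a set $C$ and $+\infty$ outside it (a standard construction, built here by a hypothetical helper `indicator`), then minimizing $f + I_C$ over all of $E$ is the same as minimizing $f$ over $C$. The sketch assumes $E = \mathbb{R}$, $f(x) = (x-2)^2$, $C = [-1, 1]$, and a crude grid search in place of a real solver.

```python
import math

def indicator(contains):
    """Hypothetical helper: 0 on the set, +inf outside it."""
    return lambda x: 0.0 if contains(x) else math.inf

def f(x):
    return (x - 2.0) ** 2                       # objective, finite everywhere

i_c = indicator(lambda x: -1.0 <= x <= 1.0)     # constraint set C = [-1, 1]

def g(x):
    return f(x) + i_c(x)                        # equals +inf off C, so dom(g) = C

grid = [k / 100 for k in range(-300, 301)]
x_best = min(grid, key=g)                       # unconstrained search over the grid
print(x_best, g(x_best))                        # 1.0 1.0
```

The unconstrained minimizer of $f$ is $x = 2$, but there $g = +\infty$, so the search returns the constrained minimizer $x = 1$.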

The graph of a function $f$, denoted by $\mathrm{graph}(f)$, is the set of points $(x, f(x)) \in E \times \overline{\mathbb{R}}$. The epigraph of $f$, denoted by $\mathrm{epi}(f)$, is the set of points $(x, t) \in E \times \mathbb{R}$ such that $t \ge f(x)$ (see Fig. 12.2 for an example). The projection of $\mathrm{epi}(f)$ onto $E$ is the domain of $f$.

Definition 12.2 (Convex Function). A function $f : E \to \overline{\mathbb{R}}$ is convex if its epigraph is convex.
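Definition 12.2 can likewise be checked on samples. In this sketch (the function and grid are illustrative assumptions), $f(x) = |x|$ on $[-1, 1]$ and $f = +\infty$ elsewhere. Membership in $\mathrm{epi}(f)$ is the test $t \ge f(x)$, which automatically fails where $f(x) = +\infty$, so every sampled epigraph point projects into $\mathrm{dom}(f)$. The grid points and weights are dyadic, so the arithmetic below is exact.

```python
import math

def f(x: float) -> float:
    """|x| on [-1, 1] and +inf elsewhere, so dom(f) = [-1, 1]."""
    return abs(x) if -1.0 <= x <= 1.0 else math.inf

def in_epi(x: float, t: float) -> bool:
    # (x, t) is in epi(f) iff t >= f(x); never true where f(x) = +inf
    return t >= f(x)

# Sample points (x, t) and keep those in the epigraph.
pts = [(i / 4, j / 4) for i in range(-6, 7) for j in range(0, 9)]
epi_pts = [p for p in pts if in_epi(*p)]

# Convex combinations of epigraph points should stay in the epigraph.
ok = all(
    in_epi(lam * a + (1 - lam) * c, lam * b + (1 - lam) * d)
    for (a, b) in epi_pts
    for (c, d) in epi_pts
    for lam in (0.25, 0.5, 0.75)
)
print(ok)  # True
```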


Figure 12.2: A convex function over the real line, with the points $x$, $\alpha x + (1-\alpha) y$, $y$ and $a$ marked on the horizontal axis. The dashed area shows the epigraph of the function. The function takes on the value $+\infty$ to the right of the dashed line marked at $a$. Note that the value of $f$ at $a$ is finite (denoted by a dot), so $a$ belongs to the domain of $f$.

As a result, the domain of a convex function must also be convex, being the projection of the convex set $\mathrm{epi}(f)$ onto $E$. Of course, the above definition of convexity coincides with the one taught in high school:

Proposition 12.3. A function $f : E \to \overline{\mathbb{R}}$ is convex if and only if for any $x, y \in E$ and $0 < \alpha < 1$,

$$f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y). \tag{12.1}$$

The proof follows immediately from the definitions and is left as an exercise (cf. Exercise 12.9). Of course, (12.1) always holds (with equality) when $\alpha \in \{0, 1\}$. Functions that satisfy (12.1) with the inequality ($\le$) replaced by strict inequality ($<$) for all distinct $x, y \in \mathrm{dom}(f)$ are called strictly convex. We discuss this and other strengthened forms of convexity later.
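Inequality (12.1) and its strict version can be tested on sampled pairs. The helper `jensen_gaps` below is a hypothetical name; it returns the gaps $\alpha f(x) + (1-\alpha) f(y) - f(\alpha x + (1-\alpha) y)$, which (12.1) requires to be nonnegative and strict convexity requires to be positive for $x \ne y$. As with any sampling check, a passing result only means no counterexample was found. With dyadic sample points and weights, the computations below are exact in floating point.

```python
def jensen_gaps(f, xs, alphas):
    """Gaps alpha*f(x) + (1-alpha)*f(y) - f(alpha*x + (1-alpha)*y) over sampled x != y."""
    return [
        a * f(x) + (1 - a) * f(y) - f(a * x + (1 - a) * y)
        for x in xs
        for y in xs
        if x != y
        for a in alphas
    ]

xs = [k / 4 for k in range(-8, 9)]   # sample points in [-2, 2]
alphas = [0.25, 0.5, 0.75]

square_gaps = jensen_gaps(lambda x: x * x, xs, alphas)
abs_gaps = jensen_gaps(abs, xs, alphas)
cube_gaps = jensen_gaps(lambda x: x * x * x, xs, alphas)

print(min(square_gaps) > 0)  # True: consistent with strict convexity of x^2
print(min(abs_gaps) == 0)    # True: |x| is convex but not strictly convex
print(min(cube_gaps) < 0)    # True: x^3 violates (12.1) on [-2, 2]
```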

The advantage of defining convexity through epigraphs is that it makes statements such as “the pointwise maximum of convex functions is convex” obvious: the epigraph of the maximum is the intersection of the epigraphs. For other types of statements, such as that the sum of finitely many convex functions is convex, (12.1) is more useful (cf. Exercise 12.10).
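The epigraph argument can be seen directly: $t \ge \max(f_1(x), f_2(x))$ holds exactly when $t \ge f_1(x)$ and $t \ge f_2(x)$, so $\mathrm{epi}(\max(f_1, f_2)) = \mathrm{epi}(f_1) \cap \mathrm{epi}(f_2)$, an intersection of convex sets. The sketch below confirms this on a grid for the illustrative choice $f_1(x) = x$, $f_2(x) = -x$, whose maximum is $|x|$.

```python
def f1(x):
    return x       # affine, hence convex

def f2(x):
    return -x      # affine, hence convex

def g(x):
    return max(f1(x), f2(x))   # pointwise maximum; here g(x) = |x|

def in_epi(f, x, t):
    return t >= f(x)

grid = [(k / 4, m / 4) for k in range(-8, 9) for m in range(-8, 9)]

# Membership in epi(g) coincides with membership in both epi(f1) and epi(f2).
same = all(
    in_epi(g, x, t) == (in_epi(f1, x, t) and in_epi(f2, x, t))
    for x, t in grid
)
print(same)  # True
```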

Convex functions are very well behaved on the interior of their domain:

Theorem 12.4 (Continuity of Convex Functions). Let $f : E \to (-\infty, +\infty]$ be convex. Then $f$ is continuous on the interior of its domain and, in fact, for any $x \in \mathrm{int}\,\mathrm{dom}(f)$ there exist an open neighborhood $U$ of $x$ and $L > 0$ such that for any $u, v \in U$, $|f(u) - f(v)| \le L\|u - v\|_2$, i.e., $f$ is locally Lipschitz on the interior of its domain.

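Denoting by $\mathrm{cont}\, f$ the set of points in the domain of $f$ where $f$ is continuous, the result can be summarized as $\mathrm{int}\,\mathrm{dom}(f) \subset \mathrm{cont}\, f$. Since there can be no point of continuity of $f$ in $\mathrm{dom}(f) \setminus \mathrm{int}\,\mathrm{dom}(f)$ (every neighborhood of such a point contains points where $f = +\infty$), we also have $\mathrm{int}\,\mathrm{dom}(f) = \mathrm{cont}\, f$.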
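The Lipschitz constant in Theorem 12.4 is local: it may grow without bound near the boundary of the domain. A one-dimensional sketch, assuming the illustrative function $f(z) = -\log z$ with $\mathrm{dom}(f) = (0, \infty)$, estimates the Lipschitz constant on $[x - r, x + r]$ from secant slopes between neighboring grid points (the helper name is hypothetical). Near $x = 1$ the estimate is about $2$, while near $x = 0.1$ it is about $20$, matching $|f'(z)| = 1/z$ at the left endpoints.

```python
import math

def local_lipschitz_estimate(f, x, r, n=1000):
    """Largest |secant slope| between neighboring grid points of [x - r, x + r].

    By the mean value theorem this never exceeds the true Lipschitz constant
    of a differentiable f on the interval; it approaches it as n grows.
    """
    pts = [x - r + 2 * r * k / n for k in range(n + 1)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(pts, pts[1:]))

def f(z):
    return -math.log(z)   # convex on (0, inf); |f'(z)| = 1/z

print(local_lipschitz_estimate(f, 1.0, 0.5))   # close to 2  = |f'(0.5)|
print(local_lipschitz_estimate(f, 0.1, 0.05))  # close to 20 = |f'(0.05)|
```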

We call $f : E \to \overline{\mathbb{R}}$ lower semicontinuous (l.s.c.) if for any $x \in E$, $\liminf_{u \to x} f(u) \ge f(x)$.

When $E = \mathbb{R}$, any proper, convex, l.s.c. function is continuous on the closure of its domain. However, the same does not hold even when $E = \mathbb{R}^2$ (cf. Exercise 12.5). Also, Theorem 12.4 strongly exploits that $E$ is finite dimensional. In fact, the theorem does not hold in general in infinite-dimensional spaces: for example, a discontinuous linear functional on an infinite-dimensional normed space is convex and finite everywhere, yet it is continuous nowhere.
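The failure in $\mathbb{R}^2$ can be seen on a standard example (offered here as an illustration, not necessarily the one Exercise 12.5 has in mind): $f(x, y) = y^2/x$ for $x > 0$, $f(0, 0) = 0$, and $f = +\infty$ otherwise. This function is proper, convex and l.s.c., yet along every parabola $x = y^2/c$ it stays equal to $c$, so it is not continuous at the origin, which lies in the closure of its domain.

```python
import math

def f(x: float, y: float) -> float:
    """Quadratic-over-linear function, extended to be l.s.c. at the origin."""
    if x > 0:
        return y * y / x
    if x == 0 and y == 0:
        return 0.0
    return math.inf

# Approach the origin along the parabola x = y^2 / c: f stays equal to c.
c = 3.0
values = [f(t * t / c, t) for t in (1.0, 0.1, 0.01, 0.001)]
print(values)       # each entry equals 3.0 up to rounding
print(f(0.0, 0.0))  # 0.0, so f is not continuous at the origin
```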

Proof. First we show that $f$ is locally Lipschitz around a point $x$ in its domain if (and only if) it is bounded above on a neighborhood of $x$. For this, let $U \subset E$ be an open ball around $x$ of radius $2r$ on which $f$ is bounded from above by, say, $C > 0$. We will show that $f$ is Lipschitz on the open ball $U'$ around $x$ that has radius $r$. (The balls are defined w.r.t. the same norm $\|\cdot\|$.) Take $u, v \in U'$ with $u \ne v$, and let $w = v + r(v-u)/\|v-u\|$ and $\alpha = r/(r + \|v-u\|)$. Then $w \in U$, since $\|w - x\| \le \|v - x\| + r < 2r$, and $v = \alpha u + (1-\alpha) w$. Hence,

$$f(v) - f(u) = f(\alpha u + (1-\alpha) w) - f(u) \le \alpha f(u) + (1-\alpha) f(w) - f(u) \le (1-\alpha)(C - f(u)).$$

Now, using that $x - (u - x) \in U$ and the convexity of $f$,

$$f(u) = 2 \cdot \tfrac{1}{2} f(u) = 2\left(\tfrac{1}{2} f(x + (u-x)) + \tfrac{1}{2} f(x - (u-x)) - \tfrac{1}{2} f(x - (u-x))\right) \ge 2\left(f(x) - \tfrac{1}{2} C\right) \ge 2(f(x) - C),$$

where the last step uses $C > 0$. Hence,

$$f(v) - f(u) \le (1-\alpha)(3C - 2f(x)) = \frac{\|v-u\|}{r + \|v-u\|}\,(3C - 2f(x)) \le \|v-u\|\,\frac{|3C - 2f(x)|}{r}.$$

Reversing the roles of $u$ and $v$ shows that $f$ is indeed Lipschitz around $x$ on $U'$ with Lipschitz coefficient $|3C - 2f(x)|/r$.

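Now, take any point $x \in \mathrm{int}\,\mathrm{dom}(f)$. The result follows if we show that $f$ is bounded above on $N$ for some neighborhood $N$ of $x$. Since $x \in \mathrm{int}\,\mathrm{dom}(f)$, we can choose $d + 1$ affinely independent vectors $u_1, \dots, u_{d+1} \in \mathrm{dom}(f)$ such that $x$ lies in the interior of their convex hull (e.g., the vertices of a small simplex centered at $x$). Choose $N$ to be this convex hull, which then contains an open ball around $x$:

$$N = \left\{ \sum_{i=1}^{d+1} \alpha_i u_i : \sum_{i=1}^{d+1} \alpha_i = 1,\ \alpha_i \ge 0 \right\},$$

and pick any $v = \sum_{i=1}^{d+1} \alpha_i u_i \in N$. By the convexity of $f$ (applying (12.1) repeatedly),

$$f(v) \le \sum_{i=1}^{d+1} \alpha_i f(u_i) \le \max_i f(u_i) < +\infty,$$

finishing the proof.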
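Both steps of the proof can be sanity-checked numerically. The sketch below assumes the illustrative convex function $f(z) = -\sum_i \log z_i$ on the open positive orthant of $\mathbb{R}^2$ ($+\infty$ elsewhere), the point $x = (1, 1)$, and hand-picked $r$, $u$, $v$ and $\delta$. Step 1 checks the construction of $w$ and $\alpha$ and the resulting Lipschitz inequality with coefficient $|3C - 2f(x)|/r$. Step 2 uses one concrete choice of $u_1, \dots, u_{d+1}$, namely $x + \delta e_i$ for $i \le d$ together with $x - \delta(1, \dots, 1)$, whose centroid is $x$, and checks $f(v) \le \max_i f(u_i)$ on sampled convex combinations. This checks one instance; it is not a proof.

```python
import math
from itertools import product

def norm(z):
    return math.sqrt(sum(c * c for c in z))

def f(z):
    """Illustrative convex function: -sum(log z_i) on the open positive orthant, +inf elsewhere."""
    return -sum(math.log(c) for c in z) if all(c > 0 for c in z) else math.inf

x = [1.0, 1.0]

# Step 1: Lipschitz bound from an upper bound C on the ball U of radius 2r around x.
r = 0.2
C = -2 * math.log(0.6)            # points of U have coordinates > 0.6, so f < C on U
L = abs(3 * C - 2 * f(x)) / r     # Lipschitz coefficient from the proof
u, v = [1.1, 0.95], [0.9, 1.12]   # two points of the ball U' of radius r around x
d = norm([b - a for a, b in zip(u, v)])
w = [b + r * (b - a) / d for a, b in zip(u, v)]
alpha = r / (r + d)
v_again = [alpha * a + (1 - alpha) * c for a, c in zip(u, w)]
step1 = (
    norm([p - q for p, q in zip(v_again, v)]) < 1e-12   # v = alpha u + (1 - alpha) w
    and norm([c - e for c, e in zip(w, x)]) < 2 * r      # w lies in U
    and abs(f(v) - f(u)) <= L * d                        # the Lipschitz inequality
)

# Step 2: a simplex around x inside dom(f): vertices x + delta e_i and x - delta (1, ..., 1).
delta = 0.5
verts = [[xj + (delta if j == i else 0.0) for j, xj in enumerate(x)] for i in range(len(x))]
verts.append([xj - delta for xj in x])
bound = max(f(q) for q in verts)  # finite, since every vertex lies in dom(f)
step2 = True
for weights in product(range(4), repeat=len(verts)):
    total = sum(weights)
    if total == 0:
        continue
    point = [sum(wi * q[j] for wi, q in zip(weights, verts)) / total for j in range(len(x))]
    step2 = step2 and f(point) <= bound

print(step1, step2)
```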

Oftentimes, the domain of a convex function we deal with is a subset of some hyperplane of $E$. For example, the domain could be the simplex or a two-dimensional disc in a three-dimensional space. The interior of a subset of a hyperplane of $E$ is always empty; hence the previous result is vacuous for such functions. Nevertheless, intuition suggests that the result should remain valid when the definition of interior is appropriately modified: after all, it should not matter whether a two-dimensional disc is contained in a two-dimensional or a three-dimensional space. Indeed, there exists a refined notion of interior that allows us to make this intuition formal.

Before introducing this concept, we need some further terminology. First, recall that hyperplanes are examples of affine sets (also called affine subspaces): a set $H \subset E$ is called affine if for any pair of distinct points $x, y$ of $H$, the entire line passing through $x$ and $y$ lies in $H$, that is, $\lambda x + (1-\lambda) y \in H$ for any $\lambda \in \mathbb{R}$. The affine hull of a set $S \subset E$ is the smallest affine set containing $S$. The affine hull of $S$ is denoted by $\mathrm{aff}(S)$ and satisfies

$$\mathrm{aff}(S) = \left\{ \sum_{i=1}^{k} \lambda_i x_i \;:\; k \in \mathbb{N},\ \lambda_1, \dots, \lambda_k \in \mathbb{R} \text{ with } \sum_{i=1}^{k} \lambda_i = 1,\ x_1, \dots, x_k \in S \right\}.$$

In other words, the affine hull of $S$ is the set of affine combinations of finitely many points of $S$. With this, we can define the refinement of interior:

Definition 12.5 (Relative Interior). The relative interior of a set $S \subset E$ is the set of points $x \in S$ for which there exists an open ball $B$ of $E$ containing $x$ such that $\mathrm{aff}(S) \cap B \subset S$. We denote the relative interior of a set $S$ by $\mathrm{ri}(S)$.
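For finite sets these notions can be computed. The sketch below (all helper names are hypothetical) decides $p \in \mathrm{aff}(S)$ exactly for finite $S$ by comparing matrix ranks over the rationals, using that $p \in \mathrm{aff}(\{x_1, \dots, x_k\})$ if and only if $p - x_1$ lies in the span of $x_2 - x_1, \dots, x_k - x_1$. It then illustrates Definition 12.5 on the simplex in $\mathbb{R}^3$, whose interior is empty: a crude probe test, which moves a point by $\pm\varepsilon$ along a few directions, fails for the center along the coordinate directions but succeeds along the directions $e_i - e_j$ of the affine hull. The probe test is a heuristic with a fixed radius $\varepsilon$, not a decision procedure.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, computed exactly with Fractions."""
    m = [[Fraction(c) for c in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def in_affine_hull(p, points):
    """p is in aff(points) iff p - x_1 lies in the span of x_i - x_1 (i >= 2)."""
    x1 = points[0]
    diffs = [[Fraction(a) - Fraction(b) for a, b in zip(xi, x1)] for xi in points[1:]]
    q = [Fraction(a) - Fraction(b) for a, b in zip(p, x1)]
    if not diffs:
        return all(c == 0 for c in q)
    return rank(diffs + [q]) == rank(diffs)

def in_simplex(p, tol=1e-12):
    """Simplex in R^n: nonnegative coordinates summing to one (up to tol)."""
    return all(c >= -tol for c in p) and abs(sum(p) - 1) <= tol

def probes_inside(p, directions, eps=1e-3):
    """Heuristic: do p +/- eps * d stay in the simplex for every probe direction d?"""
    return all(
        in_simplex([c + s * eps * dc for c, dc in zip(p, dvec)])
        for dvec in directions
        for s in (1, -1)
    )

n = 3
S = [[1 if j == i else 0 for j in range(n)] for i in range(n)]   # vertices e_1, e_2, e_3
full_dirs = S                                                      # directions of E = R^3
aff_dirs = [[a - b for a, b in zip(S[i], S[j])] for i in range(n) for j in range(n) if i != j]

center, vertex = [1 / 3] * n, [1.0, 0.0, 0.0]
print(in_affine_hull([2, -1, 0], S), in_affine_hull([1, 1, 1], S))  # True False
print(probes_inside(center, full_dirs))  # False: the simplex has empty interior in R^3
print(probes_inside(center, aff_dirs))   # True: the center is in the relative interior
print(probes_inside(vertex, aff_dirs))   # False: a vertex is not
```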

Affine sets come up in connection with convex functions because $\mathrm{dom}(f)$ is convex. It is easy to see that any nonempty convex set $C \subset \mathbb{R}^d$ has a nonempty relative interior. For $x \in E$ and $r > 0$, let $\mathrm{Ball}(x, r) = \{y \in E : \|x - y\| \le r\}$ be the (closed) ball centered at $x$ with radius $r$. It is then easy to modify the proof of Theorem 12.4 to get the following stronger result:


In document Online learning: Algorithms for Big Data (pages 149-153)