BÁLINT TÓTH
LIMIT THEOREMS OF PROBABILITY THEORY
2011
These lecture notes are intended for students who chose stochastics as their area of specialization. It is assumed that students have a solid background in probability theory (with measure-theoretic foundations) and analysis.
The following material covers: ergodic theorems (von Neumann's and Birkhoff's); limit theorems "with bare hands": Lévy's arcsine laws, sojourn time and local time of 1d random walk; the method of moments with applications; the method of characteristic functions: Lindeberg's theorem with applications, the Erdős–Kac theorem (CLT for the number of prime divisors), various other applications; stable laws and stable limits with applications; infinitely divisible distributions, the Lévy–Khinchin formula and elements of Lévy processes. With lots of problems for solution and applications.
Key words and phrases: Ergodic theorems, limit theorems, characteristic functions, Lindeberg's theorem, Erdős–Kac theorem, stable laws, infinitely divisible distributions, Lévy–Khinchin formula.
Support: "[…] physics) in technical and information science higher education", Grant No. TÁMOP-4.1.2-08/2/A/KMR-2009-0028.
Prepared under the editorship of Budapest University of Technology and Economics, Mathematical Institute.
Professional manager:
Miklós Ferenczi
Referee:
János Krámli
Prepared for electronic publication by:
Bálint Vető
Title page design by:
Gergely László Csépány, Norbert Tóth
ISBN: 978-963-279-454-9
Copyright: 2011–2016, Bálint Tóth, BME
"Terms of use: This work can be reproduced, circulated, published and performed for non-commercial purposes without restriction by indicating the author's name, but it cannot be modified."
1 Stationary sequences, ergodic theorems
1.1 Stationary sequences of random variables
1.1.1 Examples of stationary sequences
1.1.2 Measure preserving transformations, dynamical systems
1.1.3 The invariant sigma-algebra, ergodicity
1.2 Koopmanism and von Neumann's ergodic theorem
1.3 Birkhoff's "individual" ergodic theorem
1.4 Back to the examples
2 Convergence in distribution
2.1 Convergence in distribution, basics
2.1.1 The special case of R (or R^d)
2.1.2 Examples for weak convergence
2.1.3 Tightness
2.2 Methods for proving weak convergence
2.3 With bare hands
2.3.1 Arcsine laws and related stuff
3 Moments and characteristic functions
3.1 The method of moments
3.1.1 Weak limit from convergence of moments
3.1.2 Application 1: CLT with the method of moments
3.2 The method of characteristic functions
3.3 Erdős–Kac theorem
3.4 Limit theorem for the coupon collector
4 Lindeberg's theorem and its applications
4.1 Triangular arrays of random variables
4.2 Application 1: CLT for the number of records
4.3 Application 2: CLT in the "borderline" case
5 Stable distributions and stable limits
5.1 Affine equivalence
5.2 Stability
5.3 Examples
5.4 Symmetric stable laws
5.5 Examples, applications
5.6 Without symmetry
6 Infinitely divisible distributions
6.1 Infinite divisibility
6.2 Examples
6.3 Back to the examples
6.4 Lévy measure of stable laws
6.4.1 Poisson point processes
6.4.2 Back to stable convergence
Stationary sequences, ergodic theorems
1.1 Stationary sequences of random variables
• (Ω,F,P) a probability space
• (S,S) a measurable space
• ξj : Ω → S measurable functions, j ∈ N (or j ∈ Z)

Definition 1.1. The sequence of (S-valued) random variables ξj is stationary iff (∀k ∈ N) (or (∀k ∈ Z)) and (∀l ≥ 0):

distrib(ξ0, ξ1, . . . , ξl) = distrib(ξk, ξk+1, . . . , ξk+l)
Elementary remarks:
Remark 1.1. A stationary sequence (ξj)j∈N can always be embedded into a stationary sequence (ξj)j∈Z.
Remark 1.2. If (ξj)j∈Z is a stationary sequence of (S,S)-valued random variables, (S̃,S̃) is another measurable space, g : S^Z → S̃ is a measurable map, and

ξ̃j := g(. . . , ξj−1, ξj, ξj+1, . . .),

then (ξ̃j)j∈Z is a stationary sequence of (S̃,S̃)-valued random variables.

The essential content of ergodic theorems: generalizations of the laws of large numbers.
If (Xj)j≥0 is a stationary sequence of R-valued random variables such that E(|Xj|) < ∞, then

(1/n) Σ_{j=0}^{n−1} Xj → E(X1)

asymptotic time averages = state-space averages

– almost surely and in L1(Ω,F,P) (Birkhoff, difficult);
– in L2(Ω,F,P) (von Neumann, easier).
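As a numerical illustration (an addition to the notes; the angle θ and the observable f are arbitrary choices), the time average along a single orbit of the irrational circle rotation of Example 1.5 below converges to the state-space average ∫ f dP:

```python
import math

# Ergodic rotation of the circle: T(omega) = {omega + theta}, theta irrational.
# The Birkhoff time average of f along one orbit approaches the space average.
theta = math.sqrt(2) - 1                        # irrational rotation angle
f = lambda w: math.cos(2 * math.pi * w) ** 2    # observable; its integral over [0,1) is 1/2

omega = 0.123                                   # arbitrary starting point
n = 200_000
s = 0.0
for _ in range(n):
    s += f(omega)
    omega = (omega + theta) % 1.0               # apply T

time_average = s / n
space_average = 0.5                             # = integral of f over [0,1)
```

For this smooth observable the convergence is very fast; for a general f ∈ L1 the theorems below only guarantee a.s. convergence, with no rate.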
1.1.1 Examples of stationary sequences
Example 1.1 (I.i.d. sequences). (ξj)j∈Z i.i.d. sequence of (S,S)-valued random variables.
Example 1.2 (Finitely dependent sequences). Let (ξj)j∈Z be an i.i.d. sequence of (S,S)-valued random variables, (S̃,S̃) another measurable space, m ≥ 0 (fixed), g : S^{m+1} → S̃ a measurable map. Then

ξ̃j := g(ξj, . . . , ξ_{j+m})

is an (S̃,S̃)-valued stationary sequence. E.g. ξj i.i.d. Bernoulli, ξ̃j := max{ξj, ξj+1}.
Example 1.3 (Example 3a, 3b). (ξj)j∈Z i.i.d. Bernoulli, P(ξj = 0) = 1/2 = P(ξj = 1).

ζj := Σ_{k=0}^{∞} 2^{−k−1} ξ_{j+k},  ηj := Σ_{k=0}^{∞} 2^{−k−1} ξ_{j−k}

Then:

distrib(ζj) = UNI[0,1] = distrib(ηj).

Remarks:

ζj+1 = {2ζj} := 2ζj − [2ζj] deterministically!

(ηj)j≥0 is a Markov chain on [0,1].
Example 1.4 (Stationary Markov chains). Let S be a finite or countable state space, P = (Pα,β)α,β∈S a stochastic matrix, π : S → [0,1] with Σ_{α∈S} π(α) = 1 stationary for P:

Σ_{α∈S} π(α) Pα,β = π(β).

(ξj)j≥0 the stationary Markov chain:

P(ξ0 = α0, ξ1 = α1, . . . , ξl = αl) = π(α0) Pα0,α1 · · · Pαl−1,αl
Example 1.5 (Rotations of the circle). S = [0,1), S = Borel, P = Lebesgue.

θ ∈ (0,1) (fixed), ξj(ω) := {ω + jθ}, j ∈ Z

Example 1.6 ("Bernoulli shift"). (See also Example 3a.) S = [0,1), S = Borel, P = Lebesgue.

ξj(ω) := {2^j ω} = 2^j ω − [2^j ω], j ≥ 0
1.1.2 Measure preserving transformations, dynamical systems
Definition 1.2. Let (Ω,F,P) be a probability space. A measurable transformation T : Ω → Ω is measure preserving if

∀A ∈ F : P(T^{−1}A) = P(A).

We call (Ω,F,P,T) an endomorphism or a dynamical system. If T is a.s. invertible we call it an automorphism.
Let (S,S) be another measurable space and g : Ω → S a measurable function. Then

ξj := g(T^j ω)

is a stationary sequence of S-valued random variables.
Remark 1.3. Any stationary sequence of random variables can be realized this way!

(S,S) measurable space, (ξj)j≥0 stationary sequence of S-valued random variables.

Ω := S^N = {ω = (ω0, ω1, ω2, . . .) : ωj ∈ S}
F := σ(S × S × S × . . .)
P := joint distribution of (ξj)j≥0
T : Ω → Ω, (Tω)j = ωj+1
g : Ω → S, g(ω) := ω0
1.1.3 The invariant sigma-algebra, ergodicity
Definition 1.3. Let (Ω,F,P,T) be an endomorphism. Then

I := {A ∈ F : P(A △ T^{−1}A) = 0} ⊂ F

is the sub-sigma-algebra of invariant sets.
Definition 1.4. The dynamical system (Ω,F,P,T) is ergodic iff the invariant sigma-algebra I is trivial with respect to P:

∀A ∈ I : P(A) ∈ {0,1}.
Remark 1.4. Equivalently: (Ω,F,P,T) is ergodic iff for every measurable f : Ω → R

f(Tω) = f(ω) a.s. ⇔ f(ω) = const. a.s.
Example 1.7 (I.i.d. sequence). See Example 1.1. (S,S,P1) a probability space,

Ω := S^N = {ω = (ω0, ω1, ω2, . . .) : ωj ∈ S}
F := σ(S × S × S × . . .)
P := P1 × P1 × P1 × . . .
T : Ω → Ω, (Tω)j = ωj+1
Theorem 1.1. The endomorphism (Ω,F,P, T) is ergodic.
Proof. The tail sigma-algebra is

T := ∩_n σ(ωn, ωn+1, ωn+2, . . .)

Fact: I ⊂ T. Not very difficult.

Kolmogorov's 0-1 law: T is P-trivial.
Example 1.8 (Factors). See Example 1.2 and Example 1.3. Let (Ω,F,P,T) and (Ω̃,F̃,P̃,T̃) be dynamical systems and ϕ : Ω → Ω̃ a measurable map such that

P(ϕ^{−1}(A)) = P̃(A) ∀A ∈ F̃,
ϕ ∘ T = T̃ ∘ ϕ P-a.s.;

then (Ω̃,F̃,P̃,T̃) is a factor of (Ω,F,P,T).

Theorem 1.2. If (Ω̃,F̃,P̃,T̃) is a factor of (Ω,F,P,T) and (Ω,F,P,T) is ergodic, then so is (Ω̃,F̃,P̃,T̃).
Proof. Homework.
Example 1.9 (Ergodic Markov chains). See Example 1.4. The state space: (S,S) finite or countable. The stochastic matrix P = (Pα,β)α,β∈S, π a probability measure on S, stationary for P: πP = π.

Ω := S^N = {ω = (ω0, ω1, ω2, . . .) : ωj ∈ S}
F := σ(S × S × S × . . .)
P(ω0, ω1, . . . , ωl) = π(ω0) Pω0,ω1 · · · Pωl−1,ωl
T : Ω → Ω, (Tω)j = ωj+1
Theorem 1.3. The dynamical system (Ω,F,P,T) is ergodic iff P is irreducible.
Proof. Proof of ⇒: trivial.

Proof of ⇐: Denote Fn := σ(ω0, . . . , ωn) and let A ∈ I. Then E(11A | Fn) is a bounded martingale w.r.t. the filtration Fn, and

E(11A | Fn)(ω) (1)= E(11A ∘ T^n | Fn)(ω) (2)= h(ωn)

(1): due to invariance of A,
(2): due to the Markov property.

Due to the martingale convergence theorem

h(ωn) = E(11A | Fn)(ω) →a.s. E(11A | F∞)(ω) = 11A(ω).

This can hold only if h ≡ const.
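A numerical sketch of the law of large numbers that this ergodicity delivers (made precise in Example 1.17 below); the two-state chain, the observable f and the random seed are arbitrary choices, not from the notes:

```python
import random

random.seed(1)

# Irreducible two-state Markov chain on S = {0, 1}.
P = {0: [0.9, 0.1],                     # row alpha of the transition matrix
     1: [0.2, 0.8]}
pi = (2 / 3, 1 / 3)                     # stationary distribution: solves pi P = pi
f = lambda a: 3.0 if a == 0 else -1.0   # arbitrary observable

state = 0
n = 200_000
s = 0.0
for _ in range(n):
    s += f(state)
    state = 0 if random.random() < P[state][0] else 1

time_average = s / n
space_average = pi[0] * f(0) + pi[1] * f(1)   # = 5/3
```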
Example 1.10 (Rotations of the circle). See Example 1.5. Ω = [0,1), F =Borel, P=Lebesgue, T ω:={ω+θ}
Theorem 1.4. The dynamical system (Ω,F,P,T) is ergodic iff θ is irrational.
Proof. Fourier method: let f ∈ L2(Ω,F,P),

f(ω) = Σ_{k=−∞}^{∞} ck e^{i2πkω} (in L²),  ck = ∫_0^1 e^{−i2πkω} f(ω) dω.

Then

f(ω) = f(Tω) a.s. ⇔ ∀k ∈ Z: ck (e^{i2πkθ} − 1) = 0

⇔ if θ ∉ Q: ck = 0 for all k ≠ 0, so f is a.s. constant;
if θ = p/q ∈ Q: ck = 0 unless k = mq, so nonconstant invariant f exist.
Example 1.11 (“Bernoulli shift”). See Example 1.6. Ω = [0,1), F = Borel, P=Lebesgue, T ω:={2ω}
Theorem 1.5. The dynamical system (Ω,F,P, T) is ergodic.
Proof. (See Example 1.1.) Let Ω̃ = {0,1}^N, F̃ = . . ., P̃ = (1/2 : 1/2)-Bernoulli, T̃ = left shift,

ϕ : Ω̃ → Ω,  ϕ(ω̃) := Σ_{j=0}^{∞} 2^{−j−1} ω̃j,
ϕ^{−1} : Ω → Ω̃,  ϕ^{−1}(ω)j := [2^j ω] mod 2.

Then (Ω,F,P,T) and (Ω̃,F̃,P̃,T̃) are in one-to-one correspondence via ϕ, and (Ω̃,F̃,P̃,T̃) is ergodic, according to Example 1.1.
Alternative proof: by Fourier method (Homework).
Example 1.12 (Algebraic automorphism of the 2-d torus). Ω = [0,1)×[0,1), F =Borel, P=Lebesgue,
T(x, y) := ({x+ 2y},{x+y}) (picture on blackboard)
Example 1.13 (The “Baker’s Transformation”). Ω = [0,1)×[0,1), F = Borel, P = Lebesgue,

T(x, y) := ({2x}, {2x + y/2}) (picture on blackboard)

In both examples:
Theorem 1.6. The dynamical system (Ω,F,P, T) is ergodic.
Proof.
• Proof 1: Fourier method.
• Proof 2: “Markov partition”.
Example 1.14 (Statistical physics).
• Ω = phase space of a physical particle system,
• F = Borel,
• P = Liouville measure = Lebesgue measure restricted to the manifold of conserved quantities,
• Tt := Newtonian dynamical flow.
Theorem 1.7 (Liouville’s theorem). The dynamical flow t ↦ Tt conserves the measure. I.e., (Ω,F,P,Tt) is a continuous-time dynamical system.
Ludwig Boltzmann’s ergodic hypothesis: In the physically relevant cases, (Ω,F,P, Tt) is ergodic.
Major open question! Answer known in very few cases.
1.2 Koopmanism and von Neumann’s (mean, L²) ergodic theorem
• (Ω,F,P, T): dynamical system,
• F ⊃ I: its invariant sigma-algebra,
• H:=L2(Ω,F,P): Hilbert space of square integrable functions,
• K := L2(Ω,I,P) = {f ∈ H : f(Tω) = f(ω) P-a.s.}: subspace of T-invariant L2-functions.
Two linear operators:
Π : H → K,  Πf(ω) := E(f | I)(ω),
U : H → H,  Uf(ω) := f(Tω)
• Π is the orthogonal projection to the subspace K
• U is Koopman’s representation of the action T.
• K= Ker(U −I) ={f ∈ H:U f =f}
Lemma 1.1. U is a (partial) isometry.
Proof.

(Uf, Ug) = ∫_Ω f(Tω) g(Tω) dP(ω) (1)= ∫_Ω f(ω) g(ω) dP(ω) = (f, g)

(1): due to invariance of the measure under the action T.
Remark 1.5. If T is a.s. invertible then U is unitary.
Theorem 1.8 (von Neumann’s mean ergodic theorem). Let
• H: a separable Hilbert space,
• U ∈ B(H): a (partial) isometry,
• K:= Ker(U −I),
• Π: the orthogonal projection to the closed subspace K.
Then

st-lim_{n→∞} (1/n) Σ_{j=0}^{n−1} U^j = Π,

that is,

∀f ∈ H: lim_{n→∞} ‖ (1/n) Σ_{j=0}^{n−1} U^j f − Πf ‖ = 0.
Corollary 1.1. (Ω,F,P,T): a dynamical system, I: its invariant sigma-algebra. If f ∈ L2(Ω,F,P) then

lim_{n→∞} ∫_Ω | (1/n) Σ_{j=0}^{n−1} f(T^j ω) − E(f | I)(ω) |² dP(ω) = 0.

In particular, if (Ω,F,P,T) is ergodic then

L2-lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j ω) = ∫_Ω f dP.
Proof of von Neumann’s mean ergodic theorem.

H (1)= cl Ran(U − I) ⊕ Ker(U* − I) (2)= cl Ran(U − I) ⊕ Ker(U − I)

(1): ∀A ∈ B(H): H = cl Ran A ⊕ Ker A*,
(2): since U ∈ B(H) is an isometry, Ker(U* − I) = Ker(U − I). (Homework)

For f ∈ Ker(U − I):

Uf = f = Πf ⇒ (1/n) Σ_{j=0}^{n−1} U^j f = Πf.

For f ∈ cl Ran(U − I): (∀ε > 0) (∃g, h ∈ H) such that ‖h‖ < ε and f = Ug − g + h. Thus:

(1/n) Σ_{j=0}^{n−1} U^j f = (1/n)(U^n g − g) + (1/n) Σ_{j=0}^{n−1} U^j h

and hence

‖ (1/n) Σ_{j=0}^{n−1} U^j f ‖ ≤ (2/n)‖g‖ + ε.
1.3 Birkhoff’s “individual” (pointwise, almost sure) ergodic theorem
Theorem 1.9 (Birkhoff’s individual ergodic theorem). (Ω,F,P,T): a dynamical system, I: its invariant sigma-algebra. If f ∈ L1(Ω,F,P) then

(1/n) Σ_{j=0}^{n−1} f(T^j ·) → E(f | I)(·) P-a.s. and in L1(Ω,F,P).

In particular, if (Ω,F,P,T) is ergodic then

(1/n) Σ_{j=0}^{n−1} f(T^j ·) → ∫_Ω f dP P-a.s. and in L1(Ω,F,P).
Proof [Birkhoff 1931, Yosida & Kakutani 1939, Garsia 1965].

Xj = Xj(ω) := f(T^j ω),  X := X0,  Sk = Sk(ω) := Σ_{j=0}^{k−1} Xj(ω),  S0 = 0,
Mk = Mk(ω) := max{Sj(ω) : j = 0,1, . . . , k},  M0 = 0.
Lemma 1.2 (The maximal ergodic lemma).

E(X 11{Mk > 0}) ≥ 0

Explicitly spelled out:

∫_Ω f(ω) 11{Mk(ω) > 0} dP(ω) ≥ 0

Mind the strict inequality: Mk > 0!
Proof of the maximal lemma (Garsia 1965).

X(ω) (1)= max{Sj(ω) : j = 1, . . . , k+1} − max{Sj(Tω) : j = 0, . . . , k}
 ≥ max{Sj(ω) : j = 1, . . . , k} − max{Sj(Tω) : j = 0, . . . , k}
 = max{Sj(ω) : j = 1, . . . , k} − Mk(Tω)

(1): since Sj+1(ω) = X(ω) + Sj(Tω), j = 0, 1, . . ..

Hence

∫_Ω X(ω) 11{Mk(ω) > 0} dP(ω)
 ≥ ∫_Ω ( max{Sj(ω) : j = 1, . . . , k} − Mk(Tω) ) 11{Mk(ω) > 0} dP(ω)
 (2)= ∫_Ω ( Mk(ω) − Mk(Tω) ) 11{Mk(ω) > 0} dP(ω)
 ≥ ∫_Ω ( Mk(ω) − Mk(Tω) ) dP(ω) (3)= 0.

(2): here we use the strict inequality Mk > 0,
(3): due to invariance of the measure under the action T.
Proof of Birkhoff’s theorem. Without loss of generality assume E(f | I) = 0. Fix ε > 0 and define

L(ω) := lim sup_{n→∞} Sn(ω)/n,  Dε := {ω : L(ω) > ε} ∈ I,
Xε(ω) := (X(ω) − ε) 11Dε(ω),  Skε(ω) := Σ_{j=0}^{k−1} Xε(T^j ω),
Mkε(ω) := max{Sjε(ω) : j = 0, . . . , k},  Fε := ∪k {ω : Mkε(ω) > 0}.

Note that

Fε = {ω : sup_k Mkε(ω) > 0} = {ω : sup_k Skε(ω) > 0} = Dε

and

0 (1)≤ E(Xε 11{Mkε > 0}) (2)→ E(Xε 11Fε) (3)= E(Xε 11Dε) (4)= E((X − ε) 11Dε) (5)= −ε P(Dε)

(1): due to the maximal lemma,
(2): dominated convergence (as k → ∞),
(3): since Fε = Dε,
(4): by definition of Xε,
(5): since Dε ∈ I and E(X | I) = 0.

It follows that ∀ε > 0: P(Dε) = 0, and

P(L > 0) = P(∪ε>0 Dε) = lim_{ε→0} P(Dε) = 0.
1.4 Back to the examples
Example 1.15 (I.i.d. sequence). See Example 1.1. Xj i.i.d., E(|Xj|) < ∞.

(1/n) Σ_{j=0}^{n−1} Xj → E(Xj).

Laws of large numbers.
Example 1.16 (Factors). See Example 1.2 and Example 1.3. Laws of large numbers for factors of i.i.d. sequences.
Example 1.17 (Stationary denumerable Markov chains). See Example 1.4.

ξj: stationary MC on S = ∪m S(m) (S(m) the irreducible components), f : S → R with Σ_{α∈S} π(α)|f(α)| < ∞. Then

(1/n) Σ_{j=0}^{n−1} f(ξj) → Σ_m 11{ξ0 ∈ S(m)} · ( Σ_{α∈S(m)} π(α) f(α) ) / ( Σ_{α∈S(m)} π(α) ).

Law of large numbers for MC.
Example 1.18 (Rotations of the circle). See Example 1.5. θ ∉ Q, f ∈ L1([0,1), B, dω):

(1/n) Σ_{j=0}^{n−1} f(· + jθ) → ∫_0^1 f(ω) dω, a.s. and in L1.
Remark 1.6. For f := 11[a,b) stronger:

∀ω ∈ [0,1): (1/n) Σ_{j=0}^{n−1} 11[a,b)({ω + jθ}) → b − a.

Proof: Homework.
Consequence 1.1. Fix k ∈ {1,2, . . . ,9}. Then

#{m < n : the decimal expansion of 2^m starts with digit k} / n → (log(k+1) − log k)/log 10.

Proof. Let θ := log 2/log 10 ∉ Q. The decimal expansion of 2^m starts with digit k ⇔

{mθ} ∈ Ak := [log k/log 10, log(k+1)/log 10).
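Since Python integers are exact, this consequence is easy to probe numerically (a check added here, not part of the notes): count the leading decimal digits of 2^m, m < n, and compare with the limiting (Benford-type) frequencies (log(k+1) − log k)/log 10.

```python
import math

# Leading-digit frequencies of 2^m, m = 0, 1, ..., n-1, vs the limit law.
n = 5_000
counts = {k: 0 for k in range(1, 10)}
power = 1
for _ in range(n):
    counts[int(str(power)[0])] += 1   # leading decimal digit of the exact power
    power *= 2

freq = {k: counts[k] / n for k in counts}
limit = {k: (math.log(k + 1) - math.log(k)) / math.log(10) for k in counts}
```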
Example 1.19 (Bernoulli shift). See Example 1.6.

ω ∈ [0,1), binary expansion: ω = Σ_{j=1}^{∞} ωj 2^{−j}

Theorem 1.10. For Lebesgue-a.e. ω ∈ [0,1) any fixed {0,1} string (ε1, ε2, . . . , εk) occurs with its natural density 2^{−k}.

I.e. “Almost all real numbers are normal.”
Example 1.20 (Statistical physics).

Ergodicity ⇔ time averages = phase space averages.

At the heart of statistical physics.
Convergence in distribution, weak convergence
2.1 Convergence in distribution, basics
• (S, d) complete, separable metric space,
• S its Borel-sigma-algebra,
• e.g. R,Rn with Euclidean distance,
• C([0,1]), C([0,∞)) with sup-norm distance.
Definition 2.1. A probability measure ν on (S,S) is regular if (∀A ∈ S)

ν(A) = sup{ν(K) : K ⊆ A, K compact} = inf{ν(O) : A ⊆ O, O open}.

All measures considered will be assumed regular.
µn, n = 1,2, . . ., and µ regular probability measures on (S,S). Yn, n = 1,2, . . ., and Y S-valued r.v. with distributions

P(Yn ∈ A) = µn(A),  P(Y ∈ A) = µ(A),  A ∈ S,

not necessarily jointly defined.
Definition 2.2 (Weak convergence of probability measures). µn ⇒ µ, or Yn ⇒ Y, iff ∀f : S → R continuous and bounded

lim_{n→∞} ∫_S f dµn = ∫_S f dµ, or lim_{n→∞} E(f(Yn)) = E(f(Y)).
Theorem 2.1 (Equivalent characterizations, “portmanteau theorem”). (a) ≡ (b) ≡ (c) ≡ (d):

(a) µn ⇒ µ.
(b) (∀A ∈ S), A open: lim inf_{n→∞} µn(A) ≥ µ(A).
(c) (∀A ∈ S), A closed: lim sup_{n→∞} µn(A) ≤ µ(A).
(d) (∀A ∈ S) such that µ(∂A) = 0: lim_{n→∞} µn(A) = µ(A).
Proof. Probability 2.
2.1.1 The special case of R (or R^d)

The distribution function helps:

Fn(x) := P(Yn < x) = µn((−∞, x)),  F(x) := P(Y < x) = µ((−∞, x)).

Theorem 2.2. µn ⇒ µ (also denoted Fn ⇒ F) iff

lim_{n→∞} Fn(x) = F(x) at all points of continuity of F.
Proof. Probability 2.
2.1.2 Examples for weak convergence
Example 2.1. Convergence in probability (Probability 2, Analysis) — this is NOT the typical case: Yn, Y : Ω → R defined on the same probability space (Ω,F,P), Yn →P Y.
Example 2.2. Poisson approximation of binomial (Probability 1):

Yn ∼ BIN(pn, n),  lim_{n→∞} n pn = λ ∈ (0,∞),  Y ∼ POI(λ).
Example 2.3. De Moivre’s CLT (Probability 1):

Ỹn ∼ BIN(p, n),  Yn := (Ỹn − pn)/√(p(1−p)n),  Y ∼ N(0,1).
Example 2.4. De Moivre-type CLT for gamma distributions (Probability 2):

Ỹn ∼ GAM(λ, n),  Yn := (Ỹn − λ^{−1}n)/(λ^{−1}√n),  Y ∼ N(0,1).
Example 2.5. General CLT for sums of i.i.d. r.v.-s (Probability 2) — the typical case:

Xn i.i.d. r.v.-s,  m := E(Xj),  σ² := Var(Xj),  Yn := Σ_{j=1}^{n} (Xj − m)/(σ√n),  Y ∼ N(0,1).
2.1.3 Tightness
Definition 2.3. The sequence of probability measures µn on (S.S), or the sequence of S-valued random variables Yn, is tight, if (∀ε > 0) (∃K b S) such that
(∀n) : µn S\K
< ε, or P(Yn ∈/K)< ε.
In the S =R case (∀ε >0) (∃K <∞) such that (∀n) : µn (−∞,−K)∪(K,∞)
< ε, or P(|Yn|> K)< ε.
Proposition 2.1. If µn ⇒ µ then the sequence µn is tight.

Proof. Easy if S is locally compact! Choose

K̃ ⋐ K ⋐ S such that µ(S \ K̃) < ε/2,

and f : S → [0,1] continuous such that f ≡ 0 on K̃ and f ≡ 1 on S \ K. Then

µn(S \ K) ≤ ∫_S f dµn → ∫_S f dµ ≤ µ(S \ K̃) < ε/2.

Hence (∃n0 < ∞) such that (∀n ≥ n0): µn(S \ K) < ε.
Theorem 2.3 (Helly’s theorem). Let {µn / Fn / Yn}, n = 1,2, . . ., be a tight sequence of {probability measures / probability distribution functions / random variables} on R. Then one can extract a weakly convergent subsequence {µnk / Fnk / Ynk}, k = 1,2, . . .:

{ µnk ⇒ µ / Fnk ⇒ F / Ynk ⇒ Y } as k → ∞.
Theorem 2.4 (Prohorov’s theorem). Let {µn / Yn}, n = 1,2, . . ., be a tight sequence of {probability measures / random variables} on the complete separable metric space S. Then one can extract a weakly convergent subsequence {µnk / Ynk}, k = 1,2, . . .:

{ µnk ⇒ µ / Ynk ⇒ Y } as k → ∞.
For proof of both Thms see: Probability 2.
2.2 Methods for proving weak convergence
General scheme:
(1) prove tightness,
(2) prove uniqueness of possible limits,
(3) identify the limit.

Methods:
(A) With bare hands (e.g. De Moivre, Poisson, maxima of i.i.d.)
(B) Method of moments
(C) Method of characteristic functions (e.g. Markov–Lévy CLT)
(D) Coupling
(E) Mixed methods
2.3 With bare hands
2.3.1 Arcsine laws and related stuff
Xn simple symmetric random walk on Z (d = 1!):

X0 = 0,  P(Xn+1 = i ± 1 | Xn = i) = 1/2.
Some relevant random variables:

The maximum: Mn := max{Xj : j ∈ [0, n]},
First hitting of r ∈ Z+: Tr := inf{n > 0 : Xn = r},
Return times, k ∈ N: R0 = 0, Rk+1 := inf{n > Rk : Xn = 0},
Local time at 0 ∈ Z: Ln := #{j ∈ (0, n] : Xj = 0},
Last visit to 0 ∈ Z: λn := max{j ∈ (0, n] : Xj = 0},
Time spent on Z+: πn := #{j ∈ (0, n] : (Xj−1 + Xj)/2 > 0}.
Theorem 2.5 (Limit theorem for the maximum).
(i) Discrete, microscopic version: 0 ≤ r ≤ n fixed:

P(Mn = r) = P(Xn = r) + P(Xn = r + 1).

(ii) Local limit theorem: 0 ≤ u fixed, 1 ≪ n:

n^{1/2} P(Mn = [n^{1/2}u]) = √(2/π) e^{−u²/2} 11{u>0} + O(n^{−1/2})

(iii) Global (integrated) limit theorem: 0 ≤ x fixed:

lim_{n→∞} P(n^{−1/2} Mn < x) = 11{x>0} √(2/π) ∫_0^x e^{−u²/2} du = 11{x>0} (2Φ(x) − 1).
Proof of part (i).

P(Mn ≥ r) = P(Mn ≥ r, Xn ≠ r) + P(Mn ≥ r, Xn = r)
 =* 2P(Mn ≥ r, Xn > r) + P(Mn ≥ r, Xn = r)
 = 2P(Xn ≥ r) − P(Xn = r).

* due to the reflection principle.

P(Mn = r) = P(Mn ≥ r) − P(Mn ≥ r + 1)
 = 2P(Xn ≥ r) − 2P(Xn ≥ r + 1) − P(Xn = r) + P(Xn = r + 1)
 = P(Xn = r) + P(Xn = r + 1).
Proof of parts (ii) and (iii).

P(Mn = [√n u]) = P(Xn = [√n u]) + P(Xn = [√n u] + 1) =** n^{−1/2} √(2/π) e^{−u²/2} + O(n^{−1})

** due to De Moivre.

(iii) The integrated version follows from the local version + Fatou + Riemannian integration.
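The identity of part (i) can be confirmed by exact enumeration of all 2^n paths in rational arithmetic (a sanity check added here, not part of the notes):

```python
from fractions import Fraction
from itertools import product

# Exact distributions of M_n and X_n for the simple symmetric walk, and the
# check P(M_n = r) = P(X_n = r) + P(X_n = r + 1) for all r >= 0.
n = 12
w = Fraction(1, 2 ** n)
p_max, p_end = {}, {}
for steps in product((-1, 1), repeat=n):
    x, m = 0, 0
    for step in steps:
        x += step
        if x > m:
            m = x
    p_max[m] = p_max.get(m, 0) + w
    p_end[x] = p_end.get(x, 0) + w

max_identity_holds = all(
    p_max.get(r, 0) == p_end.get(r, 0) + p_end.get(r + 1, 0)
    for r in range(n + 1))
```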
Theorem 2.6 (Limit theorem for the hitting times).
(i) Discrete, microscopic version: 0 < r ≤ n fixed:

P(Tr = n) = (r/n) C(n, (n+r)/2) 2^{−n}

(ii) Local limit theorem: 0 < s fixed, 1 ≪ r:

r² P(Tr = [r²s]) = √(2/π) s^{−3/2} e^{−1/(2s)} 11{s>0} + O(r^{−1}).

(iii) Global (integrated) limit theorem: 0 < t fixed:

lim_{r→∞} P(r^{−2} Tr < t) = 11{t>0} (1/√(2π)) ∫_0^t s^{−3/2} e^{−1/(2s)} ds = 11{t>0} √(2/π) ∫_{1/√t}^{∞} e^{−u²/2} du.
Proof of part (i).

P(Tr = n) = (1/2) P({max_{j≤n−2} Xj ≤ r−1} ∧ {Xn−1 = r−1})
 = (1/2) P(Xn−1 = r−1) − (1/2) P({max_{j≤n−2} Xj ≥ r} ∧ {Xn−1 = r−1})
 =* (1/2) P(Xn−1 = r−1) − (1/2) P(Xn−1 = r+1)
 = (r/n) C(n, (n+r)/2) 2^{−n}

* due to the reflection principle.
Proof of parts (ii) and (iii).

P(Tr = [r²s]) = (r/[r²s]) C([r²s], ([r²s]+r)/2) 2^{−[r²s]} =** r^{−2} (2/√(2π)) s^{−3/2} e^{−1/(2s)} + O(r^{−3})

** due to Stirling.

(iii) Integrated version: local version + Fatou + Riemannian integration.
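The microscopic formula of part (i) can likewise be checked exactly (an added check; r and the horizon N are arbitrary), by evolving the law of the walk killed at level r:

```python
from fractions import Fraction
from math import comb

# First-passage check: P(T_r = n) = (r/n) C(n, (n+r)/2) 2^{-n}.
r, N = 3, 25
half = Fraction(1, 2)
dist = {0: Fraction(1)}            # law of the walk, killed upon reaching r
hitting_formula_ok = True
for n in range(1, N + 1):
    new, hit = {}, Fraction(0)
    for x, p in dist.items():
        for y in (x - 1, x + 1):
            if y == r:
                hit += p * half    # first passage at time n
            else:
                new[y] = new.get(y, Fraction(0)) + p * half
    dist = new
    if (n + r) % 2 == 0:
        formula = Fraction(r, n) * comb(n, (n + r) // 2) * Fraction(1, 2 ** n)
    else:
        formula = Fraction(0)      # parity: T_r has the parity of r
    if hit != formula:
        hitting_formula_ok = False
```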
Theorem 2.7 (Limit theorem for the return times).
(i) Discrete, microscopic version: 0 < k ≤ n fixed:

P(Rk = k + n) = (k/n) C(n, (n+k)/2) 2^{−n}

(ii) Local limit theorem: 0 < s fixed:

k² P(Rk = [k²s]) = (1/√(2π)) s^{−3/2} e^{−1/(2s)} 11{s>0} + O(k^{−1}).

(iii) Global (integrated) version: 0 < t fixed:

lim_{k→∞} P(k^{−2} Rk < t) = 11{t>0} (1/√(2π)) ∫_0^t s^{−3/2} e^{−1/(2s)} ds = 11{t>0} √(2/π) ∫_{1/√t}^{∞} e^{−u²/2} du.

Proof.

Rk =law= Tk + k.
Remarks on the last two limit theorems

Remark 2.1 (I.i.d. sums).

Tr = ξ1 + ξ2 + · · · + ξr,  Rk = ζ1 + ζ2 + · · · + ζk,

where ξi, i = 1,2, . . ., and ζi, i = 1,2, . . ., are sequences of i.i.d. r.v.-s with ξi =law= T1, ζi =law= R1 =law= T1 + 1.

Remark 2.2 (Stability).

f1(s) := (1/√(2π)) s^{−3/2} e^{−1/(2s)} 11{s>0},  fa(s) := a f1(as), a > 0.

Then

fa ∗ fb = f_{ab/(√a+√b)²}.

Homework.
Theorem 2.8 (Limit theorem for the local time at zero). Global (integrated) version:

lim_{n→∞} P(n^{−1/2} Ln < t) = 11{t>0} √(2/π) ∫_0^t e^{−u²/2} du.

Proof.

{Ln < k} = {Rk > n}.

Hence

lim_{n→∞} P(Ln < n^{1/2} t) = lim_{n→∞} P(R_{[n^{1/2}t]} > n) = lim_{m→∞} P(Rm > m²/t²) = √(2/π) ∫_0^t e^{−u²/2} du.
Remark 2.3. Note that

lim_{n→∞} P(n^{−1/2}|Xn| < u) = lim_{n→∞} P(n^{−1/2} Ln < u) = lim_{n→∞} P(n^{−1/2} Mn < u).

For a simple symmetric random walk Xn (on Z) denote

u(n) := P(Xn = 0) = C(n, n/2) 2^{−n},
f(n) := P(min{m ≥ 1 : Xm = X0} = n).

Recall the identity (valid for n ≥ 1):

u(n) = Σ_{m=0}^{n} f(m) u(n−m).
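The identity can be verified exactly in rational arithmetic (an added check, not part of the notes): u(n) from the binomial formula, f(n) from a first-return recursion.

```python
from fractions import Fraction
from math import comb

# Renewal identity check: u(n) = sum_{m=0}^{n} f(m) u(n-m) for n >= 1.
N = 30
u = [Fraction(comb(n, n // 2), 2 ** n) if n % 2 == 0 else Fraction(0)
     for n in range(N + 1)]

half = Fraction(1, 2)
f = [Fraction(0)] * (N + 1)
dist = {1: half, -1: half}         # position after step 1, no return to 0 yet
for n in range(2, N + 1):
    new, ret = {}, Fraction(0)
    for x, p in dist.items():
        for y in (x - 1, x + 1):
            if y == 0:
                ret += p * half    # first return to 0 at time n
            else:
                new[y] = new.get(y, Fraction(0)) + p * half
    f[n] = ret
    dist = new

renewal_ok = all(u[n] == sum(f[m] * u[n - m] for m in range(n + 1))
                 for n in range(1, N + 1))
```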
Theorem 2.9 (Paul Lévy’s arcsine theorem).
(i) Discrete, microscopic version: 0 ≤ k ≤ n:

P(λ2n+1 = 2k) = P(λ2n = 2k) = u(2k) u(2n−2k),
P(π2n+1 ∈ {2k, 2k+1}) = P(π2n = 2k) = u(2k) u(2n−2k),
P(λ2n = 2k+1) = P(λ2n+1 = 2k+1) = P(π2n = 2k+1) = 0.

(ii) Local limit theorem: y ∈ (0,1) fixed, 1 ≪ n:

n P(λ2n = 2[ny]) = n P(π2n = 2[ny]) = (1/π) · 1/√(y(1−y)) + O(n^{−1/2})

(iii) Global (integrated) limit theorem: x ∈ (0,1) fixed:

lim_{n→∞} P(n^{−1} λn < x) = lim_{n→∞} P(n^{−1} πn < x) = 11{0<x<1} (2/π) arcsin √x.
Lemma 2.1.

P(Xj ≠ 0, j = 1,2, . . . ,2n) = P(X2n = 0) =: u(2n).

Proof of Lemma 2.1.

P(Xj ≠ 0, j = 1,2, . . . ,2n)
 = 2 P(Xj > 0, j = 1,2, . . . ,2n)
 = 2 Σ_{r=1}^{∞} P({Xj > 0, j = 1,2, . . . ,2n−1} ∧ {X2n = 2r})
 =* 2 Σ_{r=1}^{∞} (1/2) ( P(X2n−1 = 2r−1) − P(X2n−1 = 2r+1) )
 = P(X2n−1 = 1) = P(X2n = 0).

* due to the reflection principle.
Proof of Theorem 2.9. (i) For λn:

P(λ2n = 2k) = P({X2k = 0} ∧ {Xj ≠ 0, j = 2k+1, . . . ,2n})
 = P(X2k = 0) P(Xj ≠ 0, j = 1, . . . ,2n−2k)
 = u(2k) u(2n−2k).

For πn by induction. Note that

P(π2n = 2k) = P(π2n = 2n−2k).

For k = 0 or k = n:

P(π2n = 0) = P(Xj ≥ 0, j = 1,2, . . . ,2n)
 = P(Xj ≥ 0, j = 1,2, . . . ,2n−1)
 = 2 P(Xj > 0, j = 1,2, . . . ,2n) = u(2n) u(0).

Denote

b(2n, 2k) := P(π2n = 2k) = b(2n, 2n−2k).

For 1 ≤ k ≤ n there is a first excursion, to the left or to the right:

b(2n, 2k) = (1/2) Σ_{r=1}^{k} f(2r) b(2n−2r, 2k−2r) + (1/2) Σ_{r=1}^{n−k} f(2r) b(2n−2r, 2k)

By the induction assumption:

b(2n, 2k) = (1/2) u(2n−2k) Σ_{r=1}^{k} f(2r) u(2k−2r) + (1/2) u(2k) Σ_{r=1}^{n−k} f(2r) u(2n−2k−2r)
 = (1/2) u(2n−2k) u(2k) + (1/2) u(2k) u(2n−2k) = u(2k) u(2n−2k)

(ii)

u(2[ny]) u(2[n(1−y)]) =** n^{−1} (1/π) · 1/√(y(1−y)) + O(n^{−3/2})

** due to Stirling.

(iii) Integrated version: local version + Fatou + Riemannian integration.
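The discrete identities of part (i) can be confirmed by brute-force enumeration for a small n, computing the exact laws of λ2n and π2n (a sanity check added here, not in the notes):

```python
from fractions import Fraction
from itertools import product
from math import comb

# Exact check of P(lambda_{2n} = 2k) = P(pi_{2n} = 2k) = u(2k) u(2n-2k).
n = 6
w = Fraction(1, 2 ** (2 * n))
u = lambda m: Fraction(comb(m, m // 2), 2 ** m)   # m even

p_last, p_pos = {}, {}
for steps in product((-1, 1), repeat=2 * n):
    x, last, pos = 0, 0, 0
    for j, step in enumerate(steps, start=1):
        prev = x
        x += step
        if x == 0:
            last = j               # last visit to 0 so far
        if prev + x > 0:           # time step spent on Z_+
            pos += 1
    p_last[last] = p_last.get(last, 0) + w
    p_pos[pos] = p_pos.get(pos, 0) + w

arcsine_ok = all(
    p_last.get(2 * k, 0) == u(2 * k) * u(2 * n - 2 * k)
    and p_pos.get(2 * k, 0) == u(2 * k) * u(2 * n - 2 * k)
    for k in range(n + 1))
```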
The method of moments and the method of characteristic functions
• Recall everything you learnt about characteristic functions.
• Probability II.
3.1 The method of moments
Let X be a random variable; its absolute moments and its moments are assumed finite:

Ak := E(|X|^k) < ∞,  Mk := E(X^k).
Remark 3.1. In order that the sequences Ak and Mk be the sequences of (absolute) moments of a random variable X, they must satisfy an infinite set of (Jensen-type) inequalities: in particular, if k1 + · · · + km = k, respectively k1 + · · · + km = 2k, then

∏_{j=1}^{m} A_{kj} ≤ Ak,  ∏_{j=1}^{m} |M_{kj}| ≤ M_{2k}.
The “moment problem”: given a sequence of moments Mk, does it determine uniquely the distribution of a random variable?

Theorem 3.1. If Mk is a sequence of moments such that

lim sup_{k→∞} (|Mk|/k!)^{1/k} =: R^{−1} < ∞,

then it determines a unique random variable X (or: probability distribution) such that Mk = E(X^k).
Proof. The power series of the characteristic function

Σ_{k=0}^{∞} (Mk/k!) (iu)^k

will have radius of convergence R > 0, and thus it will be uniquely determined.
Example 3.1. Compute all moments of all remarkable distributions. E.g.

X ∼ EXP(λ):  Mk = Ak = λ^{−k} k!
X ∼ N(0, σ):  A2k = σ^{2k} (2k)!/(2^k k!) = M2k,  A2k+1 = σ^{2k+1} √(2/π) 2^k k!,  M2k+1 = 0.

Counterexample 3.1. The log-normal distribution (HW!).
3.1.1 Weak limit from convergence of moments
Theorem 3.2. Let Zn be a sequence of random variables which have all moments finite, and denote

Mn,k := E(Zn^k).

If (∀k) the limit lim_{n→∞} Mn,k =: Mk exists and the sequence of moments Mk determines uniquely a distribution/random variable Z, then Zn ⇒ Z.

Remark 3.2. The sequence Mk is a sequence of moments.

Proof. (i) Tightness:

P(|Zn| > K) ≤ Mn,2/K² ≤ (sup_n Mn,2)/K².
(ii) Identification of the limit: Assume Zn′ ⇒ Z̃. For K < ∞ let ϕK : R → R,

ϕK(x) := x 11{|x|≤K} + sgn(x) K 11{|x|>K}.

Then

E(Z̃^k) = lim_{K→∞} E(ϕK(Z̃)^k)
 = lim_{K→∞} lim_{n′→∞} E(ϕK(Zn′)^k)  (due to weak convergence)
 = lim_{K→∞} lim_{n′→∞} ( E(Zn′^k) − E(Zn′^k − ϕK(Zn′)^k) )
 = lim_{n′→∞} Mn′,k − lim_{K→∞} lim_{n′→∞} E(Zn′^k − ϕK(Zn′)^k).

But:

|E(Zn′^k − ϕK(Zn′)^k)| ≤ E(|Zn′|^k 11{|Zn′|>K}) (1)≤ √(Mn′,2k) √(P(|Zn′| > K)) (2)≤ √(Mn′,2k) √(Mn′,2)/K

(1): due to Schwarz’s inequality,
(2): due to Markov’s inequality.

Altogether:

E(Z̃^k) = Mk.
3.1.2 Application 1: CLT with the method of moments

Sheds light on the combinatorial aspects of the CLT. Let ξj be i.i.d. with all moments finite, E(ξj^k) =: mk, m1 = 0, m2 =: σ², and

Zn := (ξ1 + · · · + ξn)/√n.

Then, with fixed k:

E(Zn^{2k}) = n^{−k} C(n,k) σ^{2k} (2k)!/2^k + o(1) → σ^{2k} (2k)!/(2^k k!),
E(Zn^{2k+1}) = o(1) → 0,

as n → ∞ (with k fixed).
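For the fair coin ξj = ±1 (so m1 = 0, σ² = 1) these moments can be computed exactly from the binomial law, making the convergence to the Gaussian moments (2k)!/(2^k k!) = 1, 3, 15, . . . concrete (an added check, not part of the notes):

```python
from fractions import Fraction
from math import comb, factorial

# Exact moments E(Z_n^p) of Z_n = (xi_1 + ... + xi_n)/sqrt(n), xi_j = ±1 fair coin.
def exact_moment(n, p):
    # S_n = 2H - n with H ~ BIN(n, 1/2); exact rational arithmetic
    return sum(Fraction(comb(n, h), 2 ** n) * (2 * h - n) ** p
               for h in range(n + 1))

n = 400
gauss = {k: Fraction(factorial(2 * k), 2 ** k * factorial(k)) for k in (1, 2, 3)}
moments = {k: exact_moment(n, 2 * k) / Fraction(n) ** k for k in (1, 2, 3)}
# e.g. E(Z_n^4) = 3 - 2/n exactly; odd moments vanish by symmetry
```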
3.2 The method of characteristic functions

(Repeat from Probability II.)

Theorem 3.3. Let Zn be a sequence of random variables and ϕn : R → C their characteristic functions,

ϕn(u) := E(exp(iuZn)).

If

(∀u ∈ R): lim_{n→∞} ϕn(u) = ϕ(u) (pointwise!)

and u ↦ ϕ(u) is continuous at u = 0, then ϕ is the characteristic function of a random variable Z and Zn ⇒ Z.
For proving tightness:

Lemma 3.1 (Paul Lévy). Let Y be a random variable and ψ(u) := E(exp(iuY)) its characteristic function. Then for any K < ∞

P(|Y| > K) ≤ (K/2) ∫_{−2/K}^{2/K} (1 − ψ(u)) du.
Proof of Lemma 3.1.

(K/2) ∫_{−2/K}^{2/K} (1 − ψ(u)) du = (K/2) ∫_{−2/K}^{2/K} E(1 − e^{iuY}) du
 (1)= 2 E( 1 − sin(2Y/K)/(2Y/K) )
 (2)≥ 2 E( (1 − sin(2Y/K)/(2Y/K)) 11{|Y|>K} )
 (3)≥ 2 E( (1 − K/(2|Y|)) 11{|Y|>K} )
 ≥ P(|Y| > K).

(1): Fubini,
(2): |sin α/α| ≤ 1,
(3): sin α/α ≤ 1/|α|.
Proof of Theorem 3.3. (1) Tightness: From continuity of u ↦ ϕ(u) at u = 0:

(∃K < ∞): (K/2) ∫_{−2/K}^{2/K} (1 − ϕ(u)) du < ε/2.

From pointwise convergence (and uniform boundedness of ϕn):

(∃n0 < ∞): (∀n ≥ n0): (K/2) ∫_{−2/K}^{2/K} (1 − ϕn(u)) du < ε.

Hence tightness, by Lemma 3.1.

(2) Identification of the limit: Assume Zn′ ⇒ Z̃; then

E(exp(iuZ̃)) = lim_{n′→∞} E(exp(iuZn′)) = ϕ(u).
3.3 Erdős–Kac theorem: CLT for the number of prime divisors

A mixture of the method of characteristic functions and the method of moments. Denote by P the set of primes and

g : N → N, g(m) := #{p ∈ P : p | m}.

Theorem 3.4 (Paul Erdős & Mark Kac, 1940).

lim_{n→∞} n^{−1} #{ m ∈ {1,2, . . . , n} : (g(m) − log log n)/√(log log n) < x } = ∫_{−∞}^{x} (e^{−y²/2}/√(2π)) dy.
Probabilistic setup: Let ωn be randomly sampled from ({1,2, . . . , n}, UNI) and Zn := g(ωn). Then

(Zn − log log n)/√(log log n) ⇒ N(0,1).
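An empirical illustration (added here; convergence in the theorem is at the glacial log log n scale, so only crude agreement can be expected at accessible N): a sieve computes g(m) for all m ≤ N, and the sample mean of g is compared with log log N.

```python
import math

# g(m) = number of distinct prime divisors of m, for all m <= N, by a sieve.
N = 100_000
g = [0] * (N + 1)
for p in range(2, N + 1):
    if g[p] == 0:                  # p untouched by smaller primes => p is prime
        for m in range(p, N + 1, p):
            g[m] += 1

mean_g = sum(g[1:]) / N
loglogN = math.log(math.log(N))    # ≈ 2.44 for N = 10^5
```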
Proof. We will use

Σ_{p∈P: p≤n} 1/p = log log n + O(1).
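Mertens' estimate above is easy to probe numerically (an added check): the difference Σ_{p≤x} 1/p − log log x stabilizes near the Meissel–Mertens constant ≈ 0.2615, so the O(1) term is visible as an almost constant offset.

```python
import math

# sum_{p <= x} 1/p minus log log x, for increasing x (sieve of Eratosthenes).
n = 100_000
is_prime = [True] * (n + 1)
is_prime[0] = is_prime[1] = False
for i in range(2, int(n ** 0.5) + 1):
    if is_prime[i]:
        for j in range(i * i, n + 1, i):
            is_prime[j] = False

diffs = []
for x in (1_000, 10_000, 100_000):
    s = sum(1 / p for p in range(2, x + 1) if is_prime[p])
    diffs.append(s - math.log(math.log(x)))
```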
Define the random variables Yn,p, p ∈ P, n ∈ N:

Yn,p := 11{p | ωn}, where ωn ∼ UNI({1,2, . . . , n}).

Mind that for n ∈ N fixed, (Yn,p)p∈P are jointly defined. Then

Zn = Σ_{p∈P} Yn,p.
Note that for any k < ∞ and p1, p2, . . . , pk ∈ P fixed

(Yn,p1, Yn,p2, . . . , Yn,pk) ⇒ (Xp1, Xp2, . . . , Xpk)

where Xp, p ∈ P, are (jointly defined) independent random variables with distribution

P(Xp = 1) = 1/p = 1 − P(Xp = 0).

How to guess the result? Let

αn → ∞,  Sn := Σ_{p∈P: p≤αn} Xp.

Then

Sn* := (Sn − log log αn)/√(log log αn) ⇒ N(0,1).
Note that

(Sn − log log αn)/√(log log αn) = (Sn − E(Sn))/√(log log αn) + (E(Sn) − log log αn)/√(log log αn)

and

(E(Sn) − log log αn)/√(log log αn) = (log log log αn + O(1))/√(log log αn) → 0.

The weak convergence

(Sn − E(Sn))/√(log log αn) ⇒ N(0,1)

is proved with the method of characteristic functions:

E(exp(iuSn*)) = ∏_{p∈P: p≤αn} [ (1/p) exp{ iu(p−1)/(p√(log log αn)) } + ((p−1)/p) exp{ −iu/(p√(log log αn)) } ] → exp{−u²/2}. HW!
Let

αn := n^{1/log log n},  log αn = log n/log log n,  log log αn = log log n − log log log n.

Note that

(1): (∀ε > 0): αn = o(n^ε),
(2): Σ_{αn<p≤n} 1/p = log log log n + O(1).
Let

Sn := Σ_{p∈P: p≤αn} Xp,  Sn* := (Sn − log log αn)/√(log log αn),
Tn := Σ_{p∈P: p≤αn} Yn,p,  Tn* := (Tn − log log αn)/√(log log αn),
Zn := Σ_{p∈P: p≤n} Yn,p = Σ_{p∈P} Yn,p,  Zn* := (Zn − log log n)/√(log log n).

We know that Sn* ⇒ N(0,1) and we want to prove Zn* ⇒ N(0,1).
Step 1.

E(|Zn − Tn|) = Σ_{p∈P: αn<p≤n} E(Yn,p) ≤ Σ_{p∈P: αn<p≤n} 1/p = log log log n + O(1) = o(√(log log n)),

|log log n − log log αn| = log log log n + O(1) = o(√(log log n)).

Hence

|Tn* − Zn*| →P 0.
Step 2. We prove Tn* ⇒ N(0,1) with the method of moments. By computation:

lim_{n→∞} E((Sn*)^k) = ∫_{−∞}^{∞} (e^{−y²/2}/√(2π)) y^k dy =: Mk. HW!
For 1 < p1 < p2 < · · · < pl ≤ αn and k1, k2, . . . , kl ≥ 1:

E(X_{p1}^{k1} X_{p2}^{k2} · · · X_{pl}^{kl}) = E(Xp1 Xp2 · · · Xpl) = 1/(p1 p2 · · · pl),
E(Y_{n,p1}^{k1} Y_{n,p2}^{k2} · · · Y_{n,pl}^{kl}) = E(Yn,p1 Yn,p2 · · · Yn,pl) = (1/n) [n/(p1 p2 · · · pl)].

Hence:

| E(X_{p1}^{k1} · · · X_{pl}^{kl}) − E(Y_{n,p1}^{k1} · · · Y_{n,pl}^{kl}) | ≤ 1/n.
Using this and the expansion

(x1 + x2 + · · · + xN)^k = Σ_{l=1}^{N} Σ_{k1,...,kl≥1, k1+···+kl=k} Σ_{1≤m1<m2<···<ml≤N} C(l; k1, k2, . . . , kl) x_{m1}^{k1} x_{m2}^{k2} · · · x_{ml}^{kl}

we readily obtain

|E(Sn^k) − E(Tn^k)| ≤ αn^k/n = o(1)

and thus

lim_{n→∞} E((Tn*)^k) = Mk.

Hence:

Tn* ⇒ N(0,1),

which together with Step 1 implies

Zn* ⇒ N(0,1).
3.4 Limit theorem for the coupon collector

A mixture of “bare hands” and the characteristic/generating function method.

For n ∈ N, let ξn,k, k = 0,1, . . . , n−1, be independent geometrically distributed random variables with distribution

P(ξn,k = m) = (k/n)^m (n−k)/n,  m = 0,1,2, . . .

and

Vn := Σ_{k=0}^{n−1} ξn,k.

Then

E(ξn,k) = k/(n−k),  Var(ξn,k) = nk/(n−k)²,
E(Vn) = n log n + O(n),  Var(Vn) = (π²/6) n² + O(n log n).

Theorem 3.5.

lim_{n→∞} P( (Vn − n log n)/n < x ) = exp{−e^{−x}}.
Remark 3.3. The (two-parameter family of) distributions

Fa,b(x) := exp{−e^{−ax+b}}, a ∈ R+, b ∈ R,
fa,b(x) := (d/dx) Fa,b(x) = a exp{−e^{−ax+b} − ax + b}

are called Type-1 Gumbel distributions and appear in extreme value theory.
Proof. Let ζn,k := ξn,n−k, k = 1, . . . , n, and

Zn := Σ_{k=1}^{n} ( ζn,k/n − 1/k ) = (Vn − n log n)/n − γ + O(n^{−1}),

where γ is Euler’s constant,

γ := lim_{n→∞} ( Σ_{k=1}^{n} k^{−1} − log n ) ≈ 0.5772 . . . .
Lemma 3.2. Let pn ↘ 0 so that n pn → λ ∈ R+, and let ζn be a sequence of geometrically distributed random variables with distribution

P(ζn = r) = (1 − pn)^r pn.

Then ζn/n ⇒ EXP(λ).

Proof. Straightforward elementary computation.
Thus

( ζn,1/n, ζn,2/n, . . . ) ⇒ ( ζ1, ζ2, . . . )

where ζk, k = 1,2, . . ., are independent EXP(k)-distributed,

E(ζk) = 1/k,  Var(ζk) = 1/k²,  ζ̃k := ζk − E(ζk).

It follows that

Zn ⇒ Z := lim_{K→∞} Σ_{k=1}^{K} ζ̃k.

Note that the limit defining Z exists a.s. due to Kolmogorov’s inequality (see Probability II).
Computing the distribution of Z: Let Φ : (−1,∞) → R+ be the moment generating function (Laplace transform) of Z:

Φ(u) := E(exp(−uZ)) = ∏_{k=1}^{∞} E(exp(−u ζ̃k)) = · · · = exp{ Σ_{k=1}^{∞} ( log(k/(k+u)) + u/k ) }

(Mind that the sum is absolutely convergent!)

Analyticity of (−1,∞) ∋ u ↦ Φ(u) and the identities

Φ(0) = 1,  Φ(u+1) = e^γ (u+1) Φ(u)  (HW!)

determine

Φ(u) = e^{γu} Γ(u+1).

On the other hand:

∫_{−∞}^{∞} e^{−uy} d exp{−e^{−(y+γ)}} = ∫_{−∞}^{∞} e^{−uy} exp{−e^{−(y+γ)}} e^{−(y+γ)} dy = e^{γu} ∫_0^∞ z^u e^{−z} dz = e^{γu} Γ(u+1),

so Z has distribution function P(Z < y) = exp{−e^{−(y+γ)}}, and since (Vn − n log n)/n = Zn + γ + O(n^{−1}), Theorem 3.5 follows.
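The closed form Φ(u) = e^{γu} Γ(u+1) can be tested numerically against the defining sum, truncated at K terms (an added check; the truncation error is of order u²/K):

```python
import math

# Phi(u) = exp( sum_{k>=1} ( log(k/(k+u)) + u/k ) ) vs e^{gamma*u} Gamma(u+1).
gamma = 0.5772156649015329             # Euler's constant
K = 100_000

def phi(u):
    s = sum(math.log(k / (k + u)) + u / k for k in range(1, K + 1))
    return math.exp(s)

closed_form = lambda u: math.exp(gamma * u) * math.gamma(u + 1)
pairs = [(phi(u), closed_form(u)) for u in (0.5, 1.0, 2.5)]
```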