9 Convergence in probability - Lecture Notes for Introductory Probability

One of the goals of probability theory is to extricate a useful deterministic quantity out of a random situation. This is typically possible when a large number of random effects cancel each other out, so some limit is involved. In this chapter we consider the following setting: given a sequence of random variables $Y_1, Y_2, \ldots$, we want to show that, when $n$ is large, $Y_n$ is approximately $f(n)$ for some simple deterministic function $f(n)$. The meaning of “approximately” is what we now make clear.

A sequence $Y_1, Y_2, \ldots$ of random variables converges to a number $a$ in probability if, as $n \to \infty$, $P(|Y_n - a| \le \epsilon)$ converges to 1 for any fixed $\epsilon > 0$. This is equivalent to $P(|Y_n - a| > \epsilon) \to 0$ as $n \to \infty$, for any fixed $\epsilon > 0$.

Example 9.1. Toss a fair coin $n$ times, independently. Let $R_n$ be the “longest run of Heads,” i.e., the longest sequence of consecutive tosses of Heads. For example, if $n = 15$ and the tosses come out HHTTHHHTHTHTHHH, then $R_n = 3$. We will show that, as $n \to \infty$,

$$\frac{R_n}{\log_2 n} \to 1$$

in probability. This means that, to a first approximation, one should expect about 20 consecutive Heads somewhere in a million tosses (since $\log_2 10^6 \approx 19.9$).
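A quick simulation sketch makes the statement concrete. It uses only the Python standard library, and the function names `longest_run` and `sample_ratio` are our own, not from the notes. It recomputes $R_n$ for the toss string above and prints single simulated values of $R_n/\log_2 n$; the convergence is slow, so the printed ratios are only roughly 1.

```python
import math
import random


def longest_run(tosses: str) -> int:
    """Length of the longest block of consecutive 'H' characters."""
    best = current = 0
    for t in tosses:
        # extend the current run on Heads, reset it on Tails
        current = current + 1 if t == "H" else 0
        best = max(best, current)
    return best


def sample_ratio(n: int, seed: int = 0) -> float:
    """One simulated value of R_n / log2(n) for n independent fair tosses."""
    rng = random.Random(seed)
    tosses = "".join(rng.choices("HT", k=n))
    return longest_run(tosses) / math.log2(n)


print(longest_run("HHTTHHHTHTHTHHH"))  # 3, as in the text
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(sample_ratio(n), 3))
```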

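To solve a problem such as this, we need to find upper bounds for the probabilities that $R_n$ is large and that it is small, i.e., for $P(R_n \ge k)$ and $P(R_n < k)$, for appropriately chosen $k$. Now, for arbitrary $k$, since each event in the union below has probability at most $1/2^k$,

$$\begin{aligned}
P(R_n \ge k) &= P(k \text{ consecutive Heads start at some } i,\ 1 \le i \le n-k+1)\\
&= P\Big(\bigcup_{i=1}^{n-k+1} \{i \text{ is the first Heads in a succession of at least } k \text{ Heads}\}\Big)\\
&\le n \cdot \frac{1}{2^k}.
\end{aligned}$$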

For the lower bound, divide the string of size $n$ into disjoint blocks of size $k$. There are $\lfloor n/k \rfloor$ such blocks (if $n$ is not divisible by $k$, simply throw away the leftover smaller block at the end). Then $R_n \ge k$ as soon as one of the blocks consists of Heads only; different blocks are independent. Therefore,

$$P(R_n < k) \le \left(1 - \frac{1}{2^k}\right)^{\lfloor n/k \rfloor} \le \exp\left(-\frac{1}{2^k}\left\lfloor \frac{n}{k} \right\rfloor\right),$$

using the famous inequality $1 - x \le e^{-x}$, valid for all $x$.
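Evaluating the two bounds numerically for $n = 10^6$ shows that, together, they confine $R_n$ with high probability to a window around $\log_2 10^6 \approx 20$. This is a sketch; the helper names are hypothetical, and the union bound is capped at 1 only for display.

```python
import math


def upper_tail_bound(n: int, k: int) -> float:
    """Union bound: P(R_n >= k) <= n / 2^k."""
    return n / 2**k


def lower_tail_bound(n: int, k: int) -> float:
    """Disjoint-block bound: P(R_n < k) <= exp(-floor(n/k) / 2^k)."""
    return math.exp(-(n // k) / 2**k)


n = 10**6
for k in range(10, 31, 2):
    print(f"k={k:2d}  P(R_n >= k) <= {min(1.0, upper_tail_bound(n, k)):.3g}"
          f"  P(R_n < k) <= {lower_tail_bound(n, k):.3g}")
```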

Below, we will use the following trivial inequalities, valid for any real number $x \ge 2$: $\lfloor x \rfloor \ge x - 1$, $\lceil x \rceil \le x + 1$, $x - 1 \ge x/2$, and $x + 1 \le 2x$.

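To demonstrate that $R_n/\log_2 n \to 1$ in probability, we need to show that, for any $\epsilon > 0$,

$$P(R_n \ge (1+\epsilon)\log_2 n) \to 0, \tag{1}$$

$$P(R_n \le (1-\epsilon)\log_2 n) \to 0, \tag{2}$$

as $n \to \infty$, because

$$P\left(\left|\frac{R_n}{\log_2 n} - 1\right| \ge \epsilon\right) = P(R_n \ge (1+\epsilon)\log_2 n) + P(R_n \le (1-\epsilon)\log_2 n).$$

A little fussing in the proof comes from the fact that $(1 \pm \epsilon)\log_2 n$ are not integers. This is common in such problems. To prove (1), we plug $k = \lfloor (1+\epsilon)\log_2 n \rfloor \ge (1+\epsilon)\log_2 n - 1$ into the upper bound to get

$$P(R_n \ge (1+\epsilon)\log_2 n) \le P(R_n \ge k) \le \frac{n}{2^k} \le \frac{n}{2^{(1+\epsilon)\log_2 n - 1}} = \frac{2n}{n^{1+\epsilon}} = \frac{2}{n^{\epsilon}} \to 0.$$

To prove (2), note that the probability in (2) decreases as $\epsilon$ grows, so we may assume $0 < \epsilon < 1$, and plug $k = \lfloor (1-\epsilon)\log_2 n \rfloor + 1$ into the lower bound. Since $R_n$ is an integer, $R_n \le (1-\epsilon)\log_2 n$ implies $R_n < k$. Moreover, $2^k \le 2n^{1-\epsilon}$ and $k \le \log_2 n + 1 \le 2\log_2 n$ for $n \ge 2$; for large $n$ we also have $n/k \ge 2$, so $\lfloor n/k \rfloor \ge n/k - 1 \ge n/(2k)$. Hence

$$P(R_n \le (1-\epsilon)\log_2 n) \le P(R_n < k) \le \exp\left(-\frac{n/(2k)}{2n^{1-\epsilon}}\right) = \exp\left(-\frac{n^{\epsilon}}{4k}\right) \le \exp\left(-\frac{n^{\epsilon}}{8\log_2 n}\right) \to 0.$$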

The most basic tool in proving convergence in probability is the Chebyshev inequality: if $X$ is a random variable with $EX = \mu$ and $\mathrm{Var}(X) = \sigma^2$, then

$$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$$

for any $k > 0$. We proved this inequality in the previous chapter and we will use it to prove the next theorem.
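To see how conservative the Chebyshev inequality can be, the sketch below compares it with the exact two-sided tail of a Binomial(100, 1/2) random variable, whose variance is 25. The example distribution and the function names are our own choices.

```python
from math import comb


def binomial_tail(n: int, p: float, k: float) -> float:
    """Exact P(|X - np| >= k) for X ~ Binomial(n, p)."""
    mu = n * p
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(n + 1) if abs(j - mu) >= k)


def chebyshev_bound(variance: float, k: float) -> float:
    """Chebyshev: P(|X - mu| >= k) <= variance / k^2."""
    return variance / k**2


n, p = 100, 0.5
var = n * p * (1 - p)  # 25.0
for k in (5, 10, 15, 20):
    print(k, round(binomial_tail(n, p, k), 5), chebyshev_bound(var, k))
```

The exact tails shrink much faster than $\sigma^2/k^2$; Chebyshev's strength is that it needs nothing beyond the variance.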

Theorem 9.1. Connection between variance and convergence in probability.

Assume that $Y_n$ are random variables and that $a$ is a constant such that

$$EY_n \to a, \qquad \mathrm{Var}(Y_n) \to 0,$$

as $n \to \infty$. Then

$$Y_n \to a,$$

as $n \to \infty$, in probability.

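Proof. Fix an $\epsilon > 0$. If $n$ is so large that $|EY_n - a| < \epsilon/2$, then, by the triangle inequality $|Y_n - a| \le |Y_n - EY_n| + |EY_n - a|$,

$$P(|Y_n - a| > \epsilon) \le P(|Y_n - EY_n| > \epsilon/2) \le \frac{4\,\mathrm{Var}(Y_n)}{\epsilon^2} \to 0,$$

as $n \to \infty$. Note that the second inequality in the computation is the Chebyshev inequality.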

This is most often applied to sums of random variables. Let

$$S_n = X_1 + \cdots + X_n,$$

where $X_i$ are random variables with finite expectation and variance. Then, without any independence assumption,

$$ES_n = EX_1 + \cdots + EX_n$$

and

$$E(S_n^2) = \sum_{i=1}^n EX_i^2 + \sum_{i \ne j} E(X_iX_j),$$

$$\mathrm{Var}(S_n) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j).$$

Recall that

$$\mathrm{Cov}(X_i, X_j) = E(X_iX_j) - EX_i\,EX_j$$

and

$$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X).$$

Moreover, if $X_i$ are independent,

$$\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n).$$
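The variance formula for $S_n$ needs no independence, only the covariance terms. A small exact check, with a hypothetical example of three dependent variables built from three fair bits and rational arithmetic via `fractions.Fraction`, might look like this:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: three fair bits b0, b1, b2 and the dependent variables
# X1 = b0, X2 = b0 + b1, X3 = b1 * b2. Each bit pattern has probability 1/8.
outcomes = [((b[0], b[0] + b[1], b[1] * b[2]), Fraction(1, 8))
            for b in product((0, 1), repeat=3)]


def E(f):
    """Expectation of f(X1, X2, X3) over the finite sample space."""
    return sum(prob * f(x) for x, prob in outcomes)


def cov(i, j):
    """Cov(X_i, X_j) = E(X_i X_j) - E X_i E X_j (0-based indices)."""
    return E(lambda x: x[i] * x[j]) - E(lambda x: x[i]) * E(lambda x: x[j])


# Var(S) computed directly from S = X1 + X2 + X3
var_direct = E(lambda x: sum(x) ** 2) - E(lambda x: sum(x)) ** 2
# Var(S) from the formula: variances plus covariances over all pairs i != j
var_formula = sum(cov(i, i) for i in range(3)) + sum(
    cov(i, j) for i in range(3) for j in range(3) if i != j)

print(var_direct, var_formula)
```

Dropping the covariance terms here would give only $15/16$, which is wrong for these dependent variables.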

Continuing with the review, let us reformulate and prove again the most famous convergence in probability theorem. We will use the common abbreviation i. i. d. for independent identically distributed random variables.

Theorem 9.2. Weak law of large numbers. Let $X, X_1, X_2, \ldots$ be i. i. d. random variables with $EX = \mu$ and $\mathrm{Var}(X) = \sigma^2 < \infty$. Let $S_n = X_1 + \cdots + X_n$. Then, as $n \to \infty$,

$$\frac{S_n}{n} \to \mu$$

in probability.

Proof. Let $Y_n = S_n/n$. We have $EY_n = \mu$ and

$$\mathrm{Var}(Y_n) = \frac{1}{n^2}\,\mathrm{Var}(S_n) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$

Thus, we can simply apply the previous theorem.
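A Monte Carlo sketch of the weak law, using fair die rolls as our own choice of example ($\mu = 3.5$, $\sigma^2 = 35/12$), estimates $P(|S_n/n - \mu| > \epsilon)$ and prints it next to the Chebyshev bound $\sigma^2/(n\epsilon^2)$ from the proof. The trial count and seed are arbitrary.

```python
import random
import statistics

MU, SIGMA2 = 3.5, 35 / 12  # mean and variance of one fair die roll


def sample_mean(n: int, rng: random.Random) -> float:
    """S_n / n for n independent fair die rolls."""
    return statistics.fmean(rng.choices(range(1, 7), k=n))


def exceed_frequency(n: int, eps: float, trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|S_n/n - mu| > eps)."""
    rng = random.Random(seed)
    hits = sum(abs(sample_mean(n, rng) - MU) > eps for _ in range(trials))
    return hits / trials


eps = 0.25
for n in (10, 100, 1000):
    chebyshev = SIGMA2 / (n * eps**2)  # Var(S_n/n) / eps^2
    print(n, exceed_frequency(n, eps), round(min(1.0, chebyshev), 4))
```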

Example 9.2. We analyze a typical “investment” (the accepted euphemism for gambling on financial markets) problem. Assume that you have two investment choices at the beginning of each year:

• a risk-free “bond” which returns 6% per year; and

• a risky “stock” which increases your investment by 50% with probability 0.8 and wipes it out with probability 0.2.

Putting an amount $s$ in the bond, then, gives you $1.06s$ after a year. The same amount in the stock gives you $1.5s$ with probability 0.8 and 0 with probability 0.2; note that the expected value is $0.8 \cdot 1.5s = 1.2s > 1.06s$. We will assume year-to-year independence of the stock’s return.

We will try to maximize the return to our investment by “hedging.” That is, we invest, at the beginning of each year, a fixed proportion x of our current capital into the stock and the remaining proportion 1−x into the bond. We collect the resulting capital at the end of the year, which is simultaneously the beginning of next year, and reinvest with the same proportion x. Assume that our initial capital is x₀.

It is important to note that the expected value of the capital at the end of the year is maximized when $x = 1$, but by using this strategy you will eventually lose everything (with $x = 1$, the capital becomes 0 as soon as the stock goes down once). Let $X_n$ be your capital at the end of year $n$. Define the average growth rate of your investment as

$$\lambda = \lim_{n \to \infty} \frac{1}{n} \log X_n.$$

We will express $\lambda$ in terms of $x$; in particular, we will show that it is a nonrandom quantity.

Let $I_i = I_{\{\text{stock goes up in year } i\}}$. These are independent indicators with $EI_i = 0.8$.

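The capital evolves as

$$X_n = X_{n-1}(1-x)\cdot 1.06 + X_{n-1}\cdot x \cdot 1.5 \cdot I_n = X_{n-1}\big(1.06(1-x) + 1.5x\,I_n\big),$$

and so we can unroll the recurrence to get

$$X_n = x_0\,\big(1.06(1-x) + 1.5x\big)^{S_n}\,\big((1-x)1.06\big)^{n-S_n},$$

where $S_n = I_1 + \cdots + I_n$. Since $1.06(1-x) + 1.5x = 1.06 + 0.44x$ and, by the weak law of large numbers, $S_n/n \to 0.8$,

$$\frac{1}{n}\log X_n = \frac{1}{n}\log x_0 + \frac{S_n}{n}\log(1.06 + 0.44x) + \left(1 - \frac{S_n}{n}\right)\log\big(1.06(1-x)\big) \to 0.8\log(1.06 + 0.44x) + 0.2\log\big(1.06(1-x)\big),$$

in probability, as $n \to \infty$. The last expression defines $\lambda$ as a function of $x$. To maximize this, we set $d\lambda/dx = 0$ to get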

$$\frac{0.8 \cdot 0.44}{1.06 + 0.44x} = \frac{0.2}{1-x}.$$

The solution is $x = 7/22$, which gives $\lambda \approx 8.1\%$.
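The optimization can be double-checked numerically. The sketch below (function names are hypothetical) evaluates $\lambda(x) = 0.8\log(1.06 + 0.44x) + 0.2\log(1.06(1-x))$ on a fine grid of $0 \le x < 1$, and also simulates $\frac1n\log(X_n/x_0)$ along one random path, which should be close to $\lambda(x)$ for many years.

```python
import math
import random


def growth_rate(x: float) -> float:
    """lambda(x) = 0.8 log(1.06 + 0.44 x) + 0.2 log(1.06 (1 - x)), for 0 <= x < 1."""
    return 0.8 * math.log(1.06 + 0.44 * x) + 0.2 * math.log(1.06 * (1 - x))


def simulated_rate(x: float, years: int, seed: int = 0) -> float:
    """(1/n) log(X_n / x_0) along one simulated path of the hedging strategy."""
    rng = random.Random(seed)
    log_capital = 0.0
    for _ in range(years):
        stock_up = rng.random() < 0.8
        # capital is multiplied by 1.06(1 - x) + 1.5 x I_n each year
        log_capital += math.log(1.06 * (1 - x) + 1.5 * x * stock_up)
    return log_capital / years


# crude grid search over 0 <= x < 1 for the maximizer of lambda
grid = [i / 100_000 for i in range(99_999)]
best_x = max(grid, key=growth_rate)
print(best_x, 7 / 22)                   # grid maximizer vs. exact 7/22
print(growth_rate(7 / 22))              # about 0.0809, i.e. roughly 8.1%
print(simulated_rate(7 / 22, 100_000))  # close to growth_rate(7/22)
```

Note that $\lambda(0) = \log 1.06 \approx 5.8\%$, so the hedged strategy beats putting everything in the bond, even though it is far from maximizing the expected capital.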

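Example 9.3. Distribute $n$ balls independently at random into $n$ boxes. Let $N_n$ be the number of empty boxes. Show that $\frac{1}{n}N_n$ converges in probability and identify the limit.

Note that $N_n = I_1 + \cdots + I_n$, where $I_i = I_{\{\text{box } i \text{ is empty}\}}$, and

$$EI_i = \left(1 - \frac{1}{n}\right)^n, \qquad E(I_iI_j) = \left(1 - \frac{2}{n}\right)^n \quad (i \ne j),$$

so that $EN_n = n\left(1 - \frac1n\right)^n$ and

$$\mathrm{Var}(N_n) = n\left(1 - \frac{1}{n}\right)^n + n(n-1)\left(1 - \frac{2}{n}\right)^n - n^2\left(1 - \frac{1}{n}\right)^{2n}.$$

Moreover, $E\left(\frac1n N_n\right) = \left(1 - \frac1n\right)^n \to e^{-1}$ and

$$\mathrm{Var}\left(\frac{1}{n}N_n\right) = \frac{1}{n}\left(1 - \frac{1}{n}\right)^n + \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)^n - \left(1 - \frac{1}{n}\right)^{2n} \to 0 + e^{-2} - e^{-2} = 0,$$

so, by Theorem 9.1, $\frac1n N_n \to e^{-1}$ in probability.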
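For Example 9.3, the indicator formulas for $E(N_n/n) = (1-1/n)^n$ and $\mathrm{Var}(N_n/n) = \frac1n(1-\frac1n)^n + (1-\frac1n)(1-\frac2n)^n - (1-\frac1n)^{2n}$ can be sanity-checked against brute-force enumeration of all $n^n$ placements for a small $n$. This is a sketch; the function names are our own.

```python
import math
from itertools import product


def empty_box_moments(n: int):
    """E(N_n/n) and Var(N_n/n) from the indicator formulas."""
    q1 = (1 - 1 / n) ** n  # P(a given box is empty)
    q2 = (1 - 2 / n) ** n  # P(two given boxes are both empty)
    mean = q1
    var = q1 / n + (1 - 1 / n) * q2 - q1**2
    return mean, var


def brute_force_moments(n: int):
    """Same two quantities by enumerating all n^n equally likely placements."""
    ratios = [(n - len(set(placement))) / n
              for placement in product(range(n), repeat=n)]
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    return mean, var


print(empty_box_moments(5), brute_force_moments(5))  # should agree
for n in (10, 100, 10_000):
    print(n, empty_box_moments(n), 1 / math.e)  # mean tends to 1/e, var to 0
```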

Problems

1. Assume that $n$ married couples (amounting to $2n$ people) are seated at random on $2n$ seats around a round table. Let $T$ be the number of couples that sit together. Determine $ET$ and $\mathrm{Var}(T)$.

2. There are $n$ birds that sit in a row on a wire. Each bird looks left or right with equal probability, independently of the others. Let $N$ be the number of birds not seen by any neighboring bird. Determine, with proof, the constant $c$ so that, as $n \to \infty$, $\frac{1}{n}N \to c$ in probability.

3. Recall the coupon collector problem: sample from $n$ cards, with replacement, indefinitely, and let $N$ be the number of cards you need to get so that each of the $n$ different cards is represented. Find a sequence $a_n$ so that, as $n \to \infty$, $N/a_n$ converges to 1 in probability.

4. Kings and Lakers are playing a “best of seven” playoff series, which means they play until one team wins four games. Assume Kings win every game independently with probability $p$. (There is no difference between home and away games.) Let $N$ be the number of games played. Compute $EN$ and $\mathrm{Var}(N)$.

5. An urn contains $n$ red and $m$ black balls. Select balls from the urn one by one without replacement. Let $X$ be the number of red balls selected before any black ball, and let $Y$ be the number of red balls between the first and the second black one. Compute $EX$ and $EY$.

Solutions to problems

1. Let $I_i$ be the indicator of the event that the $i$th couple sits together. Then $T = I_1 + \cdots + I_n$. Moreover,

$$EI_i = \frac{2}{2n-1}, \qquad E(I_iI_j) = \frac{2^2(2n-3)!}{(2n-1)!} = \frac{4}{(2n-1)(2n-2)},$$

for any $i$ and $j \ne i$. Thus,

$$ET = \frac{2n}{2n-1}$$

and

$$E(T^2) = ET + n(n-1)\,\frac{4}{(2n-1)(2n-2)} = \frac{4n}{2n-1},$$

so

$$\mathrm{Var}(T) = \frac{4n}{2n-1} - \frac{4n^2}{(2n-1)^2} = \frac{4n(n-1)}{(2n-1)^2}.$$

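2. Let $I_i$ indicate the event that bird $i$ is not seen by any other bird. Then $EI_i$ is $\frac12$ if $i = 1$ or $i = n$ and $\frac14$ otherwise. It follows that

$$EN = 1 + \frac{n-2}{4} = \frac{n+2}{4}.$$

Furthermore, $I_i$ and $I_j$ are independent if $|i - j| \ge 3$ (two birds that have two or more birds between them are observed independently). Thus, $\mathrm{Cov}(I_i, I_j) = 0$ if $|i - j| \ge 3$. As $I_i$ and $I_j$ are indicators, $\mathrm{Cov}(I_i, I_j) \le 1$ for any $i$ and $j$. For the same reason, $\mathrm{Var}(I_i) \le 1$. Since each $i$ has at most 4 indices $j \ne i$ with $|i - j| \le 2$, at most $4n$ covariance terms are nonzero, and therefore

$$\mathrm{Var}(N) = \sum_i \mathrm{Var}(I_i) + \sum_{i \ne j} \mathrm{Cov}(I_i, I_j) \le n + 4n = 5n.$$

Clearly, if $M = \frac{1}{n}N$, then $EM = \frac{1}{n}EN \to \frac14$ and $\mathrm{Var}(M) = \frac{1}{n^2}\mathrm{Var}(N) \to 0$. It follows that $c = \frac14$.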

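3. Let $N_i$ be the number of coupons needed to get $i$ different coupons after having $i - 1$ different ones. Then $N = N_1 + \cdots + N_n$, and $N_i$ are independent Geometric with success probability $\frac{n-i+1}{n}$, so that

$$EN_i = \frac{n}{n-i+1}, \qquad \mathrm{Var}(N_i) = \frac{n(i-1)}{(n-i+1)^2}.$$

Therefore,

$$EN = n\left(1 + \frac12 + \cdots + \frac1n\right) \sim n\log n, \qquad \mathrm{Var}(N) \le n^2\sum_{k=1}^{n}\frac{1}{k^2} \le \frac{\pi^2}{6}\,n^2.$$

With $a_n = n\log n$, we get $E(N/a_n) \to 1$ and $\mathrm{Var}(N/a_n) \le \frac{\pi^2}{6\log^2 n} \to 0$, so $N/a_n \to 1$ in probability by Theorem 9.1.

4. Let $I_i$ be the indicator of the event that game $i$ is played. Then $N = I_1 + \cdots + I_7$, with $EI_1 = \cdots = EI_4 = 1$ and

$$EI_5 = 1 - p^4 - (1-p)^4, \qquad EI_6 = 10p^2(1-p)^2, \qquad EI_7 = 20p^3(1-p)^3.$$

Since game $j$ can only be played if all earlier games are played, $I_iI_j = I_j$ for $i \le j$, so $E(N^2) = \sum_{k=1}^{7}(2k-1)\,EI_k$. The final result can be obtained by plugging in $EI_i$ and by the standard formula

$$\mathrm{Var}(N) = E(N^2) - (EN)^2.$$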

5. Imagine the balls ordered in a row where the ordering specifies the sequence in which they are selected. Let $I_i$ be the indicator of the event that the $i$th red ball is selected before any black ball. Then $EI_i = \frac{1}{m+1}$, the probability that in a random ordering of the $i$th red ball and all $m$ black balls, the red comes first. As $X = I_1 + \cdots + I_n$, $EX = \frac{n}{m+1}$.

Now, let $J_i$ be the indicator of the event that the $i$th red ball is selected between the first and the second black one. Then $EJ_i$ is the probability that the red ball is second in the ordering of the above $m + 1$ balls, so $EJ_i = EI_i$, and $EY = EX$.
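The closed-form answers to Problems 1, 2, 4 and 5 can be checked by brute-force enumeration for small parameters. The sketch below does this with exact rational arithmetic; the function names, the seat/person encoding and the test sizes are our own choices. For Problem 4 it compares enumeration of all seven-game outcomes with the indicator approach, where $I_k$ indicates that game $k$ is played, $EI_5 = 1 - p^4 - (1-p)^4$, $EI_6 = 10p^2(1-p)^2$, $EI_7 = 20p^3(1-p)^3$, and $E(N^2) = \sum_k (2k-1)EI_k$.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def couples_moments(n):
    """Problem 1: exact ET and Var(T) by listing all seatings (n >= 2)."""
    seats = 2 * n
    counts = []
    for perm in permutations(range(seats)):  # perm[s] = person in seat s
        # persons 2c and 2c+1 form couple c; seat s is adjacent to seat s+1 mod 2n
        counts.append(sum(perm[s] // 2 == perm[(s + 1) % seats] // 2
                          for s in range(seats)))
    mean = Fraction(sum(counts), len(counts))
    return mean, Fraction(sum(t * t for t in counts), len(counts)) - mean**2


def birds_moments(n):
    """Problem 2: exact EN and Var(N) over all 2^n looking directions (n >= 2)."""
    values = []
    for dirs in product("LR", repeat=n):
        # bird i is seen if its left neighbor looks right or its right neighbor looks left
        values.append(sum(not ((i > 0 and dirs[i - 1] == "R") or
                               (i < n - 1 and dirs[i + 1] == "L"))
                          for i in range(n)))
    mean = Fraction(sum(values), len(values))
    return mean, Fraction(sum(v * v for v in values), len(values)) - mean**2


def urn_moments(n, m):
    """Problem 5: exact EX and EY (m >= 2); black positions form a uniform m-subset."""
    xs, ys = [], []
    for black in combinations(range(n + m), m):
        xs.append(black[0])                 # reds before the first black
        ys.append(black[1] - black[0] - 1)  # reds between first and second black
    return Fraction(sum(xs), len(xs)), Fraction(sum(ys), len(ys))


def series_moments(p):
    """Problem 4: EN and Var(N) by enumerating all 2^7 outcomes of seven games."""
    dist = {}
    for games in product((True, False), repeat=7):  # True = Kings win
        prob = Fraction(1)
        for g in games:
            prob *= p if g else 1 - p
        kings = lakers = played = 0
        for g in games:  # stop counting once a team has four wins
            played += 1
            kings += g
            lakers += not g
            if kings == 4 or lakers == 4:
                break
        dist[played] = dist.get(played, 0) + prob
    mean = sum(k * q for k, q in dist.items())
    return mean, sum(k * k * q for k, q in dist.items()) - mean**2


def series_formula(p):
    """Problem 4 via indicators I_k = 1{game k is played}."""
    q = 1 - p
    e = [1, 1, 1, 1, 1 - p**4 - q**4, 10 * p**2 * q**2, 20 * p**3 * q**3]
    mean = sum(e)
    second = sum((2 * k - 1) * e[k - 1] for k in range(1, 8))
    return mean, second - mean**2


print(couples_moments(3))            # compare with 2n/(2n-1), 4n(n-1)/(2n-1)^2
print(birds_moments(10))             # EN should be (n+2)/4
print(urn_moments(5, 3))             # both should be n/(m+1)
print(series_moments(Fraction(1, 2)), series_formula(Fraction(1, 2)))
```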
