
On the supremum of partial sums of independent random variables


“Broadening the knowledge base and supporting the long term professional sustainability of the Research University Centre of Excellence at the University of Szeged by ensuring the rising generation of excellent scientists.”

Doctoral School of Mathematics and Computer Science

Stochastic Days in Szeged 27.07.2012.


Péter Major

(Alfréd Rényi Institute of Mathematics)

TÁMOP‐4.2.2/B‐10/1‐2010‐0012 project


Péter Major

Mathematical Institute of the Hungarian Academy of Sciences.

The original problem:

Let $\xi_1,\dots,\xi_n$ be a sequence of i.i.d. random variables with some distribution $\mu$ on a space $(X,\mathcal{X})$.

Let us have a nice class of functions $\mathcal{F}$ on the space $(X,\mathcal{X})$ such that $\int f(x)\,\mu(dx)=0$ for all $f\in\mathcal{F}$.

Give a good estimate on the tail distribution

$$P\left(\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j)>x\right)$$

for all $x>0$.

(The actual problem was its multivariate version about the tail distribution of the supremum of (degenerate) U-statistics.)

The main questions discussed in this talk:

(a) What is the natural Gaussian counterpart of this problem, and what kind of result does it suggest?

(b) When does the similarity with the Gaussian counterpart break down, and what can be said in that case?

(c) A conjecture of Talagrand, which he called the Bernoulli conjecture.

A problem similar to the tail distribution of the supremum of partial sums, investigated by Michel Talagrand: Give a good upper bound on

$$E\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j).$$

There is a concentration inequality which says that

$$P\left(\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j)-E\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j)>x\right)$$

is small. Hence the two problems are equivalent. Moreover, Talagrand's estimate on the expectation of the supremum contains an implicit estimate on its tail distribution.

First we consider the analogous Gaussian problem.

Let $\eta_t$, $E\eta_t=0$, $t\in T$, be a countable set of (jointly) Gaussian random variables. Put

$$d_2(s,t)=\left[E(\eta_s-\eta_t)^2\right]^{1/2},\quad s,t\in T.$$

Then $d_2(s,t)$ is a metric on the parameter set $T$.

The problem: Give a good estimate on $E\sup_{t\in T}\eta_t$ with the help of the function $d_2(s,t)$.

Results with the help of a classical and natural method, the chaining argument.

A classical estimate due to R. M. Dudley.

Define

$$e(n)=\min_{T_n\subset T,\ \operatorname{card}T_n\le 2^{2^n}}\left\{\alpha\colon \text{for all } t\in T \text{ there is some } \bar t\in T_n \text{ such that } d_2(t,\bar t)\le\alpha\right\}.$$

Theorem of Dudley:

$$E\sup_{t\in T}\eta_t\le L\sum_{n=0}^{\infty}2^{n/2}e(n)$$

with a universal constant $L$.

The content of the notion $e(n)$. Place uniformly $2^{2^n}$ points on the set $T$ in such a way that all points $t\in T$ are close to some point of this set. The number $e(n)$ is the smallest $\alpha$ for which all points are closer to this set than $\alpha$.

(We look for a dense net of $T$ consisting of $2^{2^n}$ points.)
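The notion $e(n)$ can be made concrete on a small finite parameter set. The following is only a minimal sketch, assuming a finite set $T$ and a brute-force search over all candidate nets; the function names and the example set $T=\{0,\dots,7\}$ with $d(s,t)=|s-t|$ are hypothetical and chosen for illustration.

```python
from itertools import combinations


def covering_radius(points, centers, dist):
    """Largest distance from a point of T to its nearest center."""
    return max(min(dist(t, c) for c in centers) for t in points)


def entropy_number(points, n, dist):
    """e(n): smallest covering radius over nets T_n with card T_n <= 2^(2^n)."""
    size = min(2 ** (2 ** n), len(points))
    # Nets with exactly `size` points suffice: extra centers never increase the radius.
    return min(covering_radius(points, centers, dist)
               for centers in combinations(points, size))


def dudley_sum(points, dist):
    """Sum of 2^(n/2) e(n); the terms vanish once 2^(2^n) >= card T."""
    total, n = 0.0, 0
    while True:
        e_n = entropy_number(points, n, dist)
        if e_n == 0:
            return total
        total += 2 ** (n / 2) * e_n
        n += 1


# Hypothetical example: T = {0, 1, ..., 7} with d(s, t) = |s - t|.
T = list(range(8))
d = lambda s, t: abs(s - t)
print([entropy_number(T, n, d) for n in range(3)], dudley_sum(T, d))
```

In this toy case two centers leave some point at distance 2, four centers reach distance 1, and sixteen centers cover all eight points, so only the first two terms of the sum are nonzero.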

The idea of the proof:

Fix some $u>0$, and sets $T_n\subset T$, $\operatorname{card}T_n\le 2^{2^n}$, $n=1,2,\dots$, $\bigcup_{n=1}^{\infty}T_n=T$, and put

$$Q(u)=P\left(\sup_{t\in T}\eta_t>u\sum_{n=0}^{\infty}2^{n/2}e(n)\right)$$

and

$$Q_N(u)=P\left(\sup_{t\in T_1\cup\dots\cup T_N}\eta_t>u\sum_{n=0}^{N-1}2^{n/2}e(n)\right),\quad N=0,1,2,\dots.$$

We can estimate $Q(u)$ well for all $u\ge2$ by giving a good bound on $Q_N(u)-Q_{N-1}(u)$ for all $N=1,2,\dots$. Here we exploit that for all $t\in T_N$ there is some $\bar t\in T_{N-1}$ which is close to it. We get a good estimate on the tail distribution of $\sup_{t\in T}\eta_t$ which shows that the main contribution to the expectation we want to bound comes from the event

$$\sup_{t\in T}\eta_t\le 2\sum_{n=0}^{\infty}2^{n/2}e(n).$$

Talagrand found a sharper estimate by introducing the right quantity $\gamma_2(T,d)$ needed in the study of this problem.

To define it, let us first introduce the diameter

$$\Delta(A)=\sup_{s,t\in A}d(s,t)$$

of a set $A\subset T$ in a metric space $(T,d)$, and the notion of

Admissible sequence of partitions. A sequence of refining partitions $\mathcal{A}_0\subset\mathcal{A}_1\subset\mathcal{A}_2\subset\cdots$ in the parameter set $T$ is an admissible sequence of partitions if $\operatorname{card}\mathcal{A}_0=1$, and $\operatorname{card}\mathcal{A}_n\le 2^{2^n}$, $n=1,2,\dots$.

Given an admissible partition $\mathcal{A}_0\subset\mathcal{A}_1\subset\mathcal{A}_2\subset\cdots$ and a point $t\in T$ let $A_n(t)$ be that element $B$ of the partition $\mathcal{A}_n$ for which $t\in B$.

Given a countable parameter set $T$ with a metric $d$ we define

$$\gamma_2(T,d)=\inf\sup_{t\in T}\sum_{n=0}^{\infty}2^{n/2}\Delta(A_n(t)),$$

where the infimum is taken over all admissible sequences of partitions of $T$.
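To illustrate the definition, the quantity $\sup_t\sum_n 2^{n/2}\Delta(A_n(t))$ can be evaluated for one fixed admissible sequence on a finite set; any such value is an upper bound for $\gamma_2(T,d)$, since the infimum runs over all admissible sequences. This is a sketch under assumptions: the function names and the dyadic example partitions are hypothetical, the last partition is assumed to consist of singletons (so all further terms vanish), and the code does not verify that each level covers $T$.

```python
def diameter(block, dist):
    """Delta(A): largest distance between two points of the block A."""
    return max(dist(s, t) for s in block for t in block)


def is_admissible(partitions):
    """Check card A_0 = 1, card A_n <= 2^(2^n), and that A_n refines A_(n-1)."""
    if len(partitions[0]) != 1:
        return False
    for n in range(1, len(partitions)):
        if len(partitions[n]) > 2 ** (2 ** n):
            return False
        for block in partitions[n]:
            if not any(block <= parent for parent in partitions[n - 1]):
                return False
    return True


def chaining_value(partitions, dist):
    """sup over t of sum_n 2^(n/2) * Delta(A_n(t)) for the given sequence."""
    if not is_admissible(partitions):
        raise ValueError("not an admissible sequence of partitions")
    best = 0.0
    for t in partitions[0][0]:  # the single block of A_0 is T itself
        total = 0.0
        for n, partition in enumerate(partitions):
            block = next(b for b in partition if t in b)
            total += 2 ** (n / 2) * diameter(block, dist)
        best = max(best, total)
    return best


# Hypothetical example: T = {0, ..., 7}, d(s, t) = |s - t|, dyadic partitions.
T = frozenset(range(8))
partitions = [
    [T],
    [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5}), frozenset({6, 7})],
    [frozenset({t}) for t in range(8)],
]
value = chaining_value(partitions, lambda s, t: abs(s - t))
print(value)
```

Here the three levels have diameters 7, 1 and 0, so the value is $7+\sqrt2$; a better choice of partitions could give a smaller bound.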

The estimate of Talagrand.

Let $\eta_t$, $E\eta_t=0$, $t\in T$, be a sequence of Gaussian random variables, with the metric $d_2(s,t)=[E(\eta_t-\eta_s)^2]^{1/2}$ on $T$. Then

$$E\sup_{t\in T}\eta_t\le L\gamma_2(T,d_2).$$

Moreover

$$\frac1L\gamma_2(T,d_2)\le E\sup_{t\in T}\eta_t\le L\gamma_2(T,d_2)$$

with a universal constant $L$.

The same upper bound holds for the supremum of random variables $U_t$, $t\in T$, with $d(s,t)^2=E(U_s-U_t)^2$ if their tail distribution satisfies the inequality

$$P(|U_s-U_t|>u)\le e^{-Cu^2/d(s,t)^2}$$

for all $s,t\in T$ and $u>0$ with a universal constant $C$.

Here we demanded a Gaussian type tail behaviour.

What can be said about the tail distribution of sums of independent random variables?

A classical result, Bernstein's inequality, says:

Bernstein's inequality. Let $\xi_1,\dots,\xi_n$ be independent random variables, $P(|\xi_j|\le1)=1$, and $E\xi_j=0$, $1\le j\le n$. Put $\sigma_j^2=E\xi_j^2$, $1\le j\le n$, $S_n=\sum_{j=1}^n\xi_j$ and $\operatorname{Var}S_n=V_n^2=\sum_{j=1}^n\sigma_j^2$. Then

$$P(S_n>u)\le\exp\left\{-\frac{u^2}{2V_n^2\left(1+\frac{u}{3V_n^2}\right)}\right\}$$

for all numbers $u>0$.

If $u\le\text{const.}\,V_n^2$ it supplies a Gaussian type estimate, but if $u\gg V_n^2$, it supplies a bad estimate. Only a very weak improvement is possible, which does not help if $u\gg V_n^2$.
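The two regimes can be seen by comparing the exponent in Bernstein's inequality with the Gaussian exponent $u^2/(2V_n^2)$. The sketch below only evaluates the two formulas; the function names and the sample values $V_n^2=100$, $u=10$, $u=3000$ are assumptions made for illustration.

```python
def bernstein_exponent(u, var_sum):
    """u^2 / (2 V^2 (1 + u / (3 V^2))): exponent in Bernstein's inequality."""
    return u * u / (2.0 * var_sum * (1.0 + u / (3.0 * var_sum)))


def gaussian_exponent(u, var_sum):
    """u^2 / (2 V^2): exponent of the Gaussian tail with variance V^2."""
    return u * u / (2.0 * var_sum)


def exponent_ratio(u, var_sum):
    """Bernstein exponent divided by Gaussian exponent, equal to 1 / (1 + u / (3 V^2))."""
    return bernstein_exponent(u, var_sum) / gaussian_exponent(u, var_sum)


# Hypothetical values: V_n^2 = 100, a moderate level and a very large level.
V2 = 100.0
for u in (10.0, 3000.0):
    print(u, bernstein_exponent(u, V2), gaussian_exponent(u, V2), exponent_ratio(u, V2))
```

For $u=10$ the ratio is $30/31$, so the bound is essentially Gaussian; for $u=3000$ it drops to $1/11$, and the Bernstein exponent grows only linearly in $u$.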

For normalized partial sums

$$S_n(f)=\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j),\quad f\in\mathcal{F},$$

of i.i.d. random variables $\xi_j$, $Ef(\xi_1)=0$ and $\sup_x|f(x)|\le1$ for all $f\in\mathcal{F}$ the following Gaussian type estimate holds.

$$P(|S_n(f)-S_n(g)|\ge u)\le 2e^{-u^2/(100\,d_2(f,g)^2)},\quad\text{if } u\le 3d_2(f,g)^2\sqrt n,$$

with $d_2(f,g)^2=\int(f(x)-g(x))^2\,\mu(dx)$, where $\mu$ is the distribution of $\xi_j$.

The main problem in the study of the supremum of partial sums:

We have no good bound on the tail distribution $P(|S_n(f)-S_n(g)|>u)$ if $d_2(f,g)^2=\int(f(x)-g(x))^2\,\mu(dx)$ is small and the level $u$ in the probability is large. How to overcome this problem?

We have to impose some good conditions on the class of functions $\mathcal{F}$.

Put $T=\mathcal{F}$, and define on it the metrics

$$d_2(f,g)=\left(\int(f-g)^2\,\mu(dx)\right)^{1/2},\quad f,g\in\mathcal{F},$$

and

$$d_\infty(f,g)=\sup_x|f(x)-g(x)|,\quad f,g\in\mathcal{F}.$$

Two approaches:

(a) Talagrand’s approach.

It exploits that if $\sup_x|f(x)|<c$ with a small number $c>0$, then a good (Gaussian type) estimate holds in the Bernstein inequality in a larger interval. In this case we have a good tail estimate for $u\le\frac1c\cdot3d_2(f,g)^2\sqrt n$.

Put (similarly to $\gamma_2(T,d)$)

$$\gamma_\alpha(\mathcal{F},d)=\inf\sup_{t\in\mathcal{F}}\sum_{n=0}^{\infty}2^{n/\alpha}\Delta(A_n(t)),$$

with an arbitrary number $\alpha>0$ and metric $d$ on $\mathcal{F}$, where the infimum is taken over all admissible sequences of partitions $\mathcal{A}_n$, $n=0,1,2,\dots$, of $\mathcal{F}$, and $\Delta(A_n(t))$ is the diameter of $A_n(t)$ with respect to the metric $d$.

Theorem A. (Talagrand's estimate on the supremum of a class of partial sums). Let $\gamma_1(\mathcal{F},d_\infty)$ denote the quantity $\gamma_\alpha(\mathcal{F},d)$ with $\alpha=1$ and $d(f,g)=d_\infty(f,g)$ in the set $T=\mathcal{F}$. Then

$$E\sup_{f\in\mathcal{F}}\frac{1}{\sqrt n}\sum_{j=1}^n f(\xi_j)\le L\left(\gamma_2(\mathcal{F},d_2)+\frac{1}{\sqrt n}\gamma_1(\mathcal{F},d_\infty)\right)$$

with an appropriate universal constant $L>0$.

This result gives a good estimate on the supremum of (normalized) partial sums if both $\gamma_2(\mathcal{F},d_2)$ and $\gamma_1(\mathcal{F},d_\infty)$ are small. The idea of the proof is to adapt the proof of the Gaussian counterpart to this case and to exploit that if we have a subclass of $\mathcal{F}$ with not too large cardinality which is dense also in the $L_\infty$ norm, then the Bernstein inequality (applied to random variables whose supremum is bounded by a small number) gives a sufficiently good estimate, and the chaining argument can be applied.

An example when this result gives a sharp estimate.

Let $X_1,\dots,X_n$ be a sequence of independent random variables, uniformly distributed on the square $[0,1]\times[0,1]$, and let $\mathcal{C}$ be the class of Lipschitz 1 functions $f(x)$ on $[0,1]\times[0,1]$ such that

$$\int_{[0,1]\times[0,1]}f(x)\,dx=0.$$

Then

$$E\sup_{f\in\mathcal{C}}\frac{1}{\sqrt n}\sum_{l=1}^n f(X_l)\le L\sqrt{\log n}$$

with a universal constant $L$.

This result is equivalent to a (famous) result of Ajtai–Komlós–Tusnády.

The problem solved by Ajtai–Komlós–Tusnády:

Take two independent sequences $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ of independent random variables uniformly distributed on the unit square $[0,1]\times[0,1]$. Let us take such a (random) permutation $Y_{\pi(1)},\dots,Y_{\pi(n)}$ of the indices of the random variables $Y_1,\dots,Y_n$ for which $X_j$ and $Y_{\pi(j)}$ are close to each other for all indices $j$. More precisely we want that

$$E\sum_{j=1}^n\rho(X_j,Y_{\pi(j)})$$

be as small as possible, where $\rho(\cdot,\cdot)$ is the Euclidean metric.

Theorem of Ajtai–Komlós–Tusnády.

$$E\sum_{j=1}^n\rho(X_j,Y_{\pi(j)})\le L\sqrt{n\log n}$$

for an appropriate permutation $(\pi(1),\dots,\pi(n))$ of the set $\{1,\dots,n\}$ with a universal constant $L$, and this estimate is sharp.

Example when the above estimate of Talagrand does not give a good estimate.

Let $f(x_1,\dots,x_k)$, $|f(x_1,\dots,x_k)|\le1$, be a function on $R^k$, $\mu$ a probability measure on $R^k$. Take a nice class $\mathcal{D}=\{D_1,D_2,\dots\}$ of sets $D_l\subset R^k$, $l=1,2,\dots$, let $\bar f_l$ be the restriction of $f$ to $D_l$, i.e. let

$$\bar f_l(x_1,\dots,x_k)=\begin{cases}f(x_1,\dots,x_k)&\text{if }(x_1,\dots,x_k)\in D_l\\0&\text{if }(x_1,\dots,x_k)\notin D_l\end{cases}$$

and

$$f_l(x_1,\dots,x_k)=\bar f_l(x_1,\dots,x_k)-\int\bar f_l(x_1,\dots,x_k)\,d\mu.$$

Give a good bound on

$$E\sup_l\frac{1}{\sqrt n}\sum_{j=1}^n f_l(\xi_j).$$

In this case the quantity $\gamma_1(\mathcal{F},d_\infty)$ in Theorem A cannot be well bounded.

To get a good estimate in this case introduce the following notion.

Definition of $L_2$-dense classes of functions.

Let a measurable space $(X,\mathcal{X})$ be given together with a set $\mathcal{F}$ of $\mathcal{X}$ measurable real valued functions on this space. $\mathcal{F}$ is called an $L_2$-dense class of functions with parameter $D$ and exponent $L$ if for all numbers $1\ge\varepsilon>0$ and probability measures $\nu$ there exists a finite $\varepsilon$-dense subset $\mathcal{F}_\varepsilon=\{f_1,\dots,f_m\}\subset\mathcal{F}$ in the space $L_2(X,\mathcal{X},\nu)$ with $m\le D\varepsilon^{-L}$ elements such that

$$\inf_{f_j\in\mathcal{F}_\varepsilon}\int|f-f_j|^2\,d\nu<\varepsilon^2$$

for all functions $f\in\mathcal{F}$.

Then we have the following result.

Theorem B. (Estimate on the supremum of a class of partial sums). Let us consider a sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$, $n\ge2$, with values in a measurable space $(X,\mathcal{X})$ and with some distribution $\mu$. Beside this, let a countable and $L_2$-dense class of functions $\mathcal{F}$ with some parameter $D\ge1$ and exponent $L\ge1$ be given on the space $(X,\mathcal{X})$ which satisfies the conditions

$$\|f\|_\infty=\sup_{x\in X}|f(x)|\le1\quad\text{for all } f\in\mathcal{F},$$

$$\|f\|_2^2=\int f^2(x)\,\mu(dx)\le\sigma^2\quad\text{for all } f\in\mathcal{F}$$

with some constant $0<\sigma\le1$, and

$$\int f(x)\,\mu(dx)=0\quad\text{for all } f\in\mathcal{F}.$$

Define the normalized partial sums $S_n(f)=\frac{1}{\sqrt n}\sum_{k=1}^n f(\xi_k)$ for all $f\in\mathcal{F}$.

There exist some universal constants $C>0$, $\alpha>0$ and $M>0$ such that the supremum of the normalized random sums $S_n(f)$, $f\in\mathcal{F}$, satisfies the inequality

$$P\left(\sup_{f\in\mathcal{F}}|S_n(f)|\ge u\right)\le C\exp\left\{-\alpha\left(\frac{u}{\sigma}\right)^2\right\}$$

for those numbers $u$ for which

$$\sqrt n\,\sigma^2\ge u\ge M\sigma\left(L^{3/4}\log^{1/2}\frac{2}{\sigma}+(\log D)^{3/4}\right)$$

with the parameter $D$ and exponent $L$ of the $L_2$-dense class $\mathcal{F}$.

Under the conditions of this theorem the supremum of partial sums is not much greater than its largest (worst) term.

If we take a Vapnik–Červonenkis class of sets $\mathcal{D}=\{D_1,D_2,\dots\}$ and define the above considered functions $f_l$, $l=1,2,\dots$, then $\mathcal{F}=\{f_1,f_2,\dots\}$ is an $L_2$-dense class of functions, and Theorem B can be applied to it.

The proof of Theorem B is based on some symmetrization and conditioning argument. We apply the following results.

Symmetrization Lemma. Let us fix a countable class of functions $\mathcal{F}$ on a measurable space $(X,\mathcal{X})$ together with a real number $0<\sigma<1$. Consider a sequence of independent and identically distributed random variables $\xi_1,\dots,\xi_n$ with values in the space $(X,\mathcal{X})$ such that $Ef(\xi_1)=0$, $Ef^2(\xi_1)\le\sigma^2$ for all $f\in\mathcal{F}$ together with another sequence $\varepsilon_1,\dots,\varepsilon_n$ of independent random variables with distribution $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$, independent also of the random sequence $\xi_1,\dots,\xi_n$. Then

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n f(\xi_j)\right|\ge An^{1/2}\sigma^2\right)\le 4P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n\varepsilon_jf(\xi_j)\right|\ge\frac{A}{3}n^{1/2}\sigma^2\right)$$

if $A\ge\frac{3\sqrt2}{\sqrt n\,\sigma}$.

Theorem (Hoeffding's inequality). Let $\varepsilon_1,\dots,\varepsilon_n$ be independent random variables, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$, and let $a_1,\dots,a_n$ be arbitrary real numbers. Put $V=\sum_{j=1}^n a_j\varepsilon_j$. Then

$$P(V>u)\le\exp\left\{-\frac{u^2}{2\sum_{j=1}^n a_j^2}\right\}$$

for all $u>0$.

The Hoeffding inequality always gives a good Gaussian type estimate.
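For a small number of terms the probability in Hoeffding's inequality can be computed exactly and compared with the bound. This is a minimal sketch, assuming exhaustive enumeration of all $2^n$ sign patterns (feasible only for small $n$); the function names and the example $a_1=\dots=a_{10}=1$, $u=4$ are hypothetical.

```python
import math
from itertools import product


def exact_tail(coeffs, u):
    """P(V > u) for V = sum_j a_j eps_j, by enumerating all 2^n sign patterns."""
    hits = sum(
        1
        for signs in product((-1, 1), repeat=len(coeffs))
        if sum(a * s for a, s in zip(coeffs, signs)) > u
    )
    return hits / 2 ** len(coeffs)


def hoeffding_bound(coeffs, u):
    """exp(-u^2 / (2 sum_j a_j^2)): right-hand side of Hoeffding's inequality."""
    return math.exp(-u * u / (2.0 * sum(a * a for a in coeffs)))


# Hypothetical example: ten coefficients equal to 1 and level u = 4.
a = [1.0] * 10
print(exact_tail(a, 4.0), hoeffding_bound(a, 4.0))
```

In this example $V>4$ requires at least eight plus signs, so the exact probability is $56/1024$, below the bound $e^{-0.8}$.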

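The symmetrization lemma enables us to replace the estimation of

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n f(\xi_j)\right|\ge An^{1/2}\sigma^2\right)$$

by the estimation of

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n\varepsilon_jf(\xi_j)\right|\ge\frac{A}{3}n^{1/2}\sigma^2\right).$$

This can be done by estimating the conditional probabilities

$$P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n\varepsilon_jf(\xi_j)\right|\ge\frac{A}{3}n^{1/2}\sigma^2\,\Bigg|\,\xi_1=x_1,\dots,\xi_n=x_n\right)=P\left(\frac{1}{\sqrt n}\sup_{f\in\mathcal{F}}\left|\sum_{j=1}^n\varepsilon_jf(x_j)\right|\ge\frac{A}{3}n^{1/2}\sigma^2\right).$$

The right-hand side can be bounded by the Hoeffding inequality.

On the basis of this observation a proof of Theorem B can be worked out.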

The conjecture of Talagrand

Let a sequence of Bernoulli sums $\eta_j=\sum_{l=1}^N a_{j,l}\varepsilon_l$, $j=1,\dots,M$, be given, where $\varepsilon_1,\dots,\varepsilon_N$, $P(\varepsilon_1=1)=P(\varepsilon_1=-1)=\frac12$, are independent random variables. Give a good estimate on

$$E\sup_{1\le j\le M}\eta_j.$$

We would like to give both an upper and a lower bound in such a way that the ratio of these two estimates is less than a universal constant.

Put $T=\{1,\dots,M\}$,

$$d_2^2(i,j)=E(\eta_i-\eta_j)^2=\sum_{l=1}^N(a_{i,l}-a_{j,l})^2,\quad i,j\in T,$$

and define with the help of this metric the quantities $\gamma_2(T,d_2)$ and $\gamma_2(T_1,d_2)$ for all sets $T_1\subset T$. The Gaussian estimate holds for $E\sup_{j\in T_1}\eta_j$ (Hoeffding inequality), hence

$$E\sup_{j\in T_1}\eta_j\le L\gamma_2(T_1,d_2)\quad\text{for all } T_1\subset T.$$

On the other hand, $\sum_{l=1}^N a_{j,l}\varepsilon_l\le\sum_{l=1}^N|a_{j,l}|$. Put $b(T_2)=\sup_{j\in T_2}\sum_{l=1}^N|a_{j,l}|$. Then

$$E\sup_{1\le j\le M}\eta_j\le L\inf_{T_1,T_2\subset T,\ T_1\cup T_2=T}\left(\gamma_2(T_1,d_2)+b(T_2)\right).$$

Talagrand's conjecture: This estimate is sharp:

$$E\sup_{1\le j\le M}\eta_j\ge\frac1L\inf_{T_1,T_2\subset T,\ T_1\cup T_2=T}\left(\gamma_2(T_1,d_2)+b(T_2)\right).$$

My conjecture: Talagrand's conjecture does not hold. Moreover, there is no good estimate (where the ratio of the upper and lower bound is less than a universal constant) by means of $\gamma_2(T_1,d_2)$ and $b(T_2)$.

To prove the lower bound for $E\sup_{j\in T}\eta_j$ we need some estimates which say that if some random variables are far from each other in some sense, then their supremum is large.

In the Gaussian case Sudakov’s inequality holds.

Sudakov's inequality. Let $M$ Gaussian random variables $\xi_1,\dots,\xi_M$ be given, for which $E\xi_j=0$, and $E(\xi_j-\xi_k)^2\ge a^2$ for all pairs $1\le j<k\le M$ and some number $a>0$. Then

$$E\sup_{1\le j\le M}\xi_j\ge\frac{a}{L}\sqrt{\log M}$$

with a universal number $L>0$.

This inequality is sharp.

A version of it for Bernoulli sums:

Theorem. Let a set of Bernoulli sequences $\eta_j=\sum_{l=1}^N a_{j,l}\varepsilon_l$, $j=1,\dots,M$, be given. Put $a_j^2=\sum_{l=1}^N a_{j,l}^2$ for all $1\le j\le M$, $B=\sup_{1\le j\le M}a_j$ and

$$C=\sup_{1\le j\le M}\ \sup_{1\le l\le N}|a_{j,l}|.$$

If $|a_j-a_{j'}|\ge\frac14B$ for all $1\le j,j'\le M$, and $C\le\frac{B}{L_0\log M}$ with a sufficiently large universal constant $L_0$, then

$$E\sup_{1\le j\le M}\eta_j\ge\frac{B}{L}\sqrt{\log M}.$$

This inequality is sharp only if the Bernoulli sequences are in the ‘Gaussian domain’.
