“Broadening the knowledge base and supporting the long term professional sustainability of the Research University Centre of Excellence at the University of Szeged by ensuring the rising generation of excellent scientists.”
Doctoral School of Mathematics and Computer Science
Stochastic Days in Szeged 27.07.2012.
On the supremum of partial sums of independent random variables
Péter Major
(Alfréd Rényi Institute of Mathematics)
TÁMOP‐4.2.2/B‐10/1‐2010‐0012 project
The original problem:
Let ξ_1, ..., ξ_n be a sequence of i.i.d. random variables with some distribution µ on a space (X, X).
Let us have a nice class of functions F on the space (X, X) such that ∫ f(x) µ(dx) = 0 for all f ∈ F.
Give a good estimate on the tail distribution
P( sup_{f∈F} (1/√n) ∑_{j=1}^n f(ξ_j) > x )
for all x > 0.
(The actual problem was its multivariate version, about the tail distribution of the supremum of (degenerate) U-statistics.)
The main questions discussed in this talk:
(a) What is the natural Gaussian counterpart of this problem, and what kind of result does it suggest?
(b) When does the similarity with the Gaussian counterpart break down, and what can be said in that case?
(c) A conjecture of Talagrand, which he called the Bernoulli conjecture.
A problem similar to the tail distribution of the supremum of partial sums was investigated by Michel Talagrand: Give a good upper bound on
E sup_{f∈F} (1/√n) ∑_{j=1}^n f(ξ_j).
There is a concentration inequality which says that
P( sup_{f∈F} (1/√n) ∑_{j=1}^n f(ξ_j) − E sup_{f∈F} (1/√n) ∑_{j=1}^n f(ξ_j) > x )
is small. Hence the two problems are equivalent. Moreover, Talagrand’s estimate on the expectation of the supremum contains an implicit estimate on its tail distribution.
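For the Gaussian setting considered next, the concentration phenomenon can be stated explicitly; the following Borell–TIS inequality is quoted here for orientation (it is a standard fact, not part of the original slides):

```latex
P\Bigl(\Bigl|\sup_{t\in T}\eta_t - E\sup_{t\in T}\eta_t\Bigr| > x\Bigr)
  \le 2\exp\Bigl(-\frac{x^2}{2\sigma_T^2}\Bigr),
\qquad \sigma_T^2 = \sup_{t\in T} E\eta_t^2 .
```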
First we consider the analogous Gaussian problem.
Let η_t, Eη_t = 0, t ∈ T, be a countable set of (jointly) Gaussian random variables. Put
d_2(s, t) = [E(η_s − η_t)^2]^{1/2}, s, t ∈ T.
Then d_2(s, t) is a metric on the parameter set T.
The problem: Give a good estimate on E sup_{t∈T} η_t with the help of the function d_2(s, t).
Results obtained with a classical and natural method, the chaining argument.
A classical estimate due to R. M. Dudley.
Define
e(n) = min_{T_n⊂T, card T_n ≤ 2^{2^n}} {α: for all t ∈ T there is some t̄ ∈ T_n such that d_2(t, t̄) ≤ α}.
Theorem of Dudley:
E sup_{t∈T} η_t ≤ ∑_{n=0}^∞ 2^{n/2} e(n).
The content of the notion e(n): place 2^{2^n} points in the set T in such a way that all points t ∈ T are close to some point of this set. The number e(n) is the smallest α for which all points of T are within distance α of this set.
(We look for a dense net of T consisting of 2^{2^n} points.)
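To make the covering numbers concrete, here is a small numerical sketch (a toy example of mine, not from the talk): take a finite set T ⊂ R², put η_t = ⟨t, g⟩ with g standard Gaussian, so that d_2 is the Euclidean distance; approximate e(n) by a greedy cover and compare the Dudley sum with a Monte Carlo estimate of E sup η_t.

```python
import math, random

random.seed(0)

# Toy finite Gaussian family: eta_t = <t, g> for t in T ⊂ R^2, g standard
# Gaussian, so d_2(s, t) is the Euclidean distance between s and t.
T = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]

def dist(s, t):
    return math.hypot(s[0] - t[0], s[1] - t[1])

def greedy_net(points, size):
    # Farthest-point greedy cover: only an upper bound on the optimal e(n).
    net = [points[0]]
    while len(net) < size:
        net.append(max(points, key=lambda p: min(dist(p, q) for q in net)))
    return net

def covering_radius(net, points):
    # Smallest alpha such that every point lies within alpha of the net.
    return max(min(dist(p, q) for q in net) for p in points)

# Truncated Dudley sum: once 2^{2^n} >= card T the net can be T itself, e(n) = 0.
dudley = 0.0
n = 0
while 2 ** (2 ** n) < len(T):
    dudley += 2 ** (n / 2) * covering_radius(greedy_net(T, 2 ** (2 ** n)), T)
    n += 1

# Monte Carlo estimate of E sup_t eta_t.
trials = 2000
total = 0.0
for _ in range(trials):
    g = (random.gauss(0, 1), random.gauss(0, 1))
    total += max(t[0] * g[0] + t[1] * g[1] for t in T)
mc = total / trials

print(f"truncated Dudley sum ~= {dudley:.3f}, Monte Carlo E sup ~= {mc:.3f}")
```

As expected for an upper bound obtained by chaining, the Dudley sum overshoots the Monte Carlo value by a moderate constant factor.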
The idea of the proof:
Fix some u > 0 and sets T_n ⊂ T, card T_n ≤ 2^{2^n}, n = 1, 2, ..., with ∪_{n=1}^∞ T_n = T, and put
Q(u) = P( sup_{t∈T} η_t > u ∑_{n=0}^∞ 2^{n/2} e(n) )
and
Q_N(u) = P( sup_{t∈T_1∪···∪T_N} η_t > u ∑_{n=0}^{N−1} 2^{n/2} e(n) ), N = 0, 1, 2, ....
We can estimate Q(u) well for all u ≥ 2 by giving a good bound on Q_N(u) − Q_{N−1}(u) for all N = 1, 2, .... Here we exploit that for all t ∈ T_N there is some t̄ ∈ T_{N−1} which is close to it. We get a good estimate on the tail distribution of sup_{t∈T} η_t which shows that the main contribution to the expectation we want to bound comes from the event
sup_{t∈T} η_t ≤ 2 ∑_{n=0}^∞ 2^{n/2} e(n).
Talagrand found a sharper estimate by introducing the right quantity γ_2(T, d) needed in the study of this problem.
To define it, let us first introduce the diameter
∆(A) = sup_{s,t∈A} d(s, t)
of a set A ⊂ T in a metric space (T, d), and the notion of an
Admissible sequence of partitions. A sequence of refining partitions A_0 ⊂ A_1 ⊂ A_2 ⊂ · · · of the parameter set T is an admissible sequence of partitions if card A_0 = 1 and card A_n ≤ 2^{2^n}, n = 1, 2, ....
Given an admissible sequence of partitions A_0 ⊂ A_1 ⊂ A_2 ⊂ · · · and a point t ∈ T, let A_n(t) be that element B of the partition A_n for which t ∈ B.
Given a countable parameter set T with a metric d we define
γ_2(T, d) = inf sup_{t∈T} ∑_{n=0}^∞ 2^{n/2} ∆(A_n(t)),
where the infimum is taken over all admissible sequences of partitions of T.
The estimate of Talagrand.
Let η_t, Eη_t = 0, t ∈ T, be a sequence of Gaussian random variables, with the metric d_2(s, t) = [E(η_t − η_s)^2]^{1/2} on T. Then
E sup_{t∈T} η_t ≤ L γ_2(T, d_2).
Moreover,
(1/L) γ_2(T, d_2) ≤ E sup_{t∈T} η_t ≤ L γ_2(T, d_2)
with a universal constant L.
The same upper bound holds for the supremum of random variables U_t, t ∈ T, with d(s, t)^2 = E(U_s − U_t)^2 if their tail distribution satisfies the inequality
P(|U_s − U_t| > u) ≤ e^{−Cu^2/d(s,t)^2}
for all s, t ∈ T and u > 0 with a universal constant C.
Here we demanded a Gaussian type tail behaviour.
What can be told about the tail distribution of sums of independent random variables?
A classical result, Bernstein’s inequality, says:
Bernstein’s inequality. Let ξ_1, ..., ξ_n be independent random variables with
P(|ξ_j| ≤ 1) = 1 and Eξ_j = 0, 1 ≤ j ≤ n.
Put σ_j^2 = Eξ_j^2, 1 ≤ j ≤ n, S_n = ∑_{j=1}^n ξ_j and Var S_n = V_n^2 = ∑_{j=1}^n σ_j^2. Then
P(S_n > u) ≤ exp( −u^2 / (2V_n^2 (1 + u/(3V_n^2))) )
for all numbers u > 0.
If u ≤ const · V_n^2, it supplies a Gaussian type estimate, but if u ≫ V_n^2, it supplies a bad estimate. Only a very weak improvement is possible, which does not help if u ≫ V_n^2.
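As a quick sanity check (a toy sketch with ±1-valued ξ_j of my choosing, not part of the talk), one can compare the empirical tail of S_n with the Bernstein bound in the Gaussian regime u ≲ V_n^2:

```python
import math, random

random.seed(1)

# xi_j uniform on {-1, +1}: |xi_j| <= 1, E xi_j = 0, sigma_j^2 = 1, V_n^2 = n.
n = 100
trials = 20000
Vn2 = float(n)

def bernstein_bound(u):
    # exp(-u^2 / (2 V_n^2 (1 + u / (3 V_n^2))))
    return math.exp(-u * u / (2 * Vn2 * (1 + u / (3 * Vn2))))

counts = {10: 0, 20: 0, 30: 0}
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    for u in counts:
        if s > u:
            counts[u] += 1

for u in sorted(counts):
    print(f"u={u}: empirical tail {counts[u] / trials:.4f}, "
          f"Bernstein bound {bernstein_bound(u):.4f}")
```

Here all three levels u lie below V_n^2 = 100, so the bound is of Gaussian type, although visibly not tight.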
For normalized partial sums
S_n(f) = (1/√n) ∑_{j=1}^n f(ξ_j), f ∈ F,
of i.i.d. random variables ξ_j with Ef(ξ_1) = 0 and sup_x |f(x)| ≤ 1 for all f ∈ F, the following Gaussian type estimate holds:
P(|S_n(f) − S_n(g)| ≥ u) ≤ 2 e^{−u^2/(100 d_2^2(f,g))}  if u ≤ 3 d_2(f, g)^2 √n,
with d_2(f, g)^2 = ∫ (f(x) − g(x))^2 µ(dx), where µ is the distribution of ξ_j.
The main problem in the study of the supremum of partial sums:
We have no good bound on the tail probability P(|S_n(f) − S_n(g)| > u) if d_2(f, g)^2 = ∫ (f(x) − g(x))^2 µ(dx) is small and the level u in the probability is large. How can we overcome this problem?
We have to impose some good conditions on the class of functions F.
Put T = F, and define on it the metrics
d_2(f, g) = [∫ (f − g)^2 dµ]^{1/2}, f, g ∈ F,
and
d_∞(f, g) = sup_x |f(x) − g(x)|, f, g ∈ F.
Two approaches:
(a) Talagrand’s approach.
It exploits that if sup_x |f(x)| < c with a small number c > 0, then a good (Gaussian type) estimate holds in the Bernstein inequality in a larger interval. In this case we have a good tail estimate for u ≤ (3/c) d_2(f, g)^2.
Put (similarly to γ_2(T, d))
γ_α(F, d) = inf sup_{t∈F} ∑_{n=0}^∞ 2^{n/α} ∆(A_n(t)),
with an arbitrary number α > 0 and metric d on F, where the infimum is taken over all admissible sequences of partitions A_n, n = 0, 1, 2, ..., of F, and ∆(A_n(t)) is the diameter of A_n(t) with respect to the metric d.
Theorem A (Talagrand’s estimate on the supremum of a class of partial sums). Let γ_1(F, d_∞) denote the quantity γ_α(F, d) with α = 1 and d(f, g) = d_∞(f, g) on the set T = F. Then
E sup_{f∈F} (1/√n) ∑_{j=1}^n f(ξ_j) ≤ L ( γ_2(F, d_2) + (1/√n) γ_1(F, d_∞) )
with an appropriate universal constant L > 0.
This result gives a good estimate on the supremum of (normalized) partial sums if both γ_2(F, d_2) and γ_1(F, d_∞) are small. The idea of the proof is to adapt the proof of the Gaussian counterpart to this case and to exploit that if we have a subclass of F of not too large cardinality which is dense also in the L_∞ norm, then the Bernstein inequality (applied to random variables whose supremum is bounded by a small number) gives a sufficiently good estimate, and the chaining argument can be applied.
An example when this result gives a sharp estimate.
Let X_1, ..., X_n be a sequence of independent random variables, uniformly distributed on the square [0,1] × [0,1], and let C be the class of Lipschitz 1 functions f(x) on [0,1] × [0,1] such that
∫_{[0,1]×[0,1]} f(x) dx = 0.
Then
E sup_{f∈C} (1/√n) ∑_{l=1}^n f(X_l) ≤ L √(log n)
with a universal constant L.
This result is equivalent to a (famous) result of Ajtai–Komlós–Tusnády.
The problem solved by Ajtai–Komlós–Tusnády:
Take two independent sequences X_1, ..., X_n and Y_1, ..., Y_n of independent random variables uniformly distributed on the unit square [0,1] × [0,1]. Let us take such a (random) rearrangement Y_{π(1)}, ..., Y_{π(n)} of the random variables Y_1, ..., Y_n for which X_j and Y_{π(j)} are close to each other for all indices j. More precisely, we want that
E ∑_{j=1}^n ρ(X_j, Y_{π(j)})
be as small as possible, where ρ(·,·) is the Euclidean metric.
Theorem of Ajtai–Komlós–Tusnády.
E ∑_{j=1}^n ρ(X_j, Y_{π(j)}) ≤ L √(n log n)
for an appropriate permutation (π(1), ..., π(n)) of the set {1, ..., n} with a universal constant L, and this estimate is sharp.
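A quick numerical illustration (an assumed toy setup, not from the talk): a greedy matching only gives an upper bound on the optimal matching cost, so the figures below are merely a sanity check of the √(n log n) scale, not a verification of the theorem.

```python
import math, random

random.seed(2)

def greedy_matching_cost(n):
    # Match each X_j greedily to the nearest still-unmatched Y point;
    # the greedy cost upper-bounds the optimal matching cost.
    xs = [(random.random(), random.random()) for _ in range(n)]
    ys = [(random.random(), random.random()) for _ in range(n)]
    unmatched = list(range(n))
    total = 0.0
    for x in xs:
        best = min(unmatched,
                   key=lambda i: (x[0] - ys[i][0]) ** 2 + (x[1] - ys[i][1]) ** 2)
        unmatched.remove(best)
        total += math.hypot(x[0] - ys[best][0], x[1] - ys[best][1])
    return total

costs = {}
for n in (100, 400, 1600):
    costs[n] = greedy_matching_cost(n)
    print(f"n={n}: greedy cost {costs[n]:.2f}, "
          f"sqrt(n log n) = {math.sqrt(n * math.log(n)):.2f}")
```

Note that the per-point cost shrinks as n grows, in rough agreement with the √(log n / n) rate per point implied by the theorem.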
Example when the above estimate of Talagrand does not give a good estimate.
Let f(x_1, ..., x_k), |f(x_1, ..., x_k)| ≤ 1, be a function on R^k, and µ a probability measure on R^k. Take a nice class D = {D_1, D_2, ...} of sets D_l ⊂ R^k, l = 1, 2, ..., and let f̄_l be the restriction of f to D_l, i.e. let
f̄_l(x_1, ..., x_k) = f(x_1, ..., x_k) if (x_1, ..., x_k) ∈ D_l, and f̄_l(x_1, ..., x_k) = 0 if (x_1, ..., x_k) ∉ D_l,
and put
f_l(x_1, ..., x_k) = f̄_l(x_1, ..., x_k) − ∫ f̄_l(x_1, ..., x_k) dµ.
Give a good bound on E sup_l (1/√n) ∑_{j=1}^n f_l(ξ_j).
In this case the quantity γ_1(F, d_∞) in Theorem A cannot be well bounded.
To get a good estimate in this case we introduce the following notion.
Definition of L_2-dense classes of functions.
Let a measurable space (X, X) be given together with a set F of X-measurable real valued functions on this space. F is called an L_2-dense class of functions with parameter D and exponent L if for all numbers 1 ≥ ε > 0 and all probability measures ν there exists a finite ε-dense subset F_ε = {f_1, ..., f_m} ⊂ F in the space L_2(X, X, ν) with m ≤ Dε^{−L} elements, i.e.
inf_{f_j∈F_ε} ∫ |f − f_j|^2 dν < ε^2 for all functions f ∈ F.
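To make the definition concrete, here is a sketch for what I believe is the simplest example (my own illustration, not from the talk): the class of half-line indicators f_t(x) = 1{x ≤ t} is L_2-dense with exponent 2, because thresholds placed at the ε²-quantiles of ν form an ε-net in L_2(ν).

```python
import bisect, random

random.seed(3)

# For F = {x -> 1{x <= t}} and any measure nu,
# ||1{. <= t} - 1{. <= s}||^2_{L2(nu)} = nu((min(s,t), max(s,t)]),
# so thresholds at the eps^2-quantiles of nu give an eps-net of size ~ eps^-2.
sample = sorted(random.gauss(0, 1) for _ in range(5000))  # empirical measure nu_n
eps = 0.2

step = max(1, int(eps * eps * len(sample)))  # quantile spacing eps^2
net = sample[::step] + [sample[-1]]          # net of ~ eps^-2 thresholds

def dist2(t, s):
    # squared empirical L2 distance between the indicators 1{x<=t} and 1{x<=s}
    lo, hi = min(t, s), max(t, s)
    return (bisect.bisect_right(sample, hi)
            - bisect.bisect_right(sample, lo)) / len(sample)

# Every f_t, t in the support, is within eps of some net element in L2(nu_n).
worst = max(min(dist2(t, s) for s in net) for t in sample)
print(f"net size {len(net)}, worst squared L2 distance {worst:.4f}, eps^2 = {eps * eps}")
```

The net size stays near ε^{-2} regardless of the underlying measure, which is the point of the uniformity over ν in the definition.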
Then we have the following result.
Theorem B (Estimate on the supremum of a class of partial sums). Let us consider a sequence of independent and identically distributed random variables ξ_1, ..., ξ_n, n ≥ 2, with values in a measurable space (X, X) and with some distribution µ. Beside this, let a countable and L_2-dense class of functions F with some parameter D ≥ 1 and exponent L ≥ 1 be given on the space (X, X) which satisfies the conditions
‖f‖_∞ = sup_{x∈X} |f(x)| ≤ 1 for all f ∈ F,
‖f‖_2^2 = ∫ f^2(x) µ(dx) ≤ σ^2 for all f ∈ F with some constant 0 < σ ≤ 1, and
∫ f(x) µ(dx) = 0 for all f ∈ F.
Define the normalized partial sums S_n(f) = (1/√n) ∑_{k=1}^n f(ξ_k) for all f ∈ F.
There exist some universal constants C > 0, α > 0 and M > 0 such that the supremum of the normalized random sums S_n(f), f ∈ F, satisfies the inequality
P( sup_{f∈F} |S_n(f)| ≥ u ) ≤ C exp( −α (u/σ)^2 )
for those numbers u for which
√n σ^2 ≥ u ≥ M σ ( L^{3/4} log^{1/2}(2/σ) + (log D)^{3/4} )
with the parameter D and exponent L of the L_2-dense class F.
Under the conditions of this theorem the supremum of partial sums is not much greater than its largest (worst) term.
If we take a Vapnik–Červonenkis class of sets D = {D_1, D_2, ...} and define the above considered functions f_l, l = 1, 2, ..., then F = {f_1, f_2, ...} is an L_2-dense class of functions, and Theorem B can be applied to it.
The proof of Theorem B is based on a symmetrization and conditioning argument. We apply the following results.
Symmetrization Lemma. Let us fix a countable class of functions F on a measurable space (X, X) together with a real number 0 < σ < 1. Consider a sequence of independent and identically distributed random variables ξ_1, ..., ξ_n with values in the space (X, X) such that
Ef(ξ_1) = 0, Ef^2(ξ_1) ≤ σ^2 for all f ∈ F,
together with another sequence ε_1, ..., ε_n of independent random variables with distribution P(ε_j = 1) = P(ε_j = −1) = 1/2, 1 ≤ j ≤ n, independent also of the random sequence ξ_1, ..., ξ_n. Then
P( (1/√n) sup_{f∈F} ∑_{j=1}^n f(ξ_j) ≥ A n^{1/2} σ^2 ) ≤ 4 P( (1/√n) sup_{f∈F} ∑_{j=1}^n ε_j f(ξ_j) ≥ (A/3) n^{1/2} σ^2 )
if A ≥ 3√2/(√n σ).
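A small empirical sanity check of the lemma's conclusion (with an assumed toy class F = {cos(2πkx) : k = 1, ..., 10} on [0,1], which is centered and bounded by 1; the threshold is chosen for illustration rather than to match the lemma's A n^{1/2} σ² form):

```python
import math, random

random.seed(4)

# Toy class F = {cos(2*pi*k*x) : k = 1..10} on [0,1]: each f is centered
# under the uniform distribution and bounded by 1.
ks = range(1, 11)
n, trials, level = 50, 2000, 10.0

def sup_sum(xs, signs=None):
    # sup over f in F of sum_j eps_j f(xi_j); signs=None means all eps_j = 1
    return max(
        sum((1 if signs is None else signs[j]) * math.cos(2 * math.pi * k * xs[j])
            for j in range(n))
        for k in ks)

plain = sym = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    signs = [random.choice((-1, 1)) for _ in range(n)]
    if sup_sum(xs) >= level:
        plain += 1
    if sup_sum(xs, signs) >= level / 3:
        sym += 1

p_plain, p_sym = plain / trials, sym / trials
print(f"P(sup >= {level}) ~= {p_plain:.3f},  "
      f"4 * P(symmetrized sup >= {level / 3:.2f}) ~= {4 * p_sym:.3f}")
```

The symmetrized tail at a third of the level, multiplied by 4, comfortably dominates the original tail, as the lemma predicts.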
Theorem (Hoeffding’s inequality). Let ε_1, ..., ε_n be independent random variables with P(ε_j = 1) = P(ε_j = −1) = 1/2, 1 ≤ j ≤ n, and let a_1, ..., a_n be arbitrary real numbers. Put V = ∑_{j=1}^n a_j ε_j. Then
P(V > u) ≤ exp( −u^2 / (2 ∑_{j=1}^n a_j^2) )
for all u > 0.
The Hoeffding inequality always gives a good Gaussian type estimate.
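A direct Monte Carlo check of this Gaussian type behaviour (with toy coefficients of my choosing):

```python
import math, random

random.seed(5)

# Fixed coefficients a_j; V = sum_j a_j eps_j with eps_j = +-1 fair signs.
a = [random.uniform(0.5, 1.5) for _ in range(60)]
sum_a2 = sum(x * x for x in a)

trials = 30000
hits = {8.0: 0, 16.0: 0, 24.0: 0}
for _ in range(trials):
    v = sum(x * random.choice((-1, 1)) for x in a)
    for u in hits:
        if v > u:
            hits[u] += 1

for u in sorted(hits):
    bound = math.exp(-u * u / (2 * sum_a2))
    print(f"u={u}: empirical {hits[u] / trials:.4f} vs Hoeffding bound {bound:.4f}")
```

Unlike Bernstein's inequality, the bound keeps its Gaussian shape at every level u, which is what the chaining argument below relies on.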
The symmetrization lemma enables us to replace the estimation of
P( (1/√n) sup_{f∈F} ∑_{j=1}^n f(ξ_j) ≥ A n^{1/2} σ^2 )
by the estimation of
P( (1/√n) sup_{f∈F} ∑_{j=1}^n ε_j f(ξ_j) ≥ (A/3) n^{1/2} σ^2 ).
This can be done by estimating the conditional probabilities
P( (1/√n) sup_{f∈F} ∑_{j=1}^n ε_j f(ξ_j) ≥ (A/3) n^{1/2} σ^2 | ξ_1 = x_1, ..., ξ_n = x_n )
= P( (1/√n) sup_{f∈F} ∑_{j=1}^n ε_j f(x_j) ≥ (A/3) n^{1/2} σ^2 ).
The right-hand side can be bounded by the Hoeffding inequality.
On the basis of this observation a proof of Theorem B can be worked out.
The conjecture of Talagrand
Let a sequence of Bernoulli sums η_j = ∑_{l=1}^N a_{j,l} ε_l, j = 1, ..., M, be given, where ε_1, ..., ε_N are independent random variables with P(ε_1 = 1) = P(ε_1 = −1) = 1/2. Give a good estimate on
E sup_{1≤j≤M} η_j.
We would like to give both an upper and a lower bound in such a way that the ratio of these two estimates is less than a universal constant.
Put T = {1, ..., M},
d_2^2(i, j) = E(η_i − η_j)^2 = ∑_{l=1}^N (a_{i,l} − a_{j,l})^2, i, j ∈ T,
and define with the help of this metric the quantities γ_2(T, d_2) and γ_2(T_1, d_2) for all sets T_1 ⊂ T. The Gaussian estimate holds for E sup_{j∈T_1} η_j (Hoeffding inequality), hence
E sup_{j∈T_1} η_j ≤ L γ_2(T_1, d_2) for all T_1 ⊂ T.
On the other hand,
|∑_{l=1}^N a_{j,l} ε_l| ≤ ∑_{l=1}^N |a_{j,l}|.
Put b(T_2) = sup_{j∈T_2} ∑_{l=1}^N |a_{j,l}|. Then
E sup_{1≤j≤M} η_j ≤ L inf_{T_1,T_2⊂T, T_1∪T_2=T} ( γ_2(T_1, d_2) + b(T_2) ).
Talagrand’s conjecture: This estimate is sharp:
E sup_{1≤j≤M} η_j ≥ (1/L) inf_{T_1,T_2⊂T, T_1∪T_2=T} ( γ_2(T_1, d_2) + b(T_2) ).
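The two terms of the bound dominate in different regimes, which can be seen on toy coefficient matrices (an illustration of mine, not from the talk): spread-out coefficients make η_j behave like Gaussians, so the γ_2 term governs, while concentrated coefficients are governed by the ℓ_1 term b(T).

```python
import math, random

random.seed(7)

M, N, trials = 50, 100, 500

# Regime 1: a_{j,l} = +-1/sqrt(N), so each eta_j is approximately N(0, 1).
spread = [[random.choice((-1, 1)) / math.sqrt(N) for _ in range(N)]
          for _ in range(M)]
# Regime 2: a_{j,l} = 1 if l == j else 0, so eta_j = eps_j and sup eta_j <= 1 = b(T).
concentrated = [[1.0 if l == j else 0.0 for l in range(N)] for j in range(M)]

def mean_sup(coeffs):
    # Monte Carlo estimate of E sup_j eta_j for the coefficient matrix coeffs.
    total = 0.0
    for _ in range(trials):
        eps = [random.choice((-1, 1)) for _ in range(N)]
        total += max(sum(a * e for a, e in zip(row, eps)) for row in coeffs)
    return total / trials

sup_spread = mean_sup(spread)
sup_conc = mean_sup(concentrated)
print(f"spread:       E sup ~= {sup_spread:.2f} "
      f"(sqrt(2 log M) = {math.sqrt(2 * math.log(M)):.2f}, b(T) = {math.sqrt(N):.2f})")
print(f"concentrated: E sup ~= {sup_conc:.2f} (b(T) = 1)")
```

In the spread regime E sup is on the √(log M) Gaussian scale, far below b(T) = √N; in the concentrated regime the ℓ_1 bound b(T) = 1 is attained exactly.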
My conjecture: Talagrand’s conjecture does not hold. Moreover, there is no good estimate (where the ratio of the upper and lower bound is less than a universal constant) by means of γ_2(T_1, d_2) and b(T_2).
To prove the lower bound for E sup_{j∈T} η_j we need some estimates which say that if some random variables are far from each other in some sense, then their supremum is large.
In the Gaussian case Sudakov’s inequality holds.
Sudakov’s inequality. Let M Gaussian random variables ξ_1, ..., ξ_M be given, for which Eξ_j = 0 and E(ξ_j − ξ_k)^2 ≥ a^2 for all pairs 1 ≤ j < k ≤ M and some number a > 0. Then
E sup_{1≤j≤M} ξ_j ≥ (a/L) √(log M)
with a universal number L > 0.
This inequality is sharp.
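In the simplest case of i.i.d. standard Gaussians (so E(ξ_j − ξ_k)² = 2, i.e. a = √2) the √(log M) growth is easy to see numerically; this toy check is mine, not the talk's:

```python
import math, random

random.seed(6)

def mean_max(M, trials=3000):
    # Monte Carlo estimate of E max of M i.i.d. standard Gaussians.
    return sum(max(random.gauss(0, 1) for _ in range(M))
               for _ in range(trials)) / trials

results = {}
for M in (10, 100, 1000):
    results[M] = mean_max(M)
    print(f"M={M}: E max ~= {results[M]:.3f}, "
          f"sqrt(2 log M) = {math.sqrt(2 * math.log(M)):.3f}")
```

The estimates track √(2 log M) up to a bounded factor, which is exactly the content (and the sharpness) of Sudakov's bound in this case.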
A version of it for Bernoulli sums:
Theorem. Let a set of Bernoulli sums η_j = ∑_{l=1}^N a_{j,l} ε_l, j = 1, ..., M, be given. Put a_j^2 = ∑_{l=1}^N a_{j,l}^2 for all 1 ≤ j ≤ M,
B = sup_{1≤j≤M} a_j and C = sup_{1≤j≤M} sup_{1≤l≤N} |a_{j,l}|.
If |a_j − a_{j′}| ≥ B/4 for all 1 ≤ j ≠ j′ ≤ M (with a_j denoting the coefficient vector (a_{j,1}, ..., a_{j,N})), and C ≤ B/(L_0 √(log M)) with a sufficiently large universal constant L_0, then
E sup_{1≤j≤M} η_j ≥ (B/L) √(log M).
This inequality is sharp only if the Bernoulli sequences are in the ‘Gaussian domain’.