Volume 2, Issue 1, Article 11, 2001.
Received 3 November, 2000; accepted 11 January, 2001.
Communicated by: A. Lupaş
Journal of Inequalities in Pure and Applied Mathematics
ON SOME APPLICATIONS OF THE AG INEQUALITY IN INFORMATION THEORY
BERTRAM MOND AND JOSIP E. PEČARIĆ
School of Mathematics and Statistics, La Trobe University, Bundoora, Victoria, 3083, AUSTRALIA.
EMail: B.Mond@latrobe.edu.au
URL: http://www.latrobe.edu.au/www/mathstats/Staff/mond.html

Department of Mathematics, Faculty of Textile Technology, University of Zagreb, CROATIA.
EMail: pecaric@mahazu.hazu.hr
URL: http://mahazu.hazu.hr/DepMPCS/indexJP.html
© 2000 School of Communications and Informatics, Victoria University of Technology
ISSN (electronic): 1443-5756
042-00
J. Ineq. Pure and Appl. Math. 1(1) Art. 6, 2000
http://jipam.vu.edu.au
Abstract
Recently, S.S. Dragomir used the concavity property of the log mapping and the weighted arithmetic mean-geometric mean inequality to develop new inequalities that were then applied to Information Theory. Here we extend these inequalities and their applications.
2000 Mathematics Subject Classification: 26D15.
Key words: Arithmetic-Geometric Mean, Kullback-Leibler Distances, Shannon's Entropy.
Contents
1 Introduction
2 An Inequality of I.A. Abou-Tair and W.T. Sulaiman
3 On Some Inequalities of S.S. Dragomir
4 Some Inequalities for Distance Functions
5 Applications for Shannon's Entropy
6 Applications for Mutual Information
References
1. Introduction
One of the most important inequalities is the arithmetic-geometric means inequality:

Let $a_i, p_i$, $i = 1, \ldots, n$, be positive numbers and $P_n = \sum_{i=1}^n p_i$. Then

(1.1)
$$\prod_{i=1}^n a_i^{p_i/P_n} \le \frac{1}{P_n} \sum_{i=1}^n p_i a_i,$$

with equality iff $a_1 = \cdots = a_n$.
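A quick numerical sketch of (1.1); the helper name, random data, and tolerances below are ours, not the paper's:

```python
import random

def ag_means(a, p):
    """Weighted arithmetic and geometric means from (1.1):
    A = (1/P_n) * sum p_i a_i,  G = prod a_i^(p_i/P_n)."""
    P = sum(p)
    arith = sum(pi * ai for pi, ai in zip(p, a)) / P
    geom = 1.0
    for pi, ai in zip(p, a):
        geom *= ai ** (pi / P)
    return arith, geom

random.seed(0)
a = [random.uniform(0.1, 5.0) for _ in range(8)]
p = [random.uniform(0.1, 2.0) for _ in range(8)]
arith, geom = ag_means(a, p)
assert geom <= arith + 1e-9              # (1.1): G <= A

# equality case: all a_i coincide
arith_eq, geom_eq = ag_means([2.0] * 8, p)
assert abs(arith_eq - geom_eq) < 1e-9
```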
It is well known that using (1.1) we can prove the following generalization of another well-known inequality, namely Hölder's inequality:

Let $p_{ij}, q_i$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$ be positive numbers with $Q_m = \sum_{i=1}^m q_i$. Then

(1.2)
$$\sum_{j=1}^n \prod_{i=1}^m (p_{ij})^{q_i/Q_m} \le \prod_{i=1}^m \left( \sum_{j=1}^n p_{ij} \right)^{q_i/Q_m}.$$
In this note, we show that using (1.1) we can improve some recent results which have applications in information theory.
2. An Inequality of I.A. Abou-Tair and W.T. Sulaiman
The main result from [1] is:
Let $p_{ij}, q_i$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$ be positive numbers. Then

(2.1)
$$\sum_{j=1}^n \prod_{i=1}^m (p_{ij})^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^m \sum_{j=1}^n p_{ij} q_i.$$
Moreover, set in (1.1) $n = m$, $p_i = q_i$ and $a_i = \sum_{j=1}^n p_{ij}$. We now have

(2.2)
$$\prod_{i=1}^m \left( \sum_{j=1}^n p_{ij} \right)^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^m q_i \left( \sum_{j=1}^n p_{ij} \right).$$
Now (1.2) and (2.2) give

(2.3)
$$\sum_{j=1}^n \prod_{i=1}^m (p_{ij})^{q_i/Q_m} \le \prod_{i=1}^m \left( \sum_{j=1}^n p_{ij} \right)^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^m \sum_{j=1}^n p_{ij} q_i,$$

which is an interpolation of (2.1). Moreover, the generalized Hölder inequality was obtained in [1] as a consequence of (2.1). This is not surprising since (2.1), for $n = 1$, becomes
$$\prod_{i=1}^m (p_{i1})^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^m p_{i1} q_i,$$
which is, in fact, the A-G inequality (1.1) (set $m = n$, $p_{i1} = a_i$ and $q_i = p_i$).
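The interpolation (2.3) can be checked numerically. The function below evaluates its three members for a random positive matrix; all names are illustrative:

```python
import random
from math import prod

def chain_23(p, q):
    """The three members of (2.3) for an m-by-n positive matrix p[i][j]
    and positive weights q[i] (helper names are ours)."""
    m, n = len(p), len(p[0])
    Qm = sum(q)
    left = sum(prod(p[i][j] ** (q[i] / Qm) for i in range(m)) for j in range(n))
    middle = prod(sum(p[i]) ** (q[i] / Qm) for i in range(m))
    right = sum(q[i] * sum(p[i]) for i in range(m)) / Qm
    return left, middle, right

random.seed(1)
p = [[random.uniform(0.1, 3.0) for _ in range(5)] for _ in range(4)]
q = [random.uniform(0.1, 2.0) for _ in range(4)]
left, middle, right = chain_23(p, q)
assert left <= middle + 1e-9     # the Hölder step (1.2)
assert middle <= right + 1e-9    # the A-G step (2.2)
```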
Theorem 3.1 in [1] is the well-known Shannon inequality:

Given $\sum_{i=1}^n a_i = a$ and $\sum_{i=1}^n b_i = b$ with $a_i, b_i > 0$, then
$$a \ln \frac{a}{b} \le \sum_{i=1}^n a_i \ln \frac{a_i}{b_i}.$$
It was obtained from (2.1) through the special case

(2.4)
$$\prod_{i=1}^n \left( \frac{b_i}{a_i} \right)^{a_i/a} \le \frac{b}{a}.$$

Let us note that (2.4) is again a direct consequence of the A-G inequality. Indeed, setting $a_i \to b_i/a_i$, $p_i \to a_i$, $i = 1, \ldots, n$, in (1.1), we obtain (2.4).
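A numerical sketch of Shannon's inequality, including the equality case where $b_i$ is proportional to $a_i$ (the helper name and data are ours):

```python
import random
from math import log

def shannon_gap(a, b):
    """RHS minus LHS of Shannon's inequality:
    sum_i a_i ln(a_i/b_i) - a ln(a/b), with a = sum(a_i), b = sum(b_i)."""
    A, B = sum(a), sum(b)
    return sum(ai * log(ai / bi) for ai, bi in zip(a, b)) - A * log(A / B)

random.seed(2)
a = [random.uniform(0.1, 4.0) for _ in range(6)]
b = [random.uniform(0.1, 4.0) for _ in range(6)]
assert shannon_gap(a, b) >= -1e-9
# equality when b is proportional to a
assert abs(shannon_gap(a, [2.0 * ai for ai in a])) < 1e-9
```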
Theorem 3.2 from [1] is Rényi's inequality: Given $\sum_{i=1}^m a_i = a$ and $\sum_{i=1}^m b_i = b$ with $a_i, b_i \ge 0$, then for $\alpha > 0$, $\alpha \ne 1$,
$$\frac{1}{\alpha - 1}\left( a^\alpha b^{1-\alpha} - a \right) \le \sum_{i=1}^m \frac{1}{\alpha - 1}\left( a_i^\alpha b_i^{1-\alpha} - a_i \right).$$
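Rényi's inequality can likewise be spot-checked for several values of $\alpha$. This is a sketch with illustrative names; we use strictly positive data to stay clear of $0^0$ edge cases, although the theorem allows $a_i, b_i \ge 0$:

```python
import random

def renyi_gap(a, b, alpha):
    """RHS minus LHS of Renyi's inequality for alpha > 0, alpha != 1."""
    A, B = sum(a), sum(b)
    lhs = (A ** alpha * B ** (1 - alpha) - A) / (alpha - 1)
    rhs = sum((ai ** alpha * bi ** (1 - alpha) - ai) / (alpha - 1)
              for ai, bi in zip(a, b))
    return rhs - lhs

random.seed(3)
a = [random.uniform(0.5, 2.0) for _ in range(5)]
b = [random.uniform(0.5, 2.0) for _ in range(5)]
for alpha in (0.5, 2.0, 3.5):
    assert renyi_gap(a, b, alpha) >= -1e-9
```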
In fact, in the proof given in [1], it was proved that Hölder's inequality is a consequence of (2.1). As we have noted, Hölder's inequality is also a consequence of the A-G inequality.
3. On Some Inequalities of S.S. Dragomir
The following theorems were proved in [2]:
Theorem 3.1. Let $a_i \in (0,1)$ and $b_i > 0$ $(i = 1, \ldots, n)$. If $p_i > 0$ $(i = 1, \ldots, n)$ is such that $\sum_{i=1}^n p_i = 1$, then

(3.1)
$$\exp\left[ \sum_{i=1}^n \frac{p_i a_i^2}{b_i} - \sum_{i=1}^n p_i a_i \right] \ge \exp\left[ \sum_{i=1}^n p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right] \ge \prod_{i=1}^n \left( \frac{a_i}{b_i} \right)^{a_i p_i} \ge \exp\left[ 1 - \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right] \ge \exp\left[ \sum_{i=1}^n p_i a_i - \sum_{i=1}^n p_i b_i \right],$$

with equality iff $a_i = b_i$ for all $i \in \{1, \ldots, n\}$.
Theorem 3.2. Let $a_i \in (0,1)$ $(i = 1, \ldots, n)$ and $b_j > 0$ $(j = 1, \ldots, m)$. If $p_i > 0$ $(i = 1, \ldots, n)$ is such that $\sum_{i=1}^n p_i = 1$ and $q_j > 0$ $(j = 1, \ldots, m)$ is such that $\sum_{j=1}^m q_j = 1$, then we have the inequality

(3.2)
$$\exp\left( \sum_{i=1}^n p_i a_i^2 \sum_{j=1}^m \frac{q_j}{b_j} - \sum_{i=1}^n p_i a_i \right) \ge \exp\left[ \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} - 1 \right] \ge \frac{\prod_{i=1}^n a_i^{a_i p_i}}{\prod_{j=1}^m b_j^{\,q_j \sum_{i=1}^n p_i a_i}} \ge \exp\left[ 1 - \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right] \ge \exp\left( \sum_{i=1}^n p_i a_i - \sum_{j=1}^m q_j b_j \right).$$

The equality holds in (3.2) iff $a_1 = \cdots = a_n = b_1 = \cdots = b_m$.
First we give an improvement of the second and third inequality in (3.1).
Theorem 3.3. Let $a_i$, $b_i$ and $p_i$ $(i = 1, \ldots, n)$ be positive real numbers with $\sum_{i=1}^n p_i = 1$. Then

(3.3)
$$\exp\left[ \sum_{i=1}^n p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right] \ge \sum_{i=1}^n p_i \left( \frac{a_i}{b_i} \right)^{a_i} \ge \prod_{i=1}^n \left( \frac{a_i}{b_i} \right)^{p_i a_i} \ge \left[ \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]^{-1} \ge \exp\left[ 1 - \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right],$$

with equality iff $a_i = b_i$, $i = 1, \ldots, n$.
Proof. The first inequality in (3.3) is a simple consequence of the following well-known elementary inequality

(3.4)
$$e^{x-1} \ge x \quad \text{for all } x \in \mathbb{R},$$

with equality iff $x = 1$. The second inequality is a simple consequence of the A-G inequality: in (1.1), set $a_i \to (a_i/b_i)^{a_i}$, $i = 1, \ldots, n$. The third inequality is again a consequence of (1.1). Namely, for $a_i \to (b_i/a_i)^{a_i}$, $i = 1, \ldots, n$, (1.1) becomes
$$\prod_{i=1}^n \left( \frac{b_i}{a_i} \right)^{a_i p_i} \le \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i},$$
which is equivalent to the third inequality. The last inequality is again a consequence of (3.4).
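The whole chain (3.3) can be verified on random positive data; each member should dominate the next. Helper names, ranges, and tolerances below are ours:

```python
import random
from math import exp, prod

def chain_33(a, b, p):
    """The five members of (3.3), left to right (helper names are ours)."""
    s_ab = sum(pi * (ai / bi) ** ai for pi, ai, bi in zip(p, a, b))
    s_ba = sum(pi * (bi / ai) ** ai for pi, ai, bi in zip(p, a, b))
    g = prod((ai / bi) ** (pi * ai) for pi, ai, bi in zip(p, a, b))
    return exp(s_ab - 1), s_ab, g, 1.0 / s_ba, exp(1 - s_ba)

random.seed(4)
a = [random.uniform(0.5, 1.5) for _ in range(6)]
b = [random.uniform(0.5, 1.5) for _ in range(6)]
w = [random.uniform(0.1, 1.0) for _ in range(6)]
p = [wi / sum(w) for wi in w]    # normalized weights, sum to 1
vals = chain_33(a, b, p)
for nxt, prev in zip(vals[1:], vals[:-1]):
    assert nxt <= prev + 1e-9    # the chain decreases left to right
```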
Theorem 3.4. Let $a_i \in (0,1)$ and $b_i > 0$ $(i = 1, \ldots, n)$. If $p_i > 0$, $i = 1, \ldots, n$, is such that $\sum_{i=1}^n p_i = 1$, then

(3.5)
$$\exp\left[ \sum_{i=1}^n \frac{p_i a_i^2}{b_i} - \sum_{i=1}^n p_i a_i \right] \ge \exp\left[ \sum_{i=1}^n p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right] \ge \sum_{i=1}^n p_i \left( \frac{a_i}{b_i} \right)^{a_i} \ge \prod_{i=1}^n \left( \frac{a_i}{b_i} \right)^{p_i a_i} \ge \left[ \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]^{-1} \ge \exp\left[ 1 - \sum_{i=1}^n p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right] \ge \exp\left[ \sum_{i=1}^n p_i a_i - \sum_{i=1}^n p_i b_i \right],$$

with equality iff $a_i = b_i$ for all $i = 1, \ldots, n$.

Proof. The theorem follows from Theorems 3.1 and 3.3.
Theorem 3.5. Let $a_i, p_i$ $(i = 1, \ldots, n)$ and $b_j, q_j$ $(j = 1, \ldots, m)$ be positive numbers with $\sum_{i=1}^n p_i = \sum_{j=1}^m q_j = 1$. Then

(3.6)
$$\exp\left[ \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} - 1 \right] \ge \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} \ge \frac{\prod_{i=1}^n a_i^{a_i p_i}}{\prod_{j=1}^m b_j^{\,q_j \sum_{i=1}^n p_i a_i}} \ge \left[ \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right]^{-1} \ge \exp\left[ 1 - \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right].$$

Equality in (3.6) holds iff $a_1 = \cdots = a_n = b_1 = \cdots = b_m$.

Proof. The first and the last inequalities are simple consequences of (3.4). The second is also a simple consequence of the A-G inequality. Namely, we have
$$\frac{\prod_{i=1}^n a_i^{a_i p_i}}{\prod_{j=1}^m b_j^{\,q_j \sum_{i=1}^n p_i a_i}} = \prod_{i=1}^n \prod_{j=1}^m \left( \frac{a_i}{b_j} \right)^{a_i p_i q_j} \le \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i},$$
which is the second inequality in (3.6). By the A-G inequality, we have
$$\prod_{i=1}^n \prod_{j=1}^m \left( \frac{b_j}{a_i} \right)^{a_i p_i q_j} \le \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i},$$
which gives the third inequality in (3.6).
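Similarly, the double-index chain (3.6) admits a direct numerical check; here with uniform weights (a sketch, the names are illustrative):

```python
import random
from math import exp, prod

def chain_36(a, p, b, q):
    """The five members of (3.6), left to right (helper names are ours)."""
    s_ab = sum(pi * qj * (ai / bj) ** ai
               for pi, ai in zip(p, a) for qj, bj in zip(q, b))
    s_ba = sum(pi * qj * (bj / ai) ** ai
               for pi, ai in zip(p, a) for qj, bj in zip(q, b))
    pa = sum(pi * ai for pi, ai in zip(p, a))
    mid = (prod(ai ** (ai * pi) for pi, ai in zip(p, a))
           / prod(bj ** (qj * pa) for qj, bj in zip(q, b)))
    return exp(s_ab - 1), s_ab, mid, 1.0 / s_ba, exp(1 - s_ba)

random.seed(5)
a = [random.uniform(0.5, 1.5) for _ in range(4)]
b = [random.uniform(0.5, 1.5) for _ in range(3)]
vals = chain_36(a, [0.25] * 4, b, [1.0 / 3.0] * 3)
for nxt, prev in zip(vals[1:], vals[:-1]):
    assert nxt <= prev + 1e-9
```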
Theorem 3.6. Let the assumptions of Theorem 3.2 be satisfied. Then

(3.7)
$$\exp\left[ \sum_{i=1}^n p_i a_i^2 \sum_{j=1}^m \frac{q_j}{b_j} - \sum_{i=1}^n p_i a_i \right] \ge \exp\left[ \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} - 1 \right] \ge \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} \ge \frac{\prod_{i=1}^n a_i^{a_i p_i}}{\prod_{j=1}^m b_j^{\,q_j \sum_{i=1}^n p_i a_i}} \ge \left[ \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right]^{-1} \ge \exp\left[ 1 - \sum_{i=1}^n \sum_{j=1}^m p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right] \ge \exp\left[ \sum_{i=1}^n p_i a_i - \sum_{j=1}^m q_j b_j \right].$$

Equality holds in (3.7) iff $a_1 = \cdots = a_n = b_1 = \cdots = b_m$.

Proof. The theorem is a simple consequence of Theorems 3.2 and 3.5.
4. Some Inequalities for Distance Functions
In 1951, Kullback and Leibler introduced the following distance function in Information Theory (see [4] or [5])

(4.1)
$$KL(p,q) := \sum_{i=1}^n p_i \log \frac{p_i}{q_i},$$

provided that $p, q \in \mathbb{R}^n_{++} := \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n : x_i > 0,\ i = 1, \ldots, n\}$. Another useful distance function is the $\chi^2$-distance given by (see [5])

(4.2)
$$D_{\chi^2}(p,q) := \sum_{i=1}^n \frac{p_i^2 - q_i^2}{q_i},$$

where $p, q \in \mathbb{R}^n_{++}$. S.S. Dragomir [2] introduced the following two new distance functions

(4.3)
$$P_2(p,q) := \sum_{i=1}^n \left[ \left( \frac{p_i}{q_i} \right)^{p_i} - 1 \right]$$

and

(4.4)
$$P_1(p,q) := \sum_{i=1}^n \left[ 1 - \left( \frac{q_i}{p_i} \right)^{p_i} \right],$$

provided $p, q \in \mathbb{R}^n_{++}$. The following inequality connecting all four of the above distance functions holds.
Theorem 4.1. Let $p, q \in \mathbb{R}^n_{++}$ with $p_i \in (0,1)$. Then we have the inequality

(4.5)
$$D_{\chi^2}(p,q) + Q_n - P_n \ge P_2(p,q) \ge n \ln\left( \frac{1}{n} P_2(p,q) + 1 \right) \ge KL(p,q) \ge -n \ln\left( 1 - \frac{1}{n} P_1(p,q) \right) \ge P_1(p,q) \ge P_n - Q_n,$$

where $P_n = \sum_{i=1}^n p_i$ and $Q_n = \sum_{i=1}^n q_i$. Equality holds in (4.5) iff $p_i = q_i$ $(i = 1, \ldots, n)$.

Proof. Set in (3.5) the weights $p_i = 1/n$, together with $a_i = p_i$, $b_i = q_i$ $(i = 1, \ldots, n)$, and take logarithms. After multiplication by $n$, we get (4.5).
Corollary 4.2. Let $p, q$ be probability distributions. Then we have

(4.6)
$$D_{\chi^2}(p,q) \ge P_2(p,q) \ge n \ln\left( \frac{1}{n} P_2(p,q) + 1 \right) \ge KL(p,q) \ge -n \ln\left( 1 - \frac{1}{n} P_1(p,q) \right) \ge P_1(p,q) \ge 0.$$
Equality holds in (4.6) iff $p = q$.
Remark 4.1. Inequalities (4.5) and (4.6) are improvements of related results in [2].
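The four distance functions (4.1)-(4.4) and the chain (4.6) can be checked on random probability vectors. Function names are ours, and natural logarithms are used throughout:

```python
import random
from math import log

# The four distance functions (4.1)-(4.4); p, q are positive vectors.
def KL(p, q):
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

def Dchi2(p, q):
    return sum((pi ** 2 - qi ** 2) / qi for pi, qi in zip(p, q))

def P2(p, q):
    return sum((pi / qi) ** pi - 1 for pi, qi in zip(p, q))

def P1(p, q):
    return sum(1 - (qi / pi) ** pi for pi, qi in zip(p, q))

random.seed(6)
n = 8
p = [random.uniform(0.1, 1.0) for _ in range(n)]
q = [random.uniform(0.1, 1.0) for _ in range(n)]
sp, sq = sum(p), sum(q)
p = [x / sp for x in p]   # probability distributions, each p_i in (0, 1)
q = [x / sq for x in q]
chain = [Dchi2(p, q), P2(p, q), n * log(P2(p, q) / n + 1), KL(p, q),
         -n * log(1 - P1(p, q) / n), P1(p, q), 0.0]
for nxt, prev in zip(chain[1:], chain[:-1]):
    assert nxt <= prev + 1e-9   # the chain (4.6), left to right
```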
5. Applications for Shannon’s Entropy
The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe the random variable. Let $p(x)$, $x \in \chi$, be a probability mass function. Define the Shannon entropy of a random variable $X$ having the probability distribution $p$ by

(5.1)
$$H(X) := \sum_{x \in \chi} p(x) \log \frac{1}{p(x)}.$$

In the above definition we use the convention (based on continuity arguments) that $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$. Now assume that $|\chi|$ ($\mathrm{card}(\chi) = |\chi|$) is finite and let $u(x) = \frac{1}{|\chi|}$ be the uniform probability mass function on $\chi$. It is well known that [5, p. 27]

(5.2)
$$KL(p,u) = \sum_{x \in \chi} p(x) \log \frac{p(x)}{u(x)} = \log |\chi| - H(X).$$
The following result is important in Information Theory [5, p. 27]:

Theorem 5.1. Let $X$, $p$ and $\chi$ be as above. Then

(5.3)
$$H(X) \le \log |\chi|,$$

with equality if and only if $X$ has a uniform distribution over $\chi$.
In what follows, by the use of Corollary 4.2, we are able to point out the following estimate for the difference $\log|\chi| - H(X)$, that is, we shall give the following improvement of Theorem 9 from [2]:
Theorem 5.2. Let $X$, $p$ and $\chi$ be as above. Then

(5.4)
$$|\chi| E(X) - 1 \ge \sum_{x \in \chi} \left( |\chi|^{p(x)} [p(x)]^{p(x)} - 1 \right) \ge |\chi| \ln\left\{ \frac{1}{|\chi|} \sum_{x \in \chi} |\chi|^{p(x)} [p(x)]^{p(x)} \right\} \ge \log |\chi| - H(X) \ge -|\chi| \ln\left\{ \frac{1}{|\chi|} \sum_{x \in \chi} |\chi|^{-p(x)} [p(x)]^{-p(x)} \right\} \ge \sum_{x \in \chi} \left( 1 - |\chi|^{-p(x)} [p(x)]^{-p(x)} \right) \ge 0,$$

where $E(X)$ is the informational energy of $X$, i.e., $E(X) := \sum_{x \in \chi} p^2(x)$. Equality holds in (5.4) iff $p(x) = \frac{1}{|\chi|}$ for all $x \in \chi$.

Proof. The proof is obvious by Corollary 4.2 on choosing $q(x) = u(x) = \frac{1}{|\chi|}$.
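A numerical sketch of (5.4) and of Theorem 5.1 for a random distribution on a set of size $|\chi| = 10$ (variable names are ours; natural logarithms):

```python
import random
from math import log

random.seed(7)
k = 10                                      # |chi|
w = [random.uniform(0.1, 1.0) for _ in range(k)]
s = sum(w)
p = [x / s for x in w]                      # pmf on chi, each p(x) in (0, 1)
H = -sum(pi * log(pi) for pi in p)          # Shannon entropy (5.1)
E = sum(pi ** 2 for pi in p)                # informational energy E(X)
P2u = sum(k ** pi * pi ** pi - 1 for pi in p)          # P2(p, u)
P1u = sum(1 - k ** (-pi) * pi ** (-pi) for pi in p)    # P1(p, u)
chain = [k * E - 1, P2u, k * log(P2u / k + 1), log(k) - H,
         -k * log(1 - P1u / k), P1u, 0.0]
for nxt, prev in zip(chain[1:], chain[:-1]):
    assert nxt <= prev + 1e-9   # the chain (5.4), left to right
assert H <= log(k)              # Theorem 5.1
```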
6. Applications for Mutual Information
We consider mutual information, which is a measure of the amount of information that one random variable contains about another random variable. It is the reduction of uncertainty of one random variable due to the knowledge of the other [6, p. 18].
To be more precise, consider two random variables $X$ and $Y$ with a joint probability mass function $r(x,y)$ and marginal probability mass functions $p(x)$ and $q(y)$, $x \in \chi$, $y \in Y$. The mutual information is the relative entropy between the joint distribution and the product distribution, that is,
$$I(X;Y) = \sum_{x \in \chi,\, y \in Y} r(x,y) \log \frac{r(x,y)}{p(x) q(y)} = KL(r, pq).$$
The following result is well known [6, p. 27].

Theorem 6.1 (Non-negativity of mutual information). For any two random variables $X$, $Y$,

(6.1)
$$I(X;Y) \ge 0,$$

with equality iff $X$ and $Y$ are independent.
In what follows, by the use of Corollary 4.2, we are able to point out the following estimate for the mutual information, that is, the following improvement of Theorem 11 of [2]:
Theorem 6.2. Let $X$ and $Y$ be as above. Then we have the inequality
$$\sum_{x \in \chi} \sum_{y \in Y} \frac{r^2(x,y)}{p(x) q(y)} - 1 \ge \sum_{x \in \chi} \sum_{y \in Y} \left[ \left( \frac{r(x,y)}{p(x) q(y)} \right)^{r(x,y)} - 1 \right] \ge |\chi| |Y| \ln\left[ \frac{1}{|\chi| |Y|} \sum_{x \in \chi} \sum_{y \in Y} \left( \frac{r(x,y)}{p(x) q(y)} \right)^{r(x,y)} \right] \ge I(X;Y) \ge -|\chi| |Y| \ln\left[ \frac{1}{|\chi| |Y|} \sum_{x \in \chi} \sum_{y \in Y} \left( \frac{p(x) q(y)}{r(x,y)} \right)^{r(x,y)} \right] \ge \sum_{x \in \chi} \sum_{y \in Y} \left[ 1 - \left( \frac{p(x) q(y)}{r(x,y)} \right)^{r(x,y)} \right] \ge 0.$$

The equality holds in all inequalities iff $X$ and $Y$ are independent.
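Finally, the chain of Theorem 6.2 can be verified on a random joint distribution (a sketch with illustrative names; natural logarithms):

```python
import random
from math import log

random.seed(8)
nx, ny = 3, 4
w = [[random.uniform(0.1, 1.0) for _ in range(ny)] for _ in range(nx)]
s = sum(map(sum, w))
r = [[wij / s for wij in row] for row in w]               # joint pmf r(x, y)
p = [sum(row) for row in r]                               # marginal of X
q = [sum(r[i][j] for i in range(nx)) for j in range(ny)]  # marginal of Y
N = nx * ny                                               # |chi| * |Y|
pairs = [(r[i][j], p[i] * q[j]) for i in range(nx) for j in range(ny)]
I = sum(rij * log(rij / pq) for rij, pq in pairs)         # mutual information
chi2 = sum(rij ** 2 / pq for rij, pq in pairs) - 1
S = sum((rij / pq) ** rij for rij, pq in pairs)
T = sum((pq / rij) ** rij for rij, pq in pairs)
chain = [chi2, S - N, N * log(S / N), I, -N * log(T / N), N - T, 0.0]
for nxt, prev in zip(chain[1:], chain[:-1]):
    assert nxt <= prev + 1e-9   # the chain of Theorem 6.2, left to right
assert I >= -1e-9               # Theorem 6.1
```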
References
[1] I.A. ABOU-TAIR AND W.T. SULAIMAN, Inequalities via convex functions, Internat. J. Math. and Math. Sci., 22 (1999), 543–546.

[2] S.S. DRAGOMIR, An inequality for logarithmic mapping and applications for the relative entropy, RGMIA Res. Rep. Coll., 3(2) (2000), Article 1. [ONLINE] http://rgmia.vu.edu.au/v3n2.html

[3] S. KULLBACK AND R.A. LEIBLER, On information and sufficiency, Ann. Math. Statist., 22 (1951), 79–86.

[4] S. KULLBACK, Information Theory and Statistics, J. Wiley, New York, 1959.

[5] A. BEN-TAL, A. BEN-ISRAEL AND M. TEBOULLE, Certainty equivalents and information measures: duality and extremal principles, J. Math. Anal. Appl., 157 (1991), 211–236.

[6] T.M. COVER AND J.A. THOMAS, Elements of Information Theory, John Wiley and Sons, Inc., 1991.