Chapter 7
Improvements of the inequalities for the f-divergence functional with applications to the Zipf-Mandelbrot law
Saad Ihsan Butt, László Horváth, Đilda Pečarić and Josip Pečarić

Abstract. Jensen's inequality plays a crucial role in obtaining inequalities for divergences between probability distributions. In this chapter, we introduce a new functional, based on the f-divergence functional, and then we obtain some estimates for the new functional, the f-divergence and the Rényi divergence by applying a cyclic refinement of Jensen's inequality. Some inequalities for Rényi and Shannon entropies are obtained too. The Zipf-Mandelbrot law is used to illustrate the results.
2010 AMS Subject Classification. Primary 26A51, 26D15, 94A17.
Key words and phrases. Convex function, f-divergence, inequalities for f-divergence, Jensen's inequality, Shannon entropy.
The research of the second author has been supported by the Hungarian National Research, Development and Innovation Office Grant No. KH130513.
7.1 Introduction
Divergences between probability distributions have been introduced to measure the difference between them. Many different types of divergences exist, for example the f-divergence (in particular, the Kullback–Leibler divergence, Hellinger distance and total variation distance), the Rényi divergence, the Jensen–Shannon divergence, etc. (see [45] and [51]). There are a lot of papers dealing with inequalities for divergences and entropies, see e.g. [44] and [50] and the references therein. Jensen's inequality plays a crucial role in some of these inequalities.
First we give some recent results on integral and discrete Jensen's inequalities. We need the following hypotheses:
(H1) Let $2\le k\le n$ be integers, and let $\mathbf{p}:=(p_1,\dots,p_n)$ and $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ represent positive probability distributions.

(H2) Let $C$ be a convex subset of a real vector space $V$, and let $f: C\to\mathbb{R}$ be a convex function.

(H3) Let $(X,\mathcal{B},\mu)$ be a probability space.

Let $l\ge 2$ be a fixed integer. The $\sigma$-algebra in $X^l$ generated by the projection mappings $\mathrm{pr}_m: X^l\to X$ $(m=1,\dots,l)$,
$$\mathrm{pr}_m(x_1,\dots,x_l):=x_m,$$
is denoted by $\mathcal{B}^l$. $\mu^l$ means the product measure on $\mathcal{B}^l$: this measure is uniquely ($\mu$ is $\sigma$-finite) specified by
$$\mu^l(B_1\times\dots\times B_l):=\mu(B_1)\dots\mu(B_l),\qquad B_m\in\mathcal{B},\ m=1,\dots,l.$$

(H4) Let $g$ be a $\mu$-integrable function on $X$ taking values in an interval $I\subset\mathbb{R}$.

(H5) Let $f$ be a convex function on $I$ such that $f\circ g$ is $\mu$-integrable on $X$.
Under the conditions (H1) and (H3)-(H5) we define
$$C_{\mathrm{int}}=C_{\mathrm{int}}(f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\int_{X^n} f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\,g(x_{i+j})}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)d\mu^n(x_1,\dots,x_n),\tag{7.1}$$
and for $t\in[0,1]$
$$C_{\mathrm{par}}(t)=C_{\mathrm{par}}(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\int_{X^n} f\left(t\,\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\,g(x_{i+j})}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}+(1-t)\int_X g\,d\mu\right)d\mu^n(x_1,\dots,x_n),\tag{7.2}$$
where $i+j$ means $i+j-n$ in case of $i+j>n$.
Now we state cyclic refinements of the discrete and integral forms of Jensen's inequality, introduced in [20] (see also [36]):
Theorem 7.1 Assume (H1) and (H2). If $v_1,\dots,v_n\in C$, then
$$f\left(\sum_{i=1}^{n}p_i v_i\right)\le C_{\mathrm{dis}}=C_{\mathrm{dis}}(f,\mathbf{v},\mathbf{p},\boldsymbol{\lambda}):=\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}v_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\le\sum_{i=1}^{n}p_i f(v_i),\tag{7.3}$$
where $i+j$ means $i+j-n$ in case of $i+j>n$.
Theorem 7.2 Assume (H1) and (H3)-(H5). Then
$$f\left(\int_X g\,d\mu\right)\le C_{\mathrm{par}}(t)\le C_{\mathrm{int}}\le\int_X f\circ g\,d\mu,\qquad t\in[0,1].$$
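Since $C_{\mathrm{dis}}$ is a finite sum, the discrete chain (7.3) can be checked numerically. The following Python sketch (our own illustration, with arbitrarily chosen data and the convex function $f(t)=t\log t$) evaluates the three terms of (7.3):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
p = rng.random(n); p /= p.sum()          # positive probability distribution p_1,...,p_n
lam = rng.random(k); lam /= lam.sum()    # positive probability distribution lambda_1,...,lambda_k
v = rng.uniform(0.5, 4.0, size=n)        # points v_1,...,v_n in C = ]0, inf[
f = lambda t: t * np.log(t)              # a convex function on ]0, inf[

# C_dis from (7.3); the index i+j is taken cyclically (i+j-n whenever i+j > n)
w = np.array([sum(lam[j] * p[(i + j) % n] for j in range(k)) for i in range(n)])
m = np.array([sum(lam[j] * p[(i + j) % n] * v[(i + j) % n] for j in range(k)) for i in range(n)])
C_dis = np.sum(w * f(m / w))

print(f(p @ v), C_dis, p @ f(v))         # increasing chain: f(sum p_i v_i) <= C_dis <= sum p_i f(v_i)
```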
To give applications in information theory, we introduce some definitions. The following notion was introduced by Csiszár in [2] and [37].
Definition 7.1 Let $f:\,]0,\infty[\,\to\,]0,\infty[$ be a convex function, and let $\mathbf{p}:=(p_1,\dots,p_n)$ and $\mathbf{q}:=(q_1,\dots,q_n)$ be positive probability distributions. The f-divergence functional is
$$I_f(\mathbf{p},\mathbf{q}):=\sum_{i=1}^{n}q_i f\left(\frac{p_i}{q_i}\right).$$
It is possible to use nonnegative probability distributions in the f-divergence functional, by defining
$$f(0):=\lim_{t\to 0^+}f(t);\qquad 0f\left(\frac{0}{0}\right):=0;\qquad 0f\left(\frac{a}{0}\right):=\lim_{t\to 0^+}t f\left(\frac{a}{t}\right),\quad a>0.$$
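A direct implementation of Definition 7.1 together with these conventions might look as follows (a sketch; the function name and the parameters f_at_0 and slope_at_inf supplying the two limits are our own choices):

```python
import math

def f_divergence(f, p, q, f_at_0=None, slope_at_inf=None):
    """I_f(p, q) = sum_i q_i f(p_i / q_i), with the conventions
    f(0) := lim_{t->0+} f(t)  and  0*f(a/0) := lim_{t->0+} t*f(a/t) = a * lim_{u->inf} f(u)/u;
    f_at_0 and slope_at_inf must supply these limits when zeros actually occur."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi > 0:
            total += qi * (f(pi / qi) if pi > 0 else f_at_0)
        elif pi > 0:                     # the 0*f(a/0) case, a > 0
            total += pi * slope_at_inf
        # pi == qi == 0 contributes 0*f(0/0) := 0
    return total

# Kullback-Leibler divergence: f(t) = t*log(t), with f(0) = 0
p, q = [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]
print(f_divergence(lambda t: t * math.log(t), p, q, f_at_0=0.0))
```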
Based on the previous definition, the following new functional was introduced in [9].

Definition 7.2 Let $J\subset\mathbb{R}$ be an interval, and let $f: J\to\mathbb{R}$ be a function. Let $\mathbf{p}:=(p_1,\dots,p_n)\in\mathbb{R}^n$ and $\mathbf{q}:=(q_1,\dots,q_n)\in\,]0,\infty[^n$ be such that
$$\frac{p_i}{q_i}\in J,\qquad i=1,\dots,n.\tag{7.4}$$
Then let
$$\hat{I}_f(\mathbf{p},\mathbf{q}):=\sum_{i=1}^{n}q_i f\left(\frac{p_i}{q_i}\right).$$
As a special case, the Shannon entropy and the measures related to it are frequently applied in fields like population genetics, molecular ecology, information theory, dynamical systems and statistical physics (see [21, 22]).
Definition 7.3 The Shannon entropy of a positive probability distribution $\mathbf{p}:=(p_1,\dots,p_n)$ is defined by
$$H(\mathbf{p}):=-\sum_{i=1}^{n}p_i\log(p_i).$$
One of the most famous distance functions used in information theory [27, 30], mathematical statistics [28, 31, 29] and signal processing [23, 26] is the Kullback–Leibler distance [13, 25].

Definition 7.4 The Kullback–Leibler divergence between the positive probability distributions $\mathbf{p}:=(p_1,\dots,p_n)$ and $\mathbf{q}:=(q_1,\dots,q_n)$ is defined by
$$D(\mathbf{p}\,\|\,\mathbf{q}):=\sum_{i=1}^{n}p_i\log\left(\frac{p_i}{q_i}\right).$$
We shall use the so-called Zipf–Mandelbrot law.
Definition 7.5 The Zipf–Mandelbrot law is a discrete probability distribution depending on three parameters $N\in\{1,2,\dots\}$, $q\in[0,\infty[$ and $s>0$, and it is defined by
$$f(i;N,q,s):=\frac{1}{(i+q)^s H_{N,q,s}},\qquad i=1,\dots,N,$$
where
$$H_{N,q,s}:=\sum_{k=1}^{N}\frac{1}{(k+q)^s}.$$
If $q=0$, then the Zipf–Mandelbrot law becomes Zipf's law.
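Computationally, the law is just a truncated, shifted power law normalized by $H_{N,q,s}$; a minimal sketch (the function name is ours):

```python
import numpy as np

def zipf_mandelbrot(N, q, s):
    """f(i; N, q, s) = 1 / ((i+q)^s * H_{N,q,s}) for i = 1, ..., N."""
    weights = 1.0 / (np.arange(1, N + 1) + q) ** s
    return weights / weights.sum()       # dividing by the sum is division by H_{N,q,s}

p = zipf_mandelbrot(N=10, q=1.5, s=1.2)
print(p.sum())                           # 1.0; with q = 0 this is Zipf's law
```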
Zipf's law is one of the basic laws in information science and bibliometrics. It concerns the frequency of words in a text: we count the number of times each word appears in the text, and rank words $(r)$ according to their frequency of occurrence $(f)$. The product of these two numbers is a constant: $r\cdot f=c$.
Apart from its use in bibliometrics and information science, Zipf's law is frequently used in linguistics (see [39], p. 167). In economics and econometrics, this distribution is known as Pareto's law, which analyzes the distribution of the wealthiest members of the community (see [39], p. 125). These two laws are the same in the mathematical sense; they are only applied in different contexts (see [42], p. 294).
The same type of distribution that we have in Zipf's and Pareto's laws can also be found in other scientific disciplines, such as physics, biology, earth and planetary sciences, computer science, demography and the social sciences. For example, with this type of distribution, also called the power law, we can analyze the numbers of hits on websites, the magnitudes of earthquakes, diameters of moon craters, intensities of solar flares, intensities of wars, populations of cities, and others (see [48]).
A more general model was introduced by Benoit Mandelbrot (see [46]), by using arguments on the fractal structure of lexical trees.
There are also quite different interpretations of the Zipf–Mandelbrot law in ecology, as is pointed out in [47] (see also [43] and [52]).
7.2 Estimations of f- and Rényi divergences
In this section we obtain some estimates for the new functional, the f-divergence functional, the Shannon entropy and the Rényi divergence by applying cyclic refinement results for Jensen's inequality. Finally, some concrete cases are considered, by using the Zipf–Mandelbrot law.
It is common to take log to base 2 in the introduced notions, but in our investigations this is not essential.
7.2.1 Inequalities for Csiszár divergence and Shannon entropy
In the first result we apply Theorem 7.1 to $\hat{I}_f(\mathbf{p},\mathbf{q})$.
Theorem 7.3 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ be a positive probability distribution. Let $J\subset\mathbb{R}$ be an interval, let $\mathbf{p}:=(p_1,\dots,p_n)\in\mathbb{R}^n$, and let $\mathbf{q}:=(q_1,\dots,q_n)\in\,]0,\infty[^n$ be such that
$$\frac{p_i}{q_i}\in J,\qquad i=1,\dots,n.$$
(a) If $f: J\to\mathbb{R}$ is a convex function, then
$$\hat{I}_f(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}q_i f\left(\frac{p_i}{q_i}\right)\ge\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)\ge f\left(\frac{\sum_{i=1}^{n}p_i}{\sum_{i=1}^{n}q_i}\right)\sum_{i=1}^{n}q_i.\tag{7.5}$$
If $f$ is a concave function, then the inequality signs in (7.5) are reversed.
(b) If $f: J\to\mathbb{R}$ is a function such that $x\mapsto xf(x)$ $(x\in J)$ is convex, then
$$\hat{I}_{\mathrm{id}_J f}(\mathbf{p},\mathbf{q})=\sum_{i=1}^{n}p_i f\left(\frac{p_i}{q_i}\right)\ge\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)\ge f\left(\frac{\sum_{i=1}^{n}p_i}{\sum_{i=1}^{n}q_i}\right)\sum_{i=1}^{n}p_i.\tag{7.6}$$
If $x\mapsto xf(x)$ $(x\in J)$ is a concave function, then the inequality signs in (7.6) are reversed.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. (a) By applying Theorem 7.1 with $C:=J$, $f:=f$,
$$p_i:=\frac{q_i}{\sum_{i=1}^{n}q_i},\qquad v_i:=\frac{p_i}{q_i},\qquad i=1,\dots,n,$$
we have
$$\sum_{i=1}^{n}q_i f\left(\frac{p_i}{q_i}\right)=\left(\sum_{i=1}^{n}q_i\right)\cdot\sum_{i=1}^{n}\frac{q_i}{\sum_{i=1}^{n}q_i}\,f\left(\frac{p_i}{q_i}\right)\ge\left(\sum_{i=1}^{n}q_i\right)\cdot\sum_{i=1}^{n}\frac{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}{\sum_{i=1}^{n}q_i}\,f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\frac{p_{i+j}}{q_{i+j}}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)$$
$$=\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)f\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)\ge f\left(\frac{\sum_{i=1}^{n}p_i}{\sum_{i=1}^{n}q_i}\right)\sum_{i=1}^{n}q_i.$$
(b) It can be proved similarly to (a), by using $f:=\mathrm{id}_J f$.
The proof is complete. □
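The chain (7.5) can again be verified numerically; a sketch under our own arbitrary choice of data, with the convex function $f(t)=-\log t$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 3
p = rng.uniform(0.1, 1.0, n)             # p with p_i/q_i in J = ]0, inf[
q = rng.uniform(0.1, 1.0, n)             # q in ]0, inf[^n
lam = rng.random(k); lam /= lam.sum()    # positive probability distribution lambda
f = lambda t: -np.log(t)                 # convex on ]0, inf[

# cyclic sums over j = 0..k-1, indices taken mod n
P = np.array([sum(lam[j] * p[(i + j) % n] for j in range(k)) for i in range(n)])
Q = np.array([sum(lam[j] * q[(i + j) % n] for j in range(k)) for i in range(n)])

left = np.sum(q * f(p / q))              # I_f(p, q)
mid = np.sum(Q * f(P / Q))               # the refinement term in (7.5)
right = f(p.sum() / q.sum()) * q.sum()
print(left >= mid >= right)              # True
```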
Remark 7.1 (a) The classical inequality of Csiszár and Körner for the f-divergence functional is generalized and refined in (7.5).
(b) Other types of refinements are applied to the f-divergence functional in [40], [41] and [35].
(c) For example, the functions $x\mapsto x\log_b(x)$ $(x>0,\ b>1)$ and $x\mapsto x\arctan(x)$ $(x\in\mathbb{R})$ are convex.
We mention two special cases of the previous result.
The first case corresponds to the entropy of a discrete probability distribution.
Corollary 7.1 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ be a positive probability distribution.
(a) If $\mathbf{q}:=(q_1,\dots,q_n)\in\,]0,\infty[^n$, and the base of log is greater than 1, then
$$-\sum_{i=1}^{n}q_i\log(q_i)\le-\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)\log\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)\le\log\left(\frac{n}{\sum_{i=1}^{n}q_i}\right)\sum_{i=1}^{n}q_i.\tag{7.7}$$
If the base of log is between 0 and 1, then the inequality signs in (7.7) are reversed.
(b) If $\mathbf{q}:=(q_1,\dots,q_n)$ is a positive probability distribution and the base of log is greater than 1, then we have estimates for the Shannon entropy of $\mathbf{q}$:
$$H(\mathbf{q})\le-\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)\log\left(\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}\right)\le\log(n).$$
If the base of log is between 0 and 1, then these inequality signs are reversed.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. (a) It follows from Theorem 7.3 (a), by using $f:=\log$ and $\mathbf{p}:=(1,\dots,1)$.
(b) It is a special case of (a). □
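For instance, with the natural logarithm the two bounds in (b) can be checked as follows (our own sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 8, 4
q = rng.random(n); q /= q.sum()          # positive probability distribution
lam = rng.random(k); lam /= lam.sum()

Q = np.array([sum(lam[j] * q[(i + j) % n] for j in range(k)) for i in range(n)])
H = -np.sum(q * np.log(q))               # Shannon entropy H(q), natural log (base e > 1)
mid = -np.sum(Q * np.log(Q))             # the refinement term of Corollary 7.1 (b)
print(H <= mid <= np.log(n))             # True
```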
The second case corresponds to the relative entropy or Kullback-Leibler divergence between two probability distributions.
Corollary 7.2 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ be a positive probability distribution.
(a) Let $\mathbf{p}:=(p_1,\dots,p_n)\in\,]0,\infty[^n$ and $\mathbf{q}:=(q_1,\dots,q_n)\in\,]0,\infty[^n$. If the base of log is greater than 1, then
$$\sum_{i=1}^{n}p_i\log\left(\frac{p_i}{q_i}\right)\tag{7.8}$$
$$\ge\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)\ge\log\left(\frac{\sum_{i=1}^{n}p_i}{\sum_{i=1}^{n}q_i}\right)\sum_{i=1}^{n}p_i.\tag{7.9}$$
If the base of log is between 0 and 1, then the inequality signs in (7.9) are reversed.
(b) If $\mathbf{p}$ and $\mathbf{q}$ are positive probability distributions, and the base of log is greater than 1, then we have
$$D(\mathbf{p}\,\|\,\mathbf{q})\ge\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}{\sum_{j=0}^{k-1}\lambda_{j+1}q_{i+j}}\right)\ge 0.\tag{7.10}$$
If the base of log is between 0 and 1, then the inequality signs in (7.10) are reversed.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. (a) We can apply Theorem 7.3 (b) to the function $f:=\log$.
(b) It is a special case of (a). □
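Numerically, (7.10) sandwiches a refinement term between $D(\mathbf{p}\,\|\,\mathbf{q})$ and 0; a brief sketch (our own data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 3
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()
lam = rng.random(k); lam /= lam.sum()

P = np.array([sum(lam[j] * p[(i + j) % n] for j in range(k)) for i in range(n)])
Q = np.array([sum(lam[j] * q[(i + j) % n] for j in range(k)) for i in range(n)])

D = np.sum(p * np.log(p / q))            # Kullback-Leibler divergence D(p||q)
mid = np.sum(P * np.log(P / Q))          # the intermediate term of (7.10)
print(D >= mid >= 0.0)                   # True
```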
Remark 7.2 We can apply Theorem 7.3 to obtain similar inequalities for other distances between two probability distributions.
7.2.2 Inequalities for Rényi divergence and entropy
The Rényi divergence and entropy come from [49].

Definition 7.6 Let $\mathbf{p}:=(p_1,\dots,p_n)$ and $\mathbf{q}:=(q_1,\dots,q_n)$ be positive probability distributions, and let $\alpha\ge 0$, $\alpha\ne 1$.
(a) The Rényi divergence of order $\alpha$ is defined by
$$D_\alpha(\mathbf{p},\mathbf{q}):=\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}q_i\left(\frac{p_i}{q_i}\right)^{\alpha}\right).\tag{7.11}$$
(b) The Rényi entropy of order $\alpha$ of $\mathbf{p}$ is defined by
$$H_\alpha(\mathbf{p}):=\frac{1}{1-\alpha}\log\left(\sum_{i=1}^{n}p_i^{\alpha}\right).\tag{7.12}$$
The Rényi divergence and the Rényi entropy can also be extended to nonnegative probability distributions.
If $\alpha\to 1$ in (7.11), we have the Kullback–Leibler divergence, and if $\alpha\to 1$ in (7.12), then we have the Shannon entropy.
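Both quantities are one-liners to compute; the sketch below (function names are ours) also illustrates the $\alpha\to 1$ limits numerically:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p, q) from (7.11); alpha >= 0, alpha != 1, natural log."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.log(np.sum(q * (p / q) ** alpha)) / (alpha - 1.0)

def renyi_entropy(p, alpha):
    """H_alpha(p) from (7.12)."""
    return np.log(np.sum(np.asarray(p, float) ** alpha)) / (1.0 - alpha)

p, q = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])
# alpha -> 1 recovers the Kullback-Leibler divergence and the Shannon entropy
print(renyi_divergence(p, q, 1.0001), np.sum(p * np.log(p / q)))
print(renyi_entropy(p, 1.0001), -np.sum(p * np.log(p)))
```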
In the next two results, inequalities can be found for the Rényi divergence.
Theorem 7.4 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$, $\mathbf{p}:=(p_1,\dots,p_n)$ and $\mathbf{q}:=(q_1,\dots,q_n)$ be positive probability distributions.
(a) If $0\le\beta\le\alpha$, $\beta,\alpha\ne 1$, and the base of log is greater than 1, then
$$D_\beta(\mathbf{p},\mathbf{q})\le\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\beta-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)^{\frac{\alpha-1}{\beta-1}}\right)\le D_\alpha(\mathbf{p},\mathbf{q}).\tag{7.13}$$
The reverse inequalities hold if the base of log is between 0 and 1.
(b) If $1<\alpha$, and the base of log is greater than 1, then
$$D_1(\mathbf{p},\mathbf{q})=D(\mathbf{p}\,\|\,\mathbf{q})=\sum_{i=1}^{n}p_i\log\left(\frac{p_i}{q_i}\right)$$
$$\le\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\exp\left(\frac{(\alpha-1)\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\log\left(\frac{p_{i+j}}{q_{i+j}}\right)}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\right)\le D_\alpha(\mathbf{p},\mathbf{q}),$$
where the base of exp is the same as the base of log.
The reverse inequalities hold if the base of log is between 0 and 1.
(c) If $0\le\beta<1$, and the base of log is greater than 1, then
$$D_\beta(\mathbf{p},\mathbf{q})\le\frac{1}{\beta-1}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\beta-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\le D_1(\mathbf{p},\mathbf{q}).$$
The reverse inequalities hold if the base of log is between 0 and 1.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. (a) By applying Theorem 7.1 with $C:=\,]0,\infty[$, $f:\,]0,\infty[\,\to\mathbb{R}$, $f(t):=t^{\frac{\alpha-1}{\beta-1}}$,
$$v_i:=\left(\frac{p_i}{q_i}\right)^{\beta-1},\qquad i=1,\dots,n,$$
we have
$$\left(\sum_{i=1}^{n}q_i\left(\frac{p_i}{q_i}\right)^{\beta}\right)^{\frac{\alpha-1}{\beta-1}}=\left(\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\beta-1}\right)^{\frac{\alpha-1}{\beta-1}}$$
$$\le\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\beta-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)^{\frac{\alpha-1}{\beta-1}}\le\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\tag{7.14}$$
if either $0\le\beta<1<\alpha$ or $1<\beta\le\alpha$, and the reverse inequalities hold in (7.14) if $0\le\beta\le\alpha<1$. By raising to the power $\frac{1}{\alpha-1}$, we have from all these cases that
$$\left(\sum_{i=1}^{n}q_i\left(\frac{p_i}{q_i}\right)^{\beta}\right)^{\frac{1}{\beta-1}}\le\left(\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\beta-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)^{\frac{\alpha-1}{\beta-1}}\right)^{\frac{1}{\alpha-1}}\le\left(\sum_{i=1}^{n}q_i\left(\frac{p_i}{q_i}\right)^{\alpha}\right)^{\frac{1}{\alpha-1}}.$$
Since log is increasing if the base of log is greater than 1, (7.13) now follows.
If the base of log is between 0 and 1, then log is decreasing, and therefore the inequality signs in (7.13) are reversed.
(b) and (c) When $\alpha=1$ or $\beta=1$, we have the result by taking the limit.
The proof is complete. □
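As a quick numerical illustration of (7.13) (our own data, with the natural logarithm and the admissible pair $\beta=1/2$, $\alpha=2$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 8, 3
beta, alpha = 0.5, 2.0                   # admissible: 0 <= beta <= alpha, both != 1
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()
lam = rng.random(k); lam /= lam.sum()

csum = lambda w: np.array([sum(lam[j] * w[(i + j) % n] for j in range(k)) for i in range(n)])
D = lambda a: np.log(np.sum(q * (p / q) ** a)) / (a - 1.0)    # Renyi divergence (7.11)

P = csum(p)                                    # sum_j lam_{j+1} p_{i+j}
V = csum(p * (p / q) ** (beta - 1.0))          # sum_j lam_{j+1} p_{i+j} (p_{i+j}/q_{i+j})^{beta-1}
mid = np.log(np.sum(P * (V / P) ** ((alpha - 1.0) / (beta - 1.0)))) / (alpha - 1.0)
print(D(beta) <= mid <= D(alpha))              # the chain (7.13); True
```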
Theorem 7.5 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$, $\mathbf{p}:=(p_1,\dots,p_n)$ and $\mathbf{q}:=(q_1,\dots,q_n)$ be positive probability distributions.
If either $0\le\alpha<1$ and the base of log is greater than 1, or $1<\alpha$ and the base of log is between 0 and 1, then
$$\frac{1}{\sum_{i=1}^{n}q_i\left(\frac{p_i}{q_i}\right)^{\alpha}}\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\log\left(\frac{p_i}{q_i}\right)\le\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\times$$
$$\times\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\le D_\alpha(\mathbf{p},\mathbf{q})\tag{7.15}$$
$$\le\frac{1}{\alpha-1}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\le D_1(\mathbf{p},\mathbf{q}).$$
If either $0\le\alpha<1$ and the base of log is between 0 and 1, or $1<\alpha$ and the base of log is greater than 1, then the reverse inequalities hold.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. We prove only the case when $0\le\alpha<1$ and the base of log is greater than 1; the other cases can be proved similarly.
Since $\frac{1}{\alpha-1}<0$ and the function log is concave, we have from Theorem 7.1, by choosing $C:=\,]0,\infty[$, $f:=\log$,
$$v_i:=\left(\frac{p_i}{q_i}\right)^{\alpha-1},\qquad i=1,\dots,n,$$
that
$$D_\alpha(\mathbf{p},\mathbf{q})=\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)\le\frac{1}{\alpha-1}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)$$
$$\le\frac{1}{\alpha-1}\sum_{i=1}^{n}p_i\log\left(\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)=\sum_{i=1}^{n}p_i\log\left(\frac{p_i}{q_i}\right)=D_1(\mathbf{p},\mathbf{q}),$$
and this gives the desired upper bound for $D_\alpha(\mathbf{p},\mathbf{q})$.
Since the base of log is greater than 1, the function $x\mapsto x\log(x)$ $(x>0)$ is convex, and therefore $\frac{1}{\alpha-1}<0$ and Theorem 7.1 imply that
$$D_\alpha(\mathbf{p},\mathbf{q})=\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)=\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\left(\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)\log\left(\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)$$
$$\ge\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)$$
$$=\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\left(\frac{p_{i+j}}{q_{i+j}}\right)^{\alpha-1}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)$$
$$\ge\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\log\left(\left(\frac{p_i}{q_i}\right)^{\alpha-1}\right)=\frac{1}{\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}}\sum_{i=1}^{n}p_i\left(\frac{p_i}{q_i}\right)^{\alpha-1}\log\left(\frac{p_i}{q_i}\right),$$
which gives the desired lower bound for $D_\alpha(\mathbf{p},\mathbf{q})$.
The proof is complete. □
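The full four-term chain (7.15) can be checked numerically as well; a sketch with the natural logarithm and $\alpha=1/2$ (our own data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, alpha = 8, 3, 0.5                  # 0 <= alpha < 1, natural log (base > 1)
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()
lam = rng.random(k); lam /= lam.sum()

csum = lambda w: np.array([sum(lam[j] * w[(i + j) % n] for j in range(k)) for i in range(n)])
r = (p / q) ** (alpha - 1.0)
W, V = csum(p), csum(p * r)

T1 = np.sum(p * r * np.log(p / q)) / np.sum(p * r)            # leftmost term of (7.15)
T2 = np.sum(V * np.log(V / W)) / ((alpha - 1.0) * np.sum(p * r))
Da = np.log(np.sum(p * r)) / (alpha - 1.0)                    # D_alpha(p, q)
T3 = np.sum(W * np.log(V / W)) / (alpha - 1.0)
D1 = np.sum(p * np.log(p / q))                                # D_1(p, q), Kullback-Leibler
print(T1 <= T2 <= Da <= T3 <= D1)                             # True
```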
Now, by using the previous theorems, some inequalities for the Rényi entropy are obtained. Denote by
$$\frac{\mathbf{1}}{n}:=\left(\frac{1}{n},\dots,\frac{1}{n}\right)$$
the discrete uniform distribution.
Corollary 7.3 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ and $\mathbf{p}:=(p_1,\dots,p_n)$ be positive probability distributions.
(a) If $0\le\beta\le\alpha$, $\beta,\alpha\ne 1$, and the base of log is greater than 1, then
$$H_\beta(\mathbf{p})\ge\frac{1}{1-\alpha}\log\left(\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\beta}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)^{\frac{\alpha-1}{\beta-1}}\right)\ge H_\alpha(\mathbf{p}).$$
The reverse inequalities hold if the base of log is between 0 and 1.
(b) If $1<\alpha$, and the base of log is greater than 1, then
$$H(\mathbf{p})=-\sum_{i=1}^{n}p_i\log(p_i)\ge\log(n)$$
$$+\frac{1}{1-\alpha}\log\left(\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\exp\left(\frac{(\alpha-1)\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\log(np_{i+j})}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\right)\ge H_\alpha(\mathbf{p}),$$
where the base of exp is the same as the base of log.
The reverse inequalities hold if the base of log is between 0 and 1.
(c) If $0\le\beta<1$, and the base of log is greater than 1, then
$$H_\beta(\mathbf{p})\ge\frac{1}{1-\beta}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\beta}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\ge H(\mathbf{p}).$$
The reverse inequalities hold if the base of log is between 0 and 1.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. If $\mathbf{q}=\frac{\mathbf{1}}{n}$, then
$$D_\alpha\left(\mathbf{p},\frac{\mathbf{1}}{n}\right)=\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}n^{\alpha-1}p_i^{\alpha}\right)=\log(n)+\frac{1}{\alpha-1}\log\left(\sum_{i=1}^{n}p_i^{\alpha}\right),$$
and therefore
$$H_\alpha(\mathbf{p})=\log(n)-D_\alpha\left(\mathbf{p},\frac{\mathbf{1}}{n}\right).\tag{7.16}$$
(a) It follows from Theorem 7.4 and (7.16) that
$$H_\beta(\mathbf{p})=\log(n)-D_\beta\left(\mathbf{p},\frac{\mathbf{1}}{n}\right)$$
$$\ge\log(n)-\frac{1}{\alpha-1}\log\left(n^{\alpha-1}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\beta}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)^{\frac{\alpha-1}{\beta-1}}\right)$$
$$\ge\log(n)-D_\alpha\left(\mathbf{p},\frac{\mathbf{1}}{n}\right)=H_\alpha(\mathbf{p}).$$
(b) and (c) can be proved similarly.
The proof is complete. □
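Case (c), for example, can be checked directly (our own sketch, natural logarithm, $\beta=1/2$):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, beta = 8, 3, 0.5                   # case (c): 0 <= beta < 1, natural log
p = rng.random(n); p /= p.sum()
lam = rng.random(k); lam /= lam.sum()

csum = lambda w: np.array([sum(lam[j] * w[(i + j) % n] for j in range(k)) for i in range(n)])
W, B = csum(p), csum(p ** beta)

H_beta = np.log(np.sum(p ** beta)) / (1.0 - beta)             # Renyi entropy (7.12)
mid = np.sum(W * np.log(B / W)) / (1.0 - beta)                # refinement term of (c)
H = -np.sum(p * np.log(p))                                    # Shannon entropy
print(H_beta >= mid >= H)                                     # True
```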
Corollary 7.4 Let $2\le k\le n$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ and $\mathbf{p}:=(p_1,\dots,p_n)$ be positive probability distributions.
If either $0\le\alpha<1$ and the base of log is greater than 1, or $1<\alpha$ and the base of log is between 0 and 1, then
$$-\frac{1}{\sum_{i=1}^{n}p_i^{\alpha}}\sum_{i=1}^{n}p_i^{\alpha}\log(p_i)\ge\log(n)-\frac{1}{(\alpha-1)\sum_{i=1}^{n}p_i^{\alpha}}\times$$
$$\times\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\alpha}\right)\log\left(n^{\alpha-1}\,\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\alpha}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\ge H_\alpha(\mathbf{p})$$
$$\ge\frac{1}{1-\alpha}\sum_{i=1}^{n}\left(\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}\right)\log\left(\frac{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}^{\alpha}}{\sum_{j=0}^{k-1}\lambda_{j+1}p_{i+j}}\right)\ge H(\mathbf{p}).$$
If either $0\le\alpha<1$ and the base of log is between 0 and 1, or $1<\alpha$ and the base of log is greater than 1, then the reverse inequalities hold.
In all these inequalities i+j means i+j−n in case of i+j>n.
Proof. We can prove this as Corollary 7.3, by using Theorem 7.5. □
7.2.3 Inequalities by using the Zipf-Mandelbrot law
We illustrate the previous results by using the Zipf–Mandelbrot law.
Corollary 7.5 Let $\mathbf{p}$ be the Zipf–Mandelbrot law as in Definition 7.5, let $2\le k\le N$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ be a positive probability distribution. By applying Corollary 7.3 (c), we have:
If $0\le\alpha<1$, and the base of log is greater than 1, then
$$H_\alpha(\mathbf{p})=\frac{1}{1-\alpha}\log\left(\frac{1}{H_{N,q,s}^{\alpha}}\sum_{i=1}^{N}\frac{1}{(i+q)^{\alpha s}}\right)\ge\frac{1}{1-\alpha}\sum_{i=1}^{N}\left(\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q)^s H_{N,q,s}}\right)\log\left(\frac{1}{H_{N,q,s}^{\alpha-1}}\,\frac{\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q)^{\alpha s}}}{\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q)^{s}}}\right)$$
$$\ge\frac{s}{H_{N,q,s}}\sum_{i=1}^{N}\frac{\log(i+q)}{(i+q)^s}+\log(H_{N,q,s})=H(\mathbf{p}).$$
The reverse inequalities hold if the base of log is between 0 and 1.
In all these inequalities $i+j$ means $i+j-N$ in case of $i+j>N$.
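A numerical check of this chain with the natural logarithm (the parameters $N=10$, $q=1.5$, $s=1.2$, $\alpha=1/2$ and the uniform $\boldsymbol{\lambda}$ are our own choices):

```python
import numpy as np

N, q, s, k, alpha = 10, 1.5, 1.2, 3, 0.5
i = np.arange(1, N + 1)
H_Nqs = np.sum(1.0 / (i + q) ** s)
p = 1.0 / ((i + q) ** s * H_Nqs)                              # Zipf-Mandelbrot law
lam = np.full(k, 1.0 / k)                                     # a positive probability distribution

csum = lambda w: np.array([sum(lam[j] * w[(m + j) % N] for j in range(k)) for m in range(N)])
W, A = csum(p), csum(p ** alpha)

H_alpha = np.log(np.sum(p ** alpha)) / (1.0 - alpha)
mid = np.sum(W * np.log(A / W)) / (1.0 - alpha)
H = s / H_Nqs * np.sum(np.log(i + q) / (i + q) ** s) + np.log(H_Nqs)   # closed form of H(p)
print(H_alpha >= mid >= H)                                    # True
```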
Corollary 7.6 Let $\mathbf{p}_1$ and $\mathbf{p}_2$ be Zipf–Mandelbrot laws with parameters $N\in\{1,2,\dots\}$, $q_1,q_2\in[0,\infty[$ and $s_1,s_2>0$, respectively, let $2\le k\le N$ be integers, and let $\boldsymbol{\lambda}:=(\lambda_1,\dots,\lambda_k)$ be a positive probability distribution. By applying Corollary 7.2 (b), we have:
If the base of log is greater than 1, then
$$D(\mathbf{p}_1\,\|\,\mathbf{p}_2)=\sum_{i=1}^{N}\frac{1}{(i+q_1)^{s_1}H_{N,q_1,s_1}}\log\left(\frac{(i+q_2)^{s_2}H_{N,q_2,s_2}}{(i+q_1)^{s_1}H_{N,q_1,s_1}}\right)$$
$$\ge\sum_{i=1}^{N}\left(\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q_1)^{s_1}H_{N,q_1,s_1}}\right)\log\left(\frac{\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q_1)^{s_1}H_{N,q_1,s_1}}}{\sum_{j=0}^{k-1}\frac{\lambda_{j+1}}{(i+j+q_2)^{s_2}H_{N,q_2,s_2}}}\right)\ge 0.\tag{7.17}$$
If the base of log is between 0 and 1, then the inequality signs in (7.17) are reversed.
In all these inequalities $i+j$ means $i+j-N$ in case of $i+j>N$.
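Similarly, (7.17) sandwiches a refinement term between $D(\mathbf{p}_1\,\|\,\mathbf{p}_2)$ and 0; a sketch with arbitrarily chosen parameters:

```python
import numpy as np

N, k = 10, 3
q1, s1, q2, s2 = 0.5, 1.1, 2.0, 1.4
lam = np.full(k, 1.0 / k)

def zm(N, q, s):                         # Zipf-Mandelbrot law of Definition 7.5
    w = 1.0 / (np.arange(1, N + 1) + q) ** s
    return w / w.sum()

p1, p2 = zm(N, q1, s1), zm(N, q2, s2)
csum = lambda w: np.array([sum(lam[j] * w[(i + j) % N] for j in range(k)) for i in range(N)])
P1, P2 = csum(p1), csum(p2)

D = np.sum(p1 * np.log(p1 / p2))         # D(p1 || p2)
mid = np.sum(P1 * np.log(P1 / P2))       # middle term of (7.17)
print(D >= mid >= 0.0)                   # True
```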
7.3 Cyclic improvements of inequalities for the entropy of the Zipf-Mandelbrot law via Hermite interpolating polynomial
In order to give our main results, we consider the following hypotheses for the next sections.

(M1) Let $I\subset\mathbb{R}$ be an interval, let $\mathbf{x}:=(x_1,\dots,x_n)\in I^n$, and let $p_1,\dots,p_n$ and $\lambda_1,\dots,\lambda_k$ represent positive probability distributions for $2\le k\le n$.

(M2) Let $f: I\to\mathbb{R}$ be a convex function.
Remark 7.3 Under the conditions (M1), we define
$$J_1(f)=J_1(\mathbf{x},\mathbf{p},\boldsymbol{\lambda};f):=\sum_{i=1}^{n}p_i f(x_i)-C_{\mathrm{dis}}(f,\mathbf{x},\mathbf{p},\boldsymbol{\lambda}),$$
$$J_2(f)=J_2(\mathbf{x},\mathbf{p},\boldsymbol{\lambda};f):=C_{\mathrm{dis}}(f,\mathbf{x},\mathbf{p},\boldsymbol{\lambda})-f\left(\sum_{i=1}^{n}p_i x_i\right),$$
where $f: I\to\mathbb{R}$ is a function. The functionals $f\mapsto J_u(f)$ are linear, $u=1,2$, and Theorem 7.1 implies that
$$J_u(f)\ge 0,\qquad u=1,2,$$
if $f: I\to\mathbb{R}$ is a convex function.
Assume (H1) and (H3)-(H5). Then we have the following additional linear functionals:
$$J_3(f)=J_3(f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=\int_X f\circ g\,d\mu-C_{\mathrm{int}}(f,g,\mu,\mathbf{p},\boldsymbol{\lambda})\ge 0,$$
$$J_4(f)=J_4(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=\int_X f\circ g\,d\mu-C_{\mathrm{par}}(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda})\ge 0;\qquad t\in[0,1],$$
$$J_5(f)=J_5(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=C_{\mathrm{int}}(f,g,\mu,\mathbf{p},\boldsymbol{\lambda})-C_{\mathrm{par}}(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda})\ge 0;\qquad t\in[0,1],$$
$$J_6(f)=J_6(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda}):=C_{\mathrm{par}}(t,f,g,\mu,\mathbf{p},\boldsymbol{\lambda})-f\left(\int_X g\,d\mu\right)\ge 0;\qquad t\in[0,1].$$
For $v=1,\dots,5$, consider the Green functions $G_v:[\alpha_1,\alpha_2]\times[\alpha_1,\alpha_2]\to\mathbb{R}$ defined as
$$G_1(z,r)=\begin{cases}\dfrac{(\alpha_2-z)(\alpha_1-r)}{\alpha_2-\alpha_1}, & \alpha_1\le r\le z;\\[2ex]\dfrac{(\alpha_2-r)(\alpha_1-z)}{\alpha_2-\alpha_1}, & z\le r\le\alpha_2,\end{cases}\tag{7.18}$$
$$G_2(z,r)=\begin{cases}\alpha_1-r, & \alpha_1\le r\le z;\\ \alpha_1-z, & z\le r\le\alpha_2.\end{cases}\tag{7.19}$$
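Both Green functions are straightforward to evaluate; a small sketch (our own, with arbitrary endpoints $\alpha_1=0$, $\alpha_2=1$) also illustrating that $G_1$ is symmetric:

```python
import numpy as np

def G1(z, r, a1, a2):
    """Green function (7.18) on [a1, a2] x [a1, a2]; symmetric in (z, r)."""
    return np.where(r <= z, (a2 - z) * (a1 - r), (a2 - r) * (a1 - z)) / (a2 - a1)

def G2(z, r, a1, a2):
    """Green function (7.19)."""
    return np.where(r <= z, a1 - r, a1 - z)

a1, a2, z, r = 0.0, 1.0, 0.3, 0.7
print(G1(z, r, a1, a2), G1(r, z, a1, a2))   # equal values: G1(z, r) = G1(r, z)
print(G2(z, r, a1, a2))
```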