BASICS I.
Fegyverneki Sándor
University of Miskolc
Department of Applied Mathematics
matfs@uni-miskolc.hu
17th of February 2021.
1 Introduction
• Introduction
• Self-information
• Entropy
• Properties of entropy
• Notations
• References
Theory of statistical communication:
information theory, signal detection, stochastic filtering.
Information theorists devote their efforts to quantitative examination of the following three questions:
1. What is information?
2. What are the fundamental limitations on the accuracy with which information can be transmitted?
3. What design methodologies and computational algorithms yield practical systems for communicating and storing information that perform close to the aforementioned fundamental limits?
Early work in statistical physics:
L. Boltzmann (1896), L. Szilárd (1929), J. von Neumann (1932).
Communication theory:
H. Nyquist (1924), R.V.L. Hartley (1928).
Mathematical model of communication theory:
C.E. Shannon (1948): quick development of problems, methods and results.
A parallel theory developed independently by Norbert Wiener (1948) also played a significant part in placing communication theory on a firm footing rooted in mathematical statistics, and led to the development of the closely related discipline of cybernetics.
Shannon's general model (one-way, noiseless):

SOURCE → CODING → CHANNEL → DECODING → USER

source: message; coding: translation; channel: speed (capacity); decoding: translation.
Shannon's general model (one-way, noisy):

SOURCE → CODING → CHANNEL → DECODING → USER
                     ↑
                   NOISE

source: message; coding: translation; channel: speed (capacity) + noise (error?); decoding: translation and correction.
2 Self-information
Supplying information is equivalent to removing uncertainty.
That is,
information supplied = prior uncertainty − posterior uncertainty.
Let
$X = \{x_1, x_2, \dots, x_n\}$ be a finite set.
We choose one element. How much information does this choice carry?
Counterfeit coin
You are given 27 coins, 26 of which have the same weight, and one of which is lighter. You have a pan balance (balance scale).
What is the minimum number of weighings it takes to determine which coin is the lighter one?
Note that there are three possible outcomes of each weighing: left side heavier, right side heavier, or both sides equal.
In order to do the given task in as few weighings as possible, we need as much information from each weighing as possible. Hence all three outcomes should be realizable for each weighing (except for the final weighing in some scenarios).
Answer: 3 weighings. In general: $\log_3 n$ weighings (rounded up) for $n$ coins.
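A minimal sketch of the ternary strategy (the helper find_light_coin and the particular weights are illustrative assumptions): split the coins into three near-equal groups, weigh two of them, and recurse into the group that must contain the light coin.

```python
import math

def find_light_coin(weights):
    """Locate the single lighter coin by splitting into three groups.

    Each weighing compares two equal-size groups on the pan balance:
    the lighter coin is in the lighter pan, or in the leftover group
    if the pans balance.
    """
    coins = list(range(len(weights)))
    weighings = 0
    while len(coins) > 1:
        third = math.ceil(len(coins) / 3)
        left = coins[:third]
        right = coins[third:2 * third]
        rest = coins[2 * third:]
        weighings += 1
        lw = sum(weights[i] for i in left)
        rw = sum(weights[i] for i in right)
        if lw < rw:
            coins = left
        elif rw < lw:
            coins = right
        else:
            coins = rest
    return coins[0], weighings

weights = [2.0] * 27   # assumed: 27 coins of weight 2.0 ...
weights[13] = 1.0      # ... except coin 13, which is lighter
print(find_light_coin(weights))  # (13, 3): found in log3(27) = 3 weighings
```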
H.W.
(a) You are given twelve coins, eleven of which have the same weight, and one of which has a weight different from the others (either heavier or lighter, you do not know). You have a balance scale.
What is the minimum number of weighings it takes to determine which coin has the different weight, and also whether it is heavier or lighter than the rest?
(b) You are given $N$ coins, $N-1$ of which have the same weight, and one of which has a weight different from the others (either heavier or lighter, you do not know). You are allowed $W$ weighings on a balance scale.
What is the maximum value of $N$, as a function of $W$, for which you can determine which coin has the different weight, and also whether it is heavy or light?
Let
$X = \{x_1, x_2, \dots, x_n\}$ be a finite set.
How many binary digits are necessary to describe one element?
Hartley (1928):
$$I = \log_2 n$$
Consider sequences of $m$ elements from the set $X$. The number of such sequences is $n^m$. If
$$2^{k-1} < n^m \le 2^k,$$
then the number of binary digits per element of the set $X$ is $\frac{k}{m}$. Thus
$$\log_2 n \le \frac{k}{m} < \log_2 n + \frac{1}{m},$$
that is, by increasing $m$, the value $\log_2 n$ can be approximated arbitrarily closely.
Unit of information quantity: $1\ \text{bit} = \ln 2\ \text{nat}$.
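A small numeric illustration (assumed alphabet size $n = 27$): the per-element digit count $k/m$ approaches $\log_2 n \approx 4.7549$ as the block length $m$ grows.

```python
import math

n = 27  # assumed alphabet size
for m in (1, 2, 5, 10, 100, 1000):
    # smallest k with n**m <= 2**k, i.e. 2**(k-1) < n**m <= 2**k
    k = math.ceil(m * math.log2(n))
    print(f"m = {m:4d}   k/m = {k / m:.6f}")
print("log2(n) =", math.log2(n))
```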
The elements may occur with different probabilities.
Shannon (1948): probability. Let $A$ be an event and $P(A)$ the probability that $A$ occurs. His solution:
$$I = \log_2 \frac{1}{P(A)}$$
Probability field: $(\Omega, \mathcal{F}, P)$.
Classical probability field: $\Omega = \{\omega_1, \omega_2, \dots, \omega_n\}$,
$$P(\{\omega_i\}) = \frac{1}{n} \quad \forall i = 1, \dots, n.$$
Required properties: additivity, monotonicity, normalization.
Additivity. Let $n = NM$ and
$$X = \bigcup_{i=1}^{N} E_i, \qquad E_i \cap E_j = \emptyset \ (i \ne j), \quad |E_i| = M.$$
Two steps: first choose one subset, then choose one element of it.
Idea:
$$I(NM) = I(N) + I(M).$$
Monotonicity
A small probability means a large information quantity for the occurrence:
$$A \subset B \Rightarrow P(A) \le P(B) \Rightarrow I(A) \ge I(B).$$
In particular, $I(A) = I(B)$ whenever $P(A) = P(B)$, since then $P(A) \le P(B)$ and $P(A) \ge P(B)$ both hold.
Therefore there exists a function $f$ such that $I(A) = f(P(A))$.
Normalization
Let $I(A) = 1$ if $P(A) = \frac{1}{2}$ (the case $|X| = 2$):
$$I(A) = 1\ \text{bit} = \ln 2\ \text{nats}.$$
Theorem.
If $f : (0,1] \to \mathbb{R}$ and
(1) $f(p) \ge f(q)$ if $p \le q$,
(2) $f(pq) = f(p) + f(q)$,
(3) $f\left(\frac{1}{2}\right) = 1$,
then
$$f(p) = \log_2 \frac{1}{p}.$$
Proof.
Let $x := \log_2 \frac{1}{p}$, that is, $p = 2^{-x}$ with $x \ge 0$. The statement to prove is then $f(2^{-x}) = x$.
By condition (2),
$$f(p^n) = n f(p) \quad (\forall n \in \mathbb{N}),$$
if we apply mathematical induction. From this, if $p = \frac{1}{2}$, then
$$f(2^{-n}) = n.$$
Furthermore,
$$2^{-n} = \left(2^{-\frac{n}{m}}\right)^m,$$
that is,
then
$$f\left(2^{-\frac{n}{m}}\right) = \frac{n}{m}.$$
Thus for all rational $x > 0$,
$$f(2^{-x}) = x.$$
If $x = 0$, then $1 = f\left(\frac{1}{2} \cdot 2^0\right) = f\left(\frac{1}{2}\right) + f(2^0) = 1 + f(1)$, that is, $f(1) = 0$.
For all irrational $x > 0$ there exist $m \in \mathbb{N}$ and $n \in \mathbb{N}$ such that
$$\frac{n}{m} \le x < \frac{n+1}{m}.$$
Then
$$\frac{n}{m} = f\left(2^{-\frac{n}{m}}\right) \le f(2^{-x}) \le f\left(2^{-\frac{n+1}{m}}\right) = \frac{n+1}{m}.$$
If $m \to \infty$, then $f(2^{-x}) = x$. Thus for all $x \ge 0$,
$$f(p) = \log_2 \frac{1}{p}.$$
♠
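A quick numerical check (with assumed probabilities $p = 0.3$, $q = 0.6$) that $f(p) = \log_2 \frac{1}{p}$ indeed satisfies conditions (1)-(3); a sketch, not part of the proof:

```python
import math

def f(p):
    # self-information: f(p) = log2(1/p)
    return math.log2(1 / p)

p, q = 0.3, 0.6  # assumed example values with p <= q
assert f(p) >= f(q)                         # (1) monotonicity
assert math.isclose(f(p * q), f(p) + f(q))  # (2) additivity
assert math.isclose(f(1 / 2), 1.0)          # (3) normalization
print(f(0.5), f(0.25), f(1.0))  # 1.0 2.0 0.0
```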
Definition.
The map $\xi : \Omega \to \mathbb{R}$ is called a random variable if
$$\{\xi < x\} = \{\omega \mid \omega \in \Omega,\ \xi(\omega) < x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$
Definition.
Distribution function: $F(x) = P(\xi < x)$.
Definition.
The random variable $\xi$ is called discrete if the number of its possible values is at most countably infinite.
Sequence of values: $x_1, x_2, \dots$
Definition.
The quantity $I(\xi = x) = \log_2 \frac{1}{P(\xi = x)}$ is the self-information of the value $x$ of the random variable $\xi$.
Definition.
Discrete distribution: $p_i = P(\xi = x_i)$ $(i = 1, 2, \dots)$.
Theorem.
If $p_1, p_2, \dots$ is a discrete distribution, then $p_i \ge 0$ $(i = 1, 2, \dots)$ and
$$\sum_{i=1}^{\infty} p_i = 1.$$
Definition.
If the random variable is finite, that is, its values are $x_1, x_2, \dots, x_n$ with $p_i = P(\xi = x_i)$ $(i = 1, 2, \dots, n)$, then the expectation is
$$E(\xi) = \sum_{i=1}^{n} x_i p_i.$$
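A minimal sketch (with an assumed finite distribution) computing $E(\xi)$ directly from the definition:

```python
xs = [1, 2, 3, 4]          # assumed values x_i
ps = [0.1, 0.2, 0.3, 0.4]  # assumed probabilities p_i
assert abs(sum(ps) - 1.0) < 1e-12  # a discrete distribution sums to 1
E = sum(x * p for x, p in zip(xs, ps))
print(E)  # 0.1*1 + 0.2*2 + 0.3*3 + 0.4*4 = 3.0
```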
3 Entropy
Definition.
Let
$$\mathcal{P} = \{p_1, p_2, \dots, p_n\}$$
be the discrete distribution of the random variable $\xi$. Then
$$H(\xi) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
is called the entropy of $\xi$.
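A direct transcription of the definition into code (a sketch; skipping $p_i = 0$ terms anticipates the $0 \log_2 0 = 0$ convention discussed in the note below):

```python
import math

def entropy(ps):
    """Shannon entropy H = -sum p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit (binary case, fair coin)
print(entropy([0.9, 0.1]))  # ~0.469 bits
print(entropy([1.0, 0.0]))  # 0.0 bits (no uncertainty)
```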
Figure 3: Binary case
Figure 4: The function $x \ln x$
Note.
The probability $p = 0$: extension of the function $x \log_2 x$. By definition,
$$\lim_{x \to 0^+} x \log_2 x = 0, \quad \text{that is,} \quad 0 \log_2 0 = -0 \log_2 \frac{1}{0} = 0.$$
Note.
The entropy $H(\xi)$ is the expectation of the self-information:
$$H(\xi) = \sum_{i=1}^{n} \left(\log_2 \frac{1}{p_i}\right) p_i,$$
i.e. the expectation of a random variable taking the values $x_i = \log_2 \frac{1}{p_i}$ with probabilities $p_i$.
Notations:
$$H(\xi) = H(\mathcal{P}) = H_n(p_1, p_2, \dots, p_n) = H(p_1, p_2, \dots, p_n).$$
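A numerical illustration (assumed distribution) that $H(\xi)$ equals the expected self-information:

```python
import math

ps = [0.5, 0.25, 0.125, 0.125]  # assumed example distribution
H = -sum(p * math.log2(p) for p in ps)
E_self_info = sum(p * math.log2(1 / p) for p in ps)
assert math.isclose(H, E_self_info)
print(H)  # 1.75 bits
```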
Figure 5: Ternary case for entropy
4 Properties of entropy
1. $H_n(p_1, p_2, \dots, p_n) \ge 0$.
Proof.
$$H(\xi) = \sum_{i=1}^{n} \left(\log_2 \frac{1}{p_i}\right) p_i,$$
and every term $p_i \log_2 \frac{1}{p_i} \ge 0$. ♠
2. If $p_k = 1$ and $p_i = 0$ $(1 \le i \le n,\ i \ne k)$, then $H_n(p_1, p_2, \dots, p_n) = 0$.
3. $H_{n+1}(p_1, p_2, \dots, p_n, 0) = H_n(p_1, p_2, \dots, p_n)$.
4. $H_n(p_1, p_2, \dots, p_n) \le H_n\left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right) = \log_2 n$.
Proof.
The function $-\log_2 x$ is convex; apply Jensen's inequality:
$$E(f(\xi)) \ge f(E(\xi)).$$
Equality holds for the classical probability field (Hartley's case). ♠
5. $H(\xi)$ is a continuous function.
6. $H_n(p_1, p_2, \dots, p_n)$ is symmetric in the probabilities.
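A small check of property 4 (assumed $n = 4$ and an arbitrarily chosen skewed distribution): the uniform distribution attains the maximum $\log_2 n$.

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]  # assumed comparison distribution
print(entropy(uniform), math.log2(n))  # 2.0 2.0
print(entropy(skewed))                 # ~1.357 < 2.0
assert entropy(skewed) <= math.log2(n)
```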
7. (Branching) If $q_n = p_1 + p_2 + \dots + p_m$, then
$$H_{n+m-1}(q_1, q_2, \dots, q_{n-1}, p_1, p_2, \dots, p_m) = H_n(q_1, q_2, \dots, q_n) + q_n H_m\left(\frac{p_1}{q_n}, \frac{p_2}{q_n}, \dots, \frac{p_m}{q_n}\right).$$
Proof.
$$H_n(q_1, q_2, \dots, q_n) + q_n H_m\left(\frac{p_1}{q_n}, \frac{p_2}{q_n}, \dots, \frac{p_m}{q_n}\right) =$$
$$= -\sum_{i=1}^{n} q_i \log_2 q_i - q_n \sum_{i=1}^{m} \frac{p_i}{q_n} \log_2 \frac{p_i}{q_n} =$$
$$= -\sum_{i=1}^{n} q_i \log_2 q_i - \sum_{i=1}^{m} p_i \left(\log_2 p_i - \log_2 q_n\right) =$$
$$= -\sum_{i=1}^{n-1} q_i \log_2 q_i - q_n \log_2 q_n - \sum_{i=1}^{m} p_i \log_2 p_i + \log_2 q_n \sum_{i=1}^{m} p_i =$$
$$= -\sum_{i=1}^{n-1} q_i \log_2 q_i - \sum_{i=1}^{m} p_i \log_2 p_i = H_{n+m-1}(q_1, \dots, q_{n-1}, p_1, \dots, p_m),$$
since $\sum_{i=1}^{m} p_i = q_n$, so the terms $-q_n \log_2 q_n$ and $+\log_2 q_n \sum_{i=1}^{m} p_i$ cancel.
♠
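A numerical spot-check of the branching identity (assumed distributions; $q_3 = 0.5$ split into $p_1, p_2, p_3$):

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

qs = [0.2, 0.3, 0.5]    # q_1, q_2, q_n with q_n = 0.5
ps = [0.1, 0.15, 0.25]  # p_1 + p_2 + p_3 = q_n
qn = qs[-1]
lhs = entropy(qs[:-1] + ps)  # H_{n+m-1}(q_1, ..., q_{n-1}, p_1, ..., p_m)
rhs = entropy(qs) + qn * entropy([p / qn for p in ps])
assert math.isclose(lhs, rhs)
print(lhs, rhs)  # both sides agree
```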
Note.
AXIOMS:
(1) $H(\mathcal{P})$ is continuous in the distribution $\mathcal{P}$.
(2) If $p_i = \frac{1}{n}$ $(1 \le i \le n)$, then $H$ is monotone increasing in $n$.
(3) If $0 \le \lambda \le 1$, then
$$H_{n+1}(p_1, p_2, \dots, p_{n-1}, \lambda p_n, (1-\lambda) p_n) = H_n(p_1, p_2, \dots, p_n) + p_n H_2(\lambda, 1-\lambda).$$
5 Notations
$\mathbb{N}$ – set of natural numbers (positive integers)
$\mathbb{R}$ – set of real numbers
$\mathbb{R}^2$ – $\{(x, y) \mid x, y \in \mathbb{R}\}$
$A \subset B$ – $A$ is a subset of $B$
$A \cap B$ – intersection of $A$ and $B$ (common part)
$A \cup B$ – union of $A$ and $B$ (all elements in one set)
$\overline{A}$ – the elements of the base set outside $A$ (complement)
$A \setminus B$ – $A \cap \overline{B}$
$F(a+0)$ – right-hand limit, that is, $\lim_{x \to a+0} F(x)$
$F(a-0)$ – left-hand limit, that is, $\lim_{x \to a-0} F(x)$
$f(\cdot) : D \to \mathbb{R}$ – the map $f$ with domain $D$; the "dot" stands for the variable.
$f(D)$ – the range (image) of the map $f$.
Source alphabet: $X = \{x_1, \dots, x_n\}$ $(n \ge 2)$.
Set of source messages:
$$\mathcal{X} = \bigcup_{k=1}^{\infty} X^k.$$
Code alphabet: $Y = \{y_1, \dots, y_s\}$ $(s \ge 2)$.
Set of code messages:
$$\mathcal{Y} = \bigcup_{k=1}^{\infty} Y^k.$$
6 End
Thank you for your attention.
References
[1] J. Aczél, Z. Daróczy: On Measures of Information and Their Characterization, Academic Press, New York, 1975.
[2] S. Arimoto: An algorithm for calculating the capacity of an arbitrary discrete memoryless channel, IEEE Trans. Inform. Theory, IT-18, 1972, pp. 14-20.
[3] R.B. Ash: Information Theory, Interscience, New York, 1965.
[4] J. Berstel, D. Perrin: Theory of Codes, Academic Press, New York, 2002.
[5] R. Blahut: Computation of channel capacity and rate-distortion functions, IEEE Trans. Inform. Theory, IT-18, 1972, pp. 460-473.
[6] T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley, New York, 1991.
[7] S. Guiasu: Information Theory with Applications, McGraw-Hill, New York, 1977.
[8] M. Jimbo, K. Kunisawa: An Iteration Method for Calculating the Relative Capacity, Department of Information Sciences, Faculty of Science and Technology, Science University of Tokyo, Noda City, Chiba 278, Japan.
[9] M. Kelbert, Y. Suhov: Information Theory and Coding by Example, Cambridge University Press, 2013.
[10] C.E. Shannon, W. Weaver: A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October 1948.
[11] Xue-Bin Liang: An Algebraic, Analytic and Algorithmic Investigation on the Capacity and Capacity-Achieving Input Probability Distributions of Finite-Input Finite-Output Discrete Memoryless Channels, Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, 2004.