BASICS I.
Fegyverneki Sándor
University of Miskolc
Department of Applied Mathematics
matfs@uni-miskolc.hu
17th of February 2021.
1 Introduction
• Introduction
• Self-information
• Entropy
• Properties of entropy
• Notations
• References
Theory of statistical communication:
information theory, signal detection, stochastic filtering.
Information theorists devote their efforts to quantitative examination of the following three questions:
1. What is information?
2. What are the fundamental limitations on the accuracy with which information can be transmitted?
3. What design methodologies and computational algorithms yield practical systems for communicating and storing information that perform close to the aforementioned fundamental limits?
Early work in statistical physics:
L. Boltzmann (1896), L. Szilárd (1929), J. von Neumann (1932).
Communication theory:
H. Nyquist (1924), R.V.L. Hartley (1928).
Mathematical model of communication theory:
C.E. Shannon (1948): quick development of problems, methods and results.
A parallel theory developed independently by Norbert Wiener (1948) also played a significant part in placing communication theory on a firm footing rooted in mathematical statistics, and led to the development of the closely related discipline of cybernetics.
Shannon's general model (one-way, noiseless):

SOURCE → CODING → CHANNEL → DECODING → USER

source: message; coding: translation; channel: speed (capacity); decoding: translation.
Shannon's general model (one-way, noisy):

SOURCE → CODING → CHANNEL → DECODING → USER
                     ↑
                   NOISE

source: message; coding: translation; channel: speed (capacity) + noise (error?); decoding: translation and correction.
2 Self-information
Supplying information is equivalent to removing uncertainty.
That is,
information supplied = prior uncertainty − posterior uncertainty.
Let
$X = \{x_1, x_2, \dots, x_n\}$ be a finite set.
We choose one element. How much information does this choice carry?
Counterfeit coin
You are given 27 coins, 26 of which have the same weight, and one of which is lighter. You have a pan balance (balance scale).
What is the minimum number of weighings it takes to determine which coin is the lighter one?
Note that there are three possible outcomes of each weighing: left side heavier, right side heavier, or both sides equal.
In order to do the given task in as few weighings as possible, we need as much information from each weighing as possible. Hence all three outcomes should be realizable for each weighing (except for the final weighing in some scenarios).
Answer: 3 weighings. In general: $\log_3 n$ weighings (rounded up) for $n$ coins.
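A minimal sketch of the ternary strategy (the helper find_light_coin and the particular weights are illustrative assumptions): split the coins into three near-equal groups, weigh two of them, and recurse into the group that must contain the light coin.

```python
import math

def find_light_coin(weights):
    """Locate the single lighter coin by splitting into three groups.

    Each weighing compares two equal-size groups on the pan balance:
    the lighter coin is in the lighter pan, or in the leftover group
    if the pans balance.
    """
    coins = list(range(len(weights)))
    weighings = 0
    while len(coins) > 1:
        third = math.ceil(len(coins) / 3)
        left = coins[:third]
        right = coins[third:2 * third]
        rest = coins[2 * third:]
        weighings += 1
        lw = sum(weights[i] for i in left)
        rw = sum(weights[i] for i in right)
        if lw < rw:
            coins = left
        elif rw < lw:
            coins = right
        else:
            coins = rest
    return coins[0], weighings

weights = [2.0] * 27   # assumed: 27 coins of weight 2.0 ...
weights[13] = 1.0      # ... except coin 13, which is lighter
print(find_light_coin(weights))  # (13, 3): found in log3(27) = 3 weighings
```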
H.W.
(a) You are given twelve coins, eleven of which have the same weight, and one of which has a weight different from the others (either heavier or lighter, you do not know). You have a balance scale.
What is the minimum number of weighings it takes to determine which coin has the different weight, and also whether it is heavier or lighter than the rest?
(b) You are given $N$ coins, $N-1$ of which have the same weight, and one of which has a weight different from the others (either heavier or lighter, you do not know). You are allowed $W$ weighings on a balance scale.
What is the maximum value of $N$, as a function of $W$, for which you can determine which coin has the different weight, and also whether it is heavy or light?
Let
$X = \{x_1, x_2, \dots, x_n\}$ be a finite set.
How many binary digits are necessary to describe one element?
Hartley (1928):
$$I = \log_2 n$$
Consider sequences of $m$ elements from the set $X$. The number of such sequences is $n^m$. If
$$2^{k-1} < n^m \le 2^k,$$
then the number of binary digits per element of the set $X$ is $\frac{k}{m}$. Thus
$$\log_2 n \le \frac{k}{m} < \log_2 n + \frac{1}{m},$$
that is, by increasing $m$, the value $\log_2 n$ can be approximated arbitrarily closely.
Unit of information quantity: $1\ \text{bit} = \ln 2\ \text{nat}$.
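A small numeric illustration (assumed alphabet size $n = 27$): the per-element digit count $k/m$ approaches $\log_2 n \approx 4.7549$ as the block length $m$ grows.

```python
import math

n = 27  # assumed alphabet size
for m in (1, 2, 5, 10, 100, 1000):
    # smallest k with n**m <= 2**k, i.e. 2**(k-1) < n**m <= 2**k
    k = math.ceil(m * math.log2(n))
    print(f"m = {m:4d}   k/m = {k / m:.6f}")
print("log2(n) =", math.log2(n))
```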
The elements may occur with different probabilities.
Shannon (1948): probability. Let $A$ be an event and $P(A)$ the probability that $A$ occurs. His solution:
$$I = \log_2 \frac{1}{P(A)}$$
Probability field: $(\Omega, \mathcal{F}, P)$.
Classical probability field: $\Omega = \{\omega_1, \omega_2, \dots, \omega_n\}$,
$$P(\{\omega_i\}) = \frac{1}{n} \quad \forall i = 1, \dots, n.$$
Required properties: additivity, monotonicity, normalization.
Additivity. Let $n = NM$ and
$$X = \bigcup_{i=1}^{N} E_i, \qquad E_i \cap E_j = \emptyset \ (i \ne j), \quad |E_i| = M.$$
Two steps: first choose one subset, then choose one element of it.
Idea:
$$I(NM) = I(N) + I(M).$$
Monotonicity
A small probability means a large information quantity for the occurrence:
$$A \subset B \Rightarrow P(A) \le P(B) \Rightarrow I(A) \ge I(B).$$
In particular, $I(A) = I(B)$ whenever $P(A) = P(B)$, since then $P(A) \le P(B)$ and $P(A) \ge P(B)$ both hold.
Therefore there exists a function $f$ such that $I(A) = f(P(A))$.
Normalization
Let $I(A) = 1$ if $P(A) = \frac{1}{2}$ (the case $|X| = 2$):
$$I(A) = 1\ \text{bit} = \ln 2\ \text{nats}.$$
Theorem.
If $f : (0,1] \to \mathbb{R}$ and
(1) $f(p) \ge f(q)$ if $p \le q$,
(2) $f(pq) = f(p) + f(q)$,
(3) $f\left(\frac{1}{2}\right) = 1$,
then
$$f(p) = \log_2 \frac{1}{p}.$$
Proof.
Let $x := \log_2 \frac{1}{p}$, that is, $p = 2^{-x}$ with $x \ge 0$. The statement to prove is then $f(2^{-x}) = x$.
By condition (2),
$$f(p^n) = n f(p) \quad (\forall n \in \mathbb{N}),$$
if we apply mathematical induction. From this, if $p = \frac{1}{2}$, then
$$f(2^{-n}) = n.$$
Furthermore,
$$2^{-n} = \left(2^{-\frac{n}{m}}\right)^m,$$
that is,
then
$$f\left(2^{-\frac{n}{m}}\right) = \frac{n}{m}.$$
Thus for all rational $x > 0$,
$$f(2^{-x}) = x.$$
If $x = 0$, then $1 = f\left(\frac{1}{2} \cdot 2^0\right) = f\left(\frac{1}{2}\right) + f(2^0) = 1 + f(1)$, that is, $f(1) = 0$.
For all irrational $x > 0$ there exist $m \in \mathbb{N}$ and $n \in \mathbb{N}$ such that
$$\frac{n}{m} \le x < \frac{n+1}{m}.$$
Then
$$\frac{n}{m} = f\left(2^{-\frac{n}{m}}\right) \le f(2^{-x}) \le f\left(2^{-\frac{n+1}{m}}\right) = \frac{n+1}{m}.$$
If $m \to \infty$, then $f(2^{-x}) = x$. Thus for all $x \ge 0$,
$$f(p) = \log_2 \frac{1}{p}.$$
♠
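A quick numerical check (with assumed probabilities $p = 0.3$, $q = 0.6$) that $f(p) = \log_2 \frac{1}{p}$ indeed satisfies conditions (1)-(3); a sketch, not part of the proof:

```python
import math

def f(p):
    # self-information: f(p) = log2(1/p)
    return math.log2(1 / p)

p, q = 0.3, 0.6  # assumed example values with p <= q
assert f(p) >= f(q)                         # (1) monotonicity
assert math.isclose(f(p * q), f(p) + f(q))  # (2) additivity
assert math.isclose(f(1 / 2), 1.0)          # (3) normalization
print(f(0.5), f(0.25), f(1.0))  # 1.0 2.0 0.0
```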
Definition.
The map $\xi : \Omega \to \mathbb{R}$ is called a random variable if
$$\{\xi < x\} = \{\omega \mid \omega \in \Omega,\ \xi(\omega) < x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$
Definition.
Distribution function: $F(x) = P(\xi < x)$.
Definition.
The random variable $\xi$ is called discrete if the number of its possible values is at most countably infinite.
Sequence of values: $x_1, x_2, \dots$
Definition.
The quantity $I(\xi = x) = \log_2 \frac{1}{P(\xi = x)}$ is the self-information of the value $x$ of the random variable $\xi$.
Definition.
Discrete distribution: $p_i = P(\xi = x_i)$ $(i = 1, 2, \dots)$.
Theorem.
If $p_1, p_2, \dots$ is a discrete distribution, then $p_i \ge 0$ $(i = 1, 2, \dots)$ and
$$\sum_{i=1}^{\infty} p_i = 1.$$
Definition.
If the random variable is finite, that is, its values are $x_1, x_2, \dots, x_n$ with $p_i = P(\xi = x_i)$ $(i = 1, 2, \dots, n)$, then the expectation is
$$E(\xi) = \sum_{i=1}^{n} x_i p_i.$$
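A minimal sketch (with an assumed finite distribution) computing $E(\xi)$ directly from the definition:

```python
xs = [1, 2, 3, 4]          # assumed values x_i
ps = [0.1, 0.2, 0.3, 0.4]  # assumed probabilities p_i
assert abs(sum(ps) - 1.0) < 1e-12  # a discrete distribution sums to 1
E = sum(x * p for x, p in zip(xs, ps))
print(E)  # 0.1*1 + 0.2*2 + 0.3*3 + 0.4*4 = 3.0
```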
3 Entropy
Definition.
Let
$$\mathcal{P} = \{p_1, p_2, \dots, p_n\}$$
be the discrete distribution of the random variable $\xi$. Then
$$H(\xi) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
is called the entropy of $\xi$.
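A direct transcription of the definition into code (a sketch; skipping $p_i = 0$ terms anticipates the $0 \log_2 0 = 0$ convention discussed in the note below):

```python
import math

def entropy(ps):
    """Shannon entropy H = -sum p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit (binary case, fair coin)
print(entropy([0.9, 0.1]))  # ~0.469 bits
print(entropy([1.0, 0.0]))  # 0.0 bits (no uncertainty)
```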
Figure 3: Binary case
Figure 4: The function $x \ln x$
Note.
The probability $p = 0$: extension of the function $x \log_2 x$. By definition,
$$\lim_{x \to 0^+} x \log_2 x = 0, \quad \text{that is,} \quad 0 \log_2 0 = -0 \log_2 \frac{1}{0} = 0.$$
Note.
The entropy $H(\xi)$ is the expectation of the self-information:
$$H(\xi) = \sum_{i=1}^{n} \left(\log_2 \frac{1}{p_i}\right) p_i,$$
i.e. the expectation of a random variable taking the values $x_i = \log_2 \frac{1}{p_i}$ with probabilities $p_i$.
Notations:
$$H(\xi) = H(\mathcal{P}) = H_n(p_1, p_2, \dots, p_n) = H(p_1, p_2, \dots, p_n).$$
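A numerical illustration (assumed distribution) that $H(\xi)$ equals the expected self-information:

```python
import math

ps = [0.5, 0.25, 0.125, 0.125]  # assumed example distribution
H = -sum(p * math.log2(p) for p in ps)
E_self_info = sum(p * math.log2(1 / p) for p in ps)
assert math.isclose(H, E_self_info)
print(H)  # 1.75 bits
```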
Figure 5: Ternary case for entropy
4 Properties of entropy
1. $H_n(p_1, p_2, \dots, p_n) \ge 0$.
Proof.
$$H(\xi) = \sum_{i=1}^{n} \left(\log_2 \frac{1}{p_i}\right) p_i,$$
and every term $p_i \log_2 \frac{1}{p_i} \ge 0$. ♠
2. If $p_k = 1$ and $p_i = 0$ $(1 \le i \le n,\ i \ne k)$, then $H_n(p_1, p_2, \dots, p_n) = 0$.
3. $H_{n+1}(p_1, p_2, \dots, p_n, 0) = H_n(p_1, p_2, \dots, p_n)$.
4. $H_n(p_1, p_2, \dots, p_n) \le H_n\left(\frac{1}{n}, \frac{1}{n}, \dots, \frac{1}{n}\right) = \log_2 n$.
Proof.
The function $-\log_2 x$ is convex; apply Jensen's inequality:
$$E(f(\xi)) \ge f(E(\xi)).$$
Equality holds for the classical probability field (Hartley's case). ♠
5. $H(\xi)$ is a continuous function.
6. $H_n(p_1, p_2, \dots, p_n)$ is symmetric in the probabilities.
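A small check of property 4 (assumed $n = 4$ and an arbitrarily chosen skewed distribution): the uniform distribution attains the maximum $\log_2 n$.

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]  # assumed comparison distribution
print(entropy(uniform), math.log2(n))  # 2.0 2.0
print(entropy(skewed))                 # ~1.357 < 2.0
assert entropy(skewed) <= math.log2(n)
```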
7. (Branching) If $q_n = p_1 + p_2 + \dots + p_m$, then
$$H_{n+m-1}(q_1, q_2, \dots, q_{n-1}, p_1, p_2, \dots, p_m) = H_n(q_1, q_2, \dots, q_n) + q_n H_m\left(\frac{p_1}{q_n}, \frac{p_2}{q_n}, \dots, \frac{p_m}{q_n}\right).$$
Proof.
$$H_n(q_1, q_2, \dots, q_n) + q_n H_m\left(\frac{p_1}{q_n}, \frac{p_2}{q_n}, \dots, \frac{p_m}{q_n}\right) =$$
$$= -\sum_{i=1}^{n} q_i \log_2 q_i - q_n \sum_{i=1}^{m} \frac{p_i}{q_n} \log_2 \frac{p_i}{q_n} =$$
$$= -\sum_{i=1}^{n} q_i \log_2 q_i - \sum_{i=1}^{m} p_i \left(\log_2 p_i - \log_2 q_n\right) =$$
$$= -\sum_{i=1}^{n-1} q_i \log_2 q_i - q_n \log_2 q_n - \sum_{i=1}^{m} p_i \log_2 p_i + \log_2 q_n \sum_{i=1}^{m} p_i =$$
$$= -\sum_{i=1}^{n-1} q_i \log_2 q_i - \sum_{i=1}^{m} p_i \log_2 p_i = H_{n+m-1}(q_1, \dots, q_{n-1}, p_1, \dots, p_m),$$
since $\sum_{i=1}^{m} p_i = q_n$, so the terms $-q_n \log_2 q_n$ and $+\log_2 q_n \sum_{i=1}^{m} p_i$ cancel.
♠
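A numerical spot-check of the branching identity (assumed distributions; $q_3 = 0.5$ split into $p_1, p_2, p_3$):

```python
import math

def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

qs = [0.2, 0.3, 0.5]    # q_1, q_2, q_n with q_n = 0.5
ps = [0.1, 0.15, 0.25]  # p_1 + p_2 + p_3 = q_n
qn = qs[-1]
lhs = entropy(qs[:-1] + ps)  # H_{n+m-1}(q_1, ..., q_{n-1}, p_1, ..., p_m)
rhs = entropy(qs) + qn * entropy([p / qn for p in ps])
assert math.isclose(lhs, rhs)
print(lhs, rhs)  # both sides agree
```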
Note.
AXIOMS:
(1) $H(\mathcal{P})$ is continuous in the distribution $\mathcal{P}$.
(2) If $p_i = \frac{1}{n}$ $(1 \le i \le n)$, then $H$ is monotone increasing in $n$.
(3) If $0 \le \lambda \le 1$, then
$$H_{n+1}(p_1, p_2, \dots, p_{n-1}, \lambda p_n, (1-\lambda) p_n) = H_n(p_1, p_2, \dots, p_n) + p_n H_2(\lambda, 1-\lambda).$$
5 Notations
$\mathbb{N}$ – set of natural numbers (positive integers)
$\mathbb{R}$ – set of real numbers
$\mathbb{R}^2$ – $\{(x, y) \mid x, y \in \mathbb{R}\}$
$A \subset B$ – $A$ is a subset of $B$
$A \cap B$ – intersection of $A$ and $B$ (common part)
$A \cup B$ – union of $A$ and $B$ (all elements in one set)
$\overline{A}$ – the elements of the base set outside $A$ (complement)
$A \setminus B$ – $A \cap \overline{B}$
$F(a+0)$ – right-hand limit, that is, $\lim_{x \to a+0} F(x)$
$F(a-0)$ – left-hand limit, that is, $\lim_{x \to a-0} F(x)$
$f(\cdot) : D \to \mathbb{R}$ – the map $f$ with domain $D$; the "dot" stands for the variable.
$f(D)$ – the range (image) of the map $f$.
Source alphabet: $X = \{x_1, \dots, x_n\}$ $(n \ge 2)$.
Set of source messages:
$$\mathcal{X} = \bigcup_{k=1}^{\infty} X^k.$$
Code alphabet: $Y = \{y_1, \dots, y_s\}$ $(s \ge 2)$.
Set of code messages:
$$\mathcal{Y} = \bigcup_{k=1}^{\infty} Y^k.$$
6 End
Thank you for your attention.
References
[1] J. Aczél, Z. Daróczy: On Measures of Information and Their Characterization, Academic Press, New York, 1975.
[2] S. Arimoto: An algorithm for calculating the capacity of an arbitrary discrete memoryless channel, IEEE Trans. Inform. Theory, IT-18, 1972, pp. 14-20.
[3] R.B. Ash: Information Theory, Interscience, New York, 1965.
[4] J. Berstel, D. Perrin: Theory of Codes, Academic Press, New York, 2002.
[5] R. Blahut: Computation of channel capacity and rate-distortion functions, IEEE Trans. Inform. Theory, IT-18, 1972, pp. 460-473.
[6] T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley, New York, 1991.
[7] S. Guiasu: Information Theory with Applications, McGraw-Hill, New York, 1977.
[8] M. Jimbo, K. Kunisawa: An Iteration Method for Calculating the Relative Capacity, Department of Information Sciences, Faculty of Science and Technology, Science University of Tokyo, Noda City, Chiba 278, Japan.
[9] M. Kelbert, Y. Suhov: Information Theory and Coding by Example, Cambridge University Press, 2013.
[10] C.E. Shannon, W. Weaver: A Mathematical Theory of Communication, The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October 1948.
[11] Xue-Bin Liang: An Algebraic, Analytic and Algorithmic Investigation on the Capacity and Capacity-Achieving Input Probability Distributions of Finite-Input Finite-Output Discrete Memoryless Channels, Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, 2004.