Development of Complex Curricula for Molecular Bionics and Infobionics Programs within a Consortial Framework**
Consortium leader
PETER PAZMANY CATHOLIC UNIVERSITY
Consortium members
SEMMELWEIS UNIVERSITY, DIALOG CAMPUS PUBLISHER
The Project has been realised with the support of the European Union and has been co-financed by the European Social Fund ***
**Molekuláris bionika és Infobionika Szakok tananyagának komplex fejlesztése konzorciumi keretben
***A projekt az Európai Unió támogatásával, az Európai Szociális Alap társfinanszírozásával valósul meg.
Principal Component Analysis
Főkomponens analízis
András Oláh, Gergely Treplán, Dávid Tisza
Digitális- neurális-, és kiloprocesszoros architektúrákon alapuló jelfeldolgozás
Digital- and Neural Based Signal Processing &
Kiloprocessor Arrays
Contents
• Data compression
• Principal Component Analysis (PCA or KLT)
• Notation
• Summary of the algorithm
• Problems, extensions
• Hebbian PCA (The Oja algorithm)
• Experiments
• Problems
Examples of well-known transformations:
• DCT - Discrete Cosine Transformation (JPEG, MPEG)
• KLT - Karhunen-Loève Transformation
[Diagram: transformation → prioritized representation → information discarding → only the important segment remains]
Notation
• Scalars are denoted by normal letters.
• Vectors are denoted by boldface lower-case letters.
• Matrices are denoted by boldface upper-case letters.
• Random data vector (e.g. a video frame): $\mathbf{x}$
• Vectors are defined to be column vectors: $\dim(\mathbf{x}) = N \times 1$
• Correlation matrix: $\mathbf{R} = E\left(\mathbf{x}\,\mathbf{x}^T\right)$, with elements $R_{ij} = E\left(x_i x_j\right)$
• $i$th element of a vector (scalar): $x_i$
• $(i,j)$th element of a matrix: $R_{i,j}$
• $i$th row of a matrix: $\mathbf{R}_{i,:}$
• $j$th column of a matrix: $\mathbf{R}_{:,j}$
• Submatrix (e.g. first $K$ rows and all columns): $\mathbf{R}_{1:K,:}$
• $i$th element of a set (e.g. the set of eigenvectors): $\mathbf{s}^{(i)} \in S = \{\mathbf{x} : \mathbf{x}\ \text{is an eigenvector of}\ \mathbf{R}\}$
• Expectation operator, if $X$ is a discrete random variable: $E(X) = \sum_i x_i\, p(x_i)$
• Dot product: $\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^T \cdot \mathbf{b} = \sum_{i=1}^{N} a_i b_i$
• Correlation matrix: $\mathbf{R} = E\left(\mathbf{x} \cdot \mathbf{x}^T\right)$, $\dim(\mathbf{R}) = N \times N$
• Diagonal extension operator: for $\mathbf{a} = [a_1, a_2, \ldots, a_N]^T$,
$$\mathbf{A} = \mathrm{diag}(\mathbf{a}) := \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_N \end{pmatrix}$$
• Eigenvectors and eigenvalues of $\mathbf{R}$
• Set of eigen equations: $\mathbf{R} \cdot \mathbf{s}^{(i)} = \lambda_i \cdot \mathbf{s}^{(i)}, \quad i = 1, \ldots, N$
• Properties of the eigenvectors: $\langle \mathbf{s}^{(i)}, \mathbf{s}^{(j)} \rangle = \mathbf{s}^{(i)T} \cdot \mathbf{s}^{(j)} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$ (orthonormal basis), so $\mathbf{S}^T \cdot \mathbf{S} = \mathbf{I}$
• In matrix form: $\mathbf{R} \cdot \mathbf{S} = \mathbf{S} \cdot \mathbf{\Lambda}$, where $\mathbf{S} = \left[\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(N)}\right]$ and $\mathbf{\Lambda} = \mathrm{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_N\right)$
Basis transformations to and from the eigenvector basis:
• Coder, transforming to the eigenvector basis:
$$\mathbf{y} = \mathbf{S}^T \cdot \mathbf{x}, \qquad y_i = \left(\mathbf{S}_{:,i}\right)^T \cdot \mathbf{x} = \mathbf{s}^{(i)T} \cdot \mathbf{x} = \sum_{n=1}^{N} s_n^{(i)}\, x_n$$
• Decoder, transforming from the eigenvector basis:
$$\mathbf{x} = \mathbf{S} \cdot \mathbf{y} = \left[\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(N)}\right] \cdot \mathbf{y}, \qquad x_i = \sum_{n=1}^{N} s_i^{(n)}\, y_n$$
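As a quick sanity check of these relations, the following sketch (my own illustration, not part of the original slides; the random data and variable names are assumptions) builds a correlation matrix from synthetic data, eigendecomposes it, and verifies $\mathbf{S}^T\mathbf{S} = \mathbf{I}$, $\mathbf{R}\mathbf{S} = \mathbf{S}\mathbf{\Lambda}$, and the lossless coder/decoder round trip.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 1000

# Zero-mean data matrix X (columns are data vectors) and its correlation matrix R
X = rng.standard_normal((N, K))
R = (X @ X.T) / K

# Eigendecomposition of the symmetric matrix R; columns of S are eigenvectors s^(i)
eigvals, S = np.linalg.eigh(R)
idx = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue (importance)
eigvals, S = eigvals[idx], S[:, idx]
Lam = np.diag(eigvals)

assert np.allclose(S.T @ S, np.eye(N))   # orthonormal basis: S^T S = I
assert np.allclose(R @ S, S @ Lam)       # eigen equations: R S = S Lambda

# Coder / decoder: y = S^T x, x = S y (lossless when all N eigenvectors are kept)
x = X[:, 0]
y = S.T @ x
assert np.allclose(S @ y, x)
```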
PCA
• Suppose we have an element of a data set, which here is a grayscale image: the data frame at time instant k.
• Generate the data vector from it by reading it column-wise: x[k].
• The data can usually be modeled as a random vector variable x, because we assume the underlying process is quasi-stationary and ergodic. E.g. consecutive image frames of a video stream are usually "similar".
• Thus, within a time window we examine the correlation matrix of the process:
$$\mathbf{R} = E\left\{\mathbf{x}\,\mathbf{x}^T\right\} = \begin{pmatrix} E\{x_1 x_1\} & E\{x_1 x_2\} & \cdots & E\{x_1 x_N\} \\ E\{x_2 x_1\} & E\{x_2 x_2\} & \cdots & E\{x_2 x_N\} \\ \vdots & \vdots & \ddots & \vdots \\ E\{x_N x_1\} & E\{x_N x_2\} & \cdots & E\{x_N x_N\} \end{pmatrix}, \qquad \dim(\mathbf{R}) = N \times N$$
• The random data vector is assumed to have zero mean: $E(\mathbf{x}) = \mathbf{0}$
• Compute the eigenvectors and eigenvalues of the correlation matrix $\mathbf{R}$ and sort them in decreasing order (by importance):
$$\mathbf{R} \cdot \mathbf{s}^{(i)} = \lambda_i \cdot \mathbf{s}^{(i)}, \quad i = 1, \ldots, N$$
$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \ldots \geq \lambda_M \geq \lambda_{M+1} \geq \ldots \geq \lambda_N$$
with the corresponding eigenvectors $\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \mathbf{s}^{(3)}, \ldots, \mathbf{s}^{(M)}, \mathbf{s}^{(M+1)}, \ldots, \mathbf{s}^{(N)}$.
• The eigenvectors point in orthogonal directions that capture the independent variance directions of the distribution.
• The eigenvalues are proportional to the magnitude of the variances along these directions.
[Figure: eigenvalue-scaled eigenvectors (blue arrows), iso-curves of the distribution (black dashed contours), sample points of the distribution (red dots)]
• Example: for $X \sim N\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & 0.9\\ 0.9 & 1\end{pmatrix}\right)$ we get $\mathbf{S} = \begin{pmatrix}-0.7071 & -0.7071\\ -0.7071 & 0.7071\end{pmatrix}$ and $\mathbf{\Lambda} = \begin{pmatrix}1.9 & 0\\ 0 & 0.1\end{pmatrix}$.
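This 2-D example can be reproduced numerically; the short sketch below is my own illustration of the slide's numbers (sample size, seed, and variable names are assumptions), so the estimated $\mathbf{S}$ and $\mathbf{\Lambda}$ only approximate the quoted values, and eigenvector signs and ordering may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])                  # covariance of X ~ N(0, C)
X = rng.multivariate_normal(mean=[0, 0], cov=C, size=5000).T   # 2 x 5000 samples

R_hat = (X @ X.T) / X.shape[1]              # empirical correlation matrix
eigvals, S = np.linalg.eigh(R_hat)
idx = np.argsort(eigvals)[::-1]
print(np.diag(eigvals[idx]))                # approx. diag(1.9, 0.1)
print(S[:, idx])                            # columns approx. +/-[0.7071, 0.7071], +/-[0.7071, -0.7071]
```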
• The eigenvectors correspond to the principal directions of variation of the data, understood as a sample set of a multidimensional random process.
• The eigenvalues correspond to the magnitude of the variances along these directions.
• If we adopt the hypothesis that this linear quantity (correlation) captures real-world "importance", then we can sort the dimensions by "importance" and
• compress by discarding the "not really important" dimensions.
Given $\mathbf{R}$, get all $\left(\lambda_i, \mathbf{s}^{(i)}\right)$, $i = 1, \ldots, N$:
$$\lambda_1 \Rightarrow \mathbf{s}^{(1)} \Rightarrow y_1 = \mathbf{s}^{(1)T} \cdot \mathbf{x}$$
$$\lambda_2 \Rightarrow \mathbf{s}^{(2)} \Rightarrow y_2 = \mathbf{s}^{(2)T} \cdot \mathbf{x}$$
$$\vdots$$
$$\lambda_N \Rightarrow \mathbf{s}^{(N)} \Rightarrow y_N = \mathbf{s}^{(N)T} \cdot \mathbf{x}$$
Transformed data: $\mathbf{y} = \mathbf{S}^T \cdot \mathbf{x}$; original data: $\mathbf{x} = \mathbf{S} \cdot \mathbf{y}$.
• We can transform the data frame to the new basis system in a lossless fashion.
• We want to discard the "unimportant" components to achieve compression.
• Instead of using all eigenvectors, just use the first M to get an approximation of y:
$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \ldots \geq \lambda_M \geq \lambda_{M+1} \geq \ldots \geq \lambda_N$$
with the corresponding eigenvectors $\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \mathbf{s}^{(3)}, \ldots, \mathbf{s}^{(M)}, \mathbf{s}^{(M+1)}, \ldots, \mathbf{s}^{(N)}$.
[Diagram: transformation → prioritized representation → information discarding → only the important segment remains]
• The approximation of the transformed data:
$$\hat{\mathbf{S}} = \mathbf{S}_{:,1:M} = \left[\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(M)}\right] \quad \text{(the first } M \text{ eigenvectors, arranged in a matrix)}$$
$$\hat{y}_1 = \mathbf{s}^{(1)T} \cdot \mathbf{x}, \quad \hat{y}_2 = \mathbf{s}^{(2)T} \cdot \mathbf{x}, \quad \ldots, \quad \hat{y}_M = \mathbf{s}^{(M)T} \cdot \mathbf{x}$$
• Compressed, transformed data: $\hat{\mathbf{y}} = \hat{\mathbf{S}}^T \cdot \mathbf{x}$
• Decompressed, lossy data recovery: $\hat{\mathbf{x}} = \hat{\mathbf{S}} \cdot \hat{\mathbf{y}}$
PCA compression
• Compression algorithm: $\hat{\mathbf{y}} = \hat{\mathbf{S}}^T \cdot \mathbf{x}$
• Decompression algorithm: $\hat{\mathbf{x}} = \hat{\mathbf{S}} \cdot \hat{\mathbf{y}}$
[Diagram: $\mathbf{x}$ → compression algorithm → $\hat{\mathbf{y}}$ → access line → $\hat{\mathbf{y}}$ → decompression algorithm → $\hat{\mathbf{x}}$]
$$\dim(\mathbf{x}) = N, \quad \dim(\hat{\mathbf{y}}) = M, \quad \dim(\hat{\mathbf{x}}) = N, \qquad N \gg M$$
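A minimal sketch of this coder/decoder pair (illustrative NumPy, with synthetic stand-in frames and hypothetical names of my own): keep only the M leading eigenvectors, send the M-dimensional $\hat{\mathbf{y}}$ over the access line, and reconstruct $\hat{\mathbf{x}}$ on the other side.

```python
import numpy as np

def pca_codec(R, M):
    """Build the compression matrix S_hat (N x M) from the correlation matrix R."""
    eigvals, S = np.linalg.eigh(R)
    idx = np.argsort(eigvals)[::-1]        # decreasing eigenvalues = decreasing importance
    return S[:, idx[:M]], eigvals[idx]

rng = np.random.default_rng(2)
N, K, M = 64, 500, 8
X = rng.standard_normal((N, K))            # stand-in for zero-mean video frames
R_hat = (X @ X.T) / K

S_hat, eigvals = pca_codec(R_hat, M)

x = X[:, 0]
y_hat = S_hat.T @ x                        # coder:   dim(y_hat) = M
x_hat = S_hat @ y_hat                      # decoder: dim(x_hat) = N
print(np.linalg.norm(x - x_hat) ** 2)      # lossy reconstruction error
```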
The PCA is the optimal lossy representation in terms of Mean Square Error.

QoS criterion: $E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 < \varepsilon$

$$E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 = E\left\|\mathbf{x} - \hat{\mathbf{S}}\hat{\mathbf{y}}\right\|^2 = E\left[\left(\mathbf{x} - \hat{\mathbf{S}}\hat{\mathbf{y}}\right)^T\left(\mathbf{x} - \hat{\mathbf{S}}\hat{\mathbf{y}}\right)\right] = E\left[\mathbf{x}^T\mathbf{x} - 2\,\hat{\mathbf{y}}^T\hat{\mathbf{S}}^T\mathbf{x} + \hat{\mathbf{y}}^T\hat{\mathbf{S}}^T\hat{\mathbf{S}}\hat{\mathbf{y}}\right] =$$
$$= E\left[\mathbf{x}^T\mathbf{x}\right] - E\left[\hat{\mathbf{y}}^T\hat{\mathbf{y}}\right] = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i$$

(using $\hat{\mathbf{S}}^T\hat{\mathbf{S}} = \mathbf{I}$ and $\hat{\mathbf{S}}^T\mathbf{x} = \hat{\mathbf{y}}$), which is the minimum achievable because the eigenvalues are ordered in a monotonically decreasing sequence.
• The components we discard during compression contribute to the error exactly as much as their corresponding eigenvalues.
• The mean square error is the sum of the eigenvalues corresponding to the discarded components. We drop the minimum possible contribution because of the monotonically decreasing order of the eigenvalues.
$$E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i$$
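The error formula can be checked numerically; the sketch below is my own verification (synthetic data and names assumed), averaging the reconstruction error over a sample window and comparing it with the tail sum of eigenvalues of the empirical correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 16, 2000, 4

X = rng.standard_normal((N, N)) @ rng.standard_normal((N, K))   # zero-mean sample set
R_hat = (X @ X.T) / K

eigvals, S = np.linalg.eigh(R_hat)
eigvals, S = eigvals[::-1], S[:, ::-1]      # decreasing order
S_hat = S[:, :M]

# Average reconstruction error over the window vs. the tail sum of eigenvalues
err = np.mean(np.sum((X - S_hat @ (S_hat.T @ X)) ** 2, axis=0))
print(err, eigvals[M:].sum())               # the two numbers coincide
```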
PCA architecture
[Diagram: coder network — each input $x_1, \ldots, x_N$ is weighted by the coefficients $s_n^{(i)}$ and summed to produce $\hat{y}_1, \ldots, \hat{y}_M$; the outputs travel over the access line; decoder network — $\hat{y}_1, \ldots, \hat{y}_M$ are weighted by the same coefficients and summed to produce $\hat{x}_1, \ldots, \hat{x}_N$]
PCA compression overview
We assume that R is given for the process we are dealing with and that R remains the same over time (stationarity).
1. Solve the eigen problem for R to get the first M largest eigenvalues and the corresponding eigenvectors, where M is set by the quality requirement.
2. Assemble the coding/decoding matrices.
3. Use the coder to encode the data (a matrix-vector multiplication) and send it through the access line.
4. Decode the data with the decoding matrix.
Difficulties of directly implementing PCA for video compression
• Nobody tells us the R of the underlying video stream.
• R does not remain "still" over time.
• Computing eigenvalues and eigenvectors is computationally expensive.
• In practice, normalizing the variances of the original data set along each dimension before the PCA transformation is useful to obtain meaningful relative importance values after the transformation.
Difficulties of directly implementing PCA for video compression
• Storage problems:
• N: dim(x) = 640 × 480 = 307200 per color channel
• N × N: dim(R) = dim(E(x x^T)) = 307200 × 307200
• If a float (32 bit) is used to hold each element of R,
• sizeof(R) ≈ 307200² × 4 bytes ≈ 350 GB per color channel in memory to hold R or an estimate of R
Estimation of the correlation matrix
• Using the assumed ergodicity of the process, we estimate R in a time window of length K.
• From time to time we have to re-estimate the empirical correlation matrix.
• Note that we still assume quasi-stationarity within a window of length K:
$$\mathbf{R} = \lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k] \qquad \Rightarrow \qquad \hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k]$$
Overcoming the storage problem - introducing the kernel-based PCA
• We use the fact that $\hat{\mathbf{R}}$ was constructed from $K$ outer products, so $\mathrm{rank}\left(\hat{\mathbf{R}}\right) = K$.
• Using the basis transform:
$$\mathbf{s}^{(i)} = \mathbf{X} \cdot \boldsymbol{\xi}^{(i)} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right] \cdot \boldsymbol{\xi}^{(i)}$$
• We can rewrite the eigen problem:
$$\hat{\mathbf{R}}\,\mathbf{s}^{(i)} = \lambda_i\,\mathbf{s}^{(i)}, \quad \dim\left(\hat{\mathbf{R}}\right) = N \times N \qquad \longrightarrow \qquad \mathbf{G}\,\boldsymbol{\xi}^{(i)} = K\lambda_i\,\boldsymbol{\xi}^{(i)}, \quad \dim(\mathbf{G}) = K \times K \ll N \times N$$
Kernel based PCA
$$\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k], \qquad \mathbf{s}^{(i)} = \mathbf{X}\,\boldsymbol{\xi}^{(i)} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right]\boldsymbol{\xi}^{(i)}$$
Substituting into $\hat{\mathbf{R}}\,\mathbf{s}^{(i)} = \lambda_i\,\mathbf{s}^{(i)}$:
$$\frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k]\sum_{l=1}^{K}\xi^{(i)}_l\,\mathbf{x}[l] = \lambda_i\sum_{m=1}^{K}\xi^{(i)}_m\,\mathbf{x}[m]$$
Multiplying both sides from the left by $\mathbf{x}^T[n]$, $n = 1, \ldots, K$:
$$\frac{1}{K}\sum_{l=1}^{K}\sum_{k=1}^{K}\mathbf{x}^T[n]\,\mathbf{x}[k]\;\mathbf{x}^T[k]\,\mathbf{x}[l]\;\xi^{(i)}_l = \lambda_i\sum_{m=1}^{K}\mathbf{x}^T[n]\,\mathbf{x}[m]\;\xi^{(i)}_m$$
With the Gram matrix $G_{n,k} := \mathbf{x}^T[n]\,\mathbf{x}[k]$, this reads
$$\frac{1}{K}\,\mathbf{G}\cdot\mathbf{G}\,\boldsymbol{\xi}^{(i)} = \lambda_i\,\mathbf{G}\,\boldsymbol{\xi}^{(i)} \qquad\Rightarrow\qquad \mathbf{G}\,\boldsymbol{\xi}^{(i)} = K\lambda_i\,\boldsymbol{\xi}^{(i)}$$
Kernel based PCA
• Storage problems:
• N: dim(x) = 640 × 480 = 307200 per color channel
• sizeof(R) ≈ 350 GB per color channel in memory to hold R
• If a window length of K = 1000 frames is used,
• sizeof(G) = 4 MB per color channel
$$\mathbf{X} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right], \qquad \hat{\mathbf{R}} = \frac{1}{K}\,\mathbf{X}\,\mathbf{X}^T, \quad \dim\left(\hat{\mathbf{R}}\right) = N \times N, \qquad \mathbf{G} = \mathbf{X}^T\mathbf{X}, \quad \dim(\mathbf{G}) = K \times K$$
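A sketch of the Gram-matrix route (my own illustration with reduced sizes and assumed variable names): solve the K×K eigenproblem of $\mathbf{G} = \mathbf{X}^T\mathbf{X}$ and map the coefficient vectors $\boldsymbol{\xi}^{(i)}$ back to pixel-space eigenvectors $\mathbf{s}^{(i)} = \mathbf{X}\boldsymbol{\xi}^{(i)}$ (normalized to unit length).

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, M = 1000, 50, 5                      # reduced sizes so the sketch actually runs
X = rng.standard_normal((N, K))            # columns: the K frames of the current window

# K x K Gram matrix instead of the N x N correlation matrix
G = X.T @ X
eigvals, Xi = np.linalg.eigh(G / K)        # (1/K) G xi = lambda xi, i.e. G xi = K lambda xi
idx = np.argsort(eigvals)[::-1][:M]

# Map coefficient vectors back to pixel space: s^(i) = X xi^(i), then normalize
S_hat = X @ Xi[:, idx]
S_hat /= np.linalg.norm(S_hat, axis=0)

# Sanity check against the direct (large) eigenproblem of R_hat
R_hat = (X @ X.T) / K
lead = np.sort(np.linalg.eigvalsh(R_hat))[::-1][:M]
print(np.allclose(lead, eigvals[idx]))     # True: same leading eigenvalues
```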
The Oja algorithm
• The Oja algorithm is another solution to the storage problem and an alternative to the existing algorithms for computing the Eigenvalue Decomposition (EVD), which would otherwise have to be performed anew for every time window.
• If we know $\mathbf{s}^{(i)}[k]$, can we recursively compute $\mathbf{s}^{(i)}[k+1]$, assuming the statistics change slowly enough?
The Oja algorithm
• The Oja algorithm is a nonlinear stochastic difference equation (convergence can be proven with the Kushner-Clark theorem):
$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right), \qquad y[k] = \varphi\left(\mathbf{w}^T[k]\,\mathbf{x}[k]\right), \quad \varphi(u) = u$$
[Diagram: a single linear neuron — inputs $x_1, x_2, \ldots, x_N$, weights $w_1, w_2, \ldots, w_N$, summation $\Sigma$, output $y$]
$$\lim_{k\to\infty}\mathbf{w}[k] = \mathbf{s}^{(1)}$$
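A minimal sketch of the rule in action (my own illustration; the synthetic distribution, learning rate, and iteration count are assumptions): iterating the update on samples from a fixed distribution drives $\mathbf{w}$ toward $\pm\mathbf{s}^{(1)}$.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5
A = rng.standard_normal((N, N))
R = A @ A.T                                # a fixed "true" correlation matrix

s1 = np.linalg.eigh(R)[1][:, -1]           # principal eigenvector, for reference

w = rng.standard_normal(N)
w /= np.linalg.norm(w)
eta = 0.01
for k in range(20000):
    x = A @ rng.standard_normal(N)         # sample with E[x x^T] = R
    y = w @ x
    w += eta * y * (x - y * w)             # Oja update

print(abs(w @ s1))                         # should be close to 1: w converged to +/- s^(1)
```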
The Oja algorithm
• Lemma:
$$\mathbf{s}^{(1)} = \arg\max_{\mathbf{w}\in\mathbb{R}^N,\ \|\mathbf{w}\| = 1} \mathbf{w}^T\mathbf{R}\,\mathbf{w}, \qquad \max_{\|\mathbf{w}\|=1}\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \lambda_1$$
• Proof outline: substitute $\mathbf{v} = \mathbf{S}^T\mathbf{w}$ (so $\mathbf{w} = \mathbf{S}\mathbf{v}$ and $\|\mathbf{v}\| = \|\mathbf{w}\| = 1$, since $\mathbf{S}^T\mathbf{S} = \mathbf{I}$):
$$\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \mathbf{v}^T\mathbf{S}^T\mathbf{R}\,\mathbf{S}\,\mathbf{v} = \mathbf{v}^T\mathbf{\Lambda}\,\mathbf{v} = \sum_{i=1}^{N}\lambda_i\, v_i^2 \leq \lambda_1$$
with equality if we choose $\mathbf{v} = [1, 0, \ldots, 0]^T$, i.e. $\mathbf{w} = \mathbf{S}\mathbf{v} = \mathbf{s}^{(1)}$, giving $\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \lambda_1$.
The Oja algorithm
• Using the properties of the Rayleigh quotient: if $\mathbf{R} = \mathbf{R}^T \in \mathbb{R}^{N \times N}$, then $\max_{\|\mathbf{w}\|=1} \mathbf{w}^T \mathbf{R}\, \mathbf{w} = \lambda_{\max}$, attained at $\mathbf{w} = \mathbf{s}^{(1)}$.
• Starting from the Rayleigh quotient and taking a gradient-ascent (hill-climbing) step toward the maximum:
$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\,\mathrm{grad}_{\mathbf{w}}\left(\frac{\mathbf{w}^T \mathbf{R}\,\mathbf{w}}{\mathbf{w}^T\mathbf{w}}\right)\bigg|_{\mathbf{w}[k]} = \mathbf{w}[k] + \eta\left(\mathbf{R}\,\mathbf{w}[k] - \mathbf{w}[k]\,\mathbf{w}^T[k]\,\mathbf{R}\,\mathbf{w}[k]\right)$$
(evaluated at $\|\mathbf{w}[k]\| = 1$, with the constant factor 2 absorbed into $\eta$).
The Oja algorithm
• We still need the knowledge of R, so we substitute the instantaneous estimate $\mathbf{x}[k]\,\mathbf{x}^T[k]$ for $E\left[\mathbf{x}[k]\,\mathbf{x}^T[k]\right]$ (the same simplification discussed at the LMS and Robbins-Monro algorithms).
• Stability can be proven by the Kushner-Clark theorem.
$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\left(E\left[\mathbf{x}[k]\,\mathbf{x}^T[k]\right]\mathbf{w}[k] - \mathbf{w}[k]\,\mathbf{w}^T[k]\,E\left[\mathbf{x}[k]\,\mathbf{x}^T[k]\right]\mathbf{w}[k]\right) \cong$$
$$\cong \mathbf{w}[k] + \eta\left(\mathbf{x}[k]\,\mathbf{x}^T[k]\,\mathbf{w}[k] - \mathbf{w}[k]\,\mathbf{w}^T[k]\,\mathbf{x}[k]\,\mathbf{x}^T[k]\,\mathbf{w}[k]\right) = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right)$$
where $y[k] = \mathbf{x}^T[k]\,\mathbf{w}[k] = \mathbf{w}^T[k]\,\mathbf{x}[k]$.
The Oja algorithm
• Extending the algorithm with the Generalized Hebbian Algorithm (GHA), one can design a feed-forward neural network that extracts all the eigenvectors of the correlation matrix.
• Generalized Hebbian rule:
$$\Delta w_{ji}[k] = \eta\, y_j[k]\left(x_i[k] - \sum_{m=1}^{j} w_{mi}[k]\, y_m[k]\right)$$
The Oja algorithm with the Generalized Hebbian Algorithm (GHA)
• No knowledge of R is needed for the iteration
• Can be implemented in a feed-forward neural network structure
• For the jth neuron’s ith weight:
$$w_{ji}[k+1] = w_{ji}[k] + \eta\, y_j[k]\left(x_i'[k] - w_{ji}[k]\, y_j[k]\right)$$
$$x_i'[k] = x_i[k] - \sum_{m=1}^{j-1} w_{mi}[k]\, y_m[k]$$
(note that $x'$ is a function of $j$ as well)
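A sketch of the resulting multi-neuron update in matrix form (my own illustration; the weight matrix W, the synthetic correlation matrix, and the iteration budget are assumptions): row j of W plays the role of the j-th neuron's weight vector, and the lower-triangular term implements the deflating sum over m ≤ j.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, eta = 5, 3, 0.005

# A correlation matrix with well separated eigenvalues (for a clear illustration)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.array([5.0, 3.0, 2.0, 1.0, 0.5])
R = Q @ np.diag(lam) @ Q.T
R_sqrt = Q @ np.diag(np.sqrt(lam)) @ Q.T   # used to draw samples with E[x x^T] = R

W = 0.1 * rng.standard_normal((M, N))      # row j holds the weight vector of neuron j
for k in range(100000):
    x = R_sqrt @ rng.standard_normal(N)
    y = W @ x                              # outputs of the M neurons
    # Sanger / GHA update: delta w_ji = eta * y_j * (x_i - sum_{m<=j} w_mi y_m)
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

print(np.abs(W @ Q[:, :M]))                # roughly the M x M identity (up to signs)
```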
Hebbian PCA
• Hebbian learning:
$$\tilde{\mathbf{w}}[k+1] = \mathbf{w}[k] + \eta\, y[k]\,\mathbf{x}[k]$$
• Normalization:
$$\mathbf{w}[k+1] = \frac{\tilde{\mathbf{w}}[k+1]}{\left\|\tilde{\mathbf{w}}[k+1]\right\|}$$
• Expanding by Taylor series (with $\|\mathbf{w}[k]\| = 1$):
$$\left\|\tilde{\mathbf{w}}[k+1]\right\|^2 = \left\|\mathbf{w}[k]\right\|^2 + 2\eta\, y[k]\,\mathbf{w}^T[k]\,\mathbf{x}[k] + O\left(\eta^2\right) = \left\|\mathbf{w}[k]\right\|^2 + 2\eta\, y^2[k] + O\left(\eta^2\right)$$
$$\left\|\tilde{\mathbf{w}}[k+1]\right\|^{-1} = 1 - \eta\, y^2[k] + O\left(\eta^2\right)$$
• Substituting back: $\mathbf{w}[k+1] = \left(\mathbf{w}[k] + \eta\, y[k]\,\mathbf{x}[k]\right)\left(1 - \eta\, y^2[k]\right) + O\left(\eta^2\right) = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right) + O\left(\eta^2\right)$, i.e. normalized Hebbian learning reduces to the Oja rule up to $O(\eta^2)$ terms.
Experiments
• [Plot: eigenvalue distribution of a small video stream]
• We experimented on two small video streams:
• the first shows a calm aquarium scene,
• the second is an action scene with quick changes.
• We present these sequences at different compression ratios using the kernel-based and the Hebbian PCA.
Experiments
• The video snippet consisted of 57 frames; a window length of 10 was used.
Experiments
[Images: original frame; reconstruction with all eigenvectors; with 5 eigenvectors; with 2 eigenvectors]
Experiments
[Images: original frame; reconstruction with all eigenvectors; with 20 eigenvectors; with 15 eigenvectors]
Experiments
Error function: $\varepsilon = E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 = \sum_{i=M+1}^{N}\lambda_i$
[Plots: error vs. number of eigenvectors for the Aquarium and the Star Wars sequences]
Experiments
• Difference between the approximated and the true eigenvectors
[Plots: difference vs. number of eigenvectors, after 100 iterations and after 500 iterations]
Problems with solutions
1. A simple eigenvalue problem: find the eigenvalues and eigenvectors of
$$\mathbf{R} = \begin{pmatrix} 0.5 & 0.5 & 0.5 \\ 0.5 & 1 & 0 \\ 0.5 & 0 & 1 \end{pmatrix}, \qquad \mathbf{R}\,\mathbf{s}_i = \lambda_i\,\mathbf{s}_i, \quad i = 1, 2, 3$$
Solving $\det\left(\lambda\mathbf{I} - \mathbf{R}\right) = 0$ and then $\left(\lambda\mathbf{I} - \mathbf{R}\right)\mathbf{s} = \mathbf{0}$:
$$\lambda_1 = 1.5, \qquad \lambda_2 = 1, \qquad \lambda_3 = 0$$
$$\mathbf{s}_1 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix}, \qquad \mathbf{s}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\end{pmatrix}, \qquad \mathbf{s}_3 = \frac{1}{\sqrt{6}}\begin{pmatrix}2\\-1\\-1\end{pmatrix}$$
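The result can be checked in a few lines of NumPy (my own verification of the values above):

```python
import numpy as np

R = np.array([[0.5, 0.5, 0.5],
              [0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
eigvals, S = np.linalg.eigh(R)
print(eigvals[::-1])      # approximately [1.5, 1.0, 0.0]
print(S[:, ::-1])         # columns proportional to [1,1,1]/sqrt(3), [0,1,-1]/sqrt(2), [2,-1,-1]/sqrt(6)
```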
Problems with solutions
2. An information flow, compressed from time window to time window (e.g. a video flow), has the following eigenvalues of the correlation matrix:
$$\lambda_m = m \cdot 2^{-m}, \qquad m = 1, \ldots, 1000$$
If we want the mean square error of the PCA compression to be less than or equal to 0.01, how many neurons should we implement in the coder?
Problems with solutions
• The mean square error is defined as:
$$E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i$$
• Here:
$$E\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2 = \sum_{i=M+1}^{1000} i \cdot 2^{-i} \leq 0.01 \quad\Rightarrow\quad M = 11, \qquad \sum_{i=12}^{1000} i \cdot 2^{-i} \approx 0.00634766$$
• So at least 11 neurons must be used to achieve the desired error level (with M = 10 the error would be ≈ 0.0117 > 0.01).
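A quick numerical check of this computation (my own verification, using the eigenvalue model stated above):

```python
import numpy as np

m = np.arange(1, 1001)
lam = m * 2.0 ** (-m)                   # lambda_m = m * 2^-m
tail = lam[::-1].cumsum()[::-1]         # tail[M] = sum of discarded eigenvalues when keeping M components
M = int(np.argmax(tail <= 0.01))        # smallest M with error <= 0.01
print(M, tail[M])                       # 11 0.00634765...
```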
Problems with solutions
3. Train the underlying neuron with the Oja algorithm using the following parameters:
$$\mathbf{x} = \begin{bmatrix}1 & -0.5 & -1\end{bmatrix}^T, \qquad \mathbf{w}[0] = \begin{bmatrix}0 & 0 & 1\end{bmatrix}^T, \qquad \eta = 0.1$$
a) Compute the weights after the 2nd iteration: $\mathbf{w}[2] = \;?$
b) Can the following state be stable? Explain your answer!
$$\mathbf{w}[\infty] = \begin{bmatrix}2/3 & -1/3 & -2/3\end{bmatrix}^T$$
Problems with solutions
a) Compute the weights after the 2nd iteration:
The Oja update rule: $\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right)$, with $y[k] = \mathbf{x}^T[k]\,\mathbf{w}[k]$.

Iteration 1:
$$y[0] = \mathbf{x}^T\mathbf{w}[0] = \begin{bmatrix}1 & -0.5 & -1\end{bmatrix}\begin{bmatrix}0\\0\\1\end{bmatrix} = -1$$
$$\mathbf{w}[1] = \mathbf{w}[0] + 0.1\cdot(-1)\cdot\left(\mathbf{x} - (-1)\,\mathbf{w}[0]\right) = \begin{bmatrix}0\\0\\1\end{bmatrix} - 0.1\begin{bmatrix}1\\-0.5\\0\end{bmatrix} = \begin{bmatrix}-0.1\\0.05\\1\end{bmatrix}$$
Problems with solutions
a) Compute the weights after the 2nd iteration:
Iteration 2:
$$y[1] = \mathbf{x}^T\mathbf{w}[1] = \begin{bmatrix}1 & -0.5 & -1\end{bmatrix}\begin{bmatrix}-0.1\\0.05\\1\end{bmatrix} = -1.125$$
$$\mathbf{w}[2] = \mathbf{w}[1] + 0.1\cdot(-1.125)\cdot\left(\mathbf{x} - (-1.125)\,\mathbf{w}[1]\right) = \begin{bmatrix}-0.1\\0.05\\1\end{bmatrix} - 0.1125\begin{bmatrix}0.8875\\-0.44375\\0.125\end{bmatrix} \approx \begin{bmatrix}-0.1998\\0.0999\\0.9859\end{bmatrix}$$
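The two iterations can be replayed in a few lines (my own check of the numbers above):

```python
import numpy as np

x = np.array([1.0, -0.5, -1.0])
w = np.array([0.0, 0.0, 1.0])
eta = 0.1
for k in range(2):
    y = x @ w
    w = w + eta * y * (x - y * w)
    print(k + 1, y, w)
# iteration 1: y = -1.0,   w = [-0.1, 0.05, 1.0]
# iteration 2: y = -1.125, w ~ [-0.1998, 0.0999, 0.9859]
```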
Problems with solutions
b) Can the following state be stable? Explain your answer!
$$\mathbf{w}[\infty] = \begin{bmatrix}2/3 & -1/3 & -2/3\end{bmatrix}^T$$
Let us check whether it can be an eigenvector: every eigenvector must have unit norm and form an orthonormal basis with the others. Since this vector has unit norm, it passes this check.
Let us solve the eigen problem to be sure:
$$\mathbf{R} = \mathbf{x}\cdot\mathbf{x}^T = \begin{pmatrix}1 & -0.5 & -1\\ -0.5 & 0.25 & 0.5\\ -1 & 0.5 & 1\end{pmatrix}, \qquad \det\left(\lambda\mathbf{I} - \mathbf{R}\right) = 0 \;\Rightarrow\; \lambda_1 = 2.25, \quad \mathbf{s}^{(1)} = \begin{bmatrix}\tfrac{2}{3} & -\tfrac{1}{3} & -\tfrac{2}{3}\end{bmatrix}^T$$
Since $\mathbf{w}[\infty]$ coincides with the principal eigenvector $\mathbf{s}^{(1)}$, to which the Oja algorithm converges, it can indeed be a stable state.