
(1)

Development of Complex Curricula for Molecular Bionics and Infobionics Programs within a consortial* framework**

Consortium leader

PETER PAZMANY CATHOLIC UNIVERSITY

Consortium members

SEMMELWEIS UNIVERSITY, DIALOG CAMPUS PUBLISHER

The Project has been realised with the support of the European Union and has been co-financed by the European Social Fund ***

**Hungarian title of the project: complex development of the curricula of the Molecular Bionics and Infobionics programs within a consortium framework.

***The project is realised with the support of the European Union, co-financed by the European Social Fund.

(2)

Principal Component Analysis

András Oláh, Gergely Treplán, Dávid Tisza

Digital and Neural Based Signal Processing & Kiloprocessor Arrays

(3)

Contents

• Data compression

• Principal Component Analysis (PCA or KLT)
  – Notation
  – Summary of the algorithm
  – Problems, extensions

• Hebbian PCA (the Oja algorithm)

• Experiments

• Problems

(4)

Examples of well-known transformations:

• DCT – Discrete Cosine Transformation (JPEG, MPEG)

• KLT – Karhunen-Loève Transformation

[Diagram: transformation → prioritized representation → information discarding → only the important segment remains]

(5)

Notation

• Scalars are denoted by normal letters.

• Vectors are denoted by bold-face lower case.

• Matrices are denoted by bold-face upper case.

• Random data vector (e.g. a video frame): $\mathbf{x}$

• Correlation matrix: $\mathbf{R} = E\left(\mathbf{x}\mathbf{x}^T\right), \qquad R_{ij} = E\left(x_i x_j\right)$

• Vectors are defined to be column vectors: $\dim(\mathbf{x}) = N \times 1$

(6)

• $i$th element of a vector (scalar): $x_i$

• $(i,j)$th element of a matrix: $R_{i,j}$

• $i$th row of a matrix: $\mathbf{R}_{i,:}$

• $j$th column of a matrix: $\mathbf{R}_{:,j}$

• Submatrix (e.g. first $K$ rows and all columns): $\mathbf{R}_{1:K,:}$

• $i$th element of a set (e.g. of the set of eigenvectors): $\mathbf{s}^{(i)}$, where $S = \{\mathbf{x} : \mathbf{x} \text{ is an eigenvector of } \mathbf{R}\}$

• Expectation operator, if $X$ is a discrete random variable: $E(X) = \sum_i x_i\, p(x_i)$

(7)

• Dot product: $\langle \mathbf{a}, \mathbf{b}\rangle = \mathbf{a}^T\cdot\mathbf{b} = \sum_{i=1}^{N} a_i b_i$

• Correlation matrix: $\mathbf{R} = E\left(\mathbf{x}\cdot\mathbf{x}^T\right), \qquad \dim(\mathbf{R}) = N \times N$

• Diagonal extension operator: for $\mathbf{a} = [a_1, a_2, \ldots, a_N]^T$,
$$\mathbf{A} = \operatorname{diag}(\mathbf{a}) := \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_N \end{pmatrix}$$

(8)

• Eigenvectors and eigenvalues of $\mathbf{R}$

• Set of eigen-equations:
$$\mathbf{R}\cdot\mathbf{s}^{(i)} = \lambda_i\cdot\mathbf{s}^{(i)}, \qquad i = 1, \ldots, N$$

• Properties of the eigenvectors: they form an orthonormal basis, so
$$\langle\mathbf{s}^{(i)}, \mathbf{s}^{(j)}\rangle = \mathbf{s}^{(i)T}\cdot\mathbf{s}^{(j)} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}, \qquad \mathbf{S}^T\cdot\mathbf{S} = \mathbf{I}$$

• In matrix form: $\mathbf{R}\cdot\mathbf{S} = \mathbf{S}\cdot\mathbf{\Lambda}$, where $\mathbf{S} = \left[\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(N)}\right]$ and $\mathbf{\Lambda} = \operatorname{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_N\right)$

(9)

Basis transformations to and from the eigenvector basis:

• Coder, transforming to the eigenvector basis:
$$\mathbf{y} = \mathbf{S}^T\cdot\mathbf{x}, \qquad y_i = \mathbf{s}^{(i)T}\mathbf{x} = \sum_{n=1}^{N} s^{(i)}_n x_n$$

• Decoder, transforming from the eigenvector basis:
$$\mathbf{x} = \mathbf{S}\cdot\mathbf{y} = \left[\mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \ldots, \mathbf{s}^{(N)}\right]\cdot\mathbf{y}, \qquad x_i = \mathbf{S}_{i,:}\,\mathbf{y} = \sum_{n=1}^{N} s^{(n)}_i y_n$$

(10)

PCA

• Suppose an element of the data set is a grayscale image map: the data frame at time instant $k$.

• Generate the data vector $\mathbf{x}[k]$ from it by reading the image column-wise.

(11)

• The data can usually be modeled as a random vector variable $\mathbf{x}$, because we are dealing with processes assumed to be quasi-stationary and ergodic; e.g. consecutive image frames of a video flow are usually "similar".

• Thus, in a time window we examine the correlation matrix of the process:
$$\mathbf{R} = E\{\mathbf{x}\mathbf{x}^T\} = \begin{pmatrix} E\{x_1 x_1\} & E\{x_1 x_2\} & \cdots & E\{x_1 x_N\} \\ E\{x_2 x_1\} & E\{x_2 x_2\} & \cdots & E\{x_2 x_N\} \\ \vdots & \vdots & \ddots & \vdots \\ E\{x_N x_1\} & E\{x_N x_2\} & \cdots & E\{x_N x_N\} \end{pmatrix}, \qquad \dim(\mathbf{R}) = N\times N$$

(12)

• The random data vector is assumed to have zero mean: $E(\mathbf{x}) = \mathbf{0}$.

• Compute the eigenvectors and eigenvalues of the correlation matrix $\mathbf{R}$ and sort them in decreasing order (by importance):
$$\mathbf{R}\cdot\mathbf{s}^{(i)} = \lambda_i\cdot\mathbf{s}^{(i)}, \qquad i = 1, \ldots, N$$
$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_M \geq \lambda_{M+1} \geq \cdots \geq \lambda_N, \qquad \mathbf{s}^{(1)}, \mathbf{s}^{(2)}, \mathbf{s}^{(3)}, \ldots, \mathbf{s}^{(M)}, \mathbf{s}^{(M+1)}, \ldots, \mathbf{s}^{(N)} \text{ ordered accordingly}$$
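A minimal numerical sketch of this step (assuming NumPy and an already-estimated correlation matrix; the helper name `sorted_eig` and the toy data are illustrative, not part of the slides):

```python
import numpy as np

def sorted_eig(R):
    """Eigen-decomposition of a symmetric correlation matrix R,
    returned with eigenvalues/eigenvectors in decreasing order."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]           # indices of decreasing eigenvalues
    return eigvals[order], eigvecs[:, order]    # Lambda, S (columns are s^(i))

# toy example: zero-mean data with different variances per dimension
x = np.random.randn(1000, 4) @ np.diag([3.0, 2.0, 1.0, 0.1])
R = x.T @ x / len(x)                            # empirical correlation matrix
lam, S = sorted_eig(R)
print(lam)                                      # decreasing eigenvalues
print(S.T @ S)                                  # ~ identity: orthonormal eigenvectors
```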

(13)

[Figure: a 2-D example distribution. Eigenvectors point in orthogonal directions that capture the independent variance directions of the distribution; eigenvalues are proportional to the magnitudes of the variances. Eigenvalue-scaled eigenvectors: blue arrows; iso-curves of the distribution: black dashed contours; sample points of the distribution: red dots.]

$$X \sim N\left(\begin{pmatrix}0\\0\end{pmatrix}, \begin{pmatrix}1 & 0.9\\ 0.9 & 1\end{pmatrix}\right), \qquad \mathbf{S} = \begin{pmatrix}-0.7071 & -0.7071\\ -0.7071 & 0.7071\end{pmatrix}, \qquad \mathbf{\Lambda} = \begin{pmatrix}1.9 & 0\\ 0 & 0.1\end{pmatrix}$$

(14)

Eigenvectors correspond to the principal directions of variation of the data, understood as a sample set of a multidimensional random process. Eigenvalues correspond to the variances along these directions.

If we adopt the hypothesis that this linear quantity (correlation) captures real-world "importance", then we can sort the dimensions by "importance" and compress by discarding the "not really important" dimensions.

(15)

• We can transform the data frame to the new basis and back in a lossless fashion.

Given $\mathbf{R}$, get all eigenpairs $\left(\mathbf{s}^{(i)}, \lambda_i\right)$, $i = 1, \ldots, N$.

Transformed data:
$$y_i = \mathbf{s}^{(i)T}\cdot\mathbf{x}, \quad i = 1, \ldots, N, \qquad \mathbf{y} = \mathbf{S}^T\cdot\mathbf{x}$$

Original data recovered:
$$\mathbf{x} = \mathbf{S}\cdot\mathbf{y}$$

(16)

• We want to discard the "unimportant" components to achieve compression: instead of using all eigenvectors, use only the first $M$ to get an approximation of $\mathbf{y}$.

$$\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_M \;\big|\; \lambda_{M+1} \geq \cdots \geq \lambda_N, \qquad \mathbf{s}^{(1)}, \ldots, \mathbf{s}^{(M)} \text{ kept} \;\big|\; \mathbf{s}^{(M+1)}, \ldots, \mathbf{s}^{(N)} \text{ discarded}$$

[Diagram: transformation → prioritized representation → information discarding → only the important segment remains]

(17)

• The approximation of the transformed data:

$$\hat{\mathbf{S}} = \mathbf{S}_{:,1:M} \quad \text{(the first } M \text{ eigenvectors, arranged in a matrix)}$$

Compressed, transformed data:
$$\hat{\mathbf{y}} = \hat{\mathbf{S}}^T\cdot\mathbf{x}, \qquad \hat{y}_i = \mathbf{s}^{(i)T}\mathbf{x}, \quad i = 1, \ldots, M$$

Decompressed, lossy data recovery:
$$\hat{\mathbf{x}} = \hat{\mathbf{S}}\cdot\hat{\mathbf{y}}$$

(18)

PCA compression

Compression algorithm: $\hat{\mathbf{y}} = \hat{\mathbf{S}}^T\mathbf{x}$

Decompression algorithm: $\hat{\mathbf{x}} = \hat{\mathbf{S}}\hat{\mathbf{y}}$

[Diagram: $\mathbf{x}$ → compression algorithm → $\hat{\mathbf{y}}$ → access line → $\hat{\mathbf{y}}$ → decompression algorithm → $\hat{\mathbf{x}}$]

$$\dim(\mathbf{x}) = N, \qquad \dim(\hat{\mathbf{y}}) = M, \qquad \dim(\hat{\mathbf{x}}) = N, \qquad N \gg M$$
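A sketch of the coder/decoder pair as plain matrix-vector products, assuming `S` holds the eigenvectors sorted in decreasing eigenvalue order as above; the function names are illustrative:

```python
import numpy as np

def pca_encode(x, S, M):
    """Compression: project x onto the first M eigenvectors, y_hat = S_hat^T x."""
    S_hat = S[:, :M]            # N x M coding matrix
    return S_hat.T @ x          # M-dimensional compressed vector

def pca_decode(y_hat, S, M):
    """Decompression: lossy reconstruction x_hat = S_hat y_hat."""
    S_hat = S[:, :M]
    return S_hat @ y_hat        # back to N dimensions

# usage sketch: y_hat = pca_encode(x_frame, S, M); x_hat = pca_decode(y_hat, S, M)
```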

(19)

PCA is the optimal lossy representation in terms of mean square error.

QoS criterion: $\varepsilon^2 = E\left(\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2\right)$

$$E\left(\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2\right) = E\left(\left(\mathbf{x} - \hat{\mathbf{S}}\hat{\mathbf{y}}\right)^T\left(\mathbf{x} - \hat{\mathbf{S}}\hat{\mathbf{y}}\right)\right) = E\left(\mathbf{x}^T\mathbf{x} - 2\,\hat{\mathbf{y}}^T\hat{\mathbf{S}}^T\mathbf{x} + \hat{\mathbf{y}}^T\hat{\mathbf{S}}^T\hat{\mathbf{S}}\,\hat{\mathbf{y}}\right)$$
$$= E\left(\mathbf{x}^T\mathbf{x}\right) - E\left(\hat{\mathbf{y}}^T\hat{\mathbf{y}}\right) = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i,$$

using $\hat{\mathbf{S}}^T\hat{\mathbf{S}} = \mathbf{I}$ and $\hat{\mathbf{y}} = \hat{\mathbf{S}}^T\mathbf{x}$.

This is the minimum achievable error because the eigenvalues are ordered in a monotonically decreasing sequence.

(20)

The components we discard in the course of compression contribute exactly their corresponding eigenvalues: the mean square error is the sum of the eigenvalues corresponding to the discarded components. Because the eigenvalues are kept in monotonically decreasing order, we drop the minimum possible contribution.

$$E\left(\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2\right) = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i$$
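A quick numerical check of this identity on synthetic zero-mean data; since both sides are computed from the same empirical correlation matrix, they should match up to floating-point error (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 6, 2, 200_000
A = rng.standard_normal((N, N))
X = rng.standard_normal((K, N)) @ A.T          # zero-mean samples, rows are x^T

R = X.T @ X / K                                # empirical correlation matrix
lam, S = np.linalg.eigh(R)
lam, S = lam[::-1], S[:, ::-1]                 # sort decreasing

X_hat = (X @ S[:, :M]) @ S[:, :M].T            # encode with M components, then decode
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[M:].sum())                      # the two numbers match
```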

(21)

PCA architecture

[Figure: two-layer PCA architecture. Coder: the inputs $x_1, \ldots, x_N$ are weighted by the eigenvector components $s^{(j)}_i$ and summed to produce $\hat{y}_1, \ldots, \hat{y}_M$, which are sent over the access line. Decoder: the received $\hat{y}_1, \ldots, \hat{y}_M$ are weighted by the same coefficients to produce $\hat{x}_1, \ldots, \hat{x}_N$.]

(22)

PCA compression overview

We assume that $\mathbf{R}$ is given for the process we are dealing with and that it remains the same over time (stationarity).

1. Solve the eigenproblem for $\mathbf{R}$ to get the first $M$ largest eigenvalues and the corresponding eigenvectors, where $M$ is chosen from the quality requirement.

2. Assemble the coding/decoding matrix.

3. Use the coder to encode the data (a matrix-vector multiplication) and send the result through the access line.

4. Decode the data with the decoding matrix.

(23)

Difficulties of directly implementing PCA for video compression

• Nobody tells us the $\mathbf{R}$ of the underlying video flow.

• $\mathbf{R}$ does not remain "still" for the whole time.

• Computing eigenvalues and eigenvectors is computationally complex.

• In practice, normalizing the variances of the original data set along each dimension before the PCA transformation is useful to obtain meaningful relative importance values after the transformation.

(24)

Difficulties of directly implementing PCA for video compression

• Storage problems:

• $N$: $\dim(\mathbf{x}) = 640\times480 = 307200$ per color channel

• $N\times N$: $\dim(\mathbf{R}) = \dim\left(E\left(\mathbf{x}\,\mathbf{x}^T\right)\right) = 307200\times307200$

• If a float (32 bit) is used to hold each element of $\mathbf{R}$, then $\operatorname{sizeof}(\mathbf{R}) \approx 307200^2\cdot 4\ \text{bytes} \approx 350\ \text{GB}$ per color channel is needed in memory to hold $\mathbf{R}$ or an estimate of $\mathbf{R}$.

(25)

Estimation of the correlation matrix

• Using the assumed ergodicity of the process, we estimate $\mathbf{R}$ in a time window of length $K$:
$$\mathbf{R} = \lim_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k] \;\;\Longrightarrow\;\; \hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k]$$

• From time to time we have to re-estimate the empirical correlation matrix.

• Note that we still assume quasi-stationarity within a window of length $K$.
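A sketch of the windowed estimate, assuming the $K$ frames of the window are stacked as the columns of an $N\times K$ array (the layout and the name `estimate_R` are assumptions, not from the slides):

```python
import numpy as np

def estimate_R(frames):
    """Empirical correlation matrix over a window of K frames.
    frames: N x K array, column k is the data vector x[k]."""
    K = frames.shape[1]
    return frames @ frames.T / K     # R_hat = (1/K) * sum_k x[k] x[k]^T

# re-estimate from time to time, e.g. once per window of K frames
```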

(26)

Overcoming the storage problem - Introducing the kernel based PCA

• We use the fact that $\hat{\mathbf{R}}$ was constructed from $K$ outer products, so $\operatorname{rank}(\hat{\mathbf{R}}) = K$.

• Using the basis transform
$$\mathbf{s}^{(i)} = \mathbf{X}\cdot\boldsymbol{\xi}^{(i)}, \qquad \mathbf{X} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right],$$

• we can rewrite the $N\times N$ eigenproblem as a $K\times K$ one:
$$\hat{\mathbf{R}}\,\mathbf{s}^{(i)} = \lambda_i\,\mathbf{s}^{(i)}, \;\; \dim(\hat{\mathbf{R}}) = N\times N \qquad\Longrightarrow\qquad \mathbf{G}\,\boldsymbol{\xi}^{(i)} = K\lambda_i\,\boldsymbol{\xi}^{(i)}, \;\; \dim(\mathbf{G}) = K\times K \ll N\times N$$

(27)

Kernel-based PCA

$$\hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}[k]\,\mathbf{x}^T[k], \qquad \mathbf{s}^{(i)} = \mathbf{X}\,\boldsymbol{\xi}^{(i)} = \sum_{k=1}^{K}\xi^{(i)}_k\,\mathbf{x}[k], \qquad \mathbf{X} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right]$$

Substituting into $\hat{\mathbf{R}}\,\mathbf{s}^{(i)} = \lambda_i\,\mathbf{s}^{(i)}$ and multiplying from the left by $\mathbf{x}^T[l]$, $l = 1, \ldots, K$:

$$\frac{1}{K}\sum_{k=1}^{K}\sum_{m=1}^{K}\mathbf{x}^T[l]\,\mathbf{x}[k]\;\mathbf{x}^T[k]\,\mathbf{x}[m]\;\xi^{(i)}_m = \lambda_i\sum_{m=1}^{K}\mathbf{x}^T[l]\,\mathbf{x}[m]\;\xi^{(i)}_m$$

With the Gram matrix $G_{lm} = \mathbf{x}^T[l]\,\mathbf{x}[m]$, i.e. $\mathbf{G} = \mathbf{X}^T\mathbf{X}$:

$$\frac{1}{K}\,\mathbf{G}\,\mathbf{G}\,\boldsymbol{\xi}^{(i)} = \lambda_i\,\mathbf{G}\,\boldsymbol{\xi}^{(i)} \;\;\Longrightarrow\;\; \mathbf{G}\,\boldsymbol{\xi}^{(i)} = K\lambda_i\,\boldsymbol{\xi}^{(i)}$$

(28)

Kernel-based PCA

Storage problems revisited:

• $N$: $\dim(\mathbf{x}) = 640\times480 = 307200$ per color channel

• $\operatorname{sizeof}(\mathbf{R}) \approx 350$ GB per color channel in memory to hold $\mathbf{R}$

• If a window length of $K = 1000$ frames is used:

• $\operatorname{sizeof}(\mathbf{G}) = K^2\cdot 4\ \text{bytes} = 4$ MB per color channel

$$\mathbf{X} = \left[\mathbf{x}[1], \ldots, \mathbf{x}[K]\right], \qquad \hat{\mathbf{R}} = \frac{1}{K}\,\mathbf{X}\cdot\mathbf{X}^T, \;\; \dim(\hat{\mathbf{R}}) = N\times N, \qquad \mathbf{G} = \mathbf{X}^T\cdot\mathbf{X}, \;\; \dim(\mathbf{G}) = K\times K$$
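A sketch of the kernel trick above: solve the $K\times K$ eigenproblem of $\mathbf{G} = \mathbf{X}^T\mathbf{X}$ and map the coefficient vectors $\boldsymbol{\xi}^{(i)}$ back to the image space via $\mathbf{s}^{(i)} = \mathbf{X}\boldsymbol{\xi}^{(i)}$. Names and the normalization step are assumptions that follow the relations derived above, not the authors' implementation:

```python
import numpy as np

def kernel_pca_basis(X, M):
    """X: N x K matrix of window frames (columns x[k]); returns the first M
    eigenpairs of R_hat = (1/K) X X^T without forming the N x N matrix.
    Assumes M <= rank(X)."""
    N, K = X.shape
    G = X.T @ X                               # K x K Gram matrix, G_lm = x[l]^T x[m]
    mu, Xi = np.linalg.eigh(G)                # G xi = (K * lambda) xi
    mu, Xi = mu[::-1], Xi[:, ::-1]            # decreasing order
    lam = mu / K                              # eigenvalues of R_hat
    S = X @ Xi[:, :M]                         # s^(i) = X xi^(i), not yet unit norm
    S /= np.linalg.norm(S, axis=0)            # normalize each eigenvector column
    return lam[:M], S
```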

(29)

The Oja algorithm

The Oja algorithm is another solution to the storage problem and an alternative to the existing algorithms for computing the eigenvalue decomposition (EVD), which would otherwise have to be performed anew for every time window.

If we assume that we know $\mathbf{s}^{(i)}[k]$, can we recursively compute $\mathbf{s}^{(i)}[k+1]$, assuming it changes slowly enough?

(30)

The Oja algorithm

The Oja algorithm is a nonlinear stochastic difference equation:

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right), \qquad y[k] = \mathbf{w}^T[k]\,\mathbf{x}[k], \qquad \varphi(u) = u$$

[Figure: a single linear neuron with inputs $x_1, \ldots, x_N$, weights $w_1, \ldots, w_N$ and output $y = \sum_i w_i x_i$]

$$\lim_{k\to\infty}\mathbf{w}[k] = \mathbf{s}^{(1)}$$

(convergence can be proven with the Kushner-Clark theorem)
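A minimal sketch of the single-neuron Oja iteration driven directly by a stream of data vectors, so $\mathbf{R}$ is never formed; the constant learning rate, epoch loop and function name are assumptions for the sketch:

```python
import numpy as np

def oja(frames, eta=0.01, n_epochs=100, seed=0):
    """Oja's rule: w converges (up to sign) to the principal eigenvector s^(1).
    frames: K x N array, row k is the data vector x[k]^T."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(frames.shape[1])
    w /= np.linalg.norm(w)                     # start from a random unit vector
    for _ in range(n_epochs):
        for x in frames:
            y = w @ x                          # neuron output y = w^T x
            w += eta * y * (x - y * w)         # w[k+1] = w[k] + eta*y*(x - y*w)
    return w
```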

(31)

The Oja algorithm

Lemma:
$$\mathbf{s}^{(1)} = \arg\max_{\mathbf{w}}\;\mathbf{w}^T\mathbf{R}\,\mathbf{w} \quad \text{s.t.}\;\; \|\mathbf{w}\| = 1, \qquad \max_{\|\mathbf{w}\|=1}\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \lambda_1$$

Proof outline: write $\mathbf{w} = \mathbf{S}\,\mathbf{v}$, i.e. $\mathbf{v} = \mathbf{S}^T\mathbf{w}$; since $\mathbf{S}^T\mathbf{S} = \mathbf{I}$, we have $\|\mathbf{v}\| = \|\mathbf{w}\| = 1$. Then
$$\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \mathbf{v}^T\mathbf{S}^T\mathbf{R}\,\mathbf{S}\,\mathbf{v} = \mathbf{v}^T\mathbf{\Lambda}\,\mathbf{v} = \sum_{i=1}^{N}\lambda_i v_i^2 \;\leq\; \lambda_1\sum_{i=1}^{N}v_i^2 = \lambda_1,$$
and equality holds if we choose $\mathbf{v} = [1, 0, \ldots, 0]^T$, i.e. $\mathbf{w} = \mathbf{S}\,\mathbf{v} = \mathbf{s}^{(1)}$.

(32)

The Oja algorithm

Using the property of the Rayleigh quotient that, for a symmetric $\mathbf{R}\in\mathbb{R}^{N\times N}$,
$$\max_{\|\mathbf{w}\|=1}\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \lambda_{\max}, \qquad \arg\max_{\|\mathbf{w}\|=1}\mathbf{w}^T\mathbf{R}\,\mathbf{w} = \mathbf{s}^{(1)},$$
we start from the Rayleigh quotient and take gradient (steepest-ascent) steps towards the maximum:
$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\,\operatorname{grad}_{\mathbf{w}}\!\left(\mathbf{w}^T\mathbf{R}\,\mathbf{w}\right)\Big|_{\mathbf{w}[k]} = \mathbf{w}[k] + \eta\left(\mathbf{R}\,\mathbf{w}[k] + \mathbf{R}^T\mathbf{w}[k]\right)$$

(33)

The Oja algorithm

We still need knowledge of $\mathbf{R}$, so we substitute the instantaneous estimate $\mathbf{x}[k]\,\mathbf{x}^T[k]$ for $E\left(\mathbf{x}[k]\,\mathbf{x}^T[k]\right)$ (the same simplification used in the LMS and Robbins-Monro algorithms):

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \eta\,E\left(\mathbf{x}[k]\,\mathbf{x}^T[k]\right)\mathbf{w}[k] \;\;\longrightarrow\;\; \mathbf{w}[k+1] = \mathbf{w}[k] + \eta\,\mathbf{x}[k]\,\mathbf{x}^T[k]\,\mathbf{w}[k] = \mathbf{w}[k] + \eta\,\mathbf{x}[k]\,y[k],$$

where $y[k] = \mathbf{x}^T[k]\,\mathbf{w}[k] = \mathbf{w}^T[k]\,\mathbf{x}[k]$.

Stability can be proven by the Kushner-Clark theorem.

(34)

The Oja algorithm

Extending the algorithm with the Generalized Hebbian Algorithm (GHA), one can design a feed-forward neural network that captures all the eigenvectors of the correlation matrix.

Generalized Hebbian rule:
$$w_{ji}[k+1] = w_{ji}[k] + \eta\, y_j[k]\left(x_i[k] - \sum_{m=1}^{j} w_{mi}[k]\, y_m[k]\right)$$

(35)

The Oja algorithm with the Generalized Hebbian Algorithm (GHA)

• No knowledge of $\mathbf{R}$ is needed for the iteration.

• It can be implemented in a feed-forward neural network structure.

• For the $j$th neuron's $i$th weight:
$$w_{ji}[k+1] = w_{ji}[k] + \eta\, y_j[k]\left(x'_i[k] - w_{ji}[k]\, y_j[k]\right), \qquad x'_i[k] = x_i[k] - \sum_{m=1}^{j-1} w_{mi}[k]\, y_m[k]$$
(note that $x'_i$ is a function of $j$ as well).
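A sketch of one Generalized Hebbian (Sanger) update for a whole $M\times N$ weight matrix, folding the cumulative sum over $m \leq j$ into a lower-triangular matrix product; the vectorized form and names are assumptions, the rule itself is the one above:

```python
import numpy as np

def gha_step(W, x, eta=0.001):
    """One GHA update. W: M x N weight matrix (row j is neuron j's weights),
    x: data vector of length N. Rows of W converge to s^(1)..s^(M)."""
    y = W @ x                                            # outputs y_j = w_j^T x
    # row j of 'backprojection' is sum_{m<=j} y_m * w_m (the GHA correction term)
    backprojection = np.tril(np.ones((len(y), len(y)))) @ (y[:, None] * W)
    W += eta * y[:, None] * (x[None, :] - backprojection)
    return W
```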

(36)

Hebbian PCA

Hebbian learning:
$$\tilde{\mathbf{w}}[k+1] = \mathbf{w}[k] + \eta\, y[k]\,\mathbf{x}[k]$$

Normalization:
$$\mathbf{w}[k+1] = \frac{\tilde{\mathbf{w}}[k+1]}{\left\|\tilde{\mathbf{w}}[k+1]\right\|}$$

Expanding by Taylor series in $\eta$ (with $\|\mathbf{w}[k]\| = 1$ and $y[k] = \mathbf{w}^T[k]\,\mathbf{x}[k]$):
$$\left\|\tilde{\mathbf{w}}[k+1]\right\|^2 = \left\|\mathbf{w}[k]\right\|^2 + 2\eta\, y[k]\,\mathbf{w}^T[k]\,\mathbf{x}[k] + O(\eta^2) = 1 + 2\eta\, y^2[k] + O(\eta^2)$$
$$\frac{1}{\left\|\tilde{\mathbf{w}}[k+1]\right\|} = 1 - \eta\, y^2[k] + O(\eta^2)$$

Substituting back recovers the Oja rule:
$$\mathbf{w}[k+1] = \left(\mathbf{w}[k] + \eta\, y[k]\,\mathbf{x}[k]\right)\left(1 - \eta\, y^2[k]\right) + O(\eta^2) = \mathbf{w}[k] + \eta\, y[k]\left(\mathbf{x}[k] - y[k]\,\mathbf{w}[k]\right) + O(\eta^2)$$

(37)

Experiments

• An eigenvalue distribution of a small video stream.

• We experimented on two small video streams: the first illustrates a calm aquarium; the second is an action scene with quick changes.

• We present these sequences at different compression ratios using the kernel-based and the Hebbian PCA.

(38)

Experiments

• The video snippet consisted of 57 frames; a window length of 10 was used.

(39)

Experiments

[Figures: the original frame; reconstruction with all eigenvectors; with 5 eigenvectors; with 2 eigenvectors.]

(40)

Experiments

[Figures: the original frame; reconstruction with all eigenvectors; with 20 eigenvectors; with 15 eigenvectors.]

(41)

Experiments — error function:
$$\varepsilon^2 = E\left(\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2\right) = \sum_{i=M+1}^{N}\lambda_i$$

[Figure: reconstruction error as a function of the number of eigenvectors kept, for the "Aquarium" and the "Star Wars" sequences.]

(42)

Experiments

• Difference between the approximated and the true eigenvectors.

[Figure: eigenvector approximation error versus the number of eigenvectors, after 100 iterations and after 500 iterations.]

(43)

Problems with solutions

1. A simple eigenvalue problem:
$$\mathbf{R} = \begin{pmatrix} 0.5 & 0.5 & 0.5 \\ 0.5 & 1 & 0 \\ 0.5 & 0 & 1 \end{pmatrix}, \qquad \mathbf{R}\,\mathbf{s}^{(i)} = \lambda_i\,\mathbf{s}^{(i)}, \quad i = 1, 2, 3$$

$$\det\left(\lambda\mathbf{I} - \mathbf{R}\right) = 0, \qquad \left(\lambda\mathbf{I} - \mathbf{R}\right)\mathbf{s} = \mathbf{0}$$

$$\lambda_1 = 1.5, \qquad \lambda_2 = 1, \qquad \lambda_3 = 0$$

$$\mathbf{s}^{(1)} = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix}, \qquad \mathbf{s}^{(2)} = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\end{pmatrix}, \qquad \mathbf{s}^{(3)} = \frac{1}{\sqrt{6}}\begin{pmatrix}2\\-1\\-1\end{pmatrix}$$
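A quick check of this result (NumPy returns eigenvalues in ascending order, so they are flipped for comparison; eigenvector signs may differ):

```python
import numpy as np

R = np.array([[0.5, 0.5, 0.5],
              [0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
lam, S = np.linalg.eigh(R)
print(lam[::-1])          # ~ [1.5, 1.0, 0.0]
print(S[:, ::-1])         # columns ~ s^(1), s^(2), s^(3) up to sign
```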

(44)

Problems with solutions

2. An information flow, compressed from time window to time window (e.g. a video flow), has the following eigenvalues of the correlation matrix:
$$\lambda_m = m\cdot 2^{-m}, \qquad m = 1, \ldots, 1000$$

If we want the mean square error of the PCA compression to be at most 0.01, how many neurons should we implement in the coder?

(45)

Problems with solutions

• The mean square error is defined as:
$$E\left(\left\|\mathbf{x} - \hat{\mathbf{x}}\right\|^2\right) = \sum_{i=1}^{N}\lambda_i - \sum_{i=1}^{M}\lambda_i = \sum_{i=M+1}^{N}\lambda_i$$

• For $\lambda_i = i\cdot 2^{-i}$ the tail sum is $\sum_{i=M+1}^{1000} i\cdot 2^{-i} \approx (M+2)\,2^{-M}$, so
$$\sum_{i=11}^{1000} i\cdot 2^{-i} \approx 0.0117 > 0.01, \qquad \sum_{i=12}^{1000} i\cdot 2^{-i} \approx 0.0063 \leq 0.01$$

• So at least 11 neurons must be used to achieve the desired error level.
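A sketch that finds the smallest sufficient M numerically, under the eigenvalue model lambda_m = m * 2^(-m) read off above:

```python
import numpy as np

m = np.arange(1, 1001)
lam = m * 2.0 ** (-m)                       # lambda_m = m * 2^{-m}, already decreasing
tail = np.cumsum(lam[::-1])[::-1]           # tail[M] = sum_{i=M+1}^{1000} lambda_i
M = int(np.argmax(tail <= 0.01))            # smallest M meeting the QoS criterion
print(M, tail[M - 1], tail[M])              # -> 11  ~0.0117  ~0.0063
```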

(46)

Problems with solutions

3. Train the neuron below with the Oja algorithm, with the following parameters:
$$\mathbf{x} = \begin{bmatrix}1 & 0.5 & -1\end{bmatrix}^T, \qquad \mathbf{w}[0] = \begin{bmatrix}0 & 0 & 1\end{bmatrix}^T, \qquad \eta = 0.1$$

a) Compute the weights after the 2nd iteration: $\mathbf{w}[2] = \;?$

b) Can the following state be stable? Explain your answer!
$$\mathbf{w}[\infty] = \begin{bmatrix}2/3 & 1/3 & 2/3\end{bmatrix}^T$$

(47)

Problems with solutions

a) Compute the weights after the 2nd iteration.

Iteration 1 ($k = 0$):
$$y[0] = \mathbf{x}^T\mathbf{w}[0] = \begin{bmatrix}1 & 0.5 & -1\end{bmatrix}\begin{bmatrix}0\\0\\1\end{bmatrix} = -1$$
$$\mathbf{w}[1] = \mathbf{w}[0] + \eta\, y[0]\left(\mathbf{x} - y[0]\,\mathbf{w}[0]\right) = \begin{bmatrix}0\\0\\1\end{bmatrix} + 0.1\cdot(-1)\left(\begin{bmatrix}1\\0.5\\-1\end{bmatrix} + \begin{bmatrix}0\\0\\1\end{bmatrix}\right) = \begin{bmatrix}-0.1\\-0.05\\1\end{bmatrix}$$

(48)

Problems with solutions

a) Compute the weights after the 2nd iteration (continued).

Iteration 2 ($k = 1$):
$$y[1] = \mathbf{x}^T\mathbf{w}[1] = 1\cdot(-0.1) + 0.5\cdot(-0.05) + (-1)\cdot 1 = -1.125$$
$$\mathbf{w}[2] = \mathbf{w}[1] + \eta\, y[1]\left(\mathbf{x} - y[1]\,\mathbf{w}[1]\right) = \begin{bmatrix}-0.1\\-0.05\\1\end{bmatrix} + 0.1\cdot(-1.125)\left(\begin{bmatrix}1\\0.5\\-1\end{bmatrix} + 1.125\begin{bmatrix}-0.1\\-0.05\\1\end{bmatrix}\right) \approx \begin{bmatrix}-0.1998\\-0.0999\\0.9859\end{bmatrix}$$
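A short check of the two iterations above, replaying the Oja update with the problem's parameters as reconstructed:

```python
import numpy as np

x = np.array([1.0, 0.5, -1.0])
w = np.array([0.0, 0.0, 1.0])
eta = 0.1
for _ in range(2):                 # two Oja iterations
    y = w @ x
    w = w + eta * y * (x - y * w)
print(w)                           # -> approximately [-0.1998, -0.0999, 0.9859]
```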

(49)

Problems with solutions

b) Can the following state be stable? Explain your answer!
$$\mathbf{w}[\infty] = \begin{bmatrix}2/3 & 1/3 & 2/3\end{bmatrix}^T$$

Let us check whether it can be an eigenvector: every eigenvector must have unit norm and be orthonormal to the other eigenvectors. Since this vector has unit norm, it passes this check.

Let us solve the eigenproblem to be sure:
$$\mathbf{R} = \mathbf{x}\cdot\mathbf{x}^T = \begin{pmatrix}1 & 0.5 & -1\\ 0.5 & 0.25 & -0.5\\ -1 & -0.5 & 1\end{pmatrix}$$
$$\det\left(\lambda\mathbf{I} - \mathbf{R}\right) = 0 \;\Rightarrow\; \lambda_1 = 2.25 = \|\mathbf{x}\|^2, \qquad \lambda_2 = \lambda_3 = 0$$
$$\mathbf{s}^{(1)} = \frac{\mathbf{x}}{\|\mathbf{x}\|} = \begin{bmatrix}2/3 & 1/3 & -2/3\end{bmatrix}^T$$

The Oja iteration converges to the principal eigenvector up to sign, so the proposed state can be stable only if it equals $\pm\mathbf{s}^{(1)}$.
