
3 | BASIC CONCEPTS


Figure 3.9: Sample feature distribution with observations belonging to two possible classes (indicated by blue and red).

$(-\infty, x]$ and $(x, \infty)$, with $x$ being either 1 or 3. As a first step, we need to calculate the contingency tables for the two potential locations of splitting $X$.

The contingency tables for the two cases are included in Table 3.6, and they inform us about the number of feature values that fall into a given range of $X$, broken down by class label.

Table 3.6: Sample dataset for illustrating feature discretization using mutual information (counts broken down by class label $Y$).

We can now calculate the different mutual information scores corresponding to the cases of discretizing $X$ using the criteria $X \le 1$ or $X \le 3$. By denoting the discretized random variable that we derive from $X$ by using a threshold $t \in \{1, 3\}$ for creating the bins by $X_t$, we get that

We arrived at the above results by applying Eq. (3.8) to the data derived from the contingency tables in Table 3.6. Based on the values obtained for $MI(X_1, Y)$ and $MI(X_3, Y)$, we can conclude that, according to our sample, performing discretization using the threshold $t = 1$ is the better choice. Can you find another data point in Figure 3.9 that performs even better as a boundary for discretization than the choice of $t = 1$?
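How such scores are obtained from a contingency table can be sketched in Octave. The counts below are hypothetical placeholders rather than the values of Table 3.6, and the sketch assumes the usual definition of mutual information with base-2 logarithms, which may differ from Eq. (3.8) in the choice of logarithm base.

```octave
# Hypothetical contingency tables (NOT the counts of Table 3.6).
# Rows: bins of the discretized feature; columns: class labels.
tables = {[4 0; 0 4], [2 2; 2 2], [3 1; 1 3]};

mi = zeros(1, numel(tables));
for k = 1:numel(tables)
  counts = tables{k};
  joint = counts / sum(counts(:));   # joint distribution of (bin, class)
  p_bin = sum(joint, 2);             # marginal over bins (column vector)
  p_cls = sum(joint, 1);             # marginal over classes (row vector)
  indep = p_bin * p_cls;             # joint distribution under independence
  nz = joint > 0;                    # empty cells contribute nothing
  mi(k) = sum(joint(nz) .* log2(joint(nz) ./ indep(nz)));
end
mi   # one score per table; a larger score means a more informative split
```

The first table separates the classes perfectly and the second one not at all, so their scores mark the two extremes between which a real threshold would fall.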

3.4 Eigenvectors and eigenvalues

The concepts of eigenvectors and eigenvalues frequently recur in our later discussion of various topics, e.g. dimensionality reduction (Chapter 6) and graph-based data mining approaches (Chapter 8). To ease the understanding of those parts, we briefly revisit these important concepts here.

Given a square matrix $M \in \mathbb{R}^{n \times n}$, an eigenvalue–eigenvector pair for $M$ satisfies the following equality

$$Mx = \lambda x, \tag{3.9}$$

for some scalar $\lambda$ and nonzero $n$-dimensional vector $x$. Note that any $n \times n$ matrix has $n$ (not necessarily distinct) eigenvalues, counted with multiplicity, each with an accompanying eigenvector.

To this end, we shall index the different eigenvalues and eigenvectors of a matrix as $\lambda_i, x_i$ ($1 \le i \le n$). In the general case, the $\lambda_i$ values can be complex as well; however, the matrices that we consider in this book can always be assumed to have real eigenvalues.

Intuitively, an eigenvector of some matrix $M$ is a vector whose direction is not altered, up to a possible reflection, when it is multiplied by $M$. Although the direction of an eigenvector remains intact, its magnitude can change. The factor by which an eigenvector is scaled is exactly its corresponding eigenvalue. This means that if an eigenvector $x$ has a corresponding eigenvalue of 2, then all the components of the matrix–vector product $Mx$ are twice the corresponding coordinates of the original vector $x$.
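The scaling behavior described above can be checked numerically. The following is a minimal Octave sketch; the matrix A and the two test vectors are illustrative assumptions, not taken from the text.

```octave
# Illustrative matrix (an assumption for this sketch, not from the text).
A = [2 1; 0 3];

x = [1; 0];          # an eigenvector of A with eigenvalue 2
Ax = A * x           # [2; 0], i.e. exactly 2 * x

y = [0; 1];          # not an eigenvector of A
Ay = A * y           # [1; 3], which is not a multiple of y

# A 2-d vector v keeps its direction iff the 2x2 matrix [v, A*v] is singular.
x_keeps_direction = abs(det([x, Ax])) < 1e-12
y_keeps_direction = abs(det([y, Ay])) < 1e-12
```

Only the first vector is mapped onto a multiple of itself, so only the first flag evaluates to true.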

It turns out that the eigenvalues of matrix $M$ are exactly those values $\lambda$ that satisfy $\det(M - \lambda I) = 0$, where $I$ denotes the identity matrix and $\det$ refers to the determinant of its argument.

We can see that the definition of an eigenvalue according to Eq. (3.9) implies that the equation

$$(M - \lambda I)x = 0 \tag{3.10}$$

also has to hold, with $0$ marking the vector of all zeros. This kind of homogeneous system of linear equations can be trivially solved by $x = 0$. This solution is nonetheless a trivial one, which works for any $M$. If we wish to avoid obtaining such a trivial, hence uninteresting, solution, the rows of $(M - \lambda I)$ must not be linearly independent.

Had $(M - \lambda I)$ been of full rank (meaning that it consisted of linearly independent rows), the only solution satisfying Eq. (3.10) would be the trivial one. The way to ensure that $(M - \lambda I)$ is not of full rank is to require $\det(M - \lambda I) = 0$ to hold. The determinant of $M - \lambda I$ can be viewed as a polynomial in $\lambda$, hence the eigenvalues of $M$ are the roots of the polynomial that we obtain from the determinant of the matrix $M - \lambda I$. This polynomial is called the characteristic polynomial of $M$.

Essentially, the eigenvalues $\lambda$ of $M$ are the roots of the characteristic polynomial derived from $M$.
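The link between a vanishing determinant, rank deficiency and nontrivial solutions of Eq. (3.10) can be sketched in Octave as follows. The matrix A and the variable names are illustrative assumptions; A is upper triangular, so its eigenvalues 2 and 3 can be read off its diagonal.

```octave
# Illustrative upper-triangular matrix with eigenvalues 2 and 3.
A = [2 1; 0 3];
Id = eye(2);

# For lambda = 2 (an eigenvalue), A - lambda*Id is rank deficient.
det_at_eig  = det(A - 2 * Id)
rank_at_eig = rank(A - 2 * Id)       # 1, so nontrivial solutions exist
x = null(A - 2 * Id);                # unit-norm basis of the solution space
residual = norm((A - 2 * Id) * x)    # numerically zero

# For lambda = 5 (not an eigenvalue), A - lambda*Id has full rank.
det_off_eig  = det(A - 5 * Id)
rank_off_eig = rank(A - 5 * Id)      # 2, so only x = 0 solves the system
```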

Example 3.5. As a concrete example, let us determine the eigenpairs of the matrix

$$M = \begin{bmatrix} 5 & 1 \\ 4 & 8 \end{bmatrix}.$$

Based on the previous discussion, the eigenvalues of $M$ need to be such that the determinant of the matrix

$$M - \lambda I = \begin{bmatrix} 5 - \lambda & 1 \\ 4 & 8 - \lambda \end{bmatrix}$$

equals zero. Calculating the determinant of a $2 \times 2$ matrix is easy, as all we need to do is multiply the elements on its main diagonal and subtract the product of the off-diagonal elements, leaving us with the polynomial

$$p(\lambda) = (5 - \lambda)(8 - \lambda) - 1 \cdot 4 = \lambda^2 - 13\lambda + 36.$$
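As a sanity check of the expansion, the polynomial can be compared against the determinant numerically. This is a minimal Octave sketch; the sample values of lambda are arbitrary choices made for illustration.

```octave
M = [5 1; 4 8];
coeffs = [1 -13 36];                 # lambda^2 - 13*lambda + 36

# Compare p(lambda) with det(M - lambda*I) on a few sample values.
lambdas = [0 1 4 6.5 9 10];
max_gap = 0;
for lambda = lambdas
  gap = abs(polyval(coeffs, lambda) - det(M - lambda * eye(2)));
  max_gap = max(max_gap, gap);
end
max_gap                              # numerically zero

# poly(M) returns the coefficients of det(lambda*I - M), which for a
# 2x2 matrix coincides with det(M - lambda*I).
poly_coeffs = poly(M)
```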

Finding the roots of the above quadratic polynomial, we get that $\lambda_1 = 4$ and $\lambda_2 = 9$. If we substitute back, we get that the eigenvector $x$ accompanying the eigenvalue $\lambda = 4$ has to fulfil

$$\begin{bmatrix} 1 & 1 \\ 4 & 4 \end{bmatrix} x = 0,$$

which is a system of two linear equations in two unknowns.

Since we deliberately constructed our system of equations such that its rows are linearly dependent, we are free to set one of the variables in $x$ to any value. Let us hence arbitrarily set $x_2$ to 1.

Because of the requirements $x_1 + x_2 = 0$ and $4x_1 + 4x_2 = 0$ in our system of equations, this means that $x_1 = -x_2$, so the eigenvector corresponding to the eigenvalue 4 is

$$x = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$

Note, however, that any vector of the form $\begin{bmatrix} -c \\ c \end{bmatrix}$ with $c \neq 0$ would yield a valid solution to the above linear system of equations. In order to avoid this ambiguity, a common practice is to report eigenvectors such that they have unit norm. That is, the canonical way of reporting the eigenvector that we just found is to divide all of its components by its norm, i.e., $\sqrt{2}$ in the given example. Hence, one of the eigenvalue–eigenvector pairs for matrix $M$ is

$$\lambda = 4, \quad x = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \approx \begin{bmatrix} -0.70711 \\ 0.70711 \end{bmatrix}.$$

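After a similar line of thought, we get for the other eigenvalue that

$$\begin{bmatrix} -4 & 1 \\ 4 & -1 \end{bmatrix} x = 0$$

means that the eigenvector corresponding to the eigenvalue 9 for matrix $M$ is $\begin{bmatrix} 1 \\ 4 \end{bmatrix}$, or in its unit-normalized canonical form

$$\begin{bmatrix} 1/\sqrt{17} \\ 4/\sqrt{17} \end{bmatrix} \approx \begin{bmatrix} 0.24254 \\ 0.97014 \end{bmatrix}.$$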
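The two eigenpairs of this example can be verified directly against Eq. (3.9). The following is a minimal Octave sketch; the variable names are chosen here for illustration only.

```octave
M = [5 1; 4 8];

# Eigenpairs of M = [5 1; 4 8] as they follow from Eq. (3.10).
lambda1 = 4;  x1 = [-1; 1] / sqrt(2);
lambda2 = 9;  x2 = [ 1; 4] / sqrt(17);

# Both residuals M*x - lambda*x should vanish up to rounding error.
res1 = norm(M * x1 - lambda1 * x1)
res2 = norm(M * x2 - lambda2 * x2)

# Both vectors have unit norm.
norms = [norm(x1), norm(x2)]

# Any nonzero multiple is an eigenvector too, e.g. the reflection -3*x1.
res_scaled = norm(M * (-3 * x1) - lambda1 * (-3 * x1))
```

The last line illustrates why a normalization convention is needed: scaling or reflecting an eigenvector leaves Eq. (3.9) satisfied.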

Let us now also see how we can obtain the eigenpairs of matrix $M$ using Octave. Figure 3.10 and Figure 3.11 provide two alternative ways of doing so.


Figure 3.10 illustrates the usage of the convenient built-in Octave function for solving eigenproblems. By default, the function eig returns the eigenvalues of its argument in the form of a vector. If we also want to know the eigenvectors corresponding to the particular eigenvalues, we can request two return values from the function instead of one. In that case, the matrix returned second contains the eigenvalues of the argument on its main diagonal, and the matrix returned first contains one (unit-normalized) eigenvector in each of its columns. The eigenvalue at a given diagonal position of the second returned matrix corresponds to the eigenvector in the same column of the first returned matrix.

M = [5 1; 4 8];

# obtaining only the eigenvalues of M
eigenvals = eig(M)

>> eigenvals =

   4
   9

# obtaining both the eigenvectors and eigenvalues of M
[eigenvecs, eigenvals] = eig(M)

>> eigenvecs =

  -0.70711  -0.24254
   0.70711  -0.97014

eigenvals =

Diagonal Matrix

   4   0
   0   9

CODE SNIPPET

Figure 3.10: Eigencalculation using Octave
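The pairing between the columns of the first returned matrix and the diagonal entries of the second one can be checked programmatically. The following is a minimal Octave sketch; the names V, D and lambdas are chosen here for illustration, and the sketch does not rely on any particular ordering of the eigenvalues.

```octave
M = [5 1; 4 8];

# Two-output form: columns of V are eigenvectors, diag(D) the eigenvalues.
[V, D] = eig(M);
lambdas = diag(D);

# Column k of V belongs to lambdas(k): check M*v = lambda*v pair by pair.
residuals = zeros(1, numel(lambdas));
for k = 1:numel(lambdas)
  residuals(k) = norm(M * V(:, k) - lambdas(k) * V(:, k));
end
residuals                            # both entries numerically zero

# The same statement in matrix form, plus the unit norm of each column.
matrix_gap = norm(M * V - V * D)
column_norms = sqrt(sum(V .^ 2, 1))
```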

We provide an alternative way of calculating the eigenpairs of the same matrix $M$ in Figure 3.11. Notice how the provided calculation connects to the calculation in Example 3.5. In Figure 3.11, the function roots is used for finding the roots of a polynomial given by its coefficients in order of decreasing degree, and the function null finds (unit-normalized) vectors in the null space of its argument. Recall that the null space of some matrix $A$ consists of those vectors $x$ for which $Ax = 0$ holds.

Can you anticipate the values of eig_vals, eig_vec1 and eig_vec2 in Figure 3.11? Hint: looking back at Example 3.5 and Figure 3.10 can help a lot in giving the right answer (potentially up to a multiplier of -1 for eig_vec1 and eig_vec2).

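M = [5 1; 4 8];
# eigenvalues as the roots of the characteristic polynomial
eig_vals = roots([1 -13 36]);
# eigenvectors as unit-norm vectors spanning the null space of M - lambda*I
eig_vec1 = null(M - eig_vals(1)*eye(size(M)));
eig_vec2 = null(M - eig_vals(2)*eye(size(M)));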

CODE SNIPPET

Figure 3.11: An alternative way of determining the eigenpairs of matrix M without relying on the built-in Octave function eig.

3.5 Summary of the chapter

This chapter introduced the basic concepts and fundamental techniques for data representation and transformation in data mining.

At the end of the chapter, we revisited the problem of eigencalculation for matrices, a technique that we will refer to multiple times throughout the remainder of the book.

In document DATA MINING, GÁBOR BEREND (pages 50-55)