
2.2 Time series representations

2.2.4 Principal component analysis

In the previous three subsections, representation methods created originally for univariate time series have been presented. Although such representations can be extended to multivariate time series, a multivariate time series is not only a set of univariate time series considered in the same time horizon. Instead, the correlation between the variables often characterizes them better than the actual values of the variables. This correlation can be treated as a hidden (latent) process and it is desirable to compare the time series based on it.

Figure 2.4. Dimension reduction with the help of PCA from 3 (filled dots) to 2 (textured dots) dimensions. Note how the correlation between the dots is preserved.

One of the most frequently applied tools to discover such a hidden process is principal component analysis (PCA) [40]. The main advantages of PCA are its optimal variable reduction property [41] and its capability to put the focus on the latent variables. These properties made it ideal for industrial applications dealing with a large number of variables [42].

The aim of PCA is to find the orthonormal $n \times n$ projection matrix $P_n$ that fulfills the following equations:

$$Y_n^T = P_n^T X_n^T, \qquad P_n^{-1} = P_n^T \tag{2.13}$$

where $X_n$ is the multivariate time series with length $m$ and $Y_n$ is the transformed data having a diagonal covariance matrix. The rows of $P_n$ form the new basis for representing $X_n$ and they are called principal components.

To solve Equation 2.13, PCA linearly transforms the original, $n$-dimensional data into a new, $p < n$-dimensional coordinate system with minimal loss of information in the least-squares sense. It calculates the eigenvectors and eigenvalues of the covariance (or correlation) matrix of the $n$-dimensional data and selects the $p$ largest eigenvalues ($\lambda_1, \dots, \lambda_p$) with the corresponding eigenvectors as the new basis.

Technically speaking, PCA finds the $p$ most significant orthogonal directions with the largest variance in the original dataset, as shown in Figure 2.4.
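The construction can be sketched in a few lines of Python. The snippet below (using NumPy; the function name pca_basis and all variable names are illustrative, not taken from the original text) computes the eigendecomposition of the covariance matrix, sorts the eigenvalues in decreasing order and keeps the first $p$ eigenvectors as the new basis:

```python
import numpy as np

def pca_basis(X, p):
    """Eigendecomposition-based PCA sketch: return the first p principal components.

    X is an m x n data matrix (m samples, n variables); names are illustrative.
    """
    Xc = X - X.mean(axis=0)               # center each variable
    C = np.cov(Xc, rowvar=False)          # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh suits the symmetric covariance matrix
    order = np.argsort(eigvals)[::-1]     # decreasing eigenvalue order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_p = eigvecs[:, :p]                  # new basis: the p retained principal components
    Y_p = Xc @ U_p                        # p-dimensional projection of the data
    return U_p, Y_p, eigvals
```

By construction, the sample covariance of the projected data $Y_p$ is the diagonal matrix of the retained eigenvalues, in line with the requirement of Equation 2.13.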

Measures of PCA models

One of the reasons PCA is used is its variable reduction property, i.e. when only the first $p < n$ principal components are used for the projection. To measure the accuracy of the PCA model, the percentage of the captured variance can be used:

$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i}, \tag{2.14}$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the covariance matrix of $X_n$. Knowing what percentage of the variance the PCA model should explain, the number of retained principal components can be determined.
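As a small illustration of Equation 2.14, the following sketch (reusing the eigenvalue vector returned by the hypothetical pca_basis function above) computes the captured variance for a given $p$ and selects the smallest $p$ that reaches a desired threshold; the 95% threshold is only an example, not a value prescribed by the text:

```python
import numpy as np

def captured_variance(eigvals, p):
    """Fraction of the total variance captured by the first p components (Eq. 2.14)."""
    return eigvals[:p].sum() / eigvals.sum()

def choose_p(eigvals, threshold=0.95):
    """Smallest p whose captured variance reaches the (example) threshold."""
    ratios = np.cumsum(eigvals) / eigvals.sum()   # cumulative captured variance
    return int(np.searchsorted(ratios, threshold) + 1)
```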

Another popular method to select the appropriate number of components is to plot the eigenvalues against the component number, i.e. to draw the scree plot. Usually, the first few principal components describe the major part of the variance, while the remaining eigenvalues are relatively small. Looking for an “elbow” on the scree plot, the number of principal components can also be determined.
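A minimal scree-plot sketch, assuming Matplotlib is available and reusing the eigenvalue vector from the earlier sketch:

```python
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Plot the eigenvalues against the component number and look for an 'elbow'."""
    components = range(1, len(eigvals) + 1)
    plt.plot(components, eigvals, "o-")
    plt.xlabel("component number")
    plt.ylabel("eigenvalue")
    plt.title("Scree plot")
    plt.show()
```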

When the PCA model has an adequate number of dimensions, it can be assumed that the distance of the data from the $p$-dimensional space of the PCA model results from measurement failures, disturbances and negligible information. Hence, it is useful to analyze the reconstruction error of the projection. It can be computed for the $j$th data point of the time series $X_n$ as follows:

$$Q(j) = \left(X_n(j) - \hat{X}_n(j)\right)\left(X_n(j) - \hat{X}_n(j)\right)^T = X_n(j)\left(I - U_{X_n,p} U_{X_n,p}^T\right) X_n(j)^T, \tag{2.15}$$

where $\hat{X}_n(j)$ is the $j$th predicted value of $X_n$, $I$ is the identity matrix and $U_{X_n,p}$ is the matrix of eigenvectors. These eigenvectors belong to the $p < n$ most important eigenvalues of the covariance matrix of $X_n$, i.e. they are the principal components.
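Equation 2.15 can be evaluated for all data points at once. The sketch below assumes that the data matrix has already been centered with the same mean that was used to build the PCA model, and that U_p is the $n \times p$ matrix of retained eigenvectors; both names are illustrative:

```python
import numpy as np

def q_statistic(Xc, U_p):
    """Q reconstruction error of each (centered) data point, following Eq. 2.15."""
    n = Xc.shape[1]
    residual = Xc @ (np.eye(n) - U_p @ U_p.T)         # part of each point outside the PCA subspace
    return np.einsum("ij,ij->i", residual, residual)  # row-wise squared Euclidean norm
```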

The analysis of the distribution of the projected data is also informative. Hotelling's $T^2$ statistic is often used to calculate the distance of the mapped data from the center of the linear subspace. For the $j$th point, its formula is the following:

$$T^2(j) = Y_p(j) Y_p(j)^T, \tag{2.16}$$

where $Y_p(j)$ is the lower, $p$-dimensional representation of $X_n(j)$.
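Following Equation 2.16 literally, Hotelling's $T^2$ of a point is the squared norm of its $p$-dimensional representation. A common variant additionally scales each coordinate by the corresponding eigenvalue, but the sketch below sticks to the formula as stated above:

```python
import numpy as np

def t2_statistic(Y_p):
    """Hotelling's T^2 of each projected point as in Eq. 2.16 (squared norm of the row)."""
    return np.einsum("ij,ij->i", Y_p, Y_p)
```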

Figure 2.5 shows the $Q$ and the $T^2$ measures in the case of a 2-variable, 11-element time series. The original time series data points are represented by grey ellipses and the black spot marks the intersection of the axes of the principal components, i.e. the center of the space that was defined by these principal components.

Figure 2.5. Measures of PCA models: Hotelling's $T^2$ statistic and the $Q$ reconstruction error.

PCA-based similarity measures

Krzanowski [43] defined a similarity measure, called the PCA similarity factor, by comparing the principal components (i.e. the new, dimensionality-reduced coordinate systems) as follows:

$$s_{PCA}(X_n, Y_n) = \frac{\operatorname{trace}\left(U_{X_n,p}^T \, U_{Y_n,p} \, U_{Y_n,p}^T \, U_{X_n,p}\right)}{p}, \tag{2.17}$$

where $X_n$ and $Y_n$ are the two multivariate time series with $n$ variables, and $U_{X_n,p}$ and $U_{Y_n,p}$ denote the matrices of eigenvectors that belong to the $p < n$ most important eigenvalues of the covariance matrices of $X_n$ and $Y_n$, i.e. the two new bases of the projections of $X_n$ and $Y_n$.

The PCA similarity factor has a geometrical interpretation, because it measures the similarity between the two new bases by computing the squared cosine values between all combinations of the first $p$ principal components from $U_{X_n,p}$ and $U_{Y_n,p}$:

$$s_{PCA}(X_n, Y_n) = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \cos^2 \Theta_{i,j}, \tag{2.18}$$

where $\Theta_{i,j}$ is the angle between the $i$th principal component of $X_n$ and the $j$th principal component of $Y_n$.
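The equivalence of Equations 2.17 and 2.18 is easy to check numerically: the entries of $U_{X_n,p}^T U_{Y_n,p}$ are exactly the cosines $\cos \Theta_{i,j}$, so the trace in Equation 2.17 equals the sum of their squares. A minimal sketch, assuming the two $n \times p$ eigenvector matrices are available:

```python
import numpy as np

def pca_similarity(U_X, U_Y):
    """PCA similarity factor (Eq. 2.17) for two n x p bases U_X and U_Y."""
    p = U_X.shape[1]
    cosines = U_X.T @ U_Y                       # p x p matrix of cos(Theta_ij)
    return np.trace(cosines @ cosines.T) / p    # = (1/p) * sum of cos^2(Theta_ij), cf. Eq. 2.18
```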

Although the PCA similarity factor makes it possible to compare time series based on the latent variables, i.e. the “rotation” of the principal components, it weights all principal components equally, although the principal components do not describe the variance equally. Johannesmeyer [44] therefore modified the PCA similarity factor and weighted each pair of principal components according to the variance it explains, i.e. by the corresponding eigenvalues $\lambda_i^{X_n}$ and $\lambda_j^{Y_n}$ of the $i$th and $j$th principal components of $X_n$ and $Y_n$.

Yang and Shahabi [45] went one step further and presented an extension of the PCA similarity factor called Eros (Extended Frobenius norm). Eros compares only the principal components of the same importance (i.e. the $i$th component with the $i$th component) and, instead of their eigenvalues, uses a weighting vector that is learned from a learning set. For two $n$-variable time series $X_n$, $Y_n$ and the corresponding eigenvector matrices $U_{X_n}$ and $U_{Y_n}$, Eros is defined as:

$$s_{Eros}(X_n, Y_n) = \sum_{i=1}^{n} w_i \left| \cos \Theta_i \right|,$$

where $\Theta_i$ is the angle between the two corresponding ($i$th) principal components and $w_i$ is the $i$th element of the weighting vector, which is based on the eigenvalues of the sequences in the learning set.
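A minimal sketch of Eros under the assumptions above: the eigenvectors of the two series are stored column-wise in $n \times n$ matrices, and w is a weighting vector (typically normalized to sum to one) derived from the eigenvalues of the sequences in the learning set; all names are illustrative:

```python
import numpy as np

def eros(U_X, U_Y, w):
    """Eros similarity: weighted sum of |cos Theta_i| between corresponding eigenvectors."""
    cosines = np.abs(np.einsum("ij,ij->j", U_X, U_Y))  # column-wise dot products of corresponding eigenvectors
    return float(np.sum(w * cosines))
```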

Although Eros proved its superiority over the standard PCA similarity factor, and an indexing structure was even introduced for it [46], it still carries the burden of all PCA-based similarity measures: none of the presented methods can follow the alteration of the correlation structure over time and thus they cannot handle the fluctuation of the latent variables. Correlation-based dynamic time warping (CBDTW), presented in Section 3.1, addresses this drawback of all PCA-based similarity measures by utilizing dynamic time warping and correlation-based segmentation.