
2.2 Time series representations

2.2.4 Principal component analysis

In the previous three subsections, representation methods created originally for univariate time series have been presented. Although such representations can be extended to multivariate time series, a multivariate time series is not only a set of univariate time series considered in the same time horizon. Instead, the correlation between the variables often characterizes them better than the actual values of the variables. This correlation can be treated as a hidden (latent) process and it is desirable to compare the time series based on it.

Figure 2.4. Dimension reduction with the help of PCA from 3 (filled dots) to 2 (textured dots) dimensions. Note how the correlation between the dots is preserved.

One of the most frequently applied tools to discover such a hidden process is principal component analysis (PCA) [40]. The main advantages of PCA are its optimal variable reduction property [41] and its capability to put the focus on the latent variables. These properties made it ideal for industrial applications dealing with a large number of variables [42].

The aim of PCA is to find the orthonormal $n \times n$ projection matrix $P_n$ that fulfills the following equations:

$$Y_n^T = P_n^T X_n^T, \qquad P_n^{-1} = P_n^T \tag{2.13}$$

where $X_n$ is the multivariate time series with length $m$ and $Y_n$ is the transformed data having a diagonal covariance matrix. The rows of $P_n$ form the new basis for representing $X_n$ and they are called principal components.

To solve Equation 2.13, PCA linearly transforms the original, $n$-dimensional data into a new, $p < n$-dimensional coordinate system with minimal loss of information in the least-squares sense. It calculates the eigenvectors and eigenvalues of the covariance (or correlation) matrix of the $n$-dimensional data and selects the $p$ largest eigenvalues ($\lambda_1, \dots, \lambda_p$) with the corresponding eigenvectors as the new basis.

Technically speaking, PCA finds the $p$ most significant orthogonal directions with the largest variance in the original dataset, as shown in Figure 2.4.
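The construction can be sketched in a few lines of Python. The snippet below (using NumPy; the function name pca_basis and all variable names are illustrative, not taken from the original text) computes the eigendecomposition of the covariance matrix, sorts the eigenvalues in decreasing order and keeps the first $p$ eigenvectors as the new basis:

```python
import numpy as np

def pca_basis(X, p):
    """Eigendecomposition-based PCA sketch: return the first p principal components.

    X is an m x n data matrix (m samples, n variables); names are illustrative.
    """
    Xc = X - X.mean(axis=0)               # center each variable
    C = np.cov(Xc, rowvar=False)          # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh suits the symmetric covariance matrix
    order = np.argsort(eigvals)[::-1]     # decreasing eigenvalue order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_p = eigvecs[:, :p]                  # new basis: the p retained principal components
    Y_p = Xc @ U_p                        # p-dimensional projection of the data
    return U_p, Y_p, eigvals
```

By construction, the sample covariance of the projected data $Y_p$ is the diagonal matrix of the retained eigenvalues, in line with the requirement of Equation 2.13.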

Measures of PCA models

One of the reasons PCA is used is its variable reduction property, i.e. when only the first $p < n$ principal components are used for the projection. To measure the accuracy of the PCA model, the percentage of the captured variance can be used:

$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{n} \lambda_i}, \tag{2.14}$$

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the covariance matrix of $X_n$. Knowing what percentage of the variance the PCA model should explain, the number of retained principal components can be determined.
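As a small illustration of Equation 2.14, the following sketch (reusing the eigenvalue vector returned by the hypothetical pca_basis function above) computes the captured variance for a given $p$ and selects the smallest $p$ that reaches a desired threshold; the 95% threshold is only an example, not a value prescribed by the text:

```python
import numpy as np

def captured_variance(eigvals, p):
    """Fraction of the total variance captured by the first p components (Eq. 2.14)."""
    return eigvals[:p].sum() / eigvals.sum()

def choose_p(eigvals, threshold=0.95):
    """Smallest p whose captured variance reaches the (example) threshold."""
    ratios = np.cumsum(eigvals) / eigvals.sum()   # cumulative captured variance
    return int(np.searchsorted(ratios, threshold) + 1)
```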

Another popular method to select the appropriate number of components is to plot the eigenvalues against the component number, i.e. to draw the scree plot. Usually, the first few principal components describe the major part of the variance, while the remaining eigenvalues are relatively small. Looking for an “elbow” on the scree plot, the number of principal components can also be determined.
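A minimal scree-plot sketch, assuming Matplotlib is available and reusing the eigenvalue vector from the earlier sketch:

```python
import matplotlib.pyplot as plt

def scree_plot(eigvals):
    """Plot the eigenvalues against the component number and look for an 'elbow'."""
    components = range(1, len(eigvals) + 1)
    plt.plot(components, eigvals, "o-")
    plt.xlabel("component number")
    plt.ylabel("eigenvalue")
    plt.title("Scree plot")
    plt.show()
```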

When the PCA model has an adequate number of dimensions, it can be assumed that the distance of the data from the $p$-dimensional space of the PCA model results from measurement failures, disturbances and negligible information. Hence, it is useful to analyze the reconstruction error of the projection. It can be computed for the $j$th data point of the time series $X_n$ as follows:

$$Q(j) = \left(X_n(j) - \hat{X}_n(j)\right)\left(X_n(j) - \hat{X}_n(j)\right)^T = X_n(j)\left(I - U_{X_n,p} U_{X_n,p}^T\right) X_n(j)^T, \tag{2.15}$$

where $\hat{X}_n(j)$ is the $j$th predicted value of $X_n$, $I$ is the identity matrix and $U_{X_n,p}$ is the matrix of eigenvectors. These eigenvectors belong to the $p < n$ most important eigenvalues of the covariance matrix of $X_n$, i.e. they are the principal components.
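Equation 2.15 can be evaluated for all data points at once. The sketch below assumes that the data matrix has already been centered with the same mean that was used to build the PCA model, and that U_p is the $n \times p$ matrix of retained eigenvectors; both names are illustrative:

```python
import numpy as np

def q_statistic(Xc, U_p):
    """Q reconstruction error of each (centered) data point, following Eq. 2.15."""
    n = Xc.shape[1]
    residual = Xc @ (np.eye(n) - U_p @ U_p.T)         # part of each point outside the PCA subspace
    return np.einsum("ij,ij->i", residual, residual)  # row-wise squared Euclidean norm
```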

The analysis of the distribution of the projected data is also informative. Hotelling's $T^2$ statistic is often used to calculate the distance of the mapped data from the center of the linear subspace. For the $j$th point, its formula is the following:

$$T^2(j) = Y_p(j) Y_p(j)^T, \tag{2.16}$$

where $Y_p(j)$ is the lower, $p$-dimensional representation of $X_n(j)$.
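Following Equation 2.16 literally, Hotelling's $T^2$ of a point is the squared norm of its $p$-dimensional representation. A common variant additionally scales each coordinate by the corresponding eigenvalue, but the sketch below sticks to the formula as stated above:

```python
import numpy as np

def t2_statistic(Y_p):
    """Hotelling's T^2 of each projected point as in Eq. 2.16 (squared norm of the row)."""
    return np.einsum("ij,ij->i", Y_p, Y_p)
```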

Figure 2.5 shows the $Q$ and the $T^2$ measures in the case of a 2-variable, 11-element time series. The original time series data points are represented by grey ellipses and the black spot marks the intersection of the axes of the principal components, i.e. the center of the space that was defined by these principal components.

Figure 2.5. Measures of PCA models: Hotelling's $T^2$ statistic and the $Q$ reconstruction error.

PCA-based similarity measures

Krzanowski [43] defined a similarity measure, called the PCA similarity factor, by comparing the principal components (i.e. the new, dimensionality-reduced coordinate systems) as follows:

$$s_{PCA}(X_n, Y_n) = \frac{\operatorname{trace}\left(U_{X_n,p}^T \, U_{Y_n,p} \, U_{Y_n,p}^T \, U_{X_n,p}\right)}{p}, \tag{2.17}$$

where $X_n$ and $Y_n$ are the two multivariate time series with $n$ variables, and $U_{X_n,p}$ and $U_{Y_n,p}$ denote the matrices of eigenvectors that belong to the $p < n$ most important eigenvalues of the covariance matrices of $X_n$ and $Y_n$, i.e. the two new bases of the projections of $X_n$ and $Y_n$.

The PCA similarity factor has a geometrical interpretation, because it measures the similarity between the two new bases by computing the squared cosine values between all combinations of the first $p$ principal components from $U_{X_n,p}$ and $U_{Y_n,p}$:

$$s_{PCA}(X_n, Y_n) = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \cos^2 \Theta_{i,j}, \tag{2.18}$$

where $\Theta_{i,j}$ is the angle between the $i$th principal component of $X_n$ and the $j$th principal component of $Y_n$.
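The equivalence of Equations 2.17 and 2.18 is easy to check numerically: the entries of $U_{X_n,p}^T U_{Y_n,p}$ are exactly the cosines $\cos \Theta_{i,j}$, so the trace in Equation 2.17 equals the sum of their squares. A minimal sketch, assuming the two $n \times p$ eigenvector matrices are available:

```python
import numpy as np

def pca_similarity(U_X, U_Y):
    """PCA similarity factor (Eq. 2.17) for two n x p bases U_X and U_Y."""
    p = U_X.shape[1]
    cosines = U_X.T @ U_Y                       # p x p matrix of cos(Theta_ij)
    return np.trace(cosines @ cosines.T) / p    # = (1/p) * sum of cos^2(Theta_ij), cf. Eq. 2.18
```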

Although the PCA similarity factor makes it possible to compare time series based on the latent variables, i.e. the “rotation” of the principal components, it weights all principal components equally, although the principal components do not describe the variance equally. Johannesmeyer [44] therefore modified the PCA similarity factor and weighted each pair of principal components according to the variance it explains, i.e. by the corresponding eigenvalues $\lambda_i^{X_n}$ and $\lambda_j^{Y_n}$ of the $i$th and $j$th principal components of $X_n$ and $Y_n$.

Yang and Shahabi [45] went one step further and presented an extension of the PCA similarity factor called Eros (Extended Frobenius norm). Eros compares only the principal components of the same importance (i.e. the $i$th component with the $i$th component) and, instead of their eigenvalues, uses a weighting vector that is learned from a learning set. For two $n$-variable time series $X_n$, $Y_n$ and the corresponding eigenvector matrices $U_{X_n}$ and $U_{Y_n}$, Eros is defined as:

$$s_{Eros}(X_n, Y_n) = \sum_{i=1}^{n} w_i \left| \cos \Theta_i \right|,$$

where $\Theta_i$ is the angle between the two corresponding ($i$th) principal components and $w_i$ is the $i$th element of the weighting vector, which is based on the eigenvalues of the sequences in the learning set.
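A minimal sketch of Eros under the assumptions above: the eigenvectors of the two series are stored column-wise in $n \times n$ matrices, and w is a weighting vector (typically normalized to sum to one) derived from the eigenvalues of the sequences in the learning set; all names are illustrative:

```python
import numpy as np

def eros(U_X, U_Y, w):
    """Eros similarity: weighted sum of |cos Theta_i| between corresponding eigenvectors."""
    cosines = np.abs(np.einsum("ij,ij->j", U_X, U_Y))  # column-wise dot products of corresponding eigenvectors
    return float(np.sum(w * cosines))
```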

Although Eros proved its superiority over the standard PCA similarity factor, and an indexing structure was even introduced for it [46], it still carries the burden of all PCA-based similarity measures: none of the presented methods can follow the alteration of the correlation structure over time and thus they cannot handle the fluctuation of the latent variables. Correlation-based dynamic time warping (CBDTW), presented in Section 3.1, addresses this drawback of all PCA-based similarity measures by utilizing dynamic time warping and correlation-based segmentation.