A Method to Process Images Data and Prediction Models for some MapReduce Applications

Ph.D. Dissertation

by

YangYuan Li

Supervisor

Prof. Van Tien Do

Budapest University of Technology and Economics

Budapest, Hungary 2020


I, the undersigned, hereby declare that I prepared this Ph.D. dissertation myself, and I only used the sources given at the end. Every part that was quoted word for word, or was taken over with the same content, I marked explicitly by giving the reference of the source.

The reviews of the dissertation and the report of the thesis discussion are available at the Dean's Office of the Faculty of Electrical Engineering and Informatics of the Budapest University of Technology and Economics.

Budapest, February 19, 2020

YangYuan Li

Abstract

Big data analytics is widely applied in diverse production fields for retrieving useful information and making decisions. Dimensionality reduction in high-dimensional data processing and energy consumption estimation for a specific computing job are both challenging problems that precede big data analytics. This thesis has two purposes: (1) to develop an efficient dimensionality reduction algorithm for 2D unlabeled image data, and (2) to characterize MapReduce applications for big data analytics and, further, to establish models for predicting resource dependency patterns and future usage parameters. In the dissertation, an analytical methodology is adopted to reach the first aim and a practical methodology is applied to reach the second one.

The dissertation is composed of six chapters, each dealing with a different aspect of this study.

Chapter 1 is introductory and presents the motivation, problems, objectives, and research methodology of the dissertation.

Chapter 2 proposes a novel unsupervised 2-dimensional dimensionality reduction method with graph embedding, which incorporates similarity matrix learning and global discriminative information into the procedure of dimensionality reduction. Furthermore, an efficient optimization algorithm is developed to realize the proposed method, and the convergence property of this algorithm is proven. The proposed method is compared with several 2-dimensional unsupervised dimensionality reduction methods, and the clustering performance is evaluated by K-means on several benchmark data sets. The obtained results show that the proposed method outperforms the state-of-the-art methods.

Chapter 3 characterizes several MapReduce applications by investigating and identifying the significant correlation coefficients of resource usage parameters and their lagged variables. The inter-dependency and inner-dependency of resource usage parameters are analyzed. The experimental results show that the identified signatures can be used to categorize MapReduce applications.

Chapter 4 concentrates on establishing multiple linear regression models for several MapReduce applications from the usage parameters perspective and on identifying the minimal data sampling time for stable modeling. The analytical results show that resource-intensive characteristics play an important role both in forecasting the resource dependency pattern and in determining the minimal number of data samples for stable modeling.

Chapter 5 establishes and compares resource usage parameter forecasting models for four benchmark MapReduce applications using multiple linear regression (MLR) and long short-term memory (LSTM), respectively. Simultaneously, the impact of data sample size on the prediction performance of long short-term memory models is investigated. The results show that LSTM models with a sufficient sample size exhibit higher prediction accuracy than multiple linear regression models, and that the resource-intensive characteristics are closely related to prediction accuracy. Additionally, in order to alleviate the underfitting problem, a two-phase modeling approach is proposed and validated on a read/write-intensive application. Moreover, extensive prediction based on a local LSTM model is proposed for a write-intensive application. The experimental results show that the prediction value obtained from the local model, multiplied by the standard disk I/O rate ratio, might be used to predict usage parameters on a heterogeneous machine for a write-intensive application.

The main conclusions and the applicability of the results are drawn in Chapter 6.

Contents

Abstract i

List of Figures viii

List of Tables ix

Acknowledgements xi

1 Introduction 1

1.1 Motivations . . . 1

1.1.1 High dimensionality reduction . . . 1

1.1.2 Characterization of MapReduce application . . . 2

1.1.3 Resource dependency pattern modeling . . . 2

1.1.4 Usage parameters prediction . . . 2

1.2 Objectives . . . 3

1.3 Thesis structure . . . 3

2 Discriminative Unsupervised 2D Dimensionality Reduction with Graph Embedding 5

2.1 Introduction . . . 6

2.2 Related works . . . 6

2.2.1 Unsupervised dimensionality reduction . . . 7

2.2.2 Spectral clustering . . . 7

2.3 The proposed method . . . 8

2.3.1 The DUGE algorithm . . . 11

2.3.2 The solution of parameter γ and matrix P . . . 12

2.3.3 Convergence analysis of algorithm . . . 14

2.4 Experimental analysis . . . 16

2.4.1 Data sets . . . 16

2.4.2 Evaluation metrics . . . 17

2.4.3 Comparison methods . . . 17


2.4.4 Experiment settings . . . 18

2.4.5 Experimental results . . . 18

2.4.6 Convergence analysis . . . 19

2.5 Conclusion . . . 20

3 Characterization of MapReduce Applications 21

3.1 Introduction . . . 22

3.2 Technical overview . . . 23

3.2.1 Apache Hadoop . . . 23

3.2.2 MapReduce programming model . . . 23

3.2.3 MapReduce application catalog . . . 24

3.2.4 Time series data . . . 24

3.3 Experimental environment and data collection . . . 25

3.3.1 Experimental environment . . . 25

3.3.2 Data collection . . . 25

3.4 Evaluation . . . 26

3.4.1 Explore non-randomness . . . 26

3.4.2 Non-randomness identification . . . 26

3.4.3 Correlated characteristic analysis . . . 29

3.4.4 Correlation matrix . . . 29

3.4.5 Evaluation and discussions . . . 31

3.5 Conclusion . . . 35

4 Multiple Linear Regression Models for MapReduce Applications 37

4.1 Introduction . . . 38

4.2 Technical Overview . . . 39

4.2.1 Multiple linear regression methods . . . 39

4.2.2 Collinearity problem . . . 39

4.2.3 K-fold cross-validation . . . 39

4.2.4 Subset selection approach . . . 40

4.3 Data collection . . . 40

4.4 Models . . . 41

4.4.1 Multiple linear regression model . . . 41

4.4.2 Data autocorrelation pattern . . . 42

4.4.3 Linear relationship investigation . . . 43

4.4.4 Feasibility test . . . 44

4.4.5 Modeling implementation . . . 45


4.4.6 Validation of autocorrelation of error terms . . . 48

4.5 The minimal sampling time for stable modeling . . . 49

4.5.1 Experimental design . . . 49

4.5.2 The minimal sampling time of estimated coefficient . . . 49

4.5.3 The minimal sampling time of statistical metrics . . . 50

4.6 Evaluation and discussion . . . 51

4.6.1 Estimated coefficients and statistical metrics . . . 51

4.6.2 The minimal sampling time for stable modeling . . . 53

4.7 Conclusion . . . 54

5 LSTM Models to Forecast Usage Parameters of MapReduce Applications 57

5.1 Introduction . . . 58

5.2 Related works . . . 59

5.3 Models . . . 60

5.3.1 Multiple linear regression model . . . 60

5.3.1.1 Identifying significant autoregressive term . . . 60

5.3.1.2 Multiple linear regression methods . . . 62

5.3.2 Multivariate LSTM model . . . 63

5.3.2.1 Hyperparameters learning algorithm . . . 64

5.3.2.2 Prediction algorithm . . . 67

5.4 Results and Discussion . . . 67

5.4.1 Environment and data collection . . . 68

5.4.2 CPU usage prediction comparison . . . 68

5.4.3 Prediction accuracy and impact of sample size . . . 70

5.4.3.1 Performance baseline model . . . 70

5.4.3.2 Prediction accuracy comparison . . . 70

5.4.3.3 Impact of sample size . . . 72

5.4.3.4 Overfitting and underfitting evaluation . . . 74

5.4.3.5 Two-phase modeling approach . . . 76

5.4.4 Characteristics of LSTM . . . 78

5.5 Conclusions . . . 81

6 Summary 83

Own Publication 84

Bibliography 86

List of Figures

2.1 Clustering ACC and NMI of DUGE on 4 data sets . . . 19

2.2 Convergence curve of DUGE on Coil20 dataset . . . 20

3.1 Autocorrelation plot and autocovariance plot for Pi application . . . 27

3.2 Autocorrelation plot and autocovariance plot for Terasort application . . . . 28

3.3 Correlation coefficient of CPU-intensive application . . . 32

3.4 Correlation coefficient of read-intensive application . . . 33

3.5 Correlation coefficient of write-intensive application . . . 33

3.6 Correlation coefficient of read/write-intensive application . . . 34

4.1 ACF and PACF plot of resource usage of Terasort application . . . 43

4.2 Residual ACF plot of regression model of Terasort . . . 48

4.3 Estimated coefficients distribution of read rate model of Terasort application . . . 50

4.4 Statistical metrics distribution of read rate model of Terasort application . . . 51

4.5 Estimated coefficients distribution of regression models . . . 52

4.6 RSE and R2 distribution of regression models . . . 53

4.7 The minimal sampling time of MapReduce applications . . . 54

4.8 The minimal sampling time of statistical metrics . . . 54

5.1 ACF and PACF plot of resource usage of Wordcount application . . . 61

5.2 Illustration of long short-term memory networks model . . . 63

5.3 Common module for calculating the mean of validation RMSE for each forecasting model . . . 65

5.4 Hyperparameters learning algorithm . . . 66

5.5 Dropout rate learning algorithm . . . 67

5.6 Time series graphs of real value vs prediction on CPU usage . . . 69

5.7 NRMSE of forecasting models for CPU usage . . . 71

5.8 NRMSE of forecasting models for predicting memory usage . . . 71

5.9 NRMSE of forecasting models for predicting read rate . . . 72

5.10 NRMSE of forecasting models for predicting write rate . . . 72


5.11 Sensitivity comparison of CPU usage forecasting models . . . 73

5.12 Sensitivity comparison of memory usage forecasting models . . . 74

5.13 Sensitivity comparison of read rate forecasting models . . . 74

5.14 Sensitivity comparison of write rate forecasting models . . . 75

5.15 Overfitting vs Underfitting . . . 75

5.16 Map phase vs Reduce phase . . . 76

5.17 Overall modeling vs separated phase modeling . . . 77

5.18 Real usage parameters comparison of Teragen between two machines . . . . 78

5.19 LSTM model using scale factor for CPU usage prediction of Teragen . . . . 80

5.20 LSTM model using scale factor for write rate prediction of Teragen . . . 81

List of Tables

2.1 Notation summary . . . 8

2.2 Description of data sets . . . 16

2.3 Clustering result in terms of accuracy . . . 18

2.4 Clustering result in terms of normalized mutual information . . . 19

3.1 Correlation matrix of Pi application . . . 29

3.2 Correlation matrix of Wordcount application . . . 30

3.3 Correlation matrix of Teragen application . . . 30

3.4 Correlation matrix of Terasort application . . . 30

3.5 Categorized threshold of correlation coefficient . . . 31

4.1 The largest partial autocorrelations and the corresponding lag numbers of applications . . . 44

4.2 Correlation matrix of Terasort application . . . 44

4.3 The increasing percentage of base error rate . . . 45

4.4 Multiple linear regression models . . . 47

5.1 Abbreviation form of mentioned regression models . . . 62

5.2 The abbreviation expressions of LSTM models . . . 68

5.3 Standard specification of two machines . . . 68

5.4 Abbreviation expressions of persistence models . . . 70

5.5 Notations of some variables . . . 79

5.6 Usage parameters mean on heterogeneous machines . . . 80

Acknowledgements

I would like to thank all the people who provided valuable assistance during my studies towards the Ph.D. degree.

I would like to express my sincere gratitude to Prof. Dr. Do Van Tien for his intensive supervision. Prof. Dr. Do Van Tien guided the direction of my research in the early period and taught me a lot about the vital ability to approach and solve problems. Without his continuous supervision and straight criticism, I could not have grown to accomplish this study and achieve the Ph.D. degree. I deeply thank Ms. Tran Thi Xuan, one of my colleagues in the Analysis, Design and Development of ICT Systems Laboratory at our department, for her cooperation and enthusiastic support throughout my research. All the other members of the Analysis, Design and Development of ICT Systems Laboratory, my colleagues, and the university staff are acknowledged.

Finally, I offer my heartfelt thanks to my parents, wife, and daughter for their love and constant encouragement. I am also grateful to all family members and friends who have supported me throughout.


Chapter 1

Introduction

1.1 Motivations

With the rapid development of digital technology and information systems, huge and complex repositories with terabytes (even petabytes) of data are being generated explosively. Usually, this kind of repository is called big data and has the characteristics of huge volume, high velocity, and enormous variety [88]. Consequently, such large and complex datasets challenge traditional database management and data processing tools.

As a consensus, images and image sequences (videos) make up about 80 percent of all corporate and public unstructured big data [83]. An inherent property of image data is its high dimensionality, which tends to make traditional statistical analysis approaches inapplicable. Therefore, dimensionality reduction methods play a crucial role in alleviating the curse of dimensionality, in advance of data analytics, in many fields such as multimedia event detection, image partitioning, video category recognition, gene expression, and time series prediction.

Furthermore, numerous tools are available for processing big data, such as Apache Hadoop, a collection of open-source software utilities that uses computing clusters to process massive amounts of data and computation. MapReduce is a key component of Hadoop for parallel data processing; fault-tolerant storage and high-throughput data processing are its highlight characteristics [1]. Following the MapReduce paradigm, numerous MapReduce applications have been developed for big data analytics jobs. However, the uncertainty of the resource dependency pattern and the demands of a specific computing job often lead to low efficiency and wasted energy on a specific distributed computing platform. Therefore, the characterization of MapReduce applications for categorization, resource dependency pattern modeling, and accurate usage parameter prediction are in great need, and they play a crucial role in resource allocation and scheduling strategies from the perspective of a cluster/cloud operator.

The following four subsections present the problem statements of the dissertation.

1.1.1 High dimensionality reduction

High-dimensional data processing is a crucial challenge in many fields such as multimedia event detection, image partitioning, video category recognition, gene expression, and time series prediction [115, 133]. The typical solution for alleviating this problem is to implement dimensionality reduction methods in advance of data analytics. In past decades, a number of dimensionality reduction methods have been proposed [4, 50, 18, 128]. The conventional clustering methods often rely on representations of the relationships among data points. Clustering is then accomplished by spectral or other graph-theoretic optimization procedures [76, 96]. However, most of the spectral-based clustering methods only focus on the local structures and ignore the global discriminative information of the data, which may lead to overfitting and degrade the clustering performance [121, 125]. Thus, a discriminative 2D unsupervised dimensionality reduction method is needed to solve these problems.

1.1.2 Characterization of MapReduce application

The most established software platform for big data analytics is Apache Hadoop, which has been widely applied in cloud computing. Originally, Hadoop was designed for computer clusters built from commodity hardware. It provides the MapReduce programming model for parallel data processing and the Hadoop Distributed File System (HDFS) for data storage [1].

Based on the MapReduce programming model, various MapReduce applications with different resource demands have been developed. Unfortunately, these resource demands differ greatly according to the computing goal, yet cloud operators and the relevant consumers cannot know these demands in advance, which may result in inappropriate resource allocation or reservation. Therefore, identifying the characteristics of workloads on a data analytics platform prior to execution could help a cluster owner or operator actively control computing resources, which can be beneficial for power saving and service performance improvement.

1.1.3 Resource dependency pattern modeling

MapReduce applications have been widely applied to big data analytics. The resource allocation of a computing job has a huge influence on the efficiency of a computing cluster. However, the unknown characteristics of resource dependency often lead to lower efficiency and a waste of the system's computing resources. To avoid these problems, modeling the dependency pattern among the resource usage parameters of MapReduce applications is crucially needed from the viewpoint of cloud operators. Many studies have used multiple linear regression (MLR) to predict performance metrics of MapReduce applications [61, 130, 123, 34]. However, the performance models constructed in these works mainly focus on predicting the execution time of Hadoop jobs, and none of them can be used for effective resource utilization by both users/consumers and service providers in the cloud.

1.1.4 Usage parameters prediction

The prediction of usage parameters constitutes one of the particularly significant tasks in the operation of computing clusters. MapReduce applications are widely used for processing huge amounts of data in both public and private clouds. Cloud providers can manage their resource usage by deriving future usage demand from the current and past usage patterns of resources. Therefore, accurately forecasting usage parameters is of great importance for the dynamic scaling of cloud resources, to achieve efficiency in terms of cost and energy consumption while keeping the quality of service. Although multiple linear regression (MLR) [61, 85, 123, 34] and long short-term memory (LSTM) [28, 106, 129, 13, 122] are widely applied to time series prediction in many fields, few studies have applied them to forecast the resource usage parameters of MapReduce applications.


1.2 Objectives

The main objectives of the dissertation include:

• the development of an efficient 2D unsupervised dimensionality reduction method for 2D unlabeled image data to alleviate overfitting and improve clustering performance,

• the characterization of MapReduce applications,

• the development of models to predict usage parameters of several MapReduce ap- plications.

In the dissertation, I applied mathematical analysis and statistical methods to investigate these problems.

1.3 Thesis structure

In Chapter 2, a novel unsupervised 2-dimensional dimensionality reduction method, which incorporates similarity matrix learning and global discriminative information into the procedure of dimensionality reduction, is proposed [J1]. This discriminative graph-embedding 2D unsupervised dimensionality reduction method learns projection matrices that are useful for clustering. Instead of using a predetermined similarity matrix to characterize the local structures of the original 2D data, the proposed approach involves the similarity matrix learning in the procedure of dimensionality reduction. Inspired by the observation that isolated local structures may incur overfitting and degrade the clustering performance, we integrated the global discriminative information into the proposed method. An iterative optimization algorithm is then derived to solve the proposed minimization problem. The convergence of the proposed algorithm is analyzed in theory and in experiments. Both the theoretical analysis and the experimental performance indicate the effectiveness and superiority of the proposed method. We compare the proposed method with several 2-dimensional unsupervised dimensionality reduction methods and evaluate the clustering performance by K-means [38] on several benchmark data sets. The obtained results show that the proposed method outperforms the state-of-the-art methods.

Chapter 3 characterizes MapReduce applications by analyzing and extracting the inter-dependency and inner-dependency relationships among resource usage parameters and their significant historic usage [J4]. The extracted characteristics might further be used to categorize applications based on the correlation coefficients of resource usage parameters. Simultaneously, the non-randomness of each usage parameter was investigated by calculating the relevant autocorrelation and autocovariance. As a result, the significant lagged variable of each usage parameter is identified based on the observations of non-randomness. By calculating Pearson correlation coefficients, the inter-dependency of resource usage parameters, including the autocorrelation of each usage parameter and the correlations among resource usage parameters, was investigated and analyzed. Based on the analysis, the specific groups of correlation coefficients for categorizing MapReduce applications were identified.

This work can be of much use for the efficient scheduling of MapReduce applications on commercial computing clouds, and it also helps cloud-service providers to predict the resource usage of their systems.
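As a rough illustration of this characterization workflow, the following Python sketch computes the lag-1 autocorrelation of each usage parameter and the Pearson correlation matrix between the parameters and their lagged copies; the data and column names are synthetic stand-ins, not the thesis's measurement schema.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for sampled usage parameters of one MapReduce run;
# the column names are illustrative assumptions only.
usage = pd.DataFrame(rng.random((600, 4)),
                     columns=["cpu", "memory", "read_rate", "write_rate"])

# Non-randomness check: lag-1 autocorrelation of each usage parameter.
# A significant value suggests keeping the lagged variable as a signature.
print(usage.apply(lambda s: s.autocorr(lag=1)))

# Pearson correlation matrix over the parameters and their lag-1 copies,
# covering both inner-dependency (a parameter vs. its own past) and
# inter-dependency (between different parameters).
lagged = usage.shift(1).add_suffix("_lag1")
print(pd.concat([usage, lagged], axis=1).corr(method="pearson").round(2))
```

On real measurements, the resulting coefficient groups would then be compared against categorization thresholds, as done in Chapter 3.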

In Chapter 4, resource usage dependency models are established for 7 benchmark MapReduce applications using multiple linear regression methods, and the minimal number of data samples for stable modeling is identified [J2]. Due to the significant autocorrelation, the associated autoregressive term is included after analyzing the non-randomness of resource usage. Based on intuitive observation of the correlation matrix and precise calculation of the base error rate improvement, the feasibility of linear regression modeling was ensured. Then, multiple linear regression models for the resource dependency pattern are established. The estimated coefficients of each model come from the ordinary least squares approach. The measurements of R2 and the residual standard error (RSE) are used to evaluate model fit quality. In addition, the minimal sampling time for stable modeling was investigated and identified by observing whether the change rates of the estimated coefficients and statistical metrics converged below a threshold (0.1). The numerical results showed that resource-intensive characteristics play an important role in forecasting the resource dependency pattern as well as in the minimal data sampling time for stable modeling.
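The following sketch shows, on assumed synthetic data, how such a dependency model can be fitted with ordinary least squares, including a lag-1 autoregressive term and the R2/RSE fit metrics named above; the column names and model specification are illustrative, not the exact models of Chapter 4.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic stand-in for sampled resource usage of one MapReduce
# application; column names are illustrative assumptions.
df = pd.DataFrame(rng.random((500, 4)),
                  columns=["cpu", "memory", "read_rate", "write_rate"])

# Autoregressive term: the significant lag-1 value of the target itself,
# mirroring the chapter's use of the non-randomness analysis.
df["write_rate_lag1"] = df["write_rate"].shift(1)
df = df.dropna()

# Ordinary least squares fit of the resource dependency pattern.
X = sm.add_constant(df[["cpu", "memory", "read_rate", "write_rate_lag1"]])
fit = sm.OLS(df["write_rate"], X).fit()

print(fit.params)                       # estimated coefficients
print("R^2 :", fit.rsquared)            # fit-quality metric
print("RSE :", np.sqrt(fit.mse_resid))  # residual standard error
```

Refitting on growing prefixes of the samples and watching when the change rate of these outputs stays below the 0.1 threshold would reproduce the minimal-sampling-time test described above.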

Chapter 5 establishes forecasting models (multivariate long short-term memory and multiple linear regression) for the future resource usage of four MapReduce applications with distinct resource-intensive characteristics [C2, C1][J3]. To effectively evaluate the prediction accuracy and feasibility of both models, the NRMSE (normalized root mean squared error) and a performance baseline model were used. Meanwhile, the impact of sample size on prediction accuracy was also investigated. Moreover, a two-phase modeling approach was proposed for a read/write-intensive application (Terasort) to alleviate serious underfitting issues. Based on the scaled up/down characteristics of usage parameters on heterogeneous machines, an extensive prediction approach based on a local model is proposed for a write-intensive application. The results show that models using long short-term memory with a sufficient sample size exhibit higher accuracy than those using multiple linear regression, and that the resource-intensive characteristics are closely related to the prediction accuracy of the forecasting models.
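As a rough illustration of the evaluation protocol, the sketch below trains a small univariate Keras LSTM (the thesis's models are multivariate) and compares its NRMSE against a persistence baseline; the architecture, hyperparameters, and data are assumptions for demonstration only.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
series = rng.random(1000).astype("float32")   # stand-in usage time series

# Windowed supervised samples: predict the next value from the last `lag`.
lag = 10
X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])[..., None]
y = series[lag:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(lag, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.2),   # dropout rate is a tuned hyperparameter
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# NRMSE: RMSE normalized by the observed range; the persistence model
# (next value = current value) serves as the performance baseline.
pred = model.predict(X, verbose=0).ravel()
span = y.max() - y.min()
nrmse = np.sqrt(np.mean((pred - y) ** 2)) / span
baseline = np.sqrt(np.mean((series[lag - 1:-1] - y) ** 2)) / span
print(nrmse, baseline)
```

A model is only useful when its NRMSE beats the persistence baseline, which is the sense in which the baseline establishes feasibility above.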

Finally, Chapter 6 summarizes the conclusions of the dissertation and describes the applicability of the new results.


Chapter 2

Discriminative Unsupervised 2D Dimensionality Reduction with Graph Embedding


2.1 Introduction

High-dimensional data processing is applied in many fields such as multimedia event detection, multimedia event recounting, video category recognition, gene expression, and time series prediction [115, 133]. Dimensionality reduction is a typical method to increase the data processing rate. In recent years, many one-dimensional dimensionality reduction methods have been proposed. However, the process of transforming 2-dimensional data matrices into one-dimensional vectors may destroy the structure of the data. Thus, 2-dimensional (2D) dimensionality reduction methods are considered a helpful substitute.

Since it is difficult to obtain label information, 2D unsupervised dimensionality reduction methods attract increasing attention. In the conventional 2D unsupervised dimensionality reduction methods, the similarity matrix plays an important role because of its efficiency and comprehensibility [4, 50, 18, 128]. Usually, similarity learning and dimensionality reduction are conducted in two separate steps, so the performance of dimensionality reduction highly depends on the quality of similarity learning. However, due to noise in the collected data, the learned similarity matrix may not be the optimal one [10, 59, 118]. Therefore, efforts have been devoted to finding a similarity matrix that captures the structures of the data.

For example, Nie et al. [76] proposed to learn the data similarity matrix by assigning adaptive nearest neighbors to each data point. Du and Shen [24] considered both the global and local structures of data and performed adaptive structure learning. Kodirov et al. [27] integrated graph learning into the 1-norm graph regularized optimization problem for robust subspace clustering.

Another crucial aspect of unsupervised dimensionality reduction is spectral clustering, which is closely related to the similarity matrix. The conventional clustering methods often rely on representations of the relationships among data points. Clustering is then accomplished by spectral or other graph-theoretic optimization procedures [76, 96]. However, most of the spectral-based clustering methods only focus on the local structures and ignore the global discriminative information of the data, which may lead to overfitting and degrade the clustering performance [121, 125].

In this chapter, we propose a discriminative 2D unsupervised dimensionality reduction method, named Discriminative Unsupervised 2D Dimensionality Reduction with Graph Embedding (DUGE). DUGE mitigates the negative impact of a predetermined similarity matrix. After transforming the 2-dimensional data matrices into the corresponding 1-dimensional vectors, the proposed method incorporates the global discriminative information of the data distribution. Extensive experiments are conducted on several real-world benchmark data sets to validate the proposed method. The experimental results show that our method outperforms the state-of-the-art methods.

The structure of this chapter is as follows. Section 2.2 gives an overview of the state of the art in similarity matrix based unsupervised dimensionality reduction methods and spectral clustering. In section 2.3, we propose Discriminative Unsupervised 2D Dimensionality Reduction with Graph Embedding (DUGE). Section 2.4 presents extensive experimental results. Conclusions are drawn in section 2.5.

2.2 Related works

Our work is mainly related to unsupervised dimensionality reduction. Particularly, we focus on 2D unsupervised dimensionality reduction with similarity matrix learning and spectral clustering learning. In this section, we first introduce some state-of-the-art similarity matrix based unsupervised dimensionality reduction methods, and then we extend the discussion to spectral clustering.

2.2.1 Unsupervised dimensionality reduction

Principal Component Analysis (PCA) is widely applied for its efficiency and comprehensibility. Most of the PCA-based 1D dimensionality reduction methods first transform each data sample into a vector and then construct a covariance matrix to extract features [16, 42, 51, 101, 66, 132]. However, when the dimensionality of the samples is high, it is difficult to calculate the covariance matrix [59, 69]. Thus, to cope with this limitation of conventional PCA, Yang et al. [117] proposed a 2-dimensional principal component analysis (2DPCA) method that computes the covariance matrix on the original 2D image matrices. Because the image covariance matrix is smaller than the original covariance matrix, the time needed to extract image features is small. Nevertheless, 2DPCA only works in the row direction, so it cannot fully reduce the dimensionality of the feature space. Therefore, Zhang et al. developed a 2-dimensional PCA model that simultaneously considers the row and column directions [21].

Although great progress has been achieved, the PCA-based 2D dimensionality reduction methods might fail to obtain a desirable subspace representation when the distance between two clusters is shorter than the intra-cluster distance [120]. In unsupervised dimensionality reduction methods, the local structures of the data distribution attract more and more attention; for example, locality preserving projections (LPP) describe the adjacency relationships of data points by constructing a similarity matrix [40, 116, 118]. Later, Hu et al. [42] extended the conventional LPP model to its 2D version by predetermining the adjacency relations in the original 2D image space. Once the similarity graph is determined, it stays fixed in the subsequent procedures, and the performance of the dimensionality reduction is mainly determined by the similarity matrix [4, 119, 118]. Owing to the existence of noise in the original data, the LPP-based methods might fail to construct an ideal similarity graph [56, 76, 95]. Nie et al. proposed a 1-dimensional adaptive-neighbors clustering algorithm called Clustering and Projected Clustering with Adaptive Neighbors (PCAN), which learns the similarity matrix and the clustering structure simultaneously [76]. On the basis of PCAN, Wang et al. proposed a Discriminative Unsupervised Dimensionality Reduction (DUDR) method, which also constructs the similarity matrix in the procedure of dimensionality reduction [108]. Zhao et al. proposed an Unsupervised 2D Dimensionality Reduction with Adaptive Structure Learning (DRASL) method, which constructs the similarity matrix by learning the local structures of the 2D image space in the dimensionality reduction process [131].

2.2.2 Spectral clustering

Spectral clustering is used to learn local geometric structures. The general formula of spectral clustering can be written as

$$\min_{F^T F = I} \mathrm{Tr}(F^T L F), \tag{2.1}$$

where $L$ is the Laplacian matrix [15]. Most of the spectral-based clustering algorithms construct the similarity matrix ahead of dimensionality reduction. To cope with noise in the data, Nie et al. proposed the Clustering and Projected Clustering with Adaptive Neighbors method, which incorporates spectral clustering learning [76]. Local Learning-based Clustering (LLC) utilizes a kernel regression model for label prediction, based on the assumption that the class label of a data point can be determined by its neighbors [114, 113]. However, these methods only focus on the learning of local structures, which may induce overfitting under certain conditions. To deal with this problem, Yang et al. proposed a Nonnegative Spectral Clustering with Discriminative Regularization (NSDR) method, which takes both the local structures and the global structures into account [125]. The objective function of NSDR can be formulated as

$$\min_{F^T F = I} \mathrm{Tr}(F^T L F) + \xi\,\Omega(F), \tag{2.2}$$

where $\xi \ge 0$ is a regularization parameter, $\mathrm{Tr}[F^T L F]$ is associated with the local structures, and $\Omega(F)$ contains the global discriminative information.

Although (2.2) is easily applied to obtain the relaxed cluster indicator, the corresponding similarity matrix is still constructed on the original data. Therefore, noise may cause problems for the conventional spectral-based clustering algorithms.
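To make (2.1) concrete, the following minimal numpy sketch computes the relaxed cluster indicator $F$ as the $c$ eigenvectors of the graph Laplacian belonging to the smallest eigenvalues; the similarity matrix here is a random stand-in rather than a learned one.

```python
import numpy as np

rng = np.random.default_rng(0)
N, c = 100, 3

# Stand-in nonnegative similarity matrix, symmetrized; in DUGE this
# matrix is learned rather than fixed in advance.
P = rng.random((N, N))
W = (P + P.T) / 2.0

# Unnormalized graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# min_{F^T F = I} Tr(F^T L F) is solved by the c eigenvectors of L
# belonging to the c smallest eigenvalues (Ky Fan's theorem).
eigvals, eigvecs = np.linalg.eigh(L)
F = eigvecs[:, :c]
print(F.shape, eigvals[:c])
```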

2.3 The proposed method

Let $X_1, X_2, \dots, X_N$ denote a 2D image data set, where the $N$ data points are sampled from $c$ clusters. $P_{ij}$ represents the similarity between data points $X_i$ and $X_j$ ($X_i, X_j \in \mathbb{R}^{m \times n}$, $i, j = 1, 2, \dots, N$). Let $U \in \mathbb{R}^{m \times u}$ and $V \in \mathbb{R}^{n \times v}$ be the row-directional and column-directional projection matrices, respectively.

Notation : Description

$N$ : the number of data points
$c$ : cluster number
$L$ : Laplacian matrix
$P$ : similarity matrix
$P_i$ : the $i$-th column of matrix $P$ $(i = 1, 2, \dots, N)$
$I$ : identity matrix
$\mathbf{1} \in \mathbb{R}^N$ : a column vector with all elements equal to 1
$\mathrm{Tr}(\cdot)$ : trace operator
$r(\cdot)$ : rank operator
$t$ : the iteration step in the DUGE algorithm
$\gamma$ : a regularization parameter of the penalty term
$f_i$ : the row vector of the $i$-th point in the cluster indicator matrix
$F$ : the cluster indicator matrix
$\Omega(F)$ : the global discriminative information
$\lambda$ : a value large enough to keep the $c$ smallest eigenvalues of $L$ equal to zero
$\xi$ : a regularization parameter for the trade-off of the global discriminative information
$S_b$ : between-cluster scatter matrix
$S_t$ : total scatter matrix
$\hat{X}_i$ : the vector form of the 2D image data
$\mu$ : the mean of all data points in $\hat{X}$

Table 2.1. Notation summary


The dimensionality reduction problem is formulated [68, 25] as

$$\min_{P,U,V} \sum_{i,j=1}^{N} \left( \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij} + \gamma P_{ij}^2 \right)$$
$$\text{s.t.}\quad U^T U = I,\; V^T V = I,\; P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1,\; 1 \le i, j \le N, \tag{2.3}$$

where $P = [P_{ij}] \in \mathbb{R}^{N \times N}$. In order to avoid a trivial solution, the penalty term $\gamma \sum_{i,j=1}^{N} P_{ij}^2$ is imposed in (2.3), where $\gamma$ is a regularization parameter [76].

Assume that each data point is given a function value $f_i \in \mathbb{R}^{1 \times c}$. From [31], we have

$$\sum_{i,j=1}^{N} \|f_i - f_j\|_2^2 \, P_{ij} = 2\,\mathrm{Tr}(F^T L F), \tag{2.4}$$

where $F = [f_1^T, f_2^T, \dots, f_N^T]^T \in \mathbb{R}^{N \times c}$, $L = D - \frac{P^T + P}{2}$ is the Laplacian matrix, and $D$ is a diagonal matrix with $D_{ii} = \sum_j (P_{ij} + P_{ji})/2$.

According to [17, 70], the number of connected components in the graph associated with the similarity matrix $P$ is equal to the multiplicity $c$ of the eigenvalue $0$ when $P$ is nonnegative. If $r(L) = N - c$, the data points can be assigned to $c$ clusters, so we can add this rank constraint to (2.3). Therefore, the problem can be written as

$$\min_{P,U,V} \sum_{i,j=1}^{N} \left( \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij} + \gamma P_{ij}^2 \right)$$
$$\text{s.t.}\quad P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1,\; 1 \le i, j \le N,\; U^T U = I,\; r(L) = N - c. \tag{2.5}$$

Let $\sigma_i \ge 0$ denote the $i$-th smallest eigenvalue of $L$. If $\sum_{i=1}^{c} \sigma_i = 0$, then $r(L) = N - c$ holds. Moreover, according to Ky Fan's theorem [31], we have

$$\sum_{i=1}^{c} \sigma_i = \min_{F \in \mathbb{R}^{N \times c},\, F^T F = I} \mathrm{Tr}(F^T L F). \tag{2.6}$$

Therefore, (2.5) is equivalent to [131]

$$\min_{P,U,V,F} \sum_{i,j=1}^{N} \left( \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij} + \gamma P_{ij}^2 + \lambda \|f_i - f_j\|_2^2 \, P_{ij} \right)$$
$$\text{s.t.}\quad U^T U = I,\; V^T V = I,\; P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1,\; 1 \le i, j \le N,\; F^T F = I, \tag{2.7}$$

where $\lambda$ is a value large enough to keep the $c$ smallest eigenvalues of $L$ equal to zero.

Suppose $\hat{X} = [\hat{X}_1, \hat{X}_2, \dots, \hat{X}_N]$, where $\hat{X}_i \in \mathbb{R}^M$ $(i = 1, \dots, N)$ is the vector form of the 2D data matrix $X_i$ and $M = m \times n$. $H = I - \frac{1}{N}\mathbf{1}\mathbf{1}^T$ is the centering matrix. The between-cluster scatter matrix and the total scatter matrix can be obtained as [125]:

$$S_b = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T = \tilde{X} F F^T \tilde{X}^T, \tag{2.8}$$

$$S_t = \sum_{i=1}^{N} (\hat{X}_i - \mu)(\hat{X}_i - \mu)^T = \tilde{X} \tilde{X}^T, \tag{2.9}$$

where $\mu_i$ is the mean of the data points in the $i$-th cluster, $\mu$ is the mean of all data points in $\hat{X}$, $N_i$ is the number of data points belonging to the $i$-th cluster, and $\tilde{X} = \hat{X} H$. To minimize the within-cluster scatter and maximize the between-cluster scatter, the following formulation is proposed:

$$\max_{F^T F = I} \mathrm{Tr}\left[(S_t + \mu I)^{-1} S_b\right], \tag{2.10}$$

where $\mu I$ is used to make $S_t + \mu I$ invertible. (2.10) is actually the learning of the global discriminative information. Combining (2.8) and (2.9), (2.10) can be reformulated as:

$$\max_{F^T F = I} \mathrm{Tr}\left[F^T \tilde{X}^T (\tilde{X}\tilde{X}^T + \mu I)^{-1} \tilde{X} F\right]. \tag{2.11}$$

Since

$$\mathrm{Tr}[F^T H F] = \mathrm{Tr}\left[F^T \left(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T\right) F\right] = c - 1, \tag{2.12}$$

(2.10) is equivalent to

$$\min_{F^T F = I} \mathrm{Tr}[F^T H F] - \mathrm{Tr}\left[(S_t + \mu I)^{-1} S_b\right]. \tag{2.13}$$

Let us recall that our aim is to find the two projection matrices. Thus, combining (2.7), (2.11), (2.12) and (2.13), we arrive at

$$\min_{P,U,V,F} \sum_{i,j=1}^{N} \left( \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij} + \gamma P_{ij}^2 + \lambda \|f_i - f_j\|_2^2 \, P_{ij} \right) + \xi\,\Omega(F)$$
$$\text{s.t.}\quad U^T U = I,\; V^T V = I,\; P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1,\; 1 \le i, j \le N,\; F^T F = I, \tag{2.14}$$

where $\Omega(F) = \mathrm{Tr}\left[F^T H F - F^T \tilde{X}^T (\tilde{X}\tilde{X}^T + \mu I)^{-1} \tilde{X} F\right]$ and $\xi \ge 0$ is a regularization parameter [125].

2.3.1 The DUGE algorithm

In this section, we propose an algorithm for solving (2.14). As a result, the projection matrices $U$ and $V$ that map the original data points from the high-dimensional space to a lower-dimensional space can be obtained.

When $U$, $V$ and $P$ are fixed, $F$ can be obtained by solving the following problem:

$$\min_{F \in \mathbb{R}^{N \times c},\, F^T F = I} \mathrm{Tr}\left[F^T (2\lambda L + \xi R) F\right], \tag{2.15}$$

where $R = H - \tilde{X}^T (\tilde{X}\tilde{X}^T + \mu I)^{-1} \tilde{X}$. By setting the derivative with respect to $F$ to $0$, $F$ can be solved.

With $P$ and $F$ fixed, (2.14) is transformed into

$$\min_{U^T U = I,\, V^T V = I} \sum_{i,j=1}^{N} \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij}. \tag{2.16}$$

Note that

$$G(U,V) = \sum_{i,j=1}^{N} \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij}; \tag{2.17}$$

$$W_v = \sum_{i,j=1}^{N} P_{ij} (X_i - X_j) V V^T (X_i - X_j)^T; \tag{2.18}$$

and

$$W_u = \sum_{i,j=1}^{N} P_{ij} (X_i - X_j)^T U U^T (X_i - X_j). \tag{2.19}$$

Combining (2.17), (2.18) and (2.19), $G(U,V)$ can be written as

$$G(U,V) = \mathrm{Tr}(U^T W_v U) = \mathrm{Tr}(V^T W_u V). \tag{2.20}$$

With $V$ fixed, (2.16) can be written as:

$$\min_{U^T U = I} \mathrm{Tr}(U^T W_v U). \tag{2.21}$$

The solution for $U$ is given by the orthogonal generalized eigenvectors of $W_v$ corresponding to the $u$ smallest generalized eigenvalues.

In a similar way, with $U$ fixed, optimization problem (2.16) can be written as:

$$\min_{V^T V = I} \mathrm{Tr}(V^T W_u V). \tag{2.22}$$

The solution for $V$ to problem (2.16) is formed by the $v$ eigenvectors corresponding to the $v$ smallest eigenvalues of $W_u$.

With $U$, $V$ and $F$ fixed, we can obtain $P$ by tackling the following problem:

$$\min_{P} \sum_{i,j=1}^{N} \left( \|U^T X_i V - U^T X_j V\|_F^2 \, P_{ij} + \gamma P_{ij}^2 + \lambda \|f_i - f_j\|_2^2 \, P_{ij} \right)$$
$$\text{s.t.}\quad P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1,\; 1 \le i, j \le N. \tag{2.23}$$

s.t. P1=1,0≤Pij 1,1≤i, j≤N (2.23) Let us introduce d1ij =∥UTXiV −UTXjV∥2F, d2ij =∥fi−fj22 and dij =d1ij +d2ij and an optimization problem

min

PiT1=1,0Pij1,1jN

Pi+ 1 2γdi2

2, (2.24)

(27)

where Pi,(i= 1,2..., N), is thei-th column of similarity matrix P.

This optimization problem turns into a conventional Euclidean projection problem on the simplex, and it can be solved efficiently using the methods proposed in [76].

We propose a procedure for solving (2.14), which is depicted in Algorithm 1.

Algorithm 1: The optimization algorithm of DUGE

Data: data points $X_1, X_2, \dots, X_N$; parameters $k$, $c$, $u$, $v$, $\xi$ and $\lambda$.
Result: projection matrices $U \in \mathbb{R}^{m \times u}$ and $V \in \mathbb{R}^{n \times v}$.

1. Initialize column $i$ of $P$, $i = 1, \dots, N$, by solving the optimization problem
$$\min_{P\mathbf{1} = \mathbf{1},\; 0 \le P_{ij} \le 1} \sum_{j=1}^{N} \left( \|X_i - X_j\|_F^2 \, P_{ij} + \gamma P_{ij}^2 \right);$$
2. Initialize $V$ and $U$ as arbitrary column-orthogonal matrices; set $t = 0$;
3. repeat
   (a) update $L^t = D^t - \frac{(P^t)^T + P^t}{2}$, where $D^t \in \mathbb{R}^{N \times N}$ is a diagonal matrix whose $i$-th diagonal element is $\sum_j (P_{ij}^t + P_{ji}^t)/2$;
   (b) update $F^t$, whose columns are the $c$ eigenvectors of $(2\lambda L^t + \xi R)$ corresponding to its $c$ smallest eigenvalues;
   (c) update $U^t$, whose columns are the $u$ eigenvectors of $W_v^t$ corresponding to the $u$ smallest eigenvalues in (2.21);
   (d) update $V^t$, whose columns are the $v$ eigenvectors of $W_u^t$ corresponding to the $v$ smallest eigenvalues in (2.22);
   (e) update the $i$-th column of $P^t$, $i = 1, \dots, N$, by solving (2.24), where $d_i \in \mathbb{R}^{N \times 1}$ is the vector whose $j$-th element is $d_{ij} = d_{ij}^1 + d_{ij}^2$;
   (f) $t = t + 1$;
   until convergence;
4. Return the projection matrices $U$ and $V$.
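For illustration, the following numpy sketch implements one pass of the alternating updates of Algorithm 1 under simplifying assumptions (dense eigendecompositions; the $P$-update of (2.24) is omitted here and sketched after (2.31)); it is not the reference implementation.

```python
import numpy as np

def duge_iteration(X, P, V, R, lam, xi, c, u, v):
    """One DUGE-style alternating update (illustrative sketch).
    X: (N, m, n) array of 2D data matrices; R as defined after (2.15)."""
    # Laplacian L = D - (P^T + P)/2 of the current similarity graph.
    W = (P + P.T) / 2.0
    L = np.diag(W.sum(axis=1)) - W

    # F: the c eigenvectors of (2*lam*L + xi*R) with smallest eigenvalues.
    F = np.linalg.eigh(2.0 * lam * L + xi * R)[1][:, :c]

    # All pairwise differences X_i - X_j, shape (N, N, m, n).
    diffs = X[:, None, :, :] - X[None, :, :, :]

    # U from (2.18)/(2.21): u smallest eigenvectors of W_v (V fixed).
    DV = diffs @ V                                   # (X_i - X_j) V
    Wv = np.einsum("ij,ijab,ijcb->ac", P, DV, DV)
    U = np.linalg.eigh(Wv)[1][:, :u]

    # V from (2.19)/(2.22): v smallest eigenvectors of W_u (new U fixed).
    DU = np.einsum("ijab,ac->ijcb", diffs, U)        # U^T (X_i - X_j)
    Wu = np.einsum("ij,ijab,ijac->bc", P, DU, DU)
    V = np.linalg.eigh(Wu)[1][:, :v]
    return F, U, V
```

The O(N^2) pairwise-difference tensor keeps the sketch short; a practical implementation would accumulate $W_v$ and $W_u$ without materializing it.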

2.3.2 The solution of parameter γ and matrix P

In our proposed method, $\gamma$ is an important parameter connected with the construction of the similarity matrix $P$. In order to ensure that each data point has only $k$ neighbors, the following method is adopted to determine the value of $\gamma$. The Lagrangian function of (2.24) can be formulated as

$$\frac{1}{2} \left\| P_i + \frac{d_i}{2\gamma_i} \right\|_2^2 - \alpha (P_i^T \mathbf{1} - 1) - \beta_i^T P_i. \tag{2.25}$$

Under the KKT conditions [7], the solution $P_{ij}$ can be obtained as

$$P_{ij} = \left( -\frac{d_{ij}}{2\gamma_i} + \alpha \right)_{+}. \tag{2.26}$$


In order to ensure that $P_i$ has only $k$ non-zero elements, we must have

$$\begin{cases} -\dfrac{Q_{it}}{2\gamma_i} + \alpha > 0, & t = 1, \dots, k; \\[4pt] -\dfrac{Q_{it}}{2\gamma_i} + \alpha \le 0, & t = k+1, \dots, N; \end{cases} \tag{2.27}$$

where $Q$ is obtained by sorting each row of $D$ in ascending order, and $D$ is the matrix whose $ij$-th element is $d_{ij}$. Additionally, imposing the constraint $\sum_{j=1}^{N} P_{ij} = 1$ on (2.25), $\alpha$ can be obtained as

$$\alpha = \frac{1}{k} + \frac{1}{2k\gamma_i} \sum_{t=1}^{k} Q_{it}. \tag{2.28}$$

Note that $\gamma_i$ satisfies

$$\frac{k}{2} Q_{ik} - \frac{1}{2} \sum_{t=1}^{k} Q_{it} < \gamma_i \le \frac{k}{2} Q_{i,k+1} - \frac{1}{2} \sum_{t=1}^{k} Q_{it}. \tag{2.29}$$

Without loss of generality, we set

$$\gamma_i = \frac{k}{2} Q_{i,k+1} - \frac{1}{2} \sum_{t=1}^{k} Q_{it}, \tag{2.30}$$

and $\gamma$ can be obtained as

$$\gamma = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{k}{2} Q_{i,k+1} - \frac{1}{2} \sum_{t=1}^{k} Q_{it} \right). \tag{2.31}$$
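A minimal numpy sketch of the closed-form neighbor assignment in (2.26)–(2.31) follows; the distance matrix is a synthetic stand-in for $d_{ij} = d_{ij}^1 + d_{ij}^2$, and distinct distance values are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 5
d = rng.random((N, N))      # stand-in for d_ij = d^1_ij + d^2_ij

Q = np.sort(d, axis=1)      # each row of D sorted in ascending order

# Per-row gamma_i from (2.30) and the global gamma from (2.31).
gamma_i = 0.5 * k * Q[:, k] - 0.5 * Q[:, :k].sum(axis=1)
gamma = gamma_i.mean()

# alpha from (2.28), then the k-sparse rows of P from (2.26).
alpha = 1.0 / k + Q[:, :k].sum(axis=1) / (2.0 * k * gamma_i)
P = np.maximum(-d / (2.0 * gamma_i[:, None]) + alpha[:, None], 0.0)

print(gamma)
print(P.sum(axis=1)[:5])    # each row has k non-zeros and sums to 1
```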

2.3.3 Convergence analysis of algorithm

According to the following theorem, the objective function value decreases monotonically during the iteration process.

Theorem 1. The inequality

$$\sum_{i,j=1}^{N} \left( \|(U^{t+1})^T X_i V^{t+1} - (U^{t+1})^T X_j V^{t+1}\|_F^2 \, P_{ij}^{t+1} + \gamma (P_{ij}^{t+1})^2 + \lambda \|f_i^{t+1} - f_j^{t+1}\|_2^2 \, P_{ij}^{t+1} \right) + \xi\,\Omega(F^{t+1})$$
$$\le \sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t})$$

holds in the iterative process, where $t$ is the iteration step.

Proof:

After the $t$-th iteration of Algorithm 1, the updated $U$, $V$, $P$ and $F$ are denoted as $U^t$, $V^t$, $P^t$ and $F^t$, respectively. Similarly, they are denoted as $U = U^{t+1}$, $V = V^{t+1}$, $P = P^{t+1}$ and $F = F^{t+1}$ in the next iteration.


If we fix $P^t$, $V^t$ and $F^t$, the following inequality is obtained:

$$\sum_{i,j=1}^{N} \left( \|(U^{t+1})^T X_i V^{t} - (U^{t+1})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t})$$
$$\le \sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t}). \tag{2.32}$$

In the same way, if $P^t$, $U^t$ and $F^t$ are fixed, we have

$$\sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t+1} - (U^{t})^T X_j V^{t+1}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t})$$
$$\le \sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t}). \tag{2.33}$$

When $U^t$, $V^t$ and $F^t$ are fixed,

$$\sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t+1} + \gamma (P_{ij}^{t+1})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t+1} \right) + \xi\,\Omega(F^{t})$$
$$\le \sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t}). \tag{2.34}$$

With $P^t$, $V^t$ and $U^t$ fixed, we obtain

$$\sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t+1} - f_j^{t+1}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t+1})$$
$$\le \sum_{i,j=1}^{N} \left( \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t} + \gamma (P_{ij}^{t})^2 + \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} \right) + \xi\,\Omega(F^{t}). \tag{2.35}$$

Additionally, consider the following two inequalities:

$$\sum_{i,j=1}^{N} \|(U^{t+1})^T X_i V^{t+1} - (U^{t+1})^T X_j V^{t+1}\|_F^2 \, P_{ij}^{t+1} \le \sum_{i,j=1}^{N} \|(U^{t})^T X_i V^{t} - (U^{t})^T X_j V^{t}\|_F^2 \, P_{ij}^{t}, \tag{2.36}$$

and

$$\sum_{i,j=1}^{N} \lambda \|f_i^{t+1} - f_j^{t+1}\|_2^2 \, P_{ij}^{t+1} + \xi\,\Omega(F^{t+1}) \le \sum_{i,j=1}^{N} \lambda \|f_i^{t} - f_j^{t}\|_2^2 \, P_{ij}^{t} + \xi\,\Omega(F^{t}). \tag{2.37}$$

Combining the above inequalities yields the inequality of Theorem 1.

