
LSTM Models to Forecast Usage Parameters of MapReduce

5.4 Results and Discussion

5.4.3 Prediction accuracy and impact of sample size

The normalized root mean square error (NRMSE) [79] is used to evaluate the prediction accuracy of the forecasting models. The NRMSE typically lies between 0% and 100%, and a lower NRMSE indicates higher prediction accuracy. The formula is as follows:

NRMSE = RMSE / (y_max − y_mean),   (5.19)

where RMSE comes from (5.18) and the denominator y_max − y_mean represents the normalization factor.
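Equation (5.19) can be sketched directly in code. This is a minimal illustration, assuming RMSE in (5.18) is the usual root mean squared prediction error over the test series; the sample arrays are synthetic.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized RMSE as in Eq. (5.19): RMSE / (y_max - y_mean), in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # RMSE, Eq. (5.18)
    norm = y_true.max() - y_true.mean()               # normalization factor
    return 100.0 * rmse / norm

# Synthetic usage series and a hypothetical model's predictions
y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
yhat = np.array([10.5, 11.5, 11.5, 14.0, 14.5])
print(nrmse(y, yhat))  # NRMSE in percent
```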

5.4.3.1 Performance baseline model

In order to evaluate the effectiveness of the forecasting models, a corresponding persistence model is built for each usage parameter. This kind of model uses the historical observation at time instant t to predict the expected outcome at the next time instant t + 1. The NRMSE of the persistence model is then set as the performance baseline of the corresponding usage parameter forecasting model: when a model's test NRMSE is lower than this baseline, the model is considered meaningful for prediction; otherwise it is not suitable for forecasting. Roughly speaking, the performance baseline is a threshold for deciding whether the corresponding forecasting model is effective. The persistence models are defined in Table 5.4.
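The baseline can be sketched as follows. This is an illustrative implementation, not the chapter's code: the persistence forecast for t + 1 is simply the value observed at t, and its NRMSE over the series becomes the threshold a candidate model must beat.

```python
import numpy as np

def persistence_baseline_nrmse(series):
    """NRMSE of the persistence model: predict y(t+1) = y(t).

    The returned value serves as the performance baseline; a forecasting
    model is considered effective only if its test NRMSE is lower.
    """
    y = np.asarray(series, dtype=float)
    y_true = y[1:]      # targets at t + 1
    y_pred = y[:-1]     # persistence forecast: last observed value
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.mean())

# A hypothetical CPU-usage trace (percent) would give the PST_CPU baseline:
cpu = [40.0, 41.0, 42.0, 44.0, 46.0, 47.0]
baseline = persistence_baseline_nrmse(cpu)
# a candidate model with test NRMSE below `baseline` is considered effective
```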

Model Name   Full Name of Persistence Model
PST_CPU      Persistence model predicting CPU usage
PST_MEM      Persistence model predicting memory usage
PST_RIO      Persistence model predicting read rate
PST_WIO      Persistence model predicting write rate

Table 5.4. Abbreviation expressions of persistence models

5.4.3.2 Prediction accuracy comparison

Figures 5.7, 5.8, 5.9 and 5.10 depict the prediction accuracy comparison for the four MapReduce applications. We use the performance baseline NRMSE as the threshold of forecasting effectiveness and the bar heights as prediction accuracy. A model can be used to predict a usage parameter when its test NRMSE is lower than the related performance baseline NRMSE, and vice versa; a lower bar represents better prediction accuracy.

Figure 5.7 presents the test NRMSE comparison of the CPU usage forecasting models for the four MapReduce applications. The observations show that RG_CPU_PI has a larger NRMSE than PST_CPU for Pi, whereas the remaining applications present lower NRMSE than the performance baseline; hence RG_CPU_PI is not meaningful for predicting CPU usage. Moreover, all LST_CPU models exhibit a significant improvement in forecasting performance, yet the LST2_CPU models fail to decrease the corresponding NRMSE significantly. Notably, LST_CPU_WC decreases the prediction NRMSE by 4.17% compared to RG_CPU_WC.

The test NRMSE of the memory usage forecasting models is shown in Figure 5.8. Except for LST_MEM_PI, whose NRMSE decreased significantly, all other models presented either similar


[Figure: "CPU Usage Prediction" — grouped bars per application (Wordcount, Pi, Terasort, Teragen); y-axis: Normalized RMSE (%); series: NRMSE of the persistence, linear regression, one-hidden-layer LSTM, and two-hidden-layer LSTM models]

Figure 5.7. NRMSE of forecasting models for CPU usage

or larger NRMSE than the performance baseline model. The Pi application focuses on computing the value of Pi, whereas the remaining applications have significantly data-intensive characteristics and their memory usage reaches 100%. Thus, predicting memory usage with either multivariate LSTM or multiple linear regression is meaningless for data-intensive applications and is only useful for CPU-intensive-only applications.

[Figure: "Memory Usage Prediction" — grouped bars per application (Wordcount, Pi, Terasort, Teragen); y-axis: Normalized RMSE (%); series: NRMSE of the persistence, linear regression, one-hidden-layer LSTM, and two-hidden-layer LSTM models]

Figure 5.8. NRMSE of forecasting models for predicting memory usage

Figure 5.9 shows the test NRMSE comparison of the forecasting models for read rate prediction. All NRMSE values of the forecasting models are lower than 15%. LST_RIO_WC and LST_RIO_TS present significant improvement in prediction accuracy over the performance baseline model, while the regression models cannot achieve higher prediction accuracy. Note that all LST2_RIO models fail to significantly increase prediction accuracy while adding model training time. Teragen and Pi, as non-read-intensive applications, show little difference in prediction accuracy between the two kinds of forecasting models due to their extremely low variance; for example, the read rate of Pi is mostly 0 MB/s.

[Figure: "Read Rate Prediction" — grouped bars per application (Wordcount, Pi, Terasort, Teragen); y-axis: Normalized RMSE (%); series: NRMSE of the persistence, linear regression, one-hidden-layer LSTM, and two-hidden-layer LSTM models]

Figure 5.9. NRMSE of forecasting models for predicting read rate

Figure 5.10 presents the test NRMSE comparison of the write rate forecasting models. All models have NRMSE lower than 15%, and most of them can effectively predict the write rate of the relevant applications. LST_WIO_TS, LST_WIO_TG, and LST2_WIO_TG showed significant improvement in prediction accuracy; the common characteristic of Teragen and Terasort is their write-intensive behaviour. Although the models of the remaining applications also produce meaningful predictions, their accuracy is lower than that of the linear regression model. Furthermore, there is no evidence that LST2_WIO achieves better predictions than LST_WIO; a one-hidden-layer model could achieve the best test NRMSE.

[Figure: "Write Rate Prediction" — grouped bars per application (Wordcount, Pi, Terasort, Teragen); y-axis: Normalized RMSE (%); series: NRMSE of the persistence, linear regression, one-hidden-layer LSTM, and two-hidden-layer LSTM models]

Figure 5.10. NRMSE of forecasting models for predicting write rate

5.4.3.3 Impact of sample size

The sample size generally has a significant impact on the fitting quality of forecasting models, and different machine learning algorithms need different sample sizes to reach high prediction accuracy. In this case, we fit linear regression and LSTM models with different sample sizes for the four benchmark MapReduce applications. Conventionally, the training data is used to fit the model while the test data is used to evaluate prediction accuracy.

Therefore, for every MapReduce application, a group of ordered forecasting models is built on training datasets whose sample size grows from 100 to the largest size with a fixed increment. To guarantee fair performance evaluation, the predictions of the different forecasting models for the same application use consistent test data. It is worth emphasizing that Terasort is an exception: its initial sample size is 500 and the increment is 500, due to its huge amount of sample data. Test NRMSE is applied to evaluate the prediction accuracy of each fitted model. Figures 5.11 to 5.14 show the impact of training data size on the prediction accuracy of the forecasting models for each usage parameter.
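The evaluation loop described above can be sketched as follows. This is a schematic, not the chapter's code: `fit_fn` is a placeholder standing in for the linear regression or LSTM fitting routine, and the fixed test split is reused for every training size so the comparison stays fair.

```python
import numpy as np

def sample_size_sweep(train, test, fit_fn, start=100, step=100):
    """Fit a model on growing training prefixes and record test NRMSE.

    `train` and `test` are (X, y) pairs; `fit_fn(train_subset)` must return
    an object with a `.predict(X)` method. The same test split is reused for
    every training size.
    """
    X_test, y_test = test
    results = []
    for size in range(start, len(train[0]) + 1, step):
        X_sub, y_sub = train[0][:size], train[1][:size]
        model = fit_fn((X_sub, y_sub))
        y_pred = model.predict(X_test)
        rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
        nrmse = 100.0 * rmse / (y_test.max() - y_test.mean())  # Eq. (5.19)
        results.append((size, nrmse))
    return results
```

For Terasort the sweep would simply use `start=500, step=500`, matching the exception noted above.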

As can be seen in Figure 5.11, the sample size has little impact on the prediction accuracy of RG_CPU, whereas LST_CPU is very sensitive to small sample sizes. LST_CPU_PI and RG_CPU_PI are both insensitive to sample size because the preparation time of running the Pi application was truncated; LST_CPU_PI showed a lower NRMSE than RG_CPU_PI. As the size increases sufficiently, all LST_CPU models present better prediction accuracy than RG_CPU. Therefore, a sufficient sample size can guarantee more accurate prediction for LST_CPU.

[Figure: "Modelling Sensitivity Test for CPU usage" — four panels (Wordcount, Pi, Terasort, Teragen), each plotting Normalized RMSE [%] of the LSTM and linear regression models against sample size]

Figure 5.11. Sensitivity comparison of CPU usage forecasting models

Figure 5.12 compares the impact of sample size on the prediction accuracy of memory usage. Most RG_MEM models are insensitive to sample size and perform better than LST_MEM; in other words, a small sample size can still produce a stable RG_MEM. In contrast, LST_MEM is sensitive to sample size compared to RG_MEM, and a small sample size results in a significant decline in prediction accuracy. The impact of sample size on prediction accuracy for read rate forecasting is shown in Figure 5.13. Similarly, RG_RIO is insensitive to sample size whereas LST_RIO is sensitive; the exception is the Pi application, for which both RG_RIO_PI and LST_RIO_PI are insensitive. However, as the training dataset grows, the prediction accuracy of LST_RIO improves significantly, except for the Pi models. Thus, the results show that the stable RG_RIO models are insensitive to sample size compared to LST_RIO.

Finally, Figure 5.14 exhibits the impact on write rate prediction as the sample size increases. RG_WIO shows insensitive behaviour similar to RG_RIO, while LST_WIO is sensitive to sample size. However, a sufficient sample size can effectively improve prediction accuracy: Teragen, a write-intensive application, showed significant improvement in prediction accuracy once the sample size was sufficient.

[Figure: "Modelling Sensitivity Test for Memory Usage" — four panels (Wordcount, Pi, Terasort, Teragen), each plotting Normalized RMSE [%] of the LSTM and linear regression models against sample size]

Figure 5.12. Sensitivity comparison of memory usage forecasting models

[Figure: "Modelling Sensitivity Test for Read Rate" — four panels (Wordcount, Pi, Terasort, Teragen), each plotting Normalized RMSE [%] of the LSTM and linear regression models against sample size]

Figure 5.13. Sensitivity comparison of read rate forecasting models

The NRMSE of RG_WIO_TS also decreases noticeably. However, the applications that are not write-intensive are insensitive to sample size. Thus, the results show that the stable RG_WIO models are insensitive to sample size, and that LST_WIO models of write-intensive applications can achieve more accurate predictions when the sample size is sufficient.

Overall, the multiple linear regression models can be fitted with a small sample size, whereas the LSTM models cannot. At the same time, the LSTM forecasting models of intensively used resources achieve more accurate predictions when the sample size is sufficient. This result has practical value when the resource-intensive character of an application is known.

5.4.3.4 Overfitting and underfitting evaluation

Overfitting [39] and underfitting are commonly seen in the machine learning process. An overfitted statistical model involves more parameters than can be justified by the data, whereas underfitting indicates that the statistical model cannot adequately capture the underlying structure of the data due to


[Figure: "Modelling Sensitivity Test for Write Rate" — four panels (Wordcount, Pi, Terasort, Teragen), each plotting Normalized RMSE [%] of the LSTM and linear regression models against sample size]

Figure 5.14. Sensitivity comparison of write rate forecasting models

some parameters or terms being missing [30]. Typically, we identify whether a model is overfitted, underfitted, or well fitted by comparing its train and test errors: a test error significantly larger than the train error indicates overfitting, whereas large errors on both the training and test sets indicate underfitting, and a tiny difference between low train and test errors indicates a good fit. In this case, we calculated the train and test NRMSE of each model, shown in Figure 5.15.
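The comparison above can be sketched as a small heuristic. The thresholds are illustrative assumptions, not values from the chapter: `gap` is the train/test difference that flags overfitting, and `high` is the error level above which both splits suggest underfitting.

```python
def diagnose_fit(train_nrmse, test_nrmse, gap=1.0, high=15.0):
    """Heuristic fit diagnosis from train/test NRMSE (both in percent).

    `gap` and `high` are illustrative thresholds: a test error far above
    the train error suggests overfitting, while high errors on both the
    training and test splits suggest underfitting.
    """
    if test_nrmse - train_nrmse > gap:
        return "overfitting"
    if train_nrmse > high and test_nrmse > high:
        return "underfitting"
    return "good fit"

# e.g. a model with train NRMSE 5% and test NRMSE 12% would be flagged
# as overfitting, while 20% on both splits would be flagged as underfitting
```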

[Figure: four panels (Wordcount, Pi, Terasort, Teragen), each comparing train and test NRMSE (%) for CPU usage, memory usage, read rate, and write rate]

Figure 5.15. Overfitting vs Underfitting

As can be seen from Figure 5.15, LST_CPU_WC, LST_MEM_PI, and all LST_WIO models except for Terasort showed a good fit: the differences between test and train NRMSE are less than 1%. In addition, LST_RIO_WC showed slight overfitting and LST_CPU_TG slight underfitting. Note that the worst overfitting occurs at LST_CPU_PI, due to the extremely small variance of its CPU usage. In contrast, all forecasting models of Terasort showed underfitting. This phenomenon might be caused by the different computing patterns of the map and reduce phases. Figure 5.16 depicts the division of phases of a MapReduce application over time.

[Figure: timeline from start to end showing the map phase and the reduce phase overlapping for part of the execution]

Figure 5.16. Map phase vs Reduce phase

Figure 5.16 shows that the map and reduce phases are not clearly separated: a mixed phase (the overlapping part) exists for a period of time. The four MapReduce applications in this work have different computational goals and algorithms, and therefore present various phase characteristics and unpredictable time consumption in each phase. For Terasort, the map phase consumes 3728 seconds and the pure reduce phase 8800 seconds.

In contrast, the map phase of Wordcount occupied 3724 seconds while the pure reduce phase lasted only 2 seconds. The Pi application presents similar long-map, short-reduce time characteristics to Wordcount. Furthermore, a special kind of application such as Teragen, which only generates data, has a map phase only.

Combining the MapReduce mechanism with Figures 5.15 and 5.16, we conclude that the major reason is likely the different resource-consumption patterns of the map and reduce phases. The large difference between the two phases might lead to heavy underfitting: in the training process, the training dataset of Terasort is the first 10000 seconds of data, which mixes the whole map phase and part of the reduce phase, while the test data comes from the pure reduce phase. Given this underfitting, we recommend modeling the usage parameters of Terasort separately for the map and reduce phases. Although overfitting and underfitting occur, LSTM models still exhibit stronger prediction capability than regression models for intensively used resources.

5.4.3.5 Two-phase modeling approach

We model the usage parameters of Terasort for two different phases: the map phase and the pure reduce phase. Based on the log files captured during execution, we divided the dataset into map-phase and reduce-phase samples. Then we applied the hyperparameter learning and prediction algorithms elaborated in the previous sections again to model and forecast resource usage, setting a separately optimized group of hyperparameters for each phase's LSTM models. The comparison of train and test NRMSE of the forecasting models is depicted in Figure 5.17.
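The phase split described above can be sketched as follows. This is a hypothetical helper, assuming time-stamped samples and a map-phase end time read from the captured job logs; following the chapter's grouping, the mixed (overlapping) phase is assigned to the map-phase dataset.

```python
def split_by_phase(samples, map_end_time):
    """Split time-ordered (timestamp, features) samples into two datasets.

    Everything up to `map_end_time` -- the pure map plus the mixed
    (overlapping) phase -- forms the map-phase dataset; the remainder is
    the pure reduce phase. `map_end_time` would come from the job logs.
    """
    map_ds = [s for s in samples if s[0] <= map_end_time]
    reduce_ds = [s for s in samples if s[0] > map_end_time]
    return map_ds, reduce_ds

# Hypothetical usage: one sample per second, map phase ending at t = 3728
# samples = [(t, usage_vector_at(t)) for t in range(12528)]
# map_ds, reduce_ds = split_by_phase(samples, 3728)
```

Each of the two datasets then gets its own training/test split and its own hyperparameter search, as described above.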

[Figure: three panels for the Terasort application — (a) Overall, (b) Map Phase, (c) Reduce Phase — each comparing train and test NRMSE (%) for CPU usage, memory usage, read rate, and write rate]

Figure 5.17. Overall modeling vs separated phase modeling

Figure 5.17 consists of three sub-figures: (a) prediction accuracy on the overall sample dataset, (b) prediction accuracy on the map-phase (pure map plus mixed-phase) samples, and (c) prediction accuracy on the pure reduce-phase samples. Sub-figure (a) shows heavy underfitting on all usage parameter predictions. In the map-phase models (sub-figure (b)), the NRMSE of all usage parameter predictions shows that the underfitting is removed, which makes the forecasting models valuable; in particular, the read and write rate predictions achieve an excellent fit, while CPU usage is slightly overfitted. In sub-figure (c), the test and train errors of CPU usage are remarkably similar, indicating good prediction accuracy of the LSTM model. Unfortunately, the read and write rate LSTM models still show moderate underfitting in the reduce phase.

As a consequence, the two-phase modeling approach can reveal the usage parameter patterns of a specific application, such as Terasort with its long reduce phase, more correctly than overall modeling. For instance, the CPU usage prediction of Terasort in the reduce phase achieves the best forecasting performance, and the read and write rate predictions in the map phase are also highly accurate. The results therefore indicate that MapReduce applications with a long reduce phase should adopt the phase modeling approach. Conversely, applications with an extremely short reduce phase, such as Wordcount, can achieve excellent prediction performance using the overall modeling approach.