
3.4 Results

3.4.3 Evaluation

In this section, the statistical analysis of the results is performed using the Bayesian t-test introduced in Chapter 1.3. First, the raw results for all datasets are presented and evaluated.


The results (Tables 3.2 and 3.3) demonstrate that the proposed methods outperform the classical ones for random structured composite classes. The one exception is the overlapping noisy dataset, where our methods trade some classification accuracy for better within-instance separation. The results for the special datasets also meet our expectations. The CN and ∆N datasets are shown to confuse LDA and SDA, while our methods perform remarkably well on them. The ∆µ dataset proves to be difficult for LDA, exactly as predicted in Chapter 3.2.1.

Table 3.2: Results of the algorithms for the four synthetic datasets. Our methods are in bold.

Dataset        Disjunct         D. Noisy         Overlapping      O. Noisy
Metric (%)     awi  an   ac     awi  an   ac     awi  an   ac     awi  an   ac
LDA             89  100  100     88   98  100     87  100  100     87   96   99
SDA             96  100  100     96   97   99     92  100  100     81   87   95
SDA-IC          96  100  100     93   90   98    100  100  100     88   84   95
SCDA            97  100  100     93   88   96     88   97  100     93   88   97
SSCDA          100  100  100     99   99  100    100  100  100     93   90   98
Wilks          100  100  100     89   76   86    100  100  100     88   73   78

Table 3.3: Results of the algorithms for the special synthetic datasets.

Dataset        CN               ∆N               ∆µ
Metric (%)     awi  an   ac     awi  an   ac     awi  an   ac
LDA             76   59   69     87  100  100     79   52   52
SDA             88   68   87     89  100  100    100  100  100
SDA-IC          99   65   83    100  100  100    100  100  100
SCDA           100   69   84     97  100  100    100  100  100
SSCDA          100   69   86    100  100  100    100  100  100
Wilks          100   69   83    100  100  100    100  100  100

The results (Table 3.4) for the other datasets indicate that the proposed methods provide better results on 3D shape graphs. The difference between the algorithms is much more subtle on the synthetic dataset, which is likely due to its lower noise level.

The results (Tables 3.5 and 3.6) also demonstrate the efficiency of the proposed methods for image classification, with SCDA and SSCDA consistently outperforming the baseline methods. Interestingly, SDA, SDA-IC and the Wilks method are bested by LDA on some datasets.


Table 3.4: The algorithms’ performance on 3D shape graph datasets.

Dataset        Synthetic        Synth. Images    Real Images
Results (%)    awi  an   ac     awi  an   ac     awi  an   ac
LDA             53   78   83     71   62   67     51   57   56
SDA             63   77   84     58   70   78     67   65   65
SDA-IC          71   74   86     61   78   83     71   63   64
SCDA            83   73   87     74   77   81     65   74   74
SSCDA           81   76   86     76   71   78     65   75   75
Wilks           67   79   85     69   64   74     75   69   69

Table 3.5: The methods’ accuracy on image datasets with few (<5) classes.

Dataset        VML Action       UIUC Cars        GRAZ02
Results (%)    awi  an   ac     awi  an   ac     awi  an   ac
LDA             48   61   68     45   75   77     60   60   72
SDA             63   60   71     52   78   79     62   68   79
SDA-IC          60   61   71     54   78   81     64   63   75
SCDA            57   63   78     58   86   88     72   63   77
SSCDA           50   64   80     56   87   93     72   64   82
Wilks           55   64   79     44   86   89     62   62   74

Table 3.6: The methods’ accuracy on image datasets with many (>5) classes.

Dataset        CVA Objects      VGG Flowers      UIUC Birds
Results (%)    awi  an   ac     awi  an   ac     awi  an   ac
LDA             45   34   55     43   32   42     47   39   49
SDA             47   34   54     44   35   50     42   29   46
SDA-IC          49   34   54     45   35   52     44   28   41
SCDA            50   37   60     46   36   50     51   26   52
SSCDA           54   38   61     47   37   52     46   32   58
Wilks           43   36   57     42   38   52     42   32   55

We have also evaluated the performance of the classification method using full SURF features as a baseline (Table 3.7). When using SURF features, no dimension reduction was performed. The results indicate that SSCDA outperforms raw SURF features on most datasets.

As a quick and simple illustration of the efficiency of the methods, the algorithms were ranked on all datasets using the ac and awi criteria, and the average rank achieved by each method was computed. The results (Fig. 3.4) show that the SCDA and SSCDA methods achieve the lowest average rank on both criteria.
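The average ranks behind Fig. 3.4 can be computed in a few lines; the sketch below is only an illustration, assuming the per-dataset scores are gathered in a NumPy array (the function name and data layout are not from the original, and ties are ignored for simplicity):

    import numpy as np

    def average_ranks(scores):
        """scores: (n_datasets, n_methods) array of a_c or a_wi values (higher is better)."""
        order = np.argsort(-scores, axis=1)                      # methods sorted best-to-worst per dataset
        ranks = np.empty_like(order)
        rows = np.arange(scores.shape[0])[:, None]
        ranks[rows, order] = np.arange(1, scores.shape[1] + 1)   # rank 1 = best method on that dataset
        return ranks.mean(axis=0)                                # average rank of each method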



Table 3.7: The comparison of our methods to the SURF feature extractor.

Dataset        SDA-IC           SCDA             SSCDA            SURF
Results (%)    awi  an   ac     awi  an   ac     awi  an   ac     awi  an   ac
VML             60   61   71     57   63   78     50   64   80     60   65   78
CAR             54   78   81     58   86   88     56   87   93     47   83   85
GRAZ            64   63   75     72   63   77     72   64   82     60   59   60
CVA             49   34   54     50   37   60     54   38   61     41   39   49
VGG             45   35   52     46   36   50     47   37   52     42   37   47
BIRD            44   28   41     51   26   52     46   32   58     47   41   48

Figure 3.4: Average ranks of the different methods.

In the next part of the analysis, the 95% credible intervals (CI) are computed for the size of the improvement brought by the proposed methods. To obtain the credible intervals, the Bayesian alternative of the paired-samples t-test [110], based on BEST [36], is used. This statistic is computed for both awi and ac, comparing the proposed methods against LDA and SDA.
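For reference, a minimal sketch of such a paired Bayesian t-test is given below. It assumes the PyMC and ArviZ libraries and follows Kruschke's BEST formulation (a Student-t likelihood over the paired differences); the function and variable names are illustrative and not the exact implementation used here.

    import numpy as np
    import pymc as pm
    import arviz as az

    def paired_best(scores_a, scores_b, hdi_prob=0.95):
        """Posterior of the mean paired difference (method A minus baseline B)."""
        diff = np.asarray(scores_a, float) - np.asarray(scores_b, float)
        with pm.Model():
            # Weakly informative priors centred on the observed differences
            mu = pm.Normal("mu", mu=diff.mean(), sigma=10 * diff.std() + 1e-3)
            sigma = pm.HalfNormal("sigma", sigma=10 * diff.std() + 1e-3)
            nu = pm.Exponential("nu_minus_one", 1 / 29) + 1   # heavy-tailed Student-t likelihood
            pm.StudentT("obs", nu=nu, mu=mu, sigma=sigma, observed=diff)
            idata = pm.sample(2000, tune=1000, progressbar=False)
        mu_post = idata.posterior["mu"].values.ravel()
        ci = az.hdi(mu_post, hdi_prob=hdi_prob)      # 95% highest-density interval
        p_better = float((mu_post > 0).mean())       # probability that method A beats baseline B
        return ci, p_better

The credible intervals reported below correspond to ci in this sketch, and the percentages quoted alongside the individual comparisons correspond to p_better.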

Table 3.8: The 95% credible intervals of the Bayesian t-tests.

Baseline       LDA                        SDA
Metric         awi          ac            awi           ac
SDA-IC         [3.5, 14]    [1.7, 8.4]    [0.8, 5.8]    [−1.6, 1.1]
SCDA           [5.6, 15]    [1.2, 13]     [0.86, 9.4]   [0.04, 4.5]
SSCDA          [7, 15]      [3.1, 15]     [2.2, 10]     [0.86, 6.4]
Wilks          [1.7, 13]    [1.4, 12]     [1.7, 6.2]    [−4, 3.8]

The results (Table 3.8) show that while the proposed methods tend to perform better than or on par with LDA and SDA, only SCDA and SSCDA have a credibly positive effect size on both measures against both LDA and SDA. The full results of the individual t-tests are found in Figures A.4-A.19 in the Appendix.

[Posterior plots (data with posterior predictive; mean difference µdiff; std. dev. of the difference σdiff; effect size (µdiff − 0)/σdiff), N = 16. awi: µdiff median 0.79, 95% HDI [−1.4, 3.2]; ac: µdiff median 1.5, 95% HDI [0.18, 2.8].]

Figure 3.5: Bayesian t-test between the SSCDA and SCDA methods for awi (top) and ac (bottom).

Moreover, the two best methods were also compared (Fig. 3.5), yielding the credible interval [−1.4, 3.2] for awi, with a 76.9% probability that SSCDA outperforms SCDA on node separation, and [0.18, 2.8] for ac, with a 98.7% probability that SSCDA performs better in terms of classification accuracy. This allows us to conclude that SSCDA is the superior method of the two.

The SURF descriptor was also compared to SSCDA (Fig. 3.6) using the same Bayesian test. The results are [−5.9, 16] for awi, with an 85.7% probability that SSCDA outperforms the SURF descriptors at node separation, and [1.3, 18] for ac, with a 98.3% probability that SSCDA performs better at classification accuracy.

This shows that SSCDA is a good way to extract features from an image, since it credibly outperforms SURF at classification accuracy. This is unsurprising, since the SURF descriptor was not optimized for structured composite image classification.


[Posterior plots (data with posterior predictive; mean difference µdiff; std. dev. of the difference σdiff; effect size (µdiff − 0)/σdiff), N = 6. awi: µdiff median 5.1, 95% HDI [−5.9, 16]; ac: µdiff median 9.5, 95% HDI [1.3, 18].]

Figure 3.6: Bayesian t-test between the SSCDA and SURF methods for awi (top) and ac (bottom).

Comparing methods for rank selection

In this section, the methods for selecting the rank of the within-instance scatter matrix are compared. The methods were evaluated on all datasets, and the total accuracy ((awi + ac)/2) and the number of dimensions kept were computed. For these tests, the SSCDA algorithm was used. Similarly, the Bayesian t-test is used to compare the different methods. The iterative method is used as a baseline to compare the other methods against, since it evaluates all possible values and is therefore bound to find the best possible accuracy at the lowest number of dimensions.
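As an illustration, a minimal sketch of the iterative (exhaustive) baseline is given below, assuming a hypothetical train_and_score(rank) helper that runs SSCDA with the given within-instance scatter rank and returns the total accuracy; the helper and its signature are assumptions made for this sketch, not part of the original implementation.

    def iterative_rank_selection(max_rank, train_and_score):
        """Try every admissible rank and keep the best-scoring one."""
        best_rank, best_score = None, float("-inf")
        for rank in range(1, max_rank + 1):
            score = train_and_score(rank)      # total accuracy (a_wi + a_c) / 2 for this rank
            if score > best_score:             # on ties, the smaller rank is kept
                best_rank, best_score = rank, score
        return best_rank, best_score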

Table 3.9 shows the results of the various rank selection methods.


Table 3.9: Comparison of methods for selecting the rank of the within-instance scatter matrix. Note that the number of dimensions is also reported as a percentage of the maximum.

Dataset                Synthetic         Shape Recogn.     Image Classif.
Results (%)            awi   ac   #dim   awi   ac   #dim   awi   ac   #dim
breakpoint              94   97   13      59   70   1.3     48   75   1.2
information retained    93   99   29      67   80   1       52   63   10
class-number based      86   98   26      65   73   1.4     54   64   5
iterative              100  100   21      71   81   1.1     54   74   1.8
no rank adjustment      89   78   49      46   31   26       7   73   99

Table 3.10 shows the results of the test for the two criteria. The results justify adjusting the rank of the within-instance scatter matrix, since without it the SSCDA method produces clearly suboptimal results in terms of both classification accuracy and dimension reduction. Among the rank adjustment methods, iterative trial and error clearly outperforms all the others on accuracy. It is worth noting, however, that the number of dimensions retained by the other methods is usually similar. This means that it may be a viable strategy to use one of these methods to find an initial value, and then check the surrounding few values for the optimum. The full results are found in Figures A.20-A.33 in the Appendix.

Table 3.10: The 95% credible intervals of the Bayesian t-tests.

Baseline               no rank adjustment          iterative
Metric                 ac           #dim           ac            #dim
breakpoint             [1.1, 15]    [−81, −36]     [−5, −1.9]    [−2.5, 1.5]
information retained   [2.0, 14]    [−75, −43]     [−6.8, −1.8]  [3.7, 8.0]
class-number based     [2.3, 15]    [−77, −38]     [−5.2, −1.9]  [0.79, 6.9]
iterative              [5.0, 19]    [−82, −40]     N/A           N/A