
3.3 Evaluation and application

3.3.2 Comparison to PLA-based DTW

Comparison of trend representations

In this section a detailed application of the two proposed algorithms is presented. Word recognition is a popular field of application of DTW, hence this type of problem is used to qualify our algorithm. The slant and skew angle of a person's handwriting are usually constant, but inter- and intra-character spacing varies widely. In [170] the word "Alexandria" written by George Washington is used for DTW. In that work, the cleaned and normalized handwriting image is divided into a given number of columns and three profiles ("time series") are generated: the projection profile and the upper and lower profiles. The first measures the sum of pixel intensities in a column, while the upper and lower profiles give the distance from the upper and lower bound of the image bounding box. In our study, a part of the projection profile of "Alexandria" is presented to show the differences between DTW and symbolic sequence based alignment. The original "time series" are presented in Fig. 3.6.
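The three profiles described above can be sketched as follows. This is a minimal illustration, assuming a binary image with ink pixels set to 1 and one profile sample per pixel column; the function name `word_profiles` and the convention of assigning the full image height to empty columns are our own assumptions, not taken from [170]. The upper and lower profiles are read here as the distance from the top and bottom edge of the bounding box to the first ink pixel in each column.

```python
import numpy as np


def word_profiles(img: np.ndarray):
    """Projection, upper and lower profiles of a binary word image.

    img: 2-D array (rows top to bottom), ink pixels = 1, background = 0.
    Assumption: one profile sample per pixel column; empty columns get
    the full image height as their upper/lower distance.
    """
    h = img.shape[0]
    ink = img > 0
    has_ink = ink.any(axis=0)
    # Projection profile: sum of pixel intensities in each column.
    projection = img.sum(axis=0).astype(float)
    # Upper profile: distance from the top edge to the first ink pixel.
    upper = np.where(has_ink, ink.argmax(axis=0), h).astype(float)
    # Lower profile: distance from the bottom edge to the last ink pixel.
    lower = np.where(has_ink, ink[::-1, :].argmax(axis=0), h).astype(float)
    return projection, upper, lower
```

Only the projection profile is used in the rest of this comparison; the other two are included because they come from the same column-wise scan.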

First, the segmentation methods are compared, then the alignment techniques, and both are evaluated.

Figs. 3.6 and 3.7 show how the two different segmentation techniques work.

In Fig. 3.6 the segments are vertically shifted from the original data to make them visible. The number of segments is an essential parameter of both approaches. For PLA it has to be defined a priori, while for triangular episode segmentation it is a function of the filtering.
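For PLA, the a priori number of segments can be enforced directly. The sketch below is one possible variant, assumed for illustration rather than the exact PLA implementation used here: a bottom-up scheme that starts with every sample as a breakpoint and repeatedly removes the interior breakpoint whose removal increases the interpolation error the least, until `k` connected segments remain. The function names are hypothetical.

```python
import numpy as np


def interp_error(y, i, j):
    """Squared residuals of y[i..j] around the line joining (i, y[i]) and (j, y[j])."""
    x = np.arange(i, j + 1)
    line = y[i] + (y[j] - y[i]) * (x - i) / (j - i)
    return float(np.sum((y[i:j + 1] - line) ** 2))


def pla_interpolation(y, k):
    """Bottom-up PLA with exactly k connected segments (k + 1 breakpoints).

    Neighbouring segments share an endpoint, so they never cross each other.
    Returns the breakpoint indices.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and len(y) - 1")
    bps = list(range(n))  # start with every sample as a breakpoint
    while len(bps) - 1 > k:
        # Extra error caused by removing interior breakpoint m.
        costs = [
            interp_error(y, bps[m - 1], bps[m + 1])
            - interp_error(y, bps[m - 1], bps[m])
            - interp_error(y, bps[m], bps[m + 1])
            for m in range(1, len(bps) - 1)
        ]
        del bps[1 + int(np.argmin(costs))]
    return bps
```

For example, `pla_interpolation([0, 1, 2, 3, 2, 1, 0], 2)` keeps the breakpoints `[0, 3, 6]`, i.e. the rising and the falling edge. Lowering `k` further forces segments across changes in slope, which is where the representation quality starts to drop.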

Figure 3.6: Cleaned and normalized original data of two word projection profiles (solid line) and their PLA representations (two panels, x-axis 1 to 100).

Figure 3.7: Episode segmentation of the two Gauss-filtered word projection profiles (x-axis 0 to 140, y-axis 0 to 1). Episode sequences: DABCGDABCGDACDABCGDABCGDA (upper panel) and DABCGDABCDABCDABCGDABCGAGDA (lower panel).

A cornerstone of the segmentation is to find an optimal number of segments.

PLA is more robust in this sense, but for human observers a set of linear segments is more difficult to understand than a set of well-defined characters.

Note that PLA is not a regression technique: its linear segments do not intersect one another, so lowering the number of segments considerably decreases the quality of the trend representation. Besides, for a properly chosen, equal number of segments, both techniques work very similarly.
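How the number of episodes depends on filtering can be sketched as follows. The code smooths the signal with a Gaussian kernel, labels each sample by the signs of its first and second discrete derivatives, and merges runs of equal labels into episodes. This is a simplified sign-pair labelling assumed for illustration; it does not reproduce the mapping of the seven primitive episodes to the letters A to G, and the dead-zone parameter `eps` and all function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(y, sigma):
    """Convolve y with a normalized Gaussian kernel; edges are padded by reflection."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(y, dtype=float), radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")  # same length as y


def _sign(d, eps):
    """Three-valued sign with a dead zone of width eps around zero."""
    return np.where(d > eps, 1, np.where(d < -eps, -1, 0))


def episode_segments(y, sigma, eps=1e-3):
    """Split the smoothed signal into runs of constant (sign f', sign f'') labels.

    Returns a list of (sign_pair, start, end) tuples with inclusive indices.
    """
    s = gaussian_smooth(y, sigma)
    d1 = np.gradient(s)   # first discrete derivative
    d2 = np.gradient(d1)  # second discrete derivative
    labels = list(zip(_sign(d1, eps).tolist(), _sign(d2, eps).tolist()))
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close the current episode at the end of the signal or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i - 1))
            start = i
    return segments
```

With a wider kernel (larger `sigma`), fewer sign changes survive the filtering, so the same noisy trend is described by fewer episodes.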

An advantage of the symbolic representation of trends is that it can easily be converted into DTW input: by mapping each character to a single number according to a similarity ordering, these trends can be warped as well. For example, let the seven primitive episodes, from rapidly decreasing to rapidly increasing segments, be converted as follows: B = −15, F = −10, C = −5, G = 0, D = 5, E = 10 and A = 15. The trends of Fig. 3.7 are then represented as converted time series that are applicable for DTW.
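The conversion described above amounts to a lookup table with one numeric value per episode. A minimal sketch using exactly the values given in the text; the names `EPISODE_VALUE` and `episodes_to_series` are illustrative.

```python
# Numeric value of each primitive episode, ordered from rapidly decreasing (B)
# to rapidly increasing (A), as listed in the text.
EPISODE_VALUE = {"B": -15, "F": -10, "C": -5, "G": 0, "D": 5, "E": 10, "A": 15}


def episodes_to_series(episodes: str) -> list:
    """Convert an episode string into a numeric series, one value per episode."""
    return [EPISODE_VALUE[e] for e in episodes]


print(episodes_to_series("DABCG"))  # [5, 15, -15, -5, 0]
```

Because neighbouring shapes receive neighbouring values, DTW's absolute-difference cost penalizes a B/A mismatch far more than a C/G mismatch.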

Figure 3.8: Converted episode segmentation for DTW application (two panels; x-axis 0 to 25, y-axis −20 to 20).

To obtain a more detailed qualitative trend representation that also captures the magnitude and duration of changes, one can apply the extended set of fuzzified episodes proposed by Wong et al. [102] and described in Section 3.1.1.

Comparison of alignments

Both alignment techniques are similar: they are based on a dynamic programming matrix and try to find the cheapest path (or maximal score) by making the segments (episodes) correspond to each other, even if they are located elsewhere in the trend. The first main difference is that the distances (or transformation weights) are predefined for sequence alignment, so they need not be recalculated during online runs. As a disadvantage, this algorithm needs more a priori knowledge: every possible type of segment must be identified for a particular problem, and the similarity scores or transformation weights between them have to be chosen carefully. The main input parameter of DTW is the warping constraint, which defines the working area near the diagonal of the distance matrix and is also difficult to choose; its shape affects the accuracy of the algorithm. Widely known constraints are the Sakoe-Chiba band [129] and the Itakura parallelogram [171].
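A minimal sketch of DTW restricted to a Sakoe-Chiba band, assuming the absolute difference as local cost and a band defined simply by |i − j| ≤ `window` (a common simplification when the two series have similar length). The function name and the `window` parameter are our own; an Itakura parallelogram would replace the band test with a slope constraint.

```python
import math


def dtw_sakoe_chiba(a, b, window):
    """DTW distance between sequences a and b within a Sakoe-Chiba band.

    Only cells with |i - j| <= window are evaluated; math.inf is returned
    when no admissible warping path exists.
    """
    n, m = len(a), len(b)
    # D[i][j]: cheapest cumulative cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance
            D[i][j] = cost + min(D[i - 1][j],      # a advances, b repeats
                                 D[i][j - 1],      # b advances, a repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]
```

For instance, `[0, 0, 1, 2]` and `[0, 1, 2]` warp onto each other at zero cost with `window=1`, but with `window=0` no path reaches the final cell and the result is `math.inf`. A band narrower than the length difference of the two series thus leaves no admissible path at all.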

It follows from this restricted working area that DTW is only appropriate for signals that are not totally different, i.e. whose warping path stays close to the main diagonal of the grid.

Figure 3.9: Warping path of DTW for synchronization of two word projection profiles (axis labels denote segment numbers).

Figure 3.10: Path of sequence alignment for synchronization of two word projection profiles (axis labels denote the episodes of the corresponding segments).

Figures 3.9 and 3.10 present the mappings produced by the two alignment algorithms. As the DTW constraint, a maximal warping window of 10 percent of the length of the shorter trend was allowed. Due to lack of space, only the alignment of the primitive episodes is presented in Eq. 3.10 (an identical match is marked with '|').

DABCGDABCGDA-CDABCGDABC--GDA
||||||||| || ||||||||||  |||
DABCGDABC-DABCDABCGDABCGAGDA          (3.10)
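Gapped sequences and a match line of the kind shown in Eq. 3.10 can be produced by a standard global (Needleman-Wunsch) alignment. The sketch below uses hypothetical unit scores (match +1, mismatch −1, gap −1) rather than the transformation weights of our algorithm, so it is not guaranteed to reproduce Eq. 3.10 exactly; the helper `match_line` marks identical, non-gap positions with '|'.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(s), len(t)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j]: best score for aligning s[:i] with t[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right corner, inserting '-' for gaps.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def match_line(a, b):
    """Mark identical, non-gap positions of two aligned strings with '|'."""
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


row1 = "DABCGDABCGDA-CDABCGDABC--GDA"
row2 = "DABCGDABC-DABCDABCGDABCGAGDA"
print(match_line(row1, row2).count("|"))  # 24
```

Applied to the two rows of Eq. 3.10, `match_line` marks 24 of the 28 positions; the other four are exactly the injected gaps.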

Eq. 3.10 shows that 24 of the 28 aligned positions are identical; the remaining four are gap positions (the longer sequence originally has 27 episodes, and gaps were injected so that both sequences have equal length). This identity refers to the shape of the segments. Further analysis showed that 9 episodes differ in at least one fuzzy attribute with the following parameters: the magnitude thresholds are 0.1 for medium and 0.3 for large changes, episodes longer than 10 data points are of medium duration, and episodes longer than 30 points are marked as long.
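The attribute check can be sketched with crisp thresholds taken from the parameters above. This is a simplification: Wong et al. [102] use fuzzy membership functions, whereas here each episode receives a single magnitude class and a single duration class, and treating the magnitude thresholds as inclusive bounds is also our assumption. Episodes are represented as hypothetical (symbol, magnitude change, length) tuples.

```python
def magnitude_class(change, medium=0.1, large=0.3):
    """Crisp magnitude class of a change (inclusive thresholds, an assumption)."""
    c = abs(change)
    if c >= large:
        return "large"
    if c >= medium:
        return "medium"
    return "small"


def duration_class(length, medium_after=10, long_after=30):
    """Crisp duration class: longer than 10 points is medium, longer than 30 is long."""
    if length > long_after:
        return "long"
    if length > medium_after:
        return "medium"
    return "short"


def count_attribute_differences(pairs):
    """Count aligned episode pairs that share a shape symbol but differ in
    magnitude class or duration class. Each episode is (symbol, change, length)."""
    count = 0
    for (sym_a, ch_a, len_a), (sym_b, ch_b, len_b) in pairs:
        if sym_a != sym_b:
            continue  # only shape-identical matches are compared
        if (magnitude_class(ch_a) != magnitude_class(ch_b)
                or duration_class(len_a) != duration_class(len_b)):
            count += 1
    return count
```

For example, two A episodes of 12 points each with changes 0.2 and 0.05 fall into different magnitude classes (medium and small) and would be counted once.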

As one can see, with an equal number of segments the episode sequences can be aligned with fewer transformation steps: four gaps are injected, while DTW warps 8 segments. Time warping of the converted episodes works identically to the alignment in Fig. 3.10, which suggests not using PLA for segmentation, since segmentation is a cornerstone of the whole algorithm. Although PLA is indexable, a symbolic, higher-order representation of a trend is more understandable for the user.

These results are only partially comparable, but they show that incorporating a priori knowledge into the algorithm can improve accuracy compared to a generally applied technique.