
Improving Hungarian Morphological Disambiguation Quality with Tagger Combination

György Orosz

(Supervisor: Dr. Gábor Prószéky) oroszgy@itk.ppke.hu

Abstract—In the case of morphologically rich languages, full morphological disambiguation is a fundamental task that is known to be more difficult to solve effectively than providing PoS tags alone. In our work, we overview Hungarian disambiguator tools and present some common tagger combination techniques in order to investigate how these methods and tools could be used together to improve full annotation accuracy. After analyzing the disambiguators’ errors, we introduce a method that jointly picks the proper tagger and lemmatizer tool and harmonizes their output, thus achieving a 28.90% error reduction rate compared to PurePos, a state-of-the-art Hungarian disambiguator.

Keywords: part-of-speech tagging, morphological disambiguation, lemmatization, agglutinative languages, natural language processing

I. INTRODUCTION

Part-of-speech tagging is one of the basic and most studied tasks of computational linguistics. There are several freely available tools and algorithms that work with high precision.

However, assigning PoS tags is only a subtask of morphological disambiguation. It is also crucial to identify the lemma, which is not a trivial task for languages with a rich morphology, such as Turkish or Hungarian. Nevertheless, most of the currently available tools only deal with disambiguating morphosyntactic labels; there are only a few that do the whole job. Robust and accurate operation of these tools is important, since they are usually parts of larger linguistic processing chains. Thus, errors propagating from this level affect the performance of systems performing more complex language processing tasks.

In our paper, we survey taggers that perform full morphological disambiguation for Hungarian, investigating and comparing their common errors. Lessons learned from the error analysis help us combine them successfully to gain better performance.

II. BACKGROUND

First, we give a brief overview of full morphological annotation tools for Hungarian. After comparing them, we overview commonly used tagger combination techniques. The experiments described in this paper were performed on the Hungarian Szeged Corpus [3], with its PoS annotation automatically converted to the morphosyntactic tags used by the Hungarian HuMor morphological analyzer [10], [9]. 10% of the corpus was separated for testing and another 10% was used for development and tuning purposes. Each of these sets contains about 7100 sentences,

while the rest, about 57000 sentences, were used for training the systems.

A. Morphological annotation tools

1) PurePos: [8] is an open source hybrid system for full morphological disambiguation. It is based on hidden Markov models, but it can use an integrated morphological analyzer (MA) module as well to tag unseen words and to assign lemmas. The tool uses well-known trigram tagging algorithms, but what distinguishes it from its predecessors is the complete integration of a morphological analyzer, which results in a further boost in its PoS tagging accuracy and also makes high precision lemmatization possible.
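Although PurePos's internals are not reproduced here, the role of its integrated MA can be illustrated with a small sketch: for a word unseen during training, the analyzer restricts the candidate (tag, lemma) pairs to morphologically valid analyses, while a suffix-based guesser only ranks them. The interfaces and the scores below are illustrative assumptions, not PurePos's actual API.

from typing import Dict, List, Tuple

def candidate_analyses(word: str,
                       ma_lexicon: Dict[str, List[Tuple[str, str]]],
                       suffix_guesser) -> List[Tuple[str, str, float]]:
    """Return (tag, lemma, score) candidates for a word unseen in training."""
    analyses = ma_lexicon.get(word)
    if analyses:
        # The MA limits the search space to valid analyses; the suffix
        # guesser only supplies the (illustrative) emission scores.
        return [(tag, lemma, suffix_guesser.tag_prob(word, tag))
                for tag, lemma in analyses]
    # No analysis available: fall back to the guesser's own tag proposals,
    # using the surface form itself as the lemma.
    return [(tag, word, prob) for tag, prob in suffix_guesser.guess(word)]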

2) HuLaPos: [7] is a purely statistical annotation tool based on an SMT1 decoder. An advantage of applying this methodology to PoS tagging is that it can consider the context in both directions. Moreover, HuLaPos uses a higher order language model than PurePos. On the other hand, HuLaPos has an inferior performance on unseen words, although it utilizes a simple smoothing algorithm that enables it to handle such words to some extent.

3) Magyarlanc: [13], another commonly used tool, is a full processing chain consisting of a sentence splitter, a tokenizer, a part-of-speech tagger and a lemmatizer, and its latest version even contains a dependency parser. It also contains a built-in morphological analyzer based on morphdb.hu [11]. As a tagger, it is reported to attain 96.33% precision on a random 4:1 split of the Szeged Corpus.

B. Tagger combination schemes

Designing a combined system of classification or annotation tools involves several steps. First, it must be examined whether the errors of the individual systems to be combined are different enough for the aggregate system to be likely to outperform the best individual system significantly. Then an appropriate combining algorithm must be found.

A basic combining scheme, which is often used as a baseline, is majority voting. Other, more advanced, combining schemes involve training a top-level classifier for the task of generating the output of the combined system based on outputs of the individual embedded systems. This class of combination schemes is commonly referred to as stacking learners. The top-level classifier may use various features of both the input

1 Statistical machine translation.


and the outputs of the bottom-level classifiers when making its decision. The set of features used may have a significant impact on the performance of the combined system.
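As a reference point, the majority-voting baseline mentioned above can be sketched as follows. The token-level data layout and the tie-breaking rule (falling back to the most accurate stand-alone tagger) are assumptions made for this illustration.

from collections import Counter
from typing import List, Sequence

def majority_vote(outputs: Sequence[List[str]],
                  priority: Sequence[int]) -> List[str]:
    """outputs[i][j]: annotation of token j proposed by tagger i;
    priority: tagger indices ordered from most to least accurate."""
    combined = []
    for proposals in zip(*outputs):
        counts = Counter(proposals)
        best = max(counts.values())
        winners = {a for a, c in counts.items() if c == best}
        # Pick the annotation of the most accurate tagger among the
        # winning (possibly tied) proposals.
        for i in priority:
            if proposals[i] in winners:
                combined.append(proposals[i])
                break
    return combined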

Finally, the decisions made by the top-level classifier can be of at least two sorts: it can either always select the output of one of the bottom-level systems, or it can generate an output of its own that may differ from the output of every individual embedded system. With the former solution, the errors of the embedded systems determine a theoretical upper limit on the accuracy of the combined system (it can never generate the expected output when neither of the embedded classifiers generates it), so the latter solution seems more beneficial in theory. However, the complexity of the annotation task to be performed and the available training data influence which of these options is feasible and how they perform in practice. If the cardinality of the output annotation and of the features involved in training the classifier is high, the combining classifier may suffer from data sparseness or performance problems, or it may simply become too complicated.

One of the first attempts at combining English PoS taggers was made by Brill and Wu [2]. They proposed a memory-based learning system for tagger combination that employs contextual and lexical clues. In their experiments, the solution in which the top-level learner always selects the output of one of the embedded taggers outperformed the more general scheme that allowed the output to differ from any of the proposed tags.

A comprehensive study by van Halteren et al. [6] presents a detailed overview of previous combination attempts, mainly using machine learning techniques. Several combination methods are compared and evaluated systematically in the paper. The authors show that cross-validation can be used to train the top-level classifier for an optimal utilization of the training corpus. The scheme they found to perform best in their experiments is one they characterize as generalized voting, although it can output annotation that differs from the output of either embedded tagger and can thus also be interpreted as a stacking method. However, the cardinality of their tag set and the dimensionality of their feature space were modest compared to those in our case.

A system with a different architecture is presented by, e.g., Hajič et al. [4]: in contrast to the parallel and hierarchical architecture of the systems above, it employs a serial combination of annotators, starting with a rule-based morphological analyzer, followed by constraint-based filters feeding a statistical tagger at the end of the chain.

III. ERROR ANALYSIS

As we mentioned above, it is useful to start the design process with an error analysis of the systems to be combined, in order to see whether a system combination is likely to improve performance. We present the accuracy values of PurePos (PP) and HuLaPos (HLP) measured on the development set in Table I.2

2 All other measurements in Sections III and IV were also made on the development set.

TABLE I
BASE SYSTEM ACCURACIES

                              Tagging    Lemmatization    Full disambig.
PurePos                       98.57%     99.58%           98.43%
HuLaPos                       97.61%     98.11%           97.03%

TABLE II
COMPARISON OF PUREPOS AND HULAPOS

                              Tagging    Lemmatization    Full disambig.
OER(PP, HLP)                  22.41%     11.66%           21.16%
OER(HLP, PP)                  53.58%     80.21%           58.24%
Agreement rate                97.60%     98.02%           96.92%
Both are right on agreement   99.30%     99.85%           99.29%
One is right on disagreement  97.53%     98.89%           97.14%
Oracle                        99.26%     99.83%           99.22%

Unfortunately, magyarlanc is not directly comparable with the others above, since its built-in annotation scheme is not compatible with the HuMor scheme used by the two other tools.

It may not be evident from these values why and how combining these tools can boost performance, but a deeper investigation of their common errors suggests that the chances of success are good.

We use the metric3 OER(A, B) = #(errors made only by A) / #(all errors made by A or B), which measures the percentage of cases where tagger A is wrong but B is correct, in proportion to all errors made by either A or B. We do not use the complementarity formula proposed by Brill et al. [2], because it gives hard-to-interpret, unbounded negative values when there is a significant overlap between the errors made by the two taggers. Although HuLaPos makes more errors than PurePos, the own error rates (Table II) indicate that the error distribution is fairly balanced between the two tools for tagging and full disambiguation. In addition, we calculated the agreement rate of the tools and the relative percentage of times they agree on the right morphological annotation. Table II also shows that one of them assigns the right annotation most of the time they disagree. Assuming a hypothetical oracle that can always select the better annotation output, the performance of the better tagger can be increased by more than 0.6%, corresponding to a 72.73% relative error rate reduction on the development set. These results encouraged us to combine the two tools.
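For illustration, the statistics reported in Table II can be computed directly from the token-level outputs of the two tools and the gold annotation; the following is a minimal sketch, assuming a simple list-based data layout with at least one error and one disagreement present.

from typing import Dict, List

def pairwise_statistics(gold: List[str], a: List[str], b: List[str]) -> Dict[str, float]:
    n = len(gold)
    errors_any = sum(1 for g, x, y in zip(gold, a, b) if x != g or y != g)
    a_only = sum(1 for g, x, y in zip(gold, a, b) if x != g and y == g)
    b_only = sum(1 for g, x, y in zip(gold, a, b) if y != g and x == g)
    agree = sum(1 for x, y in zip(a, b) if x == y)
    agree_right = sum(1 for g, x, y in zip(gold, a, b) if x == y == g)
    oracle_right = sum(1 for g, x, y in zip(gold, a, b) if x == g or y == g)
    return {
        "OER(A, B)": a_only / errors_any,            # A wrong, B right
        "OER(B, A)": b_only / errors_any,            # B wrong, A right
        "agreement rate": agree / n,
        "both right on agreement": agree_right / agree,
        # on disagreement at most one tool can be right
        "one right on disagreement": (oracle_right - agree_right) / (n - agree),
        "oracle": oracle_right / n,                  # upper bound for tool picking
    }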

IV. ANNOTATION TOOL COMBINATIONS

It was shown previously [2], [6] that stacking classifiers can improve PoS tagging accuracy. For an optimal utilization of the training material, we applied training with cross-validation in our experiments, as suggested by several authors, e.g. [6]. The training set was split into 5 equal-sized parts, and the level-0 taggers (PurePos and HuLaPos) were trained 5 times using 4/5 of the corpus, with the remaining part annotated by both taggers in each round. The union of these automatically annotated parts of the training corpus was used to train the top-level (level-1) metalearner.

3 OER = own error rate.

TABLE III

FEATURE SETS USED IN THE EXPERIMENTS

ID    Base FS     Additional features
FS1   Brill-Wu    -
FS2   FS1         whether the word contains a dot or hyphen
FS3   FS1         use at most 5-character suffixes
FS4   FS2, FS3    -
FS5   FS1         guessed tags for the second words (right, left)
FS6   FS4         use at most 10-character suffixes

Thus the full training data was available for level-1 training, while the two phases of the training process remained separate. In addition, this workflow made the full training material available for the level-0 learners as well.
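A sketch of this jackknifing step is given below; train and annotate stand in for the actual training and tagging of a level-0 tool (in our setting both PurePos and HuLaPos are run in every round), so the concrete calls are assumptions of the sketch.

from typing import Callable, List, Sequence, Tuple

Sentence = List[str]

def build_level1_data(folds: Sequence[List[Sentence]],
                      train: Callable[[List[Sentence]], object],
                      annotate: Callable[[object, List[Sentence]], List[List[str]]]
                      ) -> List[Tuple[Sentence, List[str]]]:
    """Annotate every fold with a model trained on the remaining folds."""
    level1 = []
    for i, held_out in enumerate(folds):
        # Train on the union of the other folds ...
        train_part = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train(train_part)
        # ... and annotate the held-out fold with that model. The resulting
        # (sentence, level-0 annotation) pairs form the level-1 training set;
        # gold labels are attached from the original corpus afterwards.
        for sentence, annotation in zip(held_out, annotate(model, held_out)):
            level1.append((sentence, annotation))
    return level1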

As the use of a “relatively global, smooth” level-1 learner is suggested in [12], we investigated the naïve Bayes (NB) classifier and instance-based (IB) learners4 [1], which, in addition to being simple, have been shown to perform well in sequence classifier combination tasks. We used the instance-based combinators as follows. Given the high agreement rate of the level-0 taggers on correct events, we decided to use all metalearners only in cases of disagreement. After extracting features for a word on whose annotation the tools disagree, the classifier finds the most similar previously seen case(s) based on Euclidean distance, and it selects the output of the annotation tool that generated the correct output in the most similar case(s). We decided to use the tagger-picking approach, since Hungarian has a tag set with a cardinality of over a thousand and an almost unlimited vocabulary, which suggests that the tag-picking approach would not be feasible.5
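The decision procedure can be summarized with the following sketch; the numeric feature encoding and the single-nearest-neighbour simplification are assumptions made for illustration rather than the exact configuration used in the experiments.

import math
from typing import List, Sequence, Tuple

# A stored disagreement case: (feature vector, index of the tool that was right)
Case = Tuple[Sequence[float], int]

def pick_tool(features: Sequence[float], cases: List[Case]) -> int:
    """Return the index of the tool to trust for a new disagreement."""
    def distance(u: Sequence[float], v: Sequence[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    nearest = min(cases, key=lambda case: distance(features, case[0]))
    return nearest[1]

def combine(output_a: str, output_b: str,
            features: Sequence[float], cases: List[Case]) -> str:
    # The metalearner is consulted only when the level-0 tools disagree.
    if output_a == output_b:
        return output_a
    return output_a if pick_tool(features, cases) == 0 else output_b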

Brill et al. [2] proposed a simple but powerful feature set (FS1 in Table III) that consists of the word to be tagged, its immediate neighbors and all their suggested tags. We intended to extend this feature set systematically to make it better fit languages with a very productive morphology, such as Hungarian.

Several experiments were run6 in order to investigate whether using word shape (FS2, FS4), word suffix (FS3, FS4, FS6) or wider contextual features (FS5) can improve the performance of tagger selection (see Table III).
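For concreteness, the kinds of features listed in Table III can be extracted roughly as in the sketch below; the feature names and the dictionary layout are assumptions of this sketch, not the exact representation used in the experiments.

from typing import Dict, List

def extract_features(words: List[str], i: int,
                     tags_a: List[str], tags_b: List[str],
                     max_suffix: int = 5) -> Dict[str, str]:
    def at(seq: List[str], j: int) -> str:
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    word = words[i]
    feats = {
        # FS1 (Brill-Wu): the word, its immediate neighbours and all proposed tags
        "word": word,
        "prev_word": at(words, i - 1),
        "next_word": at(words, i + 1),
        "tag_a": tags_a[i], "tag_b": tags_b[i],
        "prev_tag_a": at(tags_a, i - 1), "prev_tag_b": at(tags_b, i - 1),
        "next_tag_a": at(tags_a, i + 1), "next_tag_b": at(tags_b, i + 1),
        # FS2-style word-shape clue
        "has_dot_or_hyphen": str("." in word or "-" in word),
    }
    # FS3/FS6-style suffix features of bounded length (5 or 10 characters)
    for k in range(1, max_suffix + 1):
        feats["suffix_%d" % k] = word[-k:]
    # FS5 (guessed tags of the second neighbours) could be added analogously.
    return feats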

In our experiments, the naïve Bayes classifier (NB) performed significantly worse than the instance-based learners (IB), even when using seemingly independent features. Moreover, lemmatizer combination turned out to be an almost insoluble task for it, as the error rate reduction data in Table IV show. It is also interesting that adding word shape features (FS2) always increased tagging accuracy. The results show that using longer suffix features is beneficial in cases where assigning a lemma is part of the task. However, for combining part-of-speech taggers only, omitting the word form and using at most five-character-long suffix features gives the best result.

We describe below how these combinations were applied to improve automatic Hungarian morphological annotation

4 The C4.5 decision tree algorithm was also tested, but it was unable to handle the large amount of feature data involved in our experiments.

5 We plan to verify this assumption in the future. Nevertheless, all experiments described in this paper used the tagger-picking model.

6 The WEKA [5] machine learning software was used in the experiments.

TABLE IV
ERROR RATE REDUCTION USING METALEARNERS WITH DIFFERENT FEATURE SETS
