
Attempts and examples for the discovery of hidden information of Concise explanatory dictionary of Hungarian (2nd edition, 2003)


Academic year: 2022


Magyar Számítógépes Nyelvészeti Konferencia (Hungarian Conference on Computational Linguistics)

Attempts and examples for the discovery of hidden information of Concise explanatory dictionary of Hungarian (2nd edition, 2003)

Mártonfi Attila

MTA-ELTE Research Group of Academic Dictionary of Hungarian rumc i o n y t u d . hu

Keywords: lexicography, knowledge discovery, etymological statistics, Concise explanatory dictionary of Hungarian

Knowledge discovery, and data mining as a part of it, are fashionable areas of IT whose aim is typically to exploit commercial databases. However, the goal (namely extracting as much hidden data and as many unknown patterns as possible by machine) is essentially the same as the most general goal of scientific research; therefore its approach and toolkit are, at least partially, applicable to lexicographical databases. (Since lexicographical databases are usually smaller by orders of magnitude than the monumental commercial databases found in the primary area of data mining, the hardware requirements of the operations are significantly lower and the extractable information is more restricted.)

The first notable lexicographical database of Hungarian is Papp Ferenc's Reverse-alphabetized dictionary of the Hungarian language (VégSz.) and its derivative database on PC. The database underlying Papp's dictionary had four more fields than the paper version: the length in characters, the number of senses in ÉrtSz. (Explanatory dictionary of the Hungarian language), the etymology based on the Etymological dictionary of Hungarian, and the usage label given in the head of entries in ÉrtSz. For typographical reasons these are omitted from the paper version and its derivative database.

The new edition of the Concise explanatory dictionary of Hungarian (ÉKsz.2), as an up-to-date lexicographical project should be, was first prepared as an XML document. Although its grammatical information (which constitutes the skeleton of VégSz.) is substantially poorer, suitable conversions can generate a more complete and more modern data table. It is more modern because ÉKsz.2 provides up-to-date etymological facts about the widest group of Hungarian words. It is more complete because, apart from the part-of-speech and usage labels and the number of senses distinguished, every entry in this dictionary has an absolute frequency based on the Hungarian National Corpus; furthermore, the word length can be coded as the number of phonemes or syllables.
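The conversion from XML to a flat data table might be sketched as follows. This is only an illustration under stated assumptions: the element and attribute names (`entry`, `sense`, `headword`, `pos`, `etym`, `usage`, `freq`) are hypothetical and are not the actual ÉKsz.2 markup, the sample values are invented, and length is counted in characters as a simple stand-in for phonemes or syllables.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are invented for
# illustration and do not reproduce the real ÉKsz.2 schema. The frequency
# values are toy numbers, not Hungarian National Corpus counts.
SAMPLE_XML = """
<dictionary>
  <entry headword="asztal" pos="noun" etym="Slavic" freq="120">
    <sense/><sense/>
  </entry>
  <entry headword="iskola" pos="noun" etym="Latin" freq="300">
    <sense/>
  </entry>
</dictionary>
"""


def entries_to_rows(xml_text):
    """Flatten each <entry> element into one record (one table row)."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        headword = entry.get("headword")
        rows.append({
            "headword": headword,
            "pos": entry.get("pos"),
            "etym": entry.get("etym"),
            "usage": entry.get("usage"),             # None if no usage label
            "senses": len(entry.findall("sense")),   # number of senses
            "freq": int(entry.get("freq", "0")),     # absolute frequency
            "length": len(headword),                 # characters, not phonemes
        })
    return rows


rows = entries_to_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Each record carries the fields named in the text (part of speech, usage label, etymology, number of senses, frequency, length), so it can be loaded directly into a relational table.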

Szeged, 10-11 December 2003

With some simple queries, the generated relational database gives token and type frequency indices for word groups defined by etymology, usage label, part of speech or number of senses. Such token frequency indices could not have been calculated formerly, for want of a satisfactory database or corpus background; the type frequencies make comparison possible with Papp's examinations based on former sources.
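A query of this kind might look like the sketch below, which uses an in-memory SQLite table. The table name `entry`, its columns, and the frequency values are assumptions made for illustration, not the author's database. Type frequency is taken here as the number of entries in a group, and token frequency as the sum of their corpus frequencies.

```python
import sqlite3

# Toy table: the schema and the frequency values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (headword TEXT, etym TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("asztal", "Slavic", 120),
        ("szerda", "Slavic", 80),
        ("iskola", "Latin", 300),
    ],
)

# Type frequency: how many entries belong to each etymological group.
# Token frequency: the summed corpus frequency of those entries.
query = """
    SELECT etym, COUNT(*) AS type_freq, SUM(freq) AS token_freq
    FROM entry
    GROUP BY etym
    ORDER BY etym
"""
indices = conn.execute(query).fetchall()
for etym, type_freq, token_freq in indices:
    print(etym, type_freq, token_freq)
```

Grouping by usage label, part of speech or number of senses instead would only require changing the column in the `GROUP BY` clause.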

With the toolkit of data mining, more interesting analyses could be performed to discover hidden patterns in the above parameters by extracting association rules.
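As a rough illustration of association rule extraction, the sketch below computes support and confidence for rules with a single attribute value on each side. It is a minimal brute-force version, not the procedure used in the study; the function name `association_rules`, the thresholds, the usage labels and the records are all hypothetical toy values.

```python
from itertools import permutations


def association_rules(records, min_support=0.4, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence).

    Each record is a dict of attribute -> value; an item is an
    (attribute, value) pair. Only one-item-to-one-item rules are tried.
    """
    n = len(records)
    item_sets = [set(record.items()) for record in records]
    items = sorted(set().union(*item_sets))

    def support(*wanted):
        # Share of records that contain every wanted item.
        return sum(all(i in s for i in wanted) for s in item_sets) / n

    rules = []
    for a, b in permutations(items, 2):
        if a[0] == b[0]:
            continue  # skip pairs of values of the same attribute
        s_ab = support(a, b)
        if s_ab < min_support:
            continue
        confidence = s_ab / support(a)
        if confidence >= min_confidence:
            rules.append((a, b, s_ab, confidence))
    return rules


# Invented toy records, not data from ÉKsz.2.
records = [
    {"etym": "Latin", "usage": "formal"},
    {"etym": "Latin", "usage": "formal"},
    {"etym": "Slavic", "usage": "neutral"},
    {"etym": "Slavic", "usage": "neutral"},
    {"etym": "Slavic", "usage": "formal"},
]
rules = association_rules(records)
for antecedent, consequent, s, c in rules:
    print(antecedent, "->", consequent, "support", s, "confidence", c)
```

On real dictionary data, an algorithm such as Apriori would be the usual choice, because it avoids enumerating every candidate pair.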
