

SYSTEM OVERVIEW

DOSYS /Fig. 1/ is capable of answering queries asked as Czech sentences in written form. The first experimental problem domain of DOSYS is the enrollment procedure at all the colleges of Charles University, Prague. The system is also intended to supply information concerning agricultural machine spare parts depots of a production company, and to supply information on parts of diagnoses of patients treated in one of the Prague hospitals. So far, DOSYS admits only complete, mutually independent queries. It is not a dialogue system; in some cases, however, it informs the user that the query should be reworded or that there are some parts of the query that the system has failed to understand.

The natural language front-end of DOSYS consists of a main program QAS, which controls all operations of the entire system, and three subprograms: SLOV /dictionary retrieval/, LIAN /linguistic query analysis and the translation of the query into a formalized shape/ and SYNOD /reply synthesis/. The user communicates with QAS only.
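
The control flow described above can be illustrated with a minimal Python sketch. Only the program names QAS, SLOV, LIAN and SYNOD come from the text; the function signatures, the `Reply` class and the toy dictionary are assumptions made for illustration, and the database retrieval step that sits between analysis and synthesis is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    # Hypothetical container for what the user gets back from QAS.
    query: str
    answer: str
    unknown_words: list = field(default_factory=list)


def slov(query, dictionary):
    """Dictionary retrieval: split known word forms from unknown ones."""
    known, unknown = [], []
    for form in query.split():
        (known if form in dictionary else unknown).append(form)
    return known, unknown


def lian(known_words, dictionary):
    """Stand-in for linguistic analysis: map each word to its dictionary class."""
    return tuple(dictionary[w] for w in known_words)


def synod(query, formal_query, unknown):
    """Reply synthesis: package the result with the words not understood."""
    return Reply(query, f"formal query: {formal_query}", unknown)


def qas(query, dictionary):
    """Main program: the only entry point the user calls."""
    known, unknown = slov(query, dictionary)
    formal = lian(known, dictionary)
    return synod(query, formal, unknown)
```

Calling `qas("how many students xyz", {"how": "Q", "many": "Q", "students": "OBJ"})` would return a `Reply` whose `unknown_words` list contains `"xyz"`, mirroring the way the user never addresses the subprograms directly.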


The database component of DOSYS consists of a system RING and a program INTERPRET. The system RING is based on some ideas of the relational data model. The output of the LIAN program has a form similar to a narrow subset of the GET statements of the ALPHA query language /1/. The program INTERPRET translates this output into the manipulation language of RING and performs some optimization as well. In the current version of DOSYS the feeding and updating of its database and its dictionary is separated from the system itself and is performed without any connection to the natural language front-end. The first variant of DOSYS works in batch mode.

The input text /query/ is constituted by a sequence of word forms separated by spaces. Numbers and signs of punctuation are equally considered word forms. The analysis of a query is carried out in two steps. The first step consists of the dictionary retrieval performed by the SLOV program, which rewrites the query into a string of words relevant for the reply of the system. To each relevant word a set of dictionary characteristics required for further linguistic analysis is attributed. The following characteristics can be attributed to each word: morphological characteristic /in most cases identical with the part of speech indicated in ordinary dictionaries/, semantic characteristic /the name of the semantic class to which the word belongs/, data characteristic /one or more conditions indicating the relationship of the word to the data stored in the database/ and characteristic of the anticipated context /the phenomena to be identified in verifying the correct comprehension of the query/.
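
As an illustration, the four characteristics could be held in a record like the following Python sketch. The field names, the example word and all of its values are hypothetical; the text specifies only which kinds of characteristics exist, not how they are encoded.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    word_form: str
    morphological: str      # roughly the part of speech
    semantic: str           # name of the semantic class
    # One or more conditions relating the word to stored data.
    data_conditions: list = field(default_factory=list)
    # Phenomena to look for when verifying comprehension of the query.
    anticipated_context: list = field(default_factory=list)


# Invented example entry; the values are placeholders, not DOSYS data.
entry = DictionaryEntry(
    word_form="fakulta",
    morphological="noun",
    semantic="COLLEGE",
    data_conditions=["attribute=college"],
    anticipated_context=["a college name is expected nearby"],
)
```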

The implemented dictionary has the form of an oriented tree with labeled nodes and edges. In the first variant of DOSYS the dictionary is stored in the RING database.
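
One common way to realize an oriented tree with labeled nodes and edges is a character trie, sketched below in Python. The text does not say what the edge labels in DOSYS actually are; labeling edges with single characters and nodes with dictionary payloads is an assumption made purely for illustration.

```python
class Node:
    def __init__(self, label=None):
        self.label = label   # node label, e.g. the entry for a complete word form
        self.edges = {}      # edge label (assumed: one character) -> child Node


def insert(root, word_form, label):
    """Walk or create one edge per character, then label the final node."""
    node = root
    for ch in word_form:
        node = node.edges.setdefault(ch, Node())
    node.label = label


def lookup(root, word_form):
    """Follow the edges; return the node label, or None if the form is unknown."""
    node = root
    for ch in word_form:
        node = node.edges.get(ch)
        if node is None:
            return None
    return node.label
```

A lookup that returns `None` corresponds to a word form the system fails to understand.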

The main principle of the DOSYS linguistic analysis is that of semantic condensation. It is based on the assumption that the factual piece of information in the text is borne by nouns. As far as syntactic aspects are concerned, no classical syntactic analysis is performed - only some structural relations between semantic units are covered. Morphological analysis is not performed at all. By means of semantic condensation it is possible to convert each query into a generalized semantic representation having a more or less unified form and to neglect some formally expressed syntactic relations. In each query presented to the query system two parts can be distinguished - the object of the user's question and the concrete data provided by the user /conditions/. The type of the query determines the operation to be performed /providing the list of items, the sum of items, calculating the percentage/. The linguistic analysis of the query constitutes the second step of query processing and is performed by the LIAN system of programs. The first program of the system marks some important parts of the query and checks the presence of some features; the second determines the type of the query and the object of the query. The third to sixth programs resolve some ambiguities, the seventh processes the queries concerning percentage and the eighth processes the conditions /data/.
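
The decomposition of a query into type, object and conditions can be sketched in Python as follows. The class and function names are hypothetical, the rows are invented sample data, and "sum of items" is interpreted here as a count of matching items, which is an assumption rather than something the text states.

```python
from dataclasses import dataclass, field
from enum import Enum


class QueryType(Enum):
    LIST = "list of items"
    SUM = "sum of items"
    PERCENTAGE = "percentage"


@dataclass
class FormalQuery:
    query_type: QueryType   # operation to be performed
    obj: str                # object of the user's question (an attribute name)
    conditions: dict = field(default_factory=dict)   # data supplied by the user


def evaluate(q, rows):
    """Apply the conditions, then perform the operation the query type selects."""
    matching = [r for r in rows
                if all(r.get(k) == v for k, v in q.conditions.items())]
    if q.query_type is QueryType.LIST:
        return [r[q.obj] for r in matching]
    if q.query_type is QueryType.SUM:
        return len(matching)   # assumed meaning: number of matching items
    return 100.0 * len(matching) / len(rows) if rows else 0.0
```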

While the replies of DOSYS are elementary, they are sufficient and easy to understand. The printed reply consists of three parts: the query itself, the reply and the list of word forms that the system failed to understand. The data retrieved from the database are fed into answer frames prepared in advance.
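
Filling data into prepared answer frames amounts to template substitution, as in the minimal Python sketch below. The frame texts, the frame names and the output layout are invented for illustration; the text states only that a reply has three parts and that frames are prepared in advance.

```python
# Hypothetical answer frames; the real DOSYS frames are not given in the text.
REPLY_FRAMES = {
    "list": "The requested items are: {items}.",
    "sum": "The number of items is {count}.",
}


def synthesize_reply(query, frame_name, values, unknown_words):
    """Build the three-part printed reply: query, answer, unknown word forms."""
    lines = [
        f"QUERY: {query}",
        f"REPLY: {REPLY_FRAMES[frame_name].format(**values)}",
        f"NOT UNDERSTOOD: {', '.join(unknown_words) if unknown_words else '-'}",
    ]
    return "\n".join(lines)
```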


The data of DOSYS are maintained in the RING database. It enables the creation of as many as 125 tables /relations/. The row /tuple/ lengths of relations are 30-900 bytes, the lengths of the keys being 20 bytes at most. The total volume of all data should not exceed 52 Mbytes. The data are stored on magnetic disc in a single physical file with a direct access method. The manipulation language of the RING system is realized by three parameters of the CALL statement /type of the operation, relation and tuple specification, data/. It contains operators for defining and deleting relations and for reading, inserting and modifying individual tuples but, owing to its detailed nature, it is not appropriate for direct wording of the LIAN output. Therefore, the program INTERPRET was designed to extend the functions of RING. There are two types of relations in the database: basic and derived. The basic relations contain new data, the function of the derived relations being similar to that of inverted files.

The derived relations are updated automatically whenever some modifications in the basic relations are performed. In the first variant of DOSYS one basic and 38 derived relations are used, the total amount of data being about 20 Mbytes.
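
The relationship between basic and derived relations can be illustrated with a small in-memory Python sketch. It is not the RING interface: the class name, the operation names `INSERT`, `MODIFY` and `READ`, and the restriction to a single basic relation are all assumptions. It shows only the idea of a three-part call whose updates automatically maintain inverted-file-like structures.

```python
from collections import defaultdict


class MiniStore:
    def __init__(self, indexed_attrs):
        self.basic = {}   # basic relation: key -> tuple (a dict)
        # One derived relation per indexed attribute: value -> set of keys.
        self.derived = {a: defaultdict(set) for a in indexed_attrs}

    def _unindex(self, key):
        """Remove the old tuple's key from every derived relation."""
        old = self.basic.get(key)
        if old is not None:
            for attr, index in self.derived.items():
                index[old.get(attr)].discard(key)

    def call(self, operation, key, data=None):
        """Three parameters, loosely echoing: operation type, tuple spec, data."""
        if operation == "READ":
            return self.basic.get(key)
        if operation in ("INSERT", "MODIFY"):
            self._unindex(key)
            self.basic[key] = dict(data)
            # Derived relations are refreshed on every change to the basic one.
            for attr, index in self.derived.items():
                index[data.get(attr)].add(key)
            return None
        raise ValueError(f"unknown operation: {operation}")

    def find(self, attr, value):
        """Answer a condition through the derived relation, not a full scan."""
        return sorted(self.derived[attr].get(value, ()))
```

Because every `INSERT` or `MODIFY` passes through `call`, the derived structures cannot drift out of step with the basic relation, which is the property the automatic update provides.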