Glyph Identification Based on Topological Analysis

Raymond Pardede, Loránd Tóth, Gábor Hosszú, Ferenc Kovács

Abstract:

This paper proposes a novel mathematical model that describes the logical relationships among glyphs belonging to the same script. The proposed model is presented as three logical layers, namely the Topology, Visual Identity, and Phonetic layers. In the Topology Layer, a unique glyph that represents a grapheme is defined by a set of geometrical properties, and the relation between two different glyphs is defined by the number of topological transformation steps required to transform the shape of one glyph into the other. In the Visual Identity Layer, the glyphs of a single grapheme share some topological attributes in common. Some graphemes of a script from different ages may have a similar Common Identity. To distinguish them in that case, the evaluation must be extended with the phonetic attributes of the graphemes. The article presents a potential implementation of the proposed three-layer hierarchical grapheme model.

Introduction:

A writing system is a symbolic representation of a language described in terms of linguistic units [1], where the grapheme (most typically a letter) is the smallest semantically distinguishing unit in a particular writing system.

Likewise, a glyph is a specific shape that represents a grapheme. Different glyphs can represent the same abstract grapheme.

The study of the glyphs of a particular script poses challenging problems for archaeologists and anthropologists, such as deciphering undeciphered glyphs discovered through excavation and reading patterns in glyph transformations. Effective software may shorten the research time and provide more accurate results through automatic processing. Producing such software requires the support of a solid mathematical model.

Therefore, our main challenge is to develop such a descriptive mathematical model as a useful framework that supports archaeologists and anthropologists in their research.

Such a model can be used to describe how a glyph could transform from one shape to another, or to estimate how external parameters (e.g. the writing instrument) could affect the transformation of a glyph.

Literature Review and Objectives:

The writing system of a spoken language changes from time to time after being established [2]. The changes can occur as changes in the set of symbols, which encompasses the shape transformation of a glyph. One reason for the alteration of a writing system is the establishment of more advanced writing media or instruments.

The establishment of more advanced writing technology introduces new writing techniques, which affect the shape of the glyphs of a grapheme.

Deciphering, decomposing, classifying, and recognizing patterns of glyphs are the main challenges addressed by the proposed model presented in the following.

The Proposed Grapheme Model:

The developed grapheme model combines three approaches: (i) topology-based, (ii) visual-identity-based, and (iii) phonetic-based. It is represented as a layer-based model, see Figure 1.

Figure 1: A hierarchical, three-level model for the grapheme

In the Topology Layer, a single glyph is described by a complete set of geometrical attributes. As an example, a glyph YYYY can be defined as “a glyph that has a single, non-rotated vertical line where the top edge of the line is located at the very top-center of the shape and the bottom edge of the line is located at the very bottom-center of the shape”.

In this layer, one glyph shape can be transformed into another by applying a chain of basic operators. These operators can be applied to the whole glyph, or only to a part of it, a so-called element. The elements of a glyph can be determined by the following method: suppose that we write on paper with a pen; each time the pen touches the paper, draws a line, and is then lifted from the paper, the resulting stroke is identified as a single element, see Figure 2 (a) for some examples.
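The pen-based rule above can be sketched in code. The following minimal Python example assumes a hypothetical input format, a list of `(kind, x, y)` tuples where `kind` is `"down"`, `"move"`, or `"up"`; neither this format nor the function name `split_into_elements` comes from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
PenEvent = Tuple[str, float, float]  # hypothetical format: (kind, x, y)


def split_into_elements(events: List[PenEvent]) -> List[List[Point]]:
    """Group pen events into elements: one element per down..up stroke."""
    elements: List[List[Point]] = []
    current: List[Point] = []
    pen_is_down = False
    for kind, x, y in events:
        if kind == "down":
            # The pen touches the paper: a new element starts here.
            current = [(x, y)]
            pen_is_down = True
        elif kind == "move" and pen_is_down:
            # The pen draws a line while touching the paper.
            current.append((x, y))
        elif kind == "up" and pen_is_down:
            # The pen leaves the paper: the element is complete.
            current.append((x, y))
            elements.append(current)
            pen_is_down = False
    return elements


# A "T"-like shape drawn with two strokes (illustrative coordinates).
events = [
    ("down", 0.0, 1.0), ("move", 0.5, 1.0), ("up", 1.0, 1.0),
    ("down", 0.5, 1.0), ("up", 0.5, 0.0),
]
elements = split_into_elements(events)
print(len(elements))  # prints 2
```

Under this assumption, the number of elements equals the number of pen lifts, so the same shape drawn with a different stroke order or stroke count would yield a different element decomposition.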

The transformation from one glyph into another can be expressed in general by equation (1). G₁ is the initial topological structure of a glyph before the transformation, and G₂ is the topological structure resulting from the transformation of G₁. T is the transformation matrix composed of a set of basic operators O applied in sequence.

$$ G_2 = T(G_1) \qquad (1) $$

For instance, the Rotating basic operator is expressed as O₃ and has a parameter θ that indicates the degree of rotation, see equation (2). As an example, see Figure 2 (b).

$$ G_1' = O_3(G_1, \theta) \qquad (2) $$
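Equations (1) and (2) can be illustrated with a short Python sketch. It rests on several assumptions that are not in the paper: a glyph is represented as a list of elements, each a list of (x, y) points; the Rotating operator turns the glyph counterclockwise about the origin, with θ in degrees; and T is modelled as a plain list of operators applied in order rather than as a matrix. The names `rotate` and `apply_chain` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Glyph = List[List[Point]]  # assumed representation: elements -> points
Operator = Callable[[Glyph], Glyph]


def rotate(glyph: Glyph, theta_deg: float) -> Glyph:
    """O_3(G, theta): rotate every point counterclockwise about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[(x * c - y * s, x * s + y * c) for x, y in element]
            for element in glyph]


def apply_chain(glyph: Glyph, operators: List[Operator]) -> Glyph:
    """T(G): apply the basic operators one after another, in list order."""
    for op in operators:
        glyph = op(glyph)
    return glyph


# G1: a single vertical line from (0, -1) to (0, 1).
g1: Glyph = [[(0.0, -1.0), (0.0, 1.0)]]

# Equation (2): G1' = O_3(G1, theta) with theta = 90 degrees.
g1_prime = rotate(g1, 90.0)

# Equation (1): G2 = T(G1), here with T made of two 45-degree rotations.
g2 = apply_chain(g1, [lambda g: rotate(g, 45.0), lambda g: rotate(g, 45.0)])
```

In this sketch the chain of two 45-degree rotations gives the same result as the single 90-degree rotation, up to floating-point rounding, which shows that different operator chains can connect the same pair of glyphs.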

Figure 2: (a) How to determine the elements of a grapheme; (b) an example of the Rotating operator applied to a glyph (G₁ before, G₁' after)

When a basic operator Oᵢ is applied only to an element of a glyph, the element is [...] before the Topology Model is iterated. Using (2), when the Rotating basic operator is applied only to the element Eₙ of G₁, (2) must be modified, resulting in Equation (3) as follows.

[...] necessary to distinguish one grapheme from another in a given writing system. In Figure 3, one can recognize all the glyph variants as the grapheme “A” since they have the same visual identity. The Common Identity consists of two components: the first is a loop shape, which has to be on the top, and the second is the symmetrical feet that support the figure.

Figure 3: The grapheme “A” in many glyph variants and its Common Identity

The Visual Identity Model relies on the principle that the glyphs of one grapheme have common visual elements representing the grapheme. The main differences between the topology-based and the visual-identity-based approaches are summarized in Table 1.

                                      Normalized glyph                   Common Identity
Layer where it is defined and used    Topology Layer                     Visual Identity Layer
Way of description                    Ideal essence of the real glyph    Selection of topology attributes

Table 1: Differences between the Topology and the Visual Identity layers

The phonetic approach focuses on distinguishing a grapheme by defining its associated set of sound values. In the Phonetic Layer of the grapheme model (the Phonetic Model for short), the attributes used are the sounds. The glyphs used in the inscriptions symbolized various sound values.

In the Phonetic Layer, it is necessary to examine which phonetic values can be represented by each glyph. If a set of sound values can be defined, the glyphs with a Common Identity that represent the same set of sound values can be identified as the glyphs of a certain grapheme.
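The grouping rule described above can be sketched as follows. This is only an illustration under stated assumptions: each glyph is given as a hypothetical record of a name, a Common Identity label, and a set of sound values, and all names and sample values are invented for the example rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

GlyphRecord = Tuple[str, str, FrozenSet[str]]

# Hypothetical records: (glyph name, Common Identity label, sound values).
glyphs: List[GlyphRecord] = [
    ("glyph_1", "loop_on_feet", frozenset({"a"})),
    ("glyph_2", "loop_on_feet", frozenset({"a"})),
    ("glyph_3", "loop_on_feet", frozenset({"e"})),
]


def group_into_graphemes(
    records: List[GlyphRecord],
) -> Dict[Tuple[str, FrozenSet[str]], List[str]]:
    """Group glyphs that share both Common Identity and set of sound values."""
    groups: Dict[Tuple[str, FrozenSet[str]], List[str]] = defaultdict(list)
    for name, identity, sounds in records:
        groups[(identity, sounds)].append(name)
    return dict(groups)


graphemes = group_into_graphemes(glyphs)
for (identity, sounds), names in graphemes.items():
    print(identity, sorted(sounds), names)
```

Here `glyph_3` shares the Common Identity of the other two glyphs but carries a different sound value, so it ends up in a separate group, which mirrors the case where the phonetic attribute is needed to tell graphemes apart.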

Examples of Glyph Model Implementation:

A possible application of the introduced three-level grapheme model is identifying the meaning of still-undeciphered inscriptions [6]. There are several archaeological findings with undeciphered texts on their surfaces. Their graphemes typically differ from the normalized shapes of the alphabets; therefore it is very hard to match an undeciphered symbol to the normalized glyph of a grapheme in an alphabet. In several cases one symbol could be identified as different graphemes. To decide which reading is the best in deciphering the inscription, it is useful to calculate how many topological operations must be applied in sequence to transform a symbol of the inscription into a certain glyph of a grapheme. After calculating the sequence of topological operations for each symbol of the inscription and each glyph of the graphemes of an alphabet, the “similarity distance” between a symbol of the inscription and each grapheme can be calculated by comparing the number of necessary operators in the determined sequences.
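The “similarity distance” comparison can be sketched in Python. The sketch assumes that the operator sequences have already been determined and are supplied by hand; the operator names other than `rotate`, the grapheme and glyph names, and the choice to take the shortest chain among the glyph variants of a grapheme are all assumptions made for illustration, not details given in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written operator sequences: for one inscription symbol,
# the chain of basic operators needed to reach each candidate glyph.
sequences: Dict[str, Dict[str, List[str]]] = {
    "grapheme_X": {
        "glyph_X1": ["rotate", "op_a", "op_b"],
        "glyph_X2": ["rotate"],
    },
    "grapheme_Y": {
        "glyph_Y1": ["rotate", "op_a"],
    },
}


def similarity_distance(glyph_sequences: Dict[str, List[str]]) -> int:
    """Distance to a grapheme: the shortest chain among its glyph variants."""
    return min(len(seq) for seq in glyph_sequences.values())


def rank_graphemes(
    seqs: Dict[str, Dict[str, List[str]]],
) -> List[Tuple[str, int]]:
    """Return (grapheme, distance) pairs, closest grapheme first."""
    distances = {g: similarity_distance(v) for g, v in seqs.items()}
    return sorted(distances.items(), key=lambda item: item[1])


ranking = rank_graphemes(sequences)
print(ranking)  # prints [('grapheme_X', 1), ('grapheme_Y', 2)]
```

A smaller distance means fewer operators are needed, so the first entry of the ranking is the most plausible reading of the symbol under this simple count-based measure.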

One example can be seen in Figure 4. The relic, from around 900, was found in Bodrog Village in Somogy County (Hungary) by the archaeologist Magyar on the 24th of March 1999 [7]. The size of the relic is 6.2 cm × 4.2 cm, and its thickness is 1.2 cm [4].

Figure 4 – Application of the three-level grapheme model (figure labels: Damage, Personalization)

The number of possibilities for reconstructing damaged, still-undeciphered glyphs is very high. Therefore, this problem needs a large computational effort, which makes cluster-based supercomputing technology a required solution. The computational background of the described glyph identification method is presented in Figure 5.

Figure 5 – The computation scheme of the glyph identification problem

Conclusions:

The paper presented a three-layer approach to grapheme modeling, which consists of the Phonetic Layer, the Visual Identity Layer, and the Topology Layer. The main features of these models, together with the distinguishing tools applicable in each layer, were introduced. Moreover, two possible applications of the grapheme model, the time-dependent modeling of the evolution of a script and the scientific verification of the deciphering of an inscription, were also described. The use of the proposed new approach was demonstrated on examples.

Acknowledgement:

The work reported in the paper has been developed in the framework of the project “Talent care and cultivation in the scientific workshops of BME”. This project is supported by the grant TÁMOP-4.2.2.B-10/1-2010-0009.

References:

[1] Malatesha Joshi, R. & Aaron, P. G. (2006, eds.): Handbook of Orthography and Literacy. Routledge

[2] Henry Rogers (1999): Sociolinguistic factors in borrowed writing systems. In: Toronto Working Papers in Linguistics, Vol. 17 (1999), pp. 247-262

[3] P. N. Tan, M. Steinbach, & V. Kumar (2006): Introduction to Data Mining. Addison-Wesley

[4] Gábor Hosszú (2011): Heritage of Scribes. The Rovas Scripts’ Relations to Eurasian Writing Systems. First Edition. Budapest, ISBN 978-963-88437-4-6

[5] Joshi R. Malatesha & P.G. Aaron, (2006, eds.): Handbook of Orthography and Literacy. Routledge

[6] Gábor Vékony (1999): 10. századi székely felirat a Somogy megyei Bodrog határában. História, No. 8, Vol. 1999, pp. 30–31.

[7] Kálmán Magyar (1999): Előzetes jelentés a bodrog-bűi X. századi vasolvasztó műhely régészeti kutatásairól. In Hagyományok és újítások a korai középkori vaskohászatban, Ed.: János Gömöri. Sopron-Somogyfajsz, 1999.


Published in: Proceedings of the PhD Conferences organised by the Doctoral Schools of the BME, in the framework of TÁMOP-4.2.2/B-10/1-2010-0009 (pp. 100-105)