
Languages and automata


Academic year: 2022



László Kabódi

(2)

Parse tree

Parse tree for context-free grammars

• A parse tree can be created for a given word x and CF grammar G:
• The root is the start variable; the leaves are terminals.
• Each inner node is a non-terminal.
• If the children of a non-terminal A are B1, B2, ..., Bn from left to right, then G has a rule A → B1B2...Bn.
• Reading the leaves from left to right gives x.
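The conditions above can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical encoding: a grammar is a dict mapping each non-terminal to a set of right-hand-side tuples, an inner node is a tuple `(label, child1, child2, ...)`, a leaf is a one-character string, and `∗` is written as ASCII `*`. The names `tree_yield` and `is_parse_tree` are illustrative and do not come from the slides.

```python
def tree_yield(tree):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(tree, str):          # a leaf
        return tree
    _, *children = tree
    return "".join(tree_yield(child) for child in children)


def is_parse_tree(tree, grammar, start, word):
    """Check the parse-tree conditions for `word` and `grammar`."""
    def valid(node):
        if isinstance(node, str):      # leaves must be terminals
            return node not in grammar
        label, *children = node
        if label not in grammar:       # inner nodes must be non-terminals
            return False
        # the children's labels must form a rule label -> B1 B2 ... Bn
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        return rhs in grammar[label] and all(valid(c) for c in children)

    return (not isinstance(tree, str)
            and tree[0] == start       # the root is the start variable
            and valid(tree)
            and tree_yield(tree) == word)


# The grammar G of the following example and the tree of its first derivation
G = {"S": {("S", "+", "S"), ("S", "*", "S"), ("(", "S", ")"), ("a",)}}
tree = ("S", ("S", "a"), "+", ("S", ("S", "a"), "*", ("S", "a")))
print(is_parse_tree(tree, G, "S", "a+a*a"))   # True
```

The rule check compares the ordered tuple of child labels with the grammar's right-hand sides, which is exactly the "children from left to right" condition.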

(3)

Parse tree

Example

Create a parse tree for the following word x with the following grammar G.

G: S → S+S | S∗S | (S) | a
x = a+a∗a

(4)

Parse tree

One solution

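S → S+S → a+S → a+S∗S → a+a∗S → a+a∗a

Parse tree, written with brackets (each [X ...] is a node labelled X with its children): [S [S a] + [S [S a] ∗ [S a]]]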

(5)

Parse tree

Another solution

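S → S∗S → S+S∗S → S+S∗a → S+a∗a → a+a∗a

Parse tree, written with brackets (each [X ...] is a node labelled X with its children): [S [S [S a] + [S a]] ∗ [S a]]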

(6)

Ambiguity

Ambiguous language

• If a word has two or more different parse trees, it is derived ambiguously.
• A grammar is ambiguous if it has a word that can be derived ambiguously.
• A language is (inherently) ambiguous if every CF grammar generating it is ambiguous.

(7)

Ambiguity

• The language of the previous example is not ambiguous.
• The grammar S → S+S | S∗S | (S) | a is ambiguous: a+a∗a has two parse trees.
• But the grammar
E → E+T | T
T → T∗F | F
F → (E) | a
generates the same language and is not ambiguous.
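Ambiguity of a particular word can be tested by counting its parse trees. The sketch below is a hypothetical helper, not part of the slides: it counts parse trees by trying every split of the word among the symbols of each right-hand side, with memoization. It assumes no ε-rules, no cycle of unit rules, single-character terminals, and the same dict-of-tuples grammar encoding with ASCII `*` for `∗`.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, word):
    """Count the parse trees of `word` with root `start`.

    Assumes no ε-rules, no cycle of unit rules, and single-character
    terminals (so every terminal matches exactly one character of `word`).
    """
    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # number of parse trees with root `symbol` and yield word[i:j]
        if symbol not in grammar:                 # terminal leaf
            return 1 if j - i == 1 and word[i] == symbol else 0
        return sum(count_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # ways to split word[i:j] into len(rhs) non-empty parts, one per symbol
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        for k in range(i + 1, j - len(rhs) + 2):
            first = count_symbol(rhs[0], i, k)
            if first:
                total += first * count_sequence(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(word))


G = {"S": {("S", "+", "S"), ("S", "*", "S"), ("(", "S", ")"), ("a",)}}
E = {"E": {("E", "+", "T"), ("T",)},
     "T": {("T", "*", "F"), ("F",)},
     "F": {("(", "E", ")"), ("a",)}}
print(count_parse_trees(G, "S", "a+a*a"))   # 2
print(count_parse_trees(E, "E", "a+a*a"))   # 1
```

A count above 1 for one word shows that a grammar is ambiguous. A count of 1 for a few words does not prove unambiguity, which is why the proof on the next slide is needed.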

(8)

Ambiguity

Proof of unambiguity

• Proof by induction on the length of the generated word w.
• |w| = 1 ⇒ w = a ⇒ the only derivation is E → T → F → a.
• Suppose the claim holds for all |w| < n; we show it for |w| = n.
• If w contains a + outside parentheses (in particular, if w has no parentheses at all and contains a +), then the derivation must generate the last such + first, using E → E+T. This + must be the last one because T cannot produce a + outside parentheses. Then E and T generate subwords shorter than n, so each of them can be generated in only one way.
• Otherwise, if w contains a ∗ outside parentheses, then the last such ∗ has to be generated first, by E → T → T∗F (F cannot produce a ∗ outside parentheses).
• Otherwise the word has the form (u), which can be generated in only one way: E → T → F → (E), where E generates the shorter word u.

(9)

Ambiguity

Why is ambiguity important?

• Arithmetic: the parse tree can be viewed as the order in which the operations are performed. If there are several such orders with different results, then a calculation has no well-defined answer. For example, with a = 2 the first parse tree of a+a∗a computes 2+(2∗2) = 6, while the second computes (2+2)∗2 = 8.
• The second, unambiguous grammar produces the usual order of operations: ∗ is performed before +, and operators of the same kind are grouped from left to right.
• Programming languages: the source code must be unambiguous. Otherwise the same code could produce different results depending on the compiler.
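To make the arithmetic point concrete, here is a small sketch that evaluates the two parse trees of a+a∗a from the earlier example. It assumes a hypothetical nested-tuple encoding (an inner node is `(label, child1, ...)`, a leaf is a one-character string, `∗` is ASCII `*`), and every leaf `a` gets the same numeric value. The function names are illustrative only.

```python
def tree_yield(tree):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(tree, str):
        return tree
    _, *children = tree
    return "".join(tree_yield(child) for child in children)


def evaluate(tree, a):
    """Evaluate a parse tree of S -> S+S | S*S | (S) | a, every a being `a`."""
    if isinstance(tree, str):        # only 'a' leaves are evaluated directly
        return a
    _, *kids = tree
    if len(kids) == 1:               # S -> a
        return evaluate(kids[0], a)
    if kids[0] == "(":               # S -> (S)
        return evaluate(kids[1], a)
    left, op, right = kids           # S -> S+S or S -> S*S
    x, y = evaluate(left, a), evaluate(right, a)
    return x + y if op == "+" else x * y


# The trees of the two solutions: [S [S a] + [S [S a] * [S a]]] and
# [S [S [S a] + [S a]] * [S a]]
first_tree = ("S", ("S", "a"), "+", ("S", ("S", "a"), "*", ("S", "a")))
second_tree = ("S", ("S", ("S", "a"), "+", ("S", "a")), "*", ("S", "a"))
print(tree_yield(first_tree), tree_yield(second_tree))   # a+a*a a+a*a
print(evaluate(first_tree, 2), evaluate(second_tree, 2))  # 6 8
```

Both trees have the same yield, yet the evaluation follows the tree shape, so the results differ.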

(10)

Removing ε-rules

Removing ε-rules

• Let G be a grammar that is almost context-free but has some A → ε rules.
• We can change the rules so that the grammar becomes context-free while the generated language stays the same.
• Corollary: even if ε appears on the right-hand side of some rules, the grammar can still be converted to a CF grammar.

(11)

Removing ε-rules

Algorithm

1. Find the nullable non-terminals (a non-terminal is nullable if ε can be derived from it):

1.1 $N_0$ is the set of all non-terminals A that have a rule A → ε, where D is the set of rules:

$N_0 = \{A \in V \mid (A \to \varepsilon) \in D\}$

1.2 $N_i$ is the union of $N_{i-1}$ and the non-terminals A that have a rule whose right-hand side consists only of non-terminals from $N_{i-1}$:

$N_i = N_{i-1} \cup \{A \in V \mid \exists A \to \alpha,\ \alpha \in N_{i-1}^*\}$

1.3 If $N_i = N_{i-1}$, stop: $N = N_i$ is the set of all nullable non-terminals.

2. Change the rules:
• For every rule A → α, create new rules by leaving out every combination of nullable non-terminals from α (a combination that would leave the right-hand side empty is not added).
• Remove all A → ε rules.
• If S ∈ N, create a new start non-terminal $S_0$ and the new rules $S_0$ → S | ε.
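The algorithm can be sketched in Python as follows. Assumptions (not from the slides): a grammar is a dict mapping each non-terminal to a set of right-hand-side tuples, ε is the empty tuple `()`, and the fresh start symbol is named by appending `"0"`, which is a hypothetical choice. The sketch also drops useless rules A → A that the rewriting can produce, as the example solution below does.

```python
from itertools import product


def nullable_symbols(rules):
    """Steps 1.1-1.3: compute N_0, N_1, ... until the set stops growing."""
    # N_0: non-terminals with a rule A -> ε (ε is the empty tuple)
    nullable = {lhs for lhs, rhss in rules.items() if () in rhss}
    while True:
        # add A if some rule A -> α has α consisting only of nullable symbols
        bigger = nullable | {lhs for lhs, rhss in rules.items()
                             if any(all(x in nullable for x in rhs)
                                    for rhs in rhss)}
        if bigger == nullable:
            return nullable
        nullable = bigger


def remove_epsilon_rules(rules, start):
    """Step 2: rewrite the rules; returns the new rules and start symbol."""
    nullable = nullable_symbols(rules)
    new_rules = {lhs: set() for lhs in rules}
    for lhs, rhss in rules.items():
        for rhs in rhss:
            # a nullable symbol may be kept or left out, others are always kept
            options = [((x,), ()) if x in nullable else ((x,),) for x in rhs]
            for choice in product(*options):
                new_rhs = tuple(s for part in choice for s in part)
                # skip A -> ε and the useless rule A -> A
                if new_rhs and new_rhs != (lhs,):
                    new_rules[lhs].add(new_rhs)
    if start in nullable:
        new_start = start + "0"        # hypothetical name for S_0
        new_rules[new_start] = {(start,), ()}
        start = new_start
    return new_rules, start


# The grammar of the following example
rules = {"S": {("A", "B", "C", "D")}, "A": {("C", "D"), ("A", "C")},
         "B": {("C", "b")}, "C": {("a",), ()}, "D": {("b", "D"), ()}}
print(sorted(nullable_symbols(rules)))   # ['A', 'C', 'D']
```

Using `itertools.product` over "keep or drop" options enumerates exactly the combinations of nullable symbols that step 2 asks for.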

(12)

Removing ε-rules

Example

Removeε-rules from the following grammar:

S → ABCD
A → CD | AC
B → Cb
C → a | ε
D → bD | ε

(13)

Removing ε-rules

Solution

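The nullable variables:

$N_0 = \{C, D\}$

$N_1 = \{C, D\} \cup \{A\}$

$N_2 = \{A, C, D\} \cup \emptyset = N_1 \Rightarrow N = \{A, C, D\}$

So the new rules are the following (the rule A → A, obtained from A → AC, is useless and is left out):

S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b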

(14)

Removing unit rules

Removing unit rules

• A unit rule is a rule whose right-hand side is a single non-terminal: A → B.
• Algorithm:

1. Create a graph from the unit rules:
  • each non-terminal is a node
  • there is a directed edge A → B for each unit rule A → B
2. Remove the unit rules from the grammar.
3. For each non-terminal A, create new rules A → α, where α is the right-hand side of a (non-unit) rule of a non-terminal that can be reached from A in the graph.
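The three steps can be sketched in Python with a depth-first search on the unit rule graph. This is a minimal sketch under the assumption that a grammar is a dict mapping each non-terminal to a set of right-hand-side tuples; the function name is hypothetical.

```python
def remove_unit_rules(rules):
    """Remove unit rules A -> B via reachability in the unit rule graph."""
    nonterminals = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # 1. the unit rule graph: an edge A -> B for each unit rule A -> B
    edges = {lhs: {rhs[0] for rhs in rhss if is_unit(rhs)}
             for lhs, rhss in rules.items()}

    new_rules = {}
    for lhs in rules:
        # non-terminals reachable from lhs (including lhs itself), by DFS
        reached, stack = {lhs}, [lhs]
        while stack:
            for target in edges[stack.pop()]:
                if target not in reached:
                    reached.add(target)
                    stack.append(target)
        # 2.-3. drop unit rules, copy the non-unit rules of every reached one
        new_rules[lhs] = {rhs for b in reached for rhs in rules[b]
                          if not is_unit(rhs)}
    return new_rules


# Example II: A -> aA | B, B -> bB | C, C -> cC
rules = {"A": {("a", "A"), ("B",)}, "B": {("b", "B"), ("C",)}, "C": {("c", "C")}}
for lhs, rhss in sorted(remove_unit_rules(rules).items()):
    print(lhs, "->", " | ".join(sorted("".join(r) for r in rhss)))
# A -> aA | bB | cC
# B -> bB | cC
# C -> cC
```

Computing full reachability (not only direct edges) is what lets A inherit cC through the chain A → B → C.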

(15)

Removing unit rules

Example I

Remove the unit rules from the following grammar:

S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b

(16)

Removing unit rules

Solution I

Unit rule graph (edges): S → B, A → C, A → D

The new rules:

S → ABCD | BCD | ABD | ABC | BD | BC | AB | Cb | b
A → CD | a | bD | b | AC
B → Cb | b
C → a
D → bD | b

(17)

Removing unit rules

Example II

Remove the unit rules from the following grammar:

A → aA | B
B → bB | C
C → cC

(18)

Removing unit rules

Solution II

Unit rule graph (edges): A → B, B → C

The new rules:

A → aA | bB | cC
B → bB | cC
C → cC

(19)

Removing unit rules

Example III

Remove the ε-rules and unit rules from the following grammar:

S → ASA | aB
A → B | S
B → b | ε

(20)

Removing unit rules

Solution III

Nullable non-terminals:

$N_0 = \{B\}$

$N_1 = \{B\} \cup \{A\}$

$N_2 = \{A, B\} \cup \emptyset = N_1 \Rightarrow N = \{A, B\}$

The new rules:

S → ASA | SA | AS | S | aB | a
A → B | S
B → b

(21)

Removing unit rules

Solution III

Unit rule graph (edges): A → B, A → S, and the loop S → S coming from the unit rule S → S

The new rules:

S → ASA | SA | AS | aB | a
A → b | ASA | SA | AS | aB | a
B → b

(22)

Removing unneeded symbols

Removing unneeded symbols

• Two types of unneeded symbols:
  • Non-terminating symbols: no derivation starting from them can be finished (reach a word of terminals).
  • Unreachable symbols: they cannot be reached from the start non-terminal.

(23)

Removing unneeded symbols

Removing non-terminating symbols

• Collect the terminating symbols, then remove all rules containing any other symbol.
• $B_0 = \Sigma$: the terminals terminate.
• $B_i = B_{i-1} \cup \{A \in V \mid \exists A \to \alpha,\ \alpha \in B_{i-1}^*\}$: a non-terminal terminates if it has a rule in which every symbol of the right-hand side terminates.
• If $B_i = B_{i-1}$, then $B = B_i$ is the set of terminating symbols, and $V' = V \cap B$ is the new set of non-terminals.
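The fixed-point iteration for $B_i$ can be sketched in Python as below. The encoding is an assumption: V and Σ are sets of one-character strings, and the rules are a dict from non-terminal to a set of right-hand-side tuples (a non-terminal without rules may simply be missing from the dict). The function names are hypothetical.

```python
def terminating_symbols(nonterminals, terminals, rules):
    """B_0 = Σ; B_i adds A if some rule A -> α has α made of B_{i-1} symbols."""
    term = set(terminals)
    while True:
        bigger = term | {lhs for lhs in nonterminals
                         if any(all(x in term for x in rhs)
                                for rhs in rules.get(lhs, ()))}
        if bigger == term:
            return term                   # B
        term = bigger


def remove_nonterminating(nonterminals, terminals, rules):
    """Return V' = V ∩ B and the rules that use only terminating symbols."""
    b = terminating_symbols(nonterminals, terminals, rules)
    v_prime = nonterminals & b
    new_rules = {lhs: {rhs for rhs in rules.get(lhs, ())
                       if all(x in b for x in rhs)}
                 for lhs in v_prime}
    return v_prime, new_rules


# The grammar of the following example (B has no rules)
rules = {"S": {("A", "B"), ("a",)}, "A": {("B",), ("b",)}}
print(sorted(terminating_symbols({"S", "A", "B"}, {"a", "b"}, rules)))
# ['A', 'S', 'a', 'b']
```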

(24)

Removing unneeded symbols

Removing unreachable symbols

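• Collect all reachable symbols, and leave every other symbol out of the grammar.
• $T_0 = \{S\}$: the start variable is always reachable.
• $T_i = T_{i-1} \cup \{Z \in V' \cup \Sigma \mid \exists A \to \alpha,\ A \in T_{i-1},\ Z \in \alpha\}$: a symbol is reachable if it appears on the right-hand side of a rule whose left-hand side is reachable.
• If $T_i = T_{i-1}$, then $T = T_i$ is the set of reachable symbols.
• Remove all symbols not in T from V' and Σ: $V'' = T \cap V'$ and $\Sigma' = T \cap \Sigma$.
• Remove all rules that contain anything not in T.
• Always remove non-terminating symbols before unreachable ones: removing the rules with non-terminating symbols can make further symbols unreachable. In the following example, the opposite order would leave the unreachable rule A → b in the grammar.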
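The reachability iteration can be sketched the same way. This minimal sketch assumes the same hypothetical encoding (sets of one-character symbols, rules as a dict from non-terminal to a set of right-hand-side tuples); the function names are illustrative.

```python
def reachable_symbols(start, rules):
    """T_0 = {S}; T_i adds every symbol on the right of a rule of a T_{i-1} symbol."""
    reach = {start}
    while True:
        bigger = reach | {z for lhs in reach for rhs in rules.get(lhs, ())
                          for z in rhs}
        if bigger == reach:
            return reach                  # T
        reach = bigger


def remove_unreachable(start, nonterminals, terminals, rules):
    """Return V'' = T ∩ V', Σ' = T ∩ Σ and the rules over T only."""
    t = reachable_symbols(start, rules)
    v2, sigma2 = nonterminals & t, terminals & t
    # every symbol in a rule of a reachable non-terminal is itself in T,
    # so keeping exactly the rules of V'' removes all rules leaving T
    new_rules = {lhs: set(rules.get(lhs, ())) for lhs in v2}
    return v2, sigma2, new_rules


# The grammar of the following example after removing non-terminating symbols
print(remove_unreachable("S", {"S", "A"}, {"a", "b"},
                         {"S": {("a",)}, "A": {("b",)}}))
# ({'S'}, {'a'}, {'S': {('a',)}})
```

Run on the example grammar before removing non-terminating symbols, `reachable_symbols` finds every symbol reachable, which illustrates why the order of the two removals matters.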

(25)

Removing unneeded symbols

Example

Remove all unneeded symbols from the following grammar:

S → AB | a
A → B | b

(26)

Removing unneeded symbols

Solution

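Removing non-terminating symbols (computing the terminating ones):

$B_0 = \{a, b\}$, $B_1 = B_0 \cup \{S, A\}$

$B_2 = B_1 \cup \emptyset = B_1 \Rightarrow B = B_2$, $V' = V \cap \{a, b, S, A\} = \{S, A\}$

The new rules (B does not terminate, so S → AB and A → B are removed):

S → a
A → b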

(27)

Removing unneeded symbols

Solution

Unreachable symbols:

$T_0 = \{S\}$, $T_1 = T_0 \cup \{a\}$

$T_2 = T_1 \cup \emptyset = T_1 \Rightarrow T = T_2$

$V'' = V' \cap T = \{S\}$ and $\Sigma' = \Sigma \cap T = \{a\}$. The new rules:

S → a

(28)

Some closure properties of context-free languages

• Let L1 and L2 be two context-free languages.
• Then $L_1 \cup L_2$, $L_1 L_2$ and $L_1^*$ are also context-free languages.
• Proof for union:
  • Because L1 and L2 are context-free languages, there are context-free grammars G1 and G2 generating them.
  • Rename the non-terminals so that G1 and G2 have no non-terminal in common.
  • Create a new start non-terminal S and the new rule S → S1 | S2, where S1 is the start symbol of G1 and S2 is the start symbol of G2.
  • The other rules are the rules of G1 and G2.
  • If there is a rule S1 → ε or S2 → ε, then eliminate the ε-rules.
• Proof for concatenation: the same as above, but the new rule is S → S1S2.
• Proof for the Kleene star (transitive closure): the same as above, but the new rule is S → S1S | ε; then remove the ε-rules.
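The three grammar constructions can be sketched in Python. Assumptions (not from the slides): a grammar is a dict from non-terminal to a set of right-hand-side tuples, ε is `()`, non-terminals are made disjoint by the hypothetical prefixes `"1"` and `"2"`, and the new start symbol `"S"` is assumed not to clash with any terminal. The ε-rules the constructions introduce would still have to be eliminated as the slide says; the sketch does not do that.

```python
def rename(rules, tag):
    """Prefix every non-terminal with `tag`, so two grammars share none."""
    nts = set(rules)
    return {tag + lhs: {tuple(tag + x if x in nts else x for x in rhs)
                        for rhs in rhss}
            for lhs, rhss in rules.items()}


def union(g1, s1, g2, s2, start="S"):
    rules = {**rename(g1, "1"), **rename(g2, "2")}
    rules[start] = {("1" + s1,), ("2" + s2,)}          # S -> S1 | S2
    return rules, start


def concatenation(g1, s1, g2, s2, start="S"):
    rules = {**rename(g1, "1"), **rename(g2, "2")}
    rules[start] = {("1" + s1, "2" + s2)}              # S -> S1 S2
    return rules, start


def star(g1, s1, start="S"):
    rules = rename(g1, "1")
    rules[start] = {("1" + s1, start), ()}             # S -> S1 S | ε
    return rules, start


# G1: S -> aSb | ab (the words a^n b^n, n >= 1), G2: S -> c
g1 = {"S": {("a", "S", "b"), ("a", "b")}}
g2 = {"S": {("c",)}}
print(union(g1, "S", g2, "S")[0]["S"] == {("1S",), ("2S",)})   # True
```

Renaming is what keeps the proof sound: without it, a non-terminal shared by G1 and G2 could mix rules of the two grammars.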
