Languages and automata


László Kabódi


Parse tree

Parse tree for context-free grammars

- For a given word x and a CF grammar G we can build a parse tree:
  - the root is the starting variable
  - the leaves are terminals
  - each inner node is a non-terminal
  - if the children of a non-terminal A are B₁, B₂, ..., B_n from left to right, then G has the rule A → B₁B₂...B_n
  - the leaves read from left to right give x


Example

Create a parse tree for the word x with the grammar G.

G: S → S+S | S∗S | (S) | a
x = a+a∗a


One solution

S ⇒ S+S ⇒ a+S ⇒ a+S∗S ⇒ a+a∗S ⇒ a+a∗a

        S
      / | \
     S  +  S
     |   / | \
     a  S  ∗  S
        |     |
        a     a


Another solution

S ⇒ S∗S ⇒ S+S∗S ⇒ S+S∗a ⇒ S+a∗a ⇒ a+a∗a

        S
      / | \
     S  ∗  S
   / | \   |
  S  +  S  a
  |     |
  a     a


Ambiguity

Ambiguous language

- If a word has two or more different parse trees, it is derived ambiguously.
- A grammar is ambiguous if it generates a word that can be derived ambiguously.
- A language is (inherently) ambiguous if every grammar generating it is ambiguous.


- The language of the previous example is not ambiguous.
- The grammar
    S → S+S | S∗S | (S) | a
  is ambiguous.
- But the grammar
    E → E+T | T
    T → T∗F | F
    F → (E) | a
  which generates the same language, is not ambiguous.
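To make the difference between the two grammars concrete, the sketch below counts the parse trees of a word. It is an illustration under stated assumptions, not part of the slides: the function name `count_parse_trees`, the dictionary encoding with one character per symbol (`*` standing for ∗), and the restriction to grammars without ε-rules and without cyclic unit rules are all choices made for this example.

```python
from functools import lru_cache


def count_parse_trees(rules, start, word):
    """Count the parse trees of `word` rooted at `start`.

    Assumes one character per symbol, no ε-rules and no cyclic unit rules.
    """
    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of parse trees deriving word[i:j] from `symbol`.
        if symbol not in rules:               # terminal symbol
            return int(j - i == 1 and word[i] == symbol)
        return sum(ways(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def ways(rhs, i, j):
        # Number of ways the symbol string `rhs` derives word[i:j].
        if len(rhs) == 1:
            return trees(rhs, i, j)
        total = 0
        # The first symbol covers word[i:k]; every later symbol needs >= 1 letter.
        for k in range(i + 1, j - len(rhs) + 2):
            first = trees(rhs[0], i, k)
            if first:
                total += first * ways(rhs[1:], k, j)
        return total

    return trees(start, 0, len(word))


ambiguous = {"S": ["S+S", "S*S", "(S)", "a"]}
unambiguous = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "a"]}
print(count_parse_trees(ambiguous, "S", "a+a*a"))    # 2
print(count_parse_trees(unambiguous, "E", "a+a*a"))  # 1
```

A count above 1 for a single word is enough to show that a grammar is ambiguous; a count of 1 for one word does not prove unambiguity, which needs an argument over all words.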


Proof for the unambiguity

- Proof by induction on the length of the generated word w
- |w| = 1 ⇒ w = a, and the only derivation is E ⇒ T ⇒ F ⇒ a
- Suppose it holds for all |w| < n, then show it for |w| = n:
  - If there is a + in w outside all parentheses (or there are no parentheses at all), then the derivation must generate the last such + first, using the rule E → E+T. Then E and T generate subwords that are shorter than n, so they can be generated in only one way.
  - Otherwise, if there is a ∗ in w outside all parentheses, then the last such ∗ has to be generated first, by E ⇒ T ⇒ T∗F.
  - Otherwise the word has the form (E), which can be generated in only one way: E ⇒ T ⇒ F ⇒ (E).


Why is ambiguity important?

- Arithmetic: the parse tree can be viewed as the order in which the operations should be performed. If there are multiple such orders with different results, then there is no clear answer to a calculation.
- The second, unambiguous grammar generates the usual order in which the operations should be performed.
- Programming languages: the source code should be unambiguous. Otherwise the same code could produce different results, depending on the compiler.
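As a small illustration of the arithmetic point, the two parse trees of a+a∗a from the ambiguous grammar can be evaluated directly. The nested-tuple encoding of the trees, the function name `evaluate` and the sample value a = 2 are assumptions chosen for this sketch.

```python
def evaluate(tree, a):
    """Evaluate a parse tree given as "a" or (operator, left, right)."""
    if tree == "a":
        return a
    op, left, right = tree
    l, r = evaluate(left, a), evaluate(right, a)
    return l + r if op == "+" else l * r


tree_plus_first = ("+", "a", ("*", "a", "a"))   # root rule S → S+S: a + (a∗a)
tree_times_first = ("*", ("+", "a", "a"), "a")  # root rule S → S∗S: (a+a) ∗ a

print(evaluate(tree_plus_first, 2))   # 6
print(evaluate(tree_times_first, 2))  # 8
```

Only the first tree matches the usual precedence of ∗ over +, and it is the one the unambiguous grammar produces.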


Removing ε-rules

- Let G be a grammar that is almost context-free, but has some A → ε rules
- We can change the rules so that the grammar becomes context-free and still generates the same language
- Corollary: even if ε appears on the right-hand side of some rules, the grammar can still be converted to a CF grammar.


Algorithm

1. Find the nullable non-terminals (a non-terminal is nullable if ε can be derived from it):
   1.1 N₀ is the set of all non-terminals that have an A → ε rule:
       N₀ = {A ∈ V | (A → ε) ∈ D}
   1.2 N_i is the union of N_{i−1} and the non-terminals A that have a rule whose right-hand side consists only of non-terminals from N_{i−1}:
       N_i = N_{i−1} ∪ {A ∈ V | ∃ A → α, α ∈ N_{i−1}^∗}
   1.3 If N_i = N_{i−1}, stop: we have all the nullable non-terminals, N = N_i
2. Change the rules:
   - For every rule A → α, create new rules by leaving out the nullable non-terminals of α in every possible combination
   - Remove all A → ε rules
   - If S ∈ N, then create a new initial non-terminal S′ and the new rules S′ → S | ε
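The algorithm above can be sketched in a few lines. This is a hedged illustration, not the slides' own implementation: the function names, the encoding of a grammar as a dictionary from a non-terminal to a set of right-hand sides (one character per symbol, the empty string standing for ε), and the choice to drop trivial rules of the form A → A are assumptions of this example.

```python
from itertools import product


def nullable_symbols(rules):
    """Fixed-point iteration N_0 ⊆ N_1 ⊆ ... ("" stands for ε)."""
    nullable = {a for a, bodies in rules.items() if "" in bodies}
    while True:
        bigger = nullable | {
            a for a, bodies in rules.items()
            if any(all(s in nullable for s in body) for body in bodies)
        }
        if bigger == nullable:
            return nullable
        nullable = bigger


def remove_epsilon_rules(rules, start):
    """Return (new_rules, new_start) generating the same language."""
    nullable = nullable_symbols(rules)
    new_rules = {}
    for a, bodies in rules.items():
        new_bodies = set()
        for body in bodies:
            # Every nullable symbol may be kept or left out.
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                new_body = "".join(choice)
                if new_body and new_body != a:   # drop ε and the trivial A → A
                    new_bodies.add(new_body)
        new_rules[a] = new_bodies
    if start in nullable:                        # keep ε in the language
        new_start = start + "'"
        new_rules[new_start] = {start, ""}
        return new_rules, new_start
    return new_rules, start


grammar = {
    "S": {"ABCD"},
    "A": {"CD", "AC"},
    "B": {"Cb"},
    "C": {"a", ""},
    "D": {"bD", ""},
}
new_rules, new_start = remove_epsilon_rules(grammar, "S")
print(sorted(nullable_symbols(grammar)))  # ['A', 'C', 'D']
print(sorted(new_rules["A"]))             # ['AC', 'C', 'CD', 'D']
```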


Example

Remove the ε-rules from the following grammar:

S → ABCD
A → CD | AC
B → Cb
C → a | ε
D → bD | ε


Solution

The nullable variables:

N₀ = {C, D}
N₁ = {C, D} ∪ {A}
N₂ = {A, C, D} ∪ ∅ = N₁ ⇒ N = {A, C, D}

So the new rules:

S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b


Removing unit rules

- A unit rule is a rule whose right-hand side is a single non-terminal: A → B.
- Algorithm:
  1. Create a graph from the unit rules:
     - each non-terminal is a node
     - there is a directed edge from A to B for each unit rule A → B
  2. Remove the unit rules from the grammar
  3. For each non-terminal A, create the new rules A → α, where α is the right-hand side of a remaining (non-unit) rule whose left-hand side is a non-terminal that can be reached from A in the graph
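The three steps can be sketched as follows. This is an illustration under assumptions: the function name `remove_unit_rules` and the grammar encoding (a dictionary from each non-terminal to a set of right-hand sides, one character per symbol, every non-terminal present as a key) are not from the slides.

```python
def remove_unit_rules(rules):
    """Replace unit rules by the non-unit rules reachable through them."""
    nonterminals = set(rules)
    # Step 1: the unit rule graph, one edge A → B per unit rule.
    edges = {
        a: {body for body in bodies if body in nonterminals}
        for a, bodies in rules.items()
    }
    new_rules = {}
    for a in rules:
        # Non-terminals reachable from `a` (including `a` itself).
        reachable, stack = {a}, [a]
        while stack:
            for b in edges[stack.pop()]:
                if b not in reachable:
                    reachable.add(b)
                    stack.append(b)
        # Steps 2 and 3: keep only non-unit bodies, collected along the graph.
        new_rules[a] = {
            body
            for b in reachable
            for body in rules[b]
            if body not in nonterminals
        }
    return new_rules


grammar = {"A": {"aA", "B"}, "B": {"bB", "C"}, "C": {"cC"}}
result = remove_unit_rules(grammar)
print(sorted(result["A"]))  # ['aA', 'bB', 'cC']
print(sorted(result["B"]))  # ['bB', 'cC']
```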


Example I

Remove the unit rules from the following grammar (the result of the previous ε-rule removal):

S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b


Solution I

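Unit rule graph (edges):

S → B
A → C
A → D

The new rules:

S → ABCD | BCD | ABD | ABC | BD | BC | AB | Cb | b
A → CD | a | bD | b | AC

B, C and D have no unit rules, so their rules are unchanged.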


Example II

Remove the unit rules from the following grammar:

A → aA | B
B → bB | C
C → cC


Solution II

Unit rule graph (edges):

A → B
B → C

The new rules:

A → aA | bB | cC
B → bB | cC
C → cC


Example III

Remove the ε-rules and unit rules from the following grammar:

S → ASA | aB
A → B | S
B → b | ε


Solution III

Nullable non-terminals:

N₀ = {B}
N₁ = {B} ∪ {A}
N₂ = {A, B} ∪ ∅ = N₁ ⇒ N = {A, B}

The new rules:

S → ASA | SA | AS | S | aB | a
A → B | S
B → b


Unit rule graph (edges):

A → B
A → S

(The unit rule S → S is a self-loop; it is simply removed.)

The new rules:

S → ASA | SA | AS | aB | a
A → b | ASA | SA | AS | aB | a
B → b


Removing unneeded symbols

- Two types of unneeded symbols:
  - Non-terminating symbols: no derivation starting from them can be finished
  - Unreachable symbols: cannot be reached from the initial non-terminal


Removing non-terminating symbols

- Collect the terminating symbols, then remove all rules containing any other symbol
- B₀ = Σ: the terminals terminate
- B_i = B_{i−1} ∪ {A ∈ V | ∃ A → α, α ∈ B_{i−1}^∗}: a symbol terminates if it has a rule in which every symbol of the right-hand side terminates
- If B_i = B_{i−1}, then B = B_i is the set of terminating symbols
- V′ = V ∩ B is the new set of non-terminals
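The iteration B₀, B₁, ... can be sketched as a fixed-point loop. The function name `remove_nonterminating` and the grammar encoding (a dictionary from a non-terminal to a set of right-hand sides, one character per symbol; a non-terminal without rules simply has no key) are assumptions of this example.

```python
def remove_nonterminating(rules, terminals):
    """Keep only the rules built entirely from terminating symbols."""
    terminating = set(terminals)                 # B_0 = Σ
    while True:
        bigger = terminating | {
            a for a, bodies in rules.items()
            if any(all(s in terminating for s in body) for body in bodies)
        }
        if bigger == terminating:                # B_i = B_{i-1}: done
            break
        terminating = bigger
    return {
        a: {body for body in bodies if all(s in terminating for s in body)}
        for a, bodies in rules.items()
        if a in terminating
    }


# B has no rules at all, so it can never terminate.
grammar = {"S": {"AB", "a"}, "A": {"B", "b"}}
print(remove_nonterminating(grammar, "ab"))
```

For this grammar only S → a and A → b survive, because every other rule mentions B.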


Removing unreachable symbols

- Collect all reachable symbols, and leave every other symbol out of the grammar
- T₀ = {S}: the initial variable is always reachable
- T_i = T_{i−1} ∪ {Z ∈ V′ ∪ Σ | ∃ A → α, A ∈ T_{i−1}, Z ∈ α}: a symbol is reachable if it appears on the right-hand side of a rule whose left-hand side is reachable
- If T_i = T_{i−1}, then T = T_i is the set of reachable symbols
- Remove all symbols not in T from V′ and Σ: V″ = T ∩ V′ and Σ′ = T ∩ Σ
- Remove all rules that contain anything not in T
- Always remove non-terminating symbols before unreachable ones
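The sets T₀, T₁, ... can be computed the same way. In this sketch the function name `remove_unreachable` and the grammar encoding (a dictionary from a non-terminal to a set of right-hand sides, one character per symbol) are assumptions; the input is taken to be a grammar whose non-terminating symbols have already been removed.

```python
def remove_unreachable(rules, start):
    """Return (kept_rules, reachable_symbols)."""
    reachable = {start}                          # T_0 = {S}
    while True:
        bigger = reachable | {
            s
            for a in reachable
            for body in rules.get(a, ())         # terminals have no rules
            for s in body
        }
        if bigger == reachable:                  # T_i = T_{i-1}: done
            break
        reachable = bigger
    # A rule with a reachable left-hand side has only reachable symbols
    # on its right-hand side, so filtering by left-hand side is enough.
    kept = {a: set(bodies) for a, bodies in rules.items() if a in reachable}
    return kept, reachable


grammar = {"S": {"a"}, "A": {"b"}}
kept, reachable = remove_unreachable(grammar, "S")
print(kept, sorted(reachable))  # {'S': {'a'}} ['S', 'a']
```

The order of the two clean-up steps matters: in the grammar S → AB | a, A → B | b every symbol is reachable at first, so running this step before the non-terminating step would leave the rule A → b in the result even though A is no longer reachable.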


Example

Remove all unneeded symbols from the following grammar:

S → AB | a
A → B | b


Solution

Non-terminating symbols:

B₀ = {a, b}
B₁ = B₀ ∪ {S, A}
B₂ = B₁ ∪ ∅ = B₁ ⇒ B = B₂
V′ = V ∩ {a, b, S, A} = {S, A}

The new rules:

S → a
A → b


Unreachable symbols:

T₀ = {S}
T₁ = T₀ ∪ {a}
T₂ = T₁ ∪ ∅ = T₁ ⇒ T = T₂
V″ = V′ ∩ T = {S} and Σ′ = Σ ∩ T = {a}

The new rules:

S → a


Some closure properties of context-free languages

- Let L₁ and L₂ be two context-free languages
- Then L₁ ∪ L₂, L₁L₂ and L₁^∗ are also context-free languages
- Proof for union:
  - Because L₁ and L₂ are context-free languages, there are context-free grammars G₁ and G₂ for them.
  - Let the non-terminals of G₁ and G₂ be disjoint (rename them if necessary).
  - Create a new starting non-terminal S and the new rules S → S₁ | S₂, where S₁ is the starting symbol of G₁ and S₂ is that of G₂.
  - The other rules are the rules of G₁ and G₂.
  - If there is an S₁ → ε or S₂ → ε rule, then eliminate the ε-rules.
- Proof for concatenation: same as above, but the new rule is S → S₁S₂
- Proof for the Kleene closure L₁^∗: same as above, but the new rules are S → S₁S | ε, then remove the ε-rules
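The three constructions only add rules for a fresh start symbol, which the following sketch makes explicit. Everything here is an assumption made for illustration: a grammar is a pair `(rules, start)`, `rules` maps a non-terminal name to a list of right-hand sides, each right-hand side is a list of symbol names (the empty list stands for ε), and the helper names are hypothetical. The final ε-rule elimination mentioned in the proofs is not performed here.

```python
def _check_fresh(new_start, *rule_sets):
    """The new start symbol and the non-terminal sets must not clash."""
    seen = {new_start}
    for rules in rule_sets:
        if seen & set(rules):
            raise ValueError("non-terminals must be disjoint; rename first")
        seen |= set(rules)


def union(g1, g2, new_start="S"):
    (rules1, s1), (rules2, s2) = g1, g2
    _check_fresh(new_start, rules1, rules2)
    return {**rules1, **rules2, new_start: [[s1], [s2]]}, new_start


def concatenation(g1, g2, new_start="S"):
    (rules1, s1), (rules2, s2) = g1, g2
    _check_fresh(new_start, rules1, rules2)
    return {**rules1, **rules2, new_start: [[s1, s2]]}, new_start


def star(g1, new_start="S"):
    rules1, s1 = g1
    _check_fresh(new_start, rules1)
    return {**rules1, new_start: [[s1, new_start], []]}, new_start


g1 = ({"S1": [["a", "S1"], ["a"]]}, "S1")   # generates a, aa, aaa, ...
g2 = ({"S2": [["b"]]}, "S2")                # generates b
rules, start = union(g1, g2)
print(rules[start])  # [['S1'], ['S2']]
```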