Languages and automata
László Kabódi
Parse tree
Parse tree for context free grammars
- For a given word x and a CF grammar G that generates it, you can create a parse tree
- The root is the starting variable
- The leaves are terminals
- Each inner node is a non-terminal
- If the children of a non-terminal A are $B_1, B_2, \dots, B_n$ from left to right, then G has the rule $A \to B_1 B_2 \dots B_n$
- The leaves, read from left to right, give x
Example
Create a parse tree for the word x using the grammar G.
G: S → S+S | S∗S | (S) | a
x = a+a∗a
One solution
Derivation: S → S+S → a+S → a+S∗S → a+a∗S → a+a∗a
Parse tree (bracket notation): [S [S a] + [S [S a] ∗ [S a]]]
Another solution
Derivation: S → S∗S → S+S∗S → S+S∗a → S+a∗a → a+a∗a
Parse tree (bracket notation): [S [S [S a] + [S a]] ∗ [S a]]
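The two solutions can be checked mechanically against the definition of a parse tree. The following is a minimal Python sketch, not part of the original slides: the nested-tuple encoding of trees and the helper names `label`, `is_valid`, `tree_yield` and `is_parse_tree` are assumptions made for illustration, and the ASCII `*` stands for ∗.

```python
# Grammar G of the example; ASCII '*' stands for the slide's '∗'.
RULES = {"S": [["S", "+", "S"], ["S", "*", "S"], ["(", "S", ")"], ["a"]]}


def label(node):
    """A node is a terminal (str) or a pair (non_terminal, children)."""
    return node if isinstance(node, str) else node[0]


def is_valid(node, rules):
    """Every leaf is a terminal and every inner node A with children
    B1..Bn corresponds to a rule A -> B1 ... Bn."""
    if isinstance(node, str):
        return node not in rules
    head, children = node
    if [label(child) for child in children] not in rules.get(head, []):
        return False
    return all(is_valid(child, rules) for child in children)


def tree_yield(node):
    """Read the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])


def is_parse_tree(tree, rules, start, word):
    """Root is the start variable, all nodes are valid, the leaves give the word."""
    return label(tree) == start and is_valid(tree, rules) and tree_yield(tree) == word


a = ("S", ["a"])
one_solution = ("S", [a, "+", ("S", [a, "*", a])])
another_solution = ("S", [("S", [a, "+", a]), "*", a])

for tree in (one_solution, another_solution):
    print(is_parse_tree(tree, RULES, "S", "a+a*a"))
```

Both trees pass the check although they are different trees, which is exactly the situation the next section calls ambiguity.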
Ambiguity
Ambiguous language
- If a word has two or more different parse trees, it is derived ambiguously.
- A grammar is ambiguous if it generates a word that can be derived ambiguously.
- A language is ambiguous if all of its grammars are ambiguous.
- The language of the previous example is not ambiguous.
- The grammar S → S+S | S∗S | (S) | a is ambiguous.
- But the grammar E → E+T | T, T → T∗F | F, F → (E) | a generates the same language and is not ambiguous.
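The difference between the two grammars can be observed by counting parse trees. The sketch below is an illustration under stated assumptions, not the slides' method: the function name `count_parse_trees` and the dictionary encoding are hypothetical, the grammar must have no ε-rules and no cyclic unit rules (true for both grammars here), and the ASCII `*` stands for ∗.

```python
from functools import lru_cache


def count_parse_trees(rules, start, word):
    """Count the parse trees of `word`.
    Assumes no ε-rules and no cyclic unit rules, so every symbol
    derives at least one character and the recursion terminates."""

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        # number of trees rooted at `symbol` whose leaves spell word[i:j]
        if symbol not in rules:  # terminal symbol
            return int(j == i + 1 and word[i] == symbol)
        return sum(count_sequence(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, i, j):
        # number of ways the symbols in `rhs` can jointly spell word[i:j]
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        first, rest = rhs[0], rhs[1:]
        total = 0
        # leave at least one character for each remaining symbol
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_sequence(rest, k, j)
        return total

    return count_symbol(start, 0, len(word))


AMBIGUOUS = {"S": (("S", "+", "S"), ("S", "*", "S"), ("(", "S", ")"), ("a",))}
UNAMBIGUOUS = {
    "E": (("E", "+", "T"), ("T",)),
    "T": (("T", "*", "F"), ("F",)),
    "F": (("(", "E", ")"), ("a",)),
}

print(count_parse_trees(AMBIGUOUS, "S", "a+a*a"))    # 2
print(count_parse_trees(UNAMBIGUOUS, "E", "a+a*a"))  # 1
```

A count above 1 for a single word is enough to show that a grammar is ambiguous. A count of 1 for some words does not prove unambiguity; that needs an argument covering all words.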
Proof of unambiguity
- Proof by induction on the length of the generated word w
- |w| = 1 ⇒ w = a, and the only derivation is E → T → F → a
- Suppose it holds for all |w| < n, then show it for |w| = n
- If w contains a + that is not inside parentheses (this includes the case where w has no parentheses at all), then the derivation must generate the last such + first, using E → E+T. Then E and T generate subwords that are shorter than n, so each can be generated in only one way.
- Otherwise, if w contains a ∗ that is not inside parentheses, then the last such ∗ has to be generated first, by E → T → T∗F.
- Otherwise the word has the form (E), which can be generated in only one way: E → T → F → (E).
Why is ambiguity important?
- Arithmetic: the parse tree can be viewed as the order in which the operations should be performed. If there are multiple such orders with different results, then there is no clear answer to a calculation.
- The second, unambiguous grammar generates the usual order in which the operations should be performed.
- Programming languages: the source code should be unambiguous. Otherwise the same code could produce different results, depending on the compiler.
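To make the arithmetic point concrete, the two parse trees of a+a∗a can be evaluated for a specific value of a. This is a small sketch under assumptions: the value a = 2, the triple encoding `(left, op, right)` and the name `evaluate` are chosen only for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, a):
    """`tree` is the leaf 'a' or a triple (left, op, right) mirroring S -> S op S."""
    if tree == "a":
        return a
    left, op, right = tree
    return OPS[op](evaluate(left, a), evaluate(right, a))


plus_at_root = ("a", "+", ("a", "*", "a"))   # a + (a * a)
times_at_root = (("a", "+", "a"), "*", "a")  # (a + a) * a

print(evaluate(plus_at_root, 2), evaluate(times_at_root, 2))  # 6 8
```

The tree with + at the root is the one the unambiguous grammar produces, and it gives the result expected from the usual precedence rules.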
Removing ε-rules
- Let G be a grammar that is almost context-free, but has some A → ε rules
- We can change the rules so that the grammar is context-free and the generated language stays the same
- Corollary: even if ε appears on the right-hand side of some rules, the grammar can still be converted to a CF grammar.
Algorithm
1. Find the nullable non-terminals (a non-terminal is nullable if ε can be derived from it):
   1.1 $N_1$ is the set of all non-terminals that have an A → ε rule:
       $N_1 = \{A \in V \mid (A \to \varepsilon) \in D\}$
   1.2 $N_i$ is the union of $N_{i-1}$ and the non-terminals A that have a rule whose right-hand side consists only of non-terminals from $N_{i-1}$:
       $N_i = N_{i-1} \cup \{A \in V \mid \exists\, A \to \alpha,\ \alpha \in N_{i-1}^{*}\}$
   1.3 If $N_i = N_{i-1}$, stop: we have all the nullable non-terminals, $N = N_i$
2. Change the rules:
   - For every A → α rule, create new rules by leaving out each possible combination of nullable non-terminals from α
   - Remove all A → ε rules
   - If S ∈ N, then create a new initial non-terminal S0 and the new rules S0 → S | ε
Example
Remove the ε-rules from the following grammar:
S → ABCD
A → CD | AC
B → Cb
C → a | ε
D → bD | ε
Solution
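The nullable variables:
$N_1 = \{C, D\}$
$N_2 = \{C, D\} \cup \{A\} = \{A, C, D\}$
$N_3 = \{A, C, D\} \cup \emptyset = N_2 \Rightarrow N = \{A, C, D\}$
So the new rules:
S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b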
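The solution can be reproduced with a short program. The following is a minimal Python sketch of the algorithm, with several assumptions that are not in the slides: every symbol is a single character, a grammar is a dictionary from non-terminals to lists of right-hand-side strings, the empty string encodes ε, and the function names are hypothetical. Trivial rules of the form A → A are skipped, which matches the solution above.

```python
from itertools import product


def nullable_non_terminals(rules):
    """Grow the set of nullable non-terminals until it stops changing."""
    nullable = {head for head, bodies in rules.items() if "" in bodies}
    while True:
        bigger = nullable | {
            head
            for head, bodies in rules.items()
            if any(all(s in nullable for s in body) for body in bodies)
        }
        if bigger == nullable:
            return nullable
        nullable = bigger


def remove_epsilon_rules(rules, start):
    nullable = nullable_non_terminals(rules)
    new_rules = {}
    for head, bodies in rules.items():
        new_bodies = []
        for body in bodies:
            # a nullable symbol may be kept or dropped, any other symbol is kept
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                # skip ε, trivial A -> A rules and duplicates
                if candidate and candidate != head and candidate not in new_bodies:
                    new_bodies.append(candidate)
        new_rules[head] = new_bodies
    if start in nullable:
        # new initial non-terminal S0 with the rules S0 -> S | ε
        new_rules["S0"] = [start, ""]
    return new_rules


GRAMMAR = {"S": ["ABCD"], "A": ["CD", "AC"], "B": ["Cb"], "C": ["a", ""], "D": ["bD", ""]}

print(sorted(nullable_non_terminals(GRAMMAR)))
for head, bodies in remove_epsilon_rules(GRAMMAR, "S").items():
    print(head, "->", " | ".join(bodies))
```

The alternatives may be printed in a different order than on the slide, but the set of rules is the same.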
Removing unit rules
- A unit rule is a rule whose right-hand side is a single non-terminal: A → B.
- Algorithm:
1. Create a graph from the unit rules:
   - Each non-terminal is a node
   - There is a directed edge from A to B for each unit rule A → B
2. Remove the unit rules from the grammar
3. For each non-terminal A, create new rules A → α, where α is the right-hand side of a remaining rule whose left-hand side is a non-terminal that can be reached from A in the graph
Example I
Remove the unit rules from the following grammar:
S → ABCD | BCD | ABD | ABC | BD | BC | AB | B
A → CD | C | D | AC
B → Cb | b
C → a
D → bD | b
Solution I
Unit rule graph (edges): S → B, A → C, A → D
The new rules:
S → ABCD | BCD | ABD | ABC | BD | BC | AB | Cb | b
A → CD | a | bD | b | AC
B → Cb | b
C → a
D → bD | b
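The three steps of the algorithm can be sketched in Python as follows. This is an illustration, not the slides' own code: it assumes single-character symbols, a dictionary from non-terminals to lists of right-hand-side strings, and that every non-terminal appears as a key. The names `unit_reachable` and `remove_unit_rules` are hypothetical.

```python
def unit_reachable(rules, start):
    """Non-terminals reachable from `start` along unit-rule edges (start included)."""
    seen = [start]
    for node in seen:  # the list grows while it is being scanned
        for body in rules[node]:
            # a right-hand side that is itself a non-terminal is a unit rule
            if body in rules and body not in seen:
                seen.append(body)
    return seen


def remove_unit_rules(rules):
    new_rules = {}
    for head in rules:
        bodies = []
        for node in unit_reachable(rules, head):
            for body in rules[node]:
                # copy only the non-unit right-hand sides, without duplicates
                if body not in rules and body not in bodies:
                    bodies.append(body)
        new_rules[head] = bodies
    return new_rules


EXAMPLE_I = {
    "S": ["ABCD", "BCD", "ABD", "ABC", "BD", "BC", "AB", "B"],
    "A": ["CD", "C", "D", "AC"],
    "B": ["Cb", "b"],
    "C": ["a"],
    "D": ["bD", "b"],
}

for head, bodies in remove_unit_rules(EXAMPLE_I).items():
    print(head, "->", " | ".join(bodies))
```

Collecting everything reachable through the graph, rather than substituting one unit rule at a time, also handles chains such as A → B → C and self-loops such as S → S.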
Example II
Remove the unit rules from the following grammar:
A → aA | B
B → bB | C
C → cC
Solution II
Unit rule graph (edges): A → B, B → C
The new rules:
A → aA | bB | cC
B → bB | cC
C → cC
Example III
Remove the ε-rules and unit rules from the following grammar:
S → ASA | aB
A → B | S
B → b | ε
Solution III
Nullable non-terminals:
$N_1 = \{B\}$
$N_2 = \{B\} \cup \{A\} = \{A, B\}$
$N_3 = \{A, B\} \cup \emptyset = N_2 \Rightarrow N = \{A, B\}$
The new rules:
S → ASA | SA | AS | S | aB | a
A → B | S
B → b
Unit rule graph (edges): A → B, A → S; the unit rule S → S only loops back to S
The new rules:
S → ASA | SA | AS | aB | a
A → b | ASA | SA | AS | aB | a
B → b
Removing unneeded symbols
- Two types of unneeded symbols
  - Not terminating symbols: no derivation starting from them can be finished
  - Unreachable symbols: cannot be reached from the initial non-terminal
Removing not terminating symbols
- Collect the terminating symbols, then remove all rules containing any other symbol
- $B_0 = \Sigma$: the terminals terminate
- $B_i = B_{i-1} \cup \{A \in V \mid \exists\, A \to \alpha,\ \alpha \in B_{i-1}^{*}\}$: a symbol terminates if it has a rule in which every symbol on the right-hand side terminates
- If $B_i = B_{i-1}$, then $B = B_i$ is the set of terminating symbols
- $V' = V \cap B$ is the new set of non-terminals
Removing unreachable symbols
- Collect all reachable symbols, and leave every other symbol out of the grammar
- $T_0 = \{S\}$: the initial variable is always reachable
- $T_i = T_{i-1} \cup \{Z \in V' \cup \Sigma \mid \exists\, A \to \alpha,\ A \in T_{i-1},\ Z \in \alpha\}$: a symbol is reachable if it appears on the right-hand side of a rule whose left-hand side is reachable
- If $T_i = T_{i-1}$, then $T = T_i$ is the set of reachable symbols
- Remove all symbols not in T from $V'$ and $\Sigma$: $V'' = T \cap V'$ and $\Sigma' = T \cap \Sigma$
- Remove all rules that contain anything not in T
- Always remove not terminating symbols before unreachable ones
Example
Remove all unneeded symbols from the following grammar:
S → AB | a
A → B | b
Solution
Not terminating symbols:
$B_0 = \{a, b\}$
$B_1 = B_0 \cup \{S, A\}$
$B_2 = B_1 \cup \emptyset = B_1 \Rightarrow B = B_2$
$V' = V \cap \{a, b, S, A\} = \{S, A\}$
The new rules:
S → a
A → b
Unreachable symbols:
$T_0 = \{S\}$
$T_1 = T_0 \cup \{a\}$
$T_2 = T_1 \cup \emptyset = T_1 \Rightarrow T = T_2$
$V'' = V' \cap T = \{S\}$ and $\Sigma' = \Sigma \cap T = \{a\}$
The new rules:
S → a
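Both clean-up steps, and the reason for their order, can be tried out with a small Python sketch. The encoding (single-character symbols, a dictionary of right-hand-side strings) and all function names are assumptions for illustration. The loops update the sets in place instead of building each $B_i$ and $T_i$ separately, which reaches the same final sets.

```python
def terminating_symbols(rules, terminals):
    """Start from the terminals; add A once some rule A -> α has only terminating symbols."""
    terminating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in terminating and any(
                all(s in terminating for s in body) for body in bodies
            ):
                terminating.add(head)
                changed = True
    return terminating


def reachable_symbols(rules, start):
    """Start from S; add every symbol on the right-hand side of a reachable non-terminal."""
    reachable, todo = {start}, [start]
    while todo:
        for body in rules.get(todo.pop(), []):
            for s in body:
                if s not in reachable:
                    reachable.add(s)
                    todo.append(s)
    return reachable


def restrict(rules, allowed):
    """Keep only the rules in which every symbol is in `allowed`."""
    return {
        head: [body for body in bodies if all(s in allowed for s in body)]
        for head, bodies in rules.items()
        if head in allowed
    }


def remove_unneeded(rules, terminals, start):
    rules = restrict(rules, terminating_symbols(rules, terminals))  # first step
    return restrict(rules, reachable_symbols(rules, start))         # second step


RULES = {"S": ["AB", "a"], "A": ["B", "b"]}
print(remove_unneeded(RULES, "ab", "S"))  # {'S': ['a']}

# Wrong order: A is still reachable through S -> AB when unreachable symbols are removed first.
wrong = restrict(RULES, reachable_symbols(RULES, "S"))
wrong = restrict(wrong, terminating_symbols(wrong, "ab"))
print(wrong)  # {'S': ['a'], 'A': ['b']}
```

In the wrong order the rule A → b survives although A can no longer be reached, because the rule S → AB that made A reachable is only deleted afterwards.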
Some closure properties of context-free languages
- Let $L_1$ and $L_2$ be two context-free languages
- Then $L_1 \cup L_2$, $L_1 L_2$ and $L_1^{*}$ are also context-free languages
- Proof for union:
  - Because $L_1$ and $L_2$ are context-free languages, there are context-free grammars $G_1$ and $G_2$ for them.
  - Let the non-terminals of $G_1$ and $G_2$ be disjoint (rename them if necessary).
  - Create a new starting non-terminal S, and a new rule: $S \to S_1 \mid S_2$, where $S_1$ is the starting symbol of $G_1$, and $S_2$ is the starting symbol of $G_2$.
  - The other rules are the rules from $G_1$ and $G_2$
  - If there is $S_1 \to \varepsilon$ or $S_2 \to \varepsilon$, then eliminate the ε-rules
- Proof for concatenation: same as above, but the new rule is $S \to S_1 S_2$
- Proof for the closure $L_1^{*}$ (Kleene star): same as above, but the new rule is $S \to S_1 S \mid \varepsilon$, then remove the ε-rules
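The three constructions only add one start rule each, which a short Python sketch can show. Everything named here is an assumption for illustration: right-hand sides are lists of symbol names so that names like `S1` work, the empty list encodes ε, and the sample grammars `G1` (for the words aⁿbⁿ, n ≥ 1) and `G2` (for the words cⁿ, n ≥ 1) are hypothetical. The final ε-rule removal mentioned in the proofs is not performed.

```python
def combine(start_bodies, new_start, *grammars):
    """Put the new start rule in front of the (disjoint) rule sets."""
    merged = {new_start: start_bodies}
    for rules in grammars:
        for head, bodies in rules.items():
            assert head not in merged, "non-terminals must be disjoint; rename first"
            merged[head] = bodies
    return merged


def union_grammar(rules1, start1, rules2, start2, new_start="S"):
    return combine([[start1], [start2]], new_start, rules1, rules2)   # S -> S1 | S2


def concatenation_grammar(rules1, start1, rules2, start2, new_start="S"):
    return combine([[start1, start2]], new_start, rules1, rules2)     # S -> S1 S2


def star_grammar(rules1, start1, new_start="S"):
    return combine([[start1, new_start], []], new_start, rules1)      # S -> S1 S | ε


G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}
G2 = {"S2": [["c", "S2"], ["c"]]}

for grammar in (
    union_grammar(G1, "S1", G2, "S2"),
    concatenation_grammar(G1, "S1", G2, "S2"),
    star_grammar(G1, "S1"),
):
    print(grammar)
```

The assertion inside `combine` enforces the requirement from the proof that the non-terminals of the two grammars, and the new start symbol, do not clash.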