CS17604
COMPILER DESIGN
Syntax Analysis
The role of parser
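[Figure: the lexical analyzer reads the source program and, on each getNextToken request, returns a token to the parser. Both consult the symbol table. The parser passes a parse tree to the rest of the front end, which produces the intermediate representation.]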
Top Down Parsing
Ambiguity
A grammar is ambiguous if, for some string, there exists:
• More than one parse tree
• More than one leftmost derivation
• More than one rightmost derivation
• Example: id+id*id
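The grammar for this example did not survive on the slide, so the sketch below assumes the usual ambiguous expression grammar E -> E + E | E * E | id (an assumption, not taken from the source). Under that assumption it counts the parse trees of a token string; the function name `count_parse_trees` is illustrative.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees under the ASSUMED grammar E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        total = 1 if j - i == 1 and tokens[i] == "id" else 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees(["id", "+", "id", "*", "id"])
print(trees)  # 2: one tree with + at the root, one with * at the root
```

A count greater than one for any string is what makes the assumed grammar ambiguous; each tree also corresponds to its own leftmost and rightmost derivation.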
Elimination of ambiguity
Elimination of ambiguity (cont.)
• Idea:
• A statement appearing between a then and an else must be matched, that is, it must not end with an unmatched (open) then
Elimination of left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α
• Top-down parsing methods cannot handle left-recursive grammars
• A simple rule for eliminating direct left recursion:
• For a rule like:
• A -> A α | β
• We may replace it with
• A -> β A’
• A’ -> α A’ | ɛ
• Here α is everything that follows the leading A in the left-recursive alternative (all symbols except the left non-terminal), and β stands for the other productions of A
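The replacement rule for direct left recursion can be sketched in a few lines of Python. This is a minimal illustration under assumptions: productions are lists of symbols, an empty list stands for ɛ, the new non-terminal is named by appending an ASCII apostrophe, and the function name is hypothetical.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    `alternatives` is a list of right-hand sides (lists of symbols);
    an empty list represents epsilon.
    """
    # The alpha parts: what follows the leading A in recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: alternatives that do not start with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to eliminate
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


result = eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)  # {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

The sketch handles only direct left recursion in a single non-terminal; indirect left recursion needs the more general algorithm.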
Uses of grammars
Left-recursive expression grammar:
E -> E + T | T
T -> T * F | F
F -> (E) | id
After eliminating left recursion:
E -> TE’
E’ -> +TE’ | ɛ
T -> FT’
T’ -> *FT’ | ɛ
F -> (E) | id
Left factoring
• Left factoring is a grammar transformation that is useful for producing a grammar suitable
for predictive or top-down parsing.
• Consider the following grammar:
• stmt -> if expr then stmt else stmt
• | if expr then stmt
• On seeing the input token if, the parser cannot tell which production to use
• We can easily perform left factoring:
• If we have A -> αβ1 | αβ2 then we replace it with
• A -> αA’
• A’ -> β1 | β2
Left factoring (cont.)
• Algorithm
• For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ɛ, then replace all of the A-productions A -> αβ1 | αβ2 | … | αβn | γ, where γ stands for the productions without the common prefix α, by
• A -> αA’ | γ
• A’ -> β1 | β2 | … | βn
• Example:
• S -> i E t S | i E t S e S | a
• E -> b
After left factoring:
S -> iEtSS’ | a
S’ -> ɛ | eS
E -> b
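One pass of the left-factoring step can be sketched as follows. This is an illustrative sketch, not a full implementation: it factors a single non-terminal once, represents ɛ as an empty list, and names the new non-terminal with an ASCII apostrophe; `left_factor_once` and `common_prefix` are hypothetical names.

```python
def common_prefix(a, b):
    """Return the longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(nonterminal, alternatives):
    """Factor out the longest prefix shared by two or more alternatives."""
    best = []
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            prefix = common_prefix(alternatives[i], alternatives[j])
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {nonterminal: alternatives}  # no common prefix, nothing to do
    n = len(best)
    new_nt = nonterminal + "'"
    # The beta parts: remainders of alternatives that start with the prefix.
    betas = [alt[n:] for alt in alternatives if alt[:n] == best]
    # The gamma parts: alternatives without the common prefix.
    gammas = [alt for alt in alternatives if alt[:n] != best]
    return {nonterminal: [best + [new_nt]] + gammas, new_nt: betas}


factored = left_factor_once(
    "S",
    [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]],
)
print(factored)
```

For the dangling-else example this yields S -> iEtSS' | a and S' -> ɛ | eS, matching the result shown above. A complete algorithm would repeat the step until no alternatives share a prefix.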
Introduction
• A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right
• It can be also viewed as finding a leftmost derivation for an input string
• Example: id+id*id
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
[Figure: parse trees built top-down for id+id*id, one tree per leftmost derivation step]
E ⇒lm TE’ ⇒lm FT’E’ ⇒lm id T’E’ ⇒lm id E’ ⇒lm id + TE’
Predictive Parser / Non-Recursive Descent Parser / Table-Driven Parser
• Steps required before top-down parsing
• Elimination of left recursion
• Left factoring
• Steps in a predictive parser
• Compute First.
• Compute Follow.
• Construct the parsing table.
• Parse the input with a stack, using the predictive parsing algorithm
Computing First
• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal with the production X -> aα, where a is a terminal, then add a to First(X).
3. If X is a nonterminal and X -> Y1Y2…Yk is a production for some k >= 1, then place a in First(X) if for some i, a is in First(Yi) and ɛ is in all of First(Y1), …, First(Yi-1), that is, Y1…Yi-1 ⇒* ɛ. If ɛ is in First(Yj) for all j = 1, …, k, then add ɛ to First(X).
4. If X -> ɛ is a production then add ɛ to First(X).
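The rules for First can be read as a fixed-point iteration: keep applying them until no set grows. The sketch below applies them to the expression grammar used in these slides. The dictionary representation, the ASCII apostrophe in names such as `E'`, and the function name `compute_first` are assumptions made for illustration.

```python
EPS = "ɛ"

# Expression grammar after eliminating left recursion; [] stands for epsilon.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def compute_first(grammar):
    """Compute First for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[head])
                all_nullable = True  # true while Y1..Yi-1 all derive epsilon
                for sym in alt:
                    if sym not in grammar:  # terminal: First(sym) = {sym}
                        first[head].add(sym)
                        all_nullable = False
                        break
                    first[head] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        all_nullable = False
                        break
                if all_nullable:  # covers X -> epsilon and all-nullable bodies
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


first_sets = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(first_sets[nt]))
```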
First Example
1. If X is a terminal then First(X) = {X}.
2. If X is a nonterminal with the production X -> aα, where a is a terminal, then add a to First(X).
3. If X -> ɛ is a production then add ɛ to First(X).
4. If X is a nonterminal and X -> Y1Y2…Yk is a production for some k >= 1, then place a in First(X) if for some i, a is in First(Yi) and ɛ is in all of First(Y1), …, First(Yi-1), that is, Y1…Yi-1 ⇒* ɛ. If ɛ is in First(Yj) for all j = 1, …, k, then add ɛ to First(X).
E -> TE’
T -> FT’
F -> (E) | id
E’ -> +TE’
First(E’) = {+} from this production (rule 2); the full grammar also has E’ -> ɛ, which adds ɛ by rule 3
First(*) = {*}
First(E) = First(T) = First(F) = {(, id}
Computing follow
• To compute Follow(A) for all nonterminals A, apply the following rules until
nothing can be added to any Follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A-> αBβ then everything in First(β) except ɛ is in
Follow(B).
3. If there is a production A-> αB or a production A->αBβ where First(β)
contains ɛ, then everything in Follow(A) is in Follow(B)
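The three Follow rules can also be sketched as a fixed-point iteration. To keep the example short, the First sets of the expression grammar are written out by hand rather than computed; the data layout and the names `first_of_string` and `compute_follow` are illustrative assumptions.

```python
EPS, END = "ɛ", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

# First sets of the nonterminals, entered by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """First of a symbol string; the empty string yields {epsilon}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # Follow is defined for nonterminals only
                    beta_first = first_of_string(alt[i + 1:])
                    new = beta_first - {EPS}  # rule 2
                    if EPS in beta_first:  # rule 3 (beta empty or nullable)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow_sets = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(nt, sorted(follow_sets[nt]))
```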
LL(1) Grammars
• Predictive parsers are those recursive descent parsers needing no
backtracking
• Grammars for which we can create predictive parsers are called LL(1)
• The first L means scanning input from left to right
• The second L means leftmost derivation
• And 1 stands for using one input symbol for lookahead
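As an illustration of a recursive-descent parser that needs no backtracking, the sketch below recognizes the expression grammar E -> TE’, E’ -> +TE’ | ɛ, T -> FT’, T’ -> *FT’ | ɛ, F -> (E) | id with one token of lookahead. The class and method names are hypothetical, and the input is assumed to be already tokenized.

```python
class RecursiveDescentRecognizer:
    """One method per nonterminal; the lookahead token picks the production."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_E(self):  # E -> T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):  # E' -> + T E' | epsilon
        if self.lookahead == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()
        # otherwise choose the epsilon production and consume nothing

    def parse_T(self):  # T -> F T'
        self.parse_F()
        self.parse_T_prime()

    def parse_T_prime(self):  # T' -> * F T' | epsilon
        if self.lookahead == "*":
            self.match("*")
            self.parse_F()
            self.parse_T_prime()

    def parse_F(self):  # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    parser = RecursiveDescentRecognizer(tokens)
    try:
        parser.parse_E()
        parser.match("$")  # the whole input must be consumed
    except SyntaxError:
        return False
    return True


print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                   # False
```

Each method decides which production to use by looking only at the current token, which is exactly the single symbol of lookahead that the 1 in LL(1) refers to.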
Construction of predictive parsing table
• For each production A -> α in the grammar do the following:
1. For each terminal a in First(α), add A -> α to M[A,a]
2. If ɛ is in First(α), then for each terminal b in Follow(A), add A -> α to M[A,b]. If ɛ
is in First(α) and $ is in Follow(A), add A -> α to M[A,$] as well
• If, after performing the above, there is no production in M[A,a], then set
M[A,a] to error
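The two table-construction rules can be sketched as below, using the expression grammar with its First and Follow sets entered by hand. The representation is an assumption for illustration: the table is a dictionary keyed by (nonterminal, terminal), each cell holds a list so that conflicts would be visible, and a missing key plays the role of an error entry.

```python
EPS = "ɛ"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols, first):
    """First of a symbol string; the empty string yields {epsilon}."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    table = {}
    for head, alternatives in grammar.items():
        for alt in alternatives:
            alt_first = first_of_string(alt, first)
            targets = alt_first - {EPS}      # rule 1
            if EPS in alt_first:             # rule 2; Follow may contain $
                targets |= follow[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(alt)
    return table


table = build_table(GRAMMAR, FIRST, FOLLOW)
conflicts = {cell: prods for cell, prods in table.items() if len(prods) > 1}
print(len(table), conflicts)  # 13 {}
```

An empty conflict dictionary means every cell holds at most one production, which is what lets a predictive parser choose without backtracking.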
Example
E -> TE’
E’ -> +TE’ | ɛ
T -> FT’
T’ -> *FT’ | ɛ
F -> (E) | id

Non-terminal   First       Follow
F              {(, id}     {+, *, ), $}
T              {(, id}     {+, ), $}
E              {(, id}     {), $}
E’             {+, ɛ}      {), $}
T’             {*, ɛ}      {+, ), $}

Parsing table M (rows: non-terminals, columns: input symbols; blank entries are errors):

Non-terminal   id           +            *            (            )            $
E              E -> TE’                               E -> TE’
E’                          E’ -> +TE’                             E’ -> ɛ      E’ -> ɛ
T              T -> FT’                               T -> FT’
T’                          T’ -> ɛ      T’ -> *FT’                T’ -> ɛ      T’ -> ɛ
F              F -> id                                F -> (E)
Another example
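S -> iEtSS’ | a
S’ -> eS | ɛ
E -> b

Parsing table M (rows: non-terminals, columns: input symbols; blank entries are errors):

Non-terminal   a            b            e                   i            t            $
S              S -> a                                        S -> iEtSS’
S’                                       S’ -> ɛ, S’ -> eS                             S’ -> ɛ
E                           E -> b

The entry M[S’,e] holds two productions, so this grammar is not LL(1).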
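Non-recursive predictive parsing
[Figure: model of a table-driven predictive parser. The predictive parsing program reads the input buffer (a + b $), manipulates a stack (X Y Z $, with X on top), consults the parsing table M, and produces the output.]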
Predictive parsing algorithm
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X ≠ $) { /* stack is not empty */
    let a be the input symbol that ip points to;
    if (X is a) pop the stack and advance ip;
    else if (X is a terminal) error();
    else if (M[X,a] is an error entry) error();
    else if (M[X,a] = X -> Y1Y2…Yk) {
        output the production X -> Y1Y2…Yk;
        pop the stack;
        push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
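The algorithm above can be sketched in Python for the expression grammar. The parsing table is written out by hand as a dictionary (a missing key is an error entry), the input is assumed to be tokenized already, and `predictive_parse` is an illustrative name rather than part of the original slides.

```python
# Parsing table M for the expression grammar; [] is the epsilon production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Return the productions of a leftmost derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of the stack is the end of the list
    ip = 0
    output = []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[ip]
        if top == a:  # terminal on top matches the input
            stack.pop()
            ip += 1
        elif top not in NONTERMINALS:
            raise SyntaxError(f"expected {top!r}, found {a!r}")
        elif (top, a) not in TABLE:
            raise SyntaxError(f"no entry M[{top},{a}]")
        else:
            body = TABLE[(top, a)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body))  # push Yk..Y1 so that Y1 is on top
    if tokens[ip] != "$":
        raise SyntaxError(f"unexpected input {tokens[ip]!r}")
    return output


for head, body in predictive_parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ɛ")
```

The final check for leftover input is an addition of this sketch: the loop stops as soon as $ is on top of the stack, so remaining tokens have to be rejected explicitly.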
Example
Parsing table M with synchronizing entries (synch marks the symbols in Follow of the non-terminal; blank entries are errors):

Non-terminal   id           +            *            (            )            $
E              E -> TE’                               E -> TE’     synch        synch
E’                          E’ -> +TE’                             E’ -> ɛ      E’ -> ɛ
T              T -> FT’     synch                     T -> FT’     synch        synch
T’                          T’ -> ɛ      T’ -> *FT’                T’ -> ɛ      T’ -> ɛ
F              F -> id      synch        synch        F -> (E)     synch        synch

Stack        Input        Action
E$           )id*+id$     Error, skip )
E$           id*+id$      id is in First(E)
TE’$         id*+id$
FT’E’$       id*+id$
idT’E’$      id*+id$
T’E’$        *+id$
*FT’E’$      *+id$
FT’E’$       +id$         Error, M[F,+] = synch
T’E’$        +id$         F has been popped
Example
• id+id*id$
Matched    Stack    Input        Action
           E$       id+id*id$
Error recovery in predictive parsing
• Panic mode
• Place all symbols in Follow(A) into the synchronization set for nonterminal A: skip tokens until an element of Follow(A) is seen, then pop A from the stack.
• Add to the synchronization set of a lower-level construct the symbols that begin higher-level constructs
• Add the symbols in First(A) to the synchronization set of nonterminal A
• If a nonterminal can generate the empty string, then the production deriving ɛ can be used as a default
• If a terminal on top of the stack cannot be matched, pop the terminal and issue a message saying that the terminal was inserted
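Panic-mode recovery can be sketched on top of the table-driven parser for the expression grammar. Several choices here are assumptions of the sketch, not rules from the slides: the synchronization set of a nonterminal is just its Follow set, a nonterminal is not popped on a synch entry when it is the only symbol left above $ (the token is skipped instead, which reproduces the first step of the ")id*+id" trace), and the function name and message texts are hypothetical.

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
# Follow sets double as synchronization sets (an assumption of this sketch).
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def parse_with_recovery(tokens, start="E"):
    """Parse with panic-mode recovery and return the list of error messages."""
    tokens = list(tokens) + ["$"]
    stack, ip, errors = ["$", start], 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[ip]
        if top == a:
            stack.pop()
            ip += 1
        elif top not in FOLLOW:  # unmatched terminal on top of the stack
            errors.append(f"inserted missing {top!r}")
            stack.pop()
        elif (top, a) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(top, a)]))
        elif a == "$" or (a in FOLLOW[top] and len(stack) > 2):
            # synch entry: give up on this nonterminal
            errors.append(f"synch: popped {top} at {a!r}")
            stack.pop()
        else:
            errors.append(f"skipped {a!r}")
            ip += 1
    # Anything left before the end marker could not be used.
    errors.extend(f"skipped {t!r}" for t in tokens[ip:-1])
    return errors


print(parse_with_recovery([")", "id", "*", "+", "id"]))
```

On the erroneous input ")id*+id" the sketch skips the leading ")" and later pops F when it meets "+", the same two recovery actions as in the trace above.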