UNIT II
SYNTAX ANALYSIS
SYLLABUS
Role of Parser – Grammars – Error Handling – Context-free grammars
– Writing a grammar – Top Down Parsing – General Strategies –
Recursive Descent Parser – Predictive Parser – LL(1) Parser – Shift Reduce
Parser – LR Parser – LR(0) Item – Construction of SLR Parsing Table –
Introduction to LALR Parser – Error Handling and Recovery in Syntax
Analyzer – YACC.
Syntax Analysis
• By design, every programming language has precise rules that
prescribe the syntactic structure of well-formed programs.
• In C, for example, a program is made up of functions, a function
of declarations and statements, a statement of expressions, and so
on.
• The syntax of programming language constructs can be specified by
context-free grammars or BNF (Backus-Naur Form) notation.
• The parser determines the syntactic structure of a program. That is, it
checks whether the input is syntactically correct.
Role of Parser
• The parser, also called the syntax analyzer, is the component that performs parsing.
• Parsing is the process of taking tokens from the lexical analyzer,
finding a derivation for the sequence of tokens, and building a parse tree.
• If the program is syntactically correct, a parse tree is generated.
• If no derivation exists for the sequence of tokens, i.e., if the
program is syntactically wrong, the parser reports a syntax error
with an appropriate error message.
• Parse trees are important in working out the meaning of a
program or part of a program.
• The parse tree is also called the derivation tree or concrete syntax tree.
Role of a parser
• In the compiler model, the parser obtains a string of tokens from the
lexical analyzer and verifies that the string of token names can be
generated by the grammar for the source language.
Contd..
A parser is a software component that takes input data and builds a
data structure called a parse tree.
The parser reads tokens from the lexical analyzer, checks
whether the sequence of tokens matches the grammar rules of the
programming language, and generates the parse tree.
A syntax error is reported when there is a mismatch, that is, when the token
sequence does not match the grammar.
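The role described above can be sketched as a small recursive-descent recognizer. This is only an illustration under assumptions: the toy grammar E -> T { '+' T }, T -> 'id' | '(' E ')', the token list standing in for lexical analyzer output, and the names parse and SyntaxErrorFound are all hypothetical and not taken from these slides.

```python
class SyntaxErrorFound(Exception):
    """Raised when the token sequence cannot be derived from the grammar."""


def parse(tokens):
    """Parse tokens for the toy grammar  E -> T { '+' T },  T -> 'id' | '(' E ')'.

    Returns a parse tree as nested tuples, or raises SyntaxErrorFound.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"  # "$" marks end of input

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxErrorFound(
                f"expected {tok!r} but found {peek()!r} at token {pos}")
        pos += 1

    def parse_E():
        node = ("E", parse_T())
        while peek() == "+":
            expect("+")
            node = node + ("+", parse_T())
        return node

    def parse_T():
        if peek() == "id":
            expect("id")
            return ("T", "id")
        if peek() == "(":
            expect("(")
            inner = parse_E()
            expect(")")
            return ("T", "(", inner, ")")
        raise SyntaxErrorFound(f"unexpected {peek()!r} at token {pos}")

    tree = parse_E()
    expect("$")  # every token must have been consumed
    return tree
```

Calling parse(["id", "+", "id"]) returns a tree, while parse(["id", "+", ")"]) raises SyntaxErrorFound, which mirrors the two outcomes listed above: a parse tree for correct input, an error report otherwise.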
Types of parser
There are three types of parser:
1. Universal parser – the Cocke-Younger-Kasami (CYK) algorithm and
Earley’s algorithm can parse any context-free grammar, but they are
too inefficient to use in production compilers.
2. Top-down parser – a top-down parser starts with the root of the
parse tree, labelled with the start or goal symbol of the grammar,
and proceeds down to the leaves.
3. Bottom-up parser – a bottom-up parser starts with the leaves
and works up to the start symbol at the root.
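As an illustration of the universal category, here is a minimal sketch of CYK recognition. It assumes the grammar has already been converted to Chomsky normal form; the function name cyk, the two rule dictionaries, and the example grammar for a^n b^n are assumptions made for this sketch.

```python
def cyk(word, unit_rules, pair_rules, start):
    """Return True if `word` (a list of terminals) is derivable from `start`.

    unit_rules: dict terminal -> set of non-terminals A with A -> terminal
    pair_rules: dict (B, C)   -> set of non-terminals A with A -> B C
    The grammar must be in Chomsky normal form.
    """
    n = len(word)
    if n == 0:
        return False  # the empty string is not handled in this sketch
    # table[i][j] holds the non-terminals that derive word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, terminal in enumerate(word):
        table[i][0] = set(unit_rules.get(terminal, ()))
    for length in range(2, n + 1):              # length of the substring
        for i in range(n - length + 1):         # start of the substring
            for split in range(1, length):      # length of the left part
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= pair_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical CNF grammar for a^n b^n (n >= 1):
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNIT = {"a": {"A"}, "b": {"B"}}
PAIR = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
```

With these rules, cyk(list("aabb"), UNIT, PAIR, "S") accepts and cyk(list("abab"), UNIT, PAIR, "S") rejects. The three nested loops over substrings and split points give the cubic running time that makes this approach impractical for compilers.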
Error handling in syntax analysis
The following are the common errors:
1. Lexical Error: such as misspelling an identifier, keyword or operator.
Example: misspelling while as ‘whil’ or case as ‘caase’
2. Syntactic Error: such as an arithmetic expression with misplaced operators or unbalanced
parentheses.
Example: a=a%/s; or a=((a+b)/n)*100)
3. Semantic Error: such as an operator applied to an incompatible operand.
Example: c=true*8; this expression is syntactically correct, but it has no meaning.
4. Logical Error: such as an infinitely recursive call, or a loop whose body is unintentionally empty.
Example: for(i=0;i<n;i++); the stray semicolon leaves the loop with an empty body.
Error recovery strategies
The following are the common error recovery strategies:
1. Panic Mode
2. Phrase level Recovery
3. Error production
4. Global correction
1. Panic Mode
• When the parser encounters an error anywhere in a statement, it
skips the rest of the statement: it discards input symbols one at a time, from the
erroneous token up to a delimiter such as a semicolon.
• This method often skips a considerable amount of input without
checking it for additional errors.
• It is the simplest method of error recovery.
• It is guaranteed not to put the parser into an infinite loop.
Contd..
Example:
a=b+c // the semicolon is missing; after reaching c the parser
      // discards input symbols one at a time
d=e+f;
The parser discards all subsequent tokens until the semicolon is encountered.
int a, 5abcd, sum, $2;
// After int a, the parser discards the input symbols 5abcd , sum , $2 ;
// one at a time.
Contd..
• Advantages:
1. It is easy to implement.
2. The parser never falls into an infinite loop.
• Disadvantage:
1. The skipped input is never checked, so this technique may lead to
semantic or run-time errors in later stages.
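Panic mode can be sketched as follows. This is a minimal illustration, assuming statements of the form id = id { + id } ; and a pre-split token list; the names parse_statements and SYNC are hypothetical.

```python
SYNC = ";"  # assumed synchronizing token (statement delimiter)


def parse_statements(tokens):
    """Check statements of the assumed form  id = id { + id } ;

    Returns a list of error messages. After each error the parser discards
    tokens up to and including the next ';' (panic mode) and then resumes.
    """
    errors = []
    pos = 0

    def current():
        return tokens[pos] if pos < len(tokens) else "$"  # "$" = end of input

    def identifier():
        nonlocal pos
        if not current().isidentifier():
            raise ValueError(f"expected identifier, found {current()!r}")
        pos += 1

    def symbol(expected):
        nonlocal pos
        if current() != expected:
            raise ValueError(f"expected {expected!r}, found {current()!r}")
        pos += 1

    def statement():
        identifier()
        symbol("=")
        identifier()
        while current() == "+":
            symbol("+")
            identifier()
        symbol(SYNC)

    while pos < len(tokens):
        try:
            statement()
        except ValueError as err:
            errors.append(f"token {pos}: {err}")
            while pos < len(tokens) and tokens[pos] != SYNC:
                pos += 1  # discard one input symbol at a time
            pos += 1      # consume the delimiter too, so progress is guaranteed
    return errors
```

For the token list "a = b + c d = e + f ; x = y ;".split() the sketch reports a single error at d and then resumes at x. The whole statement d = e + f ; is skipped unchecked, which is the disadvantage noted above.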
2. Phrase Level Recovery
• On discovering an error, the parser may perform a local correction on the
remaining input.
• It may replace a prefix of the remaining input by some string that
allows the parser to continue.
• Parser designers have to be careful here: a wrong correction can lead
to an infinite loop.
Example: for an error like the missing semicolon in the previous example, the parser reports the error,
inserts the ";" and continues processing.
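The missing-semicolon repair can be sketched like this. It is a minimal illustration that knows exactly one local correction; the statement form id = id { + id } ; and the name parse_with_local_repair are assumptions.

```python
def parse_with_local_repair(tokens):
    """Parse statements  id = id { + id } ;  and insert a missing ';' locally.

    Returns (repaired_tokens, messages). Any other kind of error raises
    ValueError, because this sketch only knows one repair.
    """
    out, messages = [], []
    pos = 0

    def take_identifier():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise ValueError(f"token {pos}: expected identifier, found {found!r}")
        out.append(tokens[pos])
        pos += 1

    while pos < len(tokens):
        take_identifier()
        if pos >= len(tokens) or tokens[pos] != "=":
            raise ValueError(f"token {pos}: expected '='")
        out.append("=")
        pos += 1
        take_identifier()
        while pos < len(tokens) and tokens[pos] == "+":
            out.append("+")
            pos += 1
            take_identifier()
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1
        else:
            # Local correction: act as if ';' were present. At least three
            # tokens were consumed for this statement, so the repair itself
            # cannot make the parser loop forever.
            messages.append(f"token {pos}: missing ';' inserted")
        out.append(";")
    return out, messages
```

The repair is only safe because it happens after real input has been consumed. A correction that inserts symbols without ever advancing through the input is the kind of wrong correction that leads to an infinite loop.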
3. Error Production
• This method requires good knowledge of the common errors that might be
encountered. The grammar for the
language is then augmented with error productions that generate the erroneous
constructs.
• If an error production is used during parsing, the parser can generate an
appropriate error message to indicate the error that has been
recognized in the input.
• This method is extremely difficult to maintain, because if the
grammar changes, the corresponding error productions must be changed
as well.
Contd..
Example: Suppose the input string is abcd.
Grammar: S -> A
         A -> aA | bA | a | b
         B -> cd
The input string cannot be derived from the start symbol S of this grammar, so we
augment the grammar.
Grammar: E -> SB // production added to augment the grammar; E is the new start symbol
         S -> A
         A -> aA | bA | a | b
         B -> cd
Now the string abcd can be derived: E => SB => AB => aAB => abB => abcd.
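The effect of the added production can be checked with a small brute-force search over leftmost derivations. This is a sketch under assumptions: every symbol is a single character, the grammar has no empty productions (so sentential forms longer than the target can be pruned), and the name derives is hypothetical.

```python
from collections import deque


def derives(grammar, start, target):
    """Return True if `target` can be derived from `start`.

    grammar maps a non-terminal (one character) to a list of right-hand sides.
    Assumes no empty productions, so forms longer than target are pruned.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            if sym in grammar:  # expand only the leftmost non-terminal
                for rhs in grammar[sym]:
                    new = form[:i] + rhs + form[i + 1:]
                    if len(new) <= len(target) and new not in seen:
                        seen.add(new)
                        queue.append(new)
                break
    return False


original = {"S": ["A"], "A": ["aA", "bA", "a", "b"], "B": ["cd"]}
augmented = dict(original, E=["SB"])  # the added production E -> SB
```

Here derives(original, "S", "abcd") is False and derives(augmented, "E", "abcd") is True. In a real parser, the action attached to the added production would also emit the error message for the construct it recognizes.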
Contd..
Advantages:
• Syntactic-phase errors are generally recovered by error productions.
Disadvantages:
• The method is very difficult for developers to maintain, because whenever the
grammar changes, the corresponding error productions must be changed
as well.
4. Global Correction
• Ideally, a compiler should make as few changes as possible
when turning an incorrect input string into a correct one.
• Given an incorrect input string x and a grammar G, the algorithm
finds a parse tree for a related string y (the expected correct
string) such that the number of insertions, deletions, and changes
of tokens required to transform x into y is as small as possible.
• Global correction methods increase the time and space requirements
of parsing, so the technique is mainly of theoretical interest.
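The cost measure used by global correction (the number of token insertions, deletions, and changes) is the edit distance between x and y. The sketch below computes it and picks the closest string from a given list of candidates. That list is a simplifying assumption: a real global-correction algorithm searches over every string the grammar generates. The names edit_distance and closest_correction are hypothetical.

```python
def edit_distance(x, y):
    """Minimum number of token insertions, deletions, and changes turning x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, xt in enumerate(x, start=1):
        cur = [i]
        for j, yt in enumerate(y, start=1):
            cur.append(min(
                prev[j] + 1,               # delete xt
                cur[j - 1] + 1,            # insert yt
                prev[j - 1] + (xt != yt),  # change xt into yt, or keep it
            ))
        prev = cur
    return prev[-1]


def closest_correction(x, candidates):
    """Pick the candidate token list with the fewest edits from x."""
    return min(candidates, key=lambda y: edit_distance(x, y))
```

For example, edit_distance("a = ( b + c ;".split(), "a = ( b + c ) ;".split()) is 1, a single inserted parenthesis. The table has len(x) * len(y) cells for just one candidate y, which hints at why a search over a whole language is too expensive for practical parsers.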
CONTEXT FREE GRAMMAR
Context Free Grammar (CFG)
• A context-free grammar is a formal grammar that is used to generate
all possible strings in a given formal language.
• A context-free grammar G is defined by a 4-tuple:
G = (V, T, P, S)
G : the grammar
V : set of variables or non-terminals
T : set of terminal symbols
P : set of production rules
S : start symbol
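The 4-tuple can be written down directly as a data structure. The sketch below is one possible encoding, not a standard one: the class Grammar, the helpers is_well_formed and derive_leftmost, and the example grammar for a^n b^n are all assumptions made for illustration.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    V: frozenset  # variables (non-terminals)
    T: frozenset  # terminal symbols
    P: tuple      # productions as (head, body) pairs; body is a tuple of symbols
    S: str        # start symbol


def is_well_formed(g):
    """Check the basic conditions on the 4-tuple (V, T, P, S)."""
    if g.V & g.T or g.S not in g.V:
        return False
    symbols = g.V | g.T
    return all(head in g.V and all(s in symbols for s in body)
               for head, body in g.P)


def derive_leftmost(g, choices):
    """Apply the productions g.P[k], for k in choices, to the leftmost non-terminal.

    Returns the list of sentential forms, starting from the start symbol.
    """
    form = [g.S]
    steps = [tuple(form)]
    for k in choices:
        head, body = g.P[k]
        i = next((idx for idx, s in enumerate(form) if s in g.V), None)
        if i is None or form[i] != head:
            raise ValueError(f"production {k} does not apply to {form}")
        form[i:i + 1] = body
        steps.append(tuple(form))
    return steps


# Hypothetical example grammar for the language a^n b^n (n >= 1).
G = Grammar(
    V=frozenset({"S"}),
    T=frozenset({"a", "b"}),
    P=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
    S="S",
)
```

With this grammar, derive_leftmost(G, [0, 1]) walks through S, a S b, a a b b, ending in a string made only of terminals.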
