module 2 introduction to syntax analysis

SYLLABUS
Role of Parser – Grammars – Error Handling – Context-free grammars
– Writing a grammar –Top Down Parsing – General Strategies
Recursive Descent Parser Predictive Parser-LL(1) Parser-Shift Reduce
Parser-LR Parser – LR(0) Item Construction of SLR Parsing Table –
Introduction to LALR Parser – Error Handling and Recovery in Syntax
Analyzer-YACC.

Syntax Analysis
• By design , every programming language has precise rules that
prescribe the syntactic structure of well formed programs
• In c , for example , a program is made up of functions, a function out
of declarations and statements, a statement out of expressions and so
on.
• The syntax of programming language constructs can be specified by
context-free grammars or BNF( Backus-Naur-Form)notation.
• The Parser determines the syntax or structure of a program. That is,it
checks whether the input is syntactically correct or not.

Role of Parser
• Parser also called syntax analyzer is the one which does parsing
• Parsing is the process of getting tokens from the lexical analyzer and
obtains a derivation for the sequence of tokens and builds a parse tree
• Thus if the program is syntactically correct, the parse tree is generated
• If a derivation for the sequence of tokens does not exist i.e., if the
program is syntactically wrong, it results in syntax error and the parser
displays the appropriate error messages.
• The parse trees are very important in figuring out the meaning of a
program or part of the program
• The parse tree is also called the syntax tree or derivation tree.

Role of a parser
• In compiler model, the parser obtains a string of tokens from the
lexical analyzer and verifies that the string of token names can be
generated by the grammar for the source language.

Contd..
A parser is a software component that takes input data and builds a
data structure called parse tree.
The parser reads the tokens from the lexical analyser and checks
whether the sequence of tokens matches the grammar rules of the
programming language and generate the parse tree.
A syntax error is reported when there is mismatch that is if the token
sequence does not match the grammar.

Types of parser
Three types of parser
1. Universal parser – the Cocke-Youngster-Kasami algorithm and
Earley’s algorithm can parse any grammar,
2. Top-down parser – a top-down parser starts with the root of the
parse tree , labelled with the start or goal symbol of the grammar
and proceeds down to the leaves.
3. Bottom-up parser – a bottom up parser starts with the leaves
and moves to the start symbol or the root.

Error handling in syntax analysis
The following are the common errors:
1. Lexical Error: such as misspelling an identifier, keyword or operator
Example: misspelling as ‘whil’ or ‘caase’
2. Syntactic Error: such as an arithmetic expression with unbalanced
paranthesis
Example: a=a%+s; or a=((a+b)/n)*100)
3. Semantic Error: such as operator applied to an incompatible operand.
Example: c=true*8; this expression is syntactically right, but it does not has meaning.
4. Logical Error: such as infinitely recursive call.
Example: for(i=0;i<n;i++);

Error recovering strategies
The following are the common error recovering strategies:
1. Panic Mode
2. Phrase level Recovery
3. Error production
4. Global correction

1.Panic Mode
• When a parser encounters an error anywhere in the statement ,it
ignores the rest of the statement by not processing input from
erroneous input to delimiter.
• This method often skips a considerable amount of input without
checking it for additional errors.
• It is an easiest way of error-recovery.
• It prevents the parser from developing infinite loops

Contd..
Example:
a=b+c a=b+c // after reach c parser discards input symbol one at a time
d=e+f;
The compiler will discard all subsequent token till the semicolon encountered.
int a, 5abcd, sum, $2;
// After int a, 5abcd , sum, $2 ; // parser discards input symbol
one at a time.

Contd..
• Advantage:
1.It’s easy to use.
2.The program never falls into the loop.
• Disadvantage:
1.This technique may lead to semantic error or runtime error in
further stages.

2.Phrase Level Recovery
• On discovering an error, a parser may perform local corrections on the
remaining input
• It may replace a prefix of the remaining input by some string that
allows the parser to continue
• Parser designers have to be careful here one wrong correction can lead
to infinite loop
Example: in case of an error like the previous ,it will report the error and
generate the”;” and continue processing.

3.Error Production
• It requires good knowledge of common errors that might get
encountered, then we can augment the grammar for the corresponding
language with error productions that generate the erroneous
constructs.
• If error production is used during parsing, we can generate an
appropriate error message to indicate the error that has been
recognized in the input.
• This method is extremely difficult to maintain, because if we change
grammar, then it becomes necessary to change the corresponding
productions.

Contd..
Example: Suppose the input string is abcd.
Grammar: S-> A
A-> aA | bA | a | b
B-> cd
The input string is not obtainable by the above grammar, so we need to add
Augmented Grammar.
Grammar: E->SB // AUGMENT THE GRAMMAR
S-> A
A-> aA| bA | a | b
B-> cd
Now, string abcd is possible to obtain.

Contd..
Advantages:
• Syntactic phase errors are generally recovered by error productions.
Disadvantages:
• The method is very difficult to maintain because if we change the
grammar then it becomes necessary to change the corresponding
production.
• It is difficult to maintain by the developers.

4.Global Correction
• We often want such a compiler that makes very few changes in
processing an incorrect input string to the correct input string.
• Given an incorrect input string x and grammar G, the algorithm
itself can find a parse tree for a related string y (Expected output
string); such that a number of insertions, deletions, and changes
of token require to transform x into y is as low as possible.
• Global correction methods increase time & space requirements
at parsing time. This is simply a theoretical concept.

Context free Grammar(CFG)
• Context free grammar is a formal grammar which is used to generate
all possible strings in a given formal language.
• Context free grammar G can be defined by four tuples
as:G= (V, T, P, S)
G  Describes a grammar
V  set of variables or non-terminals
T  set of terminal symbols
P  set of production rules
S  Start symbols

module 2 introduction to syntax analysis

More Related Content

Similar to module 2 introduction to syntax analysis

Recently uploaded

module 2 introduction to syntax analysis