This document discusses parsing and syntax analysis. It provides three key points:
1. Parsing involves recognizing the structure of a program or document by constructing a parse tree. This tree represents the structure and is used to guide translation.
2. During compilation, the parser uses a grammar to check the structure of tokens produced by the lexical analyzer. It produces a parse tree and handles syntactic errors and recovery.
3. Parsers are responsible for identifying and handling syntax errors. They must detect errors efficiently and recover in a way that issues clear messages and allows processing to continue without significantly slowing down.
Introduction to syntax analysis and parsing. Parsing recognizes sentences, discovers structure, and creates parse trees for translation.
Explains the role of the parser during compilation, including generating intermediate representations and handling tokens.
Focuses on identifying and handling syntax errors, covering types of errors and challenges in error reporting.
Discusses methods for error recovery in compilers, including challenges and strategies like panic mode and phrase-level recovery.
Introduces context-free grammars, terminology, symbols, and examples for defining language syntax. Describes derivation processes using CFG, with examples of grammar application and derivations of statements.
Illustrates the concept of parse trees, essential for visualizing the structure of expressions defined in a grammar.
Explains ambiguous grammars, challenges they present, and methods for disambiguation and resolving parsing issues.
Parsing
• A.K.A. Syntax Analysis
– Recognize sentences in a language.
– Discover the structure of a document/program.
– Construct (implicitly or explicitly) a tree (called a parse tree) to represent the structure.
– The above tree is used later to guide translation.
Parsing During Compilation
[Figure: source program → lexical analyzer → token → parser → parse tree → rest of front end → intermediate representation; the parser requests "get next token" from the lexical analyzer; also shown: symbol table, errors.]
• Lexical analyzer
– regular expressions
• Parser
– uses a grammar to check structure of tokens
– produces a parse tree
– syntactic errors and recovery
– recognize correct syntax
– report errors
• Rest of front end
– Collecting token information
– Perform type checking
– Intermediate code generation
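The hand-off between the lexical analyzer and the parser can be sketched as a generator that the parser pulls from one token at a time. This is only an illustration: the token classes in TOKEN_SPEC and the function name tokens are assumptions made for this sketch, not part of the slides.

```python
import re

# Hypothetical token classes for illustration; a real language defines its own.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokens(source):
    """Yield (kind, text) pairs; the parser pulls them one at a time."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # No regular expression matches: this is a lexical error.
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        if m.lastgroup != "SKIP":       # whitespace is recognized but not passed on
            yield m.lastgroup, m.group()
        pos = m.end()

stream = tokens("x := y + 42 ;")
first = next(stream)                    # the parser's "get next token" request
```

Using a generator keeps the lexer lazy, so the parser controls the pace, which mirrors the "get next token" arrow in the figure.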
Parsing Responsibilities
Syntax Error Identification / Handling
Recall typical error types:
1. Lexical : Misspellings
     if x<1 thenn y = 5:
2. Syntactic : Omission, wrong order of tokens
     if ((x<1) & (y>5)))
3. Semantic : Incompatible types, undefined IDs
     if (x+5) then
4. Logical : Infinite loop / recursive call
     if (i<9) then ...     (should be <= not <)
Majority of error processing occurs during syntax analysis
NOTE: Not all errors are identifiable !!
Error Detection
• Much responsibility on the parser
– Many errors are syntactic in nature
– Modern parsing methods can detect the presence of syntactic errors in programs very efficiently
– Detecting semantic or logical errors is difficult
• Challenges for the error handler in the parser
– It should report errors clearly and accurately
– It should recover from an error and continue
– It should not significantly slow down the processing of correct programs
• Good news is
– Common errors are simple and relatively easy to catch.
• Errors don’t occur that frequently!!
• 60% of programs are syntactically and semantically correct
• 80% of erroneous statements have only 1 error, 13% have 2
• Most errors are trivial: 90% are single-token errors
• 60% punctuation, 20% operator, 15% keyword, 5% other errors
Adequate Error Reporting is Not a Trivial Task
• Difficult to generate clear and accurate error messages.
Example
function foo () {
...
if (...) {
...
} else {
...
...
}
<eof>            ← Not detected until here (a } is missing earlier in the function)
Example
int myVarr;      ← Misspelled ID here
...
x = myVar;       ← Not detected until here
...
Error Recovery
• After recovering from the first error
– Compiler must go on!
• Restore to some state and process the rest of the input
• Error-Correcting Compilers
– Issue an error message
– Fix the problem
– Produce an executable
Example
Error on line 23: “myVarr” undefined.
“myVar” was used.
May not be a good idea!!
– Guessing the programmer’s intention is not easy!
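One cautious middle ground is to suggest a likely identifier in the message instead of silently substituting it. A minimal sketch, assuming the set of declared names is available as a list; the helper name suggest and the 0.8 similarity cutoff are choices made for this illustration.

```python
import difflib

def suggest(name, declared):
    """Return the closest declared identifier, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, declared, n=1, cutoff=0.8)
    return matches[0] if matches else None

declared = ["myVar", "count", "total"]   # hypothetical symbol-table contents
hint = suggest("myVarr", declared)
message = 'Error on line 23: "myVarr" undefined.'
if hint is not None:
    message += f' Did you mean "{hint}"?'
```

The suggestion stays a hint to the programmer; the compiler does not act on the guess.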
Error Recovery May Trigger More Errors!
• Inadequate recovery may introduce more errors
– Those were not the programmer’s errors
• Example:
int myVar flag ;      ← Declaration of flag is discarded
...
x := flag;            ← Variable flag is undefined
...
...
while (flag==0)       ← Variable flag is undefined
...
• Too many error messages may be obscuring
– May bury the real message
– Remedy:
• allow 1 message per token or per statement
• Quit after a maximum (e.g. 100) number of errors
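The remedy of limiting messages can be sketched as a small reporter object. This is an illustration only: the class name ErrorReporter is hypothetical, and it uses the source line number as a stand-in for "per statement".

```python
class ErrorReporter:
    """Collects error messages, at most one per line, up to a fixed cap."""

    def __init__(self, max_errors=100):
        self.max_errors = max_errors
        self.messages = []
        self._seen_lines = set()

    def report(self, line, message):
        """Record the message; return False once the compiler should quit."""
        if line in self._seen_lines:
            return True                  # suppress repeated messages for this line
        self._seen_lines.add(line)
        self.messages.append(f"Error on line {line}: {message}")
        return len(self.messages) < self.max_errors

reporter = ErrorReporter(max_errors=2)
keep_going = reporter.report(3, "variable flag is undefined")
keep_going = reporter.report(3, "type mismatch")           # same line: dropped
keep_going = reporter.report(7, "variable flag is undefined")
```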
Error Recovery Approaches: Panic Mode
• Discard tokens until we see a “synchronizing” token.
• The key...
– Good set of synchronizing tokens
– Knowing what to do then
• Advantage
– Simple to implement
– Does not go into infinite loop
– Commonly used
• Disadvantage
– May skip over large sections of source without checking them for additional errors
Example
Skip to next occurrence of
} end ;
Resume by parsing the next statement
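Panic mode can be sketched on a toy language whose only statement form is `id := id ;`. That grammar, the function name parse_statements, and the token-list representation are assumptions for this sketch; the synchronizing set comes from the example above.

```python
SYNC = {";", "}", "end"}   # synchronizing tokens from the example above

def parse_statements(toks):
    """Parse `id := id ;` statements (a toy grammar assumed for illustration).

    Returns (number of statements parsed, list of error messages).
    """
    pos, parsed, errors = 0, 0, []
    while pos < len(toks):
        stmt = toks[pos:pos + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == ":="
                and stmt[2].isidentifier() and stmt[3] == ";"):
            parsed += 1
            pos += 4
            continue
        errors.append(f"syntax error near token {pos}: {toks[pos]!r}")
        # Panic mode: discard tokens until a synchronizing token is seen...
        while pos < len(toks) and toks[pos] not in SYNC:
            pos += 1
        pos += 1   # ...then consume it and resume with the next statement.
    return parsed, errors

parsed, errors = parse_statements("a := b ; c := := d ; e := f ;".split())
```

Every pass through the outer loop consumes at least one token, which is why this strategy cannot loop forever.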
Error Recovery Approaches: Phrase-Level Recovery
• Compiler corrects the program by deleting or inserting tokens
...so it can proceed to parse from where it was.
• The key...
Don’t get into an infinite loop
Example
while (x==4) y:= a + b
Insert do to fix the statement
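The repair in the example can be sketched as a local token insertion. The function name repair_while and the pre-split token list are assumptions for this illustration; a real parser would perform the insertion at the point where `do` was expected.

```python
def repair_while(toks):
    """Insert a missing `do` after the parenthesized condition of a while statement.

    Returns (repaired tokens, list of messages). Toy token lists are assumed.
    """
    out, messages = list(toks), []
    if out and out[0] == "while" and "(" in out:
        depth = 0
        for i, tok in enumerate(out):
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
                if depth == 0:           # end of the condition
                    if i + 1 >= len(out) or out[i + 1] != "do":
                        out.insert(i + 1, "do")
                        messages.append("inserted missing 'do'")
                    break
    return out, messages

fixed, notes = repair_while("while ( x == 4 ) y := a + b".split())
again, more_notes = repair_while(fixed)   # already repaired: nothing to insert
```

Running the repair a second time changes nothing, which is the property that keeps an inserting recovery from looping indefinitely.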
Context Free Grammars (CFG)
• A context free grammar is a formal model that consists of:
• Terminals
Keywords
Token Classes
Punctuation
• Non-terminals
Any symbol appearing on the left-hand side of any rule
• Start Symbol
Usually the non-terminal on the left-hand side of the first rule
• Rules (or “Productions”)
BNF: Backus-Naur Form / Backus-Normal Form
Stmt ::= if Expr then Stmt else Stmt
Context Free Grammars: A First Look
assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id
term → real
term → integer
operator → +
operator → -
Derivation: A sequence of grammar rule applications and substitutions that transform a starting nonterminal into a sequence of terminals / tokens.
Derivation
Let’s derive: id := id + real - integer ;
Sentential form                                   Using production
assign_stmt                                       assign_stmt → id := expr ;
→ id := expr ;                                    expr → expr operator term
→ id := expr operator term ;                      expr → expr operator term
→ id := expr operator term operator term ;        expr → term
→ id := term operator term operator term ;        term → id
→ id := id operator term operator term ;          operator → +
→ id := id + term operator term ;                 term → real
→ id := id + real operator term ;                 operator → -
→ id := id + real - term ;                        term → integer
→ id := id + real - integer ;
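The derivation can be replayed mechanically: each step rewrites the leftmost occurrence of a production's left-hand side. A minimal sketch, with sentential forms held as Python lists; the helper name apply and that representation are assumptions for this illustration.

```python
def apply(form, lhs, rhs):
    """Replace the leftmost occurrence of nonterminal `lhs` in `form` with `rhs`."""
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]

# The productions in the order they are used in the derivation.
steps = [
    ("assign_stmt", ["id", ":=", "expr", ";"]),
    ("expr", ["expr", "operator", "term"]),
    ("expr", ["expr", "operator", "term"]),
    ("expr", ["term"]),
    ("term", ["id"]),
    ("operator", ["+"]),
    ("term", ["real"]),
    ("operator", ["-"]),
    ("term", ["integer"]),
]

form = ["assign_stmt"]
history = [" ".join(form)]
for lhs, rhs in steps:
    form = apply(form, lhs, rhs)
    history.append(" ".join(form))
```

After the loop, history holds the ten sentential forms of the derivation, ending with the string of terminals.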
Example Grammar: Simple Arithmetic Expressions
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
9 Production rules
Terminals: id + - * / ↑ ( )
Nonterminals: expr, op
Start symbol: expr
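The counts on this slide can be checked by writing the grammar down as data. A small sketch, assuming a dictionary that maps each nonterminal to its list of right-hand sides; the names GRAMMAR and START are illustrative.

```python
# Each nonterminal maps to a list of right-hand sides (lists of symbols).
GRAMMAR = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"], ["-", "expr"], ["id"]],
    "op":   [["+"], ["-"], ["*"], ["/"], ["↑"]],
}
START = "expr"

# Nonterminals are exactly the symbols that appear on a left-hand side.
nonterminals = set(GRAMMAR)
# Terminals are the remaining symbols used on right-hand sides.
terminals = {sym for alts in GRAMMAR.values() for rhs in alts for sym in rhs} - nonterminals
production_count = sum(len(alts) for alts in GRAMMAR.values())
```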
Notational Conventions
• Terminals
– Lower-case letters early in the alphabet: a, b, c
– Operator symbols: +, -
– Punctuation symbols: parentheses, comma
– Boldface strings: id or if
• Nonterminals:
– Upper-case letters early in the alphabet: A, B, C
– The letter S (start symbol)
– Lower-case italic names: expr or stmt
• Upper-case letters late in the alphabet, such as X, Y, Z,
represent either nonterminals or terminals.
• Lower-case letters late in the alphabet, such as u, v, …, z,
represent strings of terminals.
Notational Conventions (continued)
• Lower-case Greek letters, such as α, β, γ, represent strings of grammar symbols. Thus A → α indicates that there is a single nonterminal A on the left side of the production and a string of grammar symbols α to the right of the arrow.
• If A → α1, A → α2, …, A → αk are all productions with A on the left, we may write A → α1 | α2 | … | αk
• Unless otherwise stated, the left side of the first production is the start symbol.
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
Ambiguous Grammar
• More than one parse tree for some sentence.
– The grammar for a programming language may be
ambiguous
– Need to modify it for parsing.
• Also: Grammar may be left recursive.
• Need to modify it for parsing.
Elimination of Ambiguity
• Ambiguous
• A Grammar is ambiguous if there are multiple parse
trees for the same sentence.
• Disambiguation
• Express Preference for one parse tree over others
– Add disambiguating rule into the grammar
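Ambiguity can be made concrete by counting parse trees. The sketch below counts trees for a token string under a reduced form of the expression grammar, E → E A E | id with A → + | - | * | /; leaving out the ( E ) and -E alternatives and the ↑ operator is a simplification made for this illustration, and the function name is hypothetical.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}

def count_parse_trees(tokens):
    """Count parse trees for `tokens` under  E -> E A E | id,  A -> + | - | * | /."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                     # number of trees deriving toks[i:j] from E
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):    # try toks[k] as the top-level operator
            if toks[k] in OPS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))

trees = count_parse_trees("id + id * id".split())
```

A count above 1 for any sentence shows that the grammar is ambiguous; here the two trees correspond to grouping the addition first or the multiplication first.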
Resolving Problems: Ambiguous Grammars
Consider the following grammar segment:
stmt → if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
if E1 then S1 else if E2 then S2 else S3
Simple parse tree:
stmt
  if
  expr: E1
  then
  stmt: S1
  else
  stmt
    if
    expr: E2
    then
    stmt: S2
    else
    stmt: S3
Example: What Happens with this String?
if E1 then if E2 then S1 else S2
How is this parsed?
if E1 then
    if E2 then
        S1
    else
        S2
vs.
if E1 then
    if E2 then
        S1
else
    S2
Parse Trees: if E1 then if E2 then S1 else S2
Form 1:
stmt
  if
  expr: E1
  then
  stmt
    if
    expr: E2
    then
    stmt: S1
  else
  stmt: S2
Form 2:
stmt
  if
  expr: E1
  then
  stmt
    if
    expr: E2
    then
    stmt: S1
    else
    stmt: S2
Removing Ambiguity
Take Original Grammar:
stmt → if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Revise to remove ambiguity:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
unmatched_stmt → if expr then stmt
| if expr then matched_stmt else unmatched_stmt
Rule: Match each else with the closest previous unmatched then.
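The matching rule can also be seen in a hand-written parser. The sketch below does not use the matched/unmatched grammar itself; it is a small recursive-descent parser that applies the same rule by taking an else as soon as one is available. Treating conditions and non-if statements as single tokens, and the function name parse_stmt, are assumptions for this illustration.

```python
def parse_stmt(toks, pos=0):
    """Parse one statement starting at toks[pos]; return (tree, next position).

    Conditions and non-if statements are single tokens here (a simplification).
    """
    if pos >= len(toks):
        raise SyntaxError("unexpected end of input")
    if toks[pos] != "if":
        return toks[pos], pos + 1                     # `other`
    if pos + 2 >= len(toks) or toks[pos + 2] != "then":
        raise SyntaxError(f"expected 'then' after condition at token {pos + 1}")
    cond = toks[pos + 1]
    then_branch, pos = parse_stmt(toks, pos + 3)
    # Take an `else` immediately if present: the innermost call sees it first,
    # so it binds to the closest previous unmatched `then`.
    if pos < len(toks) and toks[pos] == "else":
        else_branch, pos = parse_stmt(toks, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos

tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
```

The resulting tree attaches `else S2` to the inner if, which is the reading the revised grammar enforces.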