View
213
Download
0
Category
Preview:
Citation preview
Bottom-Up Syntax Analysis
Mooly Sagiv
html://www.math.tau.ac.il/~msagiv/courses/wcc02.htmlTextbook:Modern Compiler Implementation in C
Chapter 3
Efficient Parsers • Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars
bison
context free grammar
tokens parser
“Ambiguity errors”
parse tree
Kinds of Parsers
• Top-Down (Predictive Parsing) LL– Construct parse tree in a top-down matter– Find the leftmost derivation– For every non-terminal and token predict the next
production
• Bottom-Up LR– Construct parse tree in a bottom-up manner– Find the rightmost derivation in a reverse order– For every potential right hand side and token decide when a
production is found
Bottom-Up Syntax Analysis
• Input– A context free grammar– A stream of tokens
• Output– A syntax tree or error
• Method– Construct parse tree in a bottom-up manner– Find the rightmost derivation in (reversed order)– For every potential right hand side and token decide when a
production is found– Report an error as soon as the input is not a prefix of valid
program
Pushdown Automaton
control
parser-table
input
stack
$
$u t w
V
Bottom-Up Parser Actions
• reduce A – Pop | | symbols and | | states from the stack– Apply the associated action– Push a symbol goto[top, A] on the stack
• shift X – Push X onto the stack – Advance the input
• accept– Parsing is complete
• error – Report an error
A Parser Table for S a S b|
state description a b $ S
0 initial s1 r S 4
1 parsed a s1 r S 2
2 parsed S s3
3 parsed a S b r S a S b r S a S b
4 final acc
state a b $ S
0 s1 r S 4
1 s1 r S 2
2 s3
3 r S a S b r S a S b
4 acc
stack input action
0($) aabb$ s1
0($) 1(a) abb$ s1
0($) 1(a)1(a) bb$ r S
0($) 1(a)1(a)2(S) bb$ s3
0($) 1(a)1(a)2(S)3(b) b$ r S a S b
0($) 1(a)2(S) b$ s3
0 ($)1(a)2(S)3(b) $ r S a S b
0($) 4(S) $ acc
state a b $ S
0 s1 r S 4
1 s1 r S 2
2 s3
3 r S a S b r S a S b
4 acc
stack input action
0($) abb$ s1
0($) 1(a) bb$ r S
0($) 1(a)2(S) bb$ s3
0($) 1(a)2(S)3(b) b$ r S a S b
0($) 4(S) b$ err
A Parser Table for S AB | A
A aB a
state description a $ A B S
0 initial s1 2 5
1 parsed a from A r A a r A a
2 parsed A s3 r S A 4
3 parsed a from B r B a
4 parsed AB r S AB
5 final acc
stack input action
0($) aa$ s1
0 ($)1(a) a$ r A a
0 ($)2(A) a$ s3
0 ($)2(A)3(a) $ r B a
0 ($)2(A)4(B) $ r S AB
0($)5(S) $ acc
state a $ A B S
0 s1 2 5
1 r A a r A a
2 s3 r S A 4
3 r B a
4 r S AB
5 acc
A Parser Table for E E + E| id
state description id + $ E
0 initial s1 2
1 parsed id r E id r E id
2 parsed E s3 acc
3 parsed E+ s1 4
4 parsed E+E s3
r E E + Er E E+E
The Challenge
• How to construct a parser-table from a given grammar• LR(1) grammars
– Left to right scanning– Rightmost derivations (reverse)– 1 token
• Different solutions– Operator precedence– SLR(1)
• Simple LR(1)– CLR(1)
• Canonic LR(1)– LALR(1)
• Look Ahead LR(1)• Yacc, Bison, CUP
Grammar HierarchyNon-ambiguous CFG
CLR(1)
LALR(1)
SLR(1)
LL(1)
Constructing an SLR parsing table
• Add a production S’ S$• Construct a finite automaton accepting “valid
stack symbols”– The states of the automaton becomes the states of
parsing-table
– Determine shift operations
– Determine goto operations
• Construct reduce entries by analyzing the grammar
A finite Automaton for S’ S$
S a S b|
state description a b $ S
0 initial s1 4
1 parsed a s1 2
2 parsed S s3
3 parsed a S b
4 final
0 1 2 3
4
a bS
S
a
Constructing a Finite Automaton
• NFA– For X X1 X2 … Xn
• [X X1 X2 …XiXi+1 … Xn]– “prefixes of rhs (handles)”– X1 X2 … Xi is at the top of the stack and we expect
Xi+1 … Xn
• The initial state [S’ .S$] ([X X1…XiXi+1 … Xn], Xi+1 ) = [X X1 …XiXi+1 … Xn]
– For every production Xi+1 ([[X X1 X2 …XiXi+1 … Xn], ) = [Xi+1 ]
• Convert into DFA
NFA S’ S$
S a S b|
[S’ .S$]
[S’ S.$]
[S .aSb]
[S .]
[S a.Sb]
[S aSb.]
[S aS.b]S
Sb
a
[S’ .S$]
[S’ S.$]
[S .aSb]
[S .]
[S a.Sb]
[S aSb.]
[S aS.b]S
Sb
a
[S’ .S$][S .aSb]
[S .]
[S a.Sb][S .aSb]
[S .]
a
[S’ S.$]
S
[S aS.b]S
[S aSb.]
b
DFA
a
state items a b $ S
0 [S .S$, S .aSb, S .] s1 4
1 [S a.Sb$, S .aSb, S .] s1 2
2 [S aS.b] s3
3 [S aSb.]
4 [S’ S.$] acc
[S’ .S$][S .aSb]
[S .]
[S a.Sb][S .aSb]
[S .]
a
[S’ S.$]
S
[S aS.b]S
[S aSb.]
b
a
Filling reduce entries
• For an item [A .] we need to know the tokens that can follow A in a derivation from S’
• Follow(A) = {t | S’ * At}
• See the textbook for an algorithm for constructing Follow from a given grammar
state items a b $ S
0 [S .S$, S .aSb, S .] s1 4
1 [S a.Sb$, S .aSb, S .] s1 2
2 [S aS.b] s3
3 [S aSb.]
4 [S’ S.$] acc
[S’ .S$][S .aSb]
[S .]
[S a.Sb][S .aSb]
[S .]
a
[S’ S.$]
S
[S aS.b]S
[S aSb.]
b
a
Follow(S) = {b, $}
r S r S
r S r S
r S a S b
Example E E + E| E* E | id
[S’ .E$][E .E + E][E .E *E]
[E .id]
id
[E id.]
E [S’ E.$][E E. + E][E E. *E]
[E E +. E][E .E +E][E .E*E][E .id]
+
[E E *. E][E .E +E][E .E*E][E .id]
*
id
id
E E[E E + E.][E E .+E][E E .*E]
[E E *E.][E E .+E][E E .*E]
EE
1 2
3
4
5
67
+
*
*
+
Interesting Non SLR(1) Grammar
S’ S$ S L = R | R
L *R | id
R L
[S’ .S$][S .L=R]
[S .R][L .*R][L .id][R L]
[S L.=R][R L.]L
=
Partial DFA
[S L=.R][R .L][L .*R][L .id]
Follow(R)= {$, =}
LR(1) Parser
• Item [A ., t] is at the top of the stack and we are expecting
t
• LR(1) State– Sets of items
• LALR(1) State– Merge items with the same look-ahead
Interesting Non LR(1) Grammars
• Ambiguous
– Arithmetic expressions
– Dangling-else
• Common derived prefix
– A B1 a b | B2 a c
– B1
– B2
• Optional non-terminals– St OptLab Ass
– OptLab id : | – Ass id := Exp
A motivating example• Create a desk calculator• Challenges
– Non trivial syntax
– Recursive expressions (semantics)• Operator precedence
Solution (lexical analysis)/* desk.l */
%%[0-9]+ { yylval = atoi(yytext);
return NUM ;}“+” { return PLUS ;}“-” { return MINUS ;}“/” { return DIV ;}“*” { return MUL ;} “(“ { return LPAR ;}“)” { return RPAR ;}“//”.*[\n] ; /* comment */[\t\n ] ; /* whitespace */. { error (“illegal symbol”, yytext[0]); }
Solution (syntax analysis)/* desk.y */
%token NUM%left PLUS, MINUS%left MUL, DIV%token LPAR, RPAR%start P%%P : E {printf(“%d\n”, $1) ;} ;E : NUM { $$ = $1 ;} | LPAR e RPAR { $$ = $2; } | e PLUS e { $$ = $1 + $3 ; } | e MINUS e { $$ = $1 - $3 ; } | e MUL e { $$ = $1 * $3; } | e DIV e {$$ = $1 / $3; } ;%%#include “lex.yy.c”
flex desk.l
bison desk.y
cc y.tab.c –ll -ly
Summary• LR is a powerful technique• Generates efficient parsers• Generation tools exit
– Bison, yacc, CUP
• But some grammars need to be tuned– Shift/Reduce conflicts– Reduce/Reduce conflicts– Efficiency of the generated parser
Recommended