View
222
Download
5
Category
Preview:
Citation preview
Syntax Analysis (Part 2)
Martin Sulzmann
Martin Sulzmann Syntax Analysis (Part 2) 1 / 42
Bottom-Up Parsing
Idea
Build right-most derivation.
Scan input and seek for matching right hand sides.
Terminology
If S →∗ β then β is a sentential form of G.
Right sentential form = right most reduction of non-terminal symbols.
Handle = Right hand side α of a production A → α where α is partof a right sentential form S ⇒∗ β.
Martin Sulzmann Syntax Analysis (Part 2) 2 / 42
Example
ConsiderS → aABe (1)A → Abc (2) | b (3)B → d (4)
The sentence abbcde can be reduced to S:
Rule Sentential form
3 abbcde
2 aAbcde
4 aAde
1 aABe
S
S ⇒ aABe
⇒ aAde
⇒ aAbcde
⇒ abbcde
right-most derivation
Marked forms are the handles.
Martin Sulzmann Syntax Analysis (Part 2) 3 / 42
Another Example
ConsiderE → E + E | E ∗ E | (E ) | id
Right-most derivation:E ⇒ E + E ⇒ E + E ∗ E ⇒ E + E ∗ id3 ⇒ E + id2 ∗ id3 ⇒ id1 + id2 ∗ id3Consider how the input string is “reduced”:
right-sentential form Handle reducing production
id1 + id2 ∗ id3 id1 E → id
E + id2 ∗ id3 id2 E → id
E + E ∗ id3 id3 E → id
E + E ∗ E E ∗ E E → E ∗ EE + E E + E E → E + E
E
Martin Sulzmann Syntax Analysis (Part 2) 4 / 42
Handle Pruning (=Shift/Reduce Parsing)
Goal
Scan input, recognize handles and replace (if match found) by left-handside until we reach the start symbol.
Shift-Reduce Approach
Make use of stack to keep track of “partial” handles.
Shift:
Seek for handle.“Shift” input symbols on some stack.
Reduce:
Match for handle detected.Replace right hand by left hand side.For A → β:
“pop” |β| symbols
“pus” A onto stack
Martin Sulzmann Syntax Analysis (Part 2) 5 / 42
Example
ConsiderE → E + E | E ∗ E | (E ) | id
Stack Input Action
$ (id ∗ id)$ shift$( id ∗ id)$ shift$(id ∗id)$ reduce E → id
$(E ∗id)$ shift$(E∗ id)$ shift$(E ∗ id )$ reduce E → id
$(E ∗ E )$ reduce E → E ∗ E$(E )$ shift$(E ) $ reduce E → (E )$E $ accept
E ⇒ (E ) ⇒ (E ∗ E ) ⇒ (E ∗ id) ⇒ (id ∗ id)Martin Sulzmann Syntax Analysis (Part 2) 6 / 42
LR Parsing
Approach
Instance of shift-reduce parser where we employ FSA to recognize handles.
Method
Build a right-most derivation in reverse w ⇐∗ S .
Maintain stack of viable prefixes, i.e. forms α such that S ⇒∗ αw forsome w ∈ V ∗
t .
Shift action: Move token from input w to stack.
Reduce action: Apply grammar rule (in reverse) to top section ofstack.
Martin Sulzmann Syntax Analysis (Part 2) 7 / 42
Skeleton LR Parser
Algorithm
push s0
token:= next_token()
repeat
s:= top of stack
if action[s,token] = ‘‘shift si’’ then
push si
token:= next_token()
else if action[s,token] = ‘‘reduce A->beta’’ then
pop |beta| states
s’:= top of stack
push goto[s’,A]
else if action[s,token] = ‘‘accept’’ then done
else error
Martin Sulzmann Syntax Analysis (Part 2) 8 / 42
Example
S → E
E → T + E | TT → F ∗ T | FF → id
state action goto
id + ∗ $ E T F
0 s4 1 2 31 acc
2 s5 r33 r5 s6 r54 r6 r6 r65 s4 7 2 36 s4 8 37 r28 r4 r4
Stack Input Action$ 0 id ∗ id + id$ s4$ 0 4 ∗id + id$ r6$ 0 3 ∗id + id$ s6$ 0 3 6 id + id$ s4$ 0 3 6 4 +id$ r6$ 0 3 6 3 +id$ r5$ 0 3 6 8 +id$ r4$ 0 2 +id$ s5$ 0 2 5 id$ s4$ 0 2 5 4 $ r6$ 0 2 5 3 $ r5$ 0 2 5 2 $ r3$ 0 2 5 7 $ r2$ 0 1 $ acc
Martin Sulzmann Syntax Analysis (Part 2) 9 / 42
LR(k) Grammars
Idea
A grammar G is LR(k) if given a right-most derivation
S = γ0 ⇒R γ1 ⇒R . . . ⇒R γn = w
we can, for each right-sentential form in the derivation:
1 isolate the handle of each right-sentential form, and
2 determine the production by which to reduce
by scanning γi from left to right, going at most k symbols beyond theright end of the handle of γi .
Martin Sulzmann Syntax Analysis (Part 2) 10 / 42
LR(k) Items
LR(k) Items
The table construction algorithm uses sets of LR(k) items orconfigurations to represent the possible states in a parse.
A LR(k) item is a pair [α, β] where
α is a production from G with a • at some position in the rhs,marking how much of the rhs of a production has already been seen
β is a lookahead string containing k symbols (terminals or $)
As we will see, the two cases of interest are k=0 and k=1.
Martin Sulzmann Syntax Analysis (Part 2) 11 / 42
Example
The • indicates how much of an item we have seen at a given state in theparse:
[A → •XYZ ] indicates that the parser is looking for a string that can bederived from XYZ .
[A → XY • Z ] indicates that the parser has seen a string derived from XY
and is looking for one derivable from Z .
LR(0) items (no lookahead):
A → XYZ generates 4 LR(0) items:
1. [A → •XYZ ] 2. [A → X • YZ ]3. [A → XY • Z ] 4. [A → XYZ•]
Martin Sulzmann Syntax Analysis (Part 2) 12 / 42
The CFSM
The Characteristic Finite State Machine (CFSM) for a grammar is a DFAwhich recognizes viable prefixes of right-sentential forms (Recall: A viableprefix is any prefix that does not extend beyond the handle).
It accepts when a handle has been discovered and needs to be reduced.
To construct the CFSM we need two functions:
closure0(I) to build its states
goto0(I,X) to determine its transitions
Martin Sulzmann Syntax Analysis (Part 2) 13 / 42
closure0
Given an item [A → α • Bβ], its closure contains the item itself and anyother item that can generate legal substrings to follow α.
Thus, if the parser has viable prefix α on the stack, the input shouldreduce to Bβ (or γ for other item [B → •γ] in the closure).
closure0(I):
repeat
if [A → α • Bβ] ∈ I and B → γ is a rule
then add [B → •γ] to I
until no more items can be added to I
return I
Martin Sulzmann Syntax Analysis (Part 2) 14 / 42
goto0
goto0(I ,X ) is the closure of the set of all items [A → αX • β] such that[A → α • Xβ] ∈ I .
If I is the set of valid items fro some viable prefix γ, then goto0(I ,X ) isthe set of valid items for the viable prefix γX .
goto0(I ,X ) represents the state after recognizing X in state I .
goto0(I,X):
let J be the set of items [A → αX • β]such that [A → α • Xβ] ∈ I
return closure0(J)
Martin Sulzmann Syntax Analysis (Part 2) 15 / 42
Example
ConsiderP → CP | ǫC → agBe | aeB → acB | a
Assume I = {[C → ag • Be]}. Then (I omit [ and ] for simplicity)
closure0(I ) =
C → ag • Be,B → •acB,B → •a
Assume I = {[P → •], [P → •CP]}. Then
goto0(I ,C ) =
P → C • P ,P → •CP ,P → •,C → •agBe,C → •ae
Martin Sulzmann Syntax Analysis (Part 2) 16 / 42
Constructing the LR(0) Item Sets
We introduce a new start symbol S ′ where S ′ → S (some presentationsassume S ′ → S$.)
States = {closure0({S ′ → •S})}while we can keep making changes
for each I ∈ States and symbol X
I’ = goto0(I,X)
if I 6∈ States then States = States ∪ {I ′}
Martin Sulzmann Syntax Analysis (Part 2) 17 / 42
LR(0) Example
Grammar:S → E (1)E → E + T (2) | T (3)T → id (4) | (E ) (5)
CFSM:
GFED@ABC?>=<89:;8
//GFED@ABC0
E
��
T
11
(
""id //GFED@ABC?>=<89:;4 GFED@ABC5
E
��
T
OO
id
oo (
ss
GFED@ABC?>=<89:;1+ //GFED@ABC2
id
OO
T
��
(
??��������� GFED@ABC6
+oo
)
��GFED@ABC?>=<89:;3 GFED@ABC?>=<89:;7
LR(0) items:I0 : S → •E I4 : T → id•
E → •E + T I5 : T → (•E )E → •T E → •E + T
T → •id E → •TT → •(E ) T → •id
I1 : S → E• T → •(E )E → E • +T I6 : T → (E•)
I2 : E → E + •T E → E • +T
T → •id I7 : T → (E )•T → •(E ) I8 : E → T•
I3 : E → E + T•
Martin Sulzmann Syntax Analysis (Part 2) 18 / 42
Constructing the Parsing Table
We have constructed LR(0) items and CFSM (state i of the CFSM isconstructed from Ii ).
If A → α • aβ ∈ Ii and goto0(Ii ,a)=Ij then action(i ,a)=shift j .
If A → α• ∈ I then action(I ,a) = reduce(A → α) for all a.
If S ′ → S• ∈ I then action(I ,$) = accept.
If goto0(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .
Set undefined entries in action and goto to “error”.
Set initial state of parser to closure0([S ′ → •S ]).
Martin Sulzmann Syntax Analysis (Part 2) 19 / 42
LR(0) Example
GFED@ABC?>=<89:;8
//GFED@ABC0
E
��
T
11
(
""id //GFED@ABC?>=<89:;4 GFED@ABC5
E
��
T
OO
id
oo (
ss
GFED@ABC?>=<89:;1+ //GFED@ABC2
id
OO
T
��
(
??��������� GFED@ABC6
+oo
)
��GFED@ABC?>=<89:;3 GFED@ABC?>=<89:;7
state action goto
id ( ) + $ S E T
0 s4 s5 1 81 acc
2 s4 s5 33 r2 r2 r2 r2 r24 r4 r4 r4 r4 r45 s4 s5 6 86 s7 s27 r5 r5 r5 r5 r58 r3 r3 r3 r3 r3
Martin Sulzmann Syntax Analysis (Part 2) 20 / 42
Another Example
ConsiderS → A | Bc | DA → a | bB → a
D → bc
Is this grammar LR(0)?
Martin Sulzmann Syntax Analysis (Part 2) 21 / 42
Another Example (2)
Grammar:S → A | Bc | DA → a | bB → a
D → bc
LR(0) items:I0 : S ′ → •S I3 : S → B • c
S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a•A → •a B → a•A → •b I7 : A → b•B → •a D → b • cD → •bc . . .
I1 : S ′ → S•I2 : S → A•
Martin Sulzmann Syntax Analysis (Part 2) 22 / 42
Another Example (3)
Grammar:S → A | Bc | DA → a | bB → a
D → bc
LR(0) items:I0 : S ′ → •S I3 : S → B • c
S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a• reduceA → •a B → a• reduce conflict!A → •b I7 : A → b•B → •a D → b • cD → •bc . . .
I1 : S ′ → S•I2 : S → A•
Martin Sulzmann Syntax Analysis (Part 2) 23 / 42
Another Example (4)
Grammar:S → A | Bc | DA → a | bB → a
D → bc
LR(0) items:I0 : S ′ → •S I3 : S → B • c
S → •A I4 : S → Bc•S → •Bc I5 : S → D•S → •D I6 : A → a• reduceA → •a B → a• reduce conflict!A → •b I7 : A → b• reduceB → •a D → b • c shift conflict!D → •bc . . .
I1 : S ′ → S•I2 : S → A•
Above grammar is not LR(0).
Martin Sulzmann Syntax Analysis (Part 2) 24 / 42
LR Parsing
Short Summary
If LR(0) parsing tables contains multiply-defined action entries then G isnot LR(0).
There are two possible types of conflicts:
shift-reduce
reduce-reduce
Conflicts can be resolved by looking ahead.
Martin Sulzmann Syntax Analysis (Part 2) 25 / 42
SLR(1): Simple Lookahead LR
SLR(1)
Add lookaheads after building LR(0) item sets. We construct the SLR(1)parsing table as follows:
If A → α • aβ ∈ I (a ∈ Vt) and goto0(I ,a)=Ii thenaction(I ,a)=shift i .
If A → α• ∈ I then action(I ,a) = reduce(A → α) for alla ∈ Follow(A).
If S ′ → S• ∈ I then action(I ,$) = accept.
If goto0(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .
Set undefined entries in action and goto to “error”.
Set initial state of parser to closure0([S ′ → •S ]).
Martin Sulzmann Syntax Analysis (Part 2) 26 / 42
SLR(1) but not LR(0) Example
S → A (1) | Bc (2) | bc (3) Follow(S) = {$}A → a (4) | b (5) Follow(A) = {$}B → a (6) Follow(B) = {c}
I0 : S ′ → •S I2 S → A•S → •A I3 S → B • cS → •Bc I4 S → Bc•S → •bc I5 A → a•A → •a B → a•A → •b I6 A → b•B → •a S → b • c
I1 : S ′ → S• I7 S → Bc•
GFED@ABC?>=<89:;1 GFED@ABC?>=<89:;2
// GFED@ABC?>=<89:;6
c
��
GFED@ABC0b
oo
S
OOA
@@���������
B //
a
��
GFED@ABC3
c
��GFED@ABC?>=<89:;7 GFED@ABC?>=<89:;5 GFED@ABC?>=<89:;4
Martin Sulzmann Syntax Analysis (Part 2) 27 / 42
SLR(1) Parsing Table
S → A (1) | Bc (2) | bc (3) Follow(S) = {$}A → a (4) | b (5) Follow(A) = {$}B → a (6) Follow(B) = {c}
state action goto
a b c $ S A B
0 s5 s6 1 2 31 acc
2 r13 s44 r25 r6 r46 s77 r2
Martin Sulzmann Syntax Analysis (Part 2) 28 / 42
Limitations of SLR(1)
ConsiderS ′ → S
S → L = R | RL → ∗R | idR → L
LR(0) items:I0 : S ′ → •S I1 : S ′ → S•
S → •L = R I2 : S → L• = R shift!S → •R R → L• reduce! (=∈ Follow(R))L → • ∗ RL → •idR → •id
Martin Sulzmann Syntax Analysis (Part 2) 29 / 42
Limitations of SLR(1) (cont’d)
Examples shows that it can be insufficient to consider Follow sets.
Assume we are parsing id = id .
In item 2, the shift action is the only correct choice (to reduce L to R getsus nowhere).
Note that item R → L• came via S ′ ⇒ S ⇒ R ⇒ L (each symbol can onlybe “followed” by $).
Hence, we need to make an effort to remember the actual right context.
Martin Sulzmann Syntax Analysis (Part 2) 30 / 42
LR(1) Items Revisited
Idea
We can incorporate the extra right-context information into items.
An LR(1) item is of the form [A → α • β, t] where t ∈ Vt ∪ {$}.
An item [A → α•, a] calls for the reduction A → α only if the next inputsymbol is a.
Martin Sulzmann Syntax Analysis (Part 2) 31 / 42
LR(1) Items Revisited (cont’d)
closure1 with Context
closure1(I):
while new items can be added
if [A → α • Bγ, c] ∈ I and B → δ is a rule
then for each b ∈ First(γc)add [B → •δ, b] to I
goto1 with Context
goto1(I,X):
let J be the set of items [A → αX • β, a]such that [A → α • Xβ, a] ∈ I
return closure0(J)
Initial State
[S ′ → •S , $]Martin Sulzmann Syntax Analysis (Part 2) 32 / 42
LR(1) Parsing Table
Algorithm
For each set Ii of LR(1) items:
If [A → α • aβ] ∈ Ii and goto1(Ii ,a)=Ij then action(i ,a)=shift j .
If [A → α•, b] ∈ Ii (A 6= S ′) then action(i ,b)= reduce(A → α).
If [S ′ → S•, $] ∈ Ii then action(i ,$)= accept.
If goto1(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .
All remaining entries are set to “error”.
Initial state contains [S ′ → •S , $].
Martin Sulzmann Syntax Analysis (Part 2) 33 / 42
Recall Example
ConsiderS ′ → S
S → L = R | RL → ∗R | idR → L
LR(1) items:I0 : [S ′ → •S , $] I1 : [S ′ → S•, $]
[S → •L = R , $] I2 : [S → L• = R , $] no =[S → •R , $] [R → L•, $]
[L → • ∗ R ,=] = due to First(= R)
[L → •id ,=][R → •L, $]
Martin Sulzmann Syntax Analysis (Part 2) 34 / 42
Another Example
Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)
state action goto
c d $ S C
0 s3 s4 1 21 acc
2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2
LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d
S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $
I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $
C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $
I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d
Martin Sulzmann Syntax Analysis (Part 2) 35 / 42
Another Example (2)
Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)
state action goto
c d $ S C
0 s3 s4 1 21 acc
2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2
LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d
S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $
I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $
C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $
I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d
I3 and I6 have the same “core”!
Martin Sulzmann Syntax Analysis (Part 2) 36 / 42
LALR: Lookahead LR
Idea
Define the core of a set of LR(1) items to be the set of LR(0) itemsderived by ignoring the lookahead symbols.
E.g. the two sets
{[C → c • C , c | d ], [C → •cC , c | d ], [C → •d , c | d ]}
{[C → c • C , $], [C → •cC , $], [C → •d , $]}
have the same core.
Key idea: If two sets of LR(1) items, Ii and Ij , have the same core, we canmerge the states that represent them in the action and goto tables.
Martin Sulzmann Syntax Analysis (Part 2) 37 / 42
LALR(1) algorithm
Algorithm
Construct LR(1) items.
Find all cores among sets of LR(1) items, replace these sets by theirunion, update goto function (incrementally).
For each IiIf [A → α • aβ, b] ∈ Ii and goto1(Ii ,a)=Ij then action(i ,a)=shift j .If [A → α•, b] ∈ Ii (A 6= S ′) then action(i ,b)= reduce(A → α).If [S ′ → S•, $] ∈ Ii then action(i ,$)= accept.
If goto1(Ii ,A) = Ij (where A ∈ Vn) then goto(i ,A)=j .
All remaining entries are set to “error”.Initial state = closure1({[S ′ → •S , $]}).More efficient LALR construction possible, see textbook.
Martin Sulzmann Syntax Analysis (Part 2) 38 / 42
Recall Example
Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)
state action goto
c d $ S C
0 s3 s4 1 21 acc
2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2
LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d
S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $
I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $
C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $
I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d
Martin Sulzmann Syntax Analysis (Part 2) 39 / 42
Recall Example (2)
Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)
state action goto
c d $ S C
0 s3 s4 1 21 acc
2 s6 s7 53 s3 s4 84 r3 r35 r16 s6 s7 97 r38 r2 r29 r2
LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d
S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $
I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $
C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $
I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d
Merged states:I36 : C → c • C , c | d | $
C → •cC , c | d | $C → •d, c | d | $
I47 : C → d•, c | d | $I89 : C → cC•, c | d | $
Martin Sulzmann Syntax Analysis (Part 2) 40 / 42
Recall Example (3)
Grammar:S′ → S (0)S → CC (1)C → cC (2) | d (3)
Updated table:state action goto
c d $ S C
0 s36 s47 1 21 acc
2 s36 s47 536 s36 s47 8947 r3 r3 r35 r189 r2 r2 r2
LR(1) items:I0 : S′ → •S, $ I4 : C → d•, c | d
S → •CC , $ I5 : S → CC•, $C → •cC , c | d I6 : C → c • C , $C → •d, c | d C → •cC , $
I1 : S′ → S•, $ C → •d, $I2 : S → C • C , $ I7 : C → d•, $
C → •cC , $ I8 : C → cC•, c | dC → •d, $ I9 : C → cC•, $
I3 : C → c • C , c | dC → •cC , c | dC → •d, c | d
Merged states:I36 : C → c • C , c | d | $
C → •cC , c | d | $C → •d, c | d | $
I47 : C → d•, c | d | $I89 : C → cC•, c | d | $
Martin Sulzmann Syntax Analysis (Part 2) 41 / 42
Summary
LR more powerful than LL.
ANTLR is a “pretty cool” Java-based LL parser.
LALR good enough in practice (see yacc).
Precedence can also be used to resolve shift/reduce conflicts.E.g. higher precedence ⇒ shift, left associative ⇒ reduce.
Error recovery important (but no time!), please consult text book.
Martin Sulzmann Syntax Analysis (Part 2) 42 / 42
Recommended