Syntax Analysis (Part 2) - HS-KARLSRUHEsuma0002/CS4212/syntax2.pdf · The Characteristic Finite...

Syntax Analysis (Part 2)

Martin Sulzmann


Bottom-Up Parsing

Build a right-most derivation in reverse.

Scan the input and look for matching right-hand sides.

Terminology

If S ⇒∗ β then β is a sentential form of G.

Right sentential form = sentential form that occurs in a right-most derivation.

Handle = right-hand side α of a production A → α, where α is part of a right sentential form β (S ⇒∗ β) and reducing α to A yields the previous form of the right-most derivation.

Example

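Consider

S → aABe          (1)
A → Abc (2) | b   (3)
B → d             (4)

The sentence abbcde can be reduced to S:

Rule   Sentential form   Handle
3      abbcde            b (the first b)
2      aAbcde            Abc
4      aAde              d
1      aABe              aABe
       S

Read bottom-up, these steps form a right-most derivation:

S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

In each sentential form, the handle is the substring that is replaced by the next reduction step.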

Another Example

Consider

E → E + E | E ∗ E | (E) | id

Right-most derivation:

E ⇒ E + E ⇒ E + E ∗ E ⇒ E + E ∗ id3 ⇒ E + id2 ∗ id3 ⇒ id1 + id2 ∗ id3

Consider how the input string is “reduced”:

Right-sentential form   Handle   Reducing production
id1 + id2 ∗ id3         id1      E → id
E + id2 ∗ id3           id2      E → id
E + E ∗ id3             id3      E → id
E + E ∗ E               E ∗ E    E → E ∗ E
E + E                   E + E    E → E + E

Handle Pruning (=Shift/Reduce Parsing)

Scan the input, recognize handles and replace each one by the left-hand side of its production until we reach the start symbol.

Shift-Reduce Approach

Use a stack to keep track of “partial” handles.

Shift:

Still looking for a handle. “Shift” the next input symbol onto the stack.

Reduce:

A handle has been detected on top of the stack. Replace the right-hand side by the left-hand side. For A → β:

“pop” |β| symbols

“push” A onto the stack

Example

Consider

E → E + E | E ∗ E | (E) | id

Stack        Input          Action
$            (id ∗ id)$     shift
$(           id ∗ id)$      shift
$(id         ∗ id)$         reduce E → id
$(E          ∗ id)$         shift
$(E ∗        id)$           shift
$(E ∗ id     )$             reduce E → id
$(E ∗ E      )$             reduce E → E ∗ E
$(E          )$             shift
$(E)         $              reduce E → (E)
$E           $              accept

E ⇒ (E) ⇒ (E ∗ E) ⇒ (E ∗ id) ⇒ (id ∗ id)

LR Parsing

Approach

An instance of a shift-reduce parser in which a finite-state automaton (FSA) recognizes handles.

Method

Build a right-most derivation in reverse: w ⇐∗ S.

Maintain a stack whose contents form a viable prefix, i.e. a form α such that S ⇒∗ αw for some w ∈ V∗.

Shift action: Move a token from the input w to the stack.

Reduce action: Apply a grammar rule (in reverse) to the top section of the stack.

Skeleton LR Parser

Algorithm

push s0
token := next_token()
repeat
    s := top of stack
    if action[s,token] = "shift si" then
        push si
        token := next_token()
    else if action[s,token] = "reduce A->beta" then
        pop |beta| states
        s' := top of stack
        push goto[s',A]
    else if action[s,token] = "accept" then done
    else error

Example

S → E                 (1)
E → T + E (2) | T     (3)
T → F ∗ T (4) | F     (5)
F → id                (6)

state   action                  goto
        id    +     ∗     $     E    T    F
0       s4                      1    2    3
1                         acc
2             s5          r3
3             r5    s6    r5
4             r6    r6    r6
5       s4                      7    2    3
6       s4                           8    3
7                         r2
8             r4          r4

Stack          Input              Action
$ 0            id ∗ id + id$      s4
$ 0 4          ∗ id + id$         r6
$ 0 3          ∗ id + id$         s6
$ 0 3 6        id + id$           s4
$ 0 3 6 4      + id$              r6
$ 0 3 6 3      + id$              r5
$ 0 3 6 8      + id$              r4
$ 0 2          + id$              s5
$ 0 2 5        id$                s4
$ 0 2 5 4      $                  r6
$ 0 2 5 3      $                  r5
$ 0 2 5 2      $                  r3
$ 0 2 5 7      $                  r2
$ 0 1          $                  acc
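The skeleton algorithm and the example table can be combined into a small table-driven parser. The following is a minimal Python sketch, not a reference implementation: the names PRODUCTIONS, ACTION, GOTO and lr_parse are chosen here for illustration, and the rule numbering (1) to (6) is the one implied by the r-entries of the table.

```python
# Productions, numbered as in the example: number -> (lhs, length of rhs).
PRODUCTIONS = {
    1: ("S", 1),  # S -> E
    2: ("E", 3),  # E -> T + E
    3: ("E", 1),  # E -> T
    4: ("T", 3),  # T -> F * T
    5: ("T", 1),  # T -> F
    6: ("F", 1),  # F -> id
}

# ACTION[state][token] is ("s", state), ("r", production) or ("acc",).
ACTION = {
    0: {"id": ("s", 4)},
    1: {"$": ("acc",)},
    2: {"+": ("s", 5), "$": ("r", 3)},
    3: {"+": ("r", 5), "*": ("s", 6), "$": ("r", 5)},
    4: {"+": ("r", 6), "*": ("r", 6), "$": ("r", 6)},
    5: {"id": ("s", 4)},
    6: {"id": ("s", 4)},
    7: {"$": ("r", 2)},
    8: {"+": ("r", 4), "$": ("r", 4)},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    5: {"E": 7, "T": 2, "F": 3},
    6: {"T": 8, "F": 3},
}


def lr_parse(tokens):
    """Run the skeleton LR loop and return the productions used, in order."""
    stack = [0]                       # stack of states, s0 at the bottom
    tokens = list(tokens) + ["$"]     # append the end marker
    pos = 0
    reductions = []
    while True:
        state, token = stack[-1], tokens[pos]
        entry = ACTION[state].get(token)
        if entry is None:             # undefined table entry means "error"
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        if entry[0] == "s":           # shift: push the state, advance input
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "r":         # reduce by A -> beta
            lhs, rhs_len = PRODUCTIONS[entry[1]]
            del stack[-rhs_len:]      # pop |beta| states (|beta| >= 1 here)
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(entry[1])
        else:                         # accept
            return reductions


print(lr_parse(["id", "*", "id", "+", "id"]))  # [6, 6, 5, 4, 6, 5, 3, 2]
```

The printed list is the sequence of r-actions from the trace above. Read backwards, it is the right-most derivation of the input.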

LR(k) Grammars

A grammar G is LR(k) if given a right-most derivation

S = γ0 ⇒R γ1 ⇒R . . . ⇒R γn = w

we can, for each right-sentential form in the derivation:

1 isolate the handle of each right-sentential form, and

2 determine the production by which to reduce

by scanning γi from left to right, going at most k symbols beyond the right end of the handle of γi.

LR(k) Items

The table construction algorithm uses sets of LR(k) items (or configurations) to represent the possible states in a parse.

An LR(k) item is a pair [α, β] where

α is a production from G with a • at some position in the rhs,marking how much of the rhs of a production has already been seen

β is a lookahead string containing k symbols (terminals or $)

As we will see, the two cases of interest are k=0 and k=1.

Example

The • indicates how much of an item we have seen at a given state in the parse:

[A → •XYZ] indicates that the parser is looking for a string that can be derived from XYZ.

[A → XY • Z] indicates that the parser has seen a string derived from XY and is looking for one derivable from Z.

LR(0) items (no lookahead):

A → XYZ generates 4 LR(0) items:

1. [A → •XYZ]
2. [A → X • YZ]
3. [A → XY • Z]
4. [A → XYZ•]
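As an illustration, the LR(0) items of a production can be enumerated by placing the dot at every position of the right-hand side. This is a small Python sketch; the triple representation (lhs, rhs, dot) and the helper names lr0_items and show are assumptions made here, not notation from the slides.

```python
def lr0_items(lhs, rhs):
    """Return all LR(0) items of lhs -> rhs as (lhs, rhs, dot) triples."""
    rhs = tuple(rhs)
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item in the bracket notation used above."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["•"] + list(rhs[dot:])
    return f"[{lhs} → {' '.join(symbols)}]"


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(show(item))
```

A production with n symbols on its right-hand side yields n + 1 items, so an ε-production yields exactly one item.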

The CFSM

The Characteristic Finite State Machine (CFSM) for a grammar is a DFA which recognizes viable prefixes of right-sentential forms (Recall: A viable prefix is any prefix that does not extend beyond the handle).

It accepts when a handle has been discovered and needs to be reduced.

To construct the CFSM we need two functions:

closure0(I) to build its states

goto0(I,X) to determine its transitions

closure0

Given an item [A → α • Bβ], its closure contains the item itself and any other item that can generate legal substrings to follow α.

Thus, if the parser has viable prefix α on the stack, the input should reduce to Bβ (or γ for some other item [B → •γ] in the closure).

closure0(I):
    repeat
        if [A → α • Bβ] ∈ I and B → γ is a rule
        then add [B → •γ] to I
    until no more items can be added to I
    return I

goto0(I,X) is the closure of the set of all items [A → αX • β] such that [A → α • Xβ] ∈ I.

If I is the set of valid items for some viable prefix γ, then goto0(I,X) is the set of valid items for the viable prefix γX.

goto0(I ,X ) represents the state after recognizing X in state I .

goto0(I,X):
    let J be the set of items [A → αX • β]
        such that [A → α • Xβ] ∈ I
    return closure0(J)

Example

Consider

P → CP | ε
C → agBe | ae
B → acB | a

Assume I = {[C → ag • Be]}. Then (I omit [ and ] for simplicity)

closure0(I) = { C → ag • Be, B → •acB, B → •a }

Assume I = {[P → •], [P → •CP]}. Then

goto0(I,C) = { P → C • P, P → •CP, P → •, C → •agBe, C → •ae }
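The two functions can be sketched in Python and checked against this example. This is only a sketch under assumptions made here: items are (lhs, rhs, dot) triples, the grammar is a dictionary named GRAMMAR, and ε is the empty tuple.

```python
# The example grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "P": [("C", "P"), ()],                      # P -> C P | epsilon
    "C": [("a", "g", "B", "e"), ("a", "e")],    # C -> a g B e | a e
    "B": [("a", "c", "B"), ("a",)],             # B -> a c B | a
}


def closure0(items):
    """Add [B -> . gamma] for every nonterminal B directly after a dot."""
    items = set(items)
    work = list(items)                # items whose dot has not been examined
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto0(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure0(moved)


print(sorted(closure0({("C", ("a", "g", "B", "e"), 2)})))
print(sorted(goto0({("P", (), 0), ("P", ("C", "P"), 0)}, "C")))
```

The worklist replaces the "repeat ... until no more items can be added" loop of the pseudocode: each item is examined once, which gives the same fixed point.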

Constructing the LR(0) Item Sets

We introduce a new start symbol S′ with S′ → S (some presentations assume S′ → S$).

States = {closure0({S′ → •S})}
while we can keep making changes
    for each I ∈ States and symbol X
        I′ = goto0(I,X)
        if I′ is non-empty and I′ ∉ States then States = States ∪ {I′}

LR(0) Example

Grammar:

S → E                 (1)
E → E + T (2) | T     (3)
T → id (4) | (E)      (5)

[Figure: CFSM for this grammar with states 0 to 8. States 1, 3, 4, 7 and 8 are drawn double-circled.]

LR(0) items:

I0: S → •E
    E → •E + T
    E → •T
    T → •id
    T → •(E)
I1: S → E•
    E → E • +T
I2: E → E + •T
    T → •id
    T → •(E)
I3: E → E + T•
I4: T → id•
I5: T → (•E)
    E → •E + T
    E → •T
    T → •id
    T → •(E)
I6: T → (E•)
    E → E • +T
I7: T → (E)•
I8: E → T•
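The item sets of this example can be reproduced mechanically. The following Python sketch applies the worklist construction to the grammar; the function and variable names are illustrative assumptions, items are (lhs, rhs, dot) triples, and the states come out in discovery order, so their numbering need not match I0 to I8.

```python
GRAMMAR = {
    "S": [("E",)],                       # S -> E (plays the role of S' -> S)
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}


def closure0(items):
    """Add [B -> . gamma] for every nonterminal B directly after a dot."""
    items, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto0(items, symbol):
    """Move the dot over `symbol`, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure0(moved)


def lr0_states(start_item):
    """Collect all item sets reachable from the closure of the start item."""
    states = [closure0({start_item})]
    for state in states:                 # the list grows while we iterate
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):   # only symbols that follow a dot
            target = goto0(state, symbol)
            if target not in states:
                states.append(target)
    return states


states = lr0_states(("S", ("E",), 0))
print(len(states))  # 9
```

Only symbols that actually follow a dot are tried, so goto0 never returns the empty set here.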

Constructing the Parsing Table

We have constructed the LR(0) items and the CFSM (state i of the CFSM is constructed from Ii).

If A → α • aβ ∈ Ii and goto0(Ii,a) = Ij then action(i,a) = shift j.

If A → α• ∈ Ii (A ≠ S′) then action(i,a) = reduce(A → α) for all a.

If S′ → S• ∈ Ii then action(i,$) = accept.

If goto0(Ii,A) = Ij (where A ∈ Vn) then goto(i,A) = j.

Set undefined entries in action and goto to “error”.

Set the initial state of the parser to closure0({[S′ → •S]}).

LR(0) Example

[Figure: CFSM for this grammar with states 0 to 8. States 1, 3, 4, 7 and 8 are drawn double-circled.]

state   action                        goto
        id    (     )     +     $     S    E    T
0       s4    s5                           1    8
1                         s2    acc
2       s4    s5                                3
3       r2    r2    r2    r2    r2
4       r4    r4    r4    r4    r4
5       s4    s5                           6    8
6                   s7    s2
7       r5    r5    r5    r5    r5
8       r3    r3    r3    r3    r3

Another Example

Consider

S → A | Bc | D
A → a | b
B → a
D → bc

Is this grammar LR(0)?

Another Example (2)

Grammar:

S → A | Bc | D
A → a | b
B → a
D → bc

LR(0) items:

I0: S′ → •S
    S → •A
    S → •Bc
    S → •D
    A → •a
    A → •b
    B → •a
    D → •bc
I1: S′ → S•
I2: S → A•
I3: S → B • c
I4: S → Bc•
I5: S → D•
I6: A → a•
    B → a•
I7: A → b•
    D → b • c
. . .

Another Example (3)

LR(0) items as before. State I6 contains two complete items:

I6: A → a•       reduce
    B → a•       reduce     ⇒ reduce/reduce conflict!

Another Example (4)

LR(0) items as before. Two states are in conflict:

I6: A → a•       reduce
    B → a•       reduce     ⇒ reduce/reduce conflict!
I7: A → b•       reduce
    D → b • c    shift      ⇒ shift/reduce conflict!

The above grammar is not LR(0).
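The conflict check can also be done mechanically: build the LR(0) item sets and look for states that contain two complete items, or a complete item together with an item whose dot precedes a terminal. The sketch below is one possible Python formulation; all names and the (lhs, rhs, dot) item representation are assumptions made for illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A",), ("B", "c"), ("D",)],
    "A": [("a",), ("b",)],
    "B": [("a",)],
    "D": [("b", "c")],
}


def closure0(items):
    """Add [B -> . gamma] for every nonterminal B directly after a dot."""
    items, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto0(items, symbol):
    """Move the dot over `symbol`, then take the closure."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure0(moved)


def lr0_states():
    """Collect all item sets reachable from closure0({S' -> . S})."""
    states = [closure0({("S'", ("S",), 0)})]
    for state in states:                 # the list grows while we iterate
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto0(state, symbol)
            if target not in states:
                states.append(target)
    return states


def conflicts(states):
    """Return (kind, state) pairs for every LR(0) conflict."""
    found = []
    for state in states:
        reduces = [it for it in state if it[2] == len(it[1])]
        shifts = [it for it in state
                  if it[2] < len(it[1]) and it[1][it[2]] not in GRAMMAR]
        if len(reduces) > 1:
            found.append(("reduce-reduce", state))
        if reduces and shifts:
            found.append(("shift-reduce", state))
    return found


for kind, state in conflicts(lr0_states()):
    print(kind, sorted(state))
```

For this grammar the check reports exactly the two states discussed above: the one containing A → a• and B → a•, and the one containing A → b• and D → b • c.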

LR Parsing

Short Summary

If the LR(0) parsing table contains multiply-defined action entries, then G is not LR(0).

There are two possible types of conflicts:

shift-reduce

reduce-reduce

Conflicts can be resolved by looking ahead.

SLR(1): Simple Lookahead LR

SLR(1)

Add lookaheads after building the LR(0) item sets. We construct the SLR(1) parsing table as follows:

If A → α • aβ ∈ I (a ∈ Vt) and goto0(I,a) = Ii then action(I,a) = shift i.

If A → α• ∈ I then action(I,a) = reduce(A → α) for all a ∈ Follow(A).

If S ′ → S• ∈ I then action(I ,$) = accept.

Set undefined entries in action and goto to “error”.

Set initial state of parser to closure0([S ′ → •S ]).

SLR(1) but not LR(0) Example

S → A (1) | Bc (2) | bc (3)     Follow(S) = {$}
A → a (4) | b (5)               Follow(A) = {$}
B → a (6)                       Follow(B) = {c}

I0: S′ → •S
    S → •A
    S → •Bc
    S → •bc
    A → •a
    A → •b
    B → •a
I1: S′ → S•
I2: S → A•
I3: S → B • c
I4: S → Bc•
I5: A → a•
    B → a•
I6: A → b•
    S → b • c
I7: S → bc•

[Figure: CFSM for this grammar with states 0 to 7. States 1, 2, 4, 5, 6 and 7 are drawn double-circled.]

SLR(1) Parsing Table

S → A (1) | Bc (2) | bc (3)     Follow(S) = {$}
A → a (4) | b (5)               Follow(A) = {$}
B → a (6)                       Follow(B) = {c}

state   action                  goto
        a     b     c     $     S    A    B
0       s5    s6                1    2    3
1                         acc
2                         r1
3                   s4
4                         r2
5                   r6    r4
6                   s7    r5
7                         r3
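The Follow sets that drive the reduce entries can be computed by a fixed-point iteration. The sketch below is a simplified Python version that assumes a grammar without ε-productions (true for this example, not in general); the function names are chosen here for illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A",), ("B", "c"), ("b", "c")],
    "A": [("a",), ("b",)],
    "B": [("a",)],
}


def first_sets(grammar):
    """First sets of all nonterminals (assumes no epsilon-productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """Follow sets of all nonterminals (assumes no epsilon-productions)."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")               # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue         # Follow is defined for nonterminals
                    if i + 1 < len(rhs): # something follows sym in this rule
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:                # sym is last: inherit Follow(lhs)
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "S'")
print(follow["A"], follow["B"])          # {'$'} {'c'}
```

Since Follow(A) and Follow(B) are disjoint, state I5 reduces by A → a on $ and by B → a on c, and the reduce/reduce conflict of the LR(0) table disappears.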

Limitations of SLR(1)

Consider

S′ → S
S → L = R | R
L → ∗R | id
R → L

LR(0) items:

I0: S′ → •S
    S → •L = R
    S → •R
    L → •∗R
    L → •id
    R → •L
I1: S′ → S•
I2: S → L• = R     shift!
    R → L•         reduce! (= ∈ Follow(R))

Limitations of SLR(1) (cont’d)

The example shows that it can be insufficient to consider Follow sets.

Assume we are parsing id = id.

In state I2, the shift action is the only correct choice (reducing L to R gets us nowhere).

Note that the item R → L• came via S′ ⇒ S ⇒ R ⇒ L (each of these symbols can only be “followed” by $).

Hence, we need to make an effort to remember the actual right context.

LR(1) Items Revisited

We can incorporate the extra right-context information into items.

An LR(1) item is of the form [A → α • β, t] where t ∈ Vt ∪ {$}.

An item [A → α•, a] calls for the reduction A → α only if the next input symbol is a.

LR(1) Items Revisited (cont’d)

closure1 with Context

closure1(I):
    while new items can be added
        if [A → α • Bγ, c] ∈ I and B → δ is a rule
        then for each b ∈ First(γc)
            add [B → •δ, b] to I
    return I

goto1 with Context

goto1(I,X):
    let J be the set of items [A → αX • β, a]
        such that [A → α • Xβ, a] ∈ I
    return closure1(J)

Initial State

closure1({[S′ → •S, $]})

LR(1) Parsing Table

Algorithm

For each set Ii of LR(1) items:

If [A → α • aβ, b] ∈ Ii and goto1(Ii,a) = Ij then action(i,a) = shift j.

If [A → α•, b] ∈ Ii (A ≠ S′) then action(i,b) = reduce(A → α).

If [S′ → S•, $] ∈ Ii then action(i,$) = accept.

All remaining entries are set to “error”.

The initial state contains [S′ → •S, $].

Recall Example

Consider

S′ → S
S → L = R | R
L → ∗R | id
R → L

LR(1) items:

I0: [S′ → •S, $]
    [S → •L = R, $]
    [S → •R, $]
    [L → •∗R, = | $]     = due to First(= R), $ due to [R → •L, $]
    [L → •id, = | $]
    [R → •L, $]
I1: [S′ → S•, $]
I2: [S → L• = R, $]
    [R → L•, $]          no =, so there is no reduce on =
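The LR(1) item sets of this example can be checked with a short Python sketch of closure1 and goto1. It assumes a grammar without ε-productions, so that First(γc) is simply the First set of the first symbol of γc; items are (lhs, rhs, dot, lookahead) tuples, and all names are chosen here for illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first(symbol, seen=frozenset()):
    """First set of a single symbol (assumes no epsilon-productions)."""
    if symbol not in GRAMMAR:
        return {symbol}                  # a terminal or the end marker $
    if symbol in seen:
        return set()                     # already on the current path
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0], seen | {symbol})
    return result


def closure1(items):
    """LR(1) closure: new items get their lookaheads from First(gamma c)."""
    items, work = set(items), list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:] + (la,)         # gamma followed by c
            for b in first(rest[0]):
                for delta in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], delta, 0, b)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)


def goto1(items, symbol):
    """Move the dot over `symbol` (lookahead unchanged), then close."""
    moved = {(lhs, rhs, dot + 1, la) for lhs, rhs, dot, la in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure1(moved)


I0 = closure1({("S'", ("S",), 0, "$")})
I2 = goto1(I0, "L")
reduce_lookaheads = {la for _, rhs, dot, la in I2 if dot == len(rhs)}
print(sorted(reduce_lookaheads))         # ['$']
```

The only complete item in I2 carries the lookahead $, so the LR(1) table reduces R → L on $ and shifts on =, which is exactly the distinction that the Follow sets could not make.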

Another Example

Grammar:

S′ → S              (0)
S → CC              (1)
C → cC (2) | d      (3)

state   action            goto
        c     d     $     S    C
0       s3    s4          1    2
1                   acc
2       s6    s7               5
3       s3    s4               8
4       r3    r3
5                   r1
6       s6    s7               9
7                   r3
8       r2    r2
9                   r2

LR(1) items:

I0: S′ → •S, $
    S → •CC, $
    C → •cC, c | d
    C → •d, c | d
I1: S′ → S•, $
I2: S → C • C, $
    C → •cC, $
    C → •d, $
I3: C → c • C, c | d
    C → •cC, c | d
    C → •d, c | d
I4: C → d•, c | d
I5: S → CC•, $
I6: C → c • C, $
    C → •cC, $
    C → •d, $
I7: C → d•, $
I8: C → cC•, c | d
I9: C → cC•, $

Another Example (2)


I3 and I6 have the same “core”!

LALR: Lookahead LR

Define the core of a set of LR(1) items to be the set of LR(0) items derived by ignoring the lookahead symbols.

E.g. the two sets

{[C → c • C , c | d ], [C → •cC , c | d ], [C → •d , c | d ]}

{[C → c • C , $], [C → •cC , $], [C → •d , $]}

have the same core.

Key idea: If two sets of LR(1) items, Ii and Ij, have the same core, we can merge the states that represent them in the action and goto tables.

LALR(1) algorithm

Algorithm

Construct the sets of LR(1) items.

For each core that occurs among the sets of LR(1) items, replace all sets with that core by their union, and update the goto function (incrementally).

For each Ii:

If [A → α • aβ, b] ∈ Ii and goto1(Ii,a) = Ij then action(i,a) = shift j.

If [A → α•, b] ∈ Ii (A ≠ S′) then action(i,b) = reduce(A → α).

If [S′ → S•, $] ∈ Ii then action(i,$) = accept.

All remaining entries are set to “error”.

Initial state = closure1({[S′ → •S, $]}).

A more efficient LALR construction is possible, see textbook.

Recall Example

state   action            goto
        c     d     $     S    C
0       s3    s4          1    2
1                   acc
2       s6    s7               5
3       s3    s4               8
4       r3    r3
5                   r1
6       s6    s7               9
7                   r3
8       r2    r2
9                   r2

Recall Example (2)


Merged states:

I36: C → c • C, c | d | $
     C → •cC, c | d | $
     C → •d, c | d | $
I47: C → d•, c | d | $
I89: C → cC•, c | d | $

Recall Example (3)

Updated table:

state   action              goto
        c      d      $     S    C
0       s36    s47          1    2
1                     acc
2       s36    s47               5
36      s36    s47               89
47      r3     r3     r3
5                     r1
89      r2     r2     r2

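The merging step can be sketched in Python: build the LR(1) item sets for the example grammar, group them by core, and take the union of each group. This is the simple "construct LR(1), then merge" approach, not the more efficient construction. It assumes a grammar without ε-productions, and all names are illustrative.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("c", "C"), ("d",)],
}


def first(symbol, seen=frozenset()):
    """First set of a single symbol (assumes no epsilon-productions)."""
    if symbol not in GRAMMAR:
        return {symbol}
    if symbol in seen:
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0], seen | {symbol})
    return result


def closure1(items):
    """LR(1) closure over (lhs, rhs, dot, lookahead) items."""
    items, work = set(items), list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:] + (la,)         # gamma followed by c
            for b in first(rest[0]):
                for delta in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], delta, 0, b)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)


def goto1(items, symbol):
    """Move the dot over `symbol` (lookahead unchanged), then close."""
    moved = {(lhs, rhs, dot + 1, la) for lhs, rhs, dot, la in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure1(moved)


def lr1_states():
    """Collect all LR(1) item sets reachable from the initial state."""
    states = [closure1({("S'", ("S",), 0, "$")})]
    for state in states:                 # the list grows while we iterate
        symbols = {rhs[dot] for _, rhs, dot, _ in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto1(state, symbol)
            if target not in states:
                states.append(target)
    return states


def core(state):
    """The LR(0) items of a state, i.e. the items without lookaheads."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """Map each core to the union of all LR(1) states sharing it."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return merged


states = lr1_states()
merged = merge_by_core(states)
print(len(states), len(merged))          # 10 7
```

The ten LR(1) states collapse into seven LALR(1) states, matching the merged states I36, I47 and I89 above.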

Summary

LR more powerful than LL.

ANTLR is a “pretty cool” Java-based LL parser.

LALR good enough in practice (see yacc).

Precedence can also be used to resolve shift/reduce conflicts. E.g. higher precedence ⇒ shift, left associative ⇒ reduce.

Error recovery is important (but no time!), please consult the textbook.
