88
Theory of Compilation 236360 Erez Petrank Lecture 3: Syntax Analysis: Bottom-up Parsing 1

Theory of Compilation 236360 Erez Petrank

  • Upload
    hera

  • View
    29

  • Download
    2

Embed Size (px)

DESCRIPTION

Theory of Compilation 236360 Erez Petrank. Lecture 3 : Syntax Analysis: Bottom-up Parsing. Source text. txt. Executable code. exe. You are here. Compiler. Lexical Analysis. Syntax Analysis Parsing. Semantic Analysis. Inter. Rep. (IR). Code Gen. Parsers: from tokens to AST. - PowerPoint PPT Presentation

Citation preview

Page 1: Theory of Compilation 236360 Erez Petrank

1

Theory of Compilation 236360

Erez Petrank

Lecture 3: Syntax Analysis:

Bottom-up Parsing

Page 2: Theory of Compilation 236360 Erez Petrank

2

You are here

Executable

code

exe

Source

text

txt

Compiler

LexicalAnalysi

s

Syntax Analysi

s

Parsing

Semantic

Analysis

Inter.Rep.

(IR)

Code

Gen.

Page 3: Theory of Compilation 236360 Erez Petrank

3

Parsers: from tokens to AST

LexicalAnalysi

s

Syntax Analysi

s

Sem.Analysi

s

Inter.Rep.

Code Gen.

<ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

‘b’ ‘4’

‘b’‘a’

‘c’

ID

ID

ID

ID

ID

factor

term factorMULT

term

expression

expression

factor

term factorMULT

term

expression

term

MULT factor

MINUS

SyntaxTree

Page 4: Theory of Compilation 236360 Erez Petrank

4

Parsing

• A context free language can be recognized by a non-deterministic pushdown automaton

• Parsing can be seen as a search problem– Can you find a derivation from the start symbol to the

input word?– Easy (but very expensive) to solve with backtracking

• We want efficient parsers– Linear in input size– Deterministic pushdown automata– We will sacrifice generality for efficiency

Page 5: Theory of Compilation 236360 Erez Petrank

5

Reminder: Efficient Parsers

• Top-down (predictive parsing)

Bottom-up (shift reduce)

to be read…already read…

Page 6: Theory of Compilation 236360 Erez Petrank

6

LR(k) Grammars

• A grammar is in the class LR(K) when it can be derived via:– Bottom-up derivation– Scanning the input from left to right (L)– Producing the rightmost derivation (R)– With lookahead of k tokens (k)

• A language is said to be LR(k) if it has an LR(k) grammar

• The simplest case is LR(0), which we discuss next.

Page 7: Theory of Compilation 236360 Erez Petrank

7

LR is More Powerful, But…

• Any LL(k) language is also in LR(k) (and not vice versa), i.e., LL(k) ⊂ LR(k).

• But the lookahead is counted differently in the two cases. • In an LL(k) derivation the algorithm sees k tokens of the

right-hand side of the rule and then must select the derivation rule.

• With LR(k), the algorithm sees all right-hand side of the derivation rule plus k more tokens. – LR(0) sees the entire right-side.

• LR is more popular today. • We will see Yacc, a parser generator for LR parsing, in

the exercises and homework.

Page 8: Theory of Compilation 236360 Erez Petrank

An Example: a Simple Grammar

E → E * B | E + B | BB → 0 | 1

• Let us number the rules:

(1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1

Page 9: Theory of Compilation 236360 Erez Petrank

Goal: Reduce the Input to the Initial Variable

Example: 0 + 0 * 1B + 0 * 1E + 0 * 1E + B * 1E * 1E * BE

E → E * B | E + B | BB → 0 | 1

E

BE *

B 1

0

We’d like to go over the input so far, and upon seeing a right-hand side of a rule, “invoke” the rule and replace the right-hand side with the left-hand side (which is a single varuable).

B

0

E +

Page 10: Theory of Compilation 236360 Erez Petrank

Use Shift & Reduce

In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules. Example: “0+0*1”.

E → E * B | E + B | BB → 0 | 1

E

BE *

B 1

0

Stack Input action0+0*1$

shift0 +0*1$reduceB +0*1$reduceE +0*1$shiftE+ 0*1$shiftE+0 *1$reduceE+B *1$reduceE *1$shiftE* 1$shiftE*1 $ reduceE*B $ reduceE $ accept

B

0

E +

Page 11: Theory of Compilation 236360 Erez Petrank

11

How does the parser know what to do?

• A state will keep the info gathered so far. • A table will tell it “what to do” based on current

state and next token. • Some info will be kept in a stack.

Page 12: Theory of Compilation 236360 Erez Petrank

LR(0) Parsing

Stack

Parser

Input

Output

Action Table

Goto table

Page 13: Theory of Compilation 236360 Erez Petrank

States and LR(0) Items

• The state will “remember” the potential derivation rules given the part that was already identified.

• For example, if we have already identified E then the state will remember the two alternatives:

(1) E → E * B, (2) E → E + B• Actually, we will also remember where we are in each of

them: (1) E → E ● * B, (2) E → E ● + B• A derivation rule with a location marker is called LR(0) item. • The state is actually a set of LR(0) items. E.g.,

q13 = { E → E ● * B , E → E ● + B}

E → E * B | E + B | BB → 0 | 1

Page 14: Theory of Compilation 236360 Erez Petrank

Intuition• We gather the input token by token until we find a right-

hand side of a rule and then we replace it with the variable on the left side.

• Go over a token and remembering it in the stack is a shift. • Each shift moves to a state that remembers what we’ve

seen so far. • A reduce replaces a string in the stack with the variable

that derives it.

Page 15: Theory of Compilation 236360 Erez Petrank

Why do we need a stack?

• Suppose so far we have discovered E → B → 0 and gather information on “E +”.

• In the given grammar this can only mean E → E + ● B

• Suppose state q6 represents this possibility.

• Now, the next token is 0, and we need to ignore q6 for a minute, and work on B → 0 to obtain E+B.

• Therefore, we push q6 to the stack, and after identifying B, we pop it to continue.

E

BE +

B 0

0

0 (q4,E) (q6,+) (q1,0) top

Page 16: Theory of Compilation 236360 Erez Petrank

The Stack• The stack contains states.• For readability we will also include variables and tokens.

(The algorithm does not need it.)

• Except for the q0 in the bottom of the stack, the rest of the stack will contains pairs of (state, token) or (state, variable).

• The initial stack contains the initial variable only.

• At each stage we need to decide if to shift the next token to the stack (and change to the appropriate state) or apply a “reduce”.

• The action table tells us what to do based on current state and next token: s n : shift and move to qn. r m: reduce according to production rule m. (also accept and error).

Page 17: Theory of Compilation 236360 Erez Petrank

The Goto Table• The second table serves reduces. • After reducing a right-hand side to the deriving variable,

we need to decide what the next state is. • This is determined by the previous state (which is on the

stack) and the variable we got. • Suppose we reduce according to M → β. • We remove β from the stack, and look at the state q that

is currently at the top. GOTO(q,M) specifies the next state.

Page 18: Theory of Compilation 236360 Erez Petrank

For example…(1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1 טבלת

gotoטבלת הפעולות

B E $ 1 0 + *

4 3 s2 s1 q0

r4 r4 r4 r4 r4 q1

r5 r5 r5 r5 r5 q2

acc s6 s5 q3

r3 r3 r3 r3 r3 q4

7 s2 s1 q5

8 s2 s1 q6

r1 r1 r1 r1 r1 q7

r2 r2 r2 r2 r2 q8

Page 19: Theory of Compilation 236360 Erez Petrank

An Execution Example: 0 + 0 * 1

q0

gotoטבלת טבלת הפעולות

B E $ 1 0 + *

4 3 s2 s1 q0

r4 r4 r4 r4 r4 q1

r5 r5 r5 r5 r5 q2

acc s6 s5 q3

r3 r3 r3 r3 r3 q4

7 s2 s1 q5

8 s2 s1 q6

r1 r1 r1 r1 r1 q7

r2 r2 r2 r2 r2 q8q0

q1

0

(1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1

Page 20: Theory of Compilation 236360 Erez Petrank

An Execution Example: 0 + 0 * 1

q0 q0

q1

0

q0

q4

B

q0

q3

E

q0

q3

E

+

q6

● ● ●

(1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1

Page 21: Theory of Compilation 236360 Erez Petrank

The Algorithm Formally• Initiate the stack to q0.

• Repeat until halting: – Consider action [t,q] for q at the head of the

stack and t the next token. – “Shift n”: remove t from the input and push t

and then qn to the stack.

– “Reduce m”, where Rule m is: M → β: • Remove |β| pairs from the stack, let q be the state

at the top of the stack. • Push M and the the state Goto[q,M] to the stack.

– “accept”: halt successfully. – Empty cell: halt with an error.

Page 22: Theory of Compilation 236360 Erez Petrank

22

LR(0) Item

E ➞ 𝛂β

Already matched To be matched

Input

So far we’ve matched α, expecting to see β

Page 23: Theory of Compilation 236360 Erez Petrank

23

Using LR Items to Build the Tables

• Typically a state consists of several LR items. • For example, if we identified a string that is

reduced to E, then we may be in one of the following LR items: E → E ● + B or E → E ● * B

• Therefore one of the states will be:q = {E → E ● + B , E → E ● * B}.

• But if the current state includes E → E + ● B, then we must be able to let B be derived: Closure!

Page 24: Theory of Compilation 236360 Erez Petrank

Adding the Closure• If the current state has the item E → E + ● B, then it

should be possible to add derivation of B, e.g., to 0 or 1. • Generally, if a state includes an item in which the dot

stands before a non-terminal, then we add all the rules for this non-terminal with a dot in the beginning.

• Definition: the closure set for a set of items is the set of items in which for every item in the set of the form A → ● B 𝛂 β, the set contains an item B → ● δ, for each rule of the form B → δ in the grammar.

• Building the closure set is iterative, as δ may also begin with a variable.

Page 25: Theory of Compilation 236360 Erez Petrank

Closure: an Example

• For example, consider the grammar:and the set:

C = { E → E + ● B }• C’s closure is:

clos(C) = { E → E + ● B , B → ● 0 , B → ● 1 }

• And this will be the state that will be used for the parser.

E → E * B | E + B | BB → 0 | 1

Page 26: Theory of Compilation 236360 Erez Petrank

Extended Grammar• Goal: easy termination. • We will assume that the initial variable only appears in a

single rule. This guarantee that the last reduction can be (easily) determined.

• A grammar can be (easily) extended to have such structure.

Example: the grammar (1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1

Can be extended into :(0) S → E(1) E → E * B(2) E → E + B(3) E → B(4) B → 0(5) B → 1

Page 27: Theory of Compilation 236360 Erez Petrank

The Initial State• To build the action/goto table, we go through all possible

states possible during derivation. • Each state represents a (closure) set of LR(0) items.

• The initial state q0 is the closure of the initial rule.

• In our example the initial rule is S → ● E, and therefore the initial state is

q0 = clos({S → ● E }) =

{S → ● E , E → ● E * B , E → ● E + B , E → ● B , B → ● 0 , B → ● 1}

We go on and build all possible continuations by seeing a single token (or variable).

Page 28: Theory of Compilation 236360 Erez Petrank

The Next States• For each possible terminal or variable x, and each

possible state (closure set) q, 1. Find all items in the set of q in which the dot is before

an x. We denote this set by q|x

2. Move the dot ahead of the x in all items in q|x. 3. Find the closure of the obtained set: this is the state

into which we move from q upon seeing x.

• Formally, the next set from a set S and the next symbol X– step(S,X) = { N➞αXβ | N➞αXβ in the item set S}– nextSet(S,X) = -closure(step(S,X))

Page 29: Theory of Compilation 236360 Erez Petrank

The Next States• Recall that in our example:

q0 = clos({S → ● E }) =

{S → ● E , E → ● E * B , E → ● E + B ,E → ● B , B → ● 0 , B → ● 1}

Let us check which states are reachable from it.

Page 30: Theory of Compilation 236360 Erez Petrank

States reachable from q0 in the example:

State q1:x = 0q0|0= { B → ● 0 }

q1 = clos(q0|0)

= { B → 0● }

State q2:x = 1q0|1= { B → ● 1 }

q2 = clos({ B → 1 ● })

= { B → 1 }∙

State q3:x = Eq0|E= {S → E , E → E * B , E → E + B}∙ ∙ ∙

q3 = clos ({S → E , E → E * B , E → E + B})∙ ∙ ∙

={S → E , E → E *B , E → E + B}∙ ∙ ∙

State q4:x = Bq0|B = { E → B }∙

q4 = clos ({ E → B })∙

= {E →B }∙

Page 31: Theory of Compilation 236360 Erez Petrank

From these new states there are more reachable states

• From q1, q2, q4, there are no continuations because the dot is in the end of any item in their sets.

• From state q3 we can reach the following two states:

5מצב q:x = *q3|* = { E → E * B }∙

q5 = clos ({ E → E * B })∙

= { E → E * B , B → 0 , B → 1 }∙ ∙ ∙

6מצב q:q6 = { E → E + B , B → 0 , B → 1 }∙ ∙ ∙

Page 32: Theory of Compilation 236360 Erez Petrank

Finally,

• From q5 we can proceed with x=0, or x=1, or x=B. • But for x=0 we reach q1 again and for x=1 we reach q2. • For x=B we get q7:

• Similarly, from q6 with x=B we get q8:

• These two states have no continuations. (Why?)

State q7:clos(q7) = { E → E * B }∙

State q8: clos(q8) = { E → E + B }∙

Page 33: Theory of Compilation 236360 Erez Petrank

Building the Tables

• A row for each state.

• If qj was obtained when x comes after qi, then in row qi and column x, we write j.

gotoטבלת טבלת הפעולות מצב

B E $ 1 0 + *

4 3 2 1 q0

q1

q2

6 5 q3

q4

7 2 1 q5

8 2 1 q6

q7

q8

Page 34: Theory of Compilation 236360 Erez Petrank

Building the tables: accept

• Add “acc” in column $ for each state that has S → E ● as an item.

gotoטבלת טבלת הפעולות מצב

B E $ 1 0 + *

4 3 2 1 q0

q1

q2

acc 6 5 q3

q4

7 2 1 q5

8 2 1 q6

q7

q8

Page 35: Theory of Compilation 236360 Erez Petrank

Building the Tables: Shift

• Any number n in the action table becomes s n.

gotoטבלת טבלת הפעולות מצב

B E $ 1 0 + *

4 3 s2 s1 q0

q1

q2

acc s6 s5 q3

q4

7 s2 s1 q5

8 s2 s1 q6

q7

q8

Page 36: Theory of Compilation 236360 Erez Petrank

Building the Tables: Reduce

• For any state whose set includes the item A → α ●, such that A → α is production rule m (m>o) :

• Fill all columns of that state in the action table with “r m”.

gotoטבלת טבלת הפעולות מצב

B E $ 1 0 + *

4 3 s2 s1 q0

r4 r4 r4 r4 r4 q1

r5 r5 r5 r5 r5 q2

acc s6 s5 q3

r3 r3 r3 r3 r3 q4

7 s2 s1 q5

8 s2 s1 q6

r1 r1 r1 r1 r1 q7

r2 r2 r2 r2 r2 q8

Page 37: Theory of Compilation 236360 Erez Petrank

Note LR(0)

• When a reduce is possible, we execute it without checking the next token.

gotoטבלת טבלת הפעולות מצב

B E $ 1 0 + *

4 3 s2 s1 q0

r4 r4 r4 r4 r4 q1

r5 r5 r5 r5 r5 q2

acc s6 s5 q3

r3 r3 r3 r3 r3 q4

7 s2 s1 q5

8 s2 s1 q6

r1 r1 r1 r1 r1 q7

r2 r2 r2 r2 r2 q8

Page 38: Theory of Compilation 236360 Erez Petrank

38

Another Example

Z ➞ expr EOF

expr ➞ term | expr + term

term ➞ ID | ( expr )

Z ➞ E $

E ➞ T | E + T

T ➞ i | ( E )

(just shorthand of the grammar on the top)

Page 39: Theory of Compilation 236360 Erez Petrank

39

Example: Parsing with LR Items

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

} Closure

Input: i + i $

Page 40: Theory of Compilation 236360 Erez Petrank

40

Input: i + i $

T ➞ i Reduce item!

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Shift

Page 41: Theory of Compilation 236360 Erez Petrank

41

Input: i + i $

i

E ➞ T Reduce item!

TZ ➞ E $E ➞ T | E + TT ➞ i | ( E )

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Page 42: Theory of Compilation 236360 Erez Petrank

42

Input: i + i $

T

E ➞ T Reduce item!

i

E

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 43: Theory of Compilation 236360 Erez Petrank

43

Input: i + i $

T

i

E +

E ➞ E+ T

Z ➞ E$

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 44: Theory of Compilation 236360 Erez Petrank

44

Input: i + i $

E ➞ E+ T

Z ➞ E$E ➞ E+T

T ➞ i

T ➞ ( E )

E + i

T

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 45: Theory of Compilation 236360 Erez Petrank

45

Input: i + i $

E ➞ E+ T

Z ➞ E$E ➞ E+T

T ➞ i

T ➞ ( E )

E + i $

T

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

T ➞ i

Reduce item!

Page 46: Theory of Compilation 236360 Erez Petrank

Input: i + i $

E + T

T

i

E ➞ E+ T

Z ➞ E$E ➞ E+T

T ➞ i

T ➞ ( E )

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 47: Theory of Compilation 236360 Erez Petrank

47

Input: i + i $

E + T

T

i

E ➞ E+ T

Z ➞ E$E ➞ E+T

T ➞ i

T ➞ ( E )

i

E ➞ E+T

$

Reduce item!

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 48: Theory of Compilation 236360 Erez Petrank

48

Input: i + i $

E $

E

T

i

+ T

Z ➞ E$

E ➞ E+ T

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 49: Theory of Compilation 236360 Erez Petrank

49

Input: i + i $

E $

E

T

i

+ T

Z ➞ E$

E ➞ E+ T

Z ➞ E$

Reduce item!

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 50: Theory of Compilation 236360 Erez Petrank

50

Input: i + i $

Z

E

T

i

+ T

Z ➞ E$

E ➞ E+ T

Z ➞ E$

Reducing the initial rulemeans accept

E $

i

E ➞ T

E ➞ E + T

T ➞ i

T ➞ ( E )

Z ➞ E $

Z ➞ E $E ➞ T | E + TT ➞ i | ( E )

Page 51: Theory of Compilation 236360 Erez Petrank

51

View as an LR(0) Automaton

Z ➞ E$E ➞ T

E ➞ E + T

T ➞ iT ➞ (E)

T ➞ (E)E ➞ T

E ➞ E + T

T ➞ iT ➞ (E)

E ➞ E + TT ➞ (E) Z ➞ E$

Z ➞ E$E ➞ E+

T

E ➞ E+TT ➞ i

T ➞ (E)

T ➞ i

T ➞ (E)E ➞ E+T

E ➞ Tq0

q1

q2

q3

q4

q5

q6 q7

q8

q9

T

(

i

E

+

$T

)

+

E

i

T

(i

(

Page 52: Theory of Compilation 236360 Erez Petrank

52

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z ➞ E $(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

rn = reduce using rule number nsm = shift to state m

Page 53: Theory of Compilation 236360 Erez Petrank

53

GOTO/ACTION Tablest i + ( ) $ E Tq0 s5 s7 s1 s6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 s4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 s8 s6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

Stack Input Actionq0 i + i $ s5q0 i q5 + i $ r4q0 T q6 + i $ r2q0 E q1 + i $ s3q0 E q1 + q3 i $ s5q0 E q1 + q3 i q5 $ r4q0 E q1 + q3 T q4

$ r3

q0 E q1 $ s2q0 E q1 $ q2 r1q0 Z

top is on the right

(1)Z ➞ E $(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

Page 54: Theory of Compilation 236360 Erez Petrank

54

Are we done?

• Can make a transition diagram for any grammar • Can make a GOTO table for every grammar

• But the states are not always clear on what to do• Cannot make a deterministic ACTION table for

every grammar

Page 55: Theory of Compilation 236360 Erez Petrank

55

LR(0) Conflicts

Z ➞ E $E ➞ T E ➞ E + TT ➞ i T ➞ ( E )T ➞ i[E]

Z ➞ E$E ➞ T

E ➞ E + TT ➞ i

T ➞ (E)T ➞ i[E] T ➞ i

T ➞ i[E]

q0

q5

T

(

i

E Shift/reduce conflict

Page 56: Theory of Compilation 236360 Erez Petrank

56

View in Action/Goto Table• shift/reduce conflict…

State i + ( ) $ [ E Tq0 s5 s7 1 6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 4q4 r3 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4/s10 r4 r4q6 r2 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5 r5

Page 57: Theory of Compilation 236360 Erez Petrank

57

LR(0) Conflicts

Z ➞ E $E ➞ T E ➞ E + TT ➞ i V ➞ iT ➞ (E)

Z ➞ E$E ➞ T

E ➞ E + TT ➞ iV ➞ i

T ➞ (E) T ➞ iV ➞ i

q0

q5

T

(

i

E reduce/reduce conflict

Can there be a shift/shift conflict?

Page 58: Theory of Compilation 236360 Erez Petrank

58

View in Action/Goto Table• reduce/reduce conflict…

State i + ( ) $ E Tq0 s5 s7 1 6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 4q4 r3 r3 r3 r3 r3 r3 r3q5 r4/r5 r4/r5 r4/r5 r4/r5 r4/r5 r4/r5 r4/r5q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

Page 59: Theory of Compilation 236360 Erez Petrank

59

LR(0) Conflicts

• Any grammar with an -rule cannot be LR(0)• You should always be able to tell if an (invisible)

should be reduced to A, without looking at the next token.

• Inherent shift/reduce conflict– A➞ - reduce item– P ➞αAβ – shift item– A➞ can always be predicted from P ➞αAβ (and should

be predicted without looking at any token).

Page 60: Theory of Compilation 236360 Erez Petrank

60

Back to Action/Goto Table• Note that reductions ignore the input…

State i + ( ) $ E Tq0 s5 s7 1 6q1 s3 s2q2 r1 r1 r1 r1 r1 r1 r1q3 s5 s7 4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

Page 61: Theory of Compilation 236360 Erez Petrank

61

SLR Grammars

• A handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N

• A reduce item N ➞ 𝛂 is applicable only when the look-ahead is in FOLLOW(N)

• Differs from LR(0) only on the originally “reduce” rows.

• We will not reduce sometimes and will shift (or do nothing = error) instead.

Page 62: Theory of Compilation 236360 Erez Petrank

62

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3 r3 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

Rule 1 can only be used with the end of input “$” sign.

Page 63: Theory of Compilation 236360 Erez Petrank

63

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2 r2 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

The things that can follow E are ‘+’ ‘)’ and ‘$’.

Page 64: Theory of Compilation 236360 Erez Petrank

64

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3q5 r4 r4 r4 r4 r4 r4 r4q6 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

The things that can follow E are ‘+’ ‘)’ and ‘$’.

Page 65: Theory of Compilation 236360 Erez Petrank

65

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3q5 r4 r4 r4q6 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5 r5 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

Same for T

Page 66: Theory of Compilation 236360 Erez Petrank

66

GOTO/ACTION TableState i + ( ) $ E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3q5 r4 r4 r4q6 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )

Same for T

Page 67: Theory of Compilation 236360 Erez Petrank

67

The shift/reduce conflict

Z ➞ E $E ➞ T E ➞ E + TT ➞ i T ➞ ( E )T ➞ i[E]

Z ➞ E$E ➞ T

E ➞ E + TT ➞ i

T ➞ (E)T ➞ i[E] T ➞ i

T ➞ i[E]

q0

q5

T

(

i

E Shift/reduce conflict

Page 68: Theory of Compilation 236360 Erez Petrank

68

Now let’s add “T ➞ i[E]” State i + ( ) $ [ ] E T

q0 s5 s7 1 6q1 s3 s2q2 r1q3 s5 s7 4q4 r3 r3 r3 s 10q5 r4 r4 r4q6 r2 r2 r2q7 s5 s7 8 6q8 s3 s9q9 r5 r5 r5

(1)Z ➞ E(2)E ➞ T (3)E ➞ E +

T(4)T ➞ i (5)T ➞( E )(6)T ➞ i[E]

Page 69: Theory of Compilation 236360 Erez Petrank

69

SLR: check next token when reducing

• Simple LR(1), or SLR(1), or SLR. • Example demonstrates elimination of a

shift/reduce conflict. • Can eliminate reduce/reduce conflict when

conflicting rule’s variables satisfy FOLLOW(T) ≠ FOLLOW(V).

• But cannot solve all conflicts.

Page 70: Theory of Compilation 236360 Erez Petrank

70

Consider the following grammar

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

Page 71: Theory of Compilation 236360 Erez Petrank

71

S’ → SS → L = RS → RL → * RL → idR → L

S’ → S

S → L = RR → L

S → R

L → * RR → LL → * RL → id

L → id

S → L = RR → LL → * RL → id

L → * R

R → L

S → L = R

S

L

R

id

*

=

R

*

id

R

L*

L

id

q0

q4

q7

q1

q3

q9

q6

q8

q2

q5

Page 72: Theory of Compilation 236360 Erez Petrank

72

Shift/reduce conflict

• S → L = R vs. R → L • FOLLOW(R) contains =

– S ⇒ L = R ⇒ * R = R

• SLR cannot resolve the conflict either

S → L = RR → L

S → L = RR → LL → * RL → id

=q6

q2

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

Page 73: Theory of Compilation 236360 Erez Petrank

Resolving the Conflict• In SLR: a reduce item N ➞ α is applicable only when the

look-ahead is in FOLLOW(N).• But there is a full sentential form that we have discovered

so far. • We can ask what the next token may be given all the

derivation made so far. • For example, even looking at the follow of the entire

sentential form is more restrictive than looking at the follow of the last variable.

• In a way, FOLLOW(N) merges look-ahead for all alternatives for N

follow(σA) follow(A)

• LR(1) keeps look-ahead with each LR item

Page 74: Theory of Compilation 236360 Erez Petrank

74

LR(1) Item• LR(1) item is a pair: [ LR(0) item , Look-ahead token ]

• Meaning: We matched the part left of the dot, looking to match the part on the right of the dot, followed by the look-ahead token.

• Example: the production L ➞ id yields the following LR(1) items

[L → ● id, *][L → ● id, =][L → ● id, id][L → ● id, $][L → id ●, *][L → id ●, =][L → id ●, id][L → id ●, $]

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

Page 75: Theory of Compilation 236360 Erez Petrank

75

Creating the states for LR(1)

• We start with the initial state:

q0 will be the closure of: (S’ → ● S , $)

• Closure for LR(1): • For every [A → α ● Bβ , c] in the state:

– for every production B→δ and every token b in the grammar such that b FIRST(βc)

– Add [B → ● δ , b] to S

Page 76: Theory of Compilation 236360 Erez Petrank

Closure of (S’ → ● S , $)

• We would like to add rules that start with S, but keep track of possible lookahead.

• (S’ → ● S , $)• (S → ● L = R , $) :Rules for S• (S → ● R , $)• (L → ● * R , = )

:Rules for L• (L → ● id , = )• (R → ● L , $ ) :Rules for R• (L → ● id , $ ) :More rules

for L• (L → ● * R , $ )

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

Page 77: Theory of Compilation 236360 Erez Petrank

The State Machine 0מצב(S’ → S , $)∙(S → L ∙ = R , $)(S → R , $)∙(L → ∙ * R , = )(L → ∙ id , = )(R → L , $ )∙(L → id , $ )∙(L → * R , $ )∙

1מצב (S’ → S , $)∙

2מצב (S → L ∙ = R , $)(R → L , $)∙

3מצב (S → R , $)∙

4מצב (L → * R , =)∙(R → L , =)∙(L → ∙ * R , =)(L → ∙ id , =)(L → * R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

5מצב (L → id , $)∙(L → id , =)∙

6מצב (S → L = R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

7מצב (L → * R , =)∙(L → * R , $)∙

8מצב (R → L , =)∙(R → L , $)∙

S

L

R

id

*

=

R

*

id

L

(0) S’ → S(1) S → L = R(2) S → R(3) L → * R(4) L → id(5) R → L

Page 78: Theory of Compilation 236360 Erez Petrank

0מצב (S’ → S , $)∙(S → L ∙ = R , $)(S → R , $)∙(L → ∙ * R , = )(L → ∙ id , = )(R → L , $ )∙(L → id , $ )∙(L → * R , $ )∙

1מצב (S’ → S , $)∙

2מצב (S → L ∙ = R , $)(R → L , $)∙

3מצב (S → R , $)∙

4מצב (L → * R , =)∙(R → L , =)∙(L → ∙ * R , =)(L → ∙ id , =)(L → * R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

5מצב (L → id , $)∙(L → id , =)∙

6מצב (S → L = R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

7מצב (L → * R , =)∙(L → * R , $)∙

8מצב (R → L , =)∙(R → L , $)∙

9מצב (S → L = R , $)∙S

L

R

id

*

=

R

*

id

R

L

*

L

id11מצב

12מצב

10 מצב

The State Machine

Page 79: Theory of Compilation 236360 Erez Petrank

1מצב (S’ → S , $)∙

2מצב (S → L ∙ = R , $)(R → L , $)∙

3מצב (S → R , $)∙

5מצב (L → id , $)∙(L → id , =)∙

6מצב (S → L = R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

7מצב (L → * R , =)∙(L → * R , $)∙

8מצב (R → L , =)∙(R → L , $)∙

9מצב (S → L = R , $)∙S

L

R

id

=

R

id

R

L

*

Lid

10מצב (L → * R , $)∙(R → L , $)∙(L → ∙ * R , $)(L → ∙ id , $)

11מצב (L → id , $)∙

12מצב (R → L , $)∙

13מצב (L → * R , $)∙

id

R

L

*

id

The State Machine

Page 80: Theory of Compilation 236360 Erez Petrank

80

Back to the conflict

• Is there a conflict now?

(S → L ∙ = R , $)(R → L ∙ , $)

(S → L = ∙ R , $)(R → ∙ L , $)(L → ∙ * R , $)(L → ∙ id , $)

=

q6

q2

Page 81: Theory of Compilation 236360 Erez Petrank

81

Bottom-up Parsing

• LR(K)• SLR• LALR

• All follow the same pushdown-based algorithm• Differ on type of “LR Items”

Page 82: Theory of Compilation 236360 Erez Petrank

Building the Tables

• Similarly to SLR, we start with the automaton. • Turn each transition in a token column to a shift.• The variables columns form the goto section. • The “acc” is put in the $ column, for any rule that contains

(S’ → S● , $) . • For any state that contains an item of the form [A →β●,a],

and any rule of the form A →β whose number is m, use “reduce m” for the row of this state and the column of token a.

Page 83: Theory of Compilation 236360 Erez Petrank

Building the TableGoto Actions

L R S $ = * id

2 3 1 s4 s5 0

acc 1

r5 s6 2

r2 3

8 7 s4 s5 4

r4 r4 5

12 9 s10 s11 6

r3 r3 7

r5 r5 8

r1 9

12 13 s10 s11 10

r4 11

r5 12

r1 13

Page 84: Theory of Compilation 236360 Erez Petrank

84

LALR• LR tables have large number of entries• Often don’t need such refined observation (and

cost)• LALR idea: find states with the same LR(0)

component and merge their look-ahead component as long as there are no conflicts

• LALR not as powerful as LR(1)• For a language like C:

– SLR cannot handle some of the structures and requires hundreds of states.

– CLR handles everything but requires thousands of states.

– LALR handles everything with hundreds of states (not much higher than SLR).

Page 85: Theory of Compilation 236360 Erez Petrank

85

Chomsky Hierarchy

Regular

Context free

Context sensitive

Recursively enumerable

Finite-state automaton

Non-deterministic pushdown automaton

Linear-bounded non-deterministic

Turing machine

Turing machine

Page 86: Theory of Compilation 236360 Erez Petrank

86

Grammar Hierarchy

Non-ambiguous CFG

CLR(1)

LALR(1)

SLR(1)

LL(1)

LR(0)

Page 87: Theory of Compilation 236360 Erez Petrank

87

Summary• Bottom up derivation• LR(k) can decide on a reduce after seeing the

entire right side of the rule plus k look-ahead tokens.

• particularly LR(0). • Using a table and a stack to derive. • LR Items and the automaton. • Creating the table from the automaton. • LR parsing with pushdown automata• LR(0), SLR, LR(1) – different kinds of LR items,

same basic algorithm• LALR: in the exercise.

Page 88: Theory of Compilation 236360 Erez Petrank

88

Next time

• Semantic analysis