A language is described, as far as its static structure is concerned, by a grammar, and a grammar can be written down with a syntax-description notation such as BNF or a context-free grammar (CFG).

Def: a context-free grammar consists of the following:
1. A set T of terminals (terminal symbols, i.e. tokens).
2. A set N of nonterminals (disjoint from T).
3. A set P of productions, or grammar rules, of the form A → α, where A is an element of N and α is an element of (T ∪ N)*.
4. A start symbol S from the set N.

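Designing a syntax analyzer: how do we decide whether an input sentence conforms to the grammar?

The static structure of a language is described by a grammar, and the grammar is written in BNF (Backus Normal Form, also called Backus-Naur Form).

An English sentence (Sentence) can be described in BNF as follows: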

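<Sentence> ::= <Subject> <Verb> <Object>
<Subject> ::= <Noun> | <Noun> <Adverb>
<Verb> ::= likes | gets | helps
<Object> ::= <Noun> | <Adjective> <Noun>
<Noun> ::= Horse | Man
<Adverb> ::= extremely | always
<Adjective> ::= beautiful | dirty

With the BNF grammar above we can decide whether each of the following three word strings is a sentence of the grammar:

Horse always helps dirty Man. (a sentence)

dirty Man extremely likes beautiful Horse. (not a sentence: a <Subject> cannot begin with an <Adjective>)

Man dirty gets Horse beautiful. (not a sentence)

Names in angle brackets, such as <Noun>, are nonterminals; the words themselves, such as Horse, are terminals.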
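The check just described can be tried out in a few lines of Python. This is only a hand-written sketch for this one toy grammar; the function name `is_sentence` and the word sets are illustrative assumptions, not part of the slides.

```python
# Toy recognizer for the <Sentence> grammar above (illustrative sketch only).
NOUN = {"Horse", "Man"}
ADVERB = {"extremely", "always"}
ADJECTIVE = {"beautiful", "dirty"}
VERB = {"likes", "gets", "helps"}

def is_sentence(text):
    """Return True if text matches <Subject> <Verb> <Object>."""
    words = text.rstrip(".").split()
    pos = 0
    # <Subject> ::= <Noun> | <Noun> <Adverb>
    if pos >= len(words) or words[pos] not in NOUN:
        return False
    pos += 1
    if pos < len(words) and words[pos] in ADVERB:
        pos += 1
    # <Verb> ::= likes | gets | helps
    if pos >= len(words) or words[pos] not in VERB:
        return False
    pos += 1
    # <Object> ::= <Noun> | <Adjective> <Noun>
    if pos < len(words) and words[pos] in ADJECTIVE:
        pos += 1
    if pos >= len(words) or words[pos] not in NOUN:
        return False
    pos += 1
    return pos == len(words)      # no words may be left over
```

Because the word classes do not overlap, one word of lookahead is enough here and no backtracking is needed.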

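The first sentence of the previous page, drawn as a parse tree (syntax tree):

                      <Sentence>
      <Subject>         <Verb>         <Object>
  <Noun>  <Adverb>                <Adjective>  <Noun>
  Horse   always        helps       dirty      Man

Building the tree from the words upward, as here, is bottom-up parsing.

The job of a syntax analyzer (parser) is shown below:

(input) token string → driver routine + parsing table → (output) parse tree

Syntax analysis is called parsing.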

Example: a simplified Pascal grammar.

<prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.

<prog-name> ::= id

<dec-list> ::= <dec> | <dec-list> ; <dec>

<dec> ::= <id-list> : <type>

<type> ::= INTEGER

<id-list> ::= id | <id-list> , id

<stmt-list> ::= <stmt> | <stmt-list> ; <stmt>

<stmt> ::= <assign> | <read> | <write> | <for>

<assign> ::= id := <exp>

<exp> ::= <term> | <exp> + <term> | <exp> - <term>

<term> ::= <factor> | <term> * <factor> | <term> DIV <factor>

<factor> ::= id | int | (<exp>)

<read> ::= READ (<id-list>)

<write> ::= WRITE (<id-list>)

<for> ::= FOR <index-exp> DO <body>

<index-exp> ::= id := <exp> TO <exp>

<body> ::= <stmt> | BEGIN <stmt-list> END

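The parser builds a parse tree from the token string according to the grammar rules:

token string → parser (driver + parsing table + stack) → parse tree

How is the parsing table constructed? There are two approaches:

1. Bottom-up parsing

2. Top-down parsing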

1. Bottom-up parsing (LR parsing)

Example: a grammar G is defined as follows:

1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= (E)
6. F ::= id

The parsing table for the grammar G is:

         Action                                Goto
state    id    +     *     (     )     $       E    T    F
0        s5                s4                  1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                  8    2    3
5              r6    r6          r6    r6
6        s5                s4                       9    3
7        s5                s4                            10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5

acc: accept (done!)

si: shift to state i

ri: reduce the ith production

blank: error occurs

[Figure: the LR parser. The driver reads the token string from the input buffer, uses a stack and the parsing table, and outputs the parse tree.]

STACK                 INPUT BUFFER       ACTIONS
0                     id * id + id $     s5
0 id 5                * id + id $        r6
0 F 3                 * id + id $        goto 3, then r4
0 T 2                 * id + id $        goto 2, then s7
0 T 2 * 7             id + id $          s5
0 T 2 * 7 id 5        + id $             r6
0 T 2 * 7 F 10        + id $             goto 10, then r3
0 T 2                 + id $             goto 2, then r2
0 E 1                 + id $             goto 1, then s6
0 E 1 + 6             id $               s5
0 E 1 + 6 id 5        $                  r6
0 E 1 + 6 F 3         $                  goto 3, then r4
0 E 1 + 6 T 9         $                  goto 9, then r1
0 E 1                 $                  goto 1, then accept!

In the ACTIONS column, "goto j" records the state pushed by the reduction made on the previous line; what follows it is the next action taken in that state.
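The driver loop behind this trace is short. The sketch below hard-codes the parsing table of grammar G as Python dictionaries and keeps only states on the stack; the names `ACTION`, `GOTO`, `PRODS` and `lr_parse` are illustrative assumptions, not taken from the slides.

```python
# Productions of G: number -> (left-hand side, length of right-hand side)
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# ACTION[state][terminal]; a missing entry means "error"
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used, in order, or None on a syntax error."""
    stack = [0]                          # stack of states only
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:                  # blank entry: error
            return None
        if act == "acc":
            return reductions
        if act[0] == "s":                # shift: push the new state, consume the token
            stack.append(int(act[1:]))
            pos += 1
        else:                            # reduce by production n
            n = int(act[1:])
            lhs, length = PRODS[n]
            del stack[len(stack) - length:]       # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])    # goto on the left-hand side
            reductions.append(n)
```

For the input id * id + id the function returns the reductions in the same order as the trace: r6, r4, r6, r3, r2, r6, r4, r1.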

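[Figure: parse tree of id * id + id as built by the trace above. The numbers 1 to 8 mark the order of the reductions (F → id, T → F, F → id, T → T * F, E → T, F → id, T → F, E → E + T), i.e. bottom-up parsing.]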

Think about it: (1) How is the LR parsing table created?

(2) How many kinds of LR parsing tables are there?

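LR parsing tables include:

(1) Simple LR parsing table {SLR}

(2) Canonical LR parsing table {LR}

(3) LookAhead LR parsing table {LALR}

The parsing tables produced by these three methods have the same layout; they differ in the number of states (canonical LR usually needs many more) and in which entries hold reduce actions.

Constructing an SLR parsing table (you might recall the conversion NFA → DFA):

(1) Add the production (0) S' ::= S to the grammar G, where S is the start symbol of G; call the new grammar G'.

(2) Compute closure(I), the closure of a set of items I.

(3) For each grammar symbol X, compute I' = goto(I, X) and take its closure, closure(I').

(4) Repeat the previous step until no new goto value appears; the result is the collection of item sets.

(5) Compute the FIRST and FOLLOW sets of all nonterminals.

(6) Build the parsing table from the item sets and the FOLLOW sets.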

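Example: the augmented grammar G'

0. E' ::= E
1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= (E)
6. F ::= id

I0:                      E' → .E
                         E → .E+T
                         E → .T
                         T → .T*F
                         T → .F
                         F → .(E)
                         F → .id

I1 = goto(I0, E):        E' → E.
                         E → E.+T

I2 = goto(I0, T):        E → T.
                         T → T.*F

I3 = goto(I0, F):        T → F.

I4 = goto(I0, "("):      F → (.E)
                         E → .E+T
                         E → .T
                         T → .T*F
                         T → .F
                         F → .(E)
                         F → .id

I5 = goto(I0, id):       F → id.

I6 = goto(I1, +):        E → E+.T
                         T → .T*F
                         T → .F
                         F → .(E)
                         F → .id

I7 = goto(I2, *):        T → T*.F
                         F → .(E)
                         F → .id

I8 = goto(I4, E):        F → (E.)
                         E → E.+T

I9 = goto(I6, T):        E → E+T.
                         T → T.*F

I10 = goto(I7, F):       T → T*F.

I11 = goto(I8, ")"):     F → (E).

Canonical LR(0) collection for grammar G'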
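The closure and goto operations that generate this collection can be sketched as follows. Items are represented as (production index, dot position) pairs; this representation and the function names are assumptions made for the illustration, and the states come out in breadth-first order, which need not match the numbering I0 to I11 used on the slide.

```python
# Augmented grammar G': index -> (left-hand side, right-hand side)
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """items: set of (production index, dot position) pairs."""
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # the dot stands before a nonterminal: add all its productions with the dot at 0
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over symbol wherever possible, then take the closure."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({(0, 0)})            # I0 = closure({E' -> .E})
    states, work = [start], [start]
    while work:
        state = work.pop(0)
        symbols = sorted({GRAMMAR[p][1][dot] for p, dot in state
                          if dot < len(GRAMMAR[p][1])})
        for x in symbols:
            nxt = goto(state, x)
            if nxt not in states:        # repeat until no new item set appears
                states.append(nxt)
                work.append(nxt)
    return states
```

Calling `canonical_collection()` yields twelve item sets, in agreement with I0 through I11.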

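1. Look back at I5 on the previous page: when, in that state, the next input symbol is a token in the FOLLOW set of F, the parse tree for F → id can be built; in technical terms, the parser reduces by production 6 (i.e. r6).

2. What does the FOLLOW set of a nonterminal A mean?

Answer: in essence, it is the set of all input symbols that can appear immediately after A.

[Figure: the parse tree of id * id + id from the earlier slide, with the reductions numbered 1 to 8.]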

3. FOLLOW cannot be computed without first computing FIRST. Why?

By definition:

FIRST(α) = { a | α ⇒* aβ } ∪ { ε | α ⇒* ε }

FOLLOW(A) = { a | S ⇒* αAaβ } ∪ { $ | S ⇒* αA }

Here are the conventions:

Terminals: a, b, c, d, 0, 1, +, (, ), begin

Non-terminals: A, B, C, D, S, <word>

Vocabulary symbols: U, V, W, X, Y, Z

Strings of terminals: u, v, w, x, y, z

Strings of vocabulary symbols: α, β, γ

A ::= α is the same as A → α

⇒* means a derivation of n steps, for n ≥ 0

By definition, the set FIRST(X) can be computed with the following three steps:

1. If X ∈ T, then FIRST(X) = { X }.

2. If X ::= ε, then add ε to FIRST(X).

3. If X ∈ N and X ::= Y1 Y2 . . . Yn, then
   add all non-ε elements of FIRST(Y1) to FIRST(X);
   if ε ∈ FIRST(Y1), then add all non-ε elements of FIRST(Y2) to FIRST(X);
   . . .
   if ε ∈ FIRST(Yn), then add ε to FIRST(X).

Example 3: the grammar G is defined as follows:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Its FIRST sets are:

        FIRST
E       ( id
E'      + ε
T       ( id
T'      * ε
F       ( id

Special case 1: the grammar G is defined as:

S → ABCDEF
A → a | ε | D+
B → b | ε | C
C → c | ε | E | *F
D → d | ε | D/
E → e | ε | A+
F → f | ε | B*

        FIRST
D       d / ε
A       a d / + ε
E       a d e / + ε
C       a c d e / + * ε
B       a b c d e / + * ε
F       a b c d e f / + * ε
S       a b c d e f / + * ε
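The three FIRST rules translate into a small fixed-point iteration. The sketch below assumes a grammar stored as a dictionary from each nonterminal to a list of right-hand sides (an empty list standing for ε); that representation and the name `first_sets` are choices made for this illustration. It is shown on the grammar of Example 3.

```python
EPS = "ε"

# Example 3; an empty right-hand side [] stands for ε
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Iterate the three FIRST rules until no set grows any more."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[a])
                for y in rhs:
                    if y not in grammar:            # rule 1: a terminal starts the string
                        first[a].add(y)
                        break
                    first[a] |= first[y] - {EPS}    # rule 3: non-ε elements of FIRST(Yi)
                    if EPS not in first[y]:
                        break
                else:
                    # reached only if every Yi can vanish, or the right-hand side is empty (rule 2)
                    first[a].add(EPS)
                changed |= len(first[a]) != before
    return first
```

Running it on the grammar of special case 1 gives the second table above, which is a useful check because that grammar is heavily left-recursive and every nonterminal can derive ε.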

By definition, the set FOLLOW(X) can be computed with the following three steps:

1. Put $ into FOLLOW(S').

2. For each A → αBβ, where β ≠ ε,
   add all non-ε elements of FIRST(β) to FOLLOW(B).

3. For each A → αB, or A → αBβ where ε ∈ FIRST(β),
   add all of FOLLOW(A) to FOLLOW(B).

For the first example of this chapter, FIRST and FOLLOW come out as follows:

        FIRST     FOLLOW
E'      ( id      $
E       ( id      + ) $
T       ( id      * + ) $
F       ( id      * + ) $
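A sketch of the three FOLLOW rules, applied to the augmented expression grammar G' of the first example. It needs FIRST of the string that follows each nonterminal, so a helper `first_of` is included; all names and the dictionary representation of the grammar are assumptions made for this illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E'": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the nonterminals."""
    out = set()
    for y in symbols:
        f = first[y] if y in first else {y}     # a terminal is its own FIRST set
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                                # the whole string can vanish
    return out

def first_sets(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for rhs in alts:
                new = first_of(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {a: set() for a in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for rhs in alts:
                for i, b in enumerate(rhs):
                    if b not in grammar:        # only nonterminals have FOLLOW sets
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}          # rule 2
                    if EPS in tail:             # rule 3 (also covers an empty tail)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

The call `follow_sets(GRAMMAR, "E'")` reproduces the FOLLOW column of the table above. The dependence on `first_of` is also the reason FIRST has to be available before FOLLOW can be computed.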

FOLLOW sets for Example 3:

        FOLLOW
E       $ )
E'      $ )
T       + $ )
T'      + $ )
F       * + $ )

FOLLOW sets for special case 1:

(I) (II) (III)

S $

A (1) f b c e a d / + * (6) $

B (2) (5)

C (3) (4)

D (4) (3)

E (5) (2)

F (1) $ f b c e a d / + *

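[Figure: transition diagram of the canonical LR(0) collection for G'. Its edges are listed below.]

From I0: E to I1, T to I2, F to I3, ( to I4, id to I5
From I1: + to I6
From I2: * to I7
From I4: E to I8, T to I2, F to I3, ( to I4, id to I5
From I6: T to I9, F to I3, ( to I4, id to I5
From I7: F to I10, ( to I4, id to I5
From I8: ) to I11, + to I6
From I9: * to I7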

Building the parsing table from the item sets and the FOLLOW sets. For each state Ii, the entries are determined by one of the following five rules:

(1) shift action:
If [ A ::= α . a β ] is in Ii and goto(Ii, a) = Ij, where a ∈ T,
then set action[i, a] = <shift> j.

(2) reduce action:
If [ A ::= α . ] is in Ii, where A ≠ S',
then for every input symbol a in FOLLOW(A),
set action[i, a] = <reduce> by the production A ::= α.

(3) accept action:
If [ S' ::= S . ] is in Ii,
then set action[i, $] = accept.

(4) goto action:
If Ii contains an item whose dot (.) stands immediately before a nonterminal A, i.e. goto(Ii, A) = Ij,
then set GOTO[i, A] = j.

(5) Every table entry not defined by the rules above is an error entry.

1. If the parsing table can be completed without conflicting entries, the grammar G is called an SLR(1) grammar.

2. Every SLR(1) grammar is unambiguous, but some unambiguous grammars are not SLR(1); see the example on the next page. The reason is that SLR(1) does not remember enough of what has already been seen (e.g. the left context), which is why canonical LR and LookAhead LR were introduced.

3. Definition of an ambiguous grammar:

A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

Summary of the SLR construction:

1. Augment the grammar to G'.
2. Compute the FIRST and FOLLOW sets.
3. Create the collection of item sets (checking carefully for conflicts).
4. Make the parsing table.

G:  S ::= L=R | R
    L ::= id | *R
    R ::= L

Its LR(0) item sets are:

I0: S' ::= .S
    S ::= .L=R
    S ::= .R
    L ::= .*R
    L ::= .id
    R ::= .L

I1: S' ::= S.

I2: S ::= L.=R
    R ::= L.

I3: S ::= R.

I4: L ::= *.R
    R ::= .L
    L ::= .*R
    L ::= .id

I5: L ::= id.

I6: S ::= L=.R
    R ::= .L
    L ::= .*R
    L ::= .id

I7: L ::= *R.

I8: R ::= L.

I9: S ::= L=R.

        FIRST     FOLLOW
S'      * id      $
S       * id      $
L       * id      = $
R       * id      $ =

Look at I2: on the input symbol "=" the parser cannot tell whether to shift (because of S ::= L.=R) or to reduce by R ::= L (because "=" is in FOLLOW(R)).

This is a shift/reduce conflict.
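The conflict in I2 can be reproduced mechanically by applying the SLR shift and reduce rules to that one item set. The sketch below hard-codes I2 and the FOLLOW sets from the table above; the item representation (left side, right side, dot position) and the name `slr_actions` are assumptions for this illustration.

```python
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}
TERMINALS = {"=", "*", "id"}

# I2 = { S ::= L.=R ,  R ::= L. } as (left side, right side, dot position)
I2 = [("S", ("L", "=", "R"), 1), ("R", ("L",), 1)]

def slr_actions(items, follow, terminals):
    """Collect every action the SLR rules would enter for one state."""
    actions = {}
    for lhs, rhs, dot in items:
        if dot < len(rhs):
            if rhs[dot] in terminals:               # shift rule: dot before a terminal
                actions.setdefault(rhs[dot], set()).add("shift")
        else:                                       # reduce rule: dot at the end
            for a in follow[lhs]:
                actions.setdefault(a, set()).add("reduce " + lhs + " -> " + " ".join(rhs))
    return actions

conflicts = {a: acts for a, acts in slr_actions(I2, FOLLOW, TERMINALS).items()
             if len(acts) > 1}
```

Here `conflicts` contains exactly one entry, for "=", holding both a shift and the reduction by R ::= L; the "$" column holds only the reduction.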

To resolve such shift/reduce conflicts, historical information has to be carried inside the states Ii themselves.

We illustrate the construction of a canonical LR parsing table with an example:

G':  0. S' ::= S
     1. S ::= CC
     2. C ::= cC
     3. C ::= d

The sets of LR(1) items are:

I0: S' ::= .S , $
    S ::= .CC , $          (lookahead = FIRST($))
    C ::= .cC , c/d        (lookahead = FIRST(C$))
    C ::= .d , c/d

I1: S' ::= S. , $

I2: S ::= C.C , $
    C ::= .cC , $
    C ::= .d , $

I3: C ::= c.C , c/d
    C ::= .cC , c/d
    C ::= .d , c/d

I4: C ::= d. , c/d

I5: S ::= CC. , $

I6: C ::= c.C , $
    C ::= .cC , $
    C ::= .d , $

I7: C ::= d. , $

I8: C ::= cC. , c/d

I9: C ::= cC. , $

Only the FIRST sets are needed here, not the FOLLOW sets!
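The LR(1) closure differs from the LR(0) closure only in the lookahead it attaches to each new item: for [A ::= α.Bβ, a] the new items get the lookaheads FIRST(βa). A minimal sketch for this grammar follows; it hard-codes FIRST and relies on the grammar having no ε-productions, and the item representation (production index, dot position, lookahead) and function names are assumptions for the illustration.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}      # valid because nothing derives ε

def first_of(symbols):
    """FIRST of a nonempty symbol string (only its first symbol matters here)."""
    x = symbols[0]
    return FIRST[x] if x in NONTERMINALS else {x}

def closure(items):
    """items: set of (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:] + (la,)):       # lookaheads FIRST(βa)
                for q, (lhs, _) in enumerate(GRAMMAR):
                    item = (q, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, x):
    return closure({(p, d + 1, la) for p, d, la in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})

def lr1_collection():
    start = closure({(0, 0, "$")})
    states, work = [start], [start]
    while work:
        state = work.pop(0)
        for x in sorted({GRAMMAR[p][1][d] for p, d, _ in state
                         if d < len(GRAMMAR[p][1])}):
            nxt = goto(state, x)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states

def cores(states):
    """Item sets with the lookaheads stripped off."""
    return {frozenset((p, d) for p, d, _ in s) for s in states}
```

`lr1_collection()` returns ten item sets, matching I0 to I9, while `cores(...)` of that collection has only seven members, because three pairs of sets differ in nothing but their lookaheads.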

Steps for constructing the canonical LR parsing table

1. Augment the grammar G into grammar G'.

2. Construct C = {I0, I1, . . . , In}, the collection of sets of LR(1) items for G'.

3. State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:

(a) If [A → α . a β, b] is in Ii and goto(Ii, a) = Ij, where a is a terminal, then set action[i, a] to "shift j."

(b) If [A → α . , a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α."

(c) If [S' → S . , $] is in Ii, then set action[i, $] to "accept."

If a conflict results from the above rules, the grammar is said NOT to be LR(1), and the algorithm is said to fail.

4. The goto transitions for state i are determined as follows:

If goto(Ii, A) = Ij for a nonterminal A, then goto[i, A] = j.

5. All entries not defined by rules (3) and (4) are made "error".

6. The initial state is the one constructed from I0, the set containing [S' → .S, $].

         action              goto
State    c     d     $       S    C
0        s3    s4            1    2
1                    acc
2        s6    s7                 5
3        s3    s4                 8
4        r3    r3
5                    r1
6        s6    s7                 9
7                    r3
8        r2    r2
9                    r2

Canonical parsing table for grammar G'

Every SLR(1) grammar is an LR(1) grammar, but for an SLR(1) grammar the canonical LR parser may have more states than the SLR parser for the same grammar.

The grammar of the previous example is SLR and has an SLR parser with seven states, compared with the ten shown above.

In terms of the number of states, a canonical LR parser is simply too large and is therefore often impractical, while an SLR parser is not powerful enough. Hence the LALR parser: it has exactly the same number of states as the SLR parser, yet shift/reduce conflicts arise less often.

1. Show that the following grammar G is not SLR(1).
   G:  S → AaAb | BbBa
       A → ε
       B → ε

2. Prove that S → aSa | a is not LR(1).

3. Find the FIRST and FOLLOW sets of every nonterminal of the grammar G below.
   G:  S → a | (T) | ε
       T → T@S | S

4. Consider the following grammar

   E ::= E + T | T
   T ::= TF | F
   F ::= F* | a | b

   Construct the SLR parsing table for this grammar.

5. Show that the following grammar

   S ::= Aa | bAc | Bc | bBa
   A ::= d
   B ::= d

   is LR(1).

6. Construct an SLR parsing table for the grammar

   E → E sub R | E sup E | {E} | c
   R → E sup E | E

   Resolve the parsing action conflicts so that expressions will be parsed in the same way as by the LR parser.

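FIRST sets and canonical LR(1) parsing table for the grammar of exercise 5, with the productions numbered 1. S → Aa, 2. S → bAc, 3. S → Bc, 4. S → bBa, 5. A → d, 6. B → d:

        FIRST
S'      b, d
S       b, d
A       d
B       d

         action                          GOTO
state    a     b     c     d     $       S    A    B
0              s3          s5            1    2    4
1                                acc
2        s6
3                          s9                 7    8
4                    s10
5        r5          r6
6                                r1
7                    s11
8        s12
9        r6          r5
10                               r3
11                               r2
12                               r4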

7. Which of the following grammars are SLR(1)? LR(1)? Justify your answers.

a.  S → id = E ;
    E → E + P
    E → P
    P → id
    P → (E)
    P → id = E

b.  S → id = A ;
    A → id = A
    A → E
    E → E + P
    E → P
    P → id
    P → (A)

c.  S → id = A ;
    A → P E
    P → id = P
    P → ε
    E → E + P
    E → P
    P → id
    P → (A)

d.  S → SAP
    S → A
    A → AP
    A → P
    A → ε
    P → (aP)
    P → ε
    P → b

(The slide marks both c and d as NOT LR(1).)

Constructing the LALR parsing table:

1. Obtain a collection of sets of LR(1) items with the canonical LR method described earlier.

2. Merge all item sets whose first components are identical (the same cores).

Example: in the earlier S ::= CC example, three pairs can be merged: I3 and I6, I4 and I7, I8 and I9. Merging gives:

I36: C ::= c.C , c/d/$
     C ::= .cC , c/d/$
     C ::= .d , c/d/$

I47: C ::= d. , c/d/$

I89: C ::= cC. , c/d/$

which comes up with the following table:

         action              goto
State    c     d     $       S    C
0        s36   s47           1    2
1                    acc
2        s36   s47                5
36       s36   s47                89
47       r3    r3    r3
5                    r1
89       r2    r2    r2

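1. How do I4 and I7 differ (as far as parsing is concerned)?

The language generated by the grammar is c*dc*d. When the input is cc…cdcc…cd, the parser shifts cc…cd onto the stack and enters state 4; if the next input symbol is then c or d, the parser reduces by C → d. Only when it sees the second d does it enter state 7, where it must see $ and then reduces by the third production. That is the difference between the two.

2. Merging I4 and I7 reduces the number of states, but it may also introduce a new problem. What kind of problem? A shift/reduce conflict? A goto conflict? (Think it over.)

Answer: neither; it is a reduce/reduce conflict.

Example:

S' → S
S → aAd
S → bBd
S → aBe
S → bAe
A → c
B → c

Its sets of LR(1) items:

S' → .S , $      S → .aAd , $     S → .bBd , $     S → .aBe , $     S → .bAe , $

S' → S. , $

S → a.Ad , $     S → a.Be , $     A → .c , d       B → .c , e

S → b.Bd , $     S → b.Ae , $     B → .c , d       A → .c , e

S → aA.d , $

S → aB.e , $

A → c. , d       B → c. , e

S → bB.d , $

S → bA.e , $

B → c. , d       A → c. , e

Merging the two sets with the common core { A → c. , B → c. } gives

A → c. , d/e     B → c. , d/e

which has a reduce/reduce conflict on both d and e.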

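Operator-precedence parsing:

1. Rewrite the grammar as an operator-precedence grammar; i.e. (1) no production right side is ε, and (2) no production has two adjacent nonterminals.

2. Compute the LEADING and TRAILING sets of all nonterminals:

LEADING(A) = { a | A ⇒+ Kaβ , K ∈ N ∪ {ε} }

TRAILING(A) = { a | A ⇒+ αaK , K ∈ N ∪ {ε} }

3. Determine the precedence between any two terminals by one of the following five rules:

(1) a = b if A ::= αaKbβ
(2) a < b if A ::= αaBβ and B ⇒+ Kbγ
(3) a > b if A ::= αBbβ and B ⇒+ γaK
(4) $ < a if S ⇒+ Kaα
(5) a > $ if S ⇒+ αaK

4. Parse with a stack and the parsing table produced by the method above (illustrated by an example below).

Any two adjacent terminals have a precedence relation; as soon as a matching pair < ... > is seen, reduce that sub-parse tree.

When the two "$" symbols at the left and right ends face each other, parsing is done!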

E → E + T
E → T
T → T * P
T → P
P → (E)
P → id

        LEADING        TRAILING
E       + * ( id       + * ) id
T       * ( id         * ) id
P       ( id           ) id

        +     *     (     )     id    $
+       >     <     <     >     <     >
*       >     >     <     >     <     >
(       <     <     <     =     <
)       >     >           >           >
id      >     >           >           >
$       <     <     <           <     acc

Operator-precedence parsing of id + id * id + id:

Stack                 Input buffer            Actions
$                     id + id * id + id $     $ < id
$ < id                + id * id + id $        id > +
$                     + id * id + id $        $ < +
$ < +                 id * id + id $          + < id
$ < + < id            * id + id $             id > *
$ < +                 * id + id $             + < *
$ < + < *             id + id $               * < id
$ < + < * < id        + id $                  id > +
$ < + < *             + id $                  * > +
$ < +                 + id $                  + > +
$                     + id $                  $ < +
$ < +                 id $                    + < id
$ < + < id            $                       id > $
$ < +                 $                       + > $
$                     $                       ACC
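The trace can be reproduced with a small stack machine driven by the precedence table. In this sketch only terminals are kept on the stack (nonterminals are not tracked), which is a simplification and means that some malformed inputs are not rejected; the names `PREC` and `op_precedence_parse` are assumptions for the illustration.

```python
# PREC[a][b] is the relation between stack terminal a and input terminal b
PREC = {
    "+":  {"+": ">", "*": "<", "(": "<", ")": ">", "id": "<", "$": ">"},
    "*":  {"+": ">", "*": ">", "(": "<", ")": ">", "id": "<", "$": ">"},
    "(":  {"+": "<", "*": "<", "(": "<", ")": "=", "id": "<"},
    ")":  {"+": ">", "*": ">", ")": ">", "$": ">"},
    "id": {"+": ">", "*": ">", ")": ">", "$": ">"},
    "$":  {"+": "<", "*": "<", "(": "<", "id": "<"},
}

def op_precedence_parse(tokens):
    """Return the handles reduced, in order, or None on a syntax error."""
    stack = ["$"]                        # terminals only
    tokens = list(tokens) + ["$"]
    pos, handles = 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "$" and a == "$":      # the two $ face each other: accept
            return handles
        rel = PREC[top].get(a)
        if rel in ("<", "="):            # shift
            stack.append(a)
            pos += 1
        elif rel == ">":                 # reduce: pop back to the matching <
            handle = [stack.pop()]
            while PREC[stack[-1]].get(handle[0]) != "<":
                handle.insert(0, stack.pop())
            handles.append(" ".join(handle))
        else:                            # blank table entry
            return None
```

For id + id * id + id the handles come out as id, id, id, *, +, id, +, which is the order of the ">" lines in the trace above: the multiplication is reduced before the first addition, and the first addition before the second.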

(I) The earliest parsing method

1. Written as recursive procedures.

2. May need to backtrack over tokens.

Example:
S ::= cAd
A ::= ab
A ::= a

If the input string is cad, top-down parsing proceeds as follows:

(1) S is expanded to c A d.
(2) A is expanded to a b; the b does not match the input d, so the parser backtracks.
(3) A is expanded to a; now the d matches and parsing succeeds.

Procedure S( )
begin
  if input symbol = 'c' then
    ADVANCE( )
    if A( ) then
      if input symbol = 'd' then
        ADVANCE( )
        return true
      end if
    end if
  end if
  return false
end

Procedure A( )
begin
  isave = input-point
  if input symbol = 'a' then
    ADVANCE( )
    if input symbol = 'b' then
      ADVANCE( )
      return true
    end if
    input-point = isave    // could not find ab; back up and try A ::= a //
    if input symbol = 'a' then
      ADVANCE( )
      return true
    end if
  else
    return false
  end if
end
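The two procedures carry over to Python almost line by line. The sketch below treats each character of the input as one token; the class name `Parser` and the helper `accepts` are assumptions for the illustration.

```python
class Parser:
    """Recursive-descent recognizer for S ::= cAd, A ::= ab | a."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        # current input symbol, or None at the end of the input
        return self.text[self.pos] if self.pos < len(self.text) else None

    def advance(self):
        self.pos += 1

    def parse_S(self):                   # S ::= c A d
        if self.peek() == "c":
            self.advance()
            if self.parse_A() and self.peek() == "d":
                self.advance()
                return True
        return False

    def parse_A(self):                   # A ::= a b | a
        saved = self.pos
        if self.peek() == "a":
            self.advance()
            if self.peek() == "b":       # first alternative: a b
                self.advance()
                return True
            self.pos = saved             # backtrack ...
            self.advance()               # ... and take the second alternative: a
            return True
        return False

def accepts(text):
    p = Parser(text)
    return p.parse_S() and p.pos == len(text)
```

For cad, `parse_A` first tries a b, fails on the d, restores the input pointer and succeeds with a. The backtracking here is local to A, which is sufficient for this particular grammar.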

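Drawbacks of the method above:

1. If the grammar is left-recursive, e.g. A → Aα, the parser falls into an infinite loop.

2. Backtracking wastes time; it can be avoided with lookahead.

3. The order in which the production rules are tried affects the efficiency of parsing.

4. Once an error is detected, the error message can usually say little more than "syntax error".

Remedies:

1. Elimination of Left Recursion

If A → Aα1 | Aα2 | . . . | Aαn | β1 | β2 | . . . | βm, it can be rewritten as:

A → β1 A' | β2 A' | . . . | βm A'

A' → α1 A' | α2 A' | . . . | αn A' | ε

Example:

1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= (E)
6. F ::= id

becomes

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

However, indirect recursion calls for a more powerful method! (to be continued)

S → Aa | b
A → Ac | Sd | ε

Here S ⇒ Aa ⇒ Sda, so S is left-recursive although no S-production is immediately left-recursive.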

Input: Grammar G with no cycles or ε-productions.

Output: An equivalent grammar with no left recursion.

Method: (Note that the resulting non-left-recursive grammar may have ε-productions.)

1. Arrange the nonterminals in some order A1, A2, . . . , An.

2. for i := 1 to n do begin
     for j := 1 to i-1 do begin
       replace each production of the form Ai → Aj γ by the productions
       Ai → δ1 γ | δ2 γ | . . . | δk γ, where Aj → δ1 | δ2 | . . . | δk are all the current
       Aj-productions;
     end
     eliminate the immediate left recursion among the Ai-productions
   end

S → Aa | b

A → Ac | Sd | ε

We order the nonterminals S, A.

There is no immediate left recursion among the S-productions, so nothing happens during step (2) for the case i = 1.

For i = 2, we substitute the S-productions in A → Sd to obtain the following A-productions.

A → Ac | Aad | bd | ε

Eliminating the immediate left recursion among the A-productions yields the following grammar.

S → Aa | b

A → bdA' | A'

A' → cA' | adA' | ε

Example:

A → Ba | Aa | c
B → Bb | Ab | d

↓

A → BaA' | cA'
A' → aA' | ε
B → Bb | Ab | d

↓

A → BaA' | cA'
A' → aA' | ε
B → Bb | BaA'b | cA'b | d

↓

A → BaA' | cA'
A' → aA' | ε
B → cA'bB' | dB'
B' → bB' | aA'bB' | ε
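The algorithm and the two worked examples can be checked with a short sketch. Grammars are dictionaries from a nonterminal to a list of right-hand sides (lists of symbols, with [] standing for ε), and a new nonterminal is named by appending a prime; both conventions and the function names are assumptions for this illustration.

```python
def eliminate_immediate(a, alts):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε."""
    alphas = [rhs[1:] for rhs in alts if rhs[:1] == [a]]
    betas = [rhs for rhs in alts if rhs[:1] != [a]]
    if not alphas:                       # no immediate left recursion
        return {a: alts}
    new = a + "'"
    return {a: [beta + [new] for beta in betas],
            new: [alpha + [new] for alpha in alphas] + [[]]}

def eliminate_left_recursion(grammar, order):
    result = {}
    for i, ai in enumerate(order):
        alts = grammar[ai]
        for aj in order[:i]:
            expanded = []
            for rhs in alts:
                if rhs[:1] == [aj]:      # Ai -> Aj γ: substitute the current Aj-productions
                    expanded += [delta + rhs[1:] for delta in result[aj]]
                else:
                    expanded.append(rhs)
            alts = expanded
        result.update(eliminate_immediate(ai, alts))
    return result

g1 = {"S": [["A", "a"], ["b"]],
      "A": [["A", "c"], ["S", "d"], []]}
r1 = eliminate_left_recursion(g1, ["S", "A"])
```

With the order S, A the result `r1` contains A → bdA' | A' and A' → cA' | adA' | ε, as derived above; a different ordering of the nonterminals generally gives a different, but equivalent, grammar.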

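2. Left Factoring

If A → αβ1 | αβ2 | . . . | αβn | γ

then after the left-factoring process:

A → αA' | γ

A' → β1 | β2 | . . . | βn

Example: if the grammar G is

S → iCtS | iCtSeS | a
C → b

then after left factoring:

S → iCtSS' | a
S' → eS | ε
C → b

[Figure: the LL parser. The driver reads the token string from the input buffer, uses a stack and the parsing table, and outputs the parse tree.]

After eliminating left recursion and left factoring, the grammar G can be parsed top-down by a recursive-descent parser that needs no backtracking (i.e., a predictive parser), with the help of a stack.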

The original grammar

1. E ::= E + T
2. E ::= T
3. T ::= T * F
4. T ::= F
5. F ::= (E)
6. F ::= id

becomes, after eliminating left recursion and left factoring:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

        FIRST     FOLLOW
E       ( id      $ )
E'      + ε       $ )
T       ( id      + $ )
T'      * ε       + $ )
F       ( id      * + $ )

How to create a predictive parsing table { LL(1) }:

1. Compute the sets of First and Follow for each nonterminal.

2. For each production A → α of the grammar, do steps 3 and 4.

3. For each terminal a in First(α), add A → α to M[A, a].

4. If ε is in First(α), add A → α to M[A, b] for each terminal b in Follow(A).
   If ε is in First(α) and $ is in Follow(A), add A → α to M[A, $].

5. Make each undefined entry of M be error.

        id         +           *           (          )         $
E       E → TE'                            E → TE'
E'                 E' → +TE'                          E' → ε    E' → ε
T       T → FT'                            T → FT'
T'                 T' → ε      T' → *FT'              T' → ε    T' → ε
F       F → id                             F → (E)

The table above is called the LL(1) parsing table.

Stack             Input buffer        Actions
$ E               id + id * id $      1. M[E, id] = E → TE'
$ E' T            id + id * id $      2. M[T, id] = T → FT'
$ E' T' F         id + id * id $      3. M[F, id] = F → id
$ E' T' id        id + id * id $      pop id and advance to the next token
$ E' T'           + id * id $         4. M[T', +] = T' → ε
$ E'              + id * id $         5. M[E', +] = E' → +TE'
$ E' T +          + id * id $         pop + and advance to the next token
$ E' T            id * id $
$ E' T' F         id * id $
$ E' T' id        id * id $
$ E' T'           * id $
$ E' T' F *       * id $
$ E' T' F         id $
$ E' T' id        id $
$ E' T'           $
$ E'              $
$                 $
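Both the table construction and the stack-driven parse can be sketched compactly. The FIRST and FOLLOW sets are typed in from the tables shown for this grammar rather than computed, and the names `build_table` and `ll1_parse` are assumptions for the illustration.

```python
EPS = "ε"
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}

def first_of(symbols):
    """FIRST of a right-hand side; terminals are their own FIRST set."""
    out = set()
    for y in symbols:
        f = FIRST.get(y, {y})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def build_table():
    table = {}
    for lhs, rhs in PRODUCTIONS:
        f = first_of(rhs)
        targets = f - {EPS}                  # step 3
        if EPS in f:
            targets |= FOLLOW[lhs]           # step 4 (FOLLOW already contains $ where needed)
        for a in targets:
            if (lhs, a) in table:
                raise ValueError("not LL(1): two entries for M[%s, %s]" % (lhs, a))
            table[lhs, a] = rhs
    return table                             # step 5: missing keys are errors

def ll1_parse(tokens, start="E"):
    table = build_table()
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top, a = stack.pop(), tokens[pos]
        if top in FIRST:                     # nonterminal: expand by M[top, a]
            rhs = table.get((top, a))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))      # leftmost symbol ends up on top
        elif top == a:                       # terminal (or $): match and advance
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

`build_table()` fills 13 entries, the same ones as in the table above, and `ll1_parse` accepts id + id * id by going through the stack contents listed in the trace.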

G:  S → 2S'
    S' → 1AS' | ε
    A → B1A | ε
    B → 3B'
    B' → B1 | ε

        FIRST     FOLLOW
S       2         $
S'      1, ε      $
A       3, ε      1, $
B       3         1
B'      3, ε      1

        1             2            $          3
S                     S → 2S'
S'      S' → 1AS'                  S' → ε
A       A → ε                      A → ε      A → B1A
B                                             B → 3B'
B'      B' → ε                                B' → B1

G:  E → id E'
    E' → E + E' | E * E' | ε

After left factoring the E'-productions:

    E' → E C | ε
    C → +E' | *E'

        FIRST       FOLLOW
E       id          + , * , $
E'      id , ε      + , * , $
C       + , *       + , * , $

        +            *            id           $
E                                 E → id E'
E'      E' → ε       E' → ε       E' → E C     E' → ε
C       C → +E'      C → *E'

A production rule is written as α → β or α ::= β.

A phrase structure grammar G is a quadruple (N, Σ, P, S), where

N: finite set of non-terminals.

Σ: finite alphabet (terminals).

P: a set of productions.

S: the start symbol.

Example:

G1 = ({A, S}, {0,1}, P, S) where P is
S → 0A1
0A → 00A1
A → ε

If γαδ is a string in (N ∪ Σ)* and α → β is a production in G, then we say γαδ directly derives γβδ and write γαδ ⇒ γβδ.

⇒+ : derives in one or more steps.

⇒* : derives in zero or more steps.

If S ⇒* α then α is called a sentential form of G.

If S ⇒* x, where x ∈ Σ*, then x is called a sentence of G.

The language generated by G, written L(G), is { x | x ∈ Σ* and S ⇒* x }.

Now,

G1 = ({A, S}, {0,1}, P, S) where P is
S → 0A1
0A → 00A1
A → ε
therefore

L(G1) = { 0^n 1^n | n > 0 }

CONVENTIONS:

Terminals: a, b, c, d, 0, 1, +, (, ), begin

Non-terminals: A, B, C, D, S, <word>

Vocabulary symbols: U, V, W, X, Y, Z

Strings of terminals: u, v, w, x, y, z

Strings of vocabulary symbols: α, β, γ

Type 0: Unrestricted Grammars

any α → β

Type 1: Context Sensitive Grammars (CSG)

for all α → β, |α| ≤ |β|

Type 2: Context Free Grammars (CFG)

for all α → β, α ∈ N (i.e., A → β)

Type 3: Right (or Left)-Linear Grammars

if all productions are of the form A → x or A → xB

G2 = ({S, B, C}, {a, b, c}, P, S)
P:  S → aSBC
    S → abC
    CB → BC
    bB → bb
    bC → bc
    cC → cc

Which type? What is the language?

G3:  S → S + S
     S → S * S
     S → (S)
     S → a

Which type? What language?

An Ambiguous Grammar is one for which some sentence has two or more different parse trees.

// Show that the last grammar on the previous page is ambiguous. //

// Try to prove that the following CFG is ambiguous:

S → AB | CD
A → 0A | ε
B → 1B2 | ε
C → 2C | ε
D → 0D1 | ε //

// Try to prove that the following CFG is ambiguous:

S → if X then S | M
M → if X then M else S
X → X + T | T
T → T * F | F
F → (X) | a //

Is L = { 0^n 1^n | n ≥ 1 } a context-free language?

Yes, since S → 0S1 | 01 generates L.

A RECOGNIZER is a machine (system) with a finite description that can accept a terminal string for some grammar and determine whether the string is in the language generated by the grammar.

A PARSER can, in addition, find a derivation for the string.

PARSING alternatives:

Suppose we want to parse id * id + id in G0:

E → E + T | T
T → T * F | F
F → (E) | id

then the parse tree is

              E
        E     +     T
        T           F
    T   *   F       id
    F       id
    id

This parse tree might be created with a leftmost derivation or a rightmost derivation as follows:

Leftmost derivation (each step rewrites the leftmost nonterminal):

E ⇒ E + T
  ⇒ T + T
  ⇒ T * F + T
  ⇒ F * F + T
  ⇒ id * F + T
  ⇒ id * id + T
  ⇒ id * id + F
  ⇒ id * id + id

Rightmost derivation: try it yourself!

Prove that the grammar G with productions S → 0S1 | 01 generates exactly L = { 0^n 1^n | n ≥ 1 }.

PROOF: First show L(G) ⊆ L (i.e., the grammar generates only strings in L).

Inductive hypothesis: If w ∈ L(G) is derived in k steps, then w ∈ L.

Basis: k = 1. The only one-step derivation is S ⇒ 01, and 01 ∈ L.

Inductive step: assume the inductive hypothesis is true for k = k0 ≥ 1; show it is true for k = k0 + 1 > 1.

Since k > 1, the first step must be S ⇒ 0S1, followed by k-1 further steps that give 0x1 = w.

But S ⇒* x takes no more than k0 steps, so by the hypothesis x ∈ L, say x = 0^i 1^i, i ≥ 1.

Then w = 0x1 = 0^(i+1) 1^(i+1) ∈ L.

Now show L ⊆ L(G) (i.e., the grammar generates all strings of L).

Inductive hypothesis: If w ∈ L and |w| = 2k, then w ∈ L(G).

Basis: k = 1. The only string in L of length 2 is 01. But S ⇒ 01, so 01 ∈ L(G).

Inductive step: assume the inductive hypothesis is true for k = k0 ≥ 1; show it is true for k = k0 + 1 > 1.

Since the length of w is 2k, w = 0^k 1^k. By the inductive hypothesis 0^(k-1) 1^(k-1) ∈ L(G), and thus S ⇒* 0^(k-1) 1^(k-1). So S ⇒ 0S1 ⇒* 0 0^(k-1) 1^(k-1) 1 = w is a valid derivation for w. Thus w ∈ L(G).

L(G) ⊆ L and L ⊆ L(G), so L = L(G).

A Push-Down Automaton (PDA) is a septuple P = (Q, Σ, Γ, δ, q0, Z, F), where

Q is a finite set of states,

Σ is a finite input alphabet,

Γ is a finite stack alphabet,

δ maps elements of Q × (Σ ∪ {ε}) × Γ into finite subsets of Q × Γ*,

q0 ∈ Q is the start state,

Z ∈ Γ is the start stack symbol,

F ⊆ Q is the set of final states.

Example: Let P = ({q0, q1, q2}, {0,1}, {Z, 0}, δ, q0, Z, {q0}) where

δ(q0, 0, Z) = {(q1, 0Z)}
δ(q1, 0, 0) = {(q1, 00)}
δ(q1, 1, 0) = {(q2, ε)}
δ(q2, 1, 0) = {(q2, ε)}
δ(q2, ε, Z) = {(q0, ε)}

L(P) = { 0^n 1^n | n ≥ 0 }? Why?

A Configuration of P is a triple (q, w, α) ∈ Q × Σ* × Γ*.

A Move (q, aw, Zα) ⊢ (qi, w, γi α) occurs if (qi, γi) ∈ δ(q, a, Z).

An Initial Configuration is (q0, w, Z).

A string w is Accepted by P if (q0, w, Z) ⊢* (q, ε, α) for some q ∈ F, α ∈ Γ*.

The Language Accepted by P, L(P), is the set of all strings P accepts.

Continuing the example of the previous page:

(q0, 0011, Z) ⊢ (q1, 011, 0Z) ⊢ (q1, 11, 00Z) ⊢ (q2, 1, 0Z) ⊢ (q2, ε, Z) ⊢ (q0, ε, ε)

Now, try to build a PDA that accepts L = { w w^R | w ∈ {0, 1}+ }.
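The sequence of moves can be reproduced with a small simulator that searches all reachable configurations. The sketch encodes the example PDA for 0^n 1^n, writes the stack as a string with its top at the left, and accepts by final state once the input is used up; the names `DELTA`, `FINAL` and `accepts` are assumptions for the illustration. The search terminates for this PDA, but a PDA whose ε-moves can grow the stack without bound would need an extra cut-off.

```python
from collections import deque

EPS = ""    # the empty string stands for ε

# Transition function of the example PDA: (state, input or ε, stack top) -> set of (state, push)
DELTA = {
    ("q0", "0", "Z"): {("q1", "0Z")},
    ("q1", "0", "0"): {("q1", "00")},
    ("q1", "1", "0"): {("q2", EPS)},
    ("q2", "1", "0"): {("q2", EPS)},
    ("q2", EPS, "Z"): {("q0", EPS)},
}
FINAL = {"q0"}

def accepts(w, start="q0", bottom="Z"):
    """Breadth-first search over configurations (state, remaining input, stack)."""
    seen = set()
    queue = deque([(start, w, bottom)])
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        q, rest, stack = config
        if not rest and q in FINAL:          # input consumed in a final state
            return True
        if not stack:                        # no move is possible on an empty stack
            continue
        top, below = stack[0], stack[1:]
        for a in ({rest[0], EPS} if rest else {EPS}):
            for q2, push in DELTA.get((q, a, top), ()):
                queue.append((q2, rest[len(a):], push + below))
    return False
```

`accepts("0011")` follows exactly the moves listed above, and the empty string is accepted at once because the start state q0 is final, which is where the n = 0 case of the language comes from.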

δ(q0, 0, Z) = {(q0, 0Z)}

δ(q0, 1, Z) = {(q0, 1Z)}

δ(q0, 0, 1) = {(q0, 01)}

δ(q0, 1, 0) = {(q0, 10)}

δ(q0, 0, 0) = {(q0, 00), (q1, ε)}

δ(q0, 1, 1) = {(q0, 11), (q1, ε)}

δ(q1, 0, 0) = {(q1, ε)}

δ(q1, 1, 1) = {(q1, ε)}

δ(q1, ε, Z) = {(q1, ε)}

δ(q0, 0, 0) and δ(q0, 1, 1) each contain two elements, so this is a nondeterministic PDA.

A Deterministic PDA is one in which

(1) for all q ∈ Q and Z ∈ Γ, whenever δ(q, ε, Z) ≠ ∅, then δ(q, a, Z) = ∅ for all a ∈ Σ;

(2) for all q ∈ Q, a ∈ Σ ∪ {ε} and Z ∈ Γ, δ(q, a, Z) contains at most one element.

Converting a CFG to a PDA:

For each production A → α, make (q, α) ∈ δ(q, ε, A).

For each a ∈ Σ, make (q, ε) ∈ δ(q, a, a).

How to show whether some specific language L is a CFL:

1. If L is NOT a CFL, we may prove it with the pumping lemma for CFLs.

2. If L is a CFL, we may prove it by
   (a) giving a deterministic or nondeterministic pushdown automaton for L (but sometimes a DPDA does not exist, since DPDAs accept only a proper subset of all CFLs), or
   (b) giving a context-free grammar for L.

Theorem: For any CFL L, there exists a constant p depending on L such that every z ∈ L with |z| ≥ p may be written as z = uvwxy such that

1. |vx| ≥ 1 (i.e., v and x are not both ε)

2. |vwx| ≤ p

3. u v^i w x^i y ∈ L for all i ≥ 0. (The proof is similar to that of the pumping lemma for regular languages.)

Prove that L = { a^i b^i c^i | i ≥ 0 } is NOT a CFL.

Proof:

If it were, then by the pumping lemma for CFLs there is a p > 0 such that every z ∈ L with |z| ≥ p can be pumped;

let z = a^p b^p c^p = uvwxy such that

(i) |vx| ≥ 1

(ii) |vwx| ≤ p

(iii) u v^i w x^i y ∈ L for all i ≥ 0.

But (1) suppose vwx = a^j, j ≤ p; then uwy = a^(p-l) b^p c^p ∉ L, where l = |vx| > 0.
This contradicts (iii), which for i = 0 requires uwy ∈ L.
The same argument holds for vwx = b^j or vwx = c^j.

(2) Suppose vwx = a^j b^k, j, k ≤ p; then uwy = a^(p-l') b^(p-l'') c^p ∉ L, since |vx| > 0 means that l' > 0 or l'' > 0 or both.
This again contradicts (iii) for i = 0.
The same argument holds for vwx = b^j c^k.

(3) The case vwx = a^j b^p c^k cannot occur, because |vwx| ≤ p, so vwx cannot contain both a's and c's.

Thus, there are no pumpable substrings.

We conclude that L cannot be context free.

Begin by extending FIRST and FOLLOW to FIRSTk and FOLLOWk:

FIRSTk(α) = { w | ( |w| < k and α ⇒* w ) or ( |w| = k and α ⇒* wx for some x ) }

The domain of FIRSTk is extended to sets of strings in the natural way.

FOLLOWk(A) = { w | S ⇒* αAβ and w ∈ FIRSTk(β) }

G is LL(k) for some fixed k iff whenever there are two leftmost derivations

S ⇒* wAα ⇒ wβα ⇒* wx

and

S ⇒* wAα ⇒ wγα ⇒* wy

and β ≠ γ, then FIRSTk(x) ≠ FIRSTk(y).

S → Abc | aAcb

A → ε | b | c

For left-sentential form S:

FIRST1(Abc) = { b, c }    FIRST1(aAcb) = { a }

For left-sentential form Abc:

FIRST1(bc) = { b }    FIRST1(bbc) = { b }    FIRST1(cbc) = { c }

FIRST2(bc) = { bc }    FIRST2(bbc) = { bb }    FIRST2(cbc) = { cb }
(no multiply defined entries)

In left-sentential form Acb:

FIRST2(cb) = { cb }    FIRST2(bcb) = { bc }    FIRST2(ccb) = { cc }
(no multiply defined entries)

With one symbol of lookahead, A → ε and A → b collide (FIRST1(bc) = FIRST1(bbc)); with two symbols they do not, so the grammar is LL(2).

        FIRST1       FOLLOW1     FIRST2                    FOLLOW2
S       a, b, c      $           ab, ac, bb, bc, cb        $$
A       ε, b, c      b, c        ε, b, c                   bc, cb

Some grammars are not LL(k) for any k. For instance,

S → A | B
A → aAb | 0
B → aBbb | 1

L(G) = { a^n 0 b^n | n ≥ 0 } ∪ { a^n 1 b^(2n) | n ≥ 0 } is not LL(k).

Assume it were. S ⇒ A ⇒* a^n 0 b^n and S ⇒ B ⇒* a^n 1 b^(2n) for any n.

Let k = 2m, m ∈ I+; then FIRSTk(a^(2m) 0 b^(2m)) = FIRSTk(a^(2m) 1 b^(4m)),

but A ≠ B. Since k is arbitrary, G is not LL(k) for any k.


S → AaAb | BbBa
A → ε
B → ε
is LL(1) but not SLR(1).

S → Aa | bAc | dc | bda
A → d
Is it LL(1)? LR(1)? SLR(1)?

S → Aa | bAc | Bc | bBa
A → d
B → d
is LR(1). LL(1)? Not LALR(1).

Write an SLR(1) parser for the simple programming language described below. Each time the parser makes a reduction, print out the production used. (Later this printing will be replaced by the generation of intermediate code, so don't scatter the print statements if that will cause trouble in the future.) If you discover a syntax error in the input, issue an error message and discard the offending token. (We will assume all errors are caused by additional garbage in the input.)

Note: You need not write a program to build FIRST or FOLLOW, the canonical collection of items, or the parse table. These can be done by hand.

<program> ::= <decl-list> <proc-list> <stmt-list>
<decl-list> ::= <decl-list> <decl> | ε
<decl> ::= DECLARE <id-list> : <type>
<id-list> ::= <id-list> , <id> | <id>
<type> ::= INT | CHAR
<proc-list> ::= <proc-list> <proc> | <proc>
<proc> ::= PROC <id> <stmt-list> CORP
<stmt-list> ::= <stmt-list> <stmt> | ε
<stmt> ::= <assign> | <if> | <loop> | <call>
<assign> ::= <id> := <expr>
<if> ::= IF <test> THEN <stmt-list> ELSE <stmt-list> FI
<loop> ::= FOR <id> := <expr> TO <expr> DO <stmt-list> ROF
<call> ::= CALL <id>
<test> ::= <test> AND <alt> | <alt>
<alt> ::= <alt> OR <rel> | <rel>
<rel> ::= <expr> <relop> <expr> | ( <test> ) | NOT ( <test> )
<relop> ::= > | >= | =
<expr> ::= <expr> <addop> <term> | <term>
<addop> ::= + | -
<term> ::= <term> <mulop> <prim> | <prim>
<mulop> ::= * | /
<prim> ::= ( <expr> ) | <id> | <number> | <char>

The <id>, <number>, and <char> are tokens returned by the scanner.

TOKEN TYPES

The TokenType codes returned by the scanner (you should design it!) are shown below. Only <id>s, <number>s, <char>s, and keywords have a TokenValue: the index of the entry in the symbol table. All other tokens have NO_VALUE (0) for their TokenValue.

DECLARE ..... 1        <id> ........ 32
INT ......... 3        <number> .... 33
CHAR ........ 4        <char> ...... 34
PROC ........ 7        , ........... 35
CORP ........ 8        : ........... 36
FOR ......... 9        := .......... 37
TO .......... 10       ( ........... 38
DO .......... 11       ) ........... 39
ROF ......... 12       > ........... 42
IF .......... 13       >= .......... 43
THEN ........ 14       = ........... 44
ELSE ........ 15       + ........... 46
FI .......... 16       - ........... 47
CALL ........ 17       * ........... 48
AND ......... 18       / ........... 49
OR .......... 19       <eof> ....... 100
NOT ......... 20

GRADING

The parser is due on Jan. 7 at the start of class. You should hand in the following:
a. The first ten states in your canonical collection of items.
b. Your complete parser table IN A READABLE FORMAT. If your table can be read easily from the listing you need not turn in another copy. DO NOT TURN IN YOUR ONLY COPY.
c. The compiled listing of your program.
d. The output of a run against PARSER DATA I (will be given later).

Grading will be as follows:
a. 50% - Is the design correct? Was proper use made of the SLR(1) techniques as discussed in class?
b. 40% - Were good programming techniques used? (This includes things like those listed in The Elements of Programming Style by Kernighan and Plauger. Also: minimizing globals; structuring data and control flow; using symbolic constants; mnemonic names; good comments; use of enumerated data types; perspicuity; naturalness of algorithm implementation; consistent, readable indentation; overall readability.)
c. 10% - Does it run properly on the test data provided?
d. 10% per day (or part thereof) penalty for late submission.

1. Define (a) grammar (b) regular grammar (c) context-free grammar. (10%)

2. A language L is called a regular component if L = [formula not recoverable], for some [formula not recoverable]. Show that every infinite regular language contains a component. (10%)

3. (1) Give the three regular operations. (2) Let Σ = {a, b}. Give three regular expressions over Σ. (10%)

4. (1) Give the definition of a pushdown automaton. How does a pushdown automaton move? How does a pushdown automaton accept a word? How does a pushdown automaton accept a language? (2) Give the definition of a linear bounded automaton. (20%)

5. (1) State the uvwxy Theorem. (2) Let Σ = {a, b, c}. Using the uvwxy Theorem, show that the language L = { a^n b^n c^n | n ≥ 2 } is not a context-free language.

6. Answer the following questions. (1) What is the Chomsky Normal Form for context-free grammars? (2) What is the Greibach Normal Form for context-free grammars? (10%)

7. If L1 and L2 are context-free languages, then the catenation L1L2 of the languages L1 and L2 is a context-free language. Prove it. (10%)

8. Describe the so-called Turing Machine if you can. (10%)
