17
제 4 제 제제 제제 제제제제 제제

제 4 장 어휘 분석 컴파일러 입문. Lexical Analysis the process by which the compiler groups certain strings of characters into individual tokens. Lexical Analyzer

Embed Size (px)

Citation preview

Page 1: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

제 4 장 어휘 분석

컴파일러 입문

Page 2: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

SourceProgram

Lexical AnalyzerTokenStream

4.1 서 론

Lexical Analysis the process by which the compiler groups certain

strings of characters into individual tokens.

Lexical Analyzer Scanner Lexer

Text p.130

Page 3: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Token 문법적으로 의미 있는 최소 단위

Token - a single syntactic entity(terminal symbol).

Token Number - string 처리의 효율성 위한 integer number.

Token Value - numeric value or string value.

ex) if ( a > 10 ) ...

Token Number : 32 7 4 25 5 8 Token Value : 0 0 ‘a’ 0 10 0

Page 4: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Token classes Special form - language designer

1. Keyword --- const, else, if, int, ...2. Operator symbols --- +, -, *, /, ++, -- etc.3. Delimiters --- ;, ,, (, ), [, ] etc.

General form - programmer4. identifier --- stk, ptr, sum, ...5. constant --- 526, 3.0, 0.1234e-10, ‘c’, “string” etc.

Token Structure - represented by regular expres-sion.

ex) id = (l + _)( l + d + _)*

Page 5: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Interaction of Lexical Analyzer with Parser Lexical Analyzer is the procedure of Syntax Ana-

lyzer.

L.A. Finite Automata.

S.A. Pushdown Automata.

Token type scanner 가 parser 에게 넘겨주는 토큰 형태 .

(token number, token value)

ex) if ( x > y ) x = 10 ; (32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)

SourceProgram

Lexical Analyzer(=Scanner)

Shift(get-token)ReduceAcceptError

Syntax Analyzer(=Parser)

get token

token

Page 6: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

The reasons for separating the analysis phase of compiling into lexical analysis(scanning) and syntax analysis(parsing).

1. modular construction - simpler design.2. compiler efficiency is improved.3. compiler portability is enhanced.

Parsing table Parser 의 행동 (Shift, Reduce, Accept, Error) 을 결정 .

Token number 는 Parsing table 의 index.

Tokennum State

Page 7: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Symbol table 의 용도 L.A 와 S.A 시 identifier 에 관한 정보를 수집하여 저장 . Semantic analysis 와 Code generation 시에 사용 . name + attributes

ex) Hashed symbol table

- chapter 12 참조

attributesname

symbol tablebucket

Page 8: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

4.2 토큰 인식

Specification of token structure - RE Specification of PL - CFG Scanner design steps

1. describe the structure of tokens in re.2. or, directly design a transition diagram for the tokens.3. and program a scanner according to the diagram.4. moreover, we verify the scanner action through regular

language theory.

Character classification letter : a | b | c... | z | A | B | C |…| Z l digit : 0 | 1 | 2... | 9 d special character : + | - | * | / | . | , | ...

Page 9: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

S Astartl, _

l, d, _

4.2.1 Identifier Recognition

Transition diagram

Regular grammar S lA | _A A lA | dA | _A | ε

Regular expression S = lA + _A = (l + _)A A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*

S = (l + _)( l + d + _)*

Page 10: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Form : 10 진수 , 8 진수 , 16 진수로 구분되어진다 . 10 진수 : 0 이 아닌 수 시작

8 진수 : 0 으로 시작 , 16 진수 : 0x, 0X 로 시작

Transition diagram

4.2.2 Integer number Recognition

S An

D

start

B C

E

0o

x, Xh

o

h

d

n : non-zero digito : octal digit h : hexa digit

Page 11: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Regular grammar

S nA | 0B A dA | ε B oC | xD | XD | ε

C oC | ε D hE E hE | ε

Regular expression E = hE + ε = h*ε = h* D = hE = hh* = h+

C = oC + ε = o* B = oC + xD + XD + ε = o+ + (x + X)D = o+ + (x + X)h+ + ε A = dA + ε = d*

S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* + 0 + 0o+ + 0(x + X)h+

∴ S = nd* + 0 + 0o+ + 0(x + X)h+

Page 12: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

S Cd

start o

d

A B D E

G

F

.

d e

d

d

+

-

d d

d

4.2.3 Real number Recognition

Form : Fixed-point number & Floating-point number Transition diagram

Regular grammar S dA D dE | +F | -G A dA | .B E dE |ε B dC F dE C dC | eD |ε G dE

Page 13: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Regular expressionE = dE + ε = d* F = dE = dd* = d+ G = dE = dd* = d+

D = dE + '+'F + -G = dd* + '+'d+ + -d +

= d+ + '+'d+ + -d+ = (ε + '+' +-)d +

C = dC + eD + ε = dC+e(ε + '+' +-)d+ + e

= d*(e(ε + '+' +-) d+ + ε)

B = dC=dd*(e(ε + '+' +-)d+ +ε)

= d++(e(ε + '+' +-) d+ +ε)

A = dA + .B

= d*.d+(e(ε + '+' +-)d+ + ε)

S = dA

= dd*. d+(e(ε + '+' +-) d+ +ε)

= d+.d+(e(ε + '+' +-) d+ + ε)

= d+.d++ d+.d+e(ε + '+' +-) d+

참고 Terminal + 를 ‘ +’ 로 표기 .

Page 14: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Form : a sequence of characters between a pair of double

quotes.

Transition diagram

where, a = char_set - {", \} and c = char_set

Regular grammar

S "A A aA | "B | \C B ε C cA

Bstart" "

A

c\

a

S

C

4.2.4 String Constant Recognition

Page 15: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Regular expression

A = aA + " B + \C

= aA + " + \cA

= (a + \c)A + "

= (a + \c)* "

S = " A

= "(a + \c)*"

∴ S = "(a + \c)* "

Page 16: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

a

start S /*

A DB C

*

*/

b

4.2.5 Comment Recognition

Transition diagram

where, a = char_set - {*} and b = char_set - {*, /}.

Regular grammarS /AA *BB aB | *CC *C | bB | /DD ε

Page 17: 제 4 장 어휘 분석 컴파일러 입문.  Lexical Analysis  the process by which the compiler groups certain strings of characters into individual tokens.  Lexical Analyzer

Regular expression

C = *C + bB + /D = **(bB + /)

B = aB + ***(bB + /)

= aB + ***bB + ***/

= (a + *** b)B + ***/= (a + ***b)****/

A = *B = *(a + ***b)****/

S = /A = /* (a + ***b)****/

A program which recognizes a comment statement.

do {

while (ch != '*') ch = getchar(); ch = getchar();

} while (ch != '/');