Upload
rafe-patterson
View
239
Download
6
Embed Size (px)
Citation preview
제 4 장 어휘 분석
컴파일러 입문
SourceProgram
Lexical AnalyzerTokenStream
4.1 서 론
Lexical Analysis the process by which the compiler groups certain
strings of characters into individual tokens.
Lexical Analyzer Scanner Lexer
Text p.130
Token 문법적으로 의미 있는 최소 단위
Token - a single syntactic entity(terminal symbol).
Token Number - string 처리의 효율성 위한 integer number.
Token Value - numeric value or string value.
ex) if ( a > 10 ) ...
Token Number : 32 7 4 25 5 8 Token Value : 0 0 ‘a’ 0 10 0
Token classes Special form - language designer
1. Keyword --- const, else, if, int, ...2. Operator symbols --- +, -, *, /, ++, -- etc.3. Delimiters --- ;, ,, (, ), [, ] etc.
General form - programmer4. identifier --- stk, ptr, sum, ...5. constant --- 526, 3.0, 0.1234e-10, ‘c’, “string” etc.
Token Structure - represented by regular expres-sion.
ex) id = (l + _)( l + d + _)*
Interaction of Lexical Analyzer with Parser Lexical Analyzer is the procedure of Syntax Ana-
lyzer.
L.A. Finite Automata.
S.A. Pushdown Automata.
Token type scanner 가 parser 에게 넘겨주는 토큰 형태 .
(token number, token value)
ex) if ( x > y ) x = 10 ; (32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)
SourceProgram
Lexical Analyzer(=Scanner)
Shift(get-token)ReduceAcceptError
Syntax Analyzer(=Parser)
get token
token
The reasons for separating the analysis phase of compiling into lexical analysis(scanning) and syntax analysis(parsing).
1. modular construction - simpler design.2. compiler efficiency is improved.3. compiler portability is enhanced.
Parsing table Parser 의 행동 (Shift, Reduce, Accept, Error) 을 결정 .
Token number 는 Parsing table 의 index.
Tokennum State
Symbol table 의 용도 L.A 와 S.A 시 identifier 에 관한 정보를 수집하여 저장 . Semantic analysis 와 Code generation 시에 사용 . name + attributes
ex) Hashed symbol table
- chapter 12 참조
attributesname
symbol tablebucket
4.2 토큰 인식
Specification of token structure - RE Specification of PL - CFG Scanner design steps
1. describe the structure of tokens in re.2. or, directly design a transition diagram for the tokens.3. and program a scanner according to the diagram.4. moreover, we verify the scanner action through regular
language theory.
Character classification letter : a | b | c... | z | A | B | C |…| Z l digit : 0 | 1 | 2... | 9 d special character : + | - | * | / | . | , | ...
S Astartl, _
l, d, _
4.2.1 Identifier Recognition
Transition diagram
Regular grammar S lA | _A A lA | dA | _A | ε
Regular expression S = lA + _A = (l + _)A A = lA + dA + _A + ε = (l + d + _)A + ε = (l + d + _)*
S = (l + _)( l + d + _)*
Form : 10 진수 , 8 진수 , 16 진수로 구분되어진다 . 10 진수 : 0 이 아닌 수 시작
8 진수 : 0 으로 시작 , 16 진수 : 0x, 0X 로 시작
Transition diagram
4.2.2 Integer number Recognition
S An
D
start
B C
E
0o
x, Xh
o
h
d
n : non-zero digito : octal digit h : hexa digit
Regular grammar
S nA | 0B A dA | ε B oC | xD | XD | ε
C oC | ε D hE E hE | ε
Regular expression E = hE + ε = h*ε = h* D = hE = hh* = h+
C = oC + ε = o* B = oC + xD + XD + ε = o+ + (x + X)D = o+ + (x + X)h+ + ε A = dA + ε = d*
S = nA + 0B = nd* + 0(o+ + (x + X)h+ + ε) = nd* + 0 + 0o+ + 0(x + X)h+
∴ S = nd* + 0 + 0o+ + 0(x + X)h+
S Cd
start o
d
A B D E
G
F
.
d e
d
d
+
-
d d
d
4.2.3 Real number Recognition
Form : Fixed-point number & Floating-point number Transition diagram
Regular grammar S dA D dE | +F | -G A dA | .B E dE |ε B dC F dE C dC | eD |ε G dE
Regular expressionE = dE + ε = d* F = dE = dd* = d+ G = dE = dd* = d+
D = dE + '+'F + -G = dd* + '+'d+ + -d +
= d+ + '+'d+ + -d+ = (ε + '+' +-)d +
C = dC + eD + ε = dC+e(ε + '+' +-)d+ + e
= d*(e(ε + '+' +-) d+ + ε)
B = dC=dd*(e(ε + '+' +-)d+ +ε)
= d++(e(ε + '+' +-) d+ +ε)
A = dA + .B
= d*.d+(e(ε + '+' +-)d+ + ε)
S = dA
= dd*. d+(e(ε + '+' +-) d+ +ε)
= d+.d+(e(ε + '+' +-) d+ + ε)
= d+.d++ d+.d+e(ε + '+' +-) d+
참고 Terminal + 를 ‘ +’ 로 표기 .
Form : a sequence of characters between a pair of double
quotes.
Transition diagram
where, a = char_set - {", \} and c = char_set
Regular grammar
S "A A aA | "B | \C B ε C cA
Bstart" "
A
c\
a
S
C
4.2.4 String Constant Recognition
Regular expression
A = aA + " B + \C
= aA + " + \cA
= (a + \c)A + "
= (a + \c)* "
S = " A
= "(a + \c)*"
∴ S = "(a + \c)* "
a
start S /*
A DB C
*
*/
b
4.2.5 Comment Recognition
Transition diagram
where, a = char_set - {*} and b = char_set - {*, /}.
Regular grammarS /AA *BB aB | *CC *C | bB | /DD ε
Regular expression
C = *C + bB + /D = **(bB + /)
B = aB + ***(bB + /)
= aB + ***bB + ***/
= (a + *** b)B + ***/= (a + ***b)****/
A = *B = *(a + ***b)****/
S = /A = /* (a + ***b)****/
A program which recognizes a comment statement.
do {
while (ch != '*') ch = getchar(); ch = getchar();
} while (ch != '/');