Chapter 4. Lexical Analysis

Introduction to Compilers

Figure: source program → lexical analyzer → token stream

4.1 Introduction

Lexical analysis: the process by which the compiler groups certain strings of characters into individual tokens.

Lexical analyzer = scanner = lexer

Text p.130

Token: the smallest grammatically meaningful unit.

Token: a single syntactic entity (terminal symbol).
Token number: an integer assigned to each token so that tokens can be processed efficiently instead of as strings.
Token value: a numeric value or a string value.

ex) if ( a > 10 ) ...

Token:          if   (   a     >    10   )
Token number:   32   7   4     25   5    8
Token value:    0    0   'a'   0    10   0

Token classes
Special form (fixed by the language designer)
1. Keyword --- const, else, if, int, ...
2. Operator symbols --- +, -, *, /, ++, --, etc.
3. Delimiters --- ;, ,, (, ), [, ], etc.
General form (chosen by the programmer)
4. Identifier --- stk, ptr, sum, ...
5. Constant --- 526, 3.0, 0.1234e-10, 'c', "string", etc.

Token structure: represented by a regular expression.

ex) id = (l + _)(l + d + _)*

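Interaction of the lexical analyzer with the parser
The lexical analyzer is a procedure called by the syntax analyzer: the parser calls it whenever it needs the next token.
The lexical analyzer (L.A.) is modeled by finite automata; the syntax analyzer (S.A.) by pushdown automata.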

Token type: the form in which the scanner passes each token to the parser:

(token number, token value)

ex) if ( x > y ) x = 10 ;
(32,0) (7,0) (4,x) (25,0) (4,y) (8,0) (4,x) (23,0) (5,10) (20,0)
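The (token number, token value) pair can be sketched in C as a struct, together with a minimal get_token routine that the parser would call once per token. This is an illustration under stated assumptions: only the token numbers appearing in the example are defined, TK_EOF and TK_ERROR are hypothetical additions, and the scanner handles only the keyword if, identifiers, decimal constants, and the symbols ( ) > = ;.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token numbers from the example; TK_ERROR and TK_EOF are hypothetical
   additions for unknown characters and end of input. */
enum {
    TK_ERROR = -1, TK_EOF = 0, TK_IDENT = 4, TK_CONST = 5,
    TK_LPAREN = 7, TK_RPAREN = 8, TK_SEMI = 20, TK_ASSIGN = 23,
    TK_GT = 25, TK_IF = 32
};

/* Token type passed from the scanner to the parser:
   (token number, token value). */
struct token {
    int number;     /* token number */
    char name[32];  /* value of an identifier (truncated to 31 chars) */
    int num;        /* value of a constant (no overflow check) */
};

/* Scans one token at *src and advances *src past it.
   The parser calls this each time it needs the next token. */
struct token get_token(const char **src) {
    struct token t = {TK_EOF, "", 0};
    const char *p = *src;
    assert(p != NULL);

    while (isspace((unsigned char)*p)) p++;    /* skip blanks */
    if (*p == '\0') {
        /* end of input: number stays TK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        size_t n = 0;                           /* identifier or keyword */
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof t.name - 1) t.name[n++] = *p;
            p++;
        }
        t.name[n] = '\0';
        if (strcmp(t.name, "if") == 0) {        /* keyword: value is 0 */
            t.number = TK_IF;
            t.name[0] = '\0';
        } else {
            t.number = TK_IDENT;
        }
    } else if (isdigit((unsigned char)*p)) {    /* decimal constant */
        t.number = TK_CONST;
        while (isdigit((unsigned char)*p)) t.num = t.num * 10 + (*p++ - '0');
    } else {                                    /* single-character symbols */
        switch (*p++) {
        case '(': t.number = TK_LPAREN; break;
        case ')': t.number = TK_RPAREN; break;
        case '>': t.number = TK_GT;     break;
        case '=': t.number = TK_ASSIGN; break;
        case ';': t.number = TK_SEMI;   break;
        default:  t.number = TK_ERROR;  break;
        }
    }
    *src = p;
    return t;
}
```

Calling get_token repeatedly on "if ( x > y ) x = 10 ;" yields the token numbers 32 7 4 25 4 8 4 23 5 20, with "x" and "y" as identifier values and 10 as the constant value. The scanner keeps no state besides the input pointer, so the parser drives it one token at a time.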

Figure: source program → lexical analyzer (= scanner) → syntax analyzer (= parser). The parser requests each token with get token; its actions are Shift (get token), Reduce, Accept, and Error.

Reasons for separating the analysis phase of compilation into lexical analysis (scanning) and syntax analysis (parsing):

1. Modular construction: a simpler design.
2. Compiler efficiency is improved.
3. Compiler portability is enhanced.

Parsing table: determines the parser's action (Shift, Reduce, Accept, or Error).

The token number is used as an index into the parsing table.

Figure: parsing table indexed by state and token number.

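Purpose of the symbol table: during lexical and syntax analysis, information about identifiers is collected and stored in it; that information is used during semantic analysis and code generation. Each entry consists of a name plus attributes.

ex) Hashed symbol table

- see Chapter 12

Figure: hashed symbol table (buckets pointing to symbol table entries, each holding a name and its attributes).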
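A hashed symbol table with chaining might look like the following sketch. The hash function (h = 31h + c), the table size of 211, the names lookup and insert, and the single type attribute are all assumptions made for illustration; a real compiler stores more attributes per entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211  /* number of buckets: an arbitrary choice */

/* A symbol table entry: name + attributes. Only one placeholder
   attribute is kept; a compiler would also store scope, location, etc. */
struct symbol {
    char *name;
    int type;             /* hypothetical attribute */
    struct symbol *next;  /* next entry in the same bucket */
};

static struct symbol *bucket[NBUCKETS];  /* heads of the bucket chains */

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s != '\0') h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for name, or NULL if name is not in the table. */
struct symbol *lookup(const char *name) {
    assert(name != NULL);
    for (struct symbol *p = bucket[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0) return p;
    return NULL;
}

/* Returns the existing entry for name, or adds a new entry at the head
   of its bucket. Returns NULL if memory allocation fails. */
struct symbol *insert(const char *name, int type) {
    struct symbol *p = lookup(name);
    if (p != NULL) return p;
    size_t len = strlen(name) + 1;
    p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->name = malloc(len);
    if (p->name == NULL) { free(p); return NULL; }
    memcpy(p->name, name, len);
    p->type = type;
    unsigned h = hash(name);
    p->next = bucket[h];
    bucket[h] = p;
    return p;
}
```

Because insert leaves an existing entry unchanged, the scanner can call it each time it meets an identifier and later phases can fill in or read the attributes through lookup.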

4.2 Token Recognition

Specification of token structure: regular expressions (RE)
Specification of a programming language (PL): context-free grammars (CFG)

Scanner design steps
1. Describe the structure of tokens with regular expressions.
2. Or, directly design a transition diagram for the tokens.
3. Program a scanner according to the diagram.
4. Verify the scanner's behavior using regular language theory.

Character classification
letter: a | b | c | ... | z | A | B | C | ... | Z (denoted l)
digit: 0 | 1 | 2 | ... | 9 (denoted d)
special character: + | - | * | / | . | , | ...

4.2.1 Identifier Recognition

Transition diagram (start state S, final state A): S → A on l or _; A → A on l, d, or _.

Regular grammar
S → lA | _A
A → lA | dA | _A | ε

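Regular expression (each equation is solved with the rule $X = \alpha X + \beta \Rightarrow X = \alpha^*\beta$)

$$
\begin{aligned}
S &= lA + \_A = (l + \_)A \\
A &= lA + dA + \_A + \varepsilon = (l + d + \_)A + \varepsilon = (l + d + \_)^*
\end{aligned}
$$

$$S = (l + \_)(l + d + \_)^*$$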

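4.2.2 Integer Number Recognition

Form: integers are divided into decimal, octal, and hexadecimal numbers.
Decimal: begins with a nonzero digit.
Octal: begins with 0.
Hexadecimal: begins with 0x or 0X.

n: nonzero digit, o: octal digit, h: hexadecimal digit

Transition diagram (start state S, final states A, B, C, E): S → A on n; A → A on d; S → B on 0; B → C on o; C → C on o; B → D on x or X; D → E on h; E → E on h.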

Regular grammar
S → nA | 0B
A → dA | ε
B → oC | xD | XD | ε
C → oC | ε
D → hE
E → hE | ε

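Regular expression

$$
\begin{aligned}
E &= hE + \varepsilon = h^*\varepsilon = h^* \\
D &= hE = hh^* = h^+ \\
C &= oC + \varepsilon = o^* \\
B &= oC + xD + XD + \varepsilon = o^+ + (x + X)D + \varepsilon = o^+ + (x + X)h^+ + \varepsilon \\
A &= dA + \varepsilon = d^* \\
S &= nA + 0B = nd^* + 0\bigl(o^+ + (x + X)h^+ + \varepsilon\bigr) = nd^* + 0 + 0o^+ + 0(x + X)h^+
\end{aligned}
$$

∴ $S = nd^* + 0 + 0o^+ + 0(x + X)h^+$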


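4.2.3 Real Number Recognition

Form: fixed-point numbers and floating-point numbers.

Transition diagram (start state S, final states C and E): S → A on d; A → A on d; A → B on .; B → C on d; C → C on d; C → D on e; D → E on d; D → F on +; D → G on -; E → E on d; F → E on d; G → E on d.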

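Regular expression

$$
\begin{aligned}
E &= dE + \varepsilon = d^* \\
F &= dE = dd^* = d^+ \\
G &= dE = dd^* = d^+ \\
D &= dE + \texttt{'+'}F + \texttt{-}G = dd^* + \texttt{'+'}d^+ + \texttt{-}d^+ = d^+ + \texttt{'+'}d^+ + \texttt{-}d^+ = (\varepsilon + \texttt{'+'} + \texttt{-})d^+ \\
C &= dC + eD + \varepsilon = dC + e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon = d^*\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) \\
B &= dC = dd^*\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) = d^+\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) \\
A &= dA + .B = d^*.B = d^*.d^+\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) \\
S &= dA = dd^*.d^+\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) = d^+.d^+\bigl(e(\varepsilon + \texttt{'+'} + \texttt{-})d^+ + \varepsilon\bigr) \\
  &= d^+.d^+ + d^+.d^+e(\varepsilon + \texttt{'+'} + \texttt{-})d^+
\end{aligned}
$$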

Note: the terminal + is written '+' to distinguish it from the union operator +.

4.2.4 String Constant Recognition

Form: a sequence of characters between a pair of double quotes.

Transition diagram (start state S, final state B): S → A on "; A → A on a; A → C on \; C → A on c; A → B on ".
where a = char_set - {", \} and c = char_set.

Regular grammar
S → "A
A → aA | "B | \C
B → ε
C → cA

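Regular expression

$$
\begin{aligned}
A &= aA + \texttt{"}B + \backslash C = aA + \texttt{"} + \backslash cA = (a + \backslash c)A + \texttt{"} = (a + \backslash c)^*\texttt{"} \\
S &= \texttt{"}A = \texttt{"}(a + \backslash c)^*\texttt{"}
\end{aligned}
$$

∴ $S = \texttt{"}(a + \backslash c)^*\texttt{"}$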

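4.2.5 Comment Recognition

Transition diagram (start state S, final state D): S → A on /; A → B on *; B → B on a; B → C on *; C → C on *; C → B on b; C → D on /.
where a = char_set - {*} and b = char_set - {*, /}.

Regular grammar
S → /A
A → *B
B → aB | *C
C → *C | bB | /D
D → ε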

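Regular expression (here $\texttt{*}$ is the terminal asterisk and a superscript $^*$ is closure)

$$
\begin{aligned}
C &= \texttt{*}C + bB + /D = \texttt{*}^*(bB + /) \\
B &= aB + \texttt{*}C = aB + \texttt{*}\texttt{*}^*(bB + /) \\
  &= aB + \texttt{*}\texttt{*}^*bB + \texttt{*}\texttt{*}^*/ \\
  &= (a + \texttt{*}\texttt{*}^*b)B + \texttt{*}\texttt{*}^*/ = (a + \texttt{*}\texttt{*}^*b)^*\,\texttt{*}\texttt{*}^*/ \\
A &= \texttt{*}B = \texttt{*}(a + \texttt{*}\texttt{*}^*b)^*\,\texttt{*}\texttt{*}^*/ \\
S &= /A = /\texttt{*}(a + \texttt{*}\texttt{*}^*b)^*\,\texttt{*}\texttt{*}^*/
\end{aligned}
$$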

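A program that recognizes a comment. It assumes the opening /* has already been read and ch holds the next character; ch must be an int so that it can hold EOF.

    do {
        while (ch != '*' && ch != EOF)  /* skip to the next '*' */
            ch = getchar();
        if (ch == EOF) break;           /* unterminated comment */
        ch = getchar();                 /* character after '*' */
    } while (ch != '/');                /* a '/' right after '*' ends the comment */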
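For testing, the same automaton can be written over a string instead of getchar. In this sketch, comment_length is a hypothetical name; it returns the comment's length so that a scanner could skip it. It also checks the opening delimiter itself and rejects /*/, which the regular expression does not match.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the comment at the start of s, counting both
   delimiters, or 0 if s does not start with a complete comment.
   The states follow the transition diagram (S, A, B, C, D). */
size_t comment_length(const char *s) {
    const char *p = s;
    assert(s != NULL);
    if (p[0] != '/' || p[1] != '*') return 0;    /* S -> A -> B */
    p += 2;
    for (;;) {
        while (*p != '*' && *p != '\0') p++;       /* B -> B on a */
        if (*p == '\0') return 0;                  /* unterminated */
        while (*p == '*') p++;                     /* B -> C, C -> C on star */
        if (*p == '/') return (size_t)(p - s) + 1; /* C -> D on slash */
        if (*p == '\0') return 0;                  /* unterminated */
        p++;                                       /* C -> B on b */
    }
}
```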
