Chapter 7: Parsing

Chapter 7: Parsing - Kangwonleeck/NLP/07_parsing.pdf · 2019-11-20 · Chart parsing • Chart – a table that records the progress of parsing – Bookkeeping mechanism – Keep track




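Overview of Parsing

• Parsing
– The process of analyzing the structure of an input sentence

• Grammar
– A system that defines the sentence structures permitted in a language

• Parsing techniques
– Methods for analyzing the structure of a sentence according to a grammar
– Chart parsing
– …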


Sentence Structure and Trees

• Sentence: “John ate the apple.”

• List representation

  (S (NP (N John))
     (VP (V ate)
         (NP (DET the) (N apple))))

• Tree representation

  S
    NP
      N  John
    VP
      V  ate
      NP
        DET  the
        N  apple

• Meaning
– S consists of an NP and a VP.
– The first NP consists of the name “John” (labeled N in the tree).
– The VP consists of the verb “ate” followed by another NP.
– That NP consists of the determiner “the” and the noun “apple”.


Context-Free Grammar (CFG)

• Components of a grammar
– Words and part-of-speech symbols (terminals)
  • ate, the, apple, etc.
  • V, DET, N, etc.
– Phrase symbols (nonterminals)
  • NP, VP, S, etc.
– Grammar rules (productions)
  S → NP VP
  NP → N | DET N
  VP → V | V NP | V PP
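One possible way to hold these productions in a program is a mapping from each left-hand side to its alternatives. The dictionary layout and the helper `rhs_symbols` are assumptions chosen for this sketch.

```python
# Productions from the slide, keyed by LHS; each alternative is a list of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["DET", "N"]],
    "VP": [["V"], ["V", "NP"], ["V", "PP"]],
}

def rhs_symbols(grammar):
    """Every symbol that occurs on some right-hand side."""
    return {sym for alternatives in grammar.values()
            for rhs in alternatives for sym in rhs}

# Symbols used on a right-hand side that have no rule of their own here.
unexpanded = rhs_symbols(GRAMMAR) - set(GRAMMAR)
print(sorted(unexpanded))
```

In this fragment the unexpanded symbols are the part-of-speech symbols, which the slide treats as terminals, plus PP, for which the slide gives no rule.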


Top-Down Parsing

• Top-down parsing
– Proceeds from the start symbol S toward the input sentence
– Repeatedly replaces the LHS (left-hand side) symbol of a grammar rule with its RHS (right-hand side) symbols

• Grammar G
  S → NP VP
  NP → N | DET N
  VP → V NP
  N → John | apple
  DET → the
  V → ate

• Input sentence: John ate the apple

• Example of top-down parsing (leftmost derivation)
  S ⇒ NP VP ⇒ N VP ⇒ John VP ⇒ John V NP ⇒ John ate NP ⇒ John ate DET N ⇒ John ate the N ⇒ John ate the apple


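Top-Down Parsing Process

• Grammar G
  S → NP VP
  NP → N
  NP → DET N
  VP → V NP
  N → John
  V → ate
  DET → the
  N → apple

• Input sentence: John ate the apple

• Resulting parse tree, built from S downward
  (S (NP (N John)) (VP (V ate) (NP (DET the) (N apple))))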
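The leftmost derivation for this grammar can be reproduced with a small backtracking sketch. The function name `leftmost_derivation` and the pruning rule are assumptions for illustration; the pruning relies on this grammar having no empty productions.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["DET", "N"]],
    "VP": [["V", "NP"]],
    "N": [["John"], ["apple"]],
    "DET": [["the"]],
    "V": [["ate"]],
}

def leftmost_derivation(form, words, steps=None):
    """Expand the leftmost nonterminal until the form equals the input words."""
    steps = [form] if steps is None else steps
    if len(form) > len(words):
        return None  # every symbol yields at least one word, so this cannot match
    for pos, sym in enumerate(form):
        if sym in GRAMMAR:
            break  # leftmost nonterminal found
        if sym != words[pos]:
            return None  # a derived word disagrees with the input
    else:
        # Only terminals remain and they match a prefix of the input.
        return steps if len(form) == len(words) else None
    for rhs in GRAMMAR[sym]:  # try each alternative, backtracking on failure
        new_form = form[:pos] + rhs + form[pos + 1:]
        result = leftmost_derivation(new_form, words, steps + [new_form])
        if result is not None:
            return result
    return None

words = "John ate the apple".split()
for step in leftmost_derivation(["S"], words):
    print(" ".join(step))
```

The printed sentential forms follow the derivation shown for top-down parsing. The alternative NP → N is tried first for the object and abandoned when "the" fails to match, which is the kind of backtracking that chart parsing later avoids repeating.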


Bottom-Up Parsing

• Bottom-up parsing
– Proceeds from the input sentence toward the start symbol S
– Repeatedly replaces the RHS of a grammar rule with its LHS

• Grammar G
  S → NP VP
  NP → N | DET N
  VP → V NP
  N → John | apple
  DET → the
  V → ate

• Input sentence: John ate the apple

• Example of bottom-up parsing (reverse rightmost derivation)
  John ate the apple ⊢ N ate the apple ⊢ NP ate the apple ⊢ NP V the apple ⊢ NP V DET apple ⊢ NP V DET N ⊢ NP V NP ⊢ NP VP ⊢ S


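Bottom-Up Parsing Process

• Grammar G
  S → NP VP
  NP → N
  NP → DET N
  VP → V NP
  N → John
  V → ate
  DET → the
  N → apple

• Input sentence: John ate the apple

• Resulting parse tree, built from the words upward
  (S (NP (N John)) (VP (V ate) (NP (DET the) (N apple))))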
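The reduction sequence for this grammar can be sketched as a greedy shift-reduce loop. This is a simplified illustration without backtracking; the preference for longer right-hand sides and the name `shift_reduce` are assumptions that happen to suffice for this small grammar.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("NP", ("N",)),
    ("N", ("John",)),
    ("N", ("apple",)),
    ("DET", ("the",)),
    ("V", ("ate",)),
]
# Assumption: try longer right-hand sides first so DET N wins over N alone.
BY_LENGTH = sorted(RULES, key=lambda rule: -len(rule[1]))

def shift_reduce(words):
    """Return the final stack and the sequence of forms after each reduction."""
    stack, rest = [], list(words)
    trace = [" ".join(rest)]
    while True:
        for lhs, rhs in BY_LENGTH:
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]          # reduce: replace RHS by LHS
                trace.append(" ".join(stack + rest))
                break
        else:
            if not rest:
                break                              # nothing left to reduce or shift
            stack.append(rest.pop(0))              # shift the next word
    return stack, trace

stack, trace = shift_reduce("John ate the apple".split())
print(" ⊢ ".join(trace))
```

The parse succeeds when the stack ends up holding only S. A greedy strategy like this can fail on ambiguous grammars, where a real parser needs backtracking or a chart.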


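Ambiguity in Natural Language (1)

• Structural ambiguity
– The property that a single sentence can be analyzed as more than one structure

• Example of structural ambiguity
  Grammar G
  S → NP VP
  NP → N | DET N | NP PP
  VP → V NP | VP PP
  PP → P NP

  Input sentence: John saw Mary in the park.

• Parse 1 (the PP attaches to the VP)
  (S (NP (N John))
     (VP (VP (V saw) (NP (N Mary)))
         (PP (P in) (NP (DET the) (N park)))))

• Parse 2 (the PP attaches to the NP “Mary”)
  (S (NP (N John))
     (VP (V saw)
         (NP (NP (N Mary))
             (PP (P in) (NP (DET the) (N park))))))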
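To check that this grammar yields exactly two structures for the sentence, the parses can be enumerated over word spans. This is a minimal sketch: the lexicon assigning parts of speech to words and the function name `parses` are assumptions added for the illustration.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
# Hypothetical lexicon: the slide gives the tags only in the tree figure.
LEXICON = {"John": "N", "saw": "V", "Mary": "N", "in": "P", "the": "DET", "park": "N"}
WORDS = tuple("John saw Mary in the park".split())

@lru_cache(maxsize=None)
def parses(sym, i, j):
    """All bracketed trees in which sym spans WORDS[i:j]."""
    trees = []
    if j - i == 1 and LEXICON.get(WORDS[i]) == sym:
        trees.append(f"({sym} {WORDS[i]})")
    for rhs in RULES.get(sym, ()):
        if len(rhs) == 1:                       # unit rule such as NP → N
            trees += [f"({sym} {t})" for t in parses(rhs[0], i, j)]
        else:                                   # binary rule: try every split point
            left, right = rhs
            for k in range(i + 1, j):
                for l_tree in parses(left, i, k):
                    for r_tree in parses(right, k, j):
                        trees.append(f"({sym} {l_tree} {r_tree})")
    return tuple(trees)

for tree in parses("S", 0, len(WORDS)):
    print(tree)
```

The two printed trees differ only in where the PP "in the park" attaches, which is the structural ambiguity described above.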


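Ambiguity in Natural Language (2)

• Lexical ambiguity
– A single word can be used as more than one part of speech
– Lexical ambiguity gives rise to structural ambiguity

• Example of lexical ambiguity
  Grammar G
  S → NP VP
  NP → D N | A N | N
  VP → V | VP NP | VP PP
  PP → P NP

  Input sentence: Time flies like an arrow

• Parse 1: Time/N flies/V like/P an/D arrow/N
– “Time” is the subject NP, “flies” is the verb, and “like an arrow” is a PP inside the VP

• Parse 2: Time/A flies/N like/V an/D arrow/N
– “Time flies” is the subject NP, “like” is the verb, and “an arrow” is the object NP inside the VP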


Chart Parsing

• Chart
– A table that records the progress of parsing
– A bookkeeping mechanism
– Keeps track of constituents that were built during part of the parse but may be used by other rules

• Chart parsing
– Parsing that uses a chart
  • Eliminates the overhead of repeating the same analysis because of backtracking
– Makes no commitment to a specific parsing strategy
  • top-down or bottom-up
  • left-to-right, right-to-left, or island-driven
– Uses general CFG parsing algorithms (CYK, the Earley algorithm, etc.)


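Advantages of Chart Parsing

• A grammar G
  S → NP VP
  S → NP VP PP
  NP → DET N
  NP → NP PP
  VP → V
  PP → P NP

• Sentence: The rabbit with a saw nibbled on an orange

• Traditional parsing (with backtracking)
– Apply the rule S → NP VP; when it fails, backtrack,
– then parse with the rule S → NP VP PP
  • The NP and VP in this rule are the same as those already analyzed under S → NP VP, yet they must be analyzed again from scratch (inefficient)

• Chart parsing
– Even if the rule S → NP VP fails, the NP and VP structures built as partial results are not discarded but recorded in the chart
– Under the rule S → NP VP PP, the NP and VP need not be reanalyzed; the entries recorded in the chart are reused as they are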
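The reuse described above can be imitated with memoization, where the cache plays the role of the chart. This is a sketch under stated assumptions: the lexicon, the lowercased sentence, and the helper names `derives` and `matches` are not in the slides, and a real chart parser stores constituents rather than yes/no answers.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP"), ("NP", "VP", "PP")],
    "NP": [("DET", "N"), ("NP", "PP")],
    "VP": [("V",)],
    "PP": [("P", "NP")],
}
# Hypothetical lexicon: part-of-speech tags are assumed, not given on the slide.
LEXICON = {"the": "DET", "a": "DET", "an": "DET", "rabbit": "N", "saw": "N",
           "orange": "N", "with": "P", "on": "P", "nibbled": "V"}
WORDS = tuple("the rabbit with a saw nibbled on an orange".split())

@lru_cache(maxsize=None)  # the cache acts as the chart
def derives(sym, i, j):
    """True if sym can span WORDS[i:j]."""
    if j - i == 1 and LEXICON.get(WORDS[i]) == sym:
        return True
    return any(matches(rhs, i, j) for rhs in RULES.get(sym, ()))

def matches(rhs, i, j):
    """True if the symbol sequence rhs can span WORDS[i:j]."""
    if len(rhs) == 1:
        return derives(rhs[0], i, j)
    # Leave at least one word for each remaining symbol.
    return any(derives(rhs[0], i, k) and matches(rhs[1:], k, j)
               for k in range(i + 1, j - len(rhs) + 2))

print(derives("S", 0, len(WORDS)))
print(derives.cache_info())
```

S → NP VP is tried first and fails, but the NP results it computed stay in the cache, so the attempt with S → NP VP PP looks them up instead of recomputing them. The hit count reported by `cache_info()` reflects that reuse.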


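Chart Parsing Algorithms

• CYK algorithm
– The most basic chart parsing algorithm
– Bottom-up
– Complexity: O(n^3) in the length n of the input

• Earley algorithm
– A chart parsing algorithm that improves on the CYK algorithm
  • Produces fewer unnecessary constituents
– Combines bottom-up and top-down
– Complexity: O(n^3) in the worst case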


The CYK Algorithm

• The membership problem
– Problem: given a context-free grammar G and a string w
  • G = (V, Σ, P, S), where
    » V is a finite set of variables
    » Σ (the alphabet) is a finite set of terminal symbols
    » P is a finite set of rules
    » S is the start symbol (a distinguished element of V)
    » V and Σ are assumed to be disjoint
  • G is used to generate the strings of a language
– Question: is w in L(G)?
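The four components of G and the conditions placed on them can be written down as a small data type. The class name `CFG`, its field names, and the toy grammar are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Σ
    rules: tuple           # P, as pairs (lhs, rhs) with rhs a tuple of symbols
    start: str             # S

    def __post_init__(self):
        if self.variables & self.terminals:
            raise ValueError("V and Σ must be disjoint")
        if self.start not in self.variables:
            raise ValueError("S must be an element of V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.variables:
                raise ValueError(f"left-hand side {lhs!r} is not a variable")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol in rule {lhs} -> {rhs}")

# Hypothetical toy grammar: S → A A, A → a | b
G = CFG(
    variables=frozenset({"S", "A"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("A", "A")), ("A", ("a",)), ("A", ("b",))),
    start="S",
)
print(G.start, sorted(G.variables), sorted(G.terminals))
```

The membership question then asks whether a given string over Σ can be derived from `start` using `rules`.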


The CYK Algorithm

• J. Cocke, D. Younger, and T. Kasami
– Independently developed an algorithm to answer this question.


The CYK Algorithm Basics

– Relies on the structure of the rules in a Chomsky Normal Form grammar
– Uses a “dynamic programming” or “table-filling” algorithm


Chomsky Normal Form

• A normal form is described by a set of conditions that each rule in the grammar must satisfy

• A context-free grammar is in CNF if each rule has one of the following forms:
– A → BC (exactly two variables on the right side)
– A → a (a single terminal symbol)
– S → λ (the null string)
– where B, C ∈ V – {S}
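The three rule forms can be turned into a direct check. This is a minimal sketch: the function name `is_cnf`, the rule encoding as (lhs, rhs) pairs, and both sample grammars are assumptions for illustration.

```python
def is_cnf(rules, variables, start="S"):
    """Check each (lhs, rhs) rule against the three CNF forms listed above."""
    for lhs, rhs in rules:
        if len(rhs) == 2:
            # A → BC: both symbols are variables other than the start symbol
            ok = all(sym in variables and sym != start for sym in rhs)
        elif len(rhs) == 1:
            # A → a: the single symbol is a terminal
            ok = rhs[0] not in variables
        else:
            # Only S → λ may have an empty right side; longer rules are rejected
            ok = len(rhs) == 0 and lhs == start
        if not ok:
            return False
    return True

# Hypothetical grammars used only to exercise the check.
GOOD = [("S", ("NP", "VP")), ("NP", ("DET", "NP")), ("NP", ("arrow",)),
        ("VP", ("flies",)), ("DET", ("an",))]
BAD = [("S", ("NP", "VP")), ("NP", ("N",)), ("N", ("John",))]  # NP → N is a unit rule

print(is_cnf(GOOD, {"S", "NP", "VP", "DET"}))
print(is_cnf(BAD, {"S", "NP", "VP", "N"}))
```

The second grammar fails because NP → N has a single variable on its right side, which is none of the allowed forms.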


Construct a Triangular Table

• Each row corresponds to one length of substrings
– Bottom row: substrings of length 1
– Second row from the bottom: substrings of length 2
– …
– Top row: the whole string ‘w’


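Construct a Triangular Table

• Xi,i is the set of variables A such that A → wi is a production of G

• For i < j, Xi,j is the set of variables A such that A → BC is a production of G with B in Xi,k and C in Xk+1,j for some k with i ≤ k < j

• To fill Xi,j, compare at most n pairs of previously computed sets:
– (Xi,i , Xi+1,j ), (Xi,i+1 , Xi+2,j ) … (Xi,j-1 , Xj,j )
– e.g. i=1, j=5
– (X1,1 , X2,5 ), (X1,2 , X3,5 ) … (X1,4 , X5,5 )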
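The pairs to compare for a given cell follow directly from the split point k. A short sketch, with the helper name `split_pairs` chosen for illustration:

```python
def split_pairs(i, j):
    """Index pairs ((i, k), (k + 1, j)) compared when filling X[i, j], for i <= k < j."""
    return [((i, k), (k + 1, j)) for k in range(i, j)]

for left, right in split_pairs(1, 5):
    print(f"(X{left[0]},{left[1]} , X{right[0]},{right[1]})")
```

For i=1 and j=5 this lists the four pairs from the example above. In general a cell covering a substring of length m needs m-1 pairs, which is why at most n pairs are compared.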


Construct a Triangular Table

X1, 5

X1, 4 X2, 5

X1, 3 X2, 4 X3, 5

X1, 2 X2, 3 X3, 4 X4, 5

X1, 1 X2, 2 X3, 3 X4, 4 X5, 5

w1 w2 w3 w4 w5

Table for string ‘w’ that has length 5


Construct a Triangular Table

Looking for pairs to compare

X1, 5

X1, 4 X2, 5

X1, 3 X2, 4 X3, 5

X1, 2 X2, 3 X3, 4 X4, 5

X1, 1 X2, 2 X3, 3 X4, 4 X5, 5

w1 w2 w3 w4 w5


Example CYK Algorithm

• Show the CYK algorithm with the following example:
– CNF grammar G
  • S → NP VP
  • NP → DET NP | NP NP | time | flies | arrow
  • VP → VP NP | VP PP | flies | like
  • PP → P NP
  • DET → an
  • P → like
– w is "time flies like an arrow"
– Question: is "time flies like an arrow" in L(G)?


Constructing The Triangular Table

{NP} (X1,1) {NP,VP} (X2,2) {VP,P} (X3,3) {DET} (X4,4) {NP} (X5,5)

time (1) flies (2) like (3) an (4) arrow (5)

Calculating the Bottom ROW: Xi, i


Constructing The Triangular Table

{NP,S} (X1,2) {S} (X2,3) {} (X3,4) {NP} (X4,5)

{NP} {NP,VP} {VP,P} {DET} {NP}

time (1) flies (2) like (3) an (4) arrow (5)

X1,2: (X1,1 , X2,2 )

X2,3: (X2,2 , X3,3 )

X3,4: (X3,3 , X4,4 )

X4,5: (X4,4 , X5,5 )


Constructing The Triangular Table

{S} (X1,3) {} (X2,4) {VP,PP} (X3,5)

{NP,S} {S} {} {NP}

{NP} {NP,VP} {VP,P} {DET} {NP}

time (1) flies (2) like (3) an (4) arrow (5)

X1,3: (X1,1 , X2,3 ), (X1,2 , X3,3 )

X2,4: (X2,2 , X3,4 ), (X2,3 , X4,4 )

X3,5: (X3,3 , X4,5 ), (X3,4 , X5,5 )


Constructing The Triangular Table

{} (X1,4) {S,VP} (X2,5)

{S} {} {VP,PP}

{NP,S} {S} {} {NP}

{NP} {NP,VP} {VP,P} {DET} {NP}

time (1) flies (2) like (3) an (4) arrow (5)

X1,4: (X1,1 , X2,4 ), (X1,2 , X3,4 ) , (X1,3 , X4,4 )

X2,5: (X2,2 , X3,5 ), (X2,3 , X4,5 ) , (X2,4 , X5,5 )


Constructing The Triangular Table

{S} (X1,5)

{} {S,VP}

{S} {} {VP,PP}

{NP,S} {S} {} {NP}

{NP} {NP,VP} {VP,P} {DET} {NP}

time (1) flies (2) like (3) an (4) arrow (5)

X1,5: (X1,1 , X2,5 ), (X1,2 , X3,5 ) , (X1,3 , X4,5 ) , (X1,4 , X5,5 )

Since the start symbol S is in X1,5, "time flies like an arrow" is in L(G).
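The table-filling steps of this example can be reproduced with a short sketch. It uses the example grammar and the 1-based Xi,j indexing of the slides; the dictionary layout (`BINARY`, `LEXICAL`) and the function name `cyk` are assumptions for illustration.

```python
from itertools import product

# Example grammar, split into binary rules (B, C) -> {A} and lexical rules word -> {A}.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DET", "NP"): {"NP"},
    ("NP", "NP"): {"NP"},
    ("VP", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICAL = {
    "time": {"NP"}, "flies": {"NP", "VP"}, "arrow": {"NP"},
    "like": {"VP", "P"}, "an": {"DET"},
}

def cyk(words):
    """Fill the triangular table X[i, j] for 1 <= i <= j <= n."""
    n = len(words)
    X = {}
    for i in range(1, n + 1):                      # bottom row: substrings of length 1
        X[i, i] = set(LEXICAL.get(words[i - 1], ()))
    for length in range(2, n + 1):                 # one row per substring length
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):                  # compare (X[i,k], X[k+1,j])
                for B, C in product(X[i, k], X[k + 1, j]):
                    cell |= BINARY.get((B, C), set())
            X[i, j] = cell
    return X

words = "time flies like an arrow".split()
table = cyk(words)
for length in range(len(words), 0, -1):            # print rows from the top down
    row = [table[i, i + length - 1] for i in range(1, len(words) - length + 2)]
    print([sorted(cell) for cell in row])
print("in L(G):", "S" in table[1, len(words)])
```

Each printed row corresponds to one row of the triangular table, and the string is accepted because S appears in the top cell.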


CYK algorithm: Pseudocode

•let the input be a string S consisting of n characters: a1 ... an.

•let the grammar contain r nonterminal symbols R1 ... Rr.

•This grammar contains the subset Rs, which is the set of start symbols.

•let P[n,n,r] be an array of booleans, where P[j,i,A] means that RA derives the substring of length i starting at position j.

•Initialize all elements of P to false.

•for each i = 1 to n

•    for each unit production Rj → ai

•        set P[i,1,j] = true

•for each i = 2 to n -- Length of span

•    for each j = 1 to n-i+1 -- Start of span

•        for each k = 1 to i-1 -- Partition of span

•            for each production RA → RB RC

•                if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true

•if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then

•    S is a member of the language

•else