
10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Page 1: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-

10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-

Presenter: 정영임 / Date: 2007. 10. 6.

Page 2: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Table of Contents

12.1 Probabilistic Context-Free Grammars
12.2 Problems with PCFGs
12.3 Probabilistic Lexicalized CFGs

Page 3: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Introduction

Goal
– To build probabilistic models of sophisticated syntactic information
– To use this probabilistic information in an efficient probabilistic parser

Uses of a probabilistic parser
– For disambiguation
The Earley algorithm can represent the ambiguities of sentences, but it cannot resolve them. A probabilistic grammar can choose the most probable interpretation.
– As a language model
For speech recognizers, the N-gram model has been used to predict upcoming words and help constrain the search for words. A probabilistic version of a more sophisticated grammar can provide additional predictive power to speech recognition.

Page 4: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probabilistic Context-Free Grammar (PCFG)
– A PCFG is also known as a Stochastic Context-Free Grammar (SCFG)
– A PCFG has five parameters, a 5-tuple G = (N, Σ, P, S, D):
1. A set of non-terminal symbols (or "variables") N
2. A set of terminal symbols Σ (disjoint from N)
3. A set of productions P, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
4. A designated start symbol S
5. A function D assigning a probability to each rule in P, written P(A → β) or P(A → β | A); for each non-terminal A, the probabilities of its expansions sum to 1

Page 5: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Sample PCFG for a miniature grammar
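The miniature grammar itself appears only as an image in the original. As a stand-in, here is a small PCFG written out in Python; the rules and probabilities are illustrative, not the textbook's exact figure, but they satisfy the constraint from parameter 5 above (each non-terminal's expansion probabilities sum to 1).

from collections import defaultdict

# A miniature PCFG as (left-hand side, right-hand side, probability) triples.
# The rules and probabilities are illustrative stand-ins for the book's figure.
MINI_PCFG = [
    ("S",       ("NP", "VP"),        0.80),
    ("S",       ("Aux", "NP", "VP"), 0.15),
    ("S",       ("VP",),             0.05),
    ("NP",      ("Det", "Nominal"),  0.40),
    ("NP",      ("Pronoun",),        0.35),
    ("NP",      ("Proper-Noun",),    0.25),
    ("Nominal", ("Noun",),           0.75),
    ("Nominal", ("Noun", "Nominal"), 0.25),
    ("VP",      ("Verb",),           0.55),
    ("VP",      ("Verb", "NP"),      0.45),
]

# Parameter 5 of the definition: D must make each non-terminal's
# expansion probabilities sum to 1.
totals = defaultdict(float)
for lhs, rhs, prob in MINI_PCFG:
    totals[lhs] += prob
assert all(abs(total - 1.0) < 1e-9 for total in totals.values())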

Page 6: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probability of a particular parse T
– The product of the probabilities of all the rules r used to expand each node n in the parse tree
– By the definition of conditional probability, P(T, S) = P(T) P(S | T)
– Since a parse tree includes all the words of the sentence, P(S | T) is 1
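Written as equations (a reconstruction of the slide's formulas in standard PCFG notation):

P(T, S) = \prod_{n \in T} p(r(n))

P(T, S) = P(T)\,P(S \mid T) = P(T), \quad \text{since } P(S \mid T) = 1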

Page 7: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

[Figure: two candidate parse trees for an ambiguous sentence, compared by probability; the parse with the higher probability is preferred]

Page 8: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Formalization of selecting the parse with the highest probability
– The best tree for a sentence S is chosen out of the set of parse trees for S (which we'll call τ(S))
– Since P(S) is constant for each tree, we can eliminate it
– Since P(T, S) = P(T), we can maximize P(T) directly
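The slide's chain of equalities, reconstructed in LaTeX:

\hat{T}(S) = \operatorname*{argmax}_{T \in \tau(S)} P(T \mid S)
           = \operatorname*{argmax}_{T \in \tau(S)} \frac{P(T, S)}{P(S)}
           = \operatorname*{argmax}_{T \in \tau(S)} P(T, S)
           = \operatorname*{argmax}_{T \in \tau(S)} P(T)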

Page 9: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probability of an ambiguous sentence
– The sum of the probabilities of all the parse trees for the sentence
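In symbols: P(S) = \sum_{T \in \tau(S)} P(T, S) = \sum_{T \in \tau(S)} P(T)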

Page 10: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Other issues with PCFGs

Prefix probability
– Jelinek and Lafferty (1991) give an algorithm for efficiently computing the probability of a prefix of a sentence
– Stolcke (1995) describes how the standard Earley parser can be augmented to compute these prefix probabilities
– Jurafsky et al. (1995) describe an application of a version of this algorithm as the language model for a speech recognizer

Consistency
– A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1
– Certain kinds of recursive rules cause a grammar to be inconsistent by causing infinitely looping derivations for some sentences
– Booth and Thompson (1973) give more details on consistent and inconsistent grammars
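As a concrete illustration (my example, not on the slide): a grammar with only the rules S \to S\,S\;[p] and S \to a\;[1-p] terminates with probability q satisfying q = (1-p) + p\,q^{2}, whose smallest non-negative root is \min(1, (1-p)/p). For p > 1/2, the probabilities of all finite sentences sum to (1-p)/p < 1, so the grammar is inconsistent.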

Page 11: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Parsing problem for a PCFG
– Can be interpreted as computing the most likely parse for a given sentence

Algorithms for computing the most likely parse
– Augmented Earley algorithm (Stolcke, 1995): the probabilistic Earley algorithm is somewhat complex to present
– Probabilistic CYK (Cocke-Younger-Kasami) algorithm: the CYK algorithm is worth understanding

Page 12: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Probabilistic CYK (Cocke-Younger-Kasami) algorithm
– The CYK algorithm is essentially a bottom-up parser using a dynamic programming table
– Being bottom-up makes it more efficient when processing lexicalized grammars
– Probabilistic CYK parsing was first described by Ney (1991)
– The CYK parsing algorithm presented here follows Collins (1999) and Aho and Ullman (1972)

Page 13: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Input, output, and data structure of probabilistic CYK (shown as an image in the original; reconstructed from the textbook's presentation): the input is a PCFG in Chomsky normal form and a sentence of n words; the output is the most likely parse and its probability; the data structure is a dynamic programming array whose cell [i, j, A] holds the maximum probability of non-terminal A spanning words i through j.

Page 14: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Page 15: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Pseudocode for Probabilistic CYK algorithm
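The pseudocode itself is an image in the original. Below is a minimal runnable sketch of the same idea in Python, assuming a grammar in Chomsky normal form; the function and parameter names (prob_cyk, lexical_rules, binary_rules) and the toy grammar are my own, not the textbook's.

from collections import defaultdict

def prob_cyk(words, lexical_rules, binary_rules, start="S"):
    # lexical_rules: dict word -> list of (non-terminal, probability)
    # binary_rules:  list of (parent, left_child, right_child, probability)
    n = len(words)
    # table[i][j][A] = best probability of A deriving words[i:j]
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 come from the lexical rules
    for i, word in enumerate(words):
        for nt, p in lexical_rules.get(word, []):
            table[i][i + 1][nt] = p
            back[i][i + 1][nt] = word

    # Recursive case: build longer spans from pairs of shorter ones,
    # keeping the split point and rule that maximize the probability
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right, p in binary_rules:
                    cand = p * table[i][k][left] * table[k][j][right]
                    if cand > table[i][j][parent]:
                        table[i][j][parent] = cand
                        back[i][j][parent] = (k, left, right)

    # Probability of the best parse rooted at the start symbol, plus
    # the back-pointer table from which the tree itself can be rebuilt
    return table[0][n][start], back

# Toy run (illustrative grammar, not the book's)
lexical = {"she": [("NP", 1.0)], "eats": [("V", 1.0)], "fish": [("NP", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
print(prob_cyk(["she", "eats", "fish"], lexical, binary)[0])  # prints 1.0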

Page 16: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Learning PCFG Probabilities

Where do PCFG probabilities come from?
– Obtaining the PCFG probabilities from a treebank

A treebank is a corpus of already-parsed sentences, e.g. the Penn Treebank (Marcus et al., 1993)

– Brown Corpus, Wall Street Journal, parts of the Switchboard corpus

The probability of each expansion of a non-terminal is estimated
– By counting the number of times that expansion occurs
– Then normalizing
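In equation form, this is the textbook's counting-and-normalizing estimate:

P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}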

Page 17: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Learning PCFG Probabilities

Where do PCFG probabilities come from?
– Learning the PCFG probabilities

PCFG probabilities can be generated by first parsing a (raw) corpus

Unambiguous sentences
– Parse the corpus
– Increment a counter for every rule in the parse
– Then normalize to get probabilities

Ambiguous sentences
– We need to keep a separate count for each parse of a sentence and weight each partial count by the probability of the parse it appears in
– The standard algorithm for computing this is the Inside-Outside algorithm, proposed by Baker (1979)

» Cf. Manning and Schütze (1999) for a complete description

Page 18: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.2 Problems with PCFGs

Page 19: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-
