
10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Page 1: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-

10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-

Presenter: 정영임 / Date: 2007. 10. 6.

Page 2: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Table of Contents

12.1 Probabilistic Context-Free Grammars
12.2 Problems with PCFGs
12.3 Probabilistic Lexicalized CFGs

Page 3: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Introduction

Goal
– To build probabilistic models of sophisticated syntactic information
– To use this probabilistic information in an efficient probabilistic parser

Uses of a probabilistic parser
– For disambiguation
The Earley algorithm can represent the ambiguities of sentences, but it cannot resolve them. A probabilistic grammar can choose the most probable interpretation.
– As a language model
For speech recognizers, the N-gram model has been used to predict upcoming words and help constrain the search for words. A probabilistic version of a more sophisticated grammar can provide additional predictive power to speech recognition.

Page 4: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probabilistic Context-Free Grammar (PCFG)
– A PCFG is also known as a Stochastic Context-Free Grammar (SCFG)
– A PCFG has five parameters, a 5-tuple G = (N, Σ, P, S, D):
1. A set of non-terminal symbols (or "variables") N
2. A set of terminal symbols Σ (disjoint from N)
3. A set of productions P, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
4. A designated start symbol S
5. A function D assigning a probability to each rule in P, written P(A → β) or P(A → β | A); for each non-terminal A, the probabilities of its expansions sum to 1

Page 5: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Sample PCFG for a miniature grammar
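The miniature grammar itself appears only as an image in the original. As a stand-in, here is a small PCFG written out in Python; the rules and probabilities are illustrative, not the textbook's exact figure, but they satisfy the constraint from parameter 5 above (each non-terminal's expansion probabilities sum to 1).

from collections import defaultdict

# A miniature PCFG as (left-hand side, right-hand side, probability) triples.
# The rules and probabilities are illustrative stand-ins for the book's figure.
MINI_PCFG = [
    ("S",       ("NP", "VP"),        0.80),
    ("S",       ("Aux", "NP", "VP"), 0.15),
    ("S",       ("VP",),             0.05),
    ("NP",      ("Det", "Nominal"),  0.40),
    ("NP",      ("Pronoun",),        0.35),
    ("NP",      ("Proper-Noun",),    0.25),
    ("Nominal", ("Noun",),           0.75),
    ("Nominal", ("Noun", "Nominal"), 0.25),
    ("VP",      ("Verb",),           0.55),
    ("VP",      ("Verb", "NP"),      0.45),
]

# Parameter 5 of the definition: D must make each non-terminal's
# expansion probabilities sum to 1.
totals = defaultdict(float)
for lhs, rhs, prob in MINI_PCFG:
    totals[lhs] += prob
assert all(abs(total - 1.0) < 1e-9 for total in totals.values())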

Page 6: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probability of a particular parse T
– The product of the probabilities of all the rules r used to expand each node n in the parse tree
– By the definition of conditional probability, P(T, S) = P(T) P(S | T)
– Since a parse tree includes all the words of the sentence, P(S | T) is 1
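Written as equations (a reconstruction of the slide's formulas in standard PCFG notation):

P(T, S) = \prod_{n \in T} p(r(n))

P(T, S) = P(T)\,P(S \mid T) = P(T), \quad \text{since } P(S \mid T) = 1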

Page 7: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

[Figure: two candidate parse trees for an ambiguous sentence, compared by probability; the parse with the higher probability is preferred]

Page 8: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Formalization of selecting the parse with the highest probability
– The best tree for a sentence S is chosen out of the set of parse trees for S (which we'll call τ(S))
– Since P(S) is constant for each tree, we can eliminate it
– Since P(T, S) = P(T), we can maximize P(T) directly
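The slide's chain of equalities, reconstructed in LaTeX:

\hat{T}(S) = \operatorname*{argmax}_{T \in \tau(S)} P(T \mid S)
           = \operatorname*{argmax}_{T \in \tau(S)} \frac{P(T, S)}{P(S)}
           = \operatorname*{argmax}_{T \in \tau(S)} P(T, S)
           = \operatorname*{argmax}_{T \in \tau(S)} P(T)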

Page 9: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.1 Probabilistic Context-Free Grammars

Probability of an ambiguous sentence
– The sum of the probabilities of all the parse trees for the sentence
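In symbols: P(S) = \sum_{T \in \tau(S)} P(T, S) = \sum_{T \in \tau(S)} P(T)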

Page 10: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Other issues with PCFGs

Prefix probability
– Jelinek and Lafferty (1991) give an algorithm for efficiently computing the probability of a prefix of a sentence
– Stolcke (1995) describes how the standard Earley parser can be augmented to compute these prefix probabilities
– Jurafsky et al. (1995) describe an application of a version of this algorithm as the language model for a speech recognizer

Consistency
– A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1
– Certain kinds of recursive rules cause a grammar to be inconsistent by causing infinitely looping derivations for some sentences
– Booth and Thompson (1973) give more details on consistent and inconsistent grammars
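As a concrete illustration (my example, not on the slide): a grammar with only the rules S \to S\,S\;[p] and S \to a\;[1-p] terminates with probability q satisfying q = (1-p) + p\,q^{2}, whose smallest non-negative root is \min(1, (1-p)/p). For p > 1/2, the probabilities of all finite sentences sum to (1-p)/p < 1, so the grammar is inconsistent.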

Page 11: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Parsing problem for a PCFG
– Can be interpreted as computing the most likely parse for a given sentence

Algorithms for computing the most likely parse
– Augmented Earley algorithm (Stolcke, 1995): the probabilistic Earley algorithm is somewhat complex to present
– Probabilistic CYK (Cocke-Younger-Kasami) algorithm: the CYK algorithm is worth understanding

Page 12: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Probabilistic CYK (Cocke-Younger-Kasami) algorithm
– The CYK algorithm is essentially a bottom-up parser using a dynamic programming table
– Being bottom-up makes it more efficient when processing lexicalized grammars
– Probabilistic CYK parsing was first described by Ney (1991)
– The CYK parsing algorithm presented here follows Collins (1999) and Aho and Ullman (1972)

Page 13: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Input, output, and data structure of probabilistic CYK (shown as an image in the original; reconstructed from the textbook's presentation): the input is a PCFG in Chomsky normal form and a sentence of n words; the output is the most likely parse and its probability; the data structure is a dynamic programming array whose cell [i, j, A] holds the maximum probability of non-terminal A spanning words i through j.

Page 14: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Page 15: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Probabilistic CYK Parsing of PCFGs

Pseudocode for Probabilistic CYK algorithm
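The pseudocode itself is an image in the original. Below is a minimal runnable sketch of the same idea in Python, assuming a grammar in Chomsky normal form; the function and parameter names (prob_cyk, lexical_rules, binary_rules) and the toy grammar are my own, not the textbook's.

from collections import defaultdict

def prob_cyk(words, lexical_rules, binary_rules, start="S"):
    # lexical_rules: dict word -> list of (non-terminal, probability)
    # binary_rules:  list of (parent, left_child, right_child, probability)
    n = len(words)
    # table[i][j][A] = best probability of A deriving words[i:j]
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 come from the lexical rules
    for i, word in enumerate(words):
        for nt, p in lexical_rules.get(word, []):
            table[i][i + 1][nt] = p
            back[i][i + 1][nt] = word

    # Recursive case: build longer spans from pairs of shorter ones,
    # keeping the split point and rule that maximize the probability
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right, p in binary_rules:
                    cand = p * table[i][k][left] * table[k][j][right]
                    if cand > table[i][j][parent]:
                        table[i][j][parent] = cand
                        back[i][j][parent] = (k, left, right)

    # Probability of the best parse rooted at the start symbol, plus
    # the back-pointer table from which the tree itself can be rebuilt
    return table[0][n][start], back

# Toy run (illustrative grammar, not the book's)
lexical = {"she": [("NP", 1.0)], "eats": [("V", 1.0)], "fish": [("NP", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
print(prob_cyk(["she", "eats", "fish"], lexical, binary)[0])  # prints 1.0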

Page 16: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Learning PCFG Probabilities

Where do PCFG probabilities come from?
– Obtaining the PCFG probabilities from a treebank

A treebank is a corpus of already-parsed sentences, e.g. the Penn Treebank (Marcus et al., 1993)

– Brown Corpus, Wall Street Journal, parts of the Switchboard corpus

The probability of each expansion of a non-terminal is estimated
– By counting the number of times that expansion occurs
– Then normalizing
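In equation form, this is the textbook's counting-and-normalizing estimate:

P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}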

Page 17: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


Learning PCFG Probabilities

Where do PCFG probabilities come from?
– Learning the PCFG probabilities

PCFG probabilities can be generated by first parsing a (raw) corpus

Unambiguous sentences
– Parse the corpus
– Increment a counter for every rule in the parse
– Then normalize to get probabilities

Ambiguous sentences
– We need to keep a separate count for each parse of a sentence and weight each partial count by the probability of the parse it appears in
– The standard algorithm for computing this is the Inside-Outside algorithm, proposed by Baker (1979)

» Cf. Manning and Schütze (1999) for a complete description

Page 18: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-


12.2 Problems with PCFGs

Page 19: 10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-
