Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
复旦大学大数据学院School of Data Science, Fudan University
DATA130006 Text Management and Analysis
Statistical Natural Language Parsing魏忠钰
November29th,2017
Adapted from Stanford CS124U
NLP Startups
NLP Startups
Constituency (phrase structure)
§ Phrasestructureorganizeswordsintonestedconstituents.
Parsing Tree Example
Attachment ambiguities
§ Akeyparsingdecisionishowwe‘attach’variousconstituents§ PPs,adverbialorparticipialphrases,infinitives,coordinations,etc.
§ Catalan numbers: Cn = (2n)!/[(n+1)!n!]
§ An exponentially growing series, which arises in many tree-like contexts
Two problems to solve: 1. Repeated work…
Two problems to solve: 2. Choosing the correct parse
§Wordsaregoodpredictorsofattachment
§ Moscowsentmorethan100,000soldiersintoAfghanistan…
§ SydneyWaterbreachedanagreementwithNSWHealth…
§Ourstatisticalparserswilltrytoexploitsuchstatistics.
context-free grammars (CFGs)
§G=(T,N,S,R)§ Tisasetofterminalsymbols§ Nisasetofnonterminalsymbols§ Sisthestartsymbol(S∈ N)§ Risasetofrules/productionsoftheformX® g
§ X∈ Nandg∈ (N∪ T)*
§ AgrammarGgeneratesalanguageL.
A phrase structure grammar
S® NPVPVP® VNPVP® VNPPPNP® NPNPNP® NPPPNP® NNP® ePP® PNP
peoplefishtankspeoplefishwithrods
N® peopleN® fishN® tanksN® rodsV® peopleV® fishV® tanksP® with
Many parses
Probabilistic – or stochastic – context-free grammars (PCFGs)
• G=(T,N,S,R,P)• Tisasetofterminalsymbols• Nisasetofnonterminalsymbols• Sisthestartsymbol(S∈ N)• Risasetofrules/productionsoftheformX® g• Pisaprobabilityfunction
• P:R® [0,1]•
• AgrammarGgeneratesalanguagemodelL.
€
∀X ∈ N, P(X →γ) =1X→γ ∈R∑
€
P(γ) =1γ ∈T *∑
A PCFG
S® NPVP 1.0VP® VNP 0.6VP® VNPPP 0.4NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks 0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
The rise of annotated data: The Penn Treebank((S(NP-SBJ(DTThe)(NNmove))(VP(VBDfollowed)(NP(NP(DTa)(NNround))(PP(INof)(NP(NP(JJsimilar)(NNSincreases))(PP(INby)(NP(JJother)(NNSlenders)))(PP(INagainst)(NP(NNPArizona)(JJreal)(NNestate)(NNSloans))))))
(,,)(S-ADV(NP-SBJ(-NONE- *))(VP(VBGreflecting)(NP(NP(DTa)(VBGcontinuing)(NNdecline))(PP-LOC(INin)(NP(DTthat)(NNmarket)))))))
(..)))
The probability of trees and strings
§ P(t)– Theprobabilityofatreet istheproductoftheprobabilitiesoftherulesusedtogenerateit.
§ P(s)– Theprobabilityofthestrings isthesumoftheprobabilitiesofthetreeswhichhavethatstringastheiryield
P(s)=Σj P(s,t)wheret isaparseofs
=Σj P(t)
P(t1)=1.0× 0.7× 0.4× 0.5× 0.6× 0.7× 1.0× 0.2× 1.0× 0.7× 0.1
=0.0008232
P(t2)=1.0× 0.7× 0.6× 0.5× 0.6× 0.2× 0.7× 1.0× 0.2× 1.0× 0.7× 0.1
=0.00024696
Tree and String Probabilities
• s =peoplefishtankswithrods• P(t1)=1.0× 0.7× 0.4× 0.5× 0.6× 0.7
× 1.0× 0.2× 1.0× 0.7× 0.1=0.0008232
• P(t2)=1.0× 0.7× 0.6× 0.5× 0.6× 0.2× 0.7× 1.0× 0.2× 1.0× 0.7× 0.1
=0.00024696• P(s)=P(t1)+P(t2)
=0.0008232 +0.00024696=0.00107016
Verb attach
Noun attach
• A PP is more likely attached to a VP compared to NP.
Outline
§Restricting the grammar form for efficient parsing
Chomsky Normal Form
§ AllrulesareoftheformX® YZorX® w§ X,Y,Z∈ Nandw∈ T
§ Atransformationtothisformdoesn’tchangethegenerativecapacityofaCFG§ Thatis,itrecognizesthesamelanguage
§ Butmaybewithdifferenttrees
§ Emptiesandun-aries areremovedrecursively§ n-ary rulesaredividedbyintroducingnewnon-terminals(n>2)
A phrase structure grammar
S® NPVPVP® VNPVP® VNPPPNP® NPNPNP® NPPPNP® NNP® ePP® PNP
S® VPVP® V
N® peopleN® fishN® tanksN® rodsV® peopleV® fishV® tanksP® with
Chomsky Normal Form steps
S® NPVPS® VPVP® VNPVP® VVP® VNPPPVP® VPPNP® NPNPNP® NPNP® NPPPNP® PPNP® NPP® PNPPP® P
N® peopleN® fishN® tanksN® rodsV® peopleV® fishV® tanksP® with
S® VNPS® V
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® V
S® V
VP® VNPPP
S® VNPPP
VP® VPP
S® VPP
NP® NPNP
NP® NP
NP® NPPP
NP® PP
NP® N
PP® PNP
PP® P
N® peopleN® fishN® tanksN® rodsV® peopleV® fishV® tanksP® with
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® V
VP® VNPPP
S® VNPPP
VP® VPP
S® VPP
NP® NPNP
NP® NP
NP® NPPP
NP® PP
NP® N
PP® PNP
PP® P
N® peopleN® fishN® tanksN® rodsV® peopleS® peopleV® fishS® fishV® tanksS® tanksP® with
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® VNPPP
S® VNPPP
VP® VPP
S® VPP
NP® NPNP
NP® NP
NP® NPPP
NP® PP
NP® N
PP® PNP
PP® P
N® people
N® fish
N® tanks
N® rods
V® people
S® people
VP® people
V® fish
S® fish
VP® fish
V® tanks
S® tanks
VP® tanks
P® with
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® VNPPP
S® VNPPP
VP® VPP
S® VPP
NP® NPNP
NP® NPPP
NP® PNP
PP® PNP
VP® VX
X® NPPP
X=@VP_V
NP® people
NP® fish
NP® tanks
NP® rods
V® people
S® people
VP® people
V® fish
S® fish
VP® fish
V® tanks
S® tanks
VP® tanks
P® with
PP® with
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® V@VP_V
@VP_V® NPPP
S® V@S_V
@S_V® NPPP
VP® VPP
S® VPP
NP® NPNP
NP® NPPP
NP® PNP
PP® PNP
NP® people
NP® fish
NP® tanks
NP® rods
V® people
S® people
VP® people
V® fish
S® fish
VP® fish
V® tanks
S® tanks
VP® tanks
P® with
PP® with
A phrase structure grammar
S® NPVPVP® VNPVP® VNPPPNP® NPNPNP® NPPPNP® NNP® ePP® PNP
N® peopleN® fishN® tanksN® rodsV® peopleV® fishV® tanksP® with
Chomsky Normal Form steps
S® NPVP
VP® VNP
S® VNP
VP® V@VP_V
@VP_V® NPPP
S® V@S_V
@S_V® NPPP
VP® VPP
S® VPP
NP® NPNP
NP® NPPP
NP® PNP
PP® PNP
NP® people
NP® fish
NP® tanks
NP® rods
V® people
S® people
VP® people
V® fish
S® fish
VP® fish
V® tanks
S® tanks
VP® tanks
P® with
PP® with
Chomsky Normal Form
§Withsomeextrabook-keepinginsymbolnames,youcanevenreconstructthesametreeswithade-transform
§ Youshouldthinkofthisasatransformationforefficientparsing
§ InpracticefullChomskyNormalFormisapain§ Reconstructingn-aries iseasy§ Reconstructingunaries/emptiesistrickier
§Binarization iscrucialforcubictimeCFGparsing
An example: before binarization…
ROOT
S
NP VP
N
people
V NP PP
P NP
rodswithtanksfish
N N
After binarization…
P NP
rods
N
with
NP
N
people tanksfish
N
VP
V NP PP
@VP_V
ROOT
S
Treebank: empties and unaries
ROOT
S-HLN
NP-SUBJ VP
VB-NONE-
e Atone
PTB Tree
ROOT
S
NP VP
VB-NONE-
e Atone
NoFuncTags
ROOT
S
VP
VB
Atone
NoEmpties
ROOT
S
Atone
NoUnaries
ROOT
VB
Atone
High Low
Outline
§Restricting the grammar form for efficient parsing
§ Exact polynomial time parsing of (P)CFGs
Constituency Parsing
fishpeoplefishtanks
RuleProb θiS® NPVP θ0NP® NPNP θ1…
N® fish θ42N® people θ43V® fish θ44…
PCFG
N N V N
VP
NPNP
S
Cocke-Kasami-Younger (CKY) Constituency Parsing
fishpeoplefishtanks
Viterbi (Max) Scores
peoplefish
NP 0.35V 0.1N 0.5
VP 0.06NP 0.14V 0.6N 0.2
S® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
S -> NPVP=0.35*0.06*0.9VP -> V NP = 0.1 *0.5 *0.14NP -> NPNP = 0.1 *0.35 *0.14
Extended CKY parsing
§Unaries canbeincorporatedintothealgorithm§ Messy,butdoesn’tincreasealgorithmiccomplexity
§ Emptiescanbeincorporated§ Doesn’tincreasecomplexity;essentiallylikeu-naries
§ Binarizationisvital§ Withoutbinarization,youdon’tgetparsingcubicinthelengthofthesentenceandinthenumberofnon-terminalsinthegrammar
The CKY algorithm (1960/1965) extended to unaries
function CKY(words, grammar) returns [most_probable_parse,prob]score = new double[#(words)+1][#(words)+1][#(nonterms)]back = new Pair[#(words)+1][#(words)+1][#nonterms]]for i=0; i<#(words); i++for A in nontermsif A -> words[i] in grammarscore[i][i+1][A] = P(A -> words[i])
//handle unariesboolean added = truewhile added added = falsefor A, B in nontermsif score[i][i+1][B] > 0 && A->B in grammarprob = P(A->B)*score[i][i+1][B]if prob > score[i][i+1][A]score[i][i+1][A] = probback[i][i+1][A] = Badded = true
The CKY algorithm (1960/1965) extended to unaries
for span = 2 to #(words)for begin = 0 to #(words)- spanend = begin + spanfor split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
//handle unariesboolean added = truewhile addedadded = falsefor A, B in nontermsprob = P(A->B)*score[begin][end][B];if prob > score[begin][end][A]score[begin][end][A] = probback[begin][end][A] = Badded = true
return buildTree(score, back)
The grammar: Binary, no epsilons,
S® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks 0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
score[0][1]
score[1][2]
score[2][3]
score[3][4]
score[0][2]
score[1][3]
score[2][4]
score[0][3]
score[1][4]
score[0][4]
0
1
2
3
4
1 2 3 4fish people fish tanks
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for i=0; i<#(words); i++for A in nontermsif A -> words[i] in grammarscore[i][i+1][A] = P(A -> words[i]);
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for i=0; i<#(words); i++for A in nontermsif A -> words[i] in grammarscore[i][i+1][A] = P(A -> words[i]);
N® fish0.2V® fish0.6
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
// handle unariesboolean added = true
while added added = falsefor A, B in nonterms
if score[i][i+1][B] > 0 && A->B in grammarprob = P(A->B)*score[i][i+1][B]if(prob > score[i][i+1][A])
score[i][i+1][A] = probback[i][i+1][A] = Badded = true
N® fish0.2V® fish0.6
N® people0.5V® people0.1
N® fish0.2V® fish0.6
N® tanks0.2V® tanks0.1
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
// handle unariesboolean added = true
while added added = falsefor A, B in nonterms
if score[i][i+1][B] > 0 && A->B in grammarprob = P(A->B)*score[i][i+1][B]if(prob > score[i][i+1][A])
score[i][i+1][A] = probback[i][i+1][A] = Badded = true
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1
N® fish0.2V® fish0.6
N® tanks0.2V® tanks0.1
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
// handle unariesboolean added = true
while added added = falsefor A, B in nonterms
if score[i][i+1][B] > 0 && A->B in grammarprob = P(A->B)*score[i][i+1][B]if(prob > score[i][i+1][A])
score[i][i+1][A] = probback[i][i+1][A] = Badded = true
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if (prob > score[begin][end][A])
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if (prob > score[begin][end][A])
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® NPVP0.00126
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if (prob > score[begin][end][A])
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® NPVP0.00126
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® NPVP0.00378
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
//handle unariesboolean added = truewhile added
added = falsefor A, B in nonterms
prob = P(A->B)*score[begin][end][B];if prob > score[begin][end][A]
score[begin][end][A] = probback[begin][end][A] = Badded = true
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
NP® NPNP0.0000686
VP® VNP0.00147
S® NPVP0.000882
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
for span = 2 to #(words)for begin = 0 to #(words)- span
end = begin + spanfor split = begin+1 to end-1
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
NP® NPNP0.0000686
VP® VNP0.00147
S® NPVP0.000882
NP® NPNP0.0000686
VP® VNP0.000098
S® NPVP0.01323
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
for split = begin+1 to end-1for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)if prob > score[begin][end][A]
score[begin]end][A] = probback[begin][end][A] = new Triple(split,B,C)
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® people0.5V® people0.1NP® N0.35VP® V0.01S® VP0.001
N® fish0.2V® fish0.6NP® N0.14VP® V0.06S® VP0.006
N® tanks0.2V® tanks0.1NP® N0.14VP® V0.03S® VP0.003
NP® NPNP0.0049
VP® VNP0.105
S® VP0.0105
NP® NPNP0.0049
VP® VNP0.007
S® NPVP0.0189
NP® NPNP0.00196
VP® VNP0.042
S® VP0.0042
NP® NPNP0.0000686
VP® VNP0.00147
S® NPVP0.000882
NP® NPNP0.0000686
VP® VNP0.000098
S® NPVP0.01323
NP® NPNP0.0000009604
VP® VNP0.00002058
S® NPVP0.00018522
0
1
2
3
4
1 2 3 4fish people fish tanksS® NPVP 0.9S® VP 0.1VP® VNP 0.5VP® V 0.1VP® V@VP_V 0.3VP® VPP 0.1@VP_V® NPPP 1.0NP® NPNP 0.1NP® NPPP 0.2NP® N 0.7PP® PNP 1.0
N® people 0.5N® fish 0.2N® tanks
0.2N® rods 0.1V® people 0.1V® fish 0.6V® tanks 0.3P® with 1.0
CallbuildTree(score,back)togetthebestparse
Quiz Question!
runs down
NNS 0.0023VB 0.001
PP 0.2IN 0.0014NNS 0.0001
PP → IN 0.002NP → NNS NNS 0.01NP → NNS NP 0.005NP → NNS PP 0.01VP → VB PP 0.045VP → VB NP 0.015
?? ???? ??
What constituents (with what probability can you make?
Outline
§ Restrictingthegrammarformforefficientparsing§ Exactpolynomialtimeparsingof(P)CFGs§ ConstituencyParserEvaluation
Evaluating constituency parsing
Evaluating constituency parsing
Goldstandardbrackets:S-(0:11), NP-(0:2),VP-(2:9),VP-(3:9),NP-(4:6),PP-(6-9),NP-(7,9),NP-(9:10)
Candidatebrackets:S-(0:11),NP-(0:2),VP-(2:10),VP-(3:10),NP-(4:6),PP-(6-10),NP-(7,10)
LabeledPrecision 3/7=42.9%LabeledRecall 3/8=37.5%LP/LRF1 40.0%TaggingAccuracy 11/11=100.0%
How good are PCFGs?
§ PennWSJparsingaccuracy:about73%LP/LRF1
§ Robust§Usuallyadmiteverything,butwithlowprobability
§ Partialsolutionforgrammarambiguity§ APCFGgivessomeideaoftheplausibilityofaparse§ Butnotsogoodbecausetheindependenceassumptionsaretoostrong
§Giveaprobabilisticlanguagemodel§ Butinthesimplecaseitperformsworsethanatrigrammodel
§ TheproblemseemstobethatPCFGslackthelexicalizationofatrigrammodel