Parsing V. Taylor Berg-Kirkpatrick, CMU. Slides: Dan Klein, UC Berkeley. Algorithms for NLP.

FA16 11-711 Lecture 13: Parsing V


Page 1:

Parsing V
Taylor Berg-Kirkpatrick – CMU

Slides: Dan Klein – UC Berkeley

Algorithms for NLP

Page 2:

Binarization / Markovization

Example: NP → DT JJ NN NN, right-binarized under different horizontal markovization settings (v=1, no parent annotation):

v=1, h=∞:
NP → DT @NP[DT]
@NP[DT] → JJ @NP[DT,JJ]
@NP[DT,JJ] → NN @NP[DT,JJ,NN]
@NP[DT,JJ,NN] → NN

v=1, h=0:
NP → DT @NP
@NP → JJ @NP
@NP → NN @NP
@NP → NN

v=1, h=1:
NP → DT @NP[DT]
@NP[DT] → JJ @NP[…,JJ]
@NP[…,JJ] → NN @NP[…,NN]
@NP[…,NN] → NN

Page 3:

Binarization / Markovization

The same example with parent annotation (v=2); a code sketch follows:

v=2, h=∞:
NP^VP → DT^NP @NP^VP[DT]
@NP^VP[DT] → JJ^NP @NP^VP[DT,JJ]
@NP^VP[DT,JJ] → NN^NP @NP^VP[DT,JJ,NN]
@NP^VP[DT,JJ,NN] → NN^NP

v=2, h=0:
NP^VP → DT^NP @NP
@NP → JJ^NP @NP
@NP → NN^NP @NP
@NP → NN^NP

v=2, h=1:
NP^VP → DT^NP @NP^VP[DT]
@NP^VP[DT] → JJ^NP @NP^VP[…,JJ]
@NP^VP[…,JJ] → NN^NP @NP^VP[…,NN]
@NP^VP[…,NN] → NN^NP
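A minimal sketch of this right-binarization with horizontal markovization, assuming a rule is given as a parent symbol plus a list of children; the name binarize_rule and the output format are illustrative, and parent annotation (the v dimension) would be applied to the symbols beforehand:

    def binarize_rule(parent, children, h=1):
        """Right-binarize parent -> children with horizontal markovization h.

        Intermediate symbols @Parent[...] remember at most the last h
        previously generated children (h=None means remember all)."""
        rules = []
        lhs = parent
        seen = []                              # children generated so far
        for child in children[:-1]:
            seen.append(child)
            if h is None:
                mem = seen
            else:
                mem = seen[-h:] if h > 0 else []
            prefix = "…," if (h is not None and 0 < h < len(seen)) else ""
            label = "@%s[%s%s]" % (parent, prefix, ",".join(mem)) if mem else "@" + parent
            rules.append((lhs, (child, label)))
            lhs = label
        rules.append((lhs, (children[-1],)))   # unary completion of the chain
        return rules

    for lhs, rhs in binarize_rule("NP", ["DT", "JJ", "NN", "NN"], h=1):
        print(lhs, "->", " ".join(rhs))
    # NP -> DT @NP[DT]
    # @NP[DT] -> JJ @NP[…,JJ]
    # @NP[…,JJ] -> NN @NP[…,NN]
    # @NP[…,NN] -> NN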

Page 4:

Grammar Projections

Coarse grammar rule: NP → DT @NP
Fine grammar rule: NP^VP → DT^NP @NP^VP[DT]

Each fine rule projects onto a coarse rule by stripping the annotations from its symbols.

Note: X-Bar grammars are projections with rules like XP → Y @X or XP → @X Y or @X → X

Page 5:

Grammar Projections

Coarse symbols and the fine symbols that project onto them:

Coarse    Fine
NP        NP^VP, NP^S
DT        DT^NP
@NP       @NP^VP[DT], @NP^S[DT], @NP^VP[…,JJ], @NP^S[…,JJ]
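A minimal sketch of the projection map, assuming fine symbols are strings built as above; the function name project is illustrative:

    import re

    def project(fine_symbol):
        """Strip parent annotation (^...) and markovization memory ([...])
        to recover the coarse symbol."""
        return re.sub(r"(\^[^\[\]]*)|(\[[^\]]*\])", "", fine_symbol)

    assert project("NP^VP") == "NP"
    assert project("@NP^S[DT]") == "@NP"
    assert project("DT^NP") == "DT"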

Page 6:

Efficient Parsing for Structural Annotation

Page 7:

Coarse-to-Fine Pruning

E.g. consider the span 5 to 12:

coarse: … QP NP VP …
fine: … the refined symbols projecting onto each coarse symbol …

Prune any fine item whose coarse posterior falls below a threshold:

P(X | i, j, s) < threshold

Page 8:

Coarse-to-Fine Pruning

For each coarse chart item X[i,j], compute the posterior probability:

P(X | i, j, s) = α(X, i, j) · β(X, i, j) / α(root, 0, n)

coarse: … QP NP VP …
fine: … the refined symbols projecting onto each coarse symbol …

E.g. consider the span 5 to 12: skip all fine items whose coarse posterior is < threshold.

Page 9:

Computing Marginals

Inside scores, computed bottom-up:

α(X, i, j) = Σ_{X → Y Z} Σ_{k ∈ (i,j)} P(X → Y Z) · α(Y, i, k) · α(Z, k, j)
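A minimal sketch of this inside recursion for a grammar in CNF, assuming a lexicon dict mapping (tag, word) to probability and a rules dict mapping (X, Y, Z) to rule probability; all names are illustrative:

    from collections import defaultdict

    def inside(words, lexicon, rules):
        """alpha[(X, i, j)] = total probability that X yields words[i:j]."""
        n = len(words)
        alpha = defaultdict(float)
        for i, w in enumerate(words):              # width-1 spans: tag the words
            for (tag, word), p in lexicon.items():
                if word == w:
                    alpha[tag, i, i + 1] += p
        for width in range(2, n + 1):              # wider spans, bottom-up
            for i in range(n - width + 1):
                j = i + width
                for (X, Y, Z), p in rules.items():
                    for k in range(i + 1, j):      # split point
                        alpha[X, i, j] += p * alpha[Y, i, k] * alpha[Z, k, j]
        return alpha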

Page 10:

Computing Marginals

Outside scores, computed top-down, with base case β(root, 0, n) = 1:

β(X, i, j) = Σ_{Y → Z X} Σ_{k ∈ [0,i)} P(Y → Z X) · β(Y, k, j) · α(Z, k, i)
           + Σ_{Y → X Z} Σ_{k ∈ (j,n]} P(Y → X Z) · β(Y, i, k) · α(Z, j, k)
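A matching sketch of the outside recursion, plus the posterior pruning test from Page 8; the threshold value and all names are illustrative, and the sentence is assumed parseable (nonzero root inside score):

    from collections import defaultdict

    def outside(words, rules, alpha, root="ROOT"):
        """beta[(X, i, j)] = total probability of tree structure outside span (i, j)."""
        n = len(words)
        beta = defaultdict(float)
        beta[root, 0, n] = 1.0
        for width in range(n, 1, -1):              # top-down, widest spans first
            for i in range(n - width + 1):
                j = i + width
                for (X, Y, Z), p in rules.items():
                    b = beta[X, i, j]
                    if b == 0.0:
                        continue
                    for k in range(i + 1, j):      # pass outside mass to children
                        beta[Y, i, k] += p * b * alpha[Z, k, j]
                        beta[Z, k, j] += p * b * alpha[Y, i, k]
        return beta

    def prune_mask(alpha, beta, root_total, threshold=1e-4):
        """Keep items whose posterior alpha*beta / total clears the threshold."""
        return {item for item in alpha
                if alpha[item] * beta[item] / root_total > threshold}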

Page 11:

Pruning with A*

§ You can also speed up the search without sacrificing optimality
§ For agenda-based parsers:
  § Can select which items to process first
  § Can do with any "figure of merit" [Charniak 98]
  § If your figure-of-merit is a valid A* heuristic, no loss of optimality [Klein and Manning 03]

[Figure: an item X over span (i, j) inside a sentence spanning (0, n)]

Page 12:

Efficient Parsing for Lexical Grammars

Page 13:

Lexicalized Trees

§ Add "head words" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules (see the sketch after this list), e.g.:
  § NP:
    § Take leftmost NP
    § Take rightmost N*
    § Take rightmost JJ
    § Take right child
  § VP:
    § Take leftmost VB*
    § Take leftmost VP
    § Take left child
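A minimal sketch of applying these two head rules, assuming children are given as (category, word) pairs; the rule table and the name find_head are illustrative:

    def find_head(parent, children):
        """Return the index of the head child, trying rules in priority order."""
        rules = {
            "NP": [("leftmost", lambda c: c == "NP"),
                   ("rightmost", lambda c: c.startswith("N")),   # rightmost N*
                   ("rightmost", lambda c: c == "JJ"),
                   ("rightmost", lambda c: True)],               # take right child
            "VP": [("leftmost", lambda c: c.startswith("VB")),   # leftmost VB*
                   ("leftmost", lambda c: c == "VP"),
                   ("leftmost", lambda c: True)],                # take left child
        }
        for direction, match in rules[parent]:
            order = range(len(children)) if direction == "leftmost" \
                    else range(len(children) - 1, -1, -1)
            for i in order:
                if match(children[i][0]):
                    return i
        return 0

    # "questioned" heads the VP [VBD NP]:
    print(find_head("VP", [("VBD", "questioned"), ("NP", "witness")]))  # -> 0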

Page 14:

Lexicalized PCFGs?

§ Problem: we now have to estimate probabilities like …
§ Never going to get these atomically off of a treebank
§ Solution: break up the derivation into smaller steps

Page 15:

Lexical Derivation Steps

§ A derivation of a local tree [Collins 99]:
  § Choose a head tag and word
  § Choose a complement bag
  § Generate children (incl. adjuncts)
  § Recursively derive children

Page 16:

Lexicalized CKY

bestScore(X, i, j, h)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max of:
      max over k, h', X[h]->Y[h] Z[h'] of
        score(X[h]->Y[h] Z[h']) * bestScore(Y, i, k, h) * bestScore(Z, k, j, h')
      max over k, h', X[h]->Y[h'] Z[h] of
        score(X[h]->Y[h'] Z[h]) * bestScore(Y, i, k, h') * bestScore(Z, k, j, h)

[Figure: X[h] over (i, j) built from Y[h] over (i, k) and Z[h'] over (k, j)]

Example: combining (VP->VBD •)[saw] with NP[her] yields (VP->VBD...NP •)[saw]

(a runnable version follows)

Page 17:

Quartic Parsing

§ Turns out, you can do (a little) better [Eisner 99]
§ Gives an O(n^4) algorithm
§ Still prohibitive in practice if not pruned

[Figure: left, the O(n^5) combination of Y[h] over (i, k) with Z[h'] over (k, j); right, an O(n^4) intermediate item Y[h] Z over (i, j) in which the head h' of Z has already been maxed out]

Page 18:

Pruning with Beams

§ The Collins parser prunes with per-cell beams [Collins 99] (sketch below)
  § Essentially, run the O(n^5) CKY
  § Remember only a few hypotheses for each span <i,j>
  § If we keep K hypotheses at each span, then we do at most O(nK^2) work per span (why?)
§ Keeps things more or less cubic (and in practice it is more like linear!)
§ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)

[Figure: X[h] over (i, j) built from Y[h] and Z[h']]
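A minimal sketch of a per-cell beam, assuming each chart cell maps (symbol, head) hypotheses to scores; names are illustrative:

    import heapq

    def beam_cell(hypotheses, K=10):
        """Keep only the K best hypotheses for one span's chart cell.

        hypotheses: dict mapping (symbol, head) -> score."""
        return dict(heapq.nlargest(K, hypotheses.items(), key=lambda kv: kv[1]))

Combining two beamed cells touches at most K*K hypothesis pairs, and with O(n) split points per span that is the O(nK^2) work the slide asks about.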

Page 19:

Pruning with a PCFG

§ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
  § First, parse with the base grammar
  § For each X:[i,j] calculate P(X|i,j,s)
    § This isn't trivial, and there are clever speedups
  § Second, do the full O(n^5) CKY
    § Skip any X:[i,j] which had low (say, < 0.0001) posterior
  § Avoids almost all work in the second phase!
§ Charniak et al 06: can use more passes
§ Petrov et al 07: can use many more passes

Page 20:

Results

§ Some results:
  § Collins 99 – 88.6 F1 (generative lexical)
  § Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative lexical / reranked)
  § Petrov et al 06 – 90.7 F1 (generative unlexicalized)
  § McClosky et al 06 – 92.1 F1 (gen + rerank + self-train)

Page 21:

Latent Variable PCFGs

Page 22:

The Game of Designing a Grammar

§ Annotation refines base treebank symbols to improve the statistical fit of the grammar
  § Parent annotation [Johnson '98]
  § Head lexicalization [Collins '99, Charniak '00]
  § Automatic clustering?

Page 23:

Latent Variable Grammars

[Figure: a sentence with its parse tree, the set of refined derivations underlying it, and the grammar parameters]

Page 24:

Learning Latent Annotations

EM algorithm:
§ Brackets are known
§ Base categories are known
§ Only induce subcategories

[Figure: a parse tree with latent subcategory variables X1–X7 over the sentence "He was right .", annotated with Forward and Backward passes]

Just like Forward-Backward for HMMs.

Page 25:

Refinement of the DT tag

[Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]

Page 26:

Hierarchical refinement

Page 27:

Hierarchical Estimation Results

[Figure: parsing accuracy (F1) vs. total number of grammar symbols, from 100 to 1700]

Model                   F1
Flat Training           87.3
Hierarchical Training   88.4

Page 28:

Refinement of the , tag

§ Splitting all categories equally is wasteful:

Page 29:

Adaptive Splitting

§ Want to split complex categories more
§ Idea: split everything, roll back the splits which were least useful (sketch below)
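A toy sketch of the split-then-merge loop; the gain function, which Petrov et al. 06 estimate from inside/outside scores as the approximate likelihood loss of merging a sibling pair, is stubbed here, and all names are illustrative:

    def split_all(symbols):
        """Split every subcategory in two (e.g. NP -> NP-0, NP-1)."""
        return [s + suffix for s in symbols for suffix in ("-0", "-1")]

    def merge_back(symbols, gain, keep_fraction=0.5):
        """Roll back the sibling splits with the lowest estimated gain."""
        pairs = [(symbols[i], symbols[i + 1]) for i in range(0, len(symbols), 2)]
        pairs.sort(key=gain, reverse=True)
        cut = int(len(pairs) * keep_fraction)
        out = [s for pair in pairs[:cut] for s in pair]     # keep these splits
        out += [p[0][:-2] for p in pairs[cut:]]             # undo the rest
        return out

    symbols = split_all(["NP", "VP", ","])
    print(merge_back(symbols, gain=lambda pair: len(pair[0])))  # toy gain stub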

Page 30:

Adaptive Splitting Results

Model              F1
Previous           88.4
With 50% Merging   89.5

Page 31:

Number of Phrasal Subcategories

[Figure: bar chart (0–40) of learned subcategory counts per phrasal category, in decreasing order: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST]

Page 32:

Number of Lexical Subcategories

[Figure: bar chart (0–70) of learned subcategory counts per POS tag, in decreasing order: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, -LRB-, ., EX, WP$, WDT, -RRB-, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #]

Page 33:

Learned Splits

§ Proper Nouns (NNP):

NNP-14   Oct. Nov. Sept.
NNP-12   John Robert James
NNP-2    J. E. L.
NNP-1    Bush Noriega Peters
NNP-15   New San Wall
NNP-3    York Francisco Street

§ Personal pronouns (PRP):

PRP-0    It He I
PRP-1    it he they
PRP-2    it them him

Page 34:

Learned Splits

§ Relative adverbs (RBR):

RBR-0    further lower higher
RBR-1    more less More
RBR-2    earlier Earlier later

§ Cardinal Numbers (CD):

CD-7     one two Three
CD-4     1989 1990 1988
CD-11    million billion trillion
CD-0     1 50 100
CD-3     1 30 31
CD-9     78 58 34

Page 35:

Final Results (Accuracy)

                                          ≤ 40 words F1   all F1
ENG   Charniak&Johnson '05 (generative)   90.1            89.6
      Split / Merge                       90.6            90.1
GER   Dubey '05                           76.3            -
      Split / Merge                       80.8            80.1
CHN   Chiang et al. '02                   80.0            76.6
      Split / Merge                       86.3            83.4

Still higher numbers from reranking / self-training methods.

Page 36:

Efficient Parsing for Hierarchical Grammars

Page 37:

Coarse-to-Fine Inference

§ Example: PP attachment

Page 38:

Hierarchical Pruning

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … (and so on) …

Page 39:

Bracket Posteriors

Page 40:

[Figure: successive coarse-to-fine passes, with parse times of 1621 min, 111 min, 35 min, and finally 15 min (no search error)]

Page 41:

Other Syntactic Models

Page 42:

Dependency Parsing

§ Lexicalized parsers can be seen as producing dependency trees
§ Each local binary tree corresponds to an attachment in the dependency graph

[Figure: dependency tree in which "questioned" governs "lawyer" and "witness", each of which governs a "the"]

Page 43:

Dependency Parsing

§ Pure dependency parsing is only cubic [Eisner 99]
§ Some work on non-projective dependencies
  § Common in, e.g., Czech parsing
  § Can do with MST algorithms [McDonald and Pereira 05]

[Figure: the lexicalized constituent item X[h] over (i, j) with heads h and h', next to the leaner head-span items of Eisner's dependency algorithm]

Page 44:

Shift-Reduce Parsers

§ Another way to derive a tree (sketched below):

§ Parsing:
  § No useful dynamic programming search
  § Can still use beam search [Ratnaparkhi 97]
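A minimal sketch of the shift-reduce transition system for binary trees, driven here by a fixed action sequence rather than a learned model; names and the action encoding are illustrative:

    def shift_reduce(words, actions):
        """Build a tree from SHIFT / (REDUCE, X) actions.

        SHIFT moves the next word onto the stack; ("REDUCE", X) pops the
        top two items and pushes the subtree (X, left, right)."""
        stack, buffer = [], list(words)
        for action in actions:
            if action == "SHIFT":
                stack.append(buffer.pop(0))
            else:
                _, label = action
                right, left = stack.pop(), stack.pop()
                stack.append((label, left, right))
        assert len(stack) == 1 and not buffer
        return stack[0]

    tree = shift_reduce(
        ["the", "lawyer", "questioned", "the", "witness"],
        ["SHIFT", "SHIFT", ("REDUCE", "NP"), "SHIFT", "SHIFT", "SHIFT",
         ("REDUCE", "NP"), ("REDUCE", "VP"), ("REDUCE", "S")])
    print(tree)
    # ('S', ('NP', 'the', 'lawyer'),
    #       ('VP', 'questioned', ('NP', 'the', 'witness')))

In a real parser a classifier scores the possible next actions at each step, and beam search keeps several partial derivations alive, since exact dynamic programming is not available.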

Page 45:

Tree Insertion Grammars

§ Rewrite large (possibly lexicalized) subtrees in a single step
§ Formally, a tree-insertion grammar
§ Derivational ambiguity whether subtrees were generated atomically or compositionally
§ Most probable parse is NP-complete

Page 46:

TIG: Insertion

Page 47:

Tree-adjoining grammars

§ Start with local trees
§ Can insert structure with adjunction operators
§ Mildly context-sensitive
§ Models long-distance dependencies naturally
§ … as well as other weird stuff that CFGs don't capture well (e.g. cross-serial dependencies)

Page 48:

TAG: Long Distance

Page 49:

CCG Parsing

§ Combinatory Categorial Grammar
  § Fully (mono-)lexicalized grammar
  § Categories encode argument sequences
  § Very closely related to the lambda calculus (more later)
  § Can have spurious ambiguities (why?)

Page 50:

Empty Elements

Page 51:

Empty Elements

Page 52:

Empty Elements

§ In the PTB, three kinds of empty elements:
  § Null items (usually complementizers)
  § Dislocation (WH-traces, topicalization, relative clause and heavy NP extraposition)
  § Control (raising, passives, control, shared argumentation)
§ Need to reconstruct these (and resolve any indexation)

Page 53:

Example: English

Page 54:

Example: German

Page 55:

Types of Empties

Page 56:

A Pattern-Matching Approach

§ [Johnson 02]

Page 57:

Pattern-Matching Details

§ Something like transformation-based learning
§ Extract patterns (toy sketch below)
  § Details: transitive verb marking, auxiliaries
  § Details: legal subtrees
§ Rank patterns
  § Pruning ranking: by correct / match rate
  § Application priority: by depth
§ Pre-order traversal
§ Greedy match
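A toy sketch of greedy pre-order pattern matching in this spirit, with a single illustrative pattern that inserts the PTB null complementizer, (SBAR (-NONE- 0) (S ...)); this is not Johnson's actual pattern set:

    # Trees as (label, child, ...) tuples; leaves are strings.

    def apply_patterns(tree):
        """Pre-order traversal: greedily match at each node, then recurse."""
        if isinstance(tree, str):
            return tree
        label, children = tree[0], list(tree[1:])
        # Toy pattern: an SBAR whose first child is a bare S gets an
        # empty complementizer inserted in front of it.
        if label == "SBAR" and children and not isinstance(children[0], str) \
                and children[0][0] == "S":
            children = [("-NONE-", "0")] + children
        return (label, *[apply_patterns(c) for c in children])

    t = ("VP", ("VBD", "said"),
         ("SBAR", ("S", ("NP", ("PRP", "he")), ("VP", ("VBD", "left")))))
    print(apply_patterns(t))
    # ('VP', ('VBD', 'said'), ('SBAR', ('-NONE-', '0'), ('S', ...)))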

Page 58:

Top Patterns Extracted

Page 59:

Results

Page 60:

Semantic Roles

Page 61:

Semantic Role Labeling (SRL)

§ Characterize clauses as relations with roles:
§ Says more than which NP is the subject (but not much more):
  § Relations like subject are syntactic, relations like agent or message are semantic
§ Typical pipeline:
  § Parse, then label roles
  § Almost all errors locked in by the parser
  § Really, SRL is quite a lot easier than parsing

Page 62:

SRL Example

Page 63:

PropBank / FrameNet

§ FrameNet: roles shared between verbs
§ PropBank: each verb has its own roles
§ PropBank more used, because it's layered over the treebank (and so has greater coverage, plus parses)
§ Note: some linguistic theories postulate fewer roles than FrameNet (e.g. 5-20 total: agent, patient, instrument, etc.)

Page 64:

PropBank Example

Page 65:

PropBank Example

Page 66:

PropBank Example

Page 67:

Shared Arguments

Page 68:

Path Features

Page 69:

Results

§ Features:
  § Path from target to filler (sketch below)
  § Filler's syntactic type, headword, case
  § Target's identity
  § Sentence voice, etc.
  § Lots of other second-order features
§ Gold vs parsed source trees
§ SRL is fairly easy on gold trees
§ Harder on automatic parses
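A minimal sketch of the classic path feature, using (label, children...) tuple trees as above: walk up from the predicate to the lowest common ancestor, then down to the candidate filler. The ^/! encoding of up/down steps and all names are illustrative, and both targets are assumed to be present in the tree:

    def label_path(tree, target):
        """Root-to-target sequence of labels; target tests a (label, ...) subtree."""
        if target(tree):
            return [tree[0]]
        for child in (tree[1:] if isinstance(tree, tuple) else []):
            sub = label_path(child, target)
            if sub is not None:
                return [tree[0]] + sub
        return None

    def path_feature(tree, is_pred, is_filler):
        """Path feature, e.g. VBD^VP^S!NP: up (^) from the predicate to the
        lowest common ancestor, then down (!) to the filler."""
        p1, p2 = label_path(tree, is_pred), label_path(tree, is_filler)
        d = 0
        while d < min(len(p1), len(p2)) and p1[d] == p2[d]:
            d += 1                       # first index where the paths diverge
        return "^".join(reversed(p1[d - 1:])) + "!" + "!".join(p2[d:])

    t = ("S",
         ("NP", ("DT", "the"), ("NN", "lawyer")),
         ("VP", ("VBD", "questioned"), ("NP", ("DT", "the"), ("NN", "witness"))))
    print(path_feature(t,
                       lambda s: isinstance(s, tuple) and s[0] == "VBD",
                       lambda s: isinstance(s, tuple) and s[0] == "NP"))
    # -> VBD^VP^S!NP  (the path from the predicate to the subject NP)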