
Efficient Analysis of Unrestricted Text: Treebanks and Dependency Grammar
(Effiziente Analyse unbeschränkter Texte: Treebanks und Dependenzgrammatik)

Institute of Computational Linguistics, University of Zurich: Efficient Analysis of Unrestricted Text

Lecture 3: Treebanks and Dependency Grammar

Gerold Schneider
IFI, University of Zurich
gschneid@ifi.unizh.ch
3 November 2003

Contents

1. Treebanks
   (a) Tagging
   (b) Basic Clause Structure
   (c) Sentences
   (d) VPs
   (e) NPs and PPs
   (f) Grammatical Function Labels
2. Dependency Grammar
   (a) Dependency and Constituency
   (b) Robustness
   (c) Headedness
   (d) Projection
   (e) Functionalism
   (f) Long-Distance Dependencies
   (g) Dependency and Probabilistic Grammars
3. Conclusions

1 Treebanks

Treebanks are syntactically annotated corpora. The annotation is as theory-neutral as possible:

• No X-bar theory, no intermediate categories
• No CP, IP, DP (as e.g. in Government & Binding)
• Auxiliary-main verb relations are expressed by VP reduplication
• The top node of a sentence is S, not a verbal projection (as in HPSG, LFG, DG)
• Functional roles (used by LFG and DG) are only partially annotated
• Careful use of empty categories and co-indexation

The following Treebank slides are partly copied from the Treebank documentation at

http://www.cis.upenn.edu/~treebank/home.html

such as (a good introduction to the Treebank generally) [MSM93]

ftp://ftp.cis.upenn.edu/pub/treebank/doc/cl93.ps.gz

and (a good introduction to Treebank-II annotation) [MKM+94]

ftp://ftp.cis.upenn.edu/pub/treebank/doc/arpa94.ps.gz

1.1 Tagging

Simplified tagset with only 36 tags. All tags, in alphabetical order:

 1. CC    Coordinating conjunction
 2. CD    Cardinal number
 3. DT    Determiner
 4. EX    Existential there
 5. FW    Foreign word
 6. IN    Preposition or subordinating conjunction
 7. JJ    Adjective
 8. JJR   Adjective, comparative
 9. JJS   Adjective, superlative
10. LS    List item marker
11. MD    Modal
12. NN    Noun, singular or mass
13. NNS   Noun, plural
14. NNP   Proper noun, singular
15. NNPS  Proper noun, plural
16. PDT   Predeterminer
17. POS   Possessive ending
18. PRP   Personal pronoun
19. PRP$  Possessive pronoun
20. RB    Adverb
21. RBR   Adverb, comparative
22. RBS   Adverb, superlative
23. RP    Particle
24. SYM   Symbol
25. TO    to
26. UH    Interjection
27. VB    Verb, base form
28. VBD   Verb, past tense
29. VBG   Verb, gerund or present participle
30. VBN   Verb, past participle
31. VBP   Verb, non-3rd person singular present
32. VBZ   Verb, 3rd person singular present
33. WDT   Wh-determiner
34. WP    Wh-pronoun
35. WP$   Possessive wh-pronoun
36. WRB   Wh-adverb

1.2 Basic Clause Structure

Bracketed structure:

    ( (S
        (NP-SBJ
          (NP (NNP Pierre) (NNP Vinken))
          (, ,)
          (ADJP
            (NP (CD 61) (NNS years))
            (JJ old))
          (, ,))
        (VP (MD will)
          (VP (VB join)
            (NP (DT the) (NN board))
            (PP-CLR (IN as)
              (NP (DT a) (JJ nonexecutive) (NN director)))
            (NP-TMP (NNP Nov.) (CD 29))))
        (. .)))
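As an illustration only, a bracketed structure of this kind can be read into a nested data structure with a few lines of Python. The function names (tokenize, parse, leaves) and the (label, children) tuple representation are choices made for this sketch and are not part of the Treebank distribution.

```python
def tokenize(text):
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed constituent into a (label, children) pair.

    Children are nested (label, children) pairs or plain word strings.
    """
    assert tokens.pop(0) == "("
    # The outermost Treebank bracket has no label of its own.
    label = tokens.pop(0) if tokens[0] != "(" else ""
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse(tokens))
        else:
            children.append(tokens.pop(0))
    tokens.pop(0)  # consume the closing ')'
    return (label, children)


def leaves(tree):
    """Return the words of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


SENTENCE = """
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,)
             (ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,))
     (VP (MD will)
         (VP (VB join) (NP (DT the) (NN board))
             (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP-TMP (NNP Nov.) (CD 29))))
     (. .)))
"""

tree = parse(tokenize(SENTENCE))
print(" ".join(leaves(tree)))
```

The sketch treats punctuation tags such as (, ,) like any other preterminal and does no error handling for unbalanced brackets.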

Equivalent tree (simplified):

    S
      NP-SBJ
        NP
          NNP Pierre
          NNP Vinken
        ADJP
          NP
            CD 61
            NNS years
          JJ old
      VP
        MD will
        VP
          VB join
          NP
            DT the
            NN board
          PP-CLR
            IN as
            NP
              DT a
              NN director

1.3 Sentences: S, SINV, SBAR, SBARQ, SQ, S-CLF, FRAG.

Simple declarative sentences:

    (S (NP-SBJ Casey)
       (VP threw
           (NP the ball)))

Passives:

The surface subject is tagged -SBJ, the passive trace is indicated with (NP *) and coindexed to the surface subject, the by-phrase is a child of VP, and the logical subject is tagged -LGS. (Note that the -LGS tag goes on the NP and not on the PP of the by-phrase.)

    (S (NP-SBJ-1 The ball)
       (VP was
           (VP thrown
               (NP *-1)
               (PP by
                   (NP-LGS Casey)))))
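The coindexation in the passive example can be followed mechanically. The sketch below is a hypothetical illustration: it writes the tree by hand as nested (label, children) tuples and pairs every coindexed null element with the words of its antecedent. The helper names and the regular expressions are assumptions of this sketch, not Treebank tooling.

```python
import re

# The passive example, written by hand as nested (label, children) tuples.
PASSIVE = ("S", [
    ("NP-SBJ-1", ["The", "ball"]),
    ("VP", ["was",
            ("VP", ["thrown",
                    ("NP", ["*-1"]),
                    ("PP", ["by", ("NP-LGS", ["Casey"])])])]),
])


def words(tree):
    """Return all leaf strings (words and null elements) from left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


def antecedents(tree, table=None):
    """Map each index on a constituent label (e.g. NP-SBJ-1) to its words."""
    table = {} if table is None else table
    label, children = tree
    match = re.search(r"-(\d+)$", label)
    if match:
        table[match.group(1)] = " ".join(words(tree))
    for child in children:
        if isinstance(child, tuple):
            antecedents(child, table)
    return table


def resolve_traces(tree):
    """Pair every coindexed null element (*-1, *T*-1) with its antecedent."""
    table = antecedents(tree)
    pairs = []
    for word in words(tree):
        match = re.fullmatch(r"\*(?:T\*)?-(\d+)", word)
        if match:
            pairs.append((word, table.get(match.group(1))))
    return pairs


print(resolve_traces(PASSIVE))
```

Unindexed null subjects such as (NP-SBJ *) are ignored by this sketch, since they carry no index to resolve.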

Imperatives:

Imperatives are labeled S and given a null subject (NP-SBJ *).

    (S (NP-SBJ *)
       (VP Throw
           (NP the ball))
       !)

Questions with declarative word order:

Sentences with a question mark but non-inverted word order are S:

    (S (NP-SBJ This)
       (VP is
           (NP-PRD Japan))
       ?)

    (S (NP-SBJ You)
       (VP did
           (NP what))
       ?)

However, questions that are missing both subject and auxiliary are labeled SQ.

    (SQ (NP-SBJ *)
        (VP See
            (NP that cute dog))
        ?)

Infinitives:

Infinitives are labeled S and take (NP-SBJ *) as the null subject, where to represents the highest level of the VP.

1. Complement clauses.

When the infinitive is a VP complement, the null subject of the infinitive is coindexed to its logical subject (subject or object control).

    (S (NP-SBJ-1 Casey)
       (VP wants
           (S (NP-SBJ *-1)
              (VP to
                  (VP throw
                      (NP the ball))))))

2. Purpose clauses.

Purpose clauses are attached at S and labeled -PRP (purpose/reason). The subject is coindexed to the surface subject of the matrix clause when appropriate.

    (S (NP-SBJ-1 Sue)
       (VP arrived
           (ADVP-TMP early)
           (S-PRP (NP-SBJ *-1)
                  (VP to
                      (VP get
                          (NP a good seat))))))

3. Infinitival relatives.

In the case of infinitival relatives, the relative is adjoined to NP and dominated by SBAR with a zero wh-complementizer labeled according to the role played by the gapped constituent. A *T* in the position of the gap is coindexed to the wh-complementizer. The (NP-SBJ *) is not indexed.

    (NP (NP a movie)
        (SBAR (WHNP-1 0)
              (S (NP-SBJ *)
                 (VP to
                     (VP see
                         (NP *T*-1))))))

Participial and gerund clauses:

Participial clauses have full clause structure, with either a lexical or null (NP-SBJ *) subject. When appropriate, the null subject is coindexed.

Lexical subject:

    (S (S-ADV (NP-SBJ The crowd)
              (VP cheering
                  (ADVP-MNR madly)))
       ,
       (NP-SBJ Willie)
       (VP caught
           (NP the ball)))

Null subject:

    (S (S-ADV (NP-SBJ *-1)
              (VP Running
                  (PP-DIR toward
                          (NP Casey))))
       ,
       (NP-SBJ-1 Willie)
       (VP caught
           (NP the ball)))

Inverted clauses.

The SINV label is used for subject-auxiliary inversion in the case of negative inversion, conditional inversion, locative inversion, and some topicalizations. SQ is used with yes/no questions.

    (SINV (ADVP-TMP Never)
          (VP had)
          (NP-SBJ I)
          (VP seen
              (NP such a place)))

    (SQ Did
        (NP-SBJ Casey)
        (VP throw
            (NP the ball))
        ?)

WH-questions.

The SBARQ label marks wh-questions (i.e., those that contain a gap and therefore require a trace). A further level of structure, SQ, contains the inverted auxiliary (if there is one) and the rest of the sentence. The inverted auxiliary in wh-questions is not labeled.

    (SBARQ (WHNP-1 Who)
           (SQ (NP-SBJ *T*-1)
               (VP threw
                   (NP the ball)))
           ?)

    (SBARQ (WHNP-2 What)
           (SQ did
               (NP-SBJ Casey)
               (VP throw
                   (NP *T*-2)))
           ?)

    (SBARQ (WHNP-3 Who)
           (SQ (NP-SBJ *T*-3)
               (VP will
                   (VP throw
                       (NP the ball))))
           ?)

Cleft.

Declarative it-clefts are labeled S-CLF, expletive it is tagged as the surface subject (-SBJ), the SBAR is attached at VP level, and a trace is coindexed.

    (S-CLF (NP-SBJ It)
           (VP was
               (NP Casey)
               (SBAR (WHNP-1 who)
                     (S (NP-SBJ *T*-1)
                        (VP threw
                            (NP the ball))))))

Fronted elements

    (S (NP-TPC-5 This)
       (NP-SBJ every man)
       (VP contains
           (NP *T*-5)
           (PP-LOC within
                   (NP him))))

    (S (NP-TMP Yesterday)
       (NP-SBJ I)
       (VP went
           (PP-DIR to
                   (NP the store))))

1.4 VPs

Composite tenses are annotated with VP-reduplication:

    (S (NP Casey)
       (VP should
           (VP have
               (VP thrown
                   (NP the ball)))))

Postmodifiers of VP are attached under VP, with adverbial function tag(s) where appropriate. Structurally all verb complements are arguments.

    (VP reading
        (PP-CLR about
                (NP toads))
        (PP-LOC on
                (NP the Internet)))
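Because composite tenses are encoded by VP-reduplication, the auxiliaries and the main verb can be read off by descending through the nested VPs. The following is a minimal sketch under that assumption; the tree is typed in by hand as (label, children) tuples, and first_vp and verb_chain are hypothetical helper names.

```python
# "Casey should have thrown the ball" as nested (label, children) tuples.
TREE = ("S", [
    ("NP", ["Casey"]),
    ("VP", ["should",
            ("VP", ["have",
                    ("VP", ["thrown", ("NP", ["the", "ball"])])])]),
])


def first_vp(tree):
    """Return the first VP child of a node, or None if there is none."""
    _, children = tree
    for child in children:
        # Strip function tags such as -PRD before comparing the category.
        if isinstance(child, tuple) and child[0].split("-")[0] == "VP":
            return child
    return None


def verb_chain(vp):
    """Collect the first word of every VP level, outermost first."""
    _, children = vp
    chain = [c for c in children if isinstance(c, str)][:1]
    nested = first_vp(vp)
    if nested is not None:
        chain += verb_chain(nested)
    return chain


chain = verb_chain(first_vp(TREE))
auxiliaries, main_verb = chain[:-1], chain[-1]
print(auxiliaries, main_verb)
```

The sketch assumes that the verb is the first bare word of each VP; coordinated VPs or preverbal adverbs would need extra handling.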

1.5 NPs and PPs

Since it is difficult to consistently annotate an argument/adjunct distinction, all PP modifiers of nouns are Chomsky-adjoined to the NP. Structurally all noun complements are adjuncts.

    (NP (NP a teacher)
        (PP of
            (NP chemistry)))

Often (but not always) the argument/adjunct distinction for NPs and PPs can be derived from the functional label.

1.6 Grammatical Function Labels

Tag    Marks:                        Example:

Grammatical Functions
-CLF   true clefts                   [S-CLF it was Casey who ...]
-NOM   non NPs functioning as NPs    heard of [S-NOM asbestos being dangerous]
-ADV   clausal and NP adverbials     reaches 10,000 barrels [NP-ADV a day]
-LGS   logical subjects in passives  done by [NP-LGS the president]
-PRD   non VP predicates             is [NP-PRD a producer]
-SBJ   surface subject               [NP-SBJ Peter] walks.
-TPC   topicalized fronted const.    [S-TPC-1 I agree, he said [SBAR [S-1]]
-CLR   closely related               "open class of other cases"

Semantic Roles
-VOC   vocatives                     Close the door, [NP-VOC John]!
-DIR   direction & trajectory        attention [PP-DIR to the problem]
-LOC   location                      declines [PP-LOC in interest rates]
-MNR   manner                        happy [PP-MNR like a kid]
-PRP   purpose and reason            [PP-PRP (in order) to ...]
-TMP   temporal phrases              shares rose [NP-TMP yesterday]

2 Dependency Grammar (DG)

2.1 Dependency and Constituency

DG [Tes59] focuses on the dependencies between words
versus
Constituency focuses on what a phrase consists of

An example of a (labeled) dependency structure:

    ROOT the man that came eats bananas with a fork

[Figure: labeled dependency arcs drawn over the sentence. Arc labels: SENT, Subj, Det, Rel, TH, Obj, PP, PObj, Det]

2.1.1 Valency

DG is a valency grammar in which the concept of valency is extended to all word classes, and where non-subcategorized material is attached in similar ways.

(1) a. I am afraid of dogs.
    b. *I am ready of dogs.
    c. *I am afraid for action.
    d. I am ready for action.

(2) give <[NP PP(to)]>
    afraid <[PP(of)]>
    ready <[PP(for)]>
    president <[PP(of)]>
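The valency frames in (2) can be pictured as a small lexicon that any word class may have an entry in. The sketch below is purely illustrative: the VALENCY dictionary and the licenses function are hypothetical names, and the frames simply restate (2).

```python
# Valency frames from (2): each head lists the complements it subcategorizes for.
# A complement is a (category, preposition) pair; None means "no preposition".
VALENCY = {
    "give": [("NP", None), ("PP", "to")],
    "afraid": [("PP", "of")],
    "ready": [("PP", "for")],
    "president": [("PP", "of")],
}


def licenses(head, category, preposition=None):
    """True if the head's valency frame contains the given complement."""
    return (category, preposition) in VALENCY.get(head, [])


# The judgements in (1): afraid takes of, ready takes for.
print(licenses("afraid", "PP", "of"))   # (1a)
print(licenses("ready", "PP", "of"))    # (1b)
print(licenses("afraid", "PP", "for"))  # (1c)
print(licenses("ready", "PP", "for"))   # (1d)
```

Verbs, adjectives and nouns are looked up in the same table, which is the sense in which valency is extended to all word classes.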

2.1.2 Lexicalism

DG is strictly lexicalist: non-terminal nodes only exist as a derived concept, and endocentricity is naturally enforced → [Cho95]: Bare Phrase Structure

[Figure: bare phrase structure tree rooted in eats/V, in which every node carries the label of its lexical head. Node labels: eats/V, man/N, the/D, bananas/N, with/P, fork/N, a/D. Leaves: the, man, eats, bananas, with, a, fork]

The head of a phrase and its projection are isomorphic.

2.1.3 Strong Equivalence to X-bar

DG structures are equivalent to a flat constituency representation without intermediate structures (X').

But if the labels used by a particular DG can be mapped onto the permissible X-bar relations, we have strong equivalence [Cov94].

[Figure: dependency arcs over the nodes A, B, C, D. Arc labels: Compl., Spec., Adjunct]

→ Are there any differences between Constituency and Dependency?
→ What could DG possibly be good for?

2.2 Robustness

• Isomorphism of words and projections → building the max. projection always succeeds

    ROOT She eats cbYiXX09
    [Figure: arc labels SENT, Subj]

• A chunker with head-extraction offers the same head/phrase isomorphism [Abn95] → divide & conquer

    ROOT [Base phrase recognition] [can reduce] [parsing complexity]
    [Figure: arc labels SENT, Subj, Obj]

• Shallowness at will

2.3 Headedness

DG often makes different assumptions about headedness than e.g. GB → the discussion is open again (→ HPSG)

    On our account, a marker is a word that is 'functional' or 'grammatical' as opposed to substantive, in the sense that its semantic content is purely logical in nature (perhaps even vacuous). A marker, so-called because it marks the constituent in which it occurs, combines with another element that heads that constituent. In addition to the complementizers that and for, other examples of markers include the comparative words than and as, the case-marking post-clitics of Japanese and Korean, and perhaps nonpredicative adpositions in (the vast majority of) languages where adposition stranding does not occur. [PS94]

2.4 Projection

For DG, projection is deterministic. V projects to VP, N to NP (endocentricity). In some cases, projection is exocentric but mostly deterministic (bottom-up parsing). For NP projection, a preferred head hierarchy is used:

    NP: NN (noun)
        > PRP (personal pronoun)
        > CD (number)
        > JJ (adjective)
        > RB (little, most, much)
        > DT (determiner)
        > WP, WDT .. (what, who, ...)

Participles can project to verb or adjective, e.g. developing/VBG countries.

DG structures and grammars are considerably simpler yet as expressive. Only lexical relations between content words are modeled.
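The preferred head hierarchy for NP projection lends itself to a simple lookup. The sketch below shows one way head extraction over a tagged NP chunk could work. Two assumptions go beyond the slide and are marked in the code: the noun, adjective and adverb variants of the tagset are collapsed onto NN, JJ and RB, and candidates are searched from right to left.

```python
# Preferred head hierarchy for NP projection, highest priority first.
NP_HEAD_PRIORITY = ["NN", "PRP", "CD", "JJ", "RB", "DT", "WP", "WDT"]

# Assumption of this sketch: tag variants count as their base category.
COARSE = {
    "NNS": "NN", "NNP": "NN", "NNPS": "NN",
    "JJR": "JJ", "JJS": "JJ",
    "RBR": "RB", "RBS": "RB",
}


def np_head(chunk):
    """Pick the head word of an NP chunk given as (word, tag) pairs."""
    for wanted in NP_HEAD_PRIORITY:
        # Assumption: search right to left, since English NPs tend to be head-final.
        for word, tag in reversed(chunk):
            if COARSE.get(tag, tag) == wanted:
                return word
    # Fallback when no tag of the hierarchy occurs: take the last word.
    return chunk[-1][0]


print(np_head([("a", "DT"), ("nonexecutive", "JJ"), ("director", "NN")]))
print(np_head([("the", "DT"), ("poor", "JJ")]))
```

Since some tag of the hierarchy is almost always present, the lookup practically always yields a head, which fits the claim that projection is mostly deterministic.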

2.4.1 An Endocentricity Constraint: Subcat Is Essential

• Verb with or without subject in "Dreyfus the best fund was low"

Incorrect: 2 subjects

    ROOT [Dreyfus] [the best fund] was_v low
    [Figure: arc labels subj, subj, SENT]

Correct: 1 subject

    ROOT [Dreyfus] [the best fund] was_v low
    [Figure: arc labels subj, nmod, SENT]

• Is a noun a complement or an adjunct of a verb?
• Verb coordination: "He saw (him) and shot the duck."
• Does a verb take a sentence complement? (often Garden Path)

[Figure: parse tree with root means and the relations means ->sentobj-> be, period <-subj<- means, temporao <-subj<- be. Tagged tokens: The_DT, dry_JJ, period_NN, means_VBZ, the_DT, temporao_NN, will_MD, be_VB, late_RB]

2.5 Functionalism

DG was conceived to be a deep-syntactic, proto-semantic theory.

• Subj and Obj as primitives (→ LFG f-structure)
• Only content words can be heads, so-called nuclei
• All functional words are parts of some nucleus (marked n, m, o in the example)
• Heads of chunks and all words outside chunks are normally nuclei
• Some words outside chunks are part of a non-projective nucleus (marked n!)

    ROOT Can this sentence[m] be analyzed[n] by Dependency Theory[o] ?
    [Figure: arc labels SENT, Subj, Obj, Prep; nucleus markers n!, n, m, o]

Functionalism (contd.)

• Similar analyses for functionally related sentences

    ROOT Peter gives the book[m] to Mary[n]
    [Figure: arc labels SENT, Subj, Obj, Obj]

    ROOT Peter gives Mary the book[m]
    [Figure: arc labels SENT, Subj, Obj, Obj]

2.6 Long-Distance Dependencies

    ROOT What did she believe Peter said Mary was thinking
    [Figure: arc labels SENT, Subj, Obj, Subj, Obj, Obj (A-bar, wh), Subj]

DG often uses restricted non-projectivity instead of transformations for long-distance dependencies (LDD) [Tes59], [TJ97].

Most LDDs are one or several of:

• AUX-MAIN in the verb nucleus (HPSG argument composition)
• easy to spot (wh, sentence-initial, A-bar movement)
• easy to treat with a SLASH feature (GPSG, HPSG, Prolog)
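Non-projectivity can be made concrete as crossing arcs. The sketch below checks a dependency analysis for crossings. The head assignments for the example sentence are an assumption made for illustration (the figure's arcs did not survive extraction), as is the convention of treating ROOT as position 0.

```python
def crossing_arcs(heads):
    """Return all pairs of crossing arcs.

    heads[i] is the position of the head of word i + 1; position 0 is ROOT.
    """
    arcs = sorted((min(h, d), max(h, d)) for d, h in enumerate(heads, start=1))
    crossings = []
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # After sorting, a <= c. The arcs cross if the second one starts
            # strictly inside the first and ends strictly outside it.
            if a < c < b < d:
                crossings.append(((a, b), (c, d)))
    return crossings


# Assumed analysis of: What did she believe Peter said Mary was thinking
#                       1    2   3   4       5     6    7    8   9
# What depends on thinking, which creates the long-distance arc (1, 9).
LDD_HEADS = [9, 4, 4, 0, 6, 4, 9, 9, 6]

# She eats bananas: a projective analysis for comparison.
SIMPLE_HEADS = [2, 0, 2]

print(crossing_arcs(LDD_HEADS))
print(crossing_arcs(SIMPLE_HEADS))
```

Under these assumptions the only crossing involves the arc from thinking to What and the arc from ROOT to believe, a single crossing of the kind that the term restricted non-projectivity refers to.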

2.7 Dependency and Probabilistic Grammars

[Col96], [Col97]

The dependency advantage:

CFG rules are broken up into individual dependencies, and probabilities are assigned to the dependencies. This yields less sparse data and more valuable information.

    VP → V NP          ("gives the money")
    VP → V NP NP       ("gives them all his money")
    VP → V NP PP       ("gives his money to the poor")
    VP → V PP          ("gives to the poor")
    NP → DT $ CD NN    ("the $200 hat")
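To see why breaking up rules reduces sparseness, the rules above can be split into head-dependent pairs and counted. This is a deliberately simplified sketch, not the models of [Col96] or [Col97], which condition on head words and further features. The head positions and the (head, dependent, direction) representation are assumptions made here.

```python
from collections import Counter

# (parent, children, index of the head child); head positions are assumed.
RULES = [
    ("VP", ["V", "NP"], 0),
    ("VP", ["V", "NP", "NP"], 0),
    ("VP", ["V", "NP", "PP"], 0),
    ("VP", ["V", "PP"], 0),
    ("NP", ["DT", "$", "CD", "NN"], 3),
]


def dependencies(rule):
    """Split one CFG rule into (head, dependent, direction) triples."""
    _, children, head_index = rule
    head = children[head_index]
    deps = []
    for i, child in enumerate(children):
        if i != head_index:
            direction = "left" if i < head_index else "right"
            deps.append((head, child, direction))
    return deps


counts = Counter(dep for rule in RULES for dep in dependencies(rule))

# Relative-frequency estimate of P(dependent, direction | head).
head_totals = Counter()
for (head, _, _), n in counts.items():
    head_totals[head] += n
probabilities = {dep: n / head_totals[dep[0]] for dep, n in counts.items()}

for dep, n in counts.most_common():
    print(dep, n, round(probabilities[dep], 2))
```

The four distinct VP rules collapse into just two dependency types, V with an NP to its right and V with a PP to its right, so each type is observed more often than any single rule.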

3 Conclusions

• DG and constituency are largely equivalent
• Treebank structures can thus be expected to map onto DG
• DG is lexicalist by definition
• Chunking and head extraction integrate naturally into a DG parser
• DG is inherently more robust than a constituency grammar
• DG is inherently well suited for a lexicalized probabilistic parser
• DG does not need a projection operation (saves time)
• Labeled DG naturally takes grammatical relations into account
• DG (typically) uses crossing dependencies instead of movements

References

[Abn95] Steven Abney. Chunks and dependencies: Bringing processing evidence to bear on syntax. In Jennifer Cole, Georgia Green, and Jerry Morgan, editors, Computational Linguistics and the Foundations of Linguistic Theory, pages 145-164. CSLI, 1995.

[Cho95] Noam Chomsky. The Minimalist Program. The MIT Press, Cambridge, Massachusetts, 1995.

[Col96] Michael Collins. A new statistical parser based on bigram lexical dependencies. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 184-191, Philadelphia, 1996.

[Col97] Michael Collins. Three generative, lexicalised models for statistical parsing. In Proc. of the 35th Annual Meeting of the ACL, pages 16-23, Madrid, Spain, 1997.

[Cov94] Michael A. Covington. An empirically motivated reinterpretation of Dependency Grammar. Technical Report AI 1994-01, University of Georgia, Athens, Georgia, 1994.

[MKM+94] Mitch Marcus, Grace Kim, M. A. Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of ARPA '94, 1994.

[MSM93] Mitch Marcus, Beatrice Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19:313-330, 1993.

[PS94] Carl Pollard and Ivan Sag. Head-Driven Phrase Structure Grammar. Chicago University Press, Chicago, Illinois, 1994.

[Tes59] Lucien Tesnière. Éléments de Syntaxe Structurale. Librairie Klincksieck, Paris, 1959.

[TJ97] Pasi Tapanainen and Timo Järvinen. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64-71. Association for Computational Linguistics, 1997.