Efficient Analysis of Unrestricted Text: Treebanks and Dependency Grammar
Institute of Computational Linguistics, University of Zurich: Efficient Analysis of Unrestricted Text
Lecture 3: Treebanks and Dependency Grammar
Gerold Schneider
IFI, Universität Zürich
gschneid@ifi.unizh.ch
3 November 2003
Contents
1. Treebanks
   (a) Tagging
   (b) Basic Clause Structure
   (c) Sentences
   (d) VPs
   (e) NPs and PPs
   (f) Grammatical Function Labels
2. Dependency Grammar
   (a) Dependency and Constituency
   (b) Robustness
   (c) Headedness
   (d) Projection
   (e) Functionalism
   (f) Long-Distance Dependencies
   (g) Dependency and Probabilistic Grammars
3. Conclusions
1 Treebanks
Treebanks are syntactically annotated corpora. The annotation is as theory-neutral as possible:
• No X-bar theory, no intermediate categories
• No CP, IP, DP (as e.g. in Government & Binding)
• Auxiliary-main verb relations are expressed by VP reduplication
• The top node of a sentence is S, not a verbal projection (as in HPSG, LFG, DG)
• Functional roles (used by LFG and DG) are only partially annotated
• Careful use of empty categories and co-indexation
The following Treebank slides are partly copied from the Treebank documentation at
http://www.cis.upenn.edu/~treebank/home.html
such as [MSM93] (a good introduction to the Treebank in general)
ftp://ftp.cis.upenn.edu/pub/treebank/doc/cl93.ps.gz
and [MKM+94] (a good introduction to Treebank II annotation)
ftp://ftp.cis.upenn.edu/pub/treebank/doc/arpa94.ps.gz
1.1 Tagging
Simplified tagset with only 36 tags. All tags, in alphabetical order:
No.  Tag    Description
 1.  CC     Coordinating conjunction
 2.  CD     Cardinal number
 3.  DT     Determiner
 4.  EX     Existential there
 5.  FW     Foreign word
 6.  IN     Preposition or subordinating conjunction
 7.  JJ     Adjective
 8.  JJR    Adjective, comparative
 9.  JJS    Adjective, superlative
10.  LS     List item marker
11.  MD     Modal
12.  NN     Noun, singular or mass
13.  NNS    Noun, plural
14.  NNP    Proper noun, singular
15.  NNPS   Proper noun, plural
16.  PDT    Predeterminer
17.  POS    Possessive ending
18.  PRP    Personal pronoun
19.  PRP$   Possessive pronoun
20.  RB     Adverb
21.  RBR    Adverb, comparative
22.  RBS    Adverb, superlative
23.  RP     Particle
24.  SYM    Symbol
25.  TO     to
26.  UH     Interjection
27.  VB     Verb, base form
28.  VBD    Verb, past tense
29.  VBG    Verb, gerund or present participle
30.  VBN    Verb, past participle
31.  VBP    Verb, non-3rd person singular present
32.  VBZ    Verb, 3rd person singular present
33.  WDT    Wh-determiner
34.  WP     Wh-pronoun
35.  WP$    Possessive wh-pronoun
36.  WRB    Wh-adverb
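As a sketch of how the tagset is used in practice, the following Python reads tokens in the common word/TAG notation (an assumption here; the slide only lists the tags) and flags any tag outside the 36-tag list. Punctuation tags such as "," and "." are not part of the list, although the bracketed examples use them, so `PUNCT_TAGS` is a hypothetical minimal subset added for this sketch.

```python
# The 36 Penn Treebank word-class tags from the table above.
PENN_TAGS = {
    "CC": "Coordinating conjunction", "CD": "Cardinal number",
    "DT": "Determiner", "EX": "Existential there", "FW": "Foreign word",
    "IN": "Preposition or subordinating conjunction", "JJ": "Adjective",
    "JJR": "Adjective, comparative", "JJS": "Adjective, superlative",
    "LS": "List item marker", "MD": "Modal", "NN": "Noun, singular or mass",
    "NNS": "Noun, plural", "NNP": "Proper noun, singular",
    "NNPS": "Proper noun, plural", "PDT": "Predeterminer",
    "POS": "Possessive ending", "PRP": "Personal pronoun",
    "PRP$": "Possessive pronoun", "RB": "Adverb", "RBR": "Adverb, comparative",
    "RBS": "Adverb, superlative", "RP": "Particle", "SYM": "Symbol",
    "TO": "to", "UH": "Interjection", "VB": "Verb, base form",
    "VBD": "Verb, past tense", "VBG": "Verb, gerund or present participle",
    "VBN": "Verb, past participle", "VBP": "Verb, non-3rd person singular present",
    "VBZ": "Verb, 3rd person singular present", "WDT": "Wh-determiner",
    "WP": "Wh-pronoun", "WP$": "Possessive wh-pronoun", "WRB": "Wh-adverb",
}

# Hypothetical minimal set of punctuation tags (not part of the 36-tag list).
PUNCT_TAGS = {",", ".", ":", "$", "``", "''"}


def read_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs.

    rpartition splits at the last slash, so words containing '/' survive.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"not a word/TAG token: {token!r}")
        pairs.append((word, tag))
    return pairs


def unknown_tags(pairs):
    """Tags that are neither word-class tags nor (assumed) punctuation tags."""
    return [tag for _, tag in pairs if tag not in PENN_TAGS and tag not in PUNCT_TAGS]
```

For instance, `unknown_tags(read_tagged("the/DT board/NX ./."))` reports only the misspelled `NX`.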
1.2 Basic Clause Structure
Bracketed structure:
((S
    (NP-SBJ
        (NP (NNP Pierre) (NNP Vinken))
        (, ,)
        (ADJP
            (NP (CD 61) (NNS years))
            (JJ old))
        (, ,))
    (VP (MD will)
        (VP (VB join)
            (NP (DT the) (NN board))
            (PP-CLR (IN as)
                (NP (DT a) (JJ nonexecutive) (NN director)))
            (NP-TMP (NNP Nov.) (CD 29))))
    (. .)))
Equivalent tree (simplified):
S
  NP-SBJ
    NP
      NNP Pierre
      NNP Vinken
    ADJP
      NP
        CD 61
        NNS years
      JJ old
  VP
    MD will
    VP
      VB join
      NP
        DT the
        NN board
      PP-CLR
        IN as
        NP
          DT a
          NN director
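The bracketed notation is an s-expression, so a small recursive reader recovers the tree shown above. This is a minimal sketch, not the Treebank's own tooling; the `(label, children)` tuple representation and the function names are hypothetical.

```python
import re

# A tree is (label, children); each child is either a tree or a word string.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(text):
    """Read one Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        # The outermost bracket of a Treebank entry is unlabeled: "((S ...))".
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    try:
        tree = node()
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Words of the tree in surface order."""
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out


def tagged_words(tree):
    """(word, POS tag) pairs read off the preterminal nodes."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    return [pair for child in children if isinstance(child, tuple)
            for pair in tagged_words(child)]


SAMPLE = """((S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,)
  (ADJP (NP (CD 61) (NNS years)) (JJ old)) (, ,))
  (VP (MD will) (VP (VB join) (NP (DT the) (NN board))
    (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
    (NP-TMP (NNP Nov.) (CD 29))))
  (. .)))"""
```

`" ".join(leaves(parse_bracketed(SAMPLE)))` gives back the original sentence, and `tagged_words` yields the 18 (word, tag) pairs.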
1.3 Sentences: S, SINV, SBAR, SBARQ, SQ, S-CLF, FRAG
• Simple declarative sentences:
(S (NP-SBJ Casey)
   (VP threw
       (NP the ball)))
• Passives:
The surface subject is tagged -SBJ, the passive trace is indicated with (NP *) and coindexed to the surface subject, the by-phrase is a child of VP, and the logical subject is tagged -LGS. (Note that the -LGS tag goes on the NP and not on the PP of the by-phrase.)
(S (NP-SBJ-1 The ball)
   (VP was
       (VP thrown
           (NP *-1)
           (PP by
               (NP-LGS Casey)))))
• Imperatives:
Imperatives are labeled S and given a null subject (NP-SBJ *).
(S (NP-SBJ *)
   (VP Throw
       (NP the ball))
   !)
• Questions with declarative word order:
Sentences with a question mark but non-inverted word order are labeled S:
(S (NP-SBJ This)
   (VP is
       (NP-PRD Japan))
   ?)
(S (NP-SBJ You)
   (VP did
       (NP what))
   ?)
However, questions that are missing both subject and auxiliary are labeled SQ.
(SQ (NP-SBJ *)
    (VP See
        (NP that cute dog))
    ?)
• Infinitives:
Infinitives are labeled S and take (NP-SBJ *) as the null subject, where "to" represents the highest level of the VP.
1. Complement clauses.
When the infinitive is a VP complement, the null subject of the infinitive is coindexed to its logical subject (the matrix subject or object, depending on control).
(S (NP-SBJ-1 Casey)
   (VP wants
       (S (NP-SBJ *-1)
          (VP to
              (VP throw
                  (NP the ball))))))
2. Purpose clauses.
Purpose clauses are attached at S and labeled -PRP (purpose/reason). The subject is coindexed to the surface subject of the matrix clause when appropriate.
(S (NP-SBJ-1 Sue)
   (VP arrived
       (ADVP-TMP early)
       (S-PRP (NP-SBJ *-1)
              (VP to
                  (VP get
                      (NP a good seat))))))
3. Infinitival relatives.
In the case of infinitival relatives, the relative is adjoined to NP and dominated by SBAR with a zero wh-complementizer labeled according to the role played by the gapped constituent. A *T* in the position of the gap is coindexed to the wh-complementizer. The (NP-SBJ *) is not indexed.
(NP (NP a movie)
    (SBAR (WHNP-1 0)
          (S (NP-SBJ *)
             (VP to
                 (VP see
                     (NP *T*-1))))))
• Participial and gerund clauses:
Participial clauses have full clause structure, with either a lexical or a null (NP-SBJ *) subject. When appropriate, the null subject is coindexed.
(S (S-ADV (NP-SBJ The crowd)
          (VP cheering
              (PP-DIR toward
                      (NP Casey))))
   ,
   (NP-SBJ Willie)
   (VP caught
       (NP the ball)))

(S (S-ADV (NP-SBJ *-1)
          (VP Running
              (ADVP-MNR madly)))
   ,
   (NP-SBJ-1 Willie)
   (VP caught
       (NP the ball)))
• Inverted clauses:
The SINV label is used for subject-auxiliary inversion in the case of negative inversion, conditional inversion, locative inversion, and some topicalizations. SQ is used for yes/no questions.
(SINV (ADVP-TMP Never)
      (VP had)
      (NP-SBJ I)
      (VP seen
          (NP such a place)))
(SQ Did
    (NP-SBJ Casey)
    (VP throw
        (NP the ball))
    ?)
• Wh-questions:
The SBARQ label marks wh-questions (i.e., those that contain a gap and therefore require a trace). A further level of structure, SQ, contains the inverted auxiliary (if there is one) and the rest of the sentence. The inverted auxiliary in wh-questions is not labeled.
(SBARQ (WHNP-1 Who)
       (SQ (NP-SBJ *T*-1)
           (VP threw
               (NP the ball)))
       ?)
(SBARQ (WHNP-2 What)
       (SQ did
           (NP-SBJ Casey)
           (VP throw
               (NP *T*-2)))
       ?)
(SBARQ (WHNP-3 Who)
       (SQ (NP-SBJ *T*-3)
           (VP will
               (VP throw
                   (NP the ball))))
       ?)
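Co-indexation can be resolved mechanically: each indexed null element (`*-n` for passive and control subjects, `*T*-n` for wh-traces) points to the constituent whose label ends in `-n`. The sketch below assumes the trees are already available as nested `(label, children)` tuples in the bare-word style of the examples above; all names are hypothetical.

```python
import re

# Bare-word trees from the examples above, as (label, children) tuples.
PASSIVE = ("S", [("NP-SBJ-1", ["The", "ball"]),
                 ("VP", ["was", ("VP", ["thrown", ("NP", ["*-1"]),
                                        ("PP", ["by", ("NP-LGS", ["Casey"])])])])])
WH_QUESTION = ("SBARQ", [("WHNP-2", ["What"]),
                         ("SQ", ["did", ("NP-SBJ", ["Casey"]),
                                 ("VP", ["throw", ("NP", ["*T*-2"])])]),
                         "?"])

# "*-n" (passive/control null element) or "*T*-n" (wh-trace).
NULL_RE = re.compile(r"(\*(?:T\*)?)-(\d+)")


def label_index(label):
    """Return n for labels like NP-SBJ-1 or WHNP-2, else None."""
    parts = label.split("-")
    return int(parts[-1]) if len(parts) > 1 and parts[-1].isdigit() else None


def nodes(tree):
    """All subtrees, top-down and left to right."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from nodes(child)


def words(tree):
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else words(child))
    return out


def resolve_traces(tree):
    """Pair each indexed null element with the words of its co-indexed antecedent."""
    antecedents, nulls = {}, []
    for label, children in nodes(tree):
        match = None
        if len(children) == 1 and isinstance(children[0], str):
            match = NULL_RE.fullmatch(children[0])
        if match:
            nulls.append((match.group(1), int(match.group(2))))
        elif label_index(label) is not None:
            antecedents[label_index(label)] = (label, children)
    return [(kind, n, " ".join(words(antecedents[n])) if n in antecedents else None)
            for kind, n in nulls]
```

Applied to the passive, the null object `*-1` resolves to "The ball"; in the wh-question, `*T*-2` resolves to "What".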
• Clefts:
Declarative it-clefts are labeled S-CLF, the expletive "it" is tagged as the surface subject (-SBJ), the SBAR is attached at VP level, and a trace is coindexed.
(S-CLF (NP-SBJ It)
       (VP was
           (NP Casey)
           (SBAR (WHNP-1 who)
                 (S (NP-SBJ *T*-1)
                    (VP threw
                        (NP the ball))))))
• Fronted elements:
(S (NP-TPC-5 This)
   (NP-SBJ every man)
   (VP contains
       (NP *T*-5)
       (PP-LOC within
               (NP him))))
(S (NP-TMP Yesterday)
   (NP-SBJ I)
   (VP went
       (PP-DIR to
               (NP the store))))
1.4 VPs
Composite tenses are annotated with VP reduplication:
(S (NP Casey)
   (VP should
       (VP have
           (VP thrown
               (NP the ball)))))
Postmodifiers of VP are attached under VP, with adverbial function tag(s) where appropriate. Structurally, all verb complements are arguments.
(VP reading
    (PP-CLR about
            (NP toads))
    (PP-LOC on
            (NP the Internet)))
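Because auxiliary-main verb relations are encoded by VP reduplication, the whole verb chain can be read off by descending through the nested VPs, which is also one way to build a DG-style verb nucleus. This is a minimal sketch assuming bare-word trees as nested tuples in which each VP's first child is its verb word; in real Treebank files the verb sits under a POS node such as `(MD should)`, which the hypothetical `verb_chain` below does not handle.

```python
# Bare-word tree for the composite-tense example above: (label, children).
EXAMPLE = ("S", [("NP", ["Casey"]),
                 ("VP", ["should",
                         ("VP", ["have",
                                 ("VP", ["thrown", ("NP", ["the", "ball"])])])])])


def verb_chain(vp):
    """Follow VP reduplication downwards, collecting the verb word of each VP level."""
    chain = []
    while vp is not None:
        _, children = vp
        # Assumption: the verb is the first child and is a plain word string.
        chain.extend(c for c in children[:1] if isinstance(c, str))
        vp = next((c for c in children
                   if isinstance(c, tuple) and c[0].startswith("VP")), None)
    return chain


def verb_nucleus(s_tree):
    """Split the chain into auxiliaries and the main (lowest) verb."""
    vp = next((c for c in s_tree[1]
               if isinstance(c, tuple) and c[0].startswith("VP")), None)
    if vp is None:
        return None
    chain = verb_chain(vp)
    return {"main": chain[-1], "aux": chain[:-1]}
```

For "Casey should have thrown the ball" this yields the main verb "thrown" with the auxiliaries "should" and "have", i.e. one verb nucleus instead of three stacked VPs.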
1.5 NPs and PPs
Since it is difficult to annotate the argument/adjunct distinction consistently, all PP modifiers of nouns are Chomsky-adjoined to the NP. Structurally, all noun complements are adjuncts.
(NP (NP a teacher)
    (PP of
        (NP chemistry)))
Often (but not always) the argument/adjunct distinction for NPs and PPs can be derived from the function label.
1.6 Grammatical Function Labels
Tag    Marks                                Example
Grammatical functions
-CLF   true clefts                          [S-CLF it was Casey who ...]
-NOM   non-NPs functioning as NPs           heard of [S-NOM asbestos being dangerous]
-ADV   clausal and NP adverbials            reaches 10,000 barrels [NP-ADV a day]
-LGS   logical subjects in passives         done by [NP-LGS the president]
-PRD   non-VP predicates                    is [NP-PRD a producer]
-SBJ   surface subject                      [NP-SBJ Peter] walks.
-TPC   topicalized, fronted constituents    [S-TPC-1 I agree], he said [SBAR [S-1]]
-CLR   closely related ("open class of other cases")
Semantic roles
-VOC   vocatives                            Close the door, [NP-VOC John]!
-DIR   direction & trajectory               attention [PP-DIR to the problem]
-LOC   location                             declines [PP-LOC in interest rates]
-MNR   manner                               happy [PP-MNR like a kid]
-PRP   purpose and reason                   [PP-PRP (in order) to ...]
-TMP   temporal phrases                     shares rose [NP-TMP yesterday]
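Since a Treebank label packs the category, any function tags and an optional co-index into one string (e.g. NP-SBJ-1), a tool has to split it before it can use the table above. A minimal sketch; the two tag sets mirror the table's grouping, while the function names and the treatment of labels such as -NONE- (left unsplit) are assumptions.

```python
# Function tags grouped as in the table above.
GRAMMATICAL = {"CLF", "NOM", "ADV", "LGS", "PRD", "SBJ", "TPC", "CLR"}
SEMANTIC = {"VOC", "DIR", "LOC", "MNR", "PRP", "TMP"}


def split_label(label):
    """Split e.g. 'NP-SBJ-1' into ('NP', ['SBJ'], 1)."""
    if not label or label.startswith("-"):
        return label, [], None  # e.g. '-NONE-' is not a category-tag sequence
    category, *tags = label.split("-")
    index = int(tags.pop()) if tags and tags[-1].isdigit() else None
    return category, tags, index


def classify(label):
    """Map each function tag of a label to 'grammatical', 'semantic' or 'unknown'."""
    _, tags, _ = split_label(label)
    return {tag: "grammatical" if tag in GRAMMATICAL
            else "semantic" if tag in SEMANTIC else "unknown"
            for tag in tags}
```

A label may carry several tags at once; `classify("PP-LOC-CLR")` returns one entry per tag.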
2 Dependency Grammar (DG)
2.1 Dependency and Constituency
DG [Tes59] focuses on the dependencies between words, whereas constituency focuses on what a phrase consists of.
An example of a (labeled) dependency structure:
[Figure: labeled dependency structure for "the man that came eats bananas with a fork", attached to ROOT, with arcs labeled SENT, Subj, Det, Rel, TH, Obj, PP, PObj and Det.]
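A dependency structure can be stored as one head position and one label per word; a word's projection, i.e. "what the phrase consists of" in constituency terms, is then everything it transitively dominates. The sketch uses a hypothetical analysis of a shortened sentence (the relative clause "that came" is left out and the PP is attached to the verb); it illustrates the encoding and is not the analysis in the figure.

```python
# Hypothetical analysis: (word, head position, label); positions are 1-based, 0 = ROOT.
SENTENCE = [
    ("the", 2, "Det"),
    ("man", 3, "Subj"),
    ("eats", 0, "SENT"),
    ("bananas", 3, "Obj"),
    ("with", 3, "PP"),
    ("a", 7, "Det"),
    ("fork", 5, "PObj"),
]


def dependents(sent, head):
    """Positions of the words whose head is `head`."""
    return [i for i, (_, h, _) in enumerate(sent, start=1) if h == head]


def projection(sent, head):
    """All positions dominated by `head`, including itself: the phrase it heads."""
    positions = [head]
    for dep in dependents(sent, head):
        positions.extend(projection(sent, dep))
    return sorted(positions)


def phrase(sent, head):
    return " ".join(sent[i - 1][0] for i in projection(sent, head))
```

Here `phrase(SENTENCE, 5)` is "with a fork" and `phrase(SENTENCE, 3)` is the whole sentence: the constituents are derived from the dependencies rather than stored.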
2.1.1 Valency
DG is a valency grammar in which the concept of valency is extended to all word classes, and in which non-subcategorized material is attached in similar ways.
(1) a.  I am afraid of dogs.
    b. *I am ready of dogs.
    c. *I am afraid for action.
    d.  I am ready for action.
(2) give       ⟨[NP PP(to)]⟩
    afraid     ⟨[PP(of)]⟩
    ready      ⟨[PP(for)]⟩
    president  ⟨[PP(of)]⟩
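The frames in (2) can be treated as a lexicon that licenses complements, for verbs, adjectives and nouns alike. A minimal sketch; encoding a PP complement as "PP:preposition" and the name `licensed` are assumptions for illustration.

```python
# Hypothetical encoding of the valency frames in (2): head -> allowed complement sequences.
FRAMES = {
    "give": [("NP", "PP:to")],
    "afraid": [("PP:of",)],
    "ready": [("PP:for",)],
    "president": [("PP:of",)],
}


def licensed(head, complements):
    """True if the complement sequence matches one of the frames listed for `head`."""
    return tuple(complements) in FRAMES.get(head, [])
```

With these frames, (1a) and (1d) are licensed (`licensed("afraid", ["PP:of"])`, `licensed("ready", ["PP:for"])`) while (1b) and (1c) are not.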
2.1.2 Lexicalism
DG is strictly lexicalist: non-terminal nodes only exist as a derived concept, and endocentricity is naturally enforced → [Cho95]: Bare Phrase Structure
[Figure: bare phrase structure tree for "the man eats bananas with a fork", in which every node is labeled by its lexical head (eats/V, man/N, the/D, bananas/N, with/P, fork/N, a/D).]
The head of a phrase and its projection are isomorphic.
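Because every phrase is labeled by its head, a dependency tree can be turned into a flat, head-labeled bracketing without any intermediate (X′) levels. This sketch uses a hypothetical analysis of "the man eats bananas with a fork" (PP attached to the verb) and a hypothetical `bracket` function; words without dependents are left unbracketed, since they are their own minimal and maximal projection.

```python
# Hypothetical analysis: (word, category, head position); positions are 1-based, 0 = ROOT.
SENT = [
    ("the", "D", 2), ("man", "N", 3), ("eats", "V", 0), ("bananas", "N", 3),
    ("with", "P", 3), ("a", "D", 7), ("fork", "N", 5),
]


def bracket(sent, head):
    """Flat bracketing of the projection of `head`, labeled word/category."""
    word, cat, _ = sent[head - 1]
    deps = [i for i, (_, _, h) in enumerate(sent, start=1) if h == head]
    if not deps:
        return word  # a word without dependents is its own projection
    # The head and all its dependents become sisters: no intermediate levels.
    parts = [word if pos == head else bracket(sent, pos)
             for pos in sorted(deps + [head])]
    return f"[{word}/{cat} " + " ".join(parts) + "]"


def to_constituency(sent):
    root = next(i for i, (_, _, h) in enumerate(sent, start=1) if h == 0)
    return bracket(sent, root)
```

The result is `[eats/V [man/N the man] eats bananas [with/P with [fork/N a fork]]]`: subject, object and PP are all sisters of the head, which is the flat structure discussed in 2.1.3.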
2.1.3 Strong Equivalence to X-bar
DG structures are equivalent to a flat constituency representation without intermediate structures (X′).
But if the labels used by a particular DG can be mapped onto the permissible X-bar relations, we have strong equivalence [Cov94].
[Figure: dependencies among the words A, B, C and D labeled Compl., Spec. and Adjunct, corresponding to the X-bar complement, specifier and adjunct relations.]
→ Are there any differences between constituency and dependency?
→ What could DG possibly be good for?
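2.2 Robustness
• Isomorphism of words and projections → building the maximal projection always succeeds
[Figure: dependency analysis of a short sentence beginning "She eats", with a SENT arc from ROOT to eats and a Subj arc from eats to She.]
• A chunker with head extraction offers the same head/phrase isomorphism [Abn95] → divide & conquer
[Figure: ROOT [Base phrase recognition] [can reduce] [parsing complexity], with the chunks linked by SENT, Subj and Obj arcs.]
• Shallowness at will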
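A chunker with head extraction can be sketched in a few lines: consecutive tokens whose tags belong to the same group form a chunk, and the chunk's head is its rightmost noun or verb. The tag groups, the rule that a determiner after a noun opens a new NP, and the tags given to the example are assumptions of this sketch; it is not Abney's chunker [Abn95].

```python
# Tag groups for two chunk types (an assumption for this sketch).
NP_TAGS = {"DT", "PRP$", "CD", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "PRP"}
VC_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def is_nominal(tag):
    return tag.startswith("NN") or tag == "PRP"


def chunk(tagged):
    """Group [(word, tag), ...] into ('NP' | 'VC' | None, tokens) chunks."""
    chunks = []
    for word, tag in tagged:
        kind = "NP" if tag in NP_TAGS else "VC" if tag in VC_TAGS else None
        same_kind = kind is not None and bool(chunks) and chunks[-1][0] == kind
        # A determiner after a noun starts a new NP ("gave [the man] [the book]").
        new_np = (kind == "NP" and tag in {"DT", "PRP$"} and same_kind
                  and any(is_nominal(t) for _, t in chunks[-1][1]))
        if same_kind and not new_np:
            chunks[-1][1].append((word, tag))
        else:
            chunks.append((kind, [(word, tag)]))  # kind None: word outside any chunk
    return chunks


def head(chunk_):
    """Rightmost noun of an NP chunk, rightmost verb of a verb chunk."""
    kind, tokens = chunk_
    if kind == "NP":
        candidates = [w for w, t in tokens if is_nominal(t)]
    elif kind == "VC":
        candidates = [w for w, t in tokens if t.startswith("VB")]
    else:
        candidates = []
    return candidates[-1] if candidates else tokens[-1][0]
```

On the figure's example, tagged here (hypothetically) as NN NN NN MD VB NN NN, the chunks are [Base phrase recognition], [can reduce] and [parsing complexity] with heads "recognition", "reduce" and "complexity"; only these heads then need to be linked by the parser.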
2.3 Headedness
DG often makes different assumptions about headedness than, e.g., GB → the discussion is open again (→ HPSG):
"On our account, a marker is a word that is 'functional' or 'grammatical' as opposed to substantive, in the sense that its semantic content is purely logical in nature (perhaps even vacuous). A marker, so-called because it marks the constituent in which it occurs, combines with another element that heads that constituent. In addition to the complementizers that and for, other examples of markers include the comparative words than and as, the case-marking post-clitics of Japanese and Korean, and perhaps nonpredicative adpositions in (the vast majority of) languages where adpositions stranding does not occur." [PS94]
2.4 Projection
For DG, projection is deterministic: V projects to VP, N to NP (endocentricity). In some cases projection is exocentric, but mostly deterministic (→ bottom-up parsing). For NP projection, a preferred head hierarchy is used:
NP: NN (noun) > PRP (personal pronoun) > CD (number) > JJ (adjective) > RB (little, most, much) > DT (determiner) > WP, WDT, ... (what, who, ...)
Participles can project to a verb or an adjective: developing/VBG countries.
DG structures and grammars are considerably simpler, yet as expressive.
Only lexical relations between content words are modeled.
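The NP head hierarchy translates directly into a ranking function. In this sketch, two points are assumptions not stated on the slide: tags are matched by prefix (so NNS and NNP count as NN, JJR as JJ), and among equally ranked words the rightmost one is chosen.

```python
# Head hierarchy from the slide, highest priority first.
HIERARCHY = [("NN",), ("PRP",), ("CD",), ("JJ",), ("RB",), ("DT",), ("WP", "WDT")]


def rank(tag):
    """Position of `tag` in the hierarchy (prefix match, e.g. NNS -> NN), or None."""
    for position, prefixes in enumerate(HIERARCHY):
        if any(tag.startswith(p) for p in prefixes):
            return position
    return None


def np_head(tagged):
    """Index of the head of an NP chunk given as [(word, tag), ...], or None."""
    best, best_rank = None, None
    for i, (_, tag) in enumerate(tagged):
        r = rank(tag)
        # '<=' keeps the rightmost of several equally ranked words.
        if r is not None and (best_rank is None or r <= best_rank):
            best, best_rank = i, r
    return best
```

For "the $200 hat" the head is "hat"; for "the 200" it falls back to the number, and for "the poor" to the adjective, which is how exocentric NPs still receive a deterministic head.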
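2.4.1 An Endocentricity Constraint: Subcat Is Essential
• Verb with or without a subject in "Dreyfus the best fund was low"
Incorrect: 2 subjects
ROOT [Dreyfus] [the best fund] was low
(SENT: ROOT → was; subj: was → [Dreyfus]; subj: was → [the best fund])
Correct: 1 subject
ROOT [Dreyfus] [the best fund] was low
(SENT: ROOT → was; a single subj arc to was; [Dreyfus] and [the best fund] linked by an nmod arc)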
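The one-subject constraint amounts to a check over the labeled arcs: a verb whose subcategorization frame allows one subject must not receive two subj dependents. In this hypothetical encoding, the arcs of the "correct" analysis (which chunk is the subject and the direction of the nmod arc) are assumptions, since the figure's arrows did not survive extraction.

```python
from collections import Counter

# Hypothetical arcs (head, label, dependent) for "Dreyfus the best fund was low".
INCORRECT = [("ROOT", "SENT", "was"), ("was", "subj", "Dreyfus"),
             ("was", "subj", "the best fund")]
CORRECT = [("ROOT", "SENT", "was"), ("was", "subj", "Dreyfus"),
           ("Dreyfus", "nmod", "the best fund")]


def overfilled_subjects(arcs):
    """Heads with more than one subj dependent, violating a one-subject frame."""
    counts = Counter(head for head, label, _ in arcs if label == "subj")
    return sorted(head for head, n in counts.items() if n > 1)
```

A parser can use such a check as a filter: `overfilled_subjects(INCORRECT)` flags "was", whereas the corrected analysis passes.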
• Is a noun a complement or an adjunct of a verb?
• Verb coordination: "He saw (him) and shot the duck."
• Does a verb take a sentence complement? (often a garden path)
[Figure: dependency tree for "The_DT period_NN means_VBZ the_DT temporao_NN will_MD be_VB dry_JJ late_RB", in which means takes period as its subject (period <-subj<- means) and a sentential object headed by be (means ->sentobj-> be), whose subject is temporao (temporao <-subj<- be).]
2.5 Functionalism
DG was conceived as a deep-syntactic, proto-semantic theory.
• Subj and Obj as primitives (→ LFG f-structure)
• Only content words can be heads, so-called nuclei
• All function words are parts of some nucleus (n, m, o)
• Heads of chunks and all words outside chunks are normally nuclei
• Some words outside chunks are part of a non-projective nucleus (n!)
[Figure: ROOT Can this sentence(m) be analyzed(n) by Dependency Theory(o)? with arcs labeled SENT, Subj, Obj and Prep between the nuclei m, n and o; the non-projective nucleus link is marked n!.]
Functionalism (contd.)
• Similar analyses for functionally related sentences
[Figure: "ROOT Peter gives the book(m) to Mary(n)" and "ROOT Peter gives Mary the book(m)" receive the same arc labels: SENT, Subj, Obj, Obj.]
2.6 Long-Distance Dependencies
[Figure: dependency analysis of "What did she believe Peter said Mary was thinking", with SENT, Subj and Obj arcs and an Obj arc marked Ā, wh for the fronted What.]
DG often uses restricted non-projectivity instead of transformations for long-distance dependencies (LDDs) [Tes59], [TJ97].
Most LDDs are one or several of the following:
• AUX-MAIN in the verb nucleus (HPSG argument composition)
• easy to spot (wh, sentence-initial, Ā-movement)
• easy to treat with a SLASH feature (GPSG, HPSG, Prolog)
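Whether a structure needs non-projectivity can be tested directly: a dependency tree is projective exactly when no two arcs cross when drawn on one side of the sentence, with ROOT at position 0. The sketch below only detects crossings (in quadratic time); the heads-list encoding is an assumption, and a parser such as [TJ97] must additionally restrict which crossings it admits.

```python
from itertools import combinations


def arcs(heads):
    """heads[i] is the head of word i+1 (0 = ROOT); arcs as (left, right) spans."""
    return [tuple(sorted((h, d))) for d, h in enumerate(heads, start=1)]


def is_projective(heads):
    """True if no two dependency arcs cross."""
    for (a, b), (c, d) in combinations(arcs(heads), 2):
        if a < c < b < d or c < a < d < b:
            return False
    return True
```

For example, heads `[3, 4, 0, 3]` attach word 1 to word 3 and word 2 to word 4; the spans (1, 3) and (2, 4) overlap without nesting, so the tree is rejected as non-projective.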
2.7 Dependency and Probabilistic Grammars
[Col96], [Col97]
The dependency advantage:
CFG rules are broken up into individual dependencies: less sparse data, more valuable information; probabilities are assigned to the dependencies.
VP → V NP        ("gives the money")
VP → V NP NP     ("gives them all his money")
VP → V NP PP     ("gives his money to the poor")
VP → V PP        ("gives to the poor")
NP → DT $ CD NN  ("the $200 hat")
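To make the "less sparse data" point concrete, the sketch below breaks the five rules above into head-dependent events and estimates, by relative frequency, how likely each dependent category is on a given side of the head. It is a deliberately simplified illustration (category-level events, one count per rule, head positions chosen by hand), not the models of [Col96] or [Col97], which condition on lexical heads and further context.

```python
from collections import Counter
from fractions import Fraction

# The rules above as (parent, right-hand side, index of the head child).
RULES = [
    ("VP", ("V", "NP"), 0),
    ("VP", ("V", "NP", "NP"), 0),
    ("VP", ("V", "NP", "PP"), 0),
    ("VP", ("V", "PP"), 0),
    ("NP", ("DT", "$", "CD", "NN"), 3),
]


def dependencies(parent, rhs, head):
    """Break one rule into (parent, head, side, dependent) events."""
    return [(parent, rhs[head], "L" if i < head else "R", cat)
            for i, cat in enumerate(rhs) if i != head]


EVENTS = Counter(e for rule in RULES for e in dependencies(*rule))


def p_dependent(parent, head, side, dep):
    """Relative frequency of `dep` among all dependents seen on `side` of `head`."""
    context = sum(n for (p, h, s, _), n in EVENTS.items()
                  if (p, h, s) == (parent, head, side))
    return Fraction(EVENTS[(parent, head, side, dep)], context) if context else Fraction(0)
```

The four VP rules are four distinct events, each seen once; broken into dependencies they yield only two VP event types (an NP or a PP to the right of V), observed six times in total, so P(NP) = 2/3 and P(PP) = 1/3.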
3 Conclusions
• DG and constituency are largely equivalent
• Treebank structures can thus be expected to map onto DG
• DG is lexicalist by definition
• Chunking and head extraction integrate naturally into a DG parser
• DG is inherently more robust than a constituency grammar
• DG is inherently well suited for a lexicalized probabilistic parser
• DG does not need a projection operation (saves time)
• Labeled DG naturally takes grammatical relations into account
• DG (typically) uses crossing dependencies instead of movement
References
[Abn95] Steven Abney. Chunks and dependencies: Bringing processing evidence to bear on syntax. In Jennifer Cole, Georgia Green, and Jerry Morgan, editors, Computational Linguistics and the Foundations of Linguistic Theory, pages 145–164. CSLI, 1995.
[Cho95] Noam Chomsky. The Minimalist Program. The MIT Press, Cambridge, Massachusetts, 1995.
[Col96] Michael Collins. A new statistical parser based on bigram lexical dependencies. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 184–191, Philadelphia, 1996.
[Col97] Michael Collins. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the ACL, pages 16–23, Madrid, Spain, 1997.
[Cov94] Michael A. Covington. An empirically motivated reinterpretation of Dependency Grammar. Technical Report AI1994-01, University of Georgia, Athens, Georgia, 1994.
[MKM+94] Mitch Marcus, Grace Kim, M. A. Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of ARPA '94, 1994.
[MSM93] Mitch Marcus, Beatrice Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19:313–330, 1993.
[PS94] Carl Pollard and Ivan Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, Illinois, 1994.
[Tes59] Lucien Tesnière. Éléments de Syntaxe Structurale. Librairie Klincksieck, Paris, 1959.
[TJ97] Pasi Tapanainen and Timo Järvinen. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71. Association for Computational Linguistics, 1997.