임성신 [email protected] Speech and Language Processing Ch8. WORD CLASSES AND PART-OF-SPEECH TAGGING

임성신

[email protected]@pusan.ac.kr

Speech and Language Processing

Ch8. WORD CLASSES AND PART-OF-SPEECH TAGGING

Artificial Intelligence Laboratory

Agenda

What are they?
Distribution
Tagsets
Tagging
• Rules
• Probabilities
• Transformation-Based (Brill)


Parts of Speech

Start with eight basic categories
• Noun, verb, pronoun, preposition, adjective, adverb, article, conjunction

These categories are based on morphological and distributional properties (not semantics)

Some cases are easy, others are murky


Parts of Speech

Two kinds of category
Closed class
• Prepositions, articles, conjunctions, pronouns
Open class
• Nouns, verbs, adjectives, adverbs


Fig 8.1 Prepositions (and particles) of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.


Fig 8.2 English single-word particles from Quirk et al. (1985).


Fig 8.3 Coordinating and subordinating conjunctions of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.


Fig 8.4 Pronouns of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.


Fig 8.5 English modal verbs from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.


Sets of Parts of Speech: Tagsets

There are various standard tagsets to choose from; some have a lot more tags than others

The choice of tagset is based on the application

Accurate tagging can be done with even large tagsets


Fig 8.6 Penn Treebank part-of-speech tags (including punctuation).


Tagging

Part-of-speech tagging is the process of assigning a part of speech to each word in a sentence. Assume we have:
• A tagset
• A dictionary that gives you the possible set of tags for each entry
• A text to be tagged
• A reason?

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
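The word/TAG notation in the example sentence is easy to read back into data. The following is a minimal sketch, assuming whitespace-separated tokens and a slash before each tag; the function name `parse_tagged` is hypothetical and not from the slides.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so the token './.' still
        # parses as the word '.' with the tag '.'.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
            "number/NN of/IN other/JJ topics/NNS ./.")
tagged = parse_tagged(sentence)
```

Splitting on the last slash rather than the first is a design choice that keeps punctuation tokens and words containing a slash from being mis-split.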


Figure 8.7 The number of word types in the Brown corpus by degree of ambiguity (after DeRose (1988)).


Tagging - Rules

Hand-crafted rules for ambiguous words that test the context to make appropriate choices
• Early attempts fairly error-prone
• Extremely labor-intensive


Figure 8.8 Sample lexical entries from the ENGTWOL lexicon described in Voutilainen (1995) and Heikkila (1995).


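Tagging - Probabilities

Advantages
• Given a sufficiently large tagged corpus, the statistics needed for tagging are easy to extract, so the approach scales well, applies to a wide range of text, and achieves relatively high overall accuracy

Disadvantages
• Dependent on the corpus
• Extracting meaningful statistics requires a tagged corpus above a certain size
• Building the corpus takes a great deal of time and effort
• When the corpus is skewed or too small, reliability drops because of data sparseness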
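As a rough illustration of how such statistics can be extracted, the sketch below estimates P(word | tag) and P(tag | previous tag) by relative frequency. The three-sentence corpus and all function names are hypothetical, made up only for this example; a real tagged corpus would be far larger.

```python
from collections import Counter

# Tiny hand-made tagged corpus (hypothetical data, for illustration only).
corpus = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("race", "VBP"), ("cars", "NNS")],
    [("the", "DT"), ("race", "NN"), ("began", "VBD")],
]

emission_counts = Counter()    # (tag, word) -> count
transition_counts = Counter()  # (previous tag, tag) -> count
tag_counts = Counter()         # tag -> count

for sent in corpus:
    prev = None
    for word, tag in sent:
        emission_counts[(tag, word)] += 1
        tag_counts[tag] += 1
        if prev is not None:
            transition_counts[(prev, tag)] += 1
        prev = tag


def p_word_given_tag(word, tag):
    """Relative-frequency estimate of P(word | tag); 0.0 if the tag is unseen."""
    if tag_counts[tag] == 0:
        return 0.0
    return emission_counts[(tag, word)] / tag_counts[tag]


def p_tag_given_prev(tag, prev):
    """Relative-frequency estimate of P(tag | prev); 0.0 if prev never precedes a tag."""
    total = sum(c for (p, _), c in transition_counts.items() if p == prev)
    if total == 0:
        return 0.0
    return transition_counts[(prev, tag)] / total
```

The zero returned for any unseen pair, such as a VB following TO in this toy corpus, is the data sparseness problem in miniature: an event that is perfectly possible gets probability zero simply because the corpus never contained it.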


Tagging - Probabilities

We want the best set of tags for a sequence of words (a sentence)

$$\arg\max_{T} P(T \mid W) = \arg\max_{T} \frac{P(W \mid T)\,P(T)}{P(W)}$$

$$\arg\max_{T} P(T \mid W) = \arg\max_{T} P(W \mid T)\,P(T)$$

W is a sequence of words
T is a sequence of tags

The probability of the word sequence P(W) will be the same for each tag sequence

$$\arg\max_{T} \prod_{i=1}^{n} P(w_i \mid t_i) \cdot P(t_1) \cdot \prod_{i=2}^{n} P(t_i \mid t_{i-1})$$
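The product of word-given-tag, initial-tag, and tag-given-previous-tag probabilities can be evaluated directly on a toy model. The sketch below scores every possible tag sequence and keeps the best one; all probabilities and names are hypothetical values chosen for illustration, not figures from the slides. Enumerating every sequence is exponential in sentence length, so practical taggers use dynamic programming (the Viterbi algorithm) to find the same maximum.

```python
from itertools import product

# Hypothetical toy model: every number here is made up for illustration.
TAGSET = ["TO", "NN", "VB"]
start_p = {"TO": 0.4, "NN": 0.4, "VB": 0.2}           # P(t_1)
trans_p = {                                            # P(t_i | t_{i-1})
    "TO": {"TO": 0.0, "NN": 0.2, "VB": 0.8},
    "NN": {"TO": 0.3, "NN": 0.4, "VB": 0.3},
    "VB": {"TO": 0.3, "NN": 0.5, "VB": 0.2},
}
emit_p = {                                             # P(w_i | t_i)
    "TO": {"to": 1.0},
    "NN": {"race": 0.01},
    "VB": {"race": 0.005},
}


def score(words, tags):
    """P(t_1) * prod_i P(w_i | t_i) * prod_{i>=2} P(t_i | t_{i-1})."""
    p = start_p[tags[0]]
    for i, (word, tag) in enumerate(zip(words, tags)):
        p *= emit_p[tag].get(word, 0.0)
        if i > 0:
            p *= trans_p[tags[i - 1]][tag]
    return p


def best_tags(words):
    """Brute-force argmax over all tag sequences of the same length as words."""
    candidates = product(TAGSET, repeat=len(words))
    return max(candidates, key=lambda tags: score(words, tags))


best = best_tags(["to", "race"])
```

With these made-up numbers, "race" on its own is more likely a noun than a verb, yet the high TO-to-VB transition probability makes TO VB the winning sequence for "to race".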


Tagging - Transformation-Based (Brill tagging)

Combine rules and statistics…
• TBL (Transformation-Based Learning) is based on rules
• Rules are automatically induced from the data (ML)


Brill tagging - Examples

Race
• “race” as NN: .98
• “race” as VB: .02

So you’ll be wrong 2% of the time, which really isn’t bad

Patch the cases where you know it has to be a verb
• Change NN to VB when previous tag is TO
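The two steps on this slide, tagging each word with its most likely tag and then patching with a transformation, can be sketched as follows. The mini-lexicon, the default tag for unknown words, and the function names are assumptions made for illustration; only the NN preference for "race" and the rule itself come from the slide.

```python
# Most likely tag per word. Hypothetical mini-lexicon: only the preference
# for tagging "race" as NN is taken from the slide.
most_likely = {"race": "NN", "to": "TO", "the": "DT", "expected": "VBN"}


def initial_tags(words):
    """Step 1: give every word its most likely tag (unknown words default to NN, an assumption)."""
    return [most_likely.get(w.lower(), "NN") for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: change from_tag to to_tag wherever the previous tag is prev_tag.

    Conditions are checked against the input tags, so one change made by the
    rule never triggers another within the same pass.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


words = ["expected", "to", "race"]
before = initial_tags(words)
after = apply_rule(before, "NN", "VB", "TO")
```

For "expected to race" the initial pass tags "race" as NN and the rule corrects it to VB, while in "the race" the preceding tag is DT, so the noun reading is left alone.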


Brill tagging - Rules

Where did that transformational rule come from?
• Define a hypothesis space of rules that might help decrease the error rate
• Search that space (exhaustively?) to find the rules that most reduce the error rate
• Continue to add rules until some stopping criterion is reached

Figure 8.9 Brill’s (1995) templates. Each begins with “Change tag a to tag b when: …”. The variables a, b, z and w range over parts-of-speech.
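The search over rules can be sketched as a greedy loop. This is a simplified illustration, assuming a single template ("change tag a to tag b when the previous tag is z"), an exhaustive search of that template's instantiations, and a corpus flattened into one tag sequence; the data and function names are hypothetical.

```python
def apply_rule(tags, rule):
    """Apply (a, b, z): change a to b when the previous tag is z."""
    a, b, z = rule
    return [b if i > 0 and t == a and tags[i - 1] == z else t
            for i, t in enumerate(tags)]


def errors(predicted, gold):
    """Number of positions where the predicted tag differs from the gold tag."""
    return sum(p != g for p, g in zip(predicted, gold))


def learn_rules(initial, gold, tagset, max_rules=5):
    """Greedily add the rule that most reduces the error count.

    Stopping criterion (an assumption): no rule improves on the current
    tagging, or max_rules rules have been learned.
    """
    current = list(initial)
    learned = []
    for _ in range(max_rules):
        best_rule, best_err = None, errors(current, gold)
        # Exhaustively instantiate the single template over the tagset.
        for a in tagset:
            for b in tagset:
                if a == b:
                    continue
                for z in tagset:
                    err = errors(apply_rule(current, (a, b, z)), gold)
                    if err < best_err:
                        best_rule, best_err = (a, b, z), err
        if best_rule is None:
            break
        current = apply_rule(current, best_rule)
        learned.append(best_rule)
    return learned, current


# Hypothetical training data: initial most-likely tags vs. hand-corrected tags.
tagset = ["DT", "NN", "TO", "VB"]
initial = ["TO", "NN", "DT", "NN", "TO", "NN"]
gold = ["TO", "VB", "DT", "NN", "TO", "VB"]
rules, final = learn_rules(initial, gold, tagset)
```

On this toy data the only rule that removes both errors is "change NN to VB when the previous tag is TO", so the learner recovers the patch from the "race" example and then stops because no further rule helps.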
