Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking

Ikuya Yamada 1,2,3, Hideaki Takeda 3, Yoshiyasu Takefuji 2

1 Studio Ousia, 2 Keio University, 3 National Institute of Informatics

Friday, July 31, 2015

Background

‣ Twitter NER is difficult because of the noisy, short, and colloquial nature of tweets

‣ The performance of standard NER software degrades significantly on tweets

Entity Linking

New Frozen Boutique to Open at Disney's Hollywood Studios

/wiki/Frozen_(2013_film)
/wiki/The_Walt_Disney_Company
/wiki/Disney’s_Hollywood_Studios

‣ Entity Linking: The task of linking entity mentions to entries in a knowledge base (KB) (e.g., Wikipedia)

‣ Recently, entity linking has received considerable attention

✦ Many research papers (2006-) [Cucerzan 2007, Milne et al. 2008, etc.]

✦ Competitions (TAC KBP, ERD@SIGIR, #Microposts@WWW, etc.)

Can we enhance Twitter NER by using entity linking?

New Frozen Boutique to Open at Disney's Hollywood Studios

Detecting “Frozen” as an entity mention in this tweet is difficult

Entity Linking

New Frozen Boutique to Open at Disney's Hollywood Studios

/wiki/Frozen_(2013_film)
/wiki/The_Walt_Disney_Company
/wiki/Disney’s_Hollywood_Studios

‣ By using entity linking, we can detect “Frozen”:

✦ “Frozen” is a very popular entity (based on Wikipedia link structure and page view count)

✦ “Frozen” is semantically related to the context entities

Our Approach

‣ Our system first performs entity linking in an end-to-end manner

‣ Detected entity mentions are used to enhance the NER tasks

‣ Entity data is extracted from several open knowledge bases (Wikipedia, DBpedia, Freebase)

‣ Segmentation and classification tasks are addressed by using separate components

End-to-End Entity Linking → Segmentation (NER) → Classification (NER)

End-to-End Entity Linking

‣ An entity linking system specifically designed for tweets

✦ Does not depend on NER to detect entity mentions (considering all possible n-grams as mention candidates)

✦ Based on supervised machine learning (random forest) using various kinds of features (trained using the #Microposts2015 dataset)

✦ Winner of a recent Twitter entity linking competition, the #Microposts2015 NEEL Challenge at WWW2015

‣ For further details, please refer to: Yamada et al., An End-to-End Entity Linking Approach for Tweets, in Proceedings of #Microposts 2015

Image taken from NEEL2015 Challenge Summary: http://www.slideshare.net/giusepperizzo/neel2015-challenge-summary
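
The idea of treating all possible n-grams as mention candidates can be illustrated with a minimal sketch. The function name `ngram_candidates`, the whitespace tokenization, and the maximum length of 5 tokens are assumptions made for illustration; the slides do not state how the actual system tokenizes tweets or bounds n-gram length.

```python
from typing import List, Tuple


def ngram_candidates(tokens: List[str], max_len: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every n-gram of up to max_len tokens.

    `end` is exclusive. `max_len` is an illustrative assumption.
    """
    candidates = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            candidates.append((start, end, " ".join(tokens[start:end])))
    return candidates


# Whitespace tokenization is a simplification used only for this example.
tweet = "New Frozen Boutique to Open at Disney's Hollywood Studios"
candidates = ngram_candidates(tweet.split())
print(len(candidates))
print(candidates[:3])
```

Each candidate would then be scored by the entity linking model; enumerating candidates this way avoids relying on an NER system to propose mentions first.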

Segmentation of Named Entities

Segmentation: Approach

‣ Supervised machine learning is used to assign a binary label to each possible n-gram

‣ Random forest is used as the machine-learning algorithm

‣ Overlaps of mentions are resolved by iteratively selecting the longest entity mention from the beginning of the tweet

‣ Machine-learning features can be classified as follows:

✦ Entity-based features

✦ Linguistic features
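
The overlap-resolution step described above can be sketched as a greedy left-to-right pass. This is one plausible reading of "iteratively selecting the longest entity mention from the beginning of the tweet", not necessarily the authors' exact procedure; the function name `resolve_overlaps` and the token-offset span representation are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def resolve_overlaps(spans: List[Span]) -> List[Span]:
    """Greedily keep non-overlapping spans, scanning from the start of the tweet.

    Spans are visited by start offset; among spans with the same start,
    the longest is visited first. A span is kept only if it begins at or
    after the end of the previously kept span.
    """
    selected: List[Span] = []
    next_free = 0
    for start, end in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if start >= next_free:
            selected.append((start, end))
            next_free = end
    return selected


# Hypothetical n-grams labelled positive by the classifier for the tweet
# "New Frozen Boutique to Open at Disney's Hollywood Studios".
positive = [(1, 2), (6, 7), (6, 9), (7, 9)]
print(resolve_overlaps(positive))
```

Here the three overlapping spans that start at tokens 6 and 7 collapse into the single longest span (6, 9), while the non-overlapping span (1, 2) is kept.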

Segmentation: Entity-based Features

‣ The relevance score assigned by the entity linking system

‣ The popularity of the entity:

✦ The number of inbound links to the entity in Wikipedia

✦ The average page view count of the Wikipedia entity

‣ Mention statistics in Wikipedia:

✦ Link probability

✦ Capitalization probability

Segmentation: Link Probability Feature

Example: occurrences of “Harajuku” in three Wikipedia articles, each marked as either a link or plain text

Kyary Pamyu Pamyu: “Her public image is associated with Japan's kawaisa culture centered in Harajuku, Tokyo”

Takeshita Street: “Takeshita Street is a street lined with fashion boutiques, and cafes in Harajuku in Tokyo, Japan.”

Laforet: “Department Store and Museum is a department store located in the Harajuku...”

“Harajuku” appears as a link in two of its three occurrences, so:

LINK_PROBABILITY(Harajuku) = 2/3
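
The calculation can be reproduced with a short sketch. It assumes link probability is the fraction of an n-gram's occurrences in Wikipedia that appear as link anchors, which matches the 2/3 in the example; the function name `link_probability` and the boolean-list input are illustrative, and the order of the flags below is arbitrary because the slide text does not say which two occurrences are the linked ones.

```python
from fractions import Fraction
from typing import List


def link_probability(is_link_flags: List[bool]) -> Fraction:
    """Fraction of occurrences of an n-gram that appear as link anchors.

    Each flag describes one occurrence of the n-gram in the corpus:
    True if it is a link, False if it is plain text.
    """
    if not is_link_flags:
        return Fraction(0)
    return Fraction(sum(is_link_flags), len(is_link_flags))


# Three occurrences of "Harajuku": two links and one plain-text occurrence.
harajuku_occurrences = [True, True, False]
print(link_probability(harajuku_occurrences))  # 2/3
```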

Segmentation: Linguistic Features

‣ Whether or not Stanford NER detects the mention

‣ Part-of-speech tags of the current and surrounding words

‣ Whether or not the current and surrounding words are capitalized

‣ Mention length (# of words, # of characters)
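
The capitalization and length features listed above can be sketched in a few lines. The function name `surface_features`, the feature names, and the one-word context window are assumptions for illustration. The Stanford NER and part-of-speech features are left out because they require external tools.

```python
from typing import Dict, List


def surface_features(tokens: List[str], start: int, end: int) -> Dict[str, float]:
    """Capitalization and length features for the n-gram tokens[start:end].

    The context is limited to one word on each side (an assumption).
    """
    mention = tokens[start:end]
    prev_tok = tokens[start - 1] if start > 0 else ""
    next_tok = tokens[end] if end < len(tokens) else ""
    return {
        "num_words": float(len(mention)),
        "num_chars": float(len(" ".join(mention))),
        "all_capitalized": float(all(t[:1].isupper() for t in mention)),
        "prev_capitalized": float(prev_tok[:1].isupper()),
        "next_capitalized": float(next_tok[:1].isupper()),
    }


tokens = "New Frozen Boutique to Open at Disney's Hollywood Studios".split()
features = surface_features(tokens, 1, 2)  # the n-gram "Frozen"
print(features)
```

In this headline-style tweet almost every word is capitalized, so capitalization alone carries little signal, which is consistent with the slides' point that "Frozen" is hard to detect without entity-based features.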

Classification of Named Entities

Classification

‣ Supervised machine learning is used to classify detected mentions into the predefined types

‣ Linear SVM is used as the machine-learning algorithm

‣ Main machine-learning features:

✦ Entity types in knowledge bases (DBpedia Ontology Classes and Freebase Types)

✦ Entity type detected by Stanford NER (i.e., PERSON, ORGANIZATION, LOCATION)

✦ The average of the vectors of the words in the n-gram, using Stanford GloVe word embeddings (840B model)

✦ The relevance score assigned by entity linking
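
The averaged word-vector feature can be sketched as follows. The embeddings here are made-up 3-dimensional toy values rather than real GloVe vectors, and the function name `average_vector`, the skipping of out-of-vocabulary words, and the zero-vector fallback are all assumptions; the slides do not describe how the actual system handles these cases.

```python
from typing import Dict, List


def average_vector(
    words: List[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Component-wise mean of the embeddings of the words in an n-gram.

    Words missing from `embeddings` are skipped; if none are found, a zero
    vector of length `dim` is returned (both choices are assumptions).
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy vectors for illustration only; these are not GloVe values.
toy_embeddings = {
    "hollywood": [1.0, 0.0, 2.0],
    "studios": [3.0, 2.0, 0.0],
}
avg = average_vector(["hollywood", "studios", "unknownword"], toy_embeddings, dim=3)
print(avg)  # [2.0, 1.0, 1.0]
```

The resulting fixed-length vector can be concatenated with the knowledge-base type indicators, the Stanford NER type, and the relevance score to form the input to the classifier.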

Results

‣ Our method outperformed the 2nd-ranked method by 10.34 F1 points on the segmentation task and by 5.01 F1 points on the end-to-end task!

Performances of the proposed systems at segmenting entities

Performances of the proposed systems at both segmentation and classification tasks

Conclusion

‣ Twitter NER can be enhanced by using entity linking

‣ Entity linking enables us to use high-quality data from knowledge bases for NER tasks

THANK YOU!
