Emerging Topic Detection on Twitter based on Temporal and Social Terms Evaluation
KDD 2010 Workshop on Multimedia Data Mining

Chin Hui Chen (陳晉暉)

Agenda

• Introduction
• The Main Steps
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics
• Experiments and Evaluation

Introduction

• Twitter.com
  • 75 million users as of December 2009.
  • 6.2 million new accounts per month (2-3 per second).

• People post tweets for:
  • Daily chatter
  • Conversations
  • Sharing information/URLs
  • Reporting news

Introduction (cont’d)

• One of the founders of Twitter.com …

• Twitter can be seen as a portal for low-level news flashes.

Introduction (cont’d)

• Target: Extract the emerging topics.
• Process:
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics

Step 1: Content Extraction

• Target: Tweets => Vector
• t-th considered interval:

• Each tweet => tweet vector

Content Extraction (cont’d)

• The length of each tweet vector equals the vocabulary size.

• The weight of the x-th entry depends on the term frequency value of the x-th vocabulary term in the j-th tweet and on the highest term frequency value of the j-th tweet.
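The weighting formula itself is not preserved in this transcript. As a minimal sketch, assuming the augmented normalized term frequency (0.5 + 0.5 · tf / tf_max), which is one common weighting built from the two quantities named above, a tweet vector could be computed as follows. The function name `tweet_vector`, the tokenized input, and the choice to give absent terms a weight of 0 are illustrative assumptions.

```python
from collections import Counter


def tweet_vector(tokens, vocabulary):
    """Return one weight per vocabulary term for a single tokenized tweet."""
    counts = Counter(tokens)
    # Highest term frequency value within this tweet.
    tf_max = max(counts.values(), default=0)
    if tf_max == 0:
        return [0.0] * len(vocabulary)
    # Assumed weighting: 0.5 + 0.5 * tf / tf_max for present terms, 0 otherwise.
    return [
        0.5 + 0.5 * counts[term] / tf_max if counts[term] else 0.0
        for term in vocabulary
    ]


vocabulary = ["obama", "victory", "elections"]
vector = tweet_vector(["victory", "victory", "obama"], vocabulary)
print(vector)  # [0.75, 1.0, 0.0]
```

The vector has one entry per vocabulary term, so its length always equals the vocabulary size.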

Step 2: User Authority

• Target: Which users are important?

• Define an author-based graph G(U, F), where U is the set of users and F is the set of directed edges (follower relationships).

User Authority (cont’d)

• Compute authority => PageRank
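A minimal sketch of PageRank on the follower graph, assuming each user passes authority to the accounts they follow, split evenly among them. The function name `authority`, the input layout (`followers_of`), the damping value 0.85, and the fixed iteration count are illustrative assumptions, not values given in the slides.

```python
def authority(followers_of, damping=0.85, iterations=50):
    """followers_of maps each user to the set of users who follow them."""
    users = set(followers_of)
    for followers in followers_of.values():
        users |= set(followers)

    # Number of accounts each user follows (out-degree in the follower graph).
    following_count = {u: 0 for u in users}
    for followers in followers_of.values():
        for f in followers:
            following_count[f] += 1

    auth = {u: 1.0 for u in users}
    for _ in range(iterations):
        # Each follower f passes an equal share of its authority to everyone it follows.
        auth = {
            u: (1 - damping) + damping * sum(
                auth[f] / following_count[f] for f in followers_of.get(u, ())
            )
            for u in users
        }
    return auth


followers_of = {
    "alice": {"bob", "carol"},  # bob and carol follow alice
    "bob": {"carol"},           # carol follows bob
    "carol": set(),             # nobody follows carol
}
scores = authority(followers_of)
print(sorted(scores, key=scores.get, reverse=True))  # ['alice', 'bob', 'carol']
```

A user followed by authoritative users ends up with a higher score than one with no followers, which is the property the later steps rely on.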

Step 3: Content Aging Theory

• Target: Find emerging terms.

• An emerging keyword can be viewed as a semantic unit which links to a very recent news event.

• Chien Chin Chen, Yao-Tsung Chen, Yeali S. Sun, Meng Chang Chen: Life Cycle Modeling of News Events Using Aging Theory. ECML 2003

• See each term as a living organism:
  • With nourishment => life cycle is prolonged => high energy
  • Without nourishment => dies => low energy

Content Aging Theory (cont’d)

• Term with high energy => currently important
• Term with low energy => out of favor

• So we need to know how to compute nutrition and energy:
  • Content Nutrition
  • Content Energy

Content Aging Theory (cont’d) – Content Nutrition

• Each food brings a different calorie contribution depending on its ingredients.
• Different tweets containing the same keyword generate different amounts of nutrition.
• Define the amount of nutrition:
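The defining formula is not preserved in this transcript. As a minimal sketch, assume the nutrition of a keyword in one time interval is the sum, over the tweets that contain it, of the keyword's weight in the tweet multiplied by the authority of the tweet's author. The function name `nutrition` and the data layout are hypothetical.

```python
def nutrition(keyword_index, tweets, authority_scores):
    """Sum weight * author authority over the tweets of one time interval.

    tweets: list of (author, weight_vector) pairs.
    authority_scores: dict mapping author -> authority value.
    """
    return sum(
        weights[keyword_index] * authority_scores[author]
        for author, weights in tweets
        if weights[keyword_index] > 0  # only tweets that contain the keyword
    )


authority_scores = {"alice": 2.0, "bob": 0.5}
tweets = [
    ("alice", [1.0, 0.0]),   # alice mentions keyword 0 only
    ("bob", [0.75, 1.0]),    # bob mentions both keywords
]
print(nutrition(0, tweets, authority_scores))  # 2.375
print(nutrition(1, tweets, authority_scores))  # 0.5
```

Under this assumption the same keyword gains more nutrition from a tweet by an authoritative user than from a tweet by a user with low authority.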

Content Aging Theory (cont’d) – Content Energy

• Having obtained the nutrition of a semantic unit, map it into energy: its effective contribution (how emergent it is).

• Hot Terms:

• Emergent Terms:

Content Aging Theory (cont’d) – Content Energy

• Define s = number of previous time slots.
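The energy formula is not preserved in this transcript. The sketch below assumes one plausible form: compare the squared nutrition of the current interval with the squared nutrition of each of the s previous intervals, and weight each difference by the inverse of its distance in time. The function name `energy` and the exact formula are assumptions.

```python
def energy(nutrition_history, s):
    """nutrition_history: nutrition values ordered oldest to newest;
    the last entry is the current interval. s: number of previous slots."""
    t = len(nutrition_history) - 1
    current = nutrition_history[t]
    total = 0.0
    # Look back over at most s previous slots; recent slots weigh more.
    for x in range(max(0, t - s), t):
        total += (current ** 2 - nutrition_history[x] ** 2) / (t - x)
    return total


print(energy([1.0, 1.0, 3.0], s=2))  # 12.0  (rising keyword)
print(energy([2.0, 2.0, 2.0], s=2))  # 0.0   (steady keyword)
print(energy([3.0, 3.0, 1.0], s=2))  # -12.0 (fading keyword)
```

With this form, a keyword that is frequent now but was rare in the past gets high energy, while a keyword that is constantly frequent gets none.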

Step 4: Selection of Emerging Terms

• Target: How to select emerging keywords.
• 1. Supervised
  • ( )
• 2. Unsupervised
  • Dynamically sets the critical drop
  • CoSeNa: a Context-based Search and Navigation System
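The selection rules are only named in the slides, so the following is a simplified sketch under stated assumptions. The supervised variant assumes a user-chosen multiplier `delta` applied to the mean energy; the unsupervised variant assumes the critical drop is the largest gap between consecutive energies in the ranked list. Both function names and both rules are hypothetical stand-ins.

```python
def select_supervised(energies, delta=2.0):
    """Keep keywords whose energy exceeds delta times the mean energy."""
    if not energies:
        return set()
    threshold = delta * sum(energies.values()) / len(energies)
    return {k for k, e in energies.items() if e > threshold}


def select_unsupervised(energies):
    """Keep keywords ranked above the largest drop between consecutive energies."""
    ranked = sorted(energies.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return {k for k, _ in ranked}
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops))  # position of the critical drop
    return {k for k, _ in ranked[:cut + 1]}


energies = {"volcano": 40.0, "ash": 35.0, "morning": 4.0, "coffee": 1.0}
print(sorted(select_supervised(energies, delta=1.5)))  # ['ash', 'volcano']
print(sorted(select_unsupervised(energies)))           # ['ash', 'volcano']
```

The unsupervised variant needs no user parameter, since the cut point is derived from the energy distribution itself.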

Step 5: From Emerging Terms to Emerging Topics

• Target: Find emerging topics!

• Define a topic as a minimal set of terms semantically related to an emerging keyword.

• “victory”
  • Nov 2008: “elections”, “Obama”, “USA”
  • Feb 2010: “football”, “superbowl”, “New Orleans Saints”

• Method: co-occurrences

From Emerging Terms to Emerging Topics

• 1. Generate Correlation Vector
  • a. Use the keyword k as the query.
  • b. Use the set of tweets containing k as relevance feedback.
  • c. Rely on a probabilistic feedback mechanism.
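The three steps above can be sketched with a standard probabilistic relevance-feedback weight, where the tweets containing k play the role of the relevant set. The exact formula used in the talk is not preserved in this transcript; the log-odds form, the 0.5 smoothing terms, and the function names below are assumptions.

```python
import math


def correlation(tweets, k, z):
    """Assumed correlation weight of term z with respect to keyword k.

    tweets: list of sets of terms for one time interval.
    """
    N = len(tweets)
    R = sum(1 for tw in tweets if k in tw)               # tweets containing k
    n = sum(1 for tw in tweets if z in tw)               # tweets containing z
    r = sum(1 for tw in tweets if k in tw and z in tw)   # tweets containing both
    if R == 0 or R == N:
        return 0.0
    # Odds of seeing z inside vs. outside the tweets that contain k (0.5 smoothing).
    odds_with_k = (r + 0.5) / (R - r + 0.5)
    odds_without_k = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_with_k / odds_without_k) * abs(r / R - (n - r) / (N - R))


def correlation_vector(tweets, k, vocabulary):
    """One correlation weight per vocabulary term other than k."""
    return {z: correlation(tweets, k, z) for z in vocabulary if z != k}


tweets = [
    {"victory", "obama", "elections"},
    {"victory", "obama"},
    {"victory", "football"},
    {"coffee", "morning"},
    {"coffee", "football"},
    {"morning"},
]
cv = correlation_vector(tweets, "victory", ["victory", "obama", "football", "coffee"])
print(max(cv, key=cv.get))  # obama
```

Terms that co-occur with k more often than with other tweets get a positive weight, terms that avoid k get a negative one, and terms spread evenly get a weight near zero.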

From Emerging Terms to Emerging Topics

• 2. Construct Topic Graph
  • Keyword-based topic graph:
  • Thinning.

From Emerging Terms to Emerging Topics

• 3. Topic Detection and Ranking

From Emerging Terms to Emerging Topics

• Find SCCs (Strongly Connected Components):
  • An emerging topic is a subgraph representing a set of keywords semantically related to term z within the time interval.
  • Use DFS.
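A minimal sketch of DFS-based SCC detection using Tarjan's algorithm, one standard choice; the slides do not say which DFS variant is used. The adjacency-dict layout and the example graph are illustrative assumptions.

```python
def strongly_connected_components(graph):
    """graph: dict mapping node -> iterable of successor nodes."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(node):
        # Assign a DFS discovery index and push the node on the stack.
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        # node is the root of a component: pop its members off the stack.
        if lowlink[node] == index[node]:
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


topic_graph = {
    "victory": ["obama"],
    "obama": ["elections"],
    "elections": ["victory"],
    "coffee": ["morning"],
    "morning": [],
}
components = strongly_connected_components(topic_graph)
topic = next(c for c in components if "victory" in c)
print(sorted(topic))  # ['elections', 'obama', 'victory']
```

The topic for an emerging term z is then the component that contains z. The recursive form is fine for small graphs; a large topic graph would need an iterative version to stay within Python's recursion limit.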

From Emerging Terms to Emerging Topics

• Ranking

Experiments and Evaluation

• Dataset:
  • 15 days (between 13th and 28th of April 2010)
  • More than 3 million tweets (10k/hr)
  • More than 300k different keywords

Real Case Study

• Set r = 15 minutes and s = 200 time slots (about 2 solar days).
• Result:

History Worthiness

• Analyze two different numbers of considered slots, s = 100 and s = 200.

History Worthiness (cont’d)

• “morning” => periodic events

History Worthiness (cont’d)

• The life status of a keyword depends on the number of time intervals considered.
• Temporal relevance of the retrieved topics (relevance is related to time).

Conclusion

• 1. Formalized the keyword life cycle.
  • (now => frequent, past => rare)
• 2. Studied the social relationships.
• 3. Formalized the keyword-based topic graph.
