Stanislav Nikolov (MIT, Twitter), Devavrat Shah (MIT). Interdisciplinary Workshop on Information and Decision in Social Networks, 2012.
Detecting Trends
Stanislav Nikolov §,† Devavrat Shah §
§ MIT † Twitter
Source: http://twoinformcanada.ca/wp-content/uploads/2012/07/barclays.jpg
The Barclays Libor scandal
12:49: “#Barclays” is listed as a trending topic on Twitter
• Is there enough information before the “jump”?
• Can we predict which topics will trend in advance?
Yes. (best parameter setting)
• 79% early detection
• 1.43 hours mean early detection
• 95% true positive rate (TPR), 4% false positive rate (FPR)
What are Trending Topics?
• Twitter: a global communication network.
• Tweet: a short, public message.
• Topic: a phrase in a tweet.
• Trending topic (a “trend”): a topic that becomes popular.
A Parametric Model
• Expect certain type of pattern (e.g. constant + jumps).
• Fit parameters to data (e.g. how much of a jump).
• Decide if jump is big enough.
[Figure: activity vs. time, shown with successive fits p = 0.1, p = 0.6 and p = 4.1; the p = 4.1 fit is marked “trend detected!”]
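The three steps of the parametric approach (expect a pattern, fit it, threshold the fit) can be sketched as follows. This is only an illustration under assumptions: the slides do not say how the model is fitted, so a least-squares fit of a single step is swapped in here, and the names `fit_step`, `is_trend` and `min_jump` are hypothetical. The fitted jump size plays the role that the slides appear to give to p.

```python
def fit_step(activity):
    """Fit a 'constant + one jump' model by least squares.

    The series is modelled as one constant level before index k and another
    constant level from k onward. Returns (k, jump_size) for the best k.
    Assumes len(activity) >= 2. (Hypothetical helper, not from the slides.)
    """
    best = None
    for k in range(1, len(activity)):
        before, after = activity[:k], activity[k:]
        mu0 = sum(before) / len(before)
        mu1 = sum(after) / len(after)
        # Sum of squared errors of the two-level fit with the jump at k.
        sse = (sum((x - mu0) ** 2 for x in before)
               + sum((x - mu1) ** 2 for x in after))
        if best is None or sse < best[0]:
            best = (sse, k, mu1 - mu0)
    _, k, jump = best
    return k, jump


def is_trend(activity, min_jump):
    """Declare a trend when the fitted jump is at least min_jump (assumed rule)."""
    _, jump = fit_step(activity)
    return jump >= min_jump
```

For example, `is_trend([1, 1, 1, 1, 5, 5, 5], min_jump=3.0)` fits a jump of 4.0 at index 4 and reports a trend, while a nearly flat series does not.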
Parametric Models are Inadequate!
[Figure series (4 slides): activity vs. time, each marked “trend detected!”]
A Data-Driven Approach
• All of the information is in the data.
• Hypothesis
– Tweets are written by people.
– People are simple.
  • In how they spread information.
  • In how they connect to one another.
– Small number of distinct “ways” in which a topic can become trending.
Classification by Experts
[Figure: animation with the labels s, r, “vote” and “observation”]
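The slides for this step survive only as the labels s, r, “vote” and “observation”, so the following is a sketch under stated assumptions rather than the authors' implementation. It assumes s is the observed signal, each r is a labeled example signal (trend or non-trend), and each r casts a vote that decays exponentially with its distance to s. The decay rule, the parameter `gamma`, and the function names are all hypothetical.

```python
import math


def distance(s, r):
    """Smallest squared Euclidean distance between s and any window of r
    that has the same length as s. Assumes len(r) >= len(s)."""
    n = len(s)
    return min(
        sum((a - b) ** 2 for a, b in zip(s, r[i:i + n]))
        for i in range(len(r) - n + 1)
    )


def classify(s, trend_refs, non_trend_refs, gamma=1.0):
    """Return True if the examples labeled 'trend' out-vote the others.

    Each example r votes with weight exp(-gamma * distance(s, r)), so close
    examples count much more than distant ones (an assumed voting rule).
    """
    trend_votes = sum(math.exp(-gamma * distance(s, r)) for r in trend_refs)
    other_votes = sum(math.exp(-gamma * distance(s, r)) for r in non_trend_refs)
    return trend_votes > other_votes
```

Because every vote depends only on one distance between s and one r, the distances can be computed independently of each other.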
Properties
• Simple (just compute distances)
• Scalable (can compute distances in parallel)
• Non-parametric: model “parameters” scale with the data
Experimental Results
Experiment
• 500 trends.
• 500 non-trends.
• Do trend detection on a 50% hold-out set.
• Online signal classification.
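The evaluation setup above can be sketched as a random 50% hold-out split followed by a true/false positive rate computation. The slides do not describe how the split was drawn or how the rates were computed, so the helpers `split_half` and `tpr_fpr`, the seeded shuffle, and the boolean label convention are assumptions made for illustration.

```python
import random


def split_half(items, seed=0):
    """Shuffle items reproducibly and split them into two halves
    (reference half, hold-out half). Hypothetical helper."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def tpr_fpr(labels, predictions):
    """Compute (true positive rate, false positive rate).

    labels[i] is True for a real trend; predictions[i] is True when the
    detector declared a trend. Assumes at least one positive and one
    negative label, otherwise a division by zero occurs.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives
```

With 500 trends and 500 non-trends, `split_half` would leave 500 topics for evaluation, and `tpr_fpr` would then be applied to the detector's decisions on that half.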
Results – Early Detection
(best parameter setting)
Results – FPR / TPR Tradeoff
Results – Early / Late Tradeoff
Concluding Remarks
• Algorithm to detect trends early
• Scalable nonparametric time series analysis: prediction, classification, anomaly detection