Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams
Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡
† University of Tsukuba ‡ IBM Research - Tokyo
14/11/05 CIKM 2014 - Yuto Yamaguchi
Tweets that help us
Shaked !!!
We can infer your home location immediately
Thunder
Social and Location
• Lots of social media users
• Frequent updates
• User home locations
Location-based Applications
• Event Detection
• Location-based Marketing
• Epidemics Analysis
Lack of home locations
Most users do not disclose their home locations
• 74% of Twitter users [Cheng+, 10]
• 94% of Facebook users [Backstrom+, 10]
Our Objective
To infer home locations of social media users
Focus & Contributions
[Figure: posts arriving along a Time axis]
Our Focus
Social content is not static; it arrives as a stream
Our Contributions
1. Online & Incremental Inference
2. Exploiting Spatiotemporal features
ONLINE & INCREMENTAL INFERENCE
Contribution 1
Existing methods: Batch inference
[Figure: Batch Input → Existing Methods → Inference Results]
Perform batch inference just once after “enough data” is stored
→ Can’t update the results :(
→ What is “enough”? :(
→ When will it be enough? :(
Our method: Online & incremental inference method
[Figure: Social Stream → Online & incremental method → Inference Results]
Perform location inference every time a new post arrives
→ Can keep the results up to date :)
EXPLOITING SPATIOTEMPORAL FEATURES
Contribution 2
Local words
Local words: strongly correlated with a specific location
Users with known home locations post: “Earthquake!”, “Steelers!”
A user with an unknown home location posts: “Steelers!”
→ Infer Pittsburgh?
Existing methods: Only static features
Several users with known home locations post: “Earthquake!”, “Thunder!”
A user with an unknown home location posts: “Thunder!”
“Thunderbolt” is not a local word statically
→ Can’t utilize this word :(
→ Can’t infer
Our method: Spatiotemporal correlation
Users with known home locations post: “Earthquake!”, “Thunder!”
A user with an unknown home location posts: “Thunder!”
“Thunderbolt” can be a local word temporally, in a specific time period
→ Our method can utilize this word :)
→ Can infer
PROPOSED METHOD
OLIM: Online Location Inference Method
The Algorithm
Preprocessing:
    divideMap()
    calcPopulationDistribution()
Main:
    for post p from SocialStream
        user u <- getUser(p)
        if u is location-known
            updateLocalWords(p)
        else
            updateUserLocation(u, p)
Details: divideMap on slide 17, updateLocalWords on slide 20, updateUserLocation on slide 24
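The main loop above can be sketched in Python as a simple dispatcher. This is only an illustration: the `Post` type, the `location_known` set, and the two callback parameters are hypothetical names introduced here, not part of the talk.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Post:
    user: str  # hypothetical: identifier of the posting user
    text: str  # hypothetical: text of the post


def run_olim(stream: Iterable[Post],
             location_known: Set[str],
             update_local_words: Callable[[Post], None],
             update_user_location: Callable[[str, Post], None]) -> None:
    """Route each arriving post to one of the two update steps."""
    for p in stream:
        u = p.user  # corresponds to getUser(p)
        if u in location_known:
            update_local_words(p)
        else:
            update_user_location(u, p)


# Usage: record which branch each post takes.
seen_known: List[Post] = []
seen_unknown: List[Tuple[str, str]] = []
run_olim(
    [Post("alice", "Thunder!"), Post("bob", "Thunder!")],
    location_known={"alice"},
    update_local_words=seen_known.append,
    update_user_location=lambda u, p: seen_unknown.append((u, p.text)),
)
```

Posts from location-known users only feed the local-word statistics, and posts from everyone else only feed the per-user estimate, so each post triggers exactly one update.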
divideMap (slide 17)
Quadtree decomposition
Each region is treated as a categorical location
L = {l1, l2, …, lK}
Location inference is reduced to a classification problem
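A minimal sketch of a quadtree decomposition, to make the idea of categorical locations concrete. The split criterion (split a cell while it holds more than `max_points` points) and all names here are assumptions; the slides do not state how cells are split.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y), e.g. (longitude, latitude)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def divide_map(points: List[Point], box: Box, max_points: int,
               max_depth: int = 16) -> List[Box]:
    """Return the leaf cells of a quadtree; each leaf is one location l_k.

    Assumes every point lies in the half-open box [x_min, x_max) x [y_min, y_max).
    """
    x0, y0, x1, y1 = box
    if len(points) <= max_points or max_depth == 0:
        return [box]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
    leaves: List[Box] = []
    for q in quadrants:
        inside = [p for p in points
                  if q[0] <= p[0] < q[2] and q[1] <= p[1] < q[3]]
        leaves.extend(divide_map(inside, q, max_points, max_depth - 1))
    return leaves


# Usage: dense areas get smaller cells than sparse ones.
cells = divide_map([(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.9, 0.9)],
                   (0.0, 0.0, 1.0, 1.0), max_points=2)
```

The index of a cell in the returned list can serve as the class label for the classification problem.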
Population distribution
[Figure: categorical distribution over locations l1, l2, l3, l4, l5, …, lK]
What fraction of location-known users live in each location
Used for local word extraction
updateLocalWords: Sliding window and word distribution (slide 20)
Sliding window with length N, e.g. N = 5
[Figure: word distribution over locations l1, l2, l3, l4, l5, …, lK]
Word distribution: where was the word posted from?
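The sliding window for a single word could be kept as follows. This is a sketch under assumptions: locations are integer cell indices, and the location recorded for a post is whatever location is attributed to its location-known author. The class and method names are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict


class WordWindow:
    """Locations of the last N posts that contained one particular word."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.window: Deque[int] = deque()
        self.counts: Counter = Counter()

    def add(self, location: int) -> None:
        """Record a new post and evict the oldest one once the window is full."""
        self.window.append(location)
        self.counts[location] += 1
        if len(self.window) > self.n:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def distribution(self) -> Dict[int, float]:
        """Word distribution: fraction of windowed posts per location."""
        total = len(self.window)
        return {loc: c / total for loc, c in self.counts.items()}


# Usage with N = 5: the first post (from location 1) falls out of the window.
w = WordWindow(5)
for loc in [1, 1, 2, 3, 3, 3]:
    w.add(loc)
dist = w.distribution()
```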
updateLocalWords: Local Word Intuition
[Figure: the population distribution compared with two word distributions over locations l1, l2, l3, l4, l5, …, lK]
Word distribution close to the population distribution: KL divergence small, not a local word
Word distribution far from the population distribution: KL divergence large, local word
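To make the comparison between a word distribution and the population distribution concrete, here is a small sketch. The direction of the divergence, KL(word || population), and the use of a fixed `threshold` are assumptions made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def population_distribution(home_locations: Iterable[int]) -> Dict[int, float]:
    """Fraction of location-known users living in each location."""
    counts = Counter(home_locations)
    total = sum(counts.values())
    return {loc: c / total for loc, c in counts.items()}


def kl_divergence(word_dist: Dict[int, float],
                  population: Dict[int, float]) -> float:
    """KL(word || population) = sum_l w_l * log(w_l / p_l), skipping w_l = 0.

    Assumes population[l] > 0 wherever the word was posted.
    """
    return sum(w * math.log(w / population[loc])
               for loc, w in word_dist.items() if w > 0)


def is_local_word(word_dist: Dict[int, float],
                  population: Dict[int, float],
                  threshold: float) -> bool:
    """Hypothetical rule: a word is local if its divergence exceeds threshold."""
    return kl_divergence(word_dist, population) > threshold


# Usage: ten location-known users spread over three locations.
population = population_distribution([1, 1, 1, 1, 1, 2, 2, 2, 3, 3])
everyday_word = dict(population)   # posted wherever people live
peaked_word = {3: 1.0}             # posted only from location 3
```

A word used in proportion to where people live has zero divergence, while a word posted from a single sparsely populated location has a large one.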
updateLocalWords: Online updating
Window length N is fixed
We can update KL in O(1) every time a new post arrives :)
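One way the O(1) claim could be realized: with a full window of fixed length N, a new post changes at most two location counts (the one entering and the one leaving), so only two terms of the KL sum need to be recomputed. The slides do not give the update formula, so the following is a sketch under that assumption, with hypothetical names and KL(word || population) as the assumed divergence.

```python
import math
from collections import Counter, deque
from typing import Dict, Iterable


class OnlineKL:
    """Tracks KL(word || population) over a full window of fixed length N."""

    def __init__(self, initial: Iterable[int],
                 population: Dict[int, float]) -> None:
        self.window = deque(initial)  # exactly N locations, oldest first
        self.n = len(self.window)
        self.p = population           # assumed > 0 for every location seen
        self.counts = Counter(self.window)
        self.kl = sum(self._term(loc) for loc in self.counts)

    def _term(self, loc: int) -> float:
        """Contribution of one location to the KL sum."""
        c = self.counts[loc]
        if c == 0:
            return 0.0
        w = c / self.n
        return w * math.log(w / self.p[loc])

    def push(self, new_loc: int) -> float:
        """Slide the window by one post; touches at most two terms."""
        old_loc = self.window.popleft()
        self.window.append(new_loc)
        if new_loc != old_loc:
            self.kl -= self._term(new_loc) + self._term(old_loc)
            self.counts[new_loc] += 1
            self.counts[old_loc] -= 1
            self.kl += self._term(new_loc) + self._term(old_loc)
        return self.kl


# Usage: the incremental value matches a recomputation from scratch.
pop = {1: 0.5, 2: 0.3, 3: 0.2}
tracker = OnlineKL([1, 1, 2, 2, 3], pop)
for loc in [3, 3, 3, 1, 2]:
    tracker.push(loc)
recomputed = OnlineKL(list(tracker.window), pop).kl
```

Because the value is accumulated with floating-point additions and subtractions, a long-running implementation might recompute it from scratch occasionally to limit drift.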
updateUserLocation: user distribution (slide 24)
[Figure: user distribution of u over locations l1, l2, l3, l4, l5, …, lK]
Denotes how likely it is that user u lives in each location
updateUserLocation: update
If user u posts local word w:
[Figure: the prior user distribution is updated with the word distribution of w to give the posterior user distribution, each over locations l1, l2, l3, l4, l5, …, lK]
Dirichlet-Multinomial Compound for Bayesian updates
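The slide names the Dirichlet-Multinomial compound but does not spell out the update rule. The sketch below uses the textbook conjugate update, representing the user distribution by Dirichlet pseudo-counts and adding the word distribution of w as fractional counts. Treating the word distribution as a soft observation, the `weight` parameter, and all names are assumptions for illustration only.

```python
from typing import Dict


def update_user_location(alpha: Dict[int, float],
                         word_dist: Dict[int, float],
                         weight: float = 1.0) -> Dict[int, float]:
    """Return posterior pseudo-counts after user u posts one local word."""
    posterior = dict(alpha)
    for loc, w in word_dist.items():
        posterior[loc] = posterior.get(loc, 0.0) + weight * w
    return posterior


def user_distribution(alpha: Dict[int, float]) -> Dict[int, float]:
    """Posterior mean of the Dirichlet: how likely u lives in each location."""
    total = sum(alpha.values())
    return {loc: a / total for loc, a in alpha.items()}


def infer_home(alpha: Dict[int, float]) -> int:
    """Most probable home location under the current user distribution."""
    return max(alpha, key=lambda loc: alpha[loc])


# Usage: a uniform prior shifts toward location 3 after one local word.
prior = {1: 1.0, 2: 1.0, 3: 1.0}
word_w = {1: 0.2, 3: 0.8}
posterior = update_user_location(prior, word_w)
estimate = user_distribution(posterior)
home = infer_home(posterior)
```

Each update only adds to a fixed-size vector of pseudo-counts, so the posterior after one post becomes the prior for the next.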
EXPERIMENTS
Accuracy & Costs
Data from Twitter
• Data size
  – 200K location-known users in Japan
    • Location profiles geocoded into coordinates
  – 200 tweets for each user (40M in total)
  – 34M follow edges (for existing methods)
• 90% for training; 5% for validation; 5% for test
Inference accuracy
[Figure: inference accuracy, compared with existing methods]
Cost per update
[Figure: cost per update for existing methods and variants of ours, with the “Better” direction marked]
Feed 40M tweets in the dataset chronologically
Conclusion
• Proposed location inference method
  – online & incremental inference
    • Constant time complexity
  – exploiting spatiotemporal correlation
    • Better accuracy