Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams. Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡. † University of Tsukuba, ‡ IBM Research - Tokyo. 14/11/05, CIKM 2014, Yuto Yamaguchi.


Page 1: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡

† University of Tsukuba  ‡ IBM Research - Tokyo

14/11/05 CIKM 2014 - Yuto Yamaguchi 1

Page 2: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Tweets that help us

“Shaked!!!”

“Thunder”

We can infer your home location immediately

Page 3: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Social and Location

• Lots of social media users
• Frequent updates
• User home locations

Page 4: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Location-based Applications

• Event Detection
• Location-based Marketing
• Epidemics Analysis

Page 5: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Lack of home locations

Most users do not disclose their home locations

• 74% of Twitter users [Cheng+, 10]
• 94% of Facebook users [Backstrom+, 10]

Page 6: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Our Objective

To infer home locations of social media users

Page 7: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Focus & Contributions

[Figure: posts arriving one after another along a time axis]

Our Focus
Social contents are not static, but like a stream

Our Contributions
1. Online & Incremental Inference
2. Exploiting Spatiotemporal features

Page 8: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

ONLINE & INCREMENTAL INFERENCE

Contribution 1

Page 9: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Existing methods: Batch inference

[Figure: Batch Input → Existing Methods → Inference Results]

Perform batch inference just once, after “enough data” is stored

→ Can’t update the results ☹
→ What is “enough”? ☹
→ When will it be enough? ☹

Page 10: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Our method: Online & incremental inference

[Figure: Social Stream → Online & incremental method → Inference Results]

Perform location inference every time a new post arrives

→ Can keep the results up to date ☺

Page 11: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

EXPLOITING SPATIOTEMPORAL FEATURES

Contribution 2

Page 12: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Local words

Local words: words strongly correlated with a specific location

[Figure: a user with a known home location posts “Earthquake!” (地震だ!) and “Steelers!”; a user with an unknown home location posts “Steelers!”]

→ Infer Pittsburgh?

Page 13: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Existing methods: Only static features

[Figure: several users with known home locations post “Thunder!” (one also posts “Earthquake!”, 地震だ!); a user with an unknown home location posts “Thunder!”]

“Thunder” is not a local word statically
→ Can’t utilize this word ☹
→ Can’t infer

Page 14: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Our method: Spatiotemporal correlation

[Figure: a user with a known home location posts “Earthquake!” (地震だ!) and “Thunder!”; a user with an unknown home location posts “Thunder!”]

“Thunder” can be a local word temporally, in a specific time period
→ Our method can utilize this word ☺
→ Can infer

Page 15: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

PROPOSED METHOD

OLIM: Online Location Inference Method

Page 16: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

The Algorithm

Preprocessing
1. divideMap()                      (Slide 17)
2. calcPopulationDistribution()

Main
3. for post p from SocialStream
4.   user u <- getUser(p)
5.   if u is location-known
6.     updateLocalWords(p)          (Slide 20)
7.   else
8.     updateUserLocation(u, p)     (Slide 24)
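The main loop above can be sketched in Python as follows. This is only an illustration of the control flow: the `Post` record, its fields, and the two handler callbacks are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass
class Post:
    user: str   # hypothetical field: id of the author
    text: str   # hypothetical field: body of the post


def run_stream(
    stream: Iterable[Post],
    location_known: Set[str],
    update_local_words: Callable[[Post], None],
    update_user_location: Callable[[str, Post], None],
) -> None:
    """Dispatch each arriving post, mirroring lines 3-8 of the slide."""
    for p in stream:
        u = p.user
        if u in location_known:
            update_local_words(p)        # learn from users with known homes
        else:
            update_user_location(u, p)   # refine the estimate for this user
```

Passing the two steps in as callbacks keeps the loop itself independent of how local words and user distributions are stored.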

Page 17: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

divideMap

Quadtree decomposition

Each region is treated as a categorical location

$L = \{l_1, l_2, \ldots, l_K\}$

Location inference is reduced to a classification problem
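A minimal sketch of a quadtree decomposition, assuming a cell is split into four equal quadrants while it holds more than `max_points` points. The split criterion, the `max_depth` guard, and all names are assumptions for illustration; the slide does not state them.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def quadtree_regions(points: List[Point], box: Box,
                     max_points: int = 2, max_depth: int = 8) -> List[Box]:
    """Return the leaf boxes; each leaf plays the role of one location l_k."""
    x0, y0, x1, y1 = box
    if len(points) <= max_points or max_depth == 0:
        return [box]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
    leaves: List[Box] = []
    for qx0, qy0, qx1, qy1 in quadrants:
        # Half-open cells: a point on an edge shared by two cells is counted
        # once. Points exactly on the outer max edge are ignored in this sketch.
        inside = [(x, y) for x, y in points
                  if qx0 <= x < qx1 and qy0 <= y < qy1]
        leaves.extend(quadtree_regions(inside, (qx0, qy0, qx1, qy1),
                                       max_points, max_depth - 1))
    return leaves
```

Dense areas end up with many small regions and sparse areas with a few large ones, so each leaf can then be used as one class label.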

Page 18: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Population distribution

[Figure: categorical distribution over locations l1, l2, …, lK]

The fraction of location-known users who live in each location
Used for local word extraction
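As a rough sketch, the population distribution can be computed by counting the home regions of location-known users and normalizing. The function name and the use of region labels as dictionary keys are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def population_distribution(
    home_regions: Iterable[Hashable],
) -> Dict[Hashable, float]:
    """Fraction of location-known users whose home falls in each region."""
    counts = Counter(home_regions)
    total = sum(counts.values())
    if total == 0:
        return {}  # no location-known users yet
    return {region: n / total for region, n in counts.items()}
```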


Page 20: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

updateLocalWords: Sliding window and word distribution

Sliding window with length N (e.g., N = 5)

Word distribution: where was the word posted from?

[Figure: word distribution over locations l1, l2, …, lK]
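One possible reading of the sliding window, sketched below: for each word, keep the home regions of the last N posts by location-known users that contain it, and normalize the counts to get the word distribution. The class name and this exact definition of the window contents are assumptions.

```python
from collections import Counter, deque
from typing import Deque, Dict, Hashable


class WordWindow:
    """Home regions of the last n location-known posts containing one word."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.recent: Deque[Hashable] = deque()
        self.counts: Counter = Counter()

    def add(self, region: Hashable) -> None:
        """Record a new post and evict the oldest one if the window is full."""
        self.recent.append(region)
        self.counts[region] += 1
        if len(self.recent) > self.n:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def distribution(self) -> Dict[Hashable, float]:
        """Empirical word distribution over regions for the current window."""
        total = len(self.recent)
        return {r: c / total for r, c in self.counts.items()} if total else {}
```

Because old posts fall out of the window, the distribution follows where the word is being used now rather than over the whole history.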

Page 21: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

updateLocalWords: Local Word Intuition

[Figure: the population distribution and two word distributions over locations l1, l2, …, lK; surviving labels: “KL divergence small” and “Local word”]
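For reference, the comparison can be written out as follows, assuming the divergence is taken from the word distribution to the population distribution. The direction and the symbols $q_w$ (word distribution of word $w$) and $p$ (population distribution) are assumptions; the slide does not state them.

```latex
D_{\mathrm{KL}}(q_w \,\|\, p) = \sum_{k=1}^{K} q_w(l_k)\,\log\frac{q_w(l_k)}{p(l_k)}
```

Under this reading, a word posted from every region in proportion to its population gives a divergence near zero, while a word concentrated in a few regions gives a large value, which is presumably what marks it as a local word.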

Page 22: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

updateLocalWords: Online updating

Window length N is fixed

We can update KL in O(1) every time a new post arrives ☺
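A sketch of why a fixed window allows a constant-time update, assuming KL is computed from window counts c_l as the sum of (c_l/N) log((c_l/N)/p_l). When one post enters and one leaves, at most two counts change, so only those two summands need to be recomputed. Function and parameter names are hypothetical.

```python
import math
from typing import Dict, Hashable


def kl_term(count: int, n: int, p: float) -> float:
    """One summand (c/N) * log((c/N) / p); a zero count contributes 0."""
    if count == 0:
        return 0.0
    q = count / n
    return q * math.log(q / p)


def slide_window(kl: float, counts: Dict[Hashable, int], n: int,
                 population: Dict[Hashable, float],
                 entering: Hashable, leaving: Hashable) -> float:
    """Update counts in place and return the new KL, touching two terms at most.

    Assumes population[region] > 0 for every region that appears.
    """
    if entering == leaving:
        return kl  # the window contents are unchanged as a multiset
    for region, delta in ((entering, 1), (leaving, -1)):
        old = counts.get(region, 0)
        kl -= kl_term(old, n, population[region])
        counts[region] = old + delta
        kl += kl_term(old + delta, n, population[region])
    return kl
```

Repeated incremental updates accumulate floating-point error, so an occasional full recomputation may be worthwhile in a long-running stream.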


Page 24: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

updateUserLocation: user distribution

[Figure: user distribution of u over locations l1, l2, …, lK]

Denotes how likely it is that user u lives in each location

Page 25: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

updateUserLocation: update

If user u posts local word w:

[Figure: prior user distribution over l1, l2, …, lK, updated with the word distribution of w to give the posterior]

Dirichlet-Multinomial compound for Bayesian updates
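The standard conjugate update behind a Dirichlet-multinomial model can be sketched as below: the user distribution is held as Dirichlet pseudo-counts, and observed counts are added to them. Treating the window counts of word w as the observation, and the optional `weight` factor, are assumptions made for illustration; the slide names the model but not these details.

```python
from typing import Dict, Hashable


def update_user(alpha: Dict[Hashable, float],
                word_counts: Dict[Hashable, int],
                weight: float = 1.0) -> Dict[Hashable, float]:
    """Conjugate update: add (weighted) observed counts to Dirichlet parameters."""
    posterior = dict(alpha)  # leave the prior untouched
    for region, c in word_counts.items():
        posterior[region] = posterior.get(region, 0.0) + weight * c
    return posterior


def posterior_mean(alpha: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Expected probability of each region under Dirichlet(alpha)."""
    total = sum(alpha.values())
    return {region: a / total for region, a in alpha.items()}
```

For example, starting from a uniform prior of one pseudo-count per region, a local word seen mostly in l3 shifts the posterior mean toward l3, and the region with the largest mean can be reported as the current estimate.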

Page 26: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

EXPERIMENTS

Accuracy & Costs

Page 27: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Data from Twitter

• Data size
  – 200K location-known users in Japan
     • Geocode location profiles into coordinates
  – 200 tweets for each user (40M in total)
  – 34M follow edges (for existing methods)
• 90% for training; 5% for validation; 5% for test

Page 28: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Inference accuracy

[Figure: inference accuracy chart; surviving label: “Existing methods”]

Page 29: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Cost per update

Feed 40M tweets in the dataset chronologically

[Figure: cost per update chart; surviving labels: “Better”, “Variants of ours”, “Existing methods”]

Page 30: Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Conclusion

• Proposed location inference method
  – online & incremental inference
     • Constant time complexity
  – exploiting spatiotemporal correlation
     • Better accuracy