Latent Subtopics in Yelp Restaurant Reviews
Stephanie Rogers, James Huang, Eunkwang Joo

Source: bid.berkeley.edu/cs294-1-spring13/images/1/1f/Presentation_Latent.pdf



Motivation

“An extra half-star rating causes restaurants to sell out 19 percentage points more frequently (an increase from 30% to 49%).”

Questions

•  How can a restaurant identify what customers want from a large number of reviews?

•  What latent topics exist in Yelp restaurant reviews?

•  ...and can these topics provide meaningful insights to the restaurants?

Dataset  

•  Yelp Dataset Challenge
    –  Business, Review, User, and Checkin data
    –  JSON objects

Dataset  

•  5,000 restaurants
•  Over 158,000 restaurant reviews

•  Why?
    –  TRENDS!!!
    –  Figure out what customers care about
    –  Tell restaurants what they can do better
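In practice, the dataset step above means filtering the review file down to restaurants. The sketch below is minimal and rests on several assumptions. Each file is taken to hold one JSON object per line. Business objects are assumed to carry `business_id` and a `categories` list, and review objects to carry `business_id`, `stars` and `text`. These field names follow the Yelp Dataset Challenge files but should be checked against the actual release.

```python
import json

def load_json_lines(lines):
    """Parse one JSON object per non-empty line (assumed file layout)."""
    return [json.loads(line) for line in lines if line.strip()]

def restaurant_reviews(business_lines, review_lines):
    """Keep only reviews whose business lists 'Restaurants' among its categories.

    Assumes business objects have 'business_id' and a 'categories' list,
    and review objects have 'business_id', 'stars' and 'text'.
    """
    restaurant_ids = {
        b["business_id"]
        for b in load_json_lines(business_lines)
        if "Restaurants" in (b.get("categories") or [])
    }
    return [r for r in load_json_lines(review_lines)
            if r["business_id"] in restaurant_ids]
```

An open file object iterates line by line, so you can pass one directly in place of a list of strings.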

Methodology  

•  Latent Dirichlet Allocation (LDA)
    –  Each document is a bag of words
    –  A document covers several topics
    –  Each word is generated by one topic
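The three bullets above describe LDA's generative story. Each document draws its own topic mixture θ_d ~ Dirichlet(α). For each word, the model first draws a topic z from θ_d and then draws the word from that topic's word distribution β_z. The sketch below uses only the standard library. The toy vocabulary (word ids 0 to 3) and the two hand-made topics are hypothetical; a real model learns the topics from data rather than fixing them.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one document following LDA's generative process.

    topics: list of K word distributions (each a list over the vocabulary).
    alpha: Dirichlet prior over the K topics.
    Returns (words, topic_assignments) as lists of ints.
    """
    theta = sample_dirichlet(alpha, rng)        # this document's topic mixture
    vocab = range(len(topics[0]))
    words, assignments = [], []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic for this word
        w = rng.choices(vocab, weights=topics[z])[0]           # that topic emits the word
        assignments.append(z)
        words.append(w)
    return words, assignments
```

For example, take `topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]`. Every word assigned to topic 0 is then 0 or 1, and every word assigned to topic 1 is 2 or 3. LDA inference runs this story in reverse: it recovers θ_d and the topics from observed words alone.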

Latent Topics

Example: a 1-star review

“Bummer, we were psyched to have a new burger place. Don't bother- we waited an hour and a half and found out that our waiter ‘never turned in our order’- Uh, what? We won't be back. The patio is too small and the staff is incompetent. No go!”

Topic weights for this review:

    American Cuisine      36.87%
    Service               13.49%
    Healthy                5.27%
    American Cuisine 2     4.89%
    Lunch                  4.63%
    Decor                  4.64%
    Location               4.64%

LDA - Expectation Maximization

Goal: maximize p(Documents | topics, topic_distribution_for_doc)

    initialize topics randomly
    repeat until convergence:
        for every document:
            repeat until convergence:
                update topic_distribution_for_doc
        calculate topics from topic_distribution_for_doc
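Below is a runnable sketch of the loop structure above. It is a simplification, not the variational EM that LDA (and Gensim) actually use. Like the objective written on the slide, it treats each document's topic distribution as a free parameter with no Dirichlet prior, which makes it EM for a PLSA-style model. The inner loop re-estimates one document's topic distribution while the topics stay fixed. The M-step then recomputes the topics from expected word-topic counts. All function names are hypothetical.

```python
import random

def normalize(xs):
    """Scale non-negative numbers so they sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def fit_topics_em(docs, num_topics, vocab_size, outer_iters=50, inner_iters=10, seed=0):
    """Alternating EM over topics and per-document topic distributions.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (topics, doc_topics): topics[k][w] estimates p(word w | topic k),
    doc_topics[d][k] estimates p(topic k | document d).
    """
    rng = random.Random(seed)
    # initialize topics randomly (small offset keeps every probability positive)
    topics = [normalize([rng.random() + 1e-3 for _ in range(vocab_size)])
              for _ in range(num_topics)]
    doc_topics = [[1.0 / num_topics] * num_topics for _ in docs]

    def responsibilities(theta, w):
        # p(topic k | word w in this document) under the current parameters
        return normalize([theta[k] * topics[k][w] for k in range(num_topics)])

    for _ in range(outer_iters):
        # expected word-topic counts, with a tiny floor so no topic row is all zeros
        counts = [[1e-12] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            if not doc:
                continue
            theta = doc_topics[d]
            # E-step: re-estimate this document's topic distribution, topics held fixed
            for _ in range(inner_iters):
                totals = [0.0] * num_topics
                for w in doc:
                    for k, r in enumerate(responsibilities(theta, w)):
                        totals[k] += r
                theta = normalize(totals)
            doc_topics[d] = theta
            for w in doc:
                for k, r in enumerate(responsibilities(theta, w)):
                    counts[k][w] += r
        # M-step: calculate topics from the expected counts
        topics = [normalize(row) for row in counts]
    return topics, doc_topics
```

Consider two clearly separated toy document types, such as `[[0, 1, 2, 3, 0, 1]] * 5 + [[4, 5, 6, 7, 4, 5]] * 5`. Each type should end up dominated by a different topic.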

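Online LDA

Goal: maximize p(Documents | topics, topic_distribution_for_doc)

    initialize topics randomly
    repeat until convergence:
        take the next batch of documents
        for every document in the batch:
            repeat until convergence:
                update topic_distribution_for_doc
        B' = calculate topics from topic_distribution_for_doc (this batch only)
        topics = (1 - x) * topics + x * B'

Here x is the step size: how much weight the current batch's estimate B' receives relative to the topics learned so far.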
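The only step unique to the online variant is the blend `topics = (1 - x) * topics + x * B'`. In Hoffman, Blei and Bach's online LDA, the weight for batch t is ρ_t = (τ0 + t)^(-κ), and κ must lie in (0.5, 1] to guarantee convergence. Gensim's `LdaModel` exposes these two values as the `offset` (τ0) and `decay` (κ) parameters. The sketch below implements that schedule and the blend. It applies the blend to plain topic-word tables, whereas the paper blends variational parameters rather than normalized distributions. The function names are hypothetical.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Weight x for batch t: rho_t = (tau0 + t) ** (-kappa)."""
    return (tau0 + t) ** (-kappa)

def blend_topics(topics, batch_topics, rho):
    """topics = (1 - rho) * topics + rho * B', applied entry by entry."""
    return [[(1.0 - rho) * old + rho * new for old, new in zip(row, batch_row)]
            for row, batch_row in zip(topics, batch_topics)]
```

With τ0 = 1, ρ_0 = 1, so the first batch simply replaces the random initialization. Each later batch moves the topics a little less than the one before. Blending two probability tables this way also keeps every row summing to 1.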

Tools  

•  Gensim Python library
    –  Topic modeling tool for documents

•  PyGal
    –  Data visualization tool
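Gensim's `corpora.Dictionary` and its `doc2bow` method turn tokenized reviews into sparse lists of (token_id, count) pairs. Those lists are what `models.LdaModel` trains on. The sketch below builds the same representation with the standard library. The regex tokenizer is a hypothetical simplification. The ids are assigned in first-seen order, which need not match Gensim's.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (simplistic, hypothetical)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(tokenized_docs):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Sparse bag of words: sorted (token_id, count) pairs; unknown tokens are dropped."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())
```

For "Great food, great service!" this yields `[(0, 2), (1, 1), (2, 1)]`: the word order is gone and only the counts remain, which is exactly LDA's bag-of-words assumption.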

Results  

Breakdown  of  Hidden  Topics  Over  All  Reviews  

Chart slice values: 29.95% of latent topics (8.8% of all reviews), 21.04%, 13.09%, 10.76%, 9.42%

Results  

•  Predict stars for each discovered hidden topic
    –  Overall: 4
    –  Service: 4.5
    –  Healthiness: 3

•  Proof of Concept
•  Temporal Insights
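The slides do not say how per-topic stars are computed. One plausible approach, offered purely as an assumption: weight each review's star rating by the share of that review the topic accounts for, then take the weighted average. The function and its input layout are hypothetical.

```python
def topic_stars(reviews, num_topics):
    """Hypothetical per-topic star estimate for one restaurant.

    reviews: list of (stars, topic_weights) pairs, where topic_weights[k]
    is the share of topic k in that review.
    Returns, for each topic, the topic-weighted mean of review stars,
    or None when no review touches the topic.
    """
    result = []
    for k in range(num_topics):
        total_weight = sum(weights[k] for _, weights in reviews)
        if total_weight == 0:
            result.append(None)
            continue
        weighted = sum(stars * weights[k] for stars, weights in reviews)
        result.append(weighted / total_weight)
    return result
```

Under this scheme, a 5-star review that is mostly about service pulls the Service score up much more than the Healthiness score, even though the review has only one overall rating.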

Hidden Topic Stars (chart): Joe’s Farm Grill, Thai Rama, Stingray Sushi, Rollerz

Joe’s  Farm  Grill  

Unhealthy? 3.0 stars

“The side of veggie fries was literally 3 pounds of fried veggies, full of cholesterol, and way too much for any human to consume.”



Proof  of  Concept  

•  Service!
    –  Median rating: 3.0
    –  Median weight: 0.05228
    –  Mean rating: 2.4067
    –  Mean weight: 0.3899

•  Many kinds of food places have their service barely reviewed at all:
    –  Sandwich shops, bagels, pizza, cafes, fast food, bars, smoothies, etc.
    –  This causes a high disparity in the service statistics
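The four Service numbers are ordinary per-topic summaries. The sketch below assumes one (rating, service_weight) pair per review; the function name is hypothetical. When most reviews barely touch a topic, the median weight stays near zero while a minority of service-heavy reviews lifts the mean. That is consistent with the gap between 0.05228 and 0.3899.

```python
from statistics import mean, median

def summarize_topic(pairs):
    """Median and mean of ratings and topic weights for one hidden topic.

    pairs: list of (rating, topic_weight) tuples, one per review.
    """
    ratings = [r for r, _ in pairs]
    weights = [w for _, w in pairs]
    return {
        "median rating": median(ratings),
        "median weight": median(weights),
        "mean rating": mean(ratings),
        "mean weight": mean(weights),
    }
```

For instance, the weights 0.0, 0.01, 0.05, 0.9 and 0.02 give a median of 0.02 but a mean of about 0.196, because the single 0.9 dominates the mean.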

Reviews  in  Service  

Top 25 Good Reviews vs. Top 25 Bad Reviews

•  Note: 45% of the restaurants in the top-25 best and worst service lists are Thai

 

Reviews  in  Service  Insights  

•  Western-cuisine restaurants care enough about service to stay off the worst-service list

•  Groupon is mentioned 10 times more often in BAD service reviews than in GOOD ones
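You can make this kind of comparison by counting a term such as "Groupon" in each review set and taking the ratio. The sketch below does that; the function names are hypothetical. It matches whole words case-insensitively, so "Groupons" would not count. Raw counts are only comparable here because both lists hold 25 reviews; sets of different sizes would need per-review rates instead.

```python
import re

def count_mentions(reviews, term):
    """Total whole-word, case-insensitive occurrences of `term` across reviews."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in reviews)

def mention_ratio(bad_reviews, good_reviews, term):
    """How many times more often `term` appears in the bad set than the good set."""
    bad = count_mentions(bad_reviews, term)
    good = count_mentions(good_reviews, term)
    return bad / good if good else float("inf")
```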

 

Worst  Service  Providers  Explained  

•  Quality of food is highly correlated with service:
    –  Fake, badly imitated Asian foods

•  Bigram LDA
    –  Great Food, Great Service
    –  Bad Food, Service Bad
    –  Halo Effect, Cognitive Bias

•  Average ratings of restaurants are lower for bad service reviews:
    –  Good: 4.12, Bad: 3.46
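The slides do not say which bigram model "Bigram LDA" refers to. A common, simple variant, assumed here, joins adjacent words into single tokens and runs ordinary LDA on them. That way "great food" and "bad service" each become one vocabulary item. Gensim's `Phrases` model offers a more selective version that keeps only frequent collocations. The helper below is hypothetical.

```python
import re

def bigram_tokens(text):
    """Lowercase, split into words, and join neighbours into 'w1_w2' tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]
```

The resulting tokens feed into a bag-of-words model just as single words would. The trade-off is a much larger and sparser vocabulary, so frequency filtering usually matters more than it does with unigrams.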


Temporal  Data  

•  Breakfast, Lunch, Dinner Scores
    –  Average across all reviews with these hidden subtopics

•  Checkin Data
    –  Determine the times when each restaurant is busiest
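One way to turn check-in data into a busiest meal is to bucket the counts by hour. The sketch below assumes each restaurant's check-ins arrive as a dict mapping "hour-weekday" keys (for example "12-1") to counts, which is how the challenge's check-in objects are understood here. The meal windows are hypothetical, since the slides do not define them.

```python
# Hypothetical meal windows (24-hour clock); the slides do not define them.
MEAL_HOURS = {
    "breakfast": range(6, 11),
    "lunch": range(11, 15),
    "dinner": range(17, 22),
}

def busiest_meal(checkin_info):
    """Return the meal window with the most check-ins, or None if there are none.

    checkin_info is assumed to map "hour-weekday" strings such as "12-1"
    to check-in counts.
    """
    totals = dict.fromkeys(MEAL_HOURS, 0)
    for key, count in checkin_info.items():
        hour = int(key.split("-")[0])
        for meal, hours in MEAL_HOURS.items():
            if hour in hours:
                totals[meal] += count
    if not any(totals.values()):
        return None
    return max(totals, key=totals.get)
```

Check-ins outside every window (late night, mid-afternoon) are ignored. Ties go to whichever meal is listed first.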

Temporal  Insights  

•  Breakfast, Lunch, Dinner Scores vs. Checkin Data

    –  Only 23% of restaurants are rated highest during their peak busy hours (i.e., when they are most popular)

    –  On average, restaurants are rated 0.4 stars lower when they are busier
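Both temporal findings compare each restaurant's meal scores with its busiest meal. The slides do not specify the exact comparison, so this sketch makes two assumptions. A restaurant counts as "rated highest at peak" when its busiest meal also has its top score. The gap is the peak-meal score minus the mean of its other meal scores. Each restaurant is assumed to have scores for at least two meals; the function name is hypothetical.

```python
from statistics import mean

def peak_vs_offpeak(restaurants):
    """Compare meal scores at the busiest meal with the other meals.

    restaurants: list of (meal_scores, peak_meal) pairs, where meal_scores maps
    a meal name to an average star score and peak_meal names the busiest meal.
    Returns (share of restaurants rated highest at their peak meal,
             mean of peak score minus mean off-peak score).
    """
    best_at_peak = 0
    gaps = []
    for scores, peak in restaurants:
        if max(scores, key=scores.get) == peak:  # ties go to the first-listed meal
            best_at_peak += 1
        off_peak = [s for meal, s in scores.items() if meal != peak]
        gaps.append(scores[peak] - mean(off_peak))
    return best_at_peak / len(restaurants), mean(gaps)
```

Under these assumptions, a result like "23% highest at peak, average gap -0.4" would mean most restaurants score worse at their busiest meal.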

Future  Work  

•  Apply the bigram LDA topics in similar ways

•  If a topic makes up more than 50% of a review, the topic assignment is probably not meaningful: filter these out

•  Calculate the stars of hidden topics using the weights of neighboring words

•  Supervised  learning?