Exploring Review Content for Recommendation via Latent Factor Model

Xiaoyu Chen

Joint work with Yuan Yao, Feng Xu and Jian Lu


Recommender System

2014/12/13

RecSys


Most recommender systems use …

• memory-based

• model-based


[Figure: sparse user-item rating matrix (users u1, u2, …; items d1, d2, …); only some ratings are observed, and "?" marks the ratings to be predicted]

Challenge


[Figure: the same sparse rating matrix, where each observed rating comes with a review (review1,1, review3,3, review5,2, review6,3, review5,4, review2,5, …); "?" marks the ratings to be predicted]

• Review content is unavailable for the user-item pair whose rating is to be predicted

• Hence it cannot be used directly as a feature in a supervised machine learning framework

• It is unclear how to incorporate the review content into the model

• Review content is usually noisy

Problem: how can we improve recommendation accuracy when ratings come with review content?

Basic Idea

• Process review content
  • Aggregate the review content along users and items to obtain a profile document for each
  • Apply Latent Dirichlet Allocation (LDA) to these documents

• Incorporate the review content into the LFM
  • Latent topic distributions
  • Guidance term: use document topics to guide the latent factor learning
  • Regularization term: constrain the preference differences between similar users

Preliminaries

• Basic latent factor model: $\hat{R} = Q^{T}P$, i.e., each predicted rating is $\hat{r}_{ui} = q_i^{T}p_u$

• Minimize the objective function

• $P$ indicates users' preferences over the latent aspects

• $Q$ indicates items' corresponding performance over these aspects
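A minimal sketch of the basic latent factor model, assuming the common squared-error objective with L2 regularization, $\sum_{(u,i)} (r_{ui} - q_i^{T}p_u)^2 + \lambda(\sum_u \lVert p_u\rVert^2 + \sum_i \lVert q_i\rVert^2)$; the slide's exact objective is not reproduced here. $P$ holds one $f$-dimensional vector $p_u$ per user and $Q$ one vector $q_i$ per item; the helper names and the weight `lam` are illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def predict(p_u: Vector, q_i: Vector) -> float:
    """Predicted rating r_hat_ui = q_i^T p_u (dot product over the f aspects)."""
    return sum(a * b for a, b in zip(p_u, q_i))


def objective(
    ratings: Dict[Tuple[str, str], float],
    P: Dict[str, Vector],
    Q: Dict[str, Vector],
    lam: float = 0.1,
) -> float:
    """Squared error over observed ratings plus an L2 penalty on all factors (assumed form)."""
    error = sum((r - predict(P[u], Q[i])) ** 2 for (u, i), r in ratings.items())
    penalty = sum(x * x for vec in P.values() for x in vec)
    penalty += sum(x * x for vec in Q.values() for x in vec)
    return error + lam * penalty


# Toy example with f = 2 latent aspects (made-up numbers).
P = {"u1": [1.0, 0.5], "u2": [0.2, 1.5]}
Q = {"d1": [2.0, 1.0], "d2": [1.0, 2.0]}
ratings = {("u1", "d1"): 3.0, ("u2", "d2"): 4.0}
print(predict(P["u1"], Q["d1"]))  # 2.5
print(objective(ratings, P, Q))
```

Training then searches for the $P$ and $Q$ that make this objective small, typically by gradient descent.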

Process Review Content

• Example [1]

Pros-

1. Big beautiful and responsive display.

2. Very fast and fluid.

3. Very thin and light design.

4. Much improved keyboard.

Cons-

1. Only a single USB port.

2. Can get pretty hot with intense use.

3. Price.

4. The app store is still very limited.

5. Battery life is still not optimal for tablet use.

…because I liked the portability and flexibility that it provided… It is an overall better experience…

[1] http://www.amazon.com/review/RS91T27HLH8D6/

[Figure: review topics and the latent factor model $\hat{R} = Q^{T}P$]

Noise in Review Content

• Not all words are meaningful and useful
  • We believe that, for aspect identification, most adjectives, adverbs and verbs are noise; nouns are the exception
  • Regardless of whether the product is cheap or costly, the aspect the user cares about is price

• We need the latent aspect preference, not the performance

Nouns are more informative

• Example: the same review as above


Only nouns are retained!
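A minimal sketch of the noun-only filter, assuming each review has already been part-of-speech tagged with Penn Treebank tags (NN, NNS, NNP, NNPS mark nouns) by some external tagger. The tagged tokens below are hand-written for fragments of the example review, and `keep_nouns` is a hypothetical helper.

```python
from typing import List, Tuple

# Penn Treebank tags for common and proper nouns (singular and plural).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def keep_nouns(tagged_tokens: List[Tuple[str, str]]) -> List[str]:
    """Return the lowercased words whose POS tag marks them as nouns."""
    return [word.lower() for word, tag in tagged_tokens if tag in NOUN_TAGS]


# "Big beautiful and responsive display." and "Battery life is still not optimal."
# tagged by hand (assumed tagger output).
tagged = [
    ("Big", "JJ"), ("beautiful", "JJ"), ("and", "CC"),
    ("responsive", "JJ"), ("display", "NN"), (".", "."),
    ("Battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("still", "RB"),
    ("not", "RB"), ("optimal", "JJ"), (".", "."),
]
print(keep_nouns(tagged))  # ['display', 'battery', 'life']
```

The opinion words ("beautiful", "optimal") are dropped, while the aspects the user is talking about survive.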

Process Review Content

• Review aggregation
  • For user $u$, we group all of his previous reviews and treat them as one document for that user
  • Likewise for each item

• Topic model
  • Latent Dirichlet Allocation, a generative probabilistic model for collections of discrete data
  • Applying LDA to the documents yields
    • a user topic distribution $\theta_u$
    • an item topic distribution $\theta_i$
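A compact sketch of both steps, assuming reviews arrive as (user, item, noun tokens) triples: tokens are concatenated into one document per user and one per item, and LDA is fit with collapsed Gibbs sampling. The slides do not say which LDA inference method or hyperparameters were used, so Gibbs sampling, `alpha`, `beta`, the iteration count and all helper names here are illustrative choices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(reviews: List[Tuple[str, str, List[str]]]):
    """Concatenate review tokens into one document per user and one per item."""
    user_docs: Dict[str, List[str]] = defaultdict(list)
    item_docs: Dict[str, List[str]] = defaultdict(list)
    for user, item, tokens in reviews:
        user_docs[user].extend(tokens)
        item_docs[item].extend(tokens)
    return dict(user_docs), dict(item_docs)


def lda_gibbs(docs: List[List[str]], k: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns a topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: idx for idx, w in enumerate(sorted({w for d in docs for w in d}))}
    v = len(vocab)
    n_dk = [[0] * k for _ in docs]       # topic counts per document
    n_kw = [[0] * v for _ in range(k)]   # word counts per topic
    n_k = [0] * k                        # total tokens per topic
    z = []                               # current topic of every token

    def update(d: int, t: int, wi: int, delta: int) -> None:
        n_dk[d][t] += delta
        n_kw[t][wi] += delta
        n_k[t] += delta

    for d, doc in enumerate(docs):
        z.append([rng.randrange(k) for _ in doc])  # random initial assignment
        for w, t in zip(doc, z[d]):
            update(d, t, vocab[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi = vocab[w]
                update(d, z[d][n], wi, -1)  # remove the token's current assignment
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][wi] + beta) / (n_k[j] + v * beta)
                           for j in range(k)]
                z[d][n] = rng.choices(range(k), weights=weights)[0]
                update(d, z[d][n], wi, +1)

    # Posterior-mean estimate of each document's topic distribution theta_d.
    return [[(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


reviews = [
    ("u1", "d1", ["display", "keyboard", "design"]),
    ("u1", "d2", ["price", "value", "money"]),
    ("u2", "d1", ["display", "design", "battery"]),
    ("u2", "d3", ["price", "money"]),
]
user_docs, item_docs = aggregate(reviews)
names = list(user_docs) + list(item_docs)
thetas = lda_gibbs([user_docs[u] for u in user_docs] + [item_docs[i] for i in item_docs], k=2)
for name, theta in zip(names, thetas):
    print(name, [round(x, 3) for x in theta])
```

Fitting users and items in one LDA run keeps their topic distributions in the same topic space, which is what lets them be compared later.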

Guidance Term

• The favor of user $u$ on item $i$ over the $f$ aspects is composed of two parts:
  • the user's own preference
  • the community's overall assessment of this item

Example

• Weighting parameters control the importance of the two parts in this factor model

Guidance Term

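[Figure: the guidance term, with annotations]

• User preference: from the user document

• Community assessment: from the item document

• $C$: overall weight of the guidance term

• Ratio of the guidance term: the weight between the user's preference and the community's assessment

• The user's preference deserves more weight if this user comments a lot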
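The guidance-term formula did not survive extraction, so the following is only a sketch of one plausible form under explicit assumptions: the number of LDA topics equals the number of latent aspects $f$; the user's favor over the aspects is taken as the element-wise product $p_u \odot q_i$; the guidance target mixes the user topic distribution $\theta_u$ (from the user document) and the item topic distribution $\theta_i$ (from the item document) with a weight that grows with how many reviews the user wrote; and $C$ scales the whole term. The mixing rule `user_weight`, its `ratio` parameter, and all other names are hypothetical.

```python
from typing import List

Vector = List[float]


def user_weight(n_reviews: int, ratio: float = 1.0) -> float:
    """Hypothetical weight on the user's own preference: users who comment
    a lot get a weight closer to 1, infrequent reviewers lean on the community."""
    return n_reviews / (n_reviews + ratio)


def guidance_target(theta_u: Vector, theta_i: Vector, w_u: float) -> Vector:
    """Mix user preference (user document) and community assessment (item document)."""
    return [w_u * a + (1.0 - w_u) * b for a, b in zip(theta_u, theta_i)]


def guidance_penalty(p_u: Vector, q_i: Vector, target: Vector, C: float) -> float:
    """Assumed form: C * || p_u (element-wise *) q_i - target ||^2."""
    favor = [a * b for a, b in zip(p_u, q_i)]
    return C * sum((f - t) ** 2 for f, t in zip(favor, target))


w = user_weight(n_reviews=3, ratio=1.0)              # 0.75
target = guidance_target([0.8, 0.2], [0.4, 0.6], w)  # approximately [0.7, 0.3]
print(w, target)
print(guidance_penalty([1.0, 0.5], [0.7, 0.6], target, C=1.0))  # approximately 0
```

Adding this penalty to the objective pulls the learned factors toward what the review topics suggest, while $C$ decides how strongly.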

Regularization Term

• If users $u$ and $l$ have similar topic distributions, their preferences ($P_u$ and $P_l$) should also be similar to each other

• Regularization term
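One common way to write such a regularizer, and only an assumption about the slide's lost formula, is a similarity-weighted penalty $\beta \sum_u \frac{1}{|N(u)|}\sum_{l \in N(u)} \mathrm{sim}(u,l)\,\lVert P_u - P_l \rVert^2$, where $N(u)$ is the set of users considered similar to $u$. Dividing by $|N(u)|$ is one way to cope with the number of neighbors varying widely across users; `beta`, `neighbors` and the similarity values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def regularization_term(
    P: Dict[str, Vector],
    neighbors: Dict[str, Dict[str, float]],
    beta: float = 0.1,
) -> float:
    """beta * sum_u (1/|N(u)|) * sum_{l in N(u)} sim(u, l) * ||P_u - P_l||^2."""
    total = 0.0
    for u, nbrs in neighbors.items():
        if not nbrs:
            continue  # users without neighbors contribute nothing
        inner = 0.0
        for l, sim in nbrs.items():
            inner += sim * sum((a - b) ** 2 for a, b in zip(P[u], P[l]))
        total += inner / len(nbrs)  # normalize by the neighbor count
    return beta * total


P = {"u": [1.0, 0.0], "l": [0.0, 1.0], "m": [1.0, 0.1]}
neighbors = {"u": {"l": 0.2, "m": 0.9}}
print(regularization_term(P, neighbors, beta=1.0))  # approximately 0.2045
```

A highly similar neighbor (here `m`, similarity 0.9) costs almost nothing because its preference vector is already close; the penalty grows when similar users drift apart.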

Regularization Term

• Similarity between documents
  • Word-frequency similarity is not adequate for this problem
  • When describing their experience, people may use different terms to refer to the same topic (e.g., "value" and "money" for price)

• Use the distance between the preference vectors (topic distributions) instead
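A toy illustration of this point with made-up inputs: two short documents that both concern price but share no words have zero word-frequency cosine similarity, while their (hypothetical) topic distributions are close. Euclidean distance, and turning it into a similarity via `1 / (1 + distance)`, are assumptions; the slides do not name the distance they used.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def topic_similarity(theta_a: List[float], theta_b: List[float]) -> float:
    """Turn the Euclidean distance between topic distributions into a similarity."""
    return 1.0 / (1.0 + math.dist(theta_a, theta_b))


doc_a = Counter(["value", "value", "battery"])
doc_b = Counter(["money", "screen"])
print(cosine(doc_a, doc_b))  # 0.0: no shared words
# Hypothetical LDA output: both documents load mostly on a "price" topic.
print(topic_similarity([0.7, 0.3], [0.65, 0.35]))  # close to 1
```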

Putting Everything Together…

• The GTRT model

• Gradient descent method
  • partial derivatives

• The number of neighbors differs a lot from user to user
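The partial derivatives are not reproduced on the slide. As a hedged sketch, assume a per-rating loss $(r_{ui} - q_i^{T}p_u)^2 + \lambda(\lVert p_u\rVert^2 + \lVert q_i\rVert^2) + \beta \frac{1}{|N(u)|}\sum_{l\in N(u)} \mathrm{sim}(u,l)\lVert p_u - p_l\rVert^2$. The stochastic gradient step below follows from differentiating that assumed loss with respect to $p_u$ and $q_i$, holding the neighbors' vectors fixed; a guidance term would contribute one more gradient term of the same kind. The learning rate `lr` and all other names are hypothetical.

```python
from typing import List

Vector = List[float]


def sgd_step(
    r_ui: float,
    p_u: Vector,
    q_i: Vector,
    neighbor_vecs: List[Vector],
    neighbor_sims: List[float],
    lr: float = 0.01,
    lam: float = 0.1,
    beta: float = 0.1,
):
    """One gradient step on the assumed per-rating loss; neighbors' vectors are held fixed."""
    error = r_ui - sum(a * b for a, b in zip(p_u, q_i))
    n = len(neighbor_vecs)
    grad_p = []
    for k in range(len(p_u)):
        g = -2 * error * q_i[k] + 2 * lam * p_u[k]
        if n:
            # Similarity-weighted pull toward similar users, normalized by the neighbor count.
            g += 2 * beta / n * sum(s * (p_u[k] - p_l[k])
                                    for s, p_l in zip(neighbor_sims, neighbor_vecs))
        grad_p.append(g)
    grad_q = [-2 * error * p_u[k] + 2 * lam * q_i[k] for k in range(len(q_i))]
    new_p = [x - lr * g for x, g in zip(p_u, grad_p)]
    new_q = [x - lr * g for x, g in zip(q_i, grad_q)]
    return new_p, new_q


p, q = [0.1, 0.1], [0.1, 0.1]
for _ in range(500):
    p, q = sgd_step(4.0, p, q, neighbor_vecs=[[1.0, 1.0]], neighbor_sims=[0.5])
print(sum(a * b for a, b in zip(p, q)))  # moves from 0.02 toward the observed rating 4.0
```

The L2 and neighbor terms keep the prediction slightly below the observed rating, which is the intended trade-off between fitting one rating and staying consistent with similar users.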

Data Set

• Amazon
  • Categories: Books, Music, DVD/VHS and mProducts; we use mProducts only
  • Reviews in the other three categories are more like descriptions than opinions
  • Inactive users with fewer than 3 reviews are removed
  • 55,086 reviews and ratings from 11,011 users on 36,222 items

• Yelp
  • Phoenix, AZ metropolitan area
  • 173,586 reviews and ratings from 23,890 users on 6,265 items

Source: http://www.cs.uic.edu/~liub/
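The inactive-user filter amounts to a single counting pass. A minimal sketch, assuming each review is a (user, item, rating) triple with the review text omitted for brevity; the helper name and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def remove_inactive_users(reviews: List[Tuple[str, str, float]], min_reviews: int = 3):
    """Keep only reviews written by users with at least `min_reviews` reviews."""
    counts = Counter(user for user, _, _ in reviews)
    return [r for r in reviews if counts[r[0]] >= min_reviews]


reviews = [("u1", "d1", 5.0), ("u1", "d2", 4.0), ("u1", "d3", 3.0),
           ("u2", "d1", 2.0), ("u2", "d4", 4.0)]
kept = remove_inactive_users(reviews)
print(len(kept), {u for u, _, _ in kept})  # 3 {'u1'}
```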

Experiments

• R language, 8 GB memory

• Evaluation metric: RMSE

• Train/test splits: 90%/10% and 80%/20%

• Compared methods
  • MEAN (baseline): the average rating as the prediction
  • LFM (baseline): standard matrix factorization
  • GTM: guidance term only
  • RTM: regularization term only
  • GTRT: the proposed method, combining GTM and RTM
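A minimal sketch of the evaluation protocol: hold out a fraction of the ratings, then score predictions with RMSE. The MEAN baseline predicts the global training average for every test pair. The slides do not say how the split was drawn, so a uniform random split over ratings is assumed; the toy ratings and helper names are synthetic.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[str, str, float]


def train_test_split(ratings: List[Rating], test_fraction: float, seed: int = 0):
    """Randomly hold out `test_fraction` of the ratings (e.g. 0.1 or 0.2)."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_baseline(train: List[Rating], test: List[Rating]) -> float:
    """MEAN: predict the global average training rating for every test pair."""
    mu = sum(r for _, _, r in train) / len(train)
    return rmse([r for _, _, r in test], [mu] * len(test))


ratings = [(f"u{k % 7}", f"d{k % 11}", float(1 + k % 5)) for k in range(100)]
train, test = train_test_split(ratings, test_fraction=0.1)
print(len(train), len(test))  # 90 10
print(round(mean_baseline(train, test), 4))
```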

Experiments

• Review Aspect Identification

Top-scoring terms per topic, by the word classes kept:

Nouns only
  #5: symbol, teacher, ad, home-study, pollard, singer, card, hardness, development, rudder, notes, technician
  #12: amazon, system, presentation, bluetooth, piano, update, brand, worktext, head-band, theme, challenge
  #45: mpvideo, verde, telemarket, tunes, juzz, surprise, year, picture, player, album

Nouns, adjectives, adverbs and verbs
  #11: winter, discuss, report, return, delay, sleep, run, life, painstaking, win
  #18: disagree, beginner, outstanding, fadeup, soon, highlight, strong, menu, trigger, rudder, suitable
  #27: sound, refer, performance, clock, purchase, divide, control, fadein, laugh, speak, internet


Topic words from nouns only are more informative for identifying latent aspects!

Experiments

• Rating prediction (RMSE)

Dataset  Test size  MEAN    LFM     GTM     RTM     GTRT
Amazon   10%        1.3423  1.3351  1.3006  1.3218  1.2926
Amazon   20%        1.3603  1.3460  1.3111  1.3285  1.3026
Yelp     10%        1.1176  1.1055  0.9787  1.0663  0.9695
Yelp     20%        1.1566  1.1489  1.0875  1.1269  1.0762

12.3% improvement over LFM (Yelp, 10% test set)!

Review content can help to improve rating prediction, and our method GTRT effectively leverages the review content!
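The headline figure is the relative RMSE reduction of GTRT over LFM. A short check of that arithmetic on the reported table values (the Yelp 10% row gives the 12.3%):

```python
# RMSE values copied from the table: (LFM, GTRT).
results = {
    ("Amazon", "10%"): (1.3351, 1.2926),
    ("Amazon", "20%"): (1.3460, 1.3026),
    ("Yelp", "10%"): (1.1055, 0.9695),
    ("Yelp", "20%"): (1.1489, 1.0762),
}

for (dataset, test), (lfm, gtrt) in results.items():
    improvement = (lfm - gtrt) / lfm * 100  # relative RMSE reduction in percent
    print(f"{dataset} {test}: {improvement:.1f}%")  # Yelp 10%: 12.3%
```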

Experiments

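• Impact of the parameters $C$ and the ratio of the guidance term

[Figure: RMSE for varying values of $C$ and of the ratio of the guidance term]

• $C$ controls the weight of the review content; the ratio of the guidance term adjusts the weight between the user's preference and the community's assessment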

Experiments

• Prediction for cold-start users
  • Review content provides additional information
  • Cold-start users: fewer than 5 ratings
  • more than 50%

Dataset  LFM (f = 20)  GTRT (f = 20)  LFM (f = 50)  GTRT (f = 50)
Amazon   1.4302        1.3669         1.4199        1.3529
Yelp     1.0676        1.0158         1.1698        0.9948

(f: number of latent factors)

Improvement is greater than the average improvement over all users!

4.7% improvement (Amazon, f = 50)

14.9% improvement (Yelp, f = 50)

Final Conclusion

• We study why review content is valuable for recommendation

• We process review content for use in a latent factor model

• We employ two strategies to leverage reviews: a guidance term and a regularization term

Thank you!
