How to Build a Recommender System


FACULTY OF COMPUTER SCIENCE AND ENGINEERING

RECOMMENDER SYSTEMS
Knowledge Systems

Slides adapted from Jure Leskovec, Stanford C246: Mining Massive Datasets (1/29/2013).

Recommendations

Items: products, web sites, blogs, news items, …
Users reach items in a large catalog either through search or through recommendations.

Example of a recommender system

Customer X buys a Metallica CD, then buys a Megadeth CD.
Customer Y searches for Metallica; the recommender system suggests Megadeth, based on the data collected about customer X.


From scarcity to abundance

- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, …
- The web enables near-zero-cost dissemination of information about products
  - From scarcity to abundance
- More choice necessitates better filters
  - Recommendation engines
  - How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

The Long Tail

(Long-tail sales-distribution figure.) Source: Chris Anderson (2004)

Physical vs. online retailers

Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!


Recommendation Types

- Editorial and hand-curated: lists of favorites, lists of "essential" items
- Simple aggregates: Top 10, Most Popular, Recent Uploads
- Tailored to individual users: Amazon, Netflix, …


Formal Model

- X = set of customers
- S = set of items
- Utility function u : X × S → R
  - R = set of ratings
  - R is a totally ordered set
  - e.g., 0-5 stars, or a real number in [0, 1]


Utility Matrix

(Example utility matrix: users Alice, Bob, Carol, David × movies King Kong, LOTR, Matrix, Nacho Libre; LOTR = Lord of the Rings. Only a handful of entries are known — utility values such as 1, 0.5, 0.4, 0.3, 0.2 — and the rest are unknown.)
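As a concrete illustration of the formal model, here is a minimal Python sketch of a sparse utility matrix. The user and movie names follow the example above, but the rating values are made-up placeholders, not the values from the slide.

```python
# A sparse utility matrix u: X x S -> R stored as a dict of dicts.
# The ratings below are illustrative placeholders only.
utility = {
    "Alice": {"King Kong": 1.0, "Matrix": 0.2},
    "Bob":   {"King Kong": 0.5, "LOTR": 0.3},
    "Carol": {"Matrix": 1.0, "Nacho Libre": 0.4},
    "David": {"LOTR": 0.2},
}

def get_rating(user, item):
    """Return the known rating, or None if the entry is unknown (the common case)."""
    return utility.get(user, {}).get(item)

print(get_rating("Alice", "Matrix"))       # 0.2
print(get_rating("Alice", "Nacho Libre"))  # None -> to be predicted by the recommender
```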


Key Problems

(1) Gathering "known" ratings for the matrix: how to collect the data in the utility matrix.
(2) Extrapolating unknown ratings from the known ones: mainly interested in high unknown ratings — we are not interested in knowing what you don't like, but what you like.
(3) Evaluating extrapolation methods: how to measure the success/performance of recommendation methods.


Gathering Ratings

- Explicit
  - Ask people to rate items
  - Doesn't work well in practice: people can't be bothered
- Implicit
  - Learn ratings from user actions
  - e.g., a purchase implies a high rating
  - What about low ratings?


Extrapolating Utilities

- Key problem: the utility matrix U is sparse
  - Most people have not rated most items
  - Cold start:
    - New items have no ratings
    - New users have no history
- Three approaches to recommender systems:
  1) Content-based
  2) Collaborative
  3) Hybrid (latent-factor based)

Content-based Recommender Systems


Content-based recommendations

- Main idea: recommend to customer x items similar to previous items rated highly by x.

Examples:
- Movie recommendations: recommend movies with the same actor(s), director, genre, …
- Websites, blogs, news: recommend other sites with "similar" content


Plan of action

(Diagram.) From the items a user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles; recommend the best-matching items.


Item Profiles

- For each item, create an item profile
- A profile is a set (vector) of features
  - movies: author, title, actor, director, …
  - text: set of "important" words in the document
- How to pick important features (words)?
  - The usual heuristic from text mining is TF-IDF (Term Frequency × Inverse Document Frequency)
    - Term … feature
    - Document … item


TF-IDF

- f_ij = frequency of term (feature) i in document (item) j
- n_i = number of documents that mention term i
- N = total number of documents
- TF-IDF score: w_ij = TF_ij × IDF_i
- Doc profile = set of words with the highest TF-IDF scores, together with their scores
- Note: TF is normalized to discount for "longer" documents
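A minimal sketch of the TF-IDF heuristic. It assumes the common normalizations TF_ij = f_ij / max_k f_kj (which discounts longer documents) and IDF_i = log2(N / n_i); the slides do not pin down the exact variants, so treat these as one reasonable choice.

```python
import math
from collections import Counter

def tfidf_profiles(docs, top_k=3):
    """Build a doc profile (top TF-IDF words with scores) for each document.

    Assumes TF_ij = f_ij / max_k f_kj and IDF_i = log2(N / n_i); other
    normalizations are also common.
    """
    N = len(docs)
    # n_i: number of documents mentioning term i
    n = Counter(term for doc in docs for term in set(doc.split()))
    profiles = []
    for doc in docs:
        f = Counter(doc.split())
        max_f = max(f.values())
        scores = {t: (cnt / max_f) * math.log2(N / n[t]) for t, cnt in f.items()}
        top = sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
        profiles.append(dict(top))
    return profiles

docs = ["metallica metal band concert",
        "megadeth metal band tour",
        "cooking recipe pasta dinner"]
for p in tfidf_profiles(docs):
    print(p)
```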


User profiles and prediction

User profile possibilities:
- Weighted average of the rated item profiles
- Variation: weight by the difference from the average rating for the item, …

Prediction heuristic: given user profile x and item profile i, estimate
u(x, i) = cos(x, i) = (x · i) / (||x|| · ||i||)
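A minimal sketch of the cosine prediction heuristic u(x, i) = cos(x, i). The feature space and item profiles below are made-up placeholders; the user profile is built as a plain average of the profiles of liked items, one of the possibilities listed above.

```python
import math

def cosine(x, i):
    """u(x, i) = cos(x, i) = (x . i) / (|x| * |i|), for dense feature vectors."""
    dot = sum(a * b for a, b in zip(x, i))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in i))
    return dot / norm if norm else 0.0

# Hypothetical item profiles over features [action, drama, sci-fi]
item_profiles = {"Matrix": [1.0, 0.0, 1.0], "LOTR": [1.0, 1.0, 0.0]}
liked = ["Matrix"]

# User profile: average of the profiles of items the user rated highly
user_profile = [sum(item_profiles[m][k] for m in liked) / len(liked) for k in range(3)]

for movie, profile in item_profiles.items():
    print(movie, round(cosine(user_profile, profile), 2))
```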


Model-based approaches

- For each user, learn a classifier that classifies items into rating classes
  - "liked by user" vs. "not liked by user"
  - e.g., Bayesian, regression, SVM
- Apply the classifier to each item to find recommendation candidates
- Problem: scalability
  - Not investigated further in this class

Advantages of the content-based approach (pros)

+ No need for data on other users: no cold-start or sparsity problems
+ Able to recommend to users with unique tastes
+ Able to recommend new and unpopular items: no first-rater problem
+ Able to provide explanations: can explain recommended items by listing the content features that caused an item to be recommended

Limitations of the content-based approach (cons)

– Finding the appropriate features is hard (e.g., images, movies, music)
– Overspecialization
  - Never recommends items outside the user's content profile
  - People might have multiple interests
  - Unable to exploit the quality judgments of other users
– Recommendations for new users: how to build a user profile?

Collaborative Filtering


Collaborative Filtering

- Consider user x
- Find a set N of other users whose ratings are "similar" to x's ratings
- Estimate x's ratings based on the ratings of the users in N


Similar users

Let r_x be the vector of user x's ratings.

- Jaccard similarity measure
  - Problem: ignores the value of the rating
- Cosine similarity measure
  - sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (||r_x|| · ||r_y||)
  - Problem: treats missing ratings as "negative"
- Pearson correlation coefficient
  - S_xy = items rated by both users x and y
  - r̄_x, r̄_y … average ratings of x and y
  - sim(x, y) = Σ_{s∈S_xy} (r_xs − r̄_x)(r_ys − r̄_y) / ( √(Σ_{s∈S_xy} (r_xs − r̄_x)²) · √(Σ_{s∈S_xy} (r_ys − r̄_y)²) )

Example:
r_x = [*, _, _, *, ***],  r_y = [*, _, **, **, _]
- r_x, r_y as sets: r_x = {1, 4, 5}, r_y = {1, 3, 4}
- r_x, r_y as points: r_x = [1, 0, 0, 1, 3], r_y = [1, 0, 2, 2, 0]
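A minimal sketch of the three similarity measures, applied to the r_x, r_y example above (items are keyed 1-5 and missing ratings are simply omitted). The Pearson variant here centers each user by the mean over all of their ratings; centering over only the co-rated items S_xy is another common choice.

```python
import math

rx = {1: 1, 4: 1, 5: 3}   # user x: item -> rating, from the example above
ry = {1: 1, 3: 2, 4: 2}   # user y

def jaccard(rx, ry):
    """Similarity of the *sets* of rated items; ignores rating values."""
    a, b = set(rx), set(ry)
    return len(a & b) / len(a | b)

def cosine_sim(rx, ry):
    """Cosine over the full rating vectors; missing ratings count as 0 ("negative")."""
    items = set(rx) | set(ry)
    dot = sum(rx.get(i, 0) * ry.get(i, 0) for i in items)
    nx = math.sqrt(sum(v * v for v in rx.values()))
    ny = math.sqrt(sum(v * v for v in ry.values()))
    return dot / (nx * ny)

def pearson(rx, ry):
    """Pearson correlation over the co-rated items S_xy, centered by each user's mean."""
    sxy = set(rx) & set(ry)
    mx = sum(rx.values()) / len(rx)
    my = sum(ry.values()) / len(ry)
    num = sum((rx[s] - mx) * (ry[s] - my) for s in sxy)
    den = math.sqrt(sum((rx[s] - mx) ** 2 for s in sxy)) * \
          math.sqrt(sum((ry[s] - my) ** 2 for s in sxy))
    return num / den if den else 0.0

print(round(jaccard(rx, ry), 2), round(cosine_sim(rx, ry), 2), round(pearson(rx, ry), 2))
```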



Rating predictions

Let r_x be the vector of user x's ratings, and let N be the set of the k users most similar to x who have rated item i.

Prediction for item i of user x (shorthand: s_xy = sim(x, y)):
- Simple average: r_xi = (1/k) · Σ_{y∈N} r_yi   (tends to average out the ratings)
- Similarity-weighted average: r_xi = Σ_{y∈N} s_xy · r_yi / Σ_{y∈N} s_xy

Other options? Many other tricks are possible…
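A minimal sketch of the similarity-weighted prediction r_xi = Σ s_xy · r_yi / Σ s_xy. The users, ratings, and similarity values are made-up placeholders; in practice s_xy would come from one of the similarity measures above.

```python
# Hypothetical toy data: user -> {item: rating}
ratings = {
    "x":  {"A": 5, "B": 3},
    "y1": {"A": 5, "B": 4, "C": 4},
    "y2": {"A": 1, "B": 1, "C": 2},
}
# Precomputed similarities s_xy = sim(x, y); the values here are illustrative.
s = {("x", "y1"): 0.9, ("x", "y2"): 0.1}

def predict(x, i, k=2):
    """r_xi = sum_{y in N} s_xy * r_yi / sum_{y in N} s_xy over the k most similar raters of i."""
    raters = [y for y in ratings if y != x and i in ratings[y]]
    neighbors = sorted(raters, key=lambda y: s[(x, y)], reverse=True)[:k]
    num = sum(s[(x, y)] * ratings[y][i] for y in neighbors)
    den = sum(s[(x, y)] for y in neighbors)
    return num / den if den else None

print(predict("x", "C"))  # 3.8 -> weighted toward y1's rating of C
```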


Complexity

- The expensive step is finding the k most similar customers: O(|U|)
- Too expensive to do at runtime: need to pre-compute
- Naïve precomputation takes time O(N|U|)
  - A simple trick gives some speedup
- Can use clustering or partitioning as alternatives, but quality degrades


Item-Item Collaborative Filtering

- So far: user-user collaborative filtering
- Another view: item-item
  - For item i, find other similar items
  - Estimate the rating for item i based on the ratings for similar items
  - Can use the same similarity metrics and prediction functions as in the user-user model

r_xi = Σ_{j∈N(i;x)} s_ij · r_xj / Σ_{j∈N(i;x)} s_ij

where
- s_ij … similarity of items i and j
- r_xj … rating of user x on item j
- N(i; x) … set of items rated by x that are similar to i

Worked example: item-item collaborative filtering with k = 2

The utility matrix has 6 movies (rows) and 12 users (columns); ratings are integers between 1 and 5, and most entries are unknown. Movie 1 has been rated by users 1, 3, 6, 9 and 11 (ratings 1, 3, 5, 5 and 4); user 5's rating of movie 1 is unknown. Goal: estimate the rating of movie 1 by user 5.

Neighbor selection: identify movies similar to movie 1 that user 5 has rated. Pearson correlation is used as the similarity:
1) Subtract the mean rating m_i from each movie's row:
   m_1 = (1 + 3 + 5 + 5 + 4) / 5 = 3.6
   centered row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between the centered rows:
   sim(1, m) for m = 1…6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Only the k = 2 most similar movies rated by user 5 are taken into account: movies 3 and 6, with similarity weights s_13 = 0.41 and s_16 = 0.59. User 5 rated movie 3 with 2 and movie 6 with 3.

Predict by taking the weighted average:
r_15 = Σ_{j∈N(1;5)} s_1j · r_j5 / Σ_{j∈N(1;5)} s_1j = (0.41 · 2 + 0.59 · 3) / (0.41 + 0.59) = 2.6
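A minimal sketch that reproduces the worked example. The similarity weights s_13 = 0.41 and s_16 = 0.59 and user 5's ratings of movies 3 and 6 are taken directly from the slides rather than recomputed from the full matrix (which is not reproduced here).

```python
# Item-item prediction for the example above.

# Movie 1's known ratings: user -> rating (users 1..12, unknown entries omitted)
movie1 = {1: 1, 3: 3, 6: 5, 9: 5, 11: 4}

# Step 1: center the row by the movie's mean rating (used for Pearson / centered cosine)
m1 = sum(movie1.values()) / len(movie1)                       # 3.6
centered = [round(movie1.get(u, m1) - m1, 1) for u in range(1, 13)]
print(m1, centered)  # 3.6 [-2.6, 0.0, -0.6, 0.0, 0.0, 1.4, 0.0, 0.0, 1.4, 0.0, 0.4, 0.0]

# Steps 2-3: take the k = 2 most similar movies rated by user 5, then the weighted average
sims = {3: 0.41, 6: 0.59}          # s_13, s_16 from the similarity computation
user5 = {3: 2, 6: 3}               # user 5's ratings of movies 3 and 6
r_15 = sum(sims[j] * user5[j] for j in sims) / sum(sims.values())
print(round(r_15, 1))              # 2.6
```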

Item-Item vs User-User

(Example utility matrix with users Alice, Bob, Carol, David and movies Avatar, LOTR, Matrix, Pirates.)

In practice, it has been observed that item-item often works better than user-user.
Why? Items are simpler; users have multiple tastes.


Pros and cons of collaborative filtering

+ Works for any kind of item: no feature selection needed
– Cold start: need enough users in the system to find a match
– Sparsity: the user/ratings matrix is sparse; it is hard to find users that have rated the same items
– First rater: cannot recommend an item that has not previously been rated (new items, esoteric items)
– Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items

Hybrid Methods

- Implement two or more different recommenders and combine their predictions, perhaps using a linear model (a sketch follows below)
- Add content-based methods to collaborative filtering
  - Item profiles to deal with the new-item problem
  - Demographics to deal with the new-user problem
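A minimal sketch of a hybrid recommender that blends two component predictions with a linear model. The component scores and weights are illustrative placeholders; in practice the weights would be fit, e.g. by regression on held-out ratings.

```python
# Hypothetical component predictions for (user, item) pairs
content_pred = {("Alice", "Matrix"): 4.1}
collab_pred  = {("Alice", "Matrix"): 3.5}

# Linear blend; the weights are illustrative and would normally be learned.
W_CONTENT, W_COLLAB = 0.4, 0.6

def hybrid(user, item):
    return (W_CONTENT * content_pred[(user, item)] +
            W_COLLAB * collab_pred[(user, item)])

print(round(hybrid("Alice", "Matrix"), 2))  # 3.74
```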

Evaluation

(Figure: a users × movies utility matrix with the known ratings. To evaluate, some of the known ratings are withheld and marked "?" — the test data set — and the recommender must predict them.)


Evaluating Predictions

Compare predictions with known ratings:
- Root-mean-square error (RMSE):
  RMSE = √( (1/|T|) · Σ_{(x,i)∈T} (r_xi − r*_xi)² ), where r_xi is the predicted rating, r*_xi is the true rating of x on i, and T is the set of withheld test ratings
- Precision at top 10: % of relevant items among the top 10 recommended
- Rank correlation: Spearman's correlation between the system's and the user's complete rankings

Another approach: 0/1 model
- Coverage: number of items/users for which the system can make predictions
- Precision: accuracy of predictions
- Receiver operating characteristic (ROC): trade-off curve between false positives and false negatives
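A minimal sketch of RMSE over a held-out test set; the ratings and predictions below are made-up placeholders.

```python
import math

# Hypothetical (user, item) -> rating test set and the system's predictions for it
true_ratings = {("Alice", "Matrix"): 4, ("Bob", "LOTR"): 2, ("Carol", "Avatar"): 5}
predicted    = {("Alice", "Matrix"): 3.5, ("Bob", "LOTR"): 2.5, ("Carol", "Avatar"): 4.0}

def rmse(true, pred):
    """RMSE = sqrt(mean of (r_xi - r*_xi)^2) over the withheld test ratings."""
    errs = [(pred[k] - true[k]) ** 2 for k in true]
    return math.sqrt(sum(errs) / len(errs))

print(round(rmse(true_ratings, predicted), 3))  # ~0.707
```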


Tip: Add data

- Leverage all the data available
  - Don't try to reduce data size in an effort to make fancy algorithms work
  - Simple methods on large data do best
- Add more data
  - e.g., add IMDB data on genres
- More data beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.html


Finding similar vectors

- A common problem that comes up in many settings
- Given a large number N of vectors in some high-dimensional space (M dimensions), find pairs of vectors that have high cosine similarity
  - e.g., user profiles, item profiles
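A brute-force sketch for finding high-cosine pairs among a handful of profile vectors; the vectors and the threshold are made-up placeholders. For the large-N, high-dimensional setting described above this O(N²) scan would be too slow, and an approximate method would be needed in practice.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_pairs(vectors, threshold=0.9):
    """Brute-force search over all pairs; returns (name_i, name_j, cosine) above the threshold."""
    return [(i, j, round(cosine(v, w), 3))
            for (i, v), (j, w) in combinations(vectors.items(), 2)
            if cosine(v, w) >= threshold]

profiles = {"u1": [1.0, 0.2, 0.0], "u2": [0.9, 0.3, 0.1], "u3": [0.0, 0.1, 1.0]}
print(similar_pairs(profiles, threshold=0.9))  # only (u1, u2) clears the threshold
```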

Questions?