How to Build a Recommender System


FACULTY OF COMPUTER SCIENCE AND ENGINEERING

RECOMMENDER SYSTEMS
Knowledge Systems

Slides adapted from Jure Leskovec, Stanford C246: Mining Massive Datasets (1/29/2013).

Recommendations

Items: products, web sites, blogs, news items, …
Users reach items in a large catalog either through search or through recommendations.

Example of a recommender system

Customer X buys a Metallica CD, then buys a Megadeth CD.
Customer Y searches for Metallica; the recommender system suggests Megadeth, based on the data collected about customer X.


From scarcity to abundance

- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, …
- The web enables near-zero-cost dissemination of information about products
  - From scarcity to abundance
- More choice necessitates better filters
  - Recommendation engines
  - How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html

The Long Tail

(Long-tail sales-distribution figure.) Source: Chris Anderson (2004)

Physical vs. online retailers

Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!


Recommendation Types

- Editorial and hand-curated: lists of favorites, lists of "essential" items
- Simple aggregates: Top 10, Most Popular, Recent Uploads
- Tailored to individual users: Amazon, Netflix, …


Formal Model

- X = set of customers
- S = set of items
- Utility function u : X × S → R
  - R = set of ratings
  - R is a totally ordered set
  - e.g., 0-5 stars, or a real number in [0, 1]


Utility Matrix

(Example utility matrix: users Alice, Bob, Carol, David × movies King Kong, LOTR, Matrix, Nacho Libre; LOTR = Lord of the Rings. Only a handful of entries are known — utility values such as 1, 0.5, 0.4, 0.3, 0.2 — and the rest are unknown.)
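As a concrete illustration of the formal model, here is a minimal Python sketch of a sparse utility matrix. The user and movie names follow the example above, but the rating values are made-up placeholders, not the values from the slide.

```python
# A sparse utility matrix u: X x S -> R stored as a dict of dicts.
# The ratings below are illustrative placeholders only.
utility = {
    "Alice": {"King Kong": 1.0, "Matrix": 0.2},
    "Bob":   {"King Kong": 0.5, "LOTR": 0.3},
    "Carol": {"Matrix": 1.0, "Nacho Libre": 0.4},
    "David": {"LOTR": 0.2},
}

def get_rating(user, item):
    """Return the known rating, or None if the entry is unknown (the common case)."""
    return utility.get(user, {}).get(item)

print(get_rating("Alice", "Matrix"))       # 0.2
print(get_rating("Alice", "Nacho Libre"))  # None -> to be predicted by the recommender
```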


Key Problems

(1) Gathering "known" ratings for the matrix: how to collect the data in the utility matrix.
(2) Extrapolating unknown ratings from the known ones: mainly interested in high unknown ratings — we are not interested in knowing what you don't like, but what you like.
(3) Evaluating extrapolation methods: how to measure the success/performance of recommendation methods.


Gathering Ratings

- Explicit
  - Ask people to rate items
  - Doesn't work well in practice: people can't be bothered
- Implicit
  - Learn ratings from user actions
  - e.g., a purchase implies a high rating
  - What about low ratings?


Extrapolating Utilities

- Key problem: the utility matrix U is sparse
  - Most people have not rated most items
  - Cold start:
    - New items have no ratings
    - New users have no history
- Three approaches to recommender systems:
  1) Content-based
  2) Collaborative
  3) Hybrid (latent-factor based)

Content-based Recommender Systems


Content-based recommendations

- Main idea: recommend to customer x items similar to previous items rated highly by x.

Examples:
- Movie recommendations: recommend movies with the same actor(s), director, genre, …
- Websites, blogs, news: recommend other sites with "similar" content


Plan of action

(Diagram.) From the items a user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles; recommend the best-matching items.


Item Profiles

- For each item, create an item profile
- A profile is a set (vector) of features
  - movies: author, title, actor, director, …
  - text: set of "important" words in the document
- How to pick important features (words)?
  - The usual heuristic from text mining is TF-IDF (Term Frequency × Inverse Document Frequency)
    - Term … feature
    - Document … item


TF-IDF

- f_ij = frequency of term (feature) i in document (item) j
- n_i = number of documents that mention term i
- N = total number of documents
- TF-IDF score: w_ij = TF_ij × IDF_i
- Doc profile = set of words with the highest TF-IDF scores, together with their scores
- Note: TF is normalized to discount for "longer" documents
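A minimal sketch of the TF-IDF heuristic. It assumes the common normalizations TF_ij = f_ij / max_k f_kj (which discounts longer documents) and IDF_i = log2(N / n_i); the slides do not pin down the exact variants, so treat these as one reasonable choice.

```python
import math
from collections import Counter

def tfidf_profiles(docs, top_k=3):
    """Build a doc profile (top TF-IDF words with scores) for each document.

    Assumes TF_ij = f_ij / max_k f_kj and IDF_i = log2(N / n_i); other
    normalizations are also common.
    """
    N = len(docs)
    # n_i: number of documents mentioning term i
    n = Counter(term for doc in docs for term in set(doc.split()))
    profiles = []
    for doc in docs:
        f = Counter(doc.split())
        max_f = max(f.values())
        scores = {t: (cnt / max_f) * math.log2(N / n[t]) for t, cnt in f.items()}
        top = sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
        profiles.append(dict(top))
    return profiles

docs = ["metallica metal band concert",
        "megadeth metal band tour",
        "cooking recipe pasta dinner"]
for p in tfidf_profiles(docs):
    print(p)
```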


User profiles and prediction

User profile possibilities:
- Weighted average of the rated item profiles
- Variation: weight by the difference from the average rating for the item, …

Prediction heuristic: given user profile x and item profile i, estimate
u(x, i) = cos(x, i) = (x · i) / (||x|| · ||i||)
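A minimal sketch of the cosine prediction heuristic u(x, i) = cos(x, i). The feature space and item profiles below are made-up placeholders; the user profile is built as a plain average of the profiles of liked items, one of the possibilities listed above.

```python
import math

def cosine(x, i):
    """u(x, i) = cos(x, i) = (x . i) / (|x| * |i|), for dense feature vectors."""
    dot = sum(a * b for a, b in zip(x, i))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in i))
    return dot / norm if norm else 0.0

# Hypothetical item profiles over features [action, drama, sci-fi]
item_profiles = {"Matrix": [1.0, 0.0, 1.0], "LOTR": [1.0, 1.0, 0.0]}
liked = ["Matrix"]

# User profile: average of the profiles of items the user rated highly
user_profile = [sum(item_profiles[m][k] for m in liked) / len(liked) for k in range(3)]

for movie, profile in item_profiles.items():
    print(movie, round(cosine(user_profile, profile), 2))
```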


Model-based approaches

- For each user, learn a classifier that classifies items into rating classes
  - "liked by user" vs. "not liked by user"
  - e.g., Bayesian, regression, SVM
- Apply the classifier to each item to find recommendation candidates
- Problem: scalability
  - Not investigated further in this class

Advantages of the content-based approach (pros)

+ No need for data on other users: no cold-start or sparsity problems
+ Able to recommend to users with unique tastes
+ Able to recommend new and unpopular items: no first-rater problem
+ Able to provide explanations: can explain recommended items by listing the content features that caused an item to be recommended

Limitations of the content-based approach (cons)

– Finding the appropriate features is hard (e.g., images, movies, music)
– Overspecialization
  - Never recommends items outside the user's content profile
  - People might have multiple interests
  - Unable to exploit the quality judgments of other users
– Recommendations for new users: how to build a user profile?

Collaborative Filtering


Collaborative Filtering

- Consider user x
- Find a set N of other users whose ratings are "similar" to x's ratings
- Estimate x's ratings based on the ratings of the users in N


Similar users

Let r_x be the vector of user x's ratings.

- Jaccard similarity measure
  - Problem: ignores the value of the rating
- Cosine similarity measure
  - sim(x, y) = cos(r_x, r_y) = (r_x · r_y) / (||r_x|| · ||r_y||)
  - Problem: treats missing ratings as "negative"
- Pearson correlation coefficient
  - S_xy = items rated by both users x and y
  - r̄_x, r̄_y … average ratings of x and y
  - sim(x, y) = Σ_{s∈S_xy} (r_xs − r̄_x)(r_ys − r̄_y) / ( √(Σ_{s∈S_xy} (r_xs − r̄_x)²) · √(Σ_{s∈S_xy} (r_ys − r̄_y)²) )

Example:
r_x = [*, _, _, *, ***],  r_y = [*, _, **, **, _]
- r_x, r_y as sets: r_x = {1, 4, 5}, r_y = {1, 3, 4}
- r_x, r_y as points: r_x = [1, 0, 0, 1, 3], r_y = [1, 0, 2, 2, 0]
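A minimal sketch of the three similarity measures, applied to the r_x, r_y example above (items are keyed 1-5 and missing ratings are simply omitted). The Pearson variant here centers each user by the mean over all of their ratings; centering over only the co-rated items S_xy is another common choice.

```python
import math

rx = {1: 1, 4: 1, 5: 3}   # user x: item -> rating, from the example above
ry = {1: 1, 3: 2, 4: 2}   # user y

def jaccard(rx, ry):
    """Similarity of the *sets* of rated items; ignores rating values."""
    a, b = set(rx), set(ry)
    return len(a & b) / len(a | b)

def cosine_sim(rx, ry):
    """Cosine over the full rating vectors; missing ratings count as 0 ("negative")."""
    items = set(rx) | set(ry)
    dot = sum(rx.get(i, 0) * ry.get(i, 0) for i in items)
    nx = math.sqrt(sum(v * v for v in rx.values()))
    ny = math.sqrt(sum(v * v for v in ry.values()))
    return dot / (nx * ny)

def pearson(rx, ry):
    """Pearson correlation over the co-rated items S_xy, centered by each user's mean."""
    sxy = set(rx) & set(ry)
    mx = sum(rx.values()) / len(rx)
    my = sum(ry.values()) / len(ry)
    num = sum((rx[s] - mx) * (ry[s] - my) for s in sxy)
    den = math.sqrt(sum((rx[s] - mx) ** 2 for s in sxy)) * \
          math.sqrt(sum((ry[s] - my) ** 2 for s in sxy))
    return num / den if den else 0.0

print(round(jaccard(rx, ry), 2), round(cosine_sim(rx, ry), 2), round(pearson(rx, ry), 2))
```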



Rating predictions

Let r_x be the vector of user x's ratings, and let N be the set of the k users most similar to x who have rated item i.

Prediction for item i of user x (shorthand: s_xy = sim(x, y)):
- Simple average: r_xi = (1/k) · Σ_{y∈N} r_yi   (tends to average out the ratings)
- Similarity-weighted average: r_xi = Σ_{y∈N} s_xy · r_yi / Σ_{y∈N} s_xy

Other options? Many other tricks are possible…
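A minimal sketch of the similarity-weighted prediction r_xi = Σ s_xy · r_yi / Σ s_xy. The users, ratings, and similarity values are made-up placeholders; in practice s_xy would come from one of the similarity measures above.

```python
# Hypothetical toy data: user -> {item: rating}
ratings = {
    "x":  {"A": 5, "B": 3},
    "y1": {"A": 5, "B": 4, "C": 4},
    "y2": {"A": 1, "B": 1, "C": 2},
}
# Precomputed similarities s_xy = sim(x, y); the values here are illustrative.
s = {("x", "y1"): 0.9, ("x", "y2"): 0.1}

def predict(x, i, k=2):
    """r_xi = sum_{y in N} s_xy * r_yi / sum_{y in N} s_xy over the k most similar raters of i."""
    raters = [y for y in ratings if y != x and i in ratings[y]]
    neighbors = sorted(raters, key=lambda y: s[(x, y)], reverse=True)[:k]
    num = sum(s[(x, y)] * ratings[y][i] for y in neighbors)
    den = sum(s[(x, y)] for y in neighbors)
    return num / den if den else None

print(predict("x", "C"))  # 3.8 -> weighted toward y1's rating of C
```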


Complexity

- The expensive step is finding the k most similar customers: O(|U|)
- Too expensive to do at runtime: need to pre-compute
- Naïve precomputation takes time O(N|U|)
  - A simple trick gives some speedup
- Can use clustering or partitioning as alternatives, but quality degrades


Item-Item Collaborative Filtering

- So far: user-user collaborative filtering
- Another view: item-item
  - For item i, find other similar items
  - Estimate the rating for item i based on the ratings for similar items
  - Can use the same similarity metrics and prediction functions as in the user-user model

r_xi = Σ_{j∈N(i;x)} s_ij · r_xj / Σ_{j∈N(i;x)} s_ij

where
- s_ij … similarity of items i and j
- r_xj … rating of user x on item j
- N(i; x) … set of items rated by x that are similar to i

Worked example: item-item collaborative filtering with k = 2

The utility matrix has 6 movies (rows) and 12 users (columns); ratings are integers between 1 and 5, and most entries are unknown. Movie 1 has been rated by users 1, 3, 6, 9 and 11 (ratings 1, 3, 5, 5 and 4); user 5's rating of movie 1 is unknown. Goal: estimate the rating of movie 1 by user 5.

Neighbor selection: identify movies similar to movie 1 that user 5 has rated. Pearson correlation is used as the similarity:
1) Subtract the mean rating m_i from each movie's row:
   m_1 = (1 + 3 + 5 + 5 + 4) / 5 = 3.6
   centered row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between the centered rows:
   sim(1, m) for m = 1…6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Only the k = 2 most similar movies rated by user 5 are taken into account: movies 3 and 6, with similarity weights s_13 = 0.41 and s_16 = 0.59. User 5 rated movie 3 with 2 and movie 6 with 3.

Predict by taking the weighted average:
r_15 = Σ_{j∈N(1;5)} s_1j · r_j5 / Σ_{j∈N(1;5)} s_1j = (0.41 · 2 + 0.59 · 3) / (0.41 + 0.59) = 2.6
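A minimal sketch that reproduces the worked example. The similarity weights s_13 = 0.41 and s_16 = 0.59 and user 5's ratings of movies 3 and 6 are taken directly from the slides rather than recomputed from the full matrix (which is not reproduced here).

```python
# Item-item prediction for the example above.

# Movie 1's known ratings: user -> rating (users 1..12, unknown entries omitted)
movie1 = {1: 1, 3: 3, 6: 5, 9: 5, 11: 4}

# Step 1: center the row by the movie's mean rating (used for Pearson / centered cosine)
m1 = sum(movie1.values()) / len(movie1)                       # 3.6
centered = [round(movie1.get(u, m1) - m1, 1) for u in range(1, 13)]
print(m1, centered)  # 3.6 [-2.6, 0.0, -0.6, 0.0, 0.0, 1.4, 0.0, 0.0, 1.4, 0.0, 0.4, 0.0]

# Steps 2-3: take the k = 2 most similar movies rated by user 5, then the weighted average
sims = {3: 0.41, 6: 0.59}          # s_13, s_16 from the similarity computation
user5 = {3: 2, 6: 3}               # user 5's ratings of movies 3 and 6
r_15 = sum(sims[j] * user5[j] for j in sims) / sum(sims.values())
print(round(r_15, 1))              # 2.6
```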

Item-Item vs User-User

(Example utility matrix with users Alice, Bob, Carol, David and movies Avatar, LOTR, Matrix, Pirates.)

In practice, it has been observed that item-item often works better than user-user.
Why? Items are simpler; users have multiple tastes.


Pros and cons of collaborative filtering

+ Works for any kind of item: no feature selection needed
– Cold start: need enough users in the system to find a match
– Sparsity: the user/ratings matrix is sparse; it is hard to find users that have rated the same items
– First rater: cannot recommend an item that has not previously been rated (new items, esoteric items)
– Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items

Hybrid Methods

- Implement two or more different recommenders and combine their predictions, perhaps using a linear model (a sketch follows below)
- Add content-based methods to collaborative filtering
  - Item profiles to deal with the new-item problem
  - Demographics to deal with the new-user problem
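A minimal sketch of a hybrid recommender that blends two component predictions with a linear model. The component scores and weights are illustrative placeholders; in practice the weights would be fit, e.g. by regression on held-out ratings.

```python
# Hypothetical component predictions for (user, item) pairs
content_pred = {("Alice", "Matrix"): 4.1}
collab_pred  = {("Alice", "Matrix"): 3.5}

# Linear blend; the weights are illustrative and would normally be learned.
W_CONTENT, W_COLLAB = 0.4, 0.6

def hybrid(user, item):
    return (W_CONTENT * content_pred[(user, item)] +
            W_COLLAB * collab_pred[(user, item)])

print(round(hybrid("Alice", "Matrix"), 2))  # 3.74
```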

Evaluation

(Figure: a users × movies utility matrix with the known ratings. To evaluate, some of the known ratings are withheld and marked "?" — the test data set — and the recommender must predict them.)


Evaluating Predictions

Compare predictions with known ratings:
- Root-mean-square error (RMSE):
  RMSE = √( (1/|T|) · Σ_{(x,i)∈T} (r_xi − r*_xi)² ), where r_xi is the predicted rating, r*_xi is the true rating of x on i, and T is the set of withheld test ratings
- Precision at top 10: % of relevant items among the top 10 recommended
- Rank correlation: Spearman's correlation between the system's and the user's complete rankings

Another approach: 0/1 model
- Coverage: number of items/users for which the system can make predictions
- Precision: accuracy of predictions
- Receiver operating characteristic (ROC): trade-off curve between false positives and false negatives
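A minimal sketch of RMSE over a held-out test set; the ratings and predictions below are made-up placeholders.

```python
import math

# Hypothetical (user, item) -> rating test set and the system's predictions for it
true_ratings = {("Alice", "Matrix"): 4, ("Bob", "LOTR"): 2, ("Carol", "Avatar"): 5}
predicted    = {("Alice", "Matrix"): 3.5, ("Bob", "LOTR"): 2.5, ("Carol", "Avatar"): 4.0}

def rmse(true, pred):
    """RMSE = sqrt(mean of (r_xi - r*_xi)^2) over the withheld test ratings."""
    errs = [(pred[k] - true[k]) ** 2 for k in true]
    return math.sqrt(sum(errs) / len(errs))

print(round(rmse(true_ratings, predicted), 3))  # ~0.707
```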


Tip: Add data

- Leverage all the data available
  - Don't try to reduce data size in an effort to make fancy algorithms work
  - Simple methods on large data do best
- Add more data
  - e.g., add IMDB data on genres
- More data beats better algorithms: http://anand.typepad.com/datawocky/2008/03/more-data-usual.html


Finding similar vectors

- A common problem that comes up in many settings
- Given a large number N of vectors in some high-dimensional space (M dimensions), find pairs of vectors that have high cosine similarity
  - e.g., user profiles, item profiles
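A brute-force sketch for finding high-cosine pairs among a handful of profile vectors; the vectors and the threshold are made-up placeholders. For the large-N, high-dimensional setting described above this O(N²) scan would be too slow, and an approximate method would be needed in practice.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_pairs(vectors, threshold=0.9):
    """Brute-force search over all pairs; returns (name_i, name_j, cosine) above the threshold."""
    return [(i, j, round(cosine(v, w), 3))
            for (i, v), (j, w) in combinations(vectors.items(), 2)
            if cosine(v, w) >= threshold]

profiles = {"u1": [1.0, 0.2, 0.0], "u2": [0.9, 0.3, 0.1], "u3": [0.0, 0.1, 1.0]}
print(similar_pairs(profiles, threshold=0.9))  # only (u1, u2) clears the threshold
```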

Questions?