
Recommender System wbia 黄连恩 hle@net.pku.edu.cn School of Information Engineering, Peking University 11/25/2014


Page 1: Recommender System wbia 黄连恩 hle@net.pku.edu.cn 北京大学信息工程学院 11/25/2014

Recommender System

http://net.pku.edu.cn/~wbia

黄连恩

hle@net.pku.edu.cn

School of Information Engineering, Peking University

11/25/2014

Outline

Today
  What: Recommender System
  How: Collaborative Filtering (CF) Algorithm
    User-based
    Item-based
    Model-based
  Evaluation on recommender system


What is Recommender System?

The Problem

Classification

Retrieval

Are there more effective approaches?


Recommendation


This title is a textbook-style exposition on the topic, with its information organized very clearly into topics such as compression, indexing, and so forth. In addition to diagrams and example text transformations, the authors use "pseudo-code" to present algorithms in a language-independent manner wherever possible. They also supplement the reading with mg--their own implementation of the techniques. The mg C language source code is freely available on the Web.


Personalized Recommendation


Everyday Examples of Recommender Systems…

Bestseller lists
Top 40 music lists
The “recent returns” shelf at the library
Many weblogs
“Read any good books lately?”
....

Common insight: personal tastes are correlated:
• If Mary and Bob both like X and Mary likes Y then Bob is more likely to like Y
• especially (perhaps) if Bob knows Mary

Rec System: Applications

Ecommerce
  Product recommendations - Amazon
Corporate Intranets
  Recommendation, finding domain experts, …
Digital Libraries
  Finding pages/books people will like
Medical Applications
  Matching patients to doctors, clinical trials, …
Customer Relationship Management
  Matching customer problems to internal experts

Recommender Systems

Given a set of users and a set of items
  Items can be documents, products, other users, …
Recommend items to a user, based on:
  Attribute information about users and items
    age, genre, price, …
  The past behavior of this user and of other users
    Who has viewed/bought/liked what?
In order to help people
  make decisions
  maintain awareness

Recommender systems are software applications that aim to support users in their decision-making while interacting with large information spaces.

Recommender systems help overcome the information overload problem by exposing users to the most interesting items, and by offering novelty, surprise, and relevance.


The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.


Collaborative Filtering Algorithm


Ad Hoc Retrieval and Filtering

Ad hoc retrieval (the document collection stays fixed)

[Figure: queries Q1 to Q5 issued against a “fixed size” collection]

Ad Hoc Retrieval and Filtering

Filtering (the user's information need stays fixed)

[Figure: a documents stream is matched against the profiles of User 1 and User 2, producing docs for User 1 and docs filtered for User 2]

Inputs - more detail

Explicit role/domain/content info:
  content/attributes of documents
  Document taxonomies
  Role in an enterprise
  Interest profiles
Past transactions/behavior info from users:
  which docs viewed, browsing history
  search(es) issued
  which products purchased
  pages bookmarked
  explicit ratings (movies, books, …)

Large space

Extremely sparse

The Recommendation Space

[Figure: Users and Items, connected by three kinds of links]
User-User links: derived from similar attributes, explicit connections
Item-Item links: derived from similar attributes, similar content, explicit cross references
User-Item links: observed preferences (ratings, purchases, page views, laundry lists, play lists)

Definitions

A recommender system is a system that provides users with recommendations / predictions / opinions on items
  Rule-based systems use manual rules to do this
  An item similarity/clustering system uses item links
  A classic collaborative filtering system uses links between users and items
  Commonly one has hybrid systems, which use all three kinds of links above

Link types

User attributes-based Recommendation
  Male, 18-35: Recommend The Matrix
Item attributes-based Content Similarity
  You liked The Matrix: recommend The Matrix Reloaded
Collaborative Filtering
  People with interests like yours also liked Forrest Gump

Example - behavior only

[Figure: users U1, U2 linked to the docs they viewed, d1, d2, d3]

U1 viewed d1, d2, d3.

U2 views d1, d2.

Recommend d3 to U2?

Expert finding - simple example

Recommend U1 to U2 as someone to talk to?

[Figure: users U1, U2 linked to the docs d1, d2, d3 they viewed]

Simplest Algorithm: Neighbors Voting

U viewed d1, d2, d5.
See who else viewed d1, d2 or d5.
Recommend to U: the doc that is most “popular” among those users.

[Figure: users U, V, W linked to docs d1, d2, d5]
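The voting idea can be sketched in a few lines of Python. This is only an illustration: the function name `neighbors_voting`, the `views` dictionary, and the viewing histories of V, W and X are assumptions; only U's history (d1, d2, d5) comes from the slide.

```python
from collections import Counter

def neighbors_voting(views, target, top_n=1):
    """Recommend the docs most often viewed by users who share a doc with target."""
    seen = views[target]
    votes = Counter()
    for user, docs in views.items():
        if user == target or not (docs & seen):
            continue  # skip the target and users with no doc in common
        for doc in docs - seen:
            votes[doc] += 1  # every neighbor counts equally
    return [doc for doc, _ in votes.most_common(top_n)]

# Hypothetical viewing histories (only U's history comes from the slide).
views = {
    "U": {"d1", "d2", "d5"},
    "V": {"d1", "d3"},
    "W": {"d2", "d3", "d4"},
    "X": {"d6"},
}
print(neighbors_voting(views, "U"))  # ['d3']
```

Here d3 wins because both neighbors V and W viewed it, while X shares nothing with U and is ignored.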

Simple algorithm - shortcoming

It treats all other users equally.

In fact, historical behavior data shows that users differ in how similar they are to U.

[Figure: users U, V, W linked to docs d1, d2, d5]

How can this be improved? How do we distinguish how important each user is to U? User-based Nearest Neighbors

Matrix View

Aij    Airplane  Matrix  Room with a View  ...  Hidalgo
Joe    1         1       1                 ...  1
Carol  1         0       1                 ...  0
...    ...       ...     ...               ...  ...
Kumar  1         1       0                 ...  1

(rows: users, columns: items)

Users-Items Matrix: Aij = 1 if user i viewed item j, = 0 otherwise.
Number of items viewed in common by each pair of users = ? AAt

Voting Algorithm

ri: the i-th row vector of AAt
  jth entry is the # of items viewed by both user i and user j.
ri A is a vector
  kth entry gives a weighted vote count to item k
Recommend the items with the highest vote count.

[Figure: ri (a user × user row) multiplied by A (user × item)]
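The matrix form of voting can be checked on the small matrix from the Matrix View slide. The sketch below uses only the four visible columns (the elided "..." columns are dropped), and the helper names `transpose` and `matmul` are assumptions. Note that ri also contains user i's overlap with themselves, and that filtering out already-viewed items is an added step not stated on the slide.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

# Rows: Joe, Carol, Kumar. Columns: Airplane, Matrix, Room with a View, Hidalgo.
A = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]

overlap = matmul(A, transpose(A))   # overlap[i][j] = # items viewed by both i and j
i = 1                               # Carol
votes = matmul([overlap[i]], A)[0]  # weighted vote count per item

# Rank only the items Carol has not viewed yet (an added step).
candidates = sorted((k for k in range(len(votes)) if A[i][k] == 0),
                    key=lambda k: -votes[k])
print(overlap[i], votes, candidates)  # [2, 2, 1] [5, 3, 4, 3] [1, 3]
```

For Carol the two unseen items (Matrix and Hidalgo) tie with 3 weighted votes each.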

Add Rating to Algorithm

User i gives a real-valued rating Vik for item k
Each user i has a ratings vector vi
  Sparse, with many empty values
For each pair of users i, j, compute a similarity measure of how much the pair agrees: wij

Predict user i’s utility for item k

As in the voting algorithm, WiV is a vector
  Sum (over user i’s nearest neighbors j) ∑ wij Vjk
Recommend item k to user i according to this value.

Vij    Airplane  Matrix  Room with a View  ...  Hidalgo
Joe    9         7       2                 ...  7
Carol  8         ?       9                 ...  ?
...    ...       ...     ...               ...  ...
Kumar  9         3       ?                 ...  6

Similarity Measure

COS similarity(From IR)
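For reference, a standard form of cosine similarity from IR, written with the ratings Vik and the weight wij used on the earlier slides. Treating a missing rating as 0 in the sums is an assumption of this sketch:

```latex
w_{ij} = \cos(v_i, v_j)
       = \frac{\sum_k V_{ik} V_{jk}}{\sqrt{\sum_k V_{ik}^2}\,\sqrt{\sum_k V_{jk}^2}}
```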

Real data problems

Each user has their own rating bias

Vij    Airplane  Matrix  Room with a View  ...  Hidalgo
Joe    50        10      40                ...  40
Carol  100       ?       80                ...  ?
...    ...       ...     ...               ...  ...
Kumar  95        85      ?                 ...  75

Similarity Measure

Correlation Between two random variables

Mean

Standard deviation

Pearson's correlation, indicating the degree of linear dependence between the variables

Correlation Between two random variables
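A minimal Python sketch of Pearson correlation between two users, using the visible ratings from the "Real data problems" table. Computing the means over co-rated items only is one common variant and an assumption here; the function name `pearson` is also illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two users over the items both have rated."""
    common = [k for k in x if k in y]
    if len(common) < 2:
        return 0.0  # not enough overlap to measure agreement
    mx = sum(x[k] for k in common) / len(common)
    my = sum(y[k] for k in common) / len(common)
    cov = sum((x[k] - mx) * (y[k] - my) for k in common)
    sx = sqrt(sum((x[k] - mx) ** 2 for k in common))
    sy = sqrt(sum((y[k] - my) ** 2 for k in common))
    return cov / (sx * sy) if sx and sy else 0.0

joe = {"Airplane": 50, "Matrix": 10, "Room with a View": 40, "Hidalgo": 40}
carol = {"Airplane": 100, "Room with a View": 80}
kumar = {"Airplane": 95, "Matrix": 85, "Hidalgo": 75}

print(pearson(joe, carol), pearson(joe, kumar))
```

Joe and Carol correlate perfectly even though Carol's ratings are twice as high, which is how the measure removes rating bias. With only two co-rated items, however, the correlation can only be +1 or -1, so such a score deserves little trust.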

Discussion on Pearson Correlation

Pearson correlation does not consider whether two users have co-rated only a few items (on which they may agree by chance) or whether there are many items on which they agree
  Significance weighting
An agreement by two users on a more controversial item has more “value” than an agreement on a generally liked item.
  Inverse user frequency
  Variance weighting factor

Neighborhood Selection

Define a specific minimum threshold of user similarity
Limit the size to a fixed number k
  20 to 50 neighbors seems reasonable

Voting Algorithm - implementation issues

Computational complexity?
  User similarity: w(a,i)
  Matrix multiply
  K nearest neighbors
Hold all rating data in memory
  Memory-based algorithm
  Scalability problem
Does pre-computation of the w matrix work?

User-based Nearest Neighbor Algorithm

v_{i,j} = vote of user i on item j
I_i = items for which user i has voted
Mean vote for i is
  \bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}
User u,v similarity is
  avoids overestimating users who happen to have rated a few items identically

User Nearest Neighbor Algorithm

Select the set V of nearest neighbors of user u, and compute u's vote on item j as follows
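A commonly used form of this prediction, following Breese et al. (UAI 1998, listed under Resources), is p(u,j) = mean(u) + κ ∑ over v in V of w(u,v) (v(v,j) − mean(v)), where κ normalizes by the sum of absolute weights. The sketch below assumes that form. The similarity weights passed in are hypothetical, and the ratings are the visible cells of the earlier Vij table.

```python
def mean_vote(votes):
    return sum(votes.values()) / len(votes)

def predict(ratings, sims, u, j):
    """Predict u's vote on item j from neighbors' mean-centered votes."""
    base = mean_vote(ratings[u])
    num = norm = 0.0
    for v, w in sims.items():
        if j in ratings[v]:
            num += w * (ratings[v][j] - mean_vote(ratings[v]))
            norm += abs(w)
    return base if norm == 0 else base + num / norm

ratings = {
    "Joe": {"Airplane": 9, "Matrix": 7, "Room with a View": 2, "Hidalgo": 7},
    "Carol": {"Airplane": 8, "Room with a View": 9},
    "Kumar": {"Airplane": 9, "Matrix": 3, "Hidalgo": 6},
}
sims = {"Joe": 0.5, "Kumar": 1.0}  # hypothetical similarity weights w(Carol, v)

print(predict(ratings, sims, "Carol", "Matrix"))  # 6.75
```

Carol's mean is 8.5. Joe rated Matrix 0.75 above his own mean and Kumar 3 below his, so the weighted offset is (0.5 × 0.75 − 1.0 × 3) / 1.5 = −1.75.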

Item-based vs. User-based

Amazon Online Shop 2003
  29 million users
  Millions of catalog items
  Prediction in real time is infeasible
Item-based vs. User-based
  Pre-computed item similarities are much more stable than pre-computed user similarities.

Item-based Algorithm

Let U be the set of users that rated both items a and b.

Predict the rating of user u for a product p

Also limited to k nearest neighbors
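One common instantiation of the item-based scheme uses adjusted cosine similarity over the users in U (each rating centered on that user's mean) and predicts with a similarity-weighted average of the user's own ratings. Whether the slide used exactly these formulas is an assumption. All names and the toy ratings below are hypothetical.

```python
from math import sqrt

def user_mean(ratings, u):
    return sum(ratings[u].values()) / len(ratings[u])

def adjusted_cosine(ratings, a, b):
    """Similarity of items a and b over the users U who rated both."""
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    num = den_a = den_b = 0.0
    for u in users:
        m = user_mean(ratings, u)
        da, db = ratings[u][a] - m, ratings[u][b] - m
        num += da * db
        den_a += da * da
        den_b += db * db
    return num / (sqrt(den_a) * sqrt(den_b)) if den_a and den_b else 0.0

def predict_item_based(ratings, u, p, k=2):
    """Weighted average of u's ratings on the k items most similar to p."""
    sims = [(adjusted_cosine(ratings, i, p), r)
            for i, r in ratings[u].items() if i != p]
    top = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    norm = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / norm if norm else None

ratings = {  # hypothetical toy data
    "u1": {"a": 5, "b": 1, "c": 4, "d": 2},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 1},
    "u3": {"a": 2, "b": 4},
}
print(adjusted_cosine(ratings, "a", "c"), predict_item_based(ratings, "u3", "c"))
```

Items a and c have similarity 0.8, while b and c have −0.8 and are ignored, so the prediction for u3 on c equals u3's rating of a, which is 2.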

Model-based Algorithm

Item-based algorithm is still memory-based
  Original rating database is held in memory and used directly for generating the recommendations
Model-based
  Only precomputed or “learned” model is required to make predictions at runtime
  E.g. Matrix factorization/latent factor models

Matrix factorization

LSI/SVD

Dimensionality reduction
Noise removal
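As a rough illustration of a latent factor model, the sketch below learns user and item factor vectors by stochastic gradient descent on the observed ratings only. This is a swapped-in technique, often used instead of a full SVD when the rating matrix has missing entries. It is not the LSI/SVD computation itself. The function names, hyperparameters and index encoding are assumptions; the ratings are the visible cells of the earlier Vij table.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=500, lr=0.01, reg=0.02, seed=0):
    """Learn P (users x k) and Q (items x k) so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step with L2 penalty
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# (user, item, rating): users Joe=0, Carol=1, Kumar=2;
# items Airplane=0, Matrix=1, Room with a View=2, Hidalgo=3.
ratings = [(0, 0, 9), (0, 1, 7), (0, 2, 2), (0, 3, 7),
           (1, 0, 8), (1, 2, 9),
           (2, 0, 9), (2, 1, 3), (2, 3, 6)]

P0, Q0 = factorize(ratings, 3, 4, steps=0)  # untrained factors, same seed
P, Q = factorize(ratings, 3, 4)
print(rmse(ratings, P0, Q0), rmse(ratings, P, Q))
```

At runtime only P and Q are needed: a missing cell such as Carol's rating of Matrix is predicted as the dot product of P[1] and Q[1].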

Challenges of Nearest-Neighbor CF

What is “the most optimal weight calculation” to use?
  Requires fine tuning of weighting algorithm for the particular data set
What do we do when the target user has not voted enough to provide a reliable set of nearest-neighbors?
  One approach: use default votes (popular items) to populate matrix on items neither the target user nor the nearest-neighbor have voted on
  A different approach: model-based prediction using Dirichlet priors to smooth the votes
  Other factors include relative vote counts for all items between users, thresholding, clustering (see Sarwar, 2000)

Summary of Advantages of Pure CF

No expensive and error-prone user attributes or item attributes
Incorporates quality and taste
  Want not just things that are similar, but things that are similar and good
Works on any rate-able item
  One model applicable to many content domains
Users understand it
  It’s rather like asking your friends’ opinions

Evaluation


Netflix Prize

Netflix: on-line DVD-rental company
  a collection of 100,000 titles and over 10 million subscribers
  They have over 55 million discs and ship 1.9 million a day, on average
A training data set of over 100 million ratings that over 480,000 users gave to nearly 18,000 movies
Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE)

Netflix Prize

Prize of $1,000,000
A trivial algorithm got RMSE of 1.0540
Netflix's own system, Cinematch, got RMSE of 0.9514 on the quiz data, a 9.6% improvement
To WIN
  10% over Cinematch on the test set
  a progress prize of $50,000 is granted every year for the best result so far
By June 2007, over 20,000 teams from over 150 countries had registered for the competition.
On June 26, 2009 the team "BellKor's Pragmatic Chaos", a merger of teams "Bellkor in BigChaos" and "Pragmatic Theory", achieved a 10.05% improvement over Cinematch (an RMSE of 0.8558).

Measuring collaborative filtering

How good are the predictions?
How much of previous opinion do we need?
How do we motivate people to offer their opinions?

Measuring recommendations

Typically, machine learning methodology
Get a dataset of opinions; mask “half” the opinions
Train system with the other half, then validate on masked opinions
  Studies with varying fractions ≠ half
Compare various algorithms (correlation metrics)

[Figure: the <User, Item, Grade> tuples are split into two sets, one for training and one masked for validation]

Common Prediction Accuracy Metric

Mean absolute error (MAE)
  E = \frac{1}{N} \sum_{i=1}^{N} |p_i - r_i|

Root mean square error (RMSE)
  E = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (p_i - r_i)^2}
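Both metrics are easy to compute once predictions p_i are paired with the true ratings r_i from the masked set. A minimal sketch, where the function names and the two (prediction, rating) pairs are illustrative assumptions:

```python
from math import sqrt

def mae(pairs):
    """Mean absolute error over (prediction, rating) pairs."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

def rmse(pairs):
    """Root mean square error over (prediction, rating) pairs."""
    return sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))

pairs = [(96, 100), (70, 80)]  # hypothetical (p_i, r_i) values
print(mae(pairs), rmse(pairs))
```

The absolute errors are 4 and 10, so MAE is 7, while RMSE is the square root of 58 (about 7.6), since squaring penalizes the larger error more.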

McLaughlin & Herlocker 2004

Argues that current well-known algorithms give poor user experience
Nearest neighbor algorithms are the most frequently cited and the most widely implemented CF algorithms, and are consistently rated the top performing algorithms in a variety of publications
But many of their top recommendations are terrible
These algorithms perform poorly where it matters most in user recommendations

Characteristics of MAE

Assumes errors at all levels in the ranking have equal weight
Works well for measuring how accurately the algorithm predicts the rating of a randomly selected item.
Seems not appropriate for “Find Good Items” task
Limitations of the MAE metric have concealed the flaws of previous algorithms
  it looks at all predictions, not just top predictions
Precision?

Precision of top k

Concealed because past evaluation was mainly on offline datasets, not real users
  Many unrated items exist, but do not take part in the evaluation

test-data:   100  ?   80  ...  ?
prediction:  96   97  70  ...  95

Items marked ? appear in the recommendation list but are not counted in Precision. What’s this?

Improve the Precision Measure

Precision of top k has wrongly been done on top k rated movies.
  Instead, treat not-rated as disliked (underestimate)
  Captures that people pre-filter movies
In Precision, non-rated items should be counted as non-relevant
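The difference between the two ways of computing precision at top k can be seen on the example row from the "Precision of top k" slide. In this sketch the item names A to D, the relevance threshold of 80 and both function names are assumptions.

```python
def precision_rated_only(pred, truth, k, threshold):
    """Rank only the items that have a test rating (the criticized practice)."""
    ranked = sorted((i for i in pred if i in truth), key=lambda i: -pred[i])[:k]
    return sum(truth[i] >= threshold for i in ranked) / len(ranked)

def precision_modified(pred, truth, k, threshold):
    """Rank all predicted items; an item without a rating counts as non-relevant."""
    ranked = sorted(pred, key=lambda i: -pred[i])[:k]
    return sum(truth.get(i, float("-inf")) >= threshold for i in ranked) / len(ranked)

pred = {"A": 96, "B": 97, "C": 70, "D": 95}  # predictions for four items
truth = {"A": 100, "C": 80}                  # B and D are unrated ("?")

print(precision_rated_only(pred, truth, k=2, threshold=80))  # 1.0
print(precision_modified(pred, truth, k=2, threshold=80))    # 0.5
```

The real top 2 list is B, A. Ignoring the unrated item B makes the recommender look perfect, while the modified measure charges it for recommending something the user never chose to rate.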

Novelty versus Trust

There is a trade-off
High confidence recommendations
  Recommendations are obvious
  Low utility for user
  However, they build trust
    Users like to see some recommendations that they know are right
Recommendations with high prediction yet lower confidence
  Higher variability of error
  Higher novelty → higher utility for user
McLaughlin and Herlocker argue that “very obscure” recommendations are often bad (e.g., hard to obtain)

Results from SIGIR 2004 Paper

Much better predicts top movies
Cost is that it tends to often predict blockbuster movies
A serendipity/trust trade-off

[Figure: Modified Precision at Top-N. y-axis: Modified Precision (0 to 0.3); x-axis: Top 1, Top 5, Top 10, Top 15, Top 20; series: User-to-User, Item-Item, Distribution]

Recommender Systems


Early systems

GroupLens (U of Minn) (Resnick, Iacovou, Bergstrom, Riedl)
  netPerceptions company
  Based on nearest neighbor recommendation model
Tapestry (Goldberg/Nichols/Oki/Terry)
Ringo (MIT Media Lab) (Shardanand/Maes)
Experiment with variants of these algorithms

Datasets @ GroupLens

MovieLens Data Sets
  100,000 ratings for 1682 movies by 943 users
  1 million ratings for 3900 movies by 6040 users
Book-Crossing Data Set
  278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Jester Joke Data Set
  4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users.
EachMovie Data Set
  2,811,983 ratings entered by 72,916 users for 1628 different movies

Strands Recommendation Engine


Resources

GroupLens
  http://citeseer.nj.nec.com/resnick94grouplens.html
  http://www.grouplens.org
  Has available data sets, including MovieLens
Breese et al. UAI 1998
  http://research.microsoft.com/users/breese/cfalgs.html
McLaughlin and Herlocker, SIGIR 2004
  http://portal.acm.org/citation.cfm?doid=1009050
CoFE “Collaborative Filtering Engine”
  Open source Java
  Reference implementations of many popular CF algorithms
  http://eecs.oregonstate.edu/iis/CoFE
C/Matlab Toolkit for Collaborative Filtering
  http://www.cs.cmu.edu/~lebanon/IR-lab.htm

Related Conferences

http://recsys.acm.org/


Books

Recommender Systems: An Introduction

This book offers an overview of approaches to developing state-of-the-art recommender systems. The authors present current algorithmic approaches for generating personalized buying proposals, such as collaborative and content-based filtering, as well as more interactive and knowledge-based approaches. They also discuss how to measure the effectiveness of recommender systems and illustrate the methods with practical case studies. The final chapters cover emerging topics such as recommender systems in the social web and consumer buying behavior theory.

Readings

[1] MIW Ch8
[2] M. R. McLaughlin and J. L. Herlocker, "A collaborative filtering algorithm and evaluation metric that accurately model the user experience," in Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. Sheffield, United Kingdom: ACM, 2004.

Summary

Collaborative Filtering
  Input data space, especially the User-Item links
Nearest Neighbor CF
  Weighting scheme
Evaluation of CF
  MAE failure

Thank You!

Q&A


Challenges of Nearest-Neighbor CF

Structure based recommendations
  Recommendations based on similarities between items with positive votes (as opposed to votes of other users)
  Structure of item dependencies modeled through dimensionality reduction via singular value decomposition (SVD), aka latent semantic indexing
  Approximate the set of row-vector votes as a linear combination of basis column-vectors
    i.e. find the set of columns to least-squares minimize the difference between the row estimations and their true values
  Perform nearest-neighbor calculations to project predictions for all items

GroupLens Collaborative Filtering Scheme

Prediction for active user a on item q:
  p_{aq} = \bar{v}_a + \sigma_a \cdot \hat{p}_{aq}

Weighted average of preferences:
  \hat{p}_{aq} = \sum_{i=1}^{n} w_{ai} z_{iq}

Similarity weight between active user and user i:
  w_{ai} = \sum_{k} z_{ak} \cdot z_{ik}

z-scores for item q:
  z_{iq} = \frac{v_{iq} - \bar{v}_i}{\sigma_i}
  where v_{iq} is the rating for user i on item q

Mean vote for user i:
  \bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{ij}

Nearest-Neighbor CF

Basic principle: utilize user’s vote history to predict future votes/recommendations based on “nearest-neighbors”
A typical normalized prediction scheme:
  goal: predict vote for item ‘j’ based on other users, weighted towards those with similar past votes as target user ‘a’