The Study and The Trend of Recommender Systems (RS)
Prof. 李麗華, Department of Information Management, Chaoyang University of Technology
2012/12/18
Contents
Preface: Stay Hungry, Stay Foolish
Review of Recommendation Systems
Techniques for Recommendation Systems
Applications of Recommendation Systems
The Trend of Recommendation Systems
Q & A
Q: What is recommendation?
Q: What is a recommendation system (hereafter RS)?
An Example
[Figures: two example screenshots, the second marking the recommendation area (推薦區)]
Review of Recommender Systems
The history of recommendation systems (RS):
Information Retrieval (IR) assumes a fairly constant underlying database of items and helps users whose interests change.
Information Filtering (IF) assumes highly dynamic information sources accessed by users whose interests are fairly stable.
RS resemble dynamic information filtering systems.
RS try to anticipate users' needs, and they can serve as decision tools when the user is absent.
Review of Recommender Systems
IR, IF, RS: obtain, filter, and predict useful and beneficial information
Q & A
Q: Why do we need RS?
Review of Recommender Systems
RS enhance e-commerce sales
Browsers into buyers
• Recommender systems help customers find products they wish to purchase.
Cross-sell
• A site might recommend additional products during the checkout process.
Loyalty (building customer loyalty)
• Recommender systems improve loyalty by creating a value-added relationship between the site and the customer.
Ex: Microsoft selling soup (微軟賣湯)
Q & A
Q: How do we implement an RS?
Q: What information do we need to implement an RS?
Review of Recommender Systems
Personalization
• Through personalization, RS guide users toward the useful information they are interested in.
User Profile
• User profiles, built from questionnaires, purchase records, or web browsing history, are usually analyzed in RS to understand users' characteristics, habits, and preferences.
Filtering & match finding
• Filtering and match-finding methods are used to derive the information closest to what the user wants.
Ex: The clever employee (妙員工)
Review of Recommender Systems
Characteristics of recommender systems (RS)
1. Able to access user profiles or user data for analysis.
2. Able to use explicit or implicit information.
3. Usually use similarity functions and distance functions for filtering.
4. Able to adapt to shifts in users' interests.
5. Able to make recommendations or proposals.
6. Able to take users' needs into account.
7. Able to give an explanation or a confidence coefficient.
Review of Recommender Systems
Standard process of recommendation
Step 1: Retrieve and filter items
Ex: A user is looking for recent fiction books, and the system should provide a list of candidate books.
Step 2: Elaborate a prediction for every item for a given user
Ex: Return a score (or a judgement) indicating whether the user will like the item.
Step 3: Generate the recommendation for the user
Ex: How the recommendation is presented depends closely on the interface chosen for the recommender and on the interaction between the user and the system.
Review of Recommender Systems
Recommendation System (process flow)
1. Collect user data and update the customer database
2. According to the user database: retrieve, elaborate a prediction, generate the recommendation
3. Evaluate the recommendation results for adjustment
4. Feedback according to the user's new information
Review of Recommender Systems
Retrieve Information (forms of information retrieval)
• Explicit information (Q: give an example)
• Implicit information (Q: give an example)
User Profiling (user data file)
The amount of user information required by the recommendation function as input:
• Demographic data
• Explicit keywords and ratings
• Implicit interest indicators
• Context
Review of Recommender Systems
The retrieving of the items (content of information retrieval)
The user may look for
• a particular product
• items (digital information)
• suggestions
Solution
• select candidate items
• retrieve candidate items
• give the user a proper suggestion (or prediction or decision)
Review of Recommender Systems
The elaboration of the prediction
The prediction is elaborated by recommender functions.
Qualitative approach: the system provides suggestions or preferences such as "prefer" and "not prefer."
Quantitative approach: the system provides a score for how likely the user is to like the item.
Ex: "Teacher, you guessed wrong" (老師你猜錯了); Ex: computer matchmaking (電腦徵婚)
Techniques for RS
The most widely applied RS methods
• CB: Content Based (內容導向式)
• CF: Collaborative Filtering (協同過濾式)
• Mixed Approach (混合式)
Techniques for RS - CB
Content Based (CB) method: find and match information or content based on the active user's information.
CB inherits from classic IR and IF.
The advantages of CB approaches
• CB algorithms are tuned for each user.
• CB algorithms can recommend any item, including new, unusual, and unpopular items.
• CB algorithms can also explain their predictions.
Techniques for RS - CB
Content representation of items
A set of features
• The type of the feature
• The value of the feature
• A weight between the features
Items are objects with several attributes, and each attribute has a completely different meaning.
• First, every domain has different attributes.
• Second, it is necessary to decide which features are important.
• Third, the selection of important features has to take the users into account.
Techniques for RS - CB
Features vs. Terms representation
A feature representation can distinguish between "Tom Cruise" as the director of a movie and "Tom Cruise" as its main male actor.
Q & A
Q: How do we match the information?
Techniques for RS - CB
Vector Space Model
In this model, every item is represented by the vector of its features.
The main advantages of the model
• It does not require any training phase.
• It is available as soon as enough examples are provided.
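Techniques for RS - CB
Cosine similarity
Example
• user a = $(a_1, a_2, a_3, \ldots, a_n)$
• user b = $(b_1, b_2, b_3, \ldots, b_n)$
$$sim(a,b) = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}} \qquad (1)$$
$$sim(a,b) = \frac{\sum_{i=1}^{n} (w\,a_i)(w\,b_i)}{\sqrt{\sum_{i=1}^{n} (w\,a_i)^2}\,\sqrt{\sum_{i=1}^{n} (w\,b_i)^2}} \qquad (2)$$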
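
The cosine similarity between two feature vectors can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `cosine_similarity`, the optional per-feature `weights` argument, and the choice to return 0 for an all-zero vector are assumptions made for the example.

```python
import math

def cosine_similarity(a, b, weights=None):
    """Cosine of the angle between feature vectors a and b.

    weights is an optional per-feature weight list (an assumption of this
    sketch); when omitted, every feature has weight 1.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    wa = [w * x for w, x in zip(weights, a)]
    wb = [w * y for w, y in zip(weights, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm_a = math.sqrt(sum(x * x for x in wa))
    norm_b = math.sqrt(sum(y * y for y in wb))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction score 1 and orthogonal vectors score 0, regardless of their lengths.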
Techniques for RS - CB
Bayesian Classifiers
The goal is to derive the probability that an item belongs to a certain category.
• $P(c_i)$: prior probability
• $P(c_i \mid d_j)$: posterior probability
• $d_j$: item
• $c_i$: class
$$S = \{A_1, A_2, \ldots, A_r\}, \quad P(d_j) > 0, \quad P(c_i) > 0, \quad i = 1, 2, \ldots, r$$
$$P(c_i \mid d_j) = \frac{P(c_i)\,P(d_j \mid c_i)}{P(d_j)} = \frac{P(c_i)\,P(d_j \mid c_i)}{\sum_{k=1}^{r} P(c_k)\,P(d_j \mid c_k)} = \frac{P(c_i)\,P(d_j \mid c_i)}{P(c_1)P(d_j \mid c_1) + P(c_2)P(d_j \mid c_2) + \cdots + P(c_r)P(d_j \mid c_r)}$$
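Techniques for RS - CB
Example
[Figure: three classes A, B, and C, each with prior probability 1/3, containing outcomes W and W']
$$P(B \mid W) = \frac{P(B \cap W)}{P(W)} = \frac{P(B)\,P(W \mid B)}{P(A)P(W \mid A) + P(B)P(W \mid B) + P(C)P(W \mid C)} = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{1}{3} \cdot 1 + \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot 0} = \frac{1}{3}$$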
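
The Bayes example can be checked with exact fractions. The sketch below assumes the values that appear to be read off the example: a prior of 1/3 for each of A, B, and C, and likelihoods P(W|A) = 1, P(W|B) = 1/2, P(W|C) = 0. The function name `posterior` and the dictionary layout are illustrative choices, not part of the slides.

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """Bayes' rule: P(target | evidence) from priors and P(evidence | class)."""
    # Total probability of the evidence, summed over all classes.
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    if evidence == 0:
        raise ValueError("evidence has zero probability")
    return priors[target] * likelihoods[target] / evidence

# Assumed inputs for the A/B/C example.
priors = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
p_w_given = {"A": Fraction(1), "B": Fraction(1, 2), "C": Fraction(0)}
p_b_given_w = posterior(priors, p_w_given, "B")
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.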
Techniques for RS - CB
Naïve Bayesian Classifiers
• Assume that features are conditionally independent.
• May have problems with unbalanced classes.
[Figure: features linked to Class A and Class B]
Q & A
Q: What are the problems of CB?
Techniques for RS - CB
The power of CB is limited by two factors
• It cannot derive a prediction when the user's information (or history records) is not available.
• Characteristics such as the quality or the readability of a document can typically be recognized by humans but are difficult for a computer to assess.
Techniques for RS - CF
Collaborative Filtering (CF) method: use information about a group of users rather than only the active user.
The idea is to find a subset of users with similar tastes and use them to make predictions.
The basic CF algorithm consists of the following steps
Step 1: Calculate the similarity between user A and all the other users.
Step 2: Select a set of users that are similar to user A.
Step 3: Use this set of users as a reference for making a recommendation.
Techniques for RS - CF
Pearson Correlation Coefficient (PCC)
The PCC between a user a and a user u
The covariance function
• $r_{i,j}$: user i's rating for item j.
• $r_u^m$: the ratings vector for the items rated by u.
• $r_a^m$: the ratings vector for the items rated by a.
Techniques for RS - CF
Pearson Correlation Coefficient (PCC)
The standard deviation
Significance weight
The similarity function
m: the number of co-rated items
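Techniques for RS - CF
Pearson Correlation Coefficient (PCC)
Example
     P1  P2  P3  P4
a    3   2   5   1
u    3   2   5   2
z    1   3   3   1
$$\bar{r}_a = \frac{11}{4} = 2.75, \quad \bar{r}_u = \frac{12}{4} = 3, \quad \bar{r}_z = \frac{8}{4} = 2$$
$$covar(r_a, r_u) = \frac{7}{4} = 1.75, \quad covar(r_z, r_u) = \frac{2}{4} = 0.5, \quad covar(r_a, r_z) = \frac{3}{4} = 0.75$$
$$\sigma_{r_a} = \sqrt{\frac{8.75}{4}} \approx 1.479, \quad \sigma_{r_u} = \sqrt{\frac{6}{4}} \approx 1.2247, \quad \sigma_{r_z} = \sqrt{\frac{4}{4}} = 1$$
$$C_{a,u} \approx 0.966, \quad C_{z,u} \approx 0.408, \quad C_{a,z} \approx 0.507$$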
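
The PCC example can be reproduced with a short Python sketch. It assumes population statistics, dividing by the number of co-rated items m as the worked covariances do (7/4, 2/4, 3/4). Computed this way, the squared deviations of user a sum to 8.75, so its standard deviation is about 1.479 and the coefficient between a and u is about 0.966. The function names are illustrative.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(x, y):
    """Population covariance over the m co-rated items (divides by m)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def std_dev(x):
    """Population standard deviation (divides by m)."""
    return math.sqrt(covariance(x, x))

def pcc(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    sx, sy = std_dev(x), std_dev(y)
    if sx == 0 or sy == 0:
        return 0.0  # convention for this sketch: constant ratings carry no signal
    return covariance(x, y) / (sx * sy)

# Ratings of users a, u, z for items P1..P4, taken from the example table.
ratings = {"a": [3, 2, 5, 1], "u": [3, 2, 5, 2], "z": [1, 3, 3, 1]}
```

Calling `pcc(ratings["a"], ratings["u"])` gives the similarity that the neighbor-selection step would then rank.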
Techniques for RS - CF
Selection of neighbors
Similarity threshold
• Selects all users whose similarity coefficient exceeds a predefined threshold.
Best k-neighbors
• Selects the k users with the highest similarity coefficients and uses them for the prediction.
Both approaches share the problem that the value (the threshold or k) can only be chosen empirically.
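
Both neighbor-selection strategies are easy to express in code. The sketch below is illustrative only: the function names and the similarity values are hypothetical, and the similarity coefficients are assumed to have been computed beforehand.

```python
def select_by_threshold(similarities, threshold):
    """Keep every user whose similarity coefficient exceeds the threshold."""
    return {user: s for user, s in similarities.items() if s > threshold}

def select_best_k(similarities, k):
    """Keep the k users with the highest similarity coefficients."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

# Hypothetical similarity coefficients between the active user and three others.
similarities = {"u": 0.97, "z": 0.51, "y": -0.20}
```

The threshold variant can return any number of users, including none, while best-k always returns up to k users even when their similarity is low.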
Techniques for RS - CF
The elaboration of the prediction
Prediction functions: the higher the correlation between two users, the higher the resulting probability score.
The deviation from mean
• $w_{a,u}$: the similarity coefficient between user a and user u
• $r_{u,i}$: the rating of user u for item i
• $\bar{r}_u$: the mean of the scores of user u
• $p_{a,i}$: the prediction for item i for user a
• $n$: the number of users selected in the neighbourhood
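
A deviation-from-mean prediction can be sketched as follows. The formula itself did not survive in the slide text, so this example assumes the commonly used form $p_{a,i} = \bar{r}_a + \sum_{u=1}^{n} w_{a,u}(r_{u,i} - \bar{r}_u) / \sum_{u=1}^{n} |w_{a,u}|$, which matches the symbols defined above. The function name, the tuple layout, and the fallback to the user's own mean are assumptions.

```python
def predict(mean_a, neighbours):
    """Deviation-from-mean prediction p_{a,i} for the active user a.

    neighbours: list of (w_au, r_ui, mean_u) tuples, one per selected user u.
    """
    norm = sum(abs(w) for w, _, _ in neighbours)
    if norm == 0:
        return mean_a  # no usable neighbour: fall back to the user's own mean
    weighted_dev = sum(w * (r_ui - mean_u) for w, r_ui, mean_u in neighbours)
    return mean_a + weighted_dev / norm
```

For instance, `predict(3.0, [(0.5, 5, 4.0), (1.0, 2, 3.0)])` combines one neighbor who rated the item above their own mean with a more similar neighbor who rated it below theirs.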
Q & A
Q: What are the problems of CF?
Techniques for RS - CF
Problems of Collaborative Filtering
Cold Start
• It is difficult to find users with a high similarity coefficient.
Sparsity
• It may generate only low similarity coefficients, or none at all.
First Rater
• It is difficult to predict a rating for new items, since nobody has rated them yet.
Popularity Bias
• CF approaches tend to always recommend the most popular items and give low scores to unusual items.
Techniques for RS - Mixed
Several works have tried to combine the advantages of both CB and CF methods.
Mixing final results
• The simplest function is a weighted sum.
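
A weighted sum of the two final scores might look like the sketch below. The exact function used in the slides is not preserved, so the name `hybrid_score` and the single mixing parameter `alpha` (the weight given to the CB score) are assumptions for illustration.

```python
def hybrid_score(cb_score, cf_score, alpha=0.5):
    """Weighted sum of a CB prediction and a CF prediction.

    alpha (hypothetical parameter) is the weight on the CB score, in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cb_score + (1.0 - alpha) * cf_score
```

Setting `alpha` to 1 reduces the hybrid to pure CB and setting it to 0 reduces it to pure CF, so the parameter can be tuned empirically between the two.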
Techniques for RS - Mixed
Collaboration via Content
The idea is to consider the content description, or any available metadata, and to apply the PCC directly on the features rather than at the level of items.
The advantage is that the similarity coefficient no longer has to be calculated only on items rated by both users.
[Figure: user a and user b]
Techniques for RS - Mixed
Content Boosted Collaborative Filtering (CBCF)
Where a movie has no score, the blank is filled in with the prediction of the CB algorithm, which reduces the sparsity problem.
CBCF then applies the PCC to calculate all the similarities and to find the k-neighborhood of the active user.
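
The fill-in step of CBCF can be sketched as below. This is a simplified illustration: `cb_predict` is a hypothetical stand-in for whatever content-based predictor is available, and the function name and dictionary layout are assumptions.

```python
def fill_missing_ratings(user_ratings, all_items, cb_predict):
    """Return a dense rating dict: real ratings where present, CB predictions elsewhere."""
    return {
        item: user_ratings[item] if item in user_ratings else cb_predict(item)
        for item in all_items
    }
```

Once every user has a dense rating vector, the PCC can be computed over all items instead of only the co-rated ones.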
Techniques for RS - other terms
Other techniques of RS
Non-Personalized Recommendations
• Each customer gets the same recommendations.
Attribute-Based Recommendations
• Content-Based Recommendation
Techniques for RS - other terms
Item-to-Item Correlation
• Based on a small set of products in which the customer has expressed interest.
People-to-People Correlation
• Collaborative filtering
Applications of RS
Year | Recommendation system | Authors
1992 | Tapestry | Goldberg
1994 | GroupLens | Resnick
1995 | Pointers | Maltz
1996 | Letizia & Let's Browse | Lieberman
1996 | WebWatcher | Joachims
1997 | Firefly | Turnbull
1997 | PHOAKS | Terveen
1998 | Fab | Turnbull
1998 | GAB | Wittenburg
1998 | Lotus Notes | Turnbull
1998 | Yahoo! | Turnbull
1999 | ProfBuilder | Wasfi
1999 | Personal Tango | Claypool and Gokhale
2000 | MovieLens | Sarwar
2001 | SmartPad | Lawrence
2001 | PolyLens | O'Connor
2003 | INTRIGUE | Ardissono
2003 | Amazon.com | Linden
2004 | Travel Decision Forum | Jameson
2004 | PHOAKS | Perugini
Applications of RS
Well-known companies that have applied RS on their websites:
Company | Product Recommended
Launch.com | Online music
Amazon.com | Books, CDs, etc.
Moviefinder.com | Movies
MovieLens | Movies
Drugstore.com | Drugstore
CDNOW | Music
IMDb | Movies
Barnes & Noble | Books, movies, music, toys, games
Applications of RS
We collected research papers from the SDOL database.
A total of 486,556 articles appeared from 1834 to Sep. 2010 (see the figure on the next page).
The keywords used are: recommendation, recommender, recommender systems, recommender system, recommendation system, recommending, recommendations, Collaborative Filtering, Content-Based, Personalized recommender system, Hybrid recommender systems, collaborative filters
Since 2000, the annual growth rate of recommender research has surpassed that of earlier years.
The majority of the studies focus on the application aspect.
Applications of RS
[Figure: number of articles (amount) per year, 1985 to 2009; vertical axis from 0 to 90,000]
Applications of RS
Recommender System application domains
Commerce
Information
E-learning
Industry
Virtual Community
Teaching
Forum
P2P-web
Cosmetic business
Content
Workflow
Business
Software project planning
Shopping malls
Blog
Multimedia
B2B-commerce
Manufacturing enterprise
Tag
Bookmark
English reading course
News
Book
Movie
Document
Knowledge
Music
Document
TV-program
Web service
Education
E-commerce
M-commerce tourism
Applications of RS
Architecture of E-commerce Recommendation System (an example)
[Figure: architecture diagram with the components Client, Web server, Current access list, Historical data of trade, Web log, Recommended list, Recommender processing, Non-access list, Data processing, Model and threshold, Data warehouse, Product database, Database, Self-adaptation]
Q & A
Q: What do you think the trend of RS will be?
The Trend of RS
RS for E-Commerce
RS for Social Networking
RS for Advertising
RS for CRM
RS for Mobile Service
RS for Music Finding
RS for Image Finding
RS for ….
Preface
Apple founder Steve Jobs
Microsoft founder Bill Gates
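Preface: Steve Jobs
The life of Steve Jobs
• Child of an unwed mother, given up for adoption to parents with a junior-high-school education
• Age 20: founded Apple Co. and went on to create more than 10 years of booming Macintosh sales for Apple
• Age 31: fired by Apple
• Age 31: founded NeXT (developing object-oriented systems)
• Age 32: bought a company and renamed it Pixar (Toy Story)
• Age 42: Apple bought NeXT, and he returned to Apple
• Age 42: became CEO of Apple
• Age 47: launched the iPod, followed by iTunes
• Age 50: diagnosed with pancreatic cancer
• Age 53: launched the iPhone
• Age 56: named Person of the Year 2010 by the Financial Times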
Stay Hungry, Stay Foolish: Steve Jobs
You can't connect the dots looking forward; you can only connect them looking backwards.
I'm convinced that the only thing that kept me going was that I loved what I did.
And the only way to do great work is to love what you do.
Stay Hungry, Stay Foolish (求知若渴,虛心若愚)
Department of Information Management, Chaoyang University of Technology, 李麗華 [email protected]
Metrics(1/7)
Three key dimensions need to be measured to assess the quality of a recommender system: coverage, efficacy, and accuracy.
Coverage
• A measure of the percentage of items for which a recommender can provide predictions.
• First, not every item has a representation suitable for the algorithm used.
• Second, there may be too little information about the user to make a prediction for that user.
Metrics(2/7)
Efficacy
• It is measured through three main parameters: precision, recall, and fallout.
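
The three efficacy parameters can be computed from sets of item identifiers. The sketch below uses the usual information-retrieval definitions, which are assumed here because the slide text does not spell them out: precision is the share of recommended items that are relevant, recall is the share of relevant items that were recommended, and fallout is the share of non-relevant items that were recommended. The function name and arguments are illustrative.

```python
def efficacy(recommended, relevant, all_items):
    """Precision, recall, and fallout for one recommendation list (sets of item ids)."""
    recommended, relevant, all_items = set(recommended), set(relevant), set(all_items)
    hits = recommended & relevant
    non_relevant = all_items - relevant
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    fallout = len(recommended & non_relevant) / len(non_relevant) if non_relevant else 0.0
    return precision, recall, fallout
```

Higher precision and recall are better, while lower fallout is better.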
Metrics(3/7)
Accuracy
• It is the most important measure for RS, because users tend to evaluate only the few items at the top of an ordered list.
• Accuracy matters because it reduces the time users must spend and ultimately satisfies their requirements.
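Metrics(4/7)
Statistical accuracy metrics
Mean Absolute Error (MAE)
• The goal is to minimize this error.
$$E = \frac{1}{n}\sum_{i=1}^{n} |p_i - r_i|$$
with $r_i$ the actual ratings and $p_i$ the predicted ratings.
Example
$$r = \{1, 2, 4, 5, 2, 1\}, \quad P_1 = \{2, 1, 3, 1, 4, 2\}, \quad P_2 = \{1, 2, 4, 5, 2, 3\}$$
$$E_1 = \frac{10}{6} \approx 1.67, \quad E_2 = \frac{2}{6} \approx 0.33$$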
Metrics(5/7)
Root Mean Squared Error (RMSE)
• Very similar to MAE, but this metric weighs large errors disproportionately more heavily than small errors.
• The goal is to minimize the error.
$$RMSE_1 = \sqrt{\frac{1 + 1 + 1 + 16 + 4 + 1}{6}} = 2, \qquad RMSE_2 = \sqrt{\frac{0 + 0 + 0 + 0 + 0 + 4}{6}} \approx 0.8165$$
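
Both statistical accuracy metrics can be checked on the example ratings r = {1, 2, 4, 5, 2, 1} and the two prediction lists P1 = {2, 1, 3, 1, 4, 2} and P2 = {1, 2, 4, 5, 2, 3}. In this sketch RMSE is taken as the square root of the mean squared error, the standard definition, which gives 2 for P1 and about 0.8165 for P2. The function names are illustrative.

```python
import math

def mae(ratings, predictions):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(p - r) for r, p in zip(ratings, predictions)) / len(ratings)

def rmse(ratings, predictions):
    """Square root of the mean squared error."""
    squared = sum((p - r) ** 2 for r, p in zip(ratings, predictions))
    return math.sqrt(squared / len(ratings))

r = [1, 2, 4, 5, 2, 1]    # actual ratings
p1 = [2, 1, 3, 1, 4, 2]   # first prediction list
p2 = [1, 2, 4, 5, 2, 3]   # second prediction list
```

P1 contains one error of 4, which contributes 16 of the 24 squared-error units and shows how RMSE penalizes large errors more heavily than MAE does.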
Metrics(6/7)
Decision-support accuracy metrics
Receiver Operating Characteristic (ROC)
• ROC is a measure of the diagnostic power of a filtering system.
• Sensitivity is the probability that a randomly selected good item is recommended by the system.
• Specificity is the probability that a randomly selected bad item is rejected by the recommender.
Metrics(7/7)
Expected utility
• The goal is to estimate the expected utility of a particular ranked list to a user.
• The expected utility of a ranked list of items
  $v_{a,j}$: the vote for an item
  $d$: the default vote
  $j$: the index in the list ordered by declining $v_{a,j}$
  $\alpha$: the viewing half-life
• The final score for an experiment
  $R_a^{max}$: the maximum achievable utility
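
The expected-utility score can be sketched in Python. The formulas did not survive in the slide text, so this example assumes the widely used half-life utility form, $R_a = \sum_j \max(v_{a,j} - d, 0) / 2^{(j-1)/(\alpha-1)}$ for one user and $R = 100 \sum_a R_a / \sum_a R_a^{max}$ for an experiment, where $\alpha$ is the viewing half-life. The function names and the list-of-lists input are assumptions.

```python
def expected_utility(votes, default_vote, half_life):
    """Half-life utility of one ranked list; votes[0] is the top-ranked item."""
    if half_life <= 1:
        raise ValueError("half_life must be greater than 1")
    return sum(
        max(v - default_vote, 0) / 2 ** ((j - 1) / (half_life - 1))
        for j, v in enumerate(votes, start=1)
    )

def final_score(ranked_votes_per_user, default_vote, half_life):
    """100 * (sum of achieved utilities) / (sum of maximum achievable utilities)."""
    achieved = sum(
        expected_utility(v, default_vote, half_life) for v in ranked_votes_per_user
    )
    # The maximum utility is reached when each list is sorted by declining vote.
    best = sum(
        expected_utility(sorted(v, reverse=True), default_vote, half_life)
        for v in ranked_votes_per_user
    )
    return 100 * achieved / best if best else 0.0
```

A list that already places the highest votes first scores 100, and the score drops as well-liked items are pushed further down the list.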