Upload
andrei-tabarcea
View
223
Download
0
Embed Size (px)
Citation preview
8/8/2019 Handout Recsys Sac2010
1/139
Introduction to Recommender S stems
TutorialatACMSymposiumonAppliedComputing2010Sierre,Switzerland,22March2010
MarkusZanker
DietmarJannach
TUDortmund
- 1 -
8/8/2019 Handout Recsys Sac2010
2/139
MarkusZanker
AssistantprofessoratUniversityKlagenfurt
CEOofConfi WorksGmbH
DietmarJannach
ProfessoratTUDortmund,Germany
Researchbackground
and
interests
ApplicationofIntelligentSystemstechnologyinbusiness
Recommendersystemsimplementation&evaluation
Productconfigurationsystems
Webmining
Operationsresearch
- 2 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
3/139
Whatarerecommendersystemsfor?
Introduction
Howdo
they
work?
o a ora ve er ng
ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
Howtomeasuretheirsuccess?
Evaluationtechni ues
CasestudyonthemobileInternet
Selected
recent
topics
AttacksonCFRecommenderSystems
RecommenderSystemsintheSocialWeb
What to ex ect?
- 3 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
4/139
Whatarerecommendersystemsfor?
Introduction
Howdo
they
work?
o a ora ve er ng
ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
Howtomeasuretheirsuccess?
Evaluationtechni ues
CasestudyonthemobileInternet
Selected
recent
topics
AttacksonCFRecommenderSystems
RecommenderSystemsintheSocialWeb
What to ex ect?
- 4 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
5/139
- 5 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
6/139
Recommendationsystems RS helptomatchuserswithitems
Easeinformationoverload
Salesassistance uidance,advisor , ersuasion,
Differentsystemdesigns/paradigms
Basedon
availability
of
exploitable
data
Implicitandexplicituserfeedback
Goalto
identify
good
system
implementations
But:multipleoptimalitycriterionsexist
- 6 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
7/139
eren perspec ves aspec s
Depends ondomain andpurpose
No wholistic evaluation scenario exists
Retrieval perspective
Reduce search costs Provide correctproposals
Usersknow inadvance what they want
Recommendation perspective eren p y en y ems rom e ong a
Usersdid notknow about existence
- 7 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
8/139
re c on perspec ve
Predict to what degree users like anitem
Mostpopular evaluation scenario in
research
Interactionperspective
Educate users about the product domain
Convince/persuade users explain
Finally,
conversion perspective ommerc a s ua ons
Increase hit,clickthru,lookers to bookersrates
Optimize sales margins and profit
- 8 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
9/139
RSseen as afunction [AT05]
Given: sermo e e.g.ra ngs,pre erences, emograp cs,s ua ona con ex
Item
Relevance score
Scores responsible for ranking
Inpractical systems usually notallitems willbe scored,buttask is to find
most relevantones (selection task)
- 9 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
10/139
Recommender systems
reduce information overload
- 10 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
11/139
recommendations
- 11 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
12/139
o a ora ve: e me
whats popular among my
- 12 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
13/139
- more of the same what Ive
liked
- 13 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
14/139
Knowledge-based: tell me
what fits based on my
- 14 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
15/139
Hybrid: combinations ofvarious inputs and/or
composition of different
- 15 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
16/139
ros ons
Collaborative Nearly no ramp-upeffort, serendipity of
Requires some form ofrating feedback, cold start
results, learnsmarket segments
for new users and newitems
- -
to acquire, supportscomparisons
necessary, cold start fornew users, no surprises
Knowledge-based Deterministic recs,assured quality, no
cold-start can
Knowledge engineeringeffort to bootstrap,
basicall static does notresemble salesdialogue
react to short-term trends
- 16 -
8/8/2019 Handout Recsys Sac2010
17/139
Goals
,online conversion,
cost of ownership,
Improvement
Evaluation
Explorecombinationsofcollaborativeandknowledgebasedmethods
Hybridizationdesigns
Feedbackloopbyempiricalevaluations
- 17 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
18/139
- 18 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
19/139
Themostprominentapproachtogeneraterecommendations
usedbylarge,commercialecommercesites
wellunderstood,
various
al orithms
and
variations
exist
applicableinmanydomains(book,movies,DVDs,..)
Approach
use
the
"wisdom
of
the
crowd"
to
recommend
items
Basicassumptionandidea
Usersgiveratingstocatalogitems(implicitlyorexplicitly)
Customerswhohadsimilartastesinthepast,willhavesimilartastesinthe
future
- 19 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
20/139
Thebasictechnique:
Givenan"activeuser"(Alice)andanitemInotyetseenbyAlice
findasetofusers eers wholikedthesameitemsasAliceinthe astandwhohaverateditemI
use,e.g.theaverageoftheirratingstopredict,ifAlicewilllikeitemI
Somefirst
questions
Howdowemeasuresimilarity?
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
Howmanyneighborsshouldweconsider?
Howdowegenerateapredictionfromtheneighbors'ratings?
User1 3 1 2 3 3
User2 4 3 4 3 5User3 3 3 1 5 4
User4 1 5 5 2 1
- 20 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
21/139
a,b :users
ra,p :rating
of
user
a
for
item
p
P :setofitems,ratedbothbyaandb
Possiblesimilarityvaluesbetween 1and1
Item1 Item2 Item3 Item4 Item5
Alice 5 3 4 4 ?
User1 3 1 2 3 3
User2 4 3 4 3 5
s m = ,sim =0,70
sim = 0,79
User3 3 3 1 5 4
User4 1 5 5 2 1
- 21 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
22/139
Takesdifferencesinratingbehaviorintoaccount
4
5
6 Alice
User1
User4
2
3
Ratings
0
Item1 Item2 Item3 Item4
Workswellinusualdomains,comparedwithalternativemeasures
suchascosinesimilarity
- 22 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
23/139
Acommonpredictionfunction:
Calculate,whether
the
neighbors'
ratings
for
the
unseen
item
i are
higher
orlowerthantheiraverage
Combinetheratingdifferences usethesimilaritywithaasaweight
Add/subtractthe neighbors'biasfromtheactiveuser'saverageanduse
- 23 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
24/139
Notallneighborratingsmightbeequally"valuable"
Agreementoncommonlylikeditemsisnotsoinformativeasagreementon
controversialitems
Possiblesolution: Givemoreweighttoitemsthathaveahighervariance
Valueofnumberofcorateditems
Use"significanceweighting",bye.g.,linearlyreducingtheweightwhenthe
numberof
co
rated
items
is
low
Intuition:Givemoreweightto"verysimilar"neighbors,i.e.,wherethe
similarityvalueiscloseto1.
Neighborhoodselection
Usesimilaritythresholdorfixednumberofneighbors
- 24 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
25/139
UserbasedCFissaidtobe"memorybased"
theratingmatrixisdirectlyusedtofindneighbors/makepredictions
doesnot
scale
for
most
real
world
scenarios
largeecommercesiteshavetensofmillionsofcustomersandmillionsof
items
Modelbasedapproaches
basedonanofflinepreprocessingor"modellearning"phase
atruntime,onl thelearnedmodelisusedtomake redictions
modelsareupdated/retrainedperiodically
largevarietyoftechniquesused
mo e u ngan up a ngcan ecompu a ona yexpens ve
- 25 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
26/139
Basicidea:
Usethesimilaritybetweenitems(andnotusers)tomakepredictions
LookforitemsthataresimilartoItem5
TakeAlice'sratingsfortheseitemstopredicttheratingforItem5
Item1 Item2 Item3 Item4 Item5
ce
User1 3 1 2 3 3
User2 4 3 4 3 5
User3 3 3 1 5 4
User4 1 5 5 2 1
- 26 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
27/139
Producesbetterresultsinitemtoitemfiltering
Ratingsareseenasvectorinndimensionalspace
Similarityiscalculatedbasedontheanglebetweenthevectors
Adjustedcosinesimilarity
takeaverageuserratingsintoaccount,transformtheoriginalratings
U:setofuserswhohaveratedbothitemsaandb
- 27 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
28/139
Itembasedfilteringdoesnotsolvethescalabilityproblemitself
PreprocessingapproachbyAmazon.com(in2003)
a cu a ea pa rw se ems m ar es na vance
Theneighborhoodtobeusedatruntimeistypicallyrathersmall,because
onlyitemsaretakenintoaccountwhichtheuserhasrated
Itemsimilaritiesaresupposedtobemorestablethanusersimilarities
Memoryrequirements
p o pa rw ses m ar es o ememor ze =num ero ems n
theory
Inpractice,thisissignificantlylower(itemswithnocoratings)
Furtherreductionspossible
Minimumthresholdforcoratings
Limittheneighborhoodsize(mightaffectrecommendationaccuracy)
- 28 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
29/139
PureCFbasedsystemsonlyrelyontheratingmatrix
Explicitratings
os common yuse o , o er responsesca es
Researchtopics
"Optimal"granularityofscale;indicationthat10pointscaleisbetteracceptedin
moviedomain
Multidimensionalratings
(multiple
ratings
per
movie)
Challenge
Usersnotalwayswillingtoratemanyitems;sparseratingmatrices
Howtostimulateuserstoratemoreitems?
mp c ra ngs
clicks,pageviews,timespentonsomepage,demodownloads
Canbeusedinadditiontoexplicitones;questionofcorrectnessofinterpretation
- 29 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
30/139
Coldstartproblem
Howtorecommendnewitems?Whatdorecommendtonewusers?
Ask/forceuserstorateasetofitems
Useanothermethod(e.g.,contentbased,demographicorsimplynon
personalized)intheinitialphase
Alternatives
se e era gor ms eyon neares ne g orapproac es
Example:
In
nearest
neighbor
approaches,
the
set
of
sufficiently
similar
neighbors
might
betosmalltomakegoodpredictions
Assume"transitivity"ofneighborhoods
- 30 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
31/139
RecursiveCF
Assumethereisaverycloseneighbornofuwhohoweverhasnotratedthe
targetitem
i yet.
Idea:
ApplyCFmethodrecursivelyandpredictaratingforitemi fortheneighbor
neighbor
Alice 5 3 4 4 ?
User1 3 1 2 3 ?
sim =0,85
User2 4 3 4 3 5
User3 3 3 1 5 4
Predictrating
- 31 - Dietmar Jannach and Markus Zanker
User4 1 5 5 2 1
User1
8/8/2019 Handout Recsys Sac2010
32/139
"Spreadingactivation"
Idea:Usepathsoflengths>3
torecommend
items
Length3:RecommendItem3toUser1
Length5:Item1alsorecommendable
- 32 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
33/139
Plethoraofdifferenttechniquesproposedinthelastyears,e.g.,
Matrixfactorizationtechniques,statistics
singularvalue
decomposition,
principal
component
analysis
Associationrulemining
compare:shoppingbasketanalysis
clusteringmodels,
Bayesian
networks,
probabilistic
Latent
Semantic
Analysis
Variousothermachinelearningapproaches
Costsofpreprocessing
Usuallynotdiscussed
ncremen a
up a esposs e
- 33 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
34/139
LatentSemanticIndexin
developedintheinformationretrievalfield;aimsatdetectionofhidden
"factors"(topics)ofadocumentandthereductionofthedimensionality
basedonSingularValueDecomposition(SVD)
SVDbased recommendation
decomposematrix
/find
factors
factorscanbegenre,actorsbutalsononunderstandableones
on yre a n en= o mos mpor an ac ors
canalsohelptoremovenoiseinthedata
make
recommendation
in
the
lower
dimensional
space e.g.,usenearestneighbors
HeavilyusedinNetflixprizecompetition;specificmethodsproposed
- 34 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
35/139
Commonlyusedforshoppingbehavioranalysis
aimsatdetectionofrulessuchas
"I acustomer urchasesbab oodthenhealsobu sdia ers
in70%ofthecases"
Associationruleminingalgorithms
candetectrulesoftheformX=>Y(e.g.,babyfood=>diapers)fromasetof
salestransactions
measureofquality:support,confidence
usedasathresholdtocutoffunimportantrules
- 35 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
36/139
Item1 Item2 Item3 Item4 Item5
transform5pointratingsintobinary
ratings
(1
=
above
user
average)
User1 1 0 1 0 1
User2 1 0 1 0 1
Minerulessuchas
Item1=>Item5
User3 0 0 0 1 1
User4 0 1 1 0 0
,
Makerecommendations
for
Alice
(basic
method)
Determine"relevant"rulesbasedonAlice'stransactions
(theaboverulewillberelevantasAliceboughtItem1)
ComputeunionofY'snotalreadyboughtbyAlice
'
Differentvariationspossible
dislikestatements userassociations..
- 36 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
37/139
Basicidea simplisticversionforillustration :
giventheuser/itemratingmatrix
determinethe
robabilit
that
user
Alice
will
like
an
item
i
basetherecommendationonsuchtheseprobabilities
CalculationofratingprobabilitiesbasedonBayes Theorem
Howprobableisratingvalue"1"forItem5givenAlice'spreviousratings?
CorrespondstoconditionalprobabilityP(Item1=1|X),where
= ' = = = =
CanbeestimatedbasedonBayes'Theorem
Assumption:Ratingsareindependent(?)
- 37 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
38/139
Item1 Item2 Item3 Item4 Item5
Alice 1 3 3 2 ?
User1 2 4 2 2 4
User2 1 3 3 5 1
User3 4 5 2 3 3
User4 1 1 5 2 1
Zeros(smoothingrequired),computationallyexpensive,
like/dislikesimplificationpossible
- 38 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
39/139
Useaclusterbasedapproach
assumeusersfallinasmallnumberofsubgroups(clusters)
Make
redictionsbased
on
estimates
probabilityofAlicefallingintoclusterc
probabilityofAlicelikingitemi givenacertainclusterandherpreviousratings
Numberof
classes
and
model
parameters
have
to
be
learned
from
data
in
advance(EMalgorithm)
Others:
BayesianNetworks,ProbabilisticLatentSemanticAnalysis,.
Emp r ca ana ys ss ows:
Probabilisticmethodsleadtorelativelygoodresults(moviedomain)
Noconsistentwinner;smallmemor foot rintofnetworkmodel
- 39 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
40/139
Pros:
wellunderstood,workswellinsomedomains,noknowledgeengineeringrequired
requiresusercommunity,sparsityproblems,nointegrationofotherknowledgesources,
noexplanationofresults
WhatisthebestCFmethod?
Inwhich
situation
and
which
domain?
Inconsistent
findings;
always
the
same
domains
anddatasets;Differencesbetweenmethodsareoftenverysmall(1/100)
Howtoevaluatethepredictionquality?
MAE/RMSE:WhatdoesanMAEof0.7actuallymean?
Whataboutmultidimensionalratings?
- 40 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
41/139
- 41 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
42/139
WhileCF methodsdonotrequireanyinformationabouttheitems,
itmightbereasonabletoexploitsuchinformation;and
recommendfantasy
novels
to
people
who
liked
fantasy
novels
in
the
past
Whatdoweneed:
someinformationabouttheavailableitemssuchasthegenre("content")
somesortofuserprofile describingwhattheuserlikes(thepreferences)
Thetask:
locate/recommenditemsthatare"similar"totheuserpreferences
- 42 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
43/139
" "
Thegenreisactuallynotpartofthecontentofabook
MostCBrecommendationmethodsoriginatefromInformationRetrieval
goalistofindandrankinterestingtextdocuments(newsarticles,webpages)
theitemdescriptionsareusuallyautomaticallyextracted(importantwords)
Fuzzy
border
between
content
based
and
"knowledge
based"
RS Here:
classicalIR basedmethodsbasedonkeywords
nomeansendsrecommendationknowledgeinvolved
- 43 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
44/139
Simpleapproach
Computethesimilarityofanunseenitemwiththeuserprofilebasedonthe
. .
Oruseandcombinemultiplemetrics
- 44 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
45/139
Simplekeywordrepresentationhasitsproblems
inparticularwhenautomaticallyextractedas
notevery
word
has
similar
importance
longerdocumentshaveahigherchancetohaveanoverlapwiththeuserprofile
Standardmeasure:TFIDF
EncodestextdocumentsinmultidimensionalEuclidianspace
weightedterm
vector
TF:Measures,howoftenaterma ears densit inadocument
assumingthatimportanttermsappearmoreoften
normalizationhastobedoneinordertotakedocumentlengthintoaccount
Givenakeywordi andadocumentj
TFIDF(i,j)=TF(i,j)*IDF(i)
- 45 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
46/139
- 46 - Dietmar Jannach and Markus Zanker
.
8/8/2019 Handout Recsys Sac2010
47/139
Vectorsareusuallylongandsparse
Improvements
removes opwor s a , e ,..
usestemming
sizecutoffs(onlyusetopnmostrepresentativewords,e.g.around100)
uselexicalknowledge,usemoreelaboratemethodsforfeatureselection
detectionofphrasesasterms(suchasUnitedNations)
m tat ons
semanticmeaningremainsunknown
example:
usage
of
a
word
in
a
negative
context "thereisnothingonthemenuthatavegetarianwouldlike.."
Usualsimilaritymetrictocomparevectors:Cosinesimilarity(angle)
- 47 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
48/139
Simplemethod:nearestneighbors
GivenasetofdocumentsDalreadyratedbytheuser(like/dislike)
Findthe
n
nearest
neighbors
of
an
not
yet
seen
item
i in
D
Taketheseratingstopredictarating/votefori
(Variations:neighborhoodsize,lower/uppersimilaritythresholds..)
Usedin
combination
with
method
to
model
long
term
preferences
Quer basedretrieval:Rocchio's method
TheSMARTSystem:Usersareallowedtorate(relevant/irrelevant)retrieved
documents(feedback)
Queriesarethenautomaticallyextendedwithadditionalterms/weightof
relevantdocuments
- 48 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
49/139
DocumentcollectionsD+ andD
,, usedtofinetune oftenonlypositivefeedback
isused
- 49 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
50/139
8/8/2019 Handout Recsys Sac2010
51/139
Sidenote:Conditionalindependenceofeventsdoesinfactnothold
"NewYork","HongKong"
Still,
oodaccurac
can
be
achieved
Booleanrepresentationsimplistic
positionalindependenceassumed
keywordcountslost
Moreelaborateprobabilisticmethods
e.g.,estimateprobabilityoftermvoccurringinadocumentofclassCby
relativefrequencyofvinalldocumentsoftheclass
SupportVectorMachines,..
Useotherinformationretrievalmethods(usedbysearchengines..)
- 51 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
52/139
Keywordsalonemaynotbesufficienttojudgequality relevanceofa
documentorwebpage
up
to
dateness,
usability,
aesthetics,
writing
style
contentmayalsobelimited/tooshort
contentmaynotbeautomaticallyextractable(multimedia)
ampupp aserequ re
Sometraining
data
is
still
required
Web2.0:Useothersourcestolearntheuserpreferences
Overspecialization
Algorithmstendtopropose"moreofthesame"
Or:too
similar
news
items
- 52 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
53/139
8/8/2019 Handout Recsys Sac2010
54/139
Conversationalinteractionstrategy
Opposedtooneshotinteraction
Elicitationof
user
re uirements
Transferofproductknowledge(educatingusers)
Explicit
domain
knowledge Requirementselicitationfromdomainexperts
Systemmimicsthebehaviourofexperiencedsalesassistant
Bestpracticesalesinteractions
Can
guarantee
correct
recommendations
(determinism)
- 54 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
55/139
8/8/2019 Handout Recsys Sac2010
56/139
8/8/2019 Handout Recsys Sac2010
57/139
now e ge ase
Usuallymediatesbetweenusermodelanditemproperties
Variables Usermodelfeatures(requirements),Itemfeatures(catalogue)
Setofconstraints
B)
Hardand
soft/weighted
constraints
Solution references
Deriveasetofrecommendableitems
Fulfillingsetofapplicableconstraints
Applicabilityof
constraints
depends
on
current
user
model
Explanations transparentlineofreasoning
- 57 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
58/139
Severa erentrecommen at ontas s:
Findasetofuserrequirementssuchthatasubsetofitemsfulfills allconstraints
Askuser
which
re uirements
should
be
relaxed/modified
such
that
some
items
exist
that
do
not
violateanyconstraint(e.g.Trip@dvice [MRB04])
Similar to findamaximally succeeding subquery (XSS)[McSherry05]
Allproposed items have to fulfill the sameset ofconstraints (e.g.[FFJ+06])
Com ute relaxations based on redetermined wei hts
Rankitemsaccordingtoweightsofsatisfiedsoftconstraints
an
ems
ase
on
era o
o
u e
cons ra n s
Doesnotrequireadditionalrankingscheme
- 58 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
59/139
Powershot XYWeight LHS RHS
=
Know ledge Base: Product catalogue:
Lower focal length 35
Upper focal length 140
.
C2: 20 Motives = Landscape Low. foc. Length =28mmandPrice>350EUR
Computationofminimalrevisionsofrequirements
Eventuallyguidedbysomepredefinedweightsorpast communitybehavior
- 60 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
61/139
Twomaximallysucceedingsubqueries
XSS={{C1},{C2,C3}}
Selectioncan
be
based
on
constraints
wei hts
RelaxC1andrecommendLumix
- 61 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
62/139
Applicable: LHS(c) is satisfied by user model, i.e. {C1,C2,C3}
Satisfied: not applicable or RHS(c) is satisfied by catalogue item,i.e. {C1} for Powershot and {C2,C3} for Lumix
Onl items that satisf all hard constraints receive ositive score(fulfilled for both)
Ratio of penalty values of satisfied constraints
Ranked #1: Lumix 35/ 60
Ranked #2: Panasonic: 25/ 60
Cutoff recommendation list after n items
- 62 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
63/139
Morevariants ofconstraint representation:
Interactiveacquisition ofsolution preferences
. .
To explore cheaper variants ofcurrent proposal
Max.cost of350EURfor Canon brand initially specified,higher price
sensitivity for Panasonic brand?
Aging/Outdating ofolder
preferences
Construction ofdecision model/tradeoffanalysis
Disjunctive RHS
IFlocation requ.=nearbyTHENlocation =Ktn ORlocation =Stmk
.g.
e
erma
spa s ou e oca e e er n
ar n a or yr a
- 63 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
64/139
Morevariants ofrecommendation task
Finddiversesets ofitems
Notion
of
similarity/dissimilarity
Idea that users navigate aproduct space
If recommendations are more diversethan users can navigate viacritiques on
recommended entr ointsmore efficientl less ste s ofinteraction
Bundling ofrecommendations
E.g.travel packages,skin care treatments or financial portfolios
RSfor differentitemcategories,CSPrestricts configuring ofbundles
- 64 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
65/139
Findoptimalsequence ofconversational moves
Recommendation is less about optimalalgorithms,butmore about
Asking for requirements,proposing items (that can be critiqued)or
showing explanatory texts are allconversational moves
Interactionprocess towards preference elicitation andmaybe also
user persuasion
- 65 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
66/139
Costofknowledgeacquisition
Fromdomainexperts
Fromusers
Fromwebresources
Accuracyofpreferencemodels
Veryfine
granular
preference
models
require
many
interaction
cycles
Independenceassumptioncanbechallenged
Preferencesarenotalwaysindependentfromeachother
E.g.asymmetricdominanceeffectsandDecoyitems
- 66 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
67/139
- 67 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
68/139
A t ree asetec n quesarenatura y ncorporate yagoo sa esass stance
(atdifferentstagesofthesalesact)buthavetheirshortcomings
Ideaofcrossingtwo(ormore)species/implementations
hybrida [lat.]:denotesanobjectmadebycombiningtwodifferentelements
Avoidsomeoftheshortcomings
Reachdesirable
properties
not
(or
only
inconsistently)
present
in
parent
individuals
Differenthybridizationdesigns
Paralleluseofseveralsystems
Monolithicexploiting
different
features
Pipelinedinvocationofdifferentsystems
- 68 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
69/139
Onlyasinglerecommendationcomponent
Hybridizationisvirtualinthesensethat
Features/knowledgesourcesofdifferentparadigmsarecombined
- 69 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
70/139
Combinationofseveralknowledgesources
E.g.:Ratingsanduserdemographicsorexplicitrequirementsandneedsused
for
similarity
computation
H bridcontentfeatures:
Socialfeatures:Movieslikedbyuser
Contentfeatures:Comedieslikedbyuser,dramaslikedbyuser
y r ea ures:user esmanymov es a arecome es,
thecommonknowledgeengineeringeffortthatinvolvesinventinggood
featurestoenablesuccessfullearning[BHC98]
- 70 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
71/139
Contentboostedcollaborativefiltering MMN02
Basedoncontentfeaturesadditionalratingsarecreated
E. .Alice
likes
Items
1
and
3
unar
ratin s
Item7issimilarto1and3byadegreeof0,75
ThusAlicelikesItem7by0,75
Significance
weighting
and
adjustment
factors Peerswithmorecorateditemsaremoreimportant
Higherconfidenceincontentbasedprediction,ifhighernumberofown
ratings
+
Citationsinterpretedascollaborativerecommendations
- 71 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
72/139
Outputofseveralexistingimplementationscombined
Leastinvasivedesign
Someweightingorvotingscheme
Weightscanbelearneddynamically
- 72 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
73/139
Compute weig te sum:
Recommender 2Recommender 1tem 0.8 2
Item2 0.9 1
Item3 0.4 3
Item4 0
Item1 0.5 1
Item2 0
Item3 0.3 2
Item4 0.1 3tem 0Item5 0
Item1 0,65 1
Recommender weighted (0.5:0.5)
em ,Item3 0,35 3
Item4 0,05 4
Item5 0,00
- 73 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
74/139
BUT,howtoderiveweights?
Estimate,e.g.byempiricalbootstrapping
Historicdataisneeded
Computedifferentweightings
Decidewhichonedoesbest
ynam ca us men
o
we g s
Startwithforinstanceuniformweightdistribution
Foreachuseradaptweightstominimizeerrorofprediction
- 74 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
75/139
LetsassumeAliceactuallybought clickedonitems1and4
IdentifyweightingthatminimizesMeanAbsoluteError(MAE)
-
Beta1 Beta2 rec1 rec2 error MAE
Item1 0,5 0,8 0,23Item4 0,1 0,0 0,99 0,610,1 0,9
em , , ,
Item4 0,1 0,0 0,97
Item1 0,5 0,8 0,35
Item4 0,1 0,0 0,95
0,63
0,650,5
0,3 0,7
0,5
Item1 0,5 0,8 0,41
Item4 0,1 0,0 0,93
Item1 0,5 0,8 0,47
0 1 0 0 0 91
0,670,7 0,3
- 75 -
Dietmar Jannach and Markus Zanker
,, ,
8/8/2019 Handout Recsys Sac2010
76/139
BUT:didntrec1actuallyrankItems1and4higher?
Item1 0.8 2
Recommender 2
Item1 0.5 1
Recommender 1
tem .
Item3 0.4 3
Item4 0
Item5 0
Item2 0
Item3 0.3 2
Item4 0.1 3
Item5 0
Becarefulwhenweighting!
Recommendersneedtoassigncomparablescoresoverallusersanditems
Somescore
trans ormation
cou
e
necessary
Stableweightsrequireseveraluserratings
- 76 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
77/139
8/8/2019 Handout Recsys Sac2010
78/139
8/8/2019 Handout Recsys Sac2010
79/139
Successorsrecommendationsarerestrictedbypredecessor
Whereforall k>1
Subsequentrecommendermaynotintroduceadditionalitems
Thusproducesverypreciseresults
- 79 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
80/139
Recommender 2Recommender 1em .
Item2 0.9 1
Item3 0.4 3
Item4 0
Item1 0.5 1
Item2 0
Item3 0.3 2
Item4 0.1 3
temItem5 0
Recommender cascaded (rec1, rec2)
,
Item2 0,00
Item3 0,40 2
Item4 0,00
Item5 0 00
Recommendationlistiscontinuallyreduced
Firstrecommenderexcludesitems
Removeabsolute
no
go
items
(e.g.
knowledge
based)
Secondrecommenderassignsscore
- 80 -
Dietmar Jannach and Markus Zanker
. .
8/8/2019 Handout Recsys Sac2010
81/139
uccessorexp o samo e e a u ypre ecessor
xamp es:
Fab:
Onlinenewsdomain
CBrecommenderbuildsusermodelsbasedonweightedtermvectors
CFidentifiessimilarpeersbasedontheseusermodelsbutmakesrecommendationsbasedonratings
o a ora vecons ra n
ase
me a
eve
Collaborativefilteringlearnsaconstraintbase
KnowledgebasedRScomputesrecommendations
- 81 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
82/139
Onlyfewworksthatcomparestrategiesfromthemetaperspective
Likeforinstance,[Burke02]
Most
datasets
do
not
allow
to
com are
different
recommendation
aradi ms i.e.ratings,requirements,itemfeatures,domainknowledge,critiquesrarely
availableinasingledataset
Monolithic:somepreprocessingefforttradedinformoreknowledgeincluded
Parallel:requires
careful
matching
of
scores
from
different
predictors
:
Netflixcompetition stackingrecommendersystems
Adaptiveswitchingofweightsbasedonusermodel,contextandmeta
features
- 82 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
83/139
- 83 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
84/139
Amyriadoftechniqueshasbeenproposed,but
Whichoneisbestinagivenapplicationdomain?
What
are
the
success
factors
of
different
techni ues? Comparativeanalysisbasedonanoptimalitycriterion?
Researchquestionsare:
IsaRSefficientwithrespecttoaspecificcriterialikeaccuracy,user
sa s ac on,response me,seren p y,on neconvers on,rampupe or s,
.
Docustomerslike/buyrecommendeditems?
Docustomers
buy
items
they
otherwise
would
have
not?
Aretheysatisfiedwitharecommendationafterpurchase?
- 84 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
85/139
eren perspec ves aspec s
Depends ondomain andpurpose
No wholistic evaluation scenario exists
Retrieval perspective
Reduce search costs
Provide correctproposals
Usersknow inadvance what they want
Recommendation perspective
eren p y Usersdid notknow about existence
- 85 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
86/139
re c on perspec ve
Predict to what degree users like anitem
Mostpopular evaluation scenario inresearch
Interactionperspective
Educate users about the product domain
Persuade users as anintentional
planned effect!?
Finally,conversion perspective
ommerc a
s ua ons Increase hit,clickthru,lookers to bookersrates
Optimize sales margins and profit
- 86 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
87/139
Characterizingdimensions:
Whoisthesubject thatisinthefocusofresearch?
Whatresearchmethodsarea lied?
Inwhichsetting doestheresearchtakeplace?
Subject Online customers, students, historicalonline sessions, computers,
Research method Experiments, quasi-experiments, non-experimental research
Settin Lab real-world scenarios
- 87 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
88/139
Experimentalvs.nonexperimental observational researchmethods
Experiment(test,trial):
Anexperimentisastudyinwhichatleastonevariableismanipulatedand
unitsarerandomlyassignedtodifferentlevelsorcategoriesofmanipulated
variable(s).
Units:users,historicsessions,
Manipulatedvariable:
type
of
RS,
recommended
items,
Cate ories of mani ulated variable s : contentbased RS collaborative RS
- 88 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
89/139
- 89 -
Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
90/139
MeanAbsoluteError MAE computesthedeviationbetweenpredicted
ratingsandactualratings
RootMean
Square
Error
(RMSE)
is
similar
to
MAE,
but
places
more
- 90 -
Dietmar Jannach and Markus Zanker
, ,
8/8/2019 Handout Recsys Sac2010
91/139
PrecisionandRecall,twostandardmeasuresfromInformationRetrieval,
aretwoofthemostbasicmethodsforevaluatingrecommendersystems
E.g.Considerthemoviepredictionsmadebyasimplifiedrecommender
thatclassesmoviesas oodorbad
Theycanbesplitintofourgroups:
Realit
ActuallyGood ActuallyBad
on Rated TruePositive(tp) FalsePositive(fp)
Predict
i
ooRated
Bad
FalseNegative (fn) True Negative(tn)
- 91 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
92/139
Precision:ameasureofexactness,determinesthefractionofrelevant
itemsretrievedoutofallitemsretrieved
E.g.theproportionofrecommendedmoviesthatareactuallygood
Recall:a
measure
of
completeness,
determines
the
fraction
of
relevant
E.g.theproportionofallgoodmoviesrecommended
- 92 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
93/139
ne exper mentat on n ne exper mentat on
Historic session Live interaction
Ratings, transactions Ratings, feedback
,interpreted as dislikes
items unknown
determined
Better for estimating Recall Better for estimating Precision
- 93 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
94/139
RankScoreextendstherecallmetrictotakethepositionsofcorrect
itemsinarankedlistintoaccount
Particularlyimportantinrecommendersystemsaslowerrankeditemsmaybe
overlookedby
users
RankScoreisdefinedastheratiooftheRankScoreofthecorrectitems
o es eore ca an coreac eva e or euser, .e.
h isthesetofcorrectlyrecommendeditems,i.e.hits
rankreturnstheposition(rank)ofanitem
Tisthesetofallitemsofinterest
isthe
rankinghalflife
- 94 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
95/139
Net ix competition Web-based movie rental
Prize of $1,000,000 for accuracy
improvement of 10% compared to ownCinematch system.
s or ca a ase
~480K users rated ~18K movies on ascale of 1 to 5
~100M ratings
Last 9 ratings/user withheld Probe set for teams for evaluation
Quiz set evaluates teams submissions
Test set used by Netflix to determinewinner
- 95 -
8/8/2019 Handout Recsys Sac2010
96/139
Winning team combined more than 100 different predictors
Small group of controversial movies responsible for high share of error rate
E. . sin ular value decom osition, a techni ue for derivin the underl in
factors that cause viewers to like or dislike movies, can be used to find
connections between movies
Interestin l most teams used similar rediction techni ues
Very complex and specialized models
Switching of model parameters based on user/session features
Number of rated items
Content features added noise
- 96 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
97/139
8/8/2019 Handout Recsys Sac2010
98/139
Quasiexperiments
Lackrandomassignmentsofunitstodifferenttreatments
Nonexperimental/observationalresearch
Longitudinalresearch
Observationsover
long
period
of
time
.g. ustomer et meva ue,return ngcustomers
Casestudies
Focusgroup
Interviews
Thinkaloudprotocols
- 98 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
99/139
SkiMatcherResortFinderintroducedbySkiEurope.comtoprovideusers
withrecommendationsbasedontheirpreferences
ConversationalRS
questionandanswerdialog
matchingofuserpreferenceswithknowledgebase
e ga oan av soneva ua e e
effectivenessoftherecommenderovera
4monthperiodin2001
Classifiedas
a
quasi
experiment
asusersdecideforthemselvesifthey
wanttousetherecommenderornot
- 99 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
100/139
u y ugus ep em er c o er
UniqueVisitors 10,714 15,560 18,317 24,416
SkiMatcherUsers 1,027 1,673 1,878 2,558
NonSkiMatcher Users 9,687 13,887 16,439 21,858
RequestsforProposals 272 506 445 641
SkiMatcherUsers 75 143 161 229
NonSkiMatcher Users 197 363 284 412
Conversion 2.54% 3.25% 2.43% 2.63% SkiMatcherUsers 7.30% 8.55% 8.57% 8.95%
NonSkiMatcher Users 2.03% 2.61% 1.73% 1.88%
IncreaseinConversion 359% 327% 496% 475%
[Delgado and Davidson, ENTER 2002]
- 100 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
101/139
Thenatureofthisresearchdesignmeansthatquestionsofcausality
cannotbeanswered,suchas
Areusersoftherecommendersystemsmorelikelyconvert?
Doestherecommendersystemitselfcauseuserstoconvert?
However,significantcorrelationbetweenusingtherecommender
system
and
making
a
request
for
a
proposal
Sizeofeffecthasbeenreplicatedinotherdomains!
Electronicconsumerproducts
- 101 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
102/139
8/8/2019 Handout Recsys Sac2010
103/139
Evaluationdesigns ACMTOIS20042008
Intotal12articles onRS
50%movie domain
75%offlineexperimentation
2user experiments under labconditions
qua a veresearc
va a ty o ata eav y ases w at s one
Many tagrecommendersproposed recently
Tenorat RecSys09to foster liveexperiments
Publicinfrastructures to enable A/B
tests
- 103 - Dietmar Jannach and Markus Zanker
What are recommender systems for?
8/8/2019 Handout Recsys Sac2010
104/139
Whatarerecommendersystemsfor?
Introduction
Howdotheywork?
o a ora ve
er ng ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
Howto
measure
their
success?
Evaluationtechni ues
CasestudyonthemobileInternet
Selectedrecenttopics
Attacks
on
CF
Recommender
Systems RecommenderSystemsintheSocialWeb
What to ex ect?
- 104 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
105/139
- 105 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
106/139
TheMovieLensdataset,others
FocusonimprovingtheMeanAbsoluteError
Nearlynorealworldstudies
Exceptions,e.g.,Diasetal.,2008.
eGrocerapplication
CF
method Shortterm:belowone ercent
Longterm,indirecteffectsimportant
Thisstudy
Measuringimpact
of
different
RS
algorithms
in
Mobile
Internet
scenario
Morethan3%moresalesthroughpersonalizeditemordering
- 106 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
107/139
Gamedownloadplatformoftelco provider
Accessviamobilephone
directdownload,char edtomonthl statement
lowcostitems(0.99centtofewEuro)
Extensiontoexistingplatform
"Myrecommendations"
Incategory
personalization
(where
applicable)
,
Controlgroup
naturaloreditorialitemranking
no"My
Recommendations"
- 107 - Dietmar Jannach and Markus Zanker
.
8/8/2019 Handout Recsys Sac2010
108/139
6recommendationalgorithms,1controlgroup A Btest
CF(itemitem,SlopeOne),Contentbasedfiltering,SwitchingCF/Content
basedhybrid,toprating,topselling
Testperiod:
4weeksevaluationperiod
About150,000usersassignedrandomlytodifferentgroups
Onlyexperiencedusers
H1:Pers.recommendationsstimulate moreuserstoviewitems
H2:Person.recommendationsturn morevisitorsintobuyers
H3:Pers.
recommendations
stimulate
individual users to
view
more
items
H3:Pers.recommendationsstimulateindividual users tobuymoreitems
- 108 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
109/139
Clickandpurchasebehaviorofcustomers
Customersarealwaysloggedin
Allnavi ationactivitiesstoredins stem
Measurementstakenindifferentsituations
MyRecommendations,startpage,postsales,incategories,overalleffects
Metrics:
item
viewers/platform
visitors item urchasers latform visitors
itemviewspervisitor
purchasespervisitor
Implicit
and
explicit
ratings
Itemview,itempurchase,explicitratings
- 109 - Dietmar Jannach and Markus Zanker
" "
8/8/2019 Handout Recsys Sac2010
110/139
Itemviews customer Purchases customer
Itemviews:
ExceptSlopeOne,allpersonalizedRSoutperformnonpersonalizedtechniques
Itempurchases
measura ys mu a eusers o uy own oa more ems
Contentbasedmethoddoesnotworkwellhere
Conversionrates:Nostrongeffects
- 110 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
111/139
per visitor rate
Note: Only 2 demosin top 30 downloads
Demosandnonfreegames:
Previousfigures
counted
all
downloads
Figureshows
Personalizedtechniquescomparabletotopsellerlist
However,canstimulateinterestindemogames
Note:Ratingpossibleonlyafterdownload
- 111 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
112/139
Itemviews/visitor Purchases/visitor
Findings " ",
notworkwell
TopRatingandSlopeOnenearlyexclusivelystimulatedemodownloads(Not
TopSellerundcontrolgroupsellnodemos
- 112 - Dietmar Jannach and Markus Zanker
O ll b f d l d (f f )
8/8/2019 Handout Recsys Sac2010
113/139
Overallnumberofdownloads(free+nonfreegames)
Notes:
Incategory
measurementsnot
Paygames
only
shownhere.
Contentbasedmethod
outperformsothers
in
eren ca egor es
(halfprice,newgames,
eroticgames)
Effect:3.2to3.6%sales
increase!
- 113 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
114/139
Mostprobablycausedbysizeofdisplays
Inaddition:Particularityofplatform;ratingonlyafterdownload
Insufficientcoverage
for
standard
CF
methods
Implicitratings
socount temv ewsan tempurc ases
IncreasethecoverageofCFalgorithms
MAEhowevernotasuitablemeasureanymoreforcomparingalgorithms
Summary
Significantsalesincreasecanbereached!(max.1%inpastwithother
activities
Morestudiesneeded,ValueofMAEmeasure
Recommendationinnavigationalcontext
- 114 - Dietmar Jannach and Markus Zanker
Whatarerecommendersystemsfor?
8/8/2019 Handout Recsys Sac2010
115/139
Introduction
Howdotheywork?
o a ora ve
er ng ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
How
to
measure
their
success? Evaluationtechni ues
CasestudyonthemobileInternet
Selectedrecenttopics
AttacksonCFRecommenderSystems
RecommenderSystemsintheSocialWeb
What to ex ect?
- 115 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
116/139
- 116 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
117/139
Individualsmaybeinterestedtopushsomeitemsbymanipulatingthe
recommendersystem
Individualsmight
be
interested
to
decrease
the
rank
of
other
items
Somesimplymightmaywanttosabotagethesystem..
" "
Notanewissue..
A sim le strate ?
(Automatically)createnumerousfakeaccounts/profiles
Issuehighorlowratingstothe"targetitem"
==> W notwor orne g or ase recommen ers
==> Moreelaborateattackmodelsrequired
==>Goalistoinsertprofilesthatwillappearinneighborhoodofmany
- 117 - Dietmar Jannach and Markus Zanker
Push Nuke
8/8/2019 Handout Recsys Sac2010
118/139
Push Nuke
Notthesameeffectsobserved
Howcostlyisittomakeanattack?
Howmanyprofileshavetobeinserted?
Isknowledgeabouttheratingsmatrixrequired?
usuallyitisnotpublic,butestimatescanbemade
gor m epen a y
Istheattackdesignedforaparticularrecommendationalgorithm?
Howeasyisittodetecttheattack
- 118 - Dietmar Jannach and Markus Zanker
Generalschemeofanattackprofile
8/8/2019 Handout Recsys Sac2010
119/139
p
Attackmodelsmainlydifferinthewaytheprofilesectionsarefilled
Randomattack
model
Takerandomvaluesforfilleritems
Typicaldistributionofratingsisknown,e.g.,forthemoviedomain
(Average
3.6,
standard
deviation
around
1.1) Limitedeffectcomparedwithmoreadvancedmodels
- 119 - Dietmar Jannach and Markus Zanker
TheAverageAttack
8/8/2019 Handout Recsys Sac2010
120/139
g
usetheindividualitem'sratingaverageforthefilleritems
intuitivel ,thereshouldbemorenei hbors
additionalcost
involved:
find
out
the
average
rating
of
an
item
moreeffectivethanRandomAttackinuserbasedCF
Bytheway:whatdoeseffectivemean?
Possible
metrics
to
measure
the
introduced
bias
deviationingeneralaccuracyofalgorithm
Stability
c angeinpre iction oratargetitem e ore a terattac
Inaddition:rankmetrics
HowoftendoesanitemappearinTopNlists(before/after)
- 120 - Dietmar Jannach and Markus Zanker
BandwagonAttack
8/8/2019 Handout Recsys Sac2010
121/139
g
Exploitsadditionalinformationabout thecommunityratings
Sim leidea:
Addprofiles
that
contain
high
ratings
for
"blockbusters"
(in
the
selected
items);userandomvaluesforthefilleritems
Will intuitivel lead to more nei hbors
SegmentAttack
Finditems
that
are
similar
to
the
target
item,
i.e.,
are
probably
liked
by
the
samegroupofpeople(e.g.,otherfantasynovels)
Injectprofilesthathavehighratingsforfantasynovelsandrandomorlow
ratingsforothergenres
Thus,item
will
be
pushed
within
the
relevant
community
- 121 - Dietmar Jannach and Markus Zanker
Ingeneral
8/8/2019 Handout Recsys Sac2010
122/139
Effectdependsmainlyontheattacksize(numberoffakeprofilesinserted)
Bandwagon/AverageAttack: Biasshiftof1.5points ona5pointscaleat3%
attacksize(3%ofprofilesarefakedaftertheattack)
AverageAttackslightlybetterbutrequiresmoreknowledge
1.5pointsshiftissignificant;3%attacksizehowevermeansinsertinge.g.,
30.000profiles
into
one
million
rating
database
Itembasedrecommenders
Farmorestable;only0.15pointspredictionshiftachieved
Exception:
Segment
attack
successful
(was
designed
for
item
based
method) Hybridrecommendersandothermodelbasedalgorithmscannotbeeasily
biased(withthedescribed/knownattackmodels)
- 122 - Dietmar Jannach and Markus Zanker
Usemodelbasedorhybridalgorithms
8/8/2019 Handout Recsys Sac2010
123/139
Increaseprofileinjectioncosts
ap c as Lowcostmanualinsertion
detectgroupsofuserswhocollaboratetopush/nukeitems
monitordevelopment
of
ratings
for
an
item
changesinaveragerating
changesinratingentropy
timedependentmetrics(bulkratings)
usemachine
learning
methods
to
discriminate
real
from
fake
profiles
- 123 - Dietmar Jannach and Markus Zanker
Notdiscussedhere:Privacyensuringmethods
8/8/2019 Handout Recsys Sac2010
124/139
Distributedcollaborativefiltering,dataperturbation
Vulnerabilityofsomeexistingmethodsshown
Speciallydesignedattackmodelsmayalsoexistforuptonowratherstable
methods
Incorporationofmoreknowledgesources/hybridizationmayhelp
Nopublicinformationonlargescalerealworldattackavailable
Attacksizesarestillrelativelyhigh
Moreresearch
and
industry
collaboration
required
- 124 - Dietmar Jannach and Markus Zanker
Whatarerecommendersystemsfor?
Introduction
8/8/2019 Handout Recsys Sac2010
125/139
Howdotheywork?
o a ora ve er ng
ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
Howtomeasuretheirsuccess?
Evaluationtechni ues
CasestudyonthemobileInternet
Selectedrecenttopics
AttacksonCFRecommenderSystems
RecommenderSystemsintheSocialWeb
What to ex ect?
- 125 - Dietmar Jannach and Markus Zanker
8/8/2019 Handout Recsys Sac2010
126/139
- 126 - Dietmar Jannach and Markus Zanker
TheWeb2.0 SocialWeb
8/8/2019 Handout Recsys Sac2010
127/139
Facebook,Twitter,Flickr,
Peo leactivel contributeinformationand artici ateinsocialnetworks
Impactonrecommendersystems
Moreinformationaboutuser'sanditemsavailable
demographicinformationaboutusers
friendshiprelationships
ta sonresources
NewapplicationfieldsforRStechnology
Recommendfriends,resources(pictures,videos),oreventagstousers
==
==>Currently,
lots
of
papers
published
on
the
topic
- 127 - Dietmar Jannach and Markus Zanker
Explicittruststatementsbetweenusers
f ( )
8/8/2019 Handout Recsys Sac2010
128/139
canbeexpressedonsomesocialwebplatforms(epinions.com)
couldbederivedfromrelationshi sonsocial latforms
Trustis
a
multi
faceted,
complex
concept
Goeshoweverbeyondan"implicit"trustnotionbasedonratingsimilarity
Exploitingtrust
information
in
RS
toimproveaccuracy(neighborhoodselection)
toincreasecoverage
couldbeusedtomakeRSrobustagainstattacks
- 128 - Dietmar Jannach and Markus Zanker
Input
i i
8/8/2019 Handout Recsys Sac2010
129/139
ratingmatrix
ex licittrustnetwork ratin sbetween0 notrust,and1 fulltrust
Prediction
basedonusualweightedcombinationofratingsofthenearestneighbors
similarityofneighborsishoweverbasedonthetrustvalue
Note: AssumestandardPearsonCFwithmin. 3
peersandsimilaritythreshold=0.5
NorecommendationforApossible
However:Assumingthattrustistransitive,
alsotheratingofEcouldbeused
Goodforcoldstartsituations
- 129 - Dietmar Jannach and Markus Zanker
Trust ro a ation
Variousalgorithmsandpropagationschemespossible(includingglobal
8/8/2019 Handout Recsys Sac2010
130/139
"reputation"metrics
Recommendation
accuracy
hybridscombiningsimilarityandtrustshowntobemoreaccurateinsome
experiments
SymmetryandDistrust
Trustis
not
symmetric
Howtodealwithexplicitdistruststatements?
IfAdistrustsBandBdistrusts whatdoesthistellusaboutA'srelationtoC?
va ua on
Accuracyimprovementspossible;increaseofcoverage
Notmanypubliclyavailabledatasets
- 130 - Dietmar Jannach and Markus Zanker
CollaborativetaggingintheWeb2.0
Usersadd tags to resources (such as images)
8/8/2019 Handout Recsys Sac2010
131/139
Usersaddtagstoresources(suchasimages)
Folksonomiesarebasedonfreel usedke words e. .,onflickr.com
Note:not
as
formal
as
ontologies,
but
more
easy
to
acquire
FolksonomiesandRecommenderSystems?
Usetagstorecommenditems
UseRStechnologytorecommendtags
- 131 - Dietmar Jannach and Markus Zanker
Ta sascontentannotations
usecontentbasedalgorithmstorecommendinterestingtags
8/8/2019 Handout Recsys Sac2010
132/139
Possibleapproach:
determinekeywords/tags
that
user
usually
uses
for
his
highly
rated
movies
findunratedmovieshavingsimilartags
Metrics:
takekeywordfrequenciesintoaccount
com areta clouds sim leoverla ofmovieta sandusercloud wei htedcomparison)
Possibleimprovements:
tagsofausercanbedifferentfromcommunitytags(plus:synonymproblem)
addsemanticallyrelatedwordstoexistingonesbasedonWordNet
information
- 132 - Dietmar Jannach and Markus Zanker
DifferencetocontentboostedCF
tags/keywordsare not "global" annotations but local for a user
8/8/2019 Handout Recsys Sac2010
133/139
tags/keywordsarenot global annotations,butlocalforauser
,
remember,inuserbasedCF:
similarityofusersisusedtomakerecommendations
here:viewtagsasadditionalitems(0/1rating,ifuserusedatagornot);thus
similarityisalsoinfluencedbytags
likewise:in
item
based
CF,
view
tags
as
additional
users
(1,
if
item
was
labeled
withatag)
Predictions
com neuser ase an tem ase pre ct ons nawe g te approac
experimentsshowthatonlycombinationofbothhelpstoimproveaccuracy
- 133 - Dietmar Jannach and Markus Zanker
ItemretrievalinWeb2.0applications
oftenbasedonoverlapofquerytermsanditemtags
i ffi i tf t i i th "l t il" f it
8/8/2019 Handout Recsys Sac2010
134/139
insufficientforretrievingthe"longtail"ofitems
thinkof
possible
tags
of
a
car:
"Volkswagen",
"beetle",
"red",
"cool"
Oneapproach:SocialRanking
useCFmethodstoretrieverankedlistofitemsforgivenquery
computeuserandtagsimilarities(e.g.,basedoncooccurrence)
extenduserquerywithsimilartags(improvescoverage)
rankitemsbasedon
relevanceoftagstothequery
similarityoftaggerstothecurrentuser
leadstomeasurablybettercoverageandlongtailretrieval
- 134 - Dietmar Jannach and Markus Zanker
Remember:Users
annotate
items
very
differently
RStechnologycanbeusedtohelpusersfindappropriatetags
8/8/2019 Handout Recsys Sac2010
135/139
thus,makingtheannotationsofitemsmoreconsistent
oss eapproac :
DerivetwodimensionalprojectionsofUserXTagXResourcedata
Usenearestneighborapproachtopredictitemrating
useoneoftheprojections
Evaluation
UserTagsimilaritybetterthanUserResource
differencesondifferentdatasets;alwaysbetterthan"mostpopular(by
resource)"strategy
o an :
ViewfolksonomyasgraphandapplyPageRankidea
Methodoutperformsotherapproaches
- 135 - Dietmar Jannach and Markus Zanker
Whatarerecommendersystemsfor?
Introduction
8/8/2019 Handout Recsys Sac2010
136/139
Howdotheywork?
o a ora ve er ng
ContentbasedFiltering
KnowledgeBasedRecommendations
HybridizationStrategies
Howtomeasuretheirsuccess?
Evaluationtechni ues
CasestudyonthemobileInternet
Selectedrecenttopics
AttacksonCFRecommenderSystems
RecommenderSystems
in
the
Social
Web
What to ex ect?
- 136 - Dietmar Jannach and Markus Zanker
RSresearch willbecome much more diverse
Less focus onexplicitratings
B t i f f f db k h i d k l d
8/8/2019 Handout Recsys Sac2010
137/139
Butvarious forms offeedback mechanisms andknowledge
Social and Semantic Web automated knowled e extraction
Contextawareness
(beyond
geographical
positions)
Less focus onalgorithms
Explainingandtrustbuilding
Persuasiveaspects
ess ocus ono neexper men a on
Butliveexperiments,realworld case studies,
Morefocus oncausal relationships
When,where andhow to recommend?
Consumer/Sales
psychology
Consumerdecisionmakingtheories
- 137 - Dietmar Jannach and Markus Zanker
http:/ / recsys.acm.org
Questions?
8/8/2019 Handout Recsys Sac2010
138/139
Questions?
Questions?
htt : www.recommenderbook.netDietmarJannacheServicesResearchGroup
DepartmentofComputerScience
TUDortmund,
Germany
MarkusZanker
. .
P:+492317557272
Recommender Systems An Introduction byn e gen ys emsan us ness n orma cs
Institute
of
Applied
InformaticsUniversityKlagenfurt,Austria
P:+4346327003753
Dietmar Jannach, Markus Zanker, Alexander Felfernig andGerhard FriedrichCambridge University Press, to appear 2010/11
- 138 - Dietmar Jannach and Markus Zanker
[AT05]Adomavicius &Tuzhilin.Toward the next generation ofrecommender systems:asurvey
ofthe stateoftheart andpossible extensions,IEEETKDE,17(6),2005,pp.734749.
[BHC98] Basu Hirsh & Cohen Recommendation as classification using social and content based
8/8/2019 Handout Recsys Sac2010
139/139
[BHC98]Basu,Hirsh &Cohen.Recommendation as classification:using social andcontentbased
, , . .
[Burke02]Burke.
Hybrid
Recommender Systems:
Survey
and
Experiments.
UMUAI
12(4),
2002,
331370.
[FFJ+06]Felfernig,Friedrich,Jannach&Zanker.AnIntegratedEnvironmentfor the Development
ofKnowledgeBased Recommender Applications,IJEC,11(2),2006,pp.1134.
[HJ09]Hegelich &Jannach.Effectiveness ofdifferentrecommender algorithms inthe mobile
internet:
A
case study,
ACM
RecSys,
2009. .
strategies.Hypertext2009,pp.7382.
[McSherry05]McSherry.Retrieval Failure andRecovery inRecommender Systems,AIR24(34),
2005,pp.319338.
[MRB04]Mirzadeh,Ricci&Bansal.SupportingUserQueryRelaxationinaRecommender System.
ECWeb,
2004,
pp.
31
40.
[PF04]Pu &Faltings.Decision Tradeoff using example critiquing,Constraints 9(4),2004,pp.289
- 139 - Dietmar Jannach and Markus Zanker
.