
[Figure: cumulative recall vs. number of iterations $t$ for FourSquare New York, Twitter 200 tweets, FourSquare Tokyo, Twitter 1 month and MovieLens 1M, plus a diversity vs. $t$ panel; curves compare the neighbor bandit with $k = 1$, $k = 10$ and $k = 100$ against most popular and random.]

Bandit Recommender Systems

Use of bandit algorithms to generate recommendations

Usually arms = items

Personalized approaches: contextual bandits

o Stochastic versions of Probabilistic Matrix Factorization (PMF)

o Clusters of users/items

Our approach: user-based kNN with stochastic (bandit-based) neighborhood selection

A Simple Multi-Armed Nearest-Neighbor Bandit for Interactive Recommendation

Javier Sanz-Cruzado, Pablo Castells and Esther López

Universidad Autónoma de Madrid

{javier.sanz-cruzado, pablo.castells}@uam.es

esther.lopezramos@estudiante.uam.es

IR Group @ UAM

Nearest-neighbor bandit

Given a target user $u$, choose the best neighbor $v$

Conditional preference: instead of a similarity, use $P(u|v)$, the probability that $u$ likes an item that $v$ likes

Thompson sampling bandit

o Rewards are modeled as Bernoulli with mean $P(u|v)$

o $P(u|v)$ is estimated from data using posterior Beta distributions: $p(u|v) \sim \mathrm{Beta}(\alpha_v^u, \beta_v^u)$

o Hits $\alpha_v^u$: positive common ratings between $u$ and $v$: $\alpha_v^u = \alpha_0 + \sum_{i \in \mathcal{I}} r(u, i)\, r(v, i)$

o Misses $\beta_v^u$: $v$'s positive ratings not shared by $u$: $\beta_v^u = \beta_0 + \sum_{i \in \mathcal{I}} r(v, i) - \alpha_v^u$

o Select the $v$ maximizing the sampled $p(u|v)$

Recommend the item $i$ that $v$ likes most (ties broken at random)

Update $\alpha_w^u$ for all users $w \in \mathcal{U}$ who have rated the recommended item $i$; a code sketch follows
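As a concrete illustration, here is a minimal Python sketch of one select/recommend/update step. The data layout (binary `ratings[user][item]`, nested `alpha`/`beta` count dictionaries) and all names are our assumptions for illustration; the paper's released implementation (linked below) is the reference.

```python
import random
from collections import defaultdict

# Minimal sketch of the neighbor bandit, assuming binary ratings stored as
# ratings[user][item] in {0, 1}. Names and data layout are illustrative,
# not taken from the paper's released implementation.
ALPHA0, BETA0 = 1.0, 1.0                        # Beta prior parameters

alpha = defaultdict(lambda: defaultdict(int))   # alpha[u][v]: hits
beta = defaultdict(lambda: defaultdict(int))    # beta[u][v]: misses

def select(u, users, ratings):
    """Thompson sampling: sample p(u|v) per neighbor, keep the argmax."""
    v = max((w for w in users if w != u),
            key=lambda w: random.betavariate(ALPHA0 + alpha[u][w],
                                             BETA0 + beta[u][w]))
    # Recommend an item v rated positively that u has not rated yet,
    # breaking ties at random.
    pool = [i for i, r in ratings[v].items() if r == 1 and i not in ratings[u]]
    return v, (random.choice(pool) if pool else None)

def update(u, i, reward, users, ratings):
    """Credit every user w with r(w, i) = 1: a hit if u liked i, else a miss."""
    for w in users:
        if w != u and ratings[w].get(i) == 1:
            if reward == 1:
                alpha[u][w] += 1
            else:
                beta[u][w] += 1
    ratings[u][i] = reward                      # add the new rating to the matrix
```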

Datasets

Dataset #Users #Items #Ratings

FourSquare New York 1,083 38,333 227,428

FourSquare Tokyo 2,293 61,858 573,703

Twitter 1 month 9,511 9,511 650,937

Twitter 200 tweets 9,253 9,253 475,608

MovieLens 1M 6,040 3,706 1,000,209

[Diagram: the user interacts with the system; the system recommends items. The interactive recommendation loop operates over the current state, a binary rating matrix $\mathcal{R}: \mathcal{U} \times \mathcal{I} \to \{0, 1, ?\}$ (users $\times$ items):

1. Choose a target user $u$.
2. The bandit selects a neighbor $v$.
3. Neighbor $v$ selects an item $i$ to recommend.
4. Obtain $u$'s feedback $r(u, i) \in \{0, 1\}$.
5. Update $u$'s conditional preferences with $v \in \mathcal{U}_i = \{w \in \mathcal{U} : r(w, i) \neq\ ?\}$.
6. Add $r(u, i)$ to the rating matrix.]

Experiments

Evaluation approach

Offline evaluation: simulate user feedback using offline data

Extreme cold start: start with no ratings

Random target user selection (one at a time)

Metric: cumulative recall, the fraction of all positive ratings discovered so far (growth curve, sketched below)
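To make the protocol concrete, here is a minimal sketch of the simulation loop. It assumes a hypothetical `dataset[u][i]` dict of held-out binary ratings and a pluggable `recommend` function standing in for any of the compared algorithms; neither name comes from the paper's code.

```python
import random
from collections import defaultdict

# Offline simulation sketch (extreme cold start): the system starts with an
# empty rating matrix and discovers the dataset's positives one step at a time.
def simulate(dataset, recommend, num_iters):
    users = list(dataset)
    total_pos = sum(r for items in dataset.values() for r in items.values())
    known = defaultdict(dict)            # ratings revealed so far (starts empty)
    hits, curve = 0, []
    for _ in range(num_iters):
        u = random.choice(users)         # random target user, one at a time
        i = recommend(u, known)          # the system recommends an item for u
        if i is not None and i not in known[u]:
            r = dataset[u].get(i, 0)     # simulated user feedback
            known[u][i] = r              # feedback re-enters the system
            hits += r
        curve.append(hits / total_pos)   # cumulative recall growth curve
    return curve
```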

Algorithms

Our approach: nearest-neighbor bandit

Bandits (item-oriented): ε-greedy, Thompson sampling

Exploitation-only collaborative filtering: matrix factorization, user-based kNN (cosine)

Sanity checks: most popular, random

Data

Implicit feedback datasets:

o 2 Twitter follow graphs (1 month, 200 tweets)

o 2 FourSquare venue recommendation datasets (New York, Tokyo)

Explicit ratings dataset: MovieLens 1M

o We take ratings ≥ 4 as positive ($r(u, i) = 1$); a binarization sketch follows
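For MovieLens, the binarization is a single pass over the ratings file. A sketch, assuming the standard `ratings.dat` layout (`user::item::rating::timestamp`) and a local path, both of which are our assumptions:

```python
# One-pass binarization of MovieLens 1M, keeping ratings >= 4 as positive.
# File path and record layout are assumptions about the standard distribution.
positive = set()
with open("ml-1m/ratings.dat", encoding="latin-1") as f:
    for line in f:
        user, item, rating, _ = line.rstrip("\n").split("::")
        if int(rating) >= 4:
            positive.add((int(user), int(item)))   # r(u, i) = 1

print(len(positive), "positive ratings out of 1,000,209")
```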

Extension: $k$ neighbors

Given a target user $u$, select the best $k$ neighbors

Multiple-play Thompson sampling bandit

Pick $\mathcal{N}_k(u)$: the $k$ users maximizing the sampled $p(u|v)$

Recommend the item $\operatorname{argmax}_{i \in \mathcal{I}} \sum_{v \in \mathcal{N}_k(u)} p(u|v)\, r(v, i)$

Again, update $\alpha_w^u$ for all users $w \in \mathcal{U}$ who have rated $i$

This penalizes exploration in favor of exploitation (see the sketch below)
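A sketch of this multiple-play step, reusing the assumed data layout of the single-neighbor sketch above (nested `alpha`/`beta` counts, binary `ratings[user][item]`); function and parameter names are illustrative.

```python
import heapq
import random

def recommend_k(u, users, ratings, alpha, beta, k, alpha0=1.0, beta0=1.0):
    # Sample p(u|v) for every candidate neighbor, keep the k largest samples.
    sampled = {v: random.betavariate(alpha0 + alpha[u][v], beta0 + beta[u][v])
               for v in users if v != u}
    neighbors = heapq.nlargest(k, sampled, key=sampled.get)
    # Score each unseen item i by sum over v in N_k(u) of p(u|v) * r(v, i).
    scores = {}
    for v in neighbors:
        for i, r in ratings[v].items():
            if r == 1 and i not in ratings[u]:
                scores[i] = scores.get(i, 0.0) + sampled[v]
    return max(scores, key=scores.get) if scores else None
```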


Motivation

Multi-armed bandits

[Diagram: a gambler pulls one of several slot-machine arms; each pull yields a reward. Which arm maximizes the gain?]

Select the best among several actions (arms)

Exploration vs. exploitation

o Select arm with highest estimated value (exploit)

o Select arm to gain knowledge (explore)

Algorithms: ε-greedy, UCB1, Thompson Sampling, etc.
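For concreteness, here is a toy Bernoulli Thompson sampling run on three hypothetical arms; the same sample-and-argmax pattern drives the neighbor bandit above. The reward probabilities are invented for illustration.

```python
import random

# Toy Bernoulli Thompson sampling on three made-up arms.
true_probs = [0.2, 0.5, 0.7]
alpha = [1.0, 1.0, 1.0]   # Beta posterior: 1 + observed successes
beta = [1.0, 1.0, 1.0]    # Beta posterior: 1 + observed failures

for _ in range(10_000):
    # Sample one value per arm from its posterior and play the argmax:
    # uncertain arms sometimes win the draw, which provides the exploration.
    arm = max(range(3), key=lambda a: random.betavariate(alpha[a], beta[a]))
    reward = 1 if random.random() < true_probs[arm] else 0
    alpha[arm] += reward          # success: shift the posterior up
    beta[arm] += 1 - reward       # failure: shift the posterior down

best = max(range(3), key=lambda a: alpha[a] / (alpha[a] + beta[a]))
print("posterior-best arm:", best)  # converges to arm 2 with high probability
```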

Multi-armed bandits | Recommender bandits | Our approach
Arms | Items $i \in \mathcal{I}$ | Neighbors $v \in \mathcal{U}$
Rewards | Ratings $r(u, i)$ | Ratings $r(u, i)$
Estimated arm value | Metric (e.g. CTR) | Cond. preference $P(u|v)$
Context | Target user $u \in \mathcal{U}$ | Target user $u \in \mathcal{U}$

13th ACM Conference on Recommender Systems

Code for this paper is available at https://github.com/ir-uam/kNNBandit


Interactive recommendation

More realistic perspective

Recommendation operates in cycles

o Users react to and interact with recommendations

o The system gains further input from this interaction

o The input is used in the following recommendations


Results

[Figure: cumulative recall vs. number of iterations (up to 500,000), comparing the neighbor bandit ($k = 1$), the neighbor bandit with $k = 10$, user-based kNN, Thompson sampling, ε-greedy, implicit MF, most popular and random.]