Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Flavian Vasile, Elena Smirnova @Criteo; Alexis Conneau @FAIR
Contents
• Product Embeddings for Recommendation
• Embedding CF signal: Word2Vec and Prod2Vec
• Meta-Prod2Vec: Embedding with Side-Information
• Experimental Results
• Conclusions
Product Embeddings for Recommendation
Represent items (and sometimes users) as vectors in the same space and use their distances to compute recommendations.
• At a certain level, nothing new!
• We already had Matrix Factorization
• It is yet another way of creating latent representations for Recommendation
Some of the NN methods can be translated back into MF techniques. Differences:
• new ways to compute matrix entries
• new loss functions
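To make the "use their distances to compute recommendations" point concrete, here is a minimal nearest-neighbour sketch over item vectors (the toy data and function name are mine, not from the talk):

```python
import numpy as np

def recommend(query_id, item_vecs, k=3):
    """Rank all other items by cosine similarity to the query item's vector."""
    q = item_vecs[query_id]
    sims = {}
    for item, v in item_vecs.items():
        if item == query_id:
            continue
        sims[item] = np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
    # Highest cosine similarity first.
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy 2-d embeddings: "b" points almost the same way as "a", "c" is orthogonal.
item_vecs = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.9, 0.1]),
    "c": np.array([0.0, 1.0]),
}
top = recommend("a", item_vecs, k=2)
```

Whatever produces the vectors (MF or a neural embedding), the serving side stays this simple: a similarity search in the shared space.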
Where do we fit?
• A hybrid model that uses CF together with content side-information
• A foray into embedding methods that use side info
Embedding CF signal: Word2Vec and Prod2Vec
(Word-to-hidden matrix) × (Hidden-to-word context matrix)
Word2Vec: Skip-gram
Word2Vec
In this space, words that appear in similar contexts will tend to be close.
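The two-matrix view above can be sketched numerically: skip-gram scores each candidate context word by the dot product of the input (word-to-hidden) vector with its output (hidden-to-word) vector, normalized by a softmax. Sizes and names below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                        # vocabulary size, embedding dimension
W_in = rng.normal(size=(V, d))     # word-to-hidden matrix (input embeddings)
W_out = rng.normal(size=(V, d))    # hidden-to-word context matrix (output embeddings)

def context_probs(word_id):
    """Skip-gram: softmax over dot products of the input vector of `word_id`
    with every output (context) vector."""
    scores = W_out @ W_in[word_id]
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

p = context_probs(2)  # distribution over all V words as contexts of word 2
```

Training adjusts both matrices so that observed (word, context) pairs get high probability; in practice the full softmax is replaced by negative sampling.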
The same idea can be applied to other sequential data, such as user shopping sessions: Prod2Vec.
Words = products
Sentences = shopping sessions
Grbovic et al., E-commerce in Your Inbox: Product Recommendations at Scale, KDD 2015
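Treating sessions as sentences means the Prod2Vec training data is just Word2Vec-style skip-gram pairs over product IDs. A minimal sketch of the pair extraction (function name and session data are illustrative, not from the talk):

```python
def skipgram_pairs(session, window=2):
    """Generate (product, context-product) training pairs from one shopping
    session, exactly as Word2Vec does for a sentence of words."""
    pairs = []
    for i, p in enumerate(session):
        # All neighbours within `window` positions of product i.
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j != i:
                pairs.append((p, session[j]))
    return pairs

# One toy session of three viewed/bought products.
pairs = skipgram_pairs(["p1", "p2", "p3"], window=1)
```

These pairs are then fed to any standard skip-gram trainer with negative sampling; no change to the Word2Vec machinery is needed.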
Prod2Vec
The resulting embedding will co-locate products that appear in the vicinity of the same products.
Prod2Vec loss function
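The loss equation on this slide did not survive extraction. A plausible reconstruction, assuming the standard skip-gram negative log-likelihood applied to co-occurring product pairs (notation mine):

```latex
L_{P2V} \;=\; \sum_{(p_i,\, p_j) \in \mathcal{D}} -\log P(p_j \mid p_i),
\qquad
P(p_j \mid p_i) \;=\;
\frac{\exp\!\left(w_{p_i}^{\top} w'_{p_j}\right)}
     {\sum_{p \in \mathcal{V}} \exp\!\left(w_{p_i}^{\top} w'_{p}\right)}
```

where $\mathcal{D}$ is the set of product pairs co-occurring within a session window, $w$ are input embeddings, $w'$ are context embeddings, and the softmax over the vocabulary $\mathcal{V}$ is approximated in practice with negative sampling.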
Meta-Prod2Vec: Embedding with Side-Information
Idea: Use not only the product sequence information, but also product metadata.
Where is it useful? Product cold-start, when sequence information is sparse.
How can it help? We place additional constraints on product co-occurrences using external info, creating more noise-robust embeddings for products suffering from cold-start.
Types of product side-information:
• Categories
• Brands
• Title & Description
• Tags
How does Meta-Prod2Vec leverage this information for cold-start?
Motivating example:
Let’s say we are trying to build a recommender system for songs...
We want to build a very simple solution that recommends the next song based on the last song the user heard.
Two different recommendation situations:
• Simple: the previous song is popular
• Hard: the previous song is relatively unknown (suffers from cold start).
Simple case: Query song: Shake It Off by Taylor Swift. Best next song: All About That Bass by Meghan Trainor. CF and Prod2Vec both work!
Hard case: Query song: still by Taylor Swift, but one of her earlier songs, e.g. You're Not Sorry. Best next song: ?
Hard case + unlucky:
• Just one user listened to You're Not Sorry
• He also listened to Rammstein's Du Hast!
Your Recommendation Is Not Working!
This is where Meta-‐Prod2Vec comes in handy!
When computing how plausible it is for a user to like a pair of songs, you can place additional constraints by taking into account the song artists.
Prod2Vec constraints
P(Du Hast | You're Not Sorry) → the next song depends on the current song
Prod2Vec constraints
You're Not Sorry is a fringe song → low evidence for the positive and negative pairs
Artist metadata constraints
However, the associated artists are popular → good evidence that Taylor Swift and Rammstein do not really co-occur (should have distant vectors)
Artist and Song constraints (1)
Furthermore, we can enforce that the songs and their artists should be close...
Artist and Song constraints (2)
Finally, we add two more constraints between the artists and the previous/next song (they still have more support than the original pairs)
Meta-Prod2Vec constraints
#1. P(Rammstein | You're Not Sorry): the artist of the next song should be plausible given the current song
#2. P(Du Hast | Taylor Swift): the next song should depend on the current artist selection
#3. P(You're Not Sorry | Taylor Swift) and P(Du Hast | Rammstein): the current artist selection should also influence the current song selection
#4. P(Rammstein | Taylor Swift): the probability of the next artist should be high given the current artist.
Putting it all together: Meta-Prod2Vec loss
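The loss on this slide was an equation image that did not survive extraction. A reconstruction from the four constraints listed above, with $I$/$J$ the current/next product, $M$ the metadata (artist), and $\lambda$ the importance hyperparameter weighting the side-information terms (my reading of the paper's formulation):

```latex
L_{MP2V} \;=\; L_{J \mid I}
\;+\; \lambda \left( L_{M \mid I} + L_{J \mid M} + L_{M \mid M} + L_{I \mid M} \right)
```

Each term is a skip-gram negative log-likelihood of the same form as the Prod2Vec loss, so Prod2Vec is recovered as the special case $\lambda = 0$.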
Relationship with MF with Side-Info:
MP2V Implementation
• No changes in the Word2Vec code!
• Changes only in the input pairs: we generate (proportionally to the importance hyperparameter) 4 additional types of pairs.
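A sketch of that pair-generation change (my own illustrative code; the paper's actual sampling scheme may differ in detail): alongside each ordinary skip-gram pair, we emit product-metadata pairs, each kept with probability λ, the importance hyperparameter:

```python
import random

def meta_prod2vec_pairs(session, meta, window=2, lam=0.5, rng=None):
    """Expand ordinary skip-gram pairs with the four extra pair types of
    Meta-Prod2Vec. `meta` maps a product id to its metadata id (e.g. artist)."""
    rng = rng or random.Random(0)
    pairs = []
    for i, p in enumerate(session):
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j == i:
                continue
            q = session[j]
            pairs.append((p, q))                  # product -> product (plain Prod2Vec)
            if rng.random() < lam:
                pairs.append((p, meta[q]))        # product -> neighbour's metadata
            if rng.random() < lam:
                pairs.append((meta[p], q))        # metadata -> neighbour product
            if rng.random() < lam:
                pairs.append((meta[p], meta[q]))  # metadata -> metadata
        if rng.random() < lam:
            pairs.append((p, meta[p]))            # product -> its own metadata
    return pairs

# Toy session of two songs with their artists; lam=1.0 keeps every extra pair.
pairs = meta_prod2vec_pairs(["s1", "s2"], {"s1": "artistA", "s2": "artistB"},
                            window=1, lam=1.0)
```

The expanded pair stream is then fed to unmodified Word2Vec code, which is the whole point: the side-information lives entirely in the training data.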
Experimental Results
Task & Metrics
Task: Next Event Prediction
Metrics:
• Hit ratio at K (HR@K)
• Normalized Discounted Cumulative Gain (NDCG@K)
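For the single-ground-truth next-event setting, both metrics can be written down directly (illustrative code, not from the talk):

```python
import math

def hit_ratio_at_k(ranked, target, k=20):
    """1 if the true next item appears in the top-k recommendations, else 0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k=20):
    """With a single relevant item, NDCG reduces to the reciprocal
    log2-discount of its rank (1.0 when ranked first)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

Both are averaged over all test events to produce the numbers in the result tables.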
Methods
• BestOf: (rank by) popularity
• CoCounts: cosine similarity of candidate item to query item
• Prod2Vec: cosine similarity of item embedding vectors
• Meta-Prod2Vec: cosine similarity of improved embedding vectors
• Mix(Prod2Vec, CoCounts): linear combination of the two scores
• Mix(Meta-Prod2Vec, CoCounts): same as previous
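The Mix methods are plain linear blends of two similarity scores; α below is a hypothetical blending weight, since the talk does not give its value:

```python
def mix_score(p2v_sim, cocount_sim, alpha=0.5):
    """Linear combination of embedding similarity and co-count similarity.
    alpha is an illustrative blending weight, not specified in the talk."""
    return alpha * p2v_sim + (1.0 - alpha) * cocount_sim
```

The blend lets a tail-oriented embedding score and a head-oriented co-count score cover each other's weak spots, which is what the Global rows in the tables below measure.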
Dataset: 30Music Dataset
• playlist data from the Last.fm API
• sample of 100k user sessions
• resulting vocabulary size: 433k songs and 67k artists
Global Results

Method                       Type    HR@20   NDCG@20
BestOf                       Head    0.0003  0.002
CoCounts                     Head    0.0160  0.141
Prod2Vec                     Tail    0.0101  0.113
MetaProd2Vec                 Tail    0.0124  0.125
Mix(Prod2Vec, CoCounts)      Global  0.0158  0.152
Mix(MetaProd2Vec, CoCounts)  Global  0.0180  0.161
Results on Cold Start (HR@20)

Method                       Type    Pair freq = 0  Pair freq < 3
BestOf                       Head    0.0002         0.0002
CoCounts                     Head    0.0000         0.0197
Prod2Vec                     Tail    0.0003         0.0078
MetaProd2Vec                 Tail    0.0013         0.0198
Mix(Prod2Vec, CoCounts)      Global  0.0002         0.0200
Mix(MetaProd2Vec, CoCounts)  Global  0.0007         0.0291
Conclusions and Next Steps
Using side-info for product embeddings helps, especially on cold-start.
• Better ways to mix Head and Tail recommendation methods
• Mix CF and metadata at test time: product embeddings using all available signal (CF, categorical, text and image product information)
Thanks!
Questions?