Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Flavian Vasile, Elena Smirnova @Criteo; Alexis Conneau @FAIR
Contents
• Product Embeddings for Recommendation
• Embedding CF signal: Word2Vec and Prod2Vec
• Meta-Prod2Vec: Embedding with Side-Information
• Experimental Results
• Conclusions
Product Embeddings for Recommendation
Represent items (and sometimes users) as vectors in the same space and use their distances to compute recommendations.
• At a certain level, nothing new!
• We already had Matrix Factorization
• It is yet another way of creating latent representations for Recommendation
Some of the NN methods can be translated back into MF techniques. Differences:
• new ways to compute matrix entries
• new loss functions
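To make the "use their distances to compute recommendations" point concrete, here is a minimal nearest-neighbour sketch over item vectors (the toy data and function name are mine, not from the talk):

```python
import numpy as np

def recommend(query_id, item_vecs, k=3):
    """Rank all other items by cosine similarity to the query item's vector."""
    q = item_vecs[query_id]
    sims = {}
    for item, v in item_vecs.items():
        if item == query_id:
            continue
        sims[item] = np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
    # Highest cosine similarity first.
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Toy 2-d embeddings: "b" points almost the same way as "a", "c" is orthogonal.
item_vecs = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.9, 0.1]),
    "c": np.array([0.0, 1.0]),
}
top = recommend("a", item_vecs, k=2)
```

Whatever produces the vectors (MF or a neural embedding), the serving side stays this simple: a similarity search in the shared space.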
Where do we fit?
• A hybrid model that uses CF together with content side-information
• A foray into embedding methods that use side info
Embedding CF signal: Word2Vec and Prod2Vec
(Word-to-hidden matrix) × (Hidden-to-word context matrix)
Word2Vec: Skip-gram
Word2Vec
In this space, words that appear in similar contexts will tend to be close.
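The two-matrix view above can be sketched numerically: skip-gram scores each candidate context word by the dot product of the input (word-to-hidden) vector with its output (hidden-to-word) vector, normalized by a softmax. Sizes and names below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                        # vocabulary size, embedding dimension
W_in = rng.normal(size=(V, d))     # word-to-hidden matrix (input embeddings)
W_out = rng.normal(size=(V, d))    # hidden-to-word context matrix (output embeddings)

def context_probs(word_id):
    """Skip-gram: softmax over dot products of the input vector of `word_id`
    with every output (context) vector."""
    scores = W_out @ W_in[word_id]
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

p = context_probs(2)  # distribution over all V words as contexts of word 2
```

Training adjusts both matrices so that observed (word, context) pairs get high probability; in practice the full softmax is replaced by negative sampling.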
The same idea can be applied to other sequential data, such as user shopping sessions: Prod2Vec.
Words = products
Sentences = shopping sessions
Grbovic et al., E-commerce in Your Inbox: Product Recommendations at Scale, KDD 2015
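Treating sessions as sentences means the Prod2Vec training data is just Word2Vec-style skip-gram pairs over product IDs. A minimal sketch of the pair extraction (function name and session data are illustrative, not from the talk):

```python
def skipgram_pairs(session, window=2):
    """Generate (product, context-product) training pairs from one shopping
    session, exactly as Word2Vec does for a sentence of words."""
    pairs = []
    for i, p in enumerate(session):
        # All neighbours within `window` positions of product i.
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j != i:
                pairs.append((p, session[j]))
    return pairs

# One toy session of three viewed/bought products.
pairs = skipgram_pairs(["p1", "p2", "p3"], window=1)
```

These pairs are then fed to any standard skip-gram trainer with negative sampling; no change to the Word2Vec machinery is needed.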
Prod2Vec
The resulting embedding will co-locate products that appear in the vicinity of the same products.
Prod2Vec loss function
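The loss equation on this slide did not survive extraction. A plausible reconstruction, assuming the standard skip-gram negative log-likelihood applied to co-occurring product pairs (notation mine):

```latex
L_{P2V} \;=\; \sum_{(p_i,\, p_j) \in \mathcal{D}} -\log P(p_j \mid p_i),
\qquad
P(p_j \mid p_i) \;=\;
\frac{\exp\!\left(w_{p_i}^{\top} w'_{p_j}\right)}
     {\sum_{p \in \mathcal{V}} \exp\!\left(w_{p_i}^{\top} w'_{p}\right)}
```

where $\mathcal{D}$ is the set of product pairs co-occurring within a session window, $w$ are input embeddings, $w'$ are context embeddings, and the softmax over the vocabulary $\mathcal{V}$ is approximated in practice with negative sampling.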
Meta-Prod2Vec: Embedding with Side-Information
Idea: Use not only the product sequence information, but also product metadata.
Where is it useful? Product cold-start, when sequence information is sparse.
How can it help? We place additional constraints on product co-occurrences using external info, creating more noise-robust embeddings for products suffering from cold-start.
Types of product side-information:
• Categories
• Brands
• Title & Description
• Tags
How does Meta-Prod2Vec leverage this information for cold-start?
Motivating example:
Let’s say we are trying to build a recommender system for songs...
We want to build a very simple solution that recommends the next song based on the last song the user heard.
Two different recommendation situations:
• Simple: the previous song is popular
• Hard: the previous song is relatively unknown (suffers from cold start).
Simple case: Query song: Shake It Off by Taylor Swift. Best next song: All About That Bass by Meghan Trainor. CF and Prod2Vec both work!
Hard case: Query song: still by Taylor Swift, but one of her earlier songs, e.g. You're Not Sorry. Best next song: ?
Hard case + unlucky:
• Just one user listened to You're Not Sorry
• He also listened to Rammstein's Du Hast!
Your Recommendation Is Not Working!
This is where Meta-‐Prod2Vec comes in handy!
When computing how plausible it is for a user to like a pair of songs, you can place additional constraints by taking into account the song artists.
Prod2Vec constraints
P(Du Hast | You're Not Sorry) → the next song depends on the current song
Prod2Vec constraints
You're Not Sorry is a fringe song → low evidence for the positive and negative pairs
Artist metadata constraints
However, the associated artists are popular → good evidence that Taylor Swift and Rammstein do not really co-occur (should have distant vectors)
Artist and Song constraints (1)
Furthermore, we can enforce that the songs and their artists should be close...
Artist and Song constraints (2)
Finally, we add two more constraints between the artists and the previous/next song (they still have more support than the original pairs)
Meta-Prod2Vec constraints
#1. P(Rammstein | You're Not Sorry): the artist of the next song should be plausible given the current song
#2. P(Du Hast | Taylor Swift): the next song should depend on the current artist selection
#3. P(You're Not Sorry | Taylor Swift) and P(Du Hast | Rammstein): the current artist selection should also influence the current song selection
#4. P(Rammstein | Taylor Swift): the probability of the next artist should be high given the current artist.
Putting it all together: Meta-Prod2Vec loss
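The loss on this slide was an equation image that did not survive extraction. A reconstruction from the four constraints listed above, with $I$/$J$ the current/next product, $M$ the metadata (artist), and $\lambda$ the importance hyperparameter weighting the side-information terms (my reading of the paper's formulation):

```latex
L_{MP2V} \;=\; L_{J \mid I}
\;+\; \lambda \left( L_{M \mid I} + L_{J \mid M} + L_{M \mid M} + L_{I \mid M} \right)
```

Each term is a skip-gram negative log-likelihood of the same form as the Prod2Vec loss, so Prod2Vec is recovered as the special case $\lambda = 0$.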
Relationship with MF with Side-Info:
MP2V Implementation
• No changes in the Word2Vec code!
• Changes only in the input pairs: we generate (proportionally to the importance hyperparameter) 4 additional types of pairs.
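A sketch of that pair-generation change (my own illustrative code; the paper's actual sampling scheme may differ in detail): alongside each ordinary skip-gram pair, we emit product-metadata pairs, each kept with probability λ, the importance hyperparameter:

```python
import random

def meta_prod2vec_pairs(session, meta, window=2, lam=0.5, rng=None):
    """Expand ordinary skip-gram pairs with the four extra pair types of
    Meta-Prod2Vec. `meta` maps a product id to its metadata id (e.g. artist)."""
    rng = rng or random.Random(0)
    pairs = []
    for i, p in enumerate(session):
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j == i:
                continue
            q = session[j]
            pairs.append((p, q))                  # product -> product (plain Prod2Vec)
            if rng.random() < lam:
                pairs.append((p, meta[q]))        # product -> neighbour's metadata
            if rng.random() < lam:
                pairs.append((meta[p], q))        # metadata -> neighbour product
            if rng.random() < lam:
                pairs.append((meta[p], meta[q]))  # metadata -> metadata
        if rng.random() < lam:
            pairs.append((p, meta[p]))            # product -> its own metadata
    return pairs

# Toy session of two songs with their artists; lam=1.0 keeps every extra pair.
pairs = meta_prod2vec_pairs(["s1", "s2"], {"s1": "artistA", "s2": "artistB"},
                            window=1, lam=1.0)
```

The expanded pair stream is then fed to unmodified Word2Vec code, which is the whole point: the side-information lives entirely in the training data.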
Experimental Results
Task & Metrics
Task: Next Event Prediction
Metrics:
• Hit ratio at K (HR@K)
• Normalized Discounted Cumulative Gain (NDCG@K)
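For the single-ground-truth next-event setting, both metrics can be written down directly (illustrative code, not from the talk):

```python
import math

def hit_ratio_at_k(ranked, target, k=20):
    """1 if the true next item appears in the top-k recommendations, else 0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k=20):
    """With a single relevant item, NDCG reduces to the reciprocal
    log2-discount of its rank (1.0 when ranked first)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

Both are averaged over all test events to produce the numbers in the result tables.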
Methods
• BestOf: (rank by) popularity
• CoCounts: cosine similarity of candidate item to query item
• Prod2Vec: cosine similarity of item embedding vectors
• Meta-Prod2Vec: cosine similarity of improved embedding vectors
• Mix(Prod2Vec, CoCounts): linear combination of the two scores
• Mix(Meta-Prod2Vec, CoCounts): same as previous
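The Mix methods are plain linear blends of two similarity scores; α below is a hypothetical blending weight, since the talk does not give its value:

```python
def mix_score(p2v_sim, cocount_sim, alpha=0.5):
    """Linear combination of embedding similarity and co-count similarity.
    alpha is an illustrative blending weight, not specified in the talk."""
    return alpha * p2v_sim + (1.0 - alpha) * cocount_sim
```

The blend lets a tail-oriented embedding score and a head-oriented co-count score cover each other's weak spots, which is what the Global rows in the tables below measure.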
Dataset: 30Music Dataset
• playlist data from the Last.fm API
• sample of 100k user sessions
• resulting vocabulary size: 433k songs and 67k artists
Global Results

Method                       Type    HR@20   NDCG@20
BestOf                       Head    0.0003  0.002
CoCounts                     Head    0.0160  0.141
Prod2Vec                     Tail    0.0101  0.113
MetaProd2Vec                 Tail    0.0124  0.125
Mix(Prod2Vec, CoCounts)      Global  0.0158  0.152
Mix(MetaProd2Vec, CoCounts)  Global  0.0180  0.161
Results on Cold Start (HR@20)

Method                       Type    Pair freq = 0  Pair freq < 3
BestOf                       Head    0.0002         0.0002
CoCounts                     Head    0.0000         0.0197
Prod2Vec                     Tail    0.0003         0.0078
MetaProd2Vec                 Tail    0.0013         0.0198
Mix(Prod2Vec, CoCounts)      Global  0.0002         0.0200
Mix(MetaProd2Vec, CoCounts)  Global  0.0007         0.0291
Conclusions and Next Steps
Using side-info for product embeddings helps, especially on cold-start.
• Better ways to mix Head and Tail recommendation methods
• Mix CF and metadata at test time: product embeddings using all available signal (CF, categorical, text and image product information)
Thanks!
Questions?