Building multi-modal recommendation engines using search engines

DESCRIPTION

This is my Strata NY talk about how to build recommendation engines using common items. In particular, I show how multi-modal recommendations can be built using the same framework.

Page 1: Building multi-modal recommendation engines using search engines

1©MapR Technologies 2013- Confidential

Introduction to Mahout And How To Build a Recommender

Page 2: Building multi-modal recommendation engines using search engines

Topic For Today

What is recommendation?
What makes it different?
What is multi-modal recommendation?
How can I build it using common household items?

Page 3: Building multi-modal recommendation engines using search engines

Oh … Also This

Detailed break-down of a live machine learning system running with Mahout on MapR

With code examples

Page 4: Building multi-modal recommendation engines using search engines

I may have to summarize

Page 5: Building multi-modal recommendation engines using search engines

I may have to summarize

just a bit

Page 6: Building multi-modal recommendation engines using search engines

Part 1: 5 minutes of background

Page 7: Building multi-modal recommendation engines using search engines

Part 2: 5 minutes: I want a pony

Page 8: Building multi-modal recommendation engines using search engines

Page 9: Building multi-modal recommendation engines using search engines

Part 1: 5 minutes of background

Page 10: Building multi-modal recommendation engines using search engines

What Does Machine Learning Look Like?

Page 11: Building multi-modal recommendation engines using search engines

What Does Machine Learning Look Like?

$O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small $k$, high quality
$O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$, looser quality

But tonight we’re going to show you how to keep it simple yet powerful…

Page 12: Building multi-modal recommendation engines using search engines

Recommendations as Machine Learning

Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts

Page 13: Building multi-modal recommendation engines using search engines

Page 14: Building multi-modal recommendation engines using search engines

Page 15: Building multi-modal recommendation engines using search engines

Part 2: How recommenders work

(I still want a pony)

Page 16: Building multi-modal recommendation engines using search engines

Recommendations

Recap: Behavior of a crowd helps us understand what individuals will do

Page 17: Building multi-modal recommendation engines using search engines

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Alice

Charles

Page 18: Building multi-modal recommendation engines using search engines

Recommendations

Alice got an apple and a puppy

Charles got a bicycle

Bob got an apple

Alice

Bob

Charles

Page 19: Building multi-modal recommendation engines using search engines

Recommendations

What else would Bob like?

Alice

Bob

Charles

Page 20: Building multi-modal recommendation engines using search engines

Recommendations

A puppy, of course!

Alice

Bob

Charles

Page 21: Building multi-modal recommendation engines using search engines

You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)

Page 22: Building multi-modal recommendation engines using search engines

Recommendations

What if everybody gets a pony?

?

Alice

Bob

Charles

Amelia

What else would you recommend for Amelia?

Page 23: Building multi-modal recommendation engines using search engines

Recommendations

?

Alice

Bob

Charles

Amelia

If everybody gets a pony, it’s not a very good indicator of what else to predict...

Page 24: Building multi-modal recommendation engines using search engines

Problems with Raw Co-occurrence

Very popular items co-occur with everything (or why it’s not very helpful to know that everybody wants a pony…)
– Examples: Welcome document; Elevator music

Very widespread occurrence is not interesting as a way to generate indicators
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)

What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation

Page 25: Building multi-modal recommendation engines using search engines

Get Useful Indicators from Behaviors

1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations

2. Transform to a co-occurrence matrix of items x items

3. Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
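
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mahout's implementation: the log entries are made up to mirror the Alice, Bob and Charles story, and the function names `history` and `cooccurrence` are hypothetical. Step 3 (the LLR test) is not shown here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, item) log entries, invented for illustration.
log = [
    ("Alice", "apple"),
    ("Alice", "puppy"),
    ("Bob", "apple"),
    ("Charles", "bicycle"),
    ("Alice", "apple"),  # a repeated interaction collapses to one entry
]

def history(entries):
    """Step 1: sparse users x items history, stored as user -> set of items."""
    hist = defaultdict(set)
    for user, item in entries:
        hist[user].add(item)
    return dict(hist)

def cooccurrence(hist):
    """Step 2: items x items counts of users who interacted with both items."""
    counts = defaultdict(int)
    for items in hist.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the matrix symmetric
    return dict(counts)

hist = history(log)
cooc = cooccurrence(hist)
print(cooc)
```

Only pairs that actually co-occur are stored, which matches the reminder that the history is sparse compared to all potential combinations.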

Page 26: Building multi-modal recommendation engines using search engines

Log Files

[Figure: log entries, one per row, in this user order: Alice, Bob, Charles, Alice, Bob, Charles, Alice]

Page 27: Building multi-modal recommendation engines using search engines

Log Files

[Figure: log entries as two columns]
User column: u1, u3, u2, u1, u3, u2, u1
Thing column: t1, t4, t3, t2, t3, t3, t1

Page 28: Building multi-modal recommendation engines using search engines

Log Files and Dimensions

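[Figure: log entries as two columns, with the two dimensions listed alongside]
User column: u1, u3, u2, u1, u3, u2, u1
Thing column: t1, t4, t3, t2, t3, t3, t1

Things: t1, t2, t3, t4

Users: u1, u3, u2 (Alice, Bob, Charles)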

Page 29: Building multi-modal recommendation engines using search engines

History Matrix: Users by Items

Alice    ✔ ✔ ✔
Bob      ✔ ✔
Charles  ✔ ✔
[Column positions of the check marks are not preserved.]

Page 30: Building multi-modal recommendation engines using search engines

Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence count matrix; the cell layout is not recoverable from this transcript.]

How do you tell which co-occurrences are useful?

Page 31: Building multi-modal recommendation engines using search engines

Co-occurrence Matrix: Items by Items

[Figure: items-by-items co-occurrence count matrix; the cell layout is not recoverable from this transcript.]

Use LLR test to turn co-occurrence into indicators…

Page 32: Building multi-modal recommendation engines using search engines

Co-occurrence Binary Matrix

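[Figure: binary co-occurrence table; only the labels "1" and "not" survive, the cell layout is not recoverable.]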

Page 33: Building multi-modal recommendation engines using search engines

Spot the Anomaly

        A       not A
B       13      1000
not B   1000    100,000

        A       not A
B       1       0
not B   0       2

        A       not A
B       1       0
not B   0       10,000

        A       not A
B       10      0
not B   0       100,000

What conclusion do you draw from each situation?

Page 34: Building multi-modal recommendation engines using search engines

Spot the Anomaly

Root LLR is roughly like standard deviations
In Apache Mahout, RowSimilarityJob uses LLR

        A       not A
B       13      1000
not B   1000    100,000
Root LLR: 0.90

        A       not A
B       1       0
not B   0       2
Root LLR: 1.95

        A       not A
B       1       0
not B   0       10,000
Root LLR: 4.52

        A       not A
B       10      0
not B   0       100,000
Root LLR: 14.3

What conclusion do you draw from each situation?
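
The root LLR scores quoted on this slide (0.90, 1.95, 4.52, 14.3) can be reproduced from the four contingency tables, assuming the scores are listed in the same order as the tables. The sketch below uses the standard log-likelihood ratio (G²) for a 2 x 2 table; it is an illustration, not Mahout's own code, and the names `llr` and `root_llr` are hypothetical.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts.

    k11 = A and B, k12 = B without A, k21 = A without B, k22 = neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total

def root_llr(k11, k12, k21, k22):
    """Square root of the LLR, clamped at zero against rounding error."""
    return math.sqrt(max(llr(k11, k12, k21, k22), 0.0))

tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 2),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
]
for t in tables:
    print(t, round(root_llr(*t), 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because A and B are both common; the last table has a smaller count but a far more surprising overlap.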

Page 35: Building multi-modal recommendation engines using search engines

Co-occurrence Matrix

[Figure: items-by-items co-occurrence count matrix; the cell layout is not recoverable from this transcript.]

Recap: Use LLR test to turn co-occurrence into indicators

Page 36: Building multi-modal recommendation engines using search engines

Indicator Matrix: Anomalous Co-Occurrence

✔✔

Result: The marked row will be added to the indicator field in the item document…

Page 37: Building multi-modal recommendation engines using search engines

Indicator Matrix

id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.

Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don’t need to create a separate index for the indicators.
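
To make the idea concrete, here is a toy stand-in for the search step, written in plain Python rather than against Solr. It assumes documents are simple dictionaries with the field names from the slide; the `recommend` function, the second document and its title are hypothetical. A real search engine would apply its own relevance scoring instead of the raw hit count used here.

```python
# Hypothetical item documents; field names follow the slide.
docs = [
    {"id": "t1", "title": "apple", "indicators": []},  # title is made up
    {
        "id": "t4",
        "title": "puppy",
        "desc": "The sweetest little puppy ever.",
        "keywords": ["puppy", "dog", "pet"],
        "indicators": ["t1"],
    },
]

def recommend(documents, recent_items, limit=10):
    """Rank documents by how many recent items appear in their indicator field."""
    seen = set(recent_items)
    scored = []
    for doc in documents:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        hits = len(seen & set(doc["indicators"]))
        if hits:
            scored.append((hits, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

# A user whose recent history is just t1 matches the puppy document.
print(recommend(docs, ["t1"]))
```

The query is nothing more than the user's recent item ids matched against the indicator field, which is why an ordinary search index is enough to serve recommendations.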

Page 38: Building multi-modal recommendation engines using search engines

Internals of the Recommender Engine

Page 39: Building multi-modal recommendation engines using search engines

Internals of the Recommender Engine

Page 40: Building multi-modal recommendation engines using search engines

Looking Inside LucidWorks

What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?

Recommendation is “1710 : Chuck Berry”

Real-time recommendation query and results: Evaluation

Page 41: Building multi-modal recommendation engines using search engines

Search-based Recommendations

Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location

Page 42: Building multi-modal recommendation engines using search engines

Search-based Recommendations

Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location

– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Page 43: Building multi-modal recommendation engines using search engines

Search-based Recommendations

Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location

– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40

Page 44: Building multi-modal recommendation engines using search engines

Search-based Recommendations

Sample document

Original data and meta-data:
– Merchant Id
– Field for text description
– Phone
– Address
– Location

Derived from cooccurrence and cross-occurrence analysis:
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query

Recommendation query:
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40

Page 45: Building multi-modal recommendation engines using search engines

For example

Users enter queries (A)
– (actor = user, item = query)

Users view videos (B)
– (actor = user, item = video)

AᵀA gives query recommendation
– “did you mean to ask for”

BᵀB gives video recommendation
– “you might like these videos”

Page 46: Building multi-modal recommendation engines using search engines

The punch-line

BᵀA recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
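
The three products can be worked through on a toy example. The sketch below assumes A is a users x queries matrix and B a users x videos matrix of 0/1 interactions; the data, the helper names and `videos_for_query` are hypothetical, and raw counts are used where a real pipeline would keep only anomalous (for example LLR-filtered) cells.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

# Hypothetical toy data: 3 users, 2 queries, 3 videos.
A = [[1, 0],
     [1, 0],
     [0, 1]]        # users x queries
B = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]     # users x videos

AtA = matmul(transpose(A), A)   # queries x queries: query recommendation
BtB = matmul(transpose(B), B)   # videos x videos: video recommendation
BtA = matmul(transpose(B), A)   # videos x queries: videos for a query

def videos_for_query(q):
    """Videos with non-zero cross-occurrence for query q, strongest first."""
    scores = [row[q] for row in BtA]
    ranked = sorted(range(len(scores)), key=lambda v: -scores[v])
    return [v for v in ranked if scores[v] > 0]

print(BtA)
print(videos_for_query(0))
```

Nothing in `BtA` looks at video titles or descriptions; the link from query to video comes only from users who did both.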

Page 47: Building multi-modal recommendation engines using search engines

Real-life example

Query: “Paco de Lucia”

Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Page 48: Building multi-modal recommendation engines using search engines

Real-life example

Page 49: Building multi-modal recommendation engines using search engines

Hypothetical Example

Want a navigational ontology?

Just put labels on a web page with traffic
– This gives A = users x label clicks

Remember viewing history
– This gives B = users x items

Cross recommend
– B’A = label to item mapping

After several users click, results are whatever users think they should be

Page 50: Building multi-modal recommendation engines using search engines

Nice. But we can do better?

Page 51: Building multi-modal recommendation engines using search engines

A Quick Simplification

Users who do h (a vector of things a user has done)

Also do r

User-centric recommendations (transpose translates back to things)

Item-centric recommendations (change the order of operations)

A translates things into users
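
The formulas that these labels annotated did not survive in the transcript. Assuming $A$ is the users x things history matrix and $h$ is the vector of things a user has done, the labels are consistent with the following reading (a reconstruction from the labels, not the slide's own typesetting):

```latex
\begin{align*}
A h &\quad \text{users who do } h \\
r = A^{\top} (A h) &\quad \text{user-centric: the transpose translates back to things} \\
r = (A^{\top} A)\, h &\quad \text{item-centric: same product, different order of operations}
\end{align*}
```

Because matrix multiplication is associative, both forms give the same $r$; the item-centric form lets $A^{\top} A$ be computed ahead of time.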

Page 52: Building multi-modal recommendation engines using search engines

Symmetry Gives Cross Recommendations

Conventional recommendations with off-line learning

Cross recommendations

Page 53: Building multi-modal recommendation engines using search engines

users

things

Page 54: Building multi-modal recommendation engines using search engines

users

thing type 1

thing type 2

Page 55: Building multi-modal recommendation engines using search engines

Page 56: Building multi-modal recommendation engines using search engines

Part 3: What about that worked example?

Page 57: Building multi-modal recommendation engines using search engines

http://bit.ly/18vbbaT

Page 58: Building multi-modal recommendation engines using search engines

[Diagram labels: Complete history; Cooccurrence (Mahout); Item meta-data; Solr indexing (Solr Indexer, Solr Indexer); Index shards]

Analyze with Map-Reduce

Page 59: Building multi-modal recommendation engines using search engines

[Diagram labels: User history; Web tier; Solr search (Solr Indexer, Solr Indexer); Item meta-data; Index shards]

Deploy with Conventional Search System

Page 60: Building multi-modal recommendation engines using search engines

Me, Us

Ted Dunning, Chief Application Architect, MapR
Committer, PMC member: Mahout, Zookeeper, Drill
Bought the beer at the first HUG

MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry standard API’s

Info
Hash tag - #mapr
See also - @ApacheMahout @ApacheDrill

@ted_dunning and @mapR