Search and Recommenders
Grant Ingersoll
@gsingers
CTO, Lucidworks
Jake Mannix
@pbrane
Lead Data Engineer, Lucidworks
• Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions
Agenda
Search-Driven Everything
Customer Service
Customer Insights
Fraud Surveillance
Research Portal
Online Retail
Digital Content
• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
• The same inputs that make search better can also drive recommendations and better analytics
• Engagement analytics is the key:
• Your users give you engagement signals regarding the content that is relevant to them
• Over time, patterns emerge in similarities of behavior (simplest possible pattern is just “popularity”)
• These signals are often the biggest factor in both search relevance AND recommendations
• In the enterprise, this is still the case, but the types of signals are often different (email, IM)
Three Sides of the Same Coin
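As a minimal sketch of the simplest pattern mentioned above, "popularity" can be computed by counting engagement signals per document. The signal tuples and the `popularity` helper are hypothetical illustrations, not part of Fusion.

```python
from collections import Counter

# Hypothetical engagement signals: (user_id, doc_id, signal_type).
signals = [
    ("u1", "doc_a", "click"),
    ("u2", "doc_a", "click"),
    ("u2", "doc_b", "click"),
    ("u3", "doc_a", "click"),
    ("u3", "doc_c", "click"),
]

def popularity(signals, top_n=2):
    """Rank documents by how many engagement signals they received."""
    counts = Counter(doc_id for _user, doc_id, _type in signals)
    return counts.most_common(top_n)

top_docs = popularity(signals)
```

The same counts could boost search results or seed a "most popular" recommendation list, which is one way the same input can serve both uses.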
• Content: documents which are textually similar are often good “similar items” to recommend
• Collaborative: documents which have been engaged with by the same people (and/or in the same search context) are also similar, in a more subtle but often more powerful way
• Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!
Defining Moments
Search-Driven Online Retail
Increase conversions with a personalized shopping experience backed by best-in-class reliability and performance.
CATALOG
DYNAMIC NAVIGATION AND LANDING PAGES
INSTANT INSIGHTS AND ANALYTICS
PERSONALIZED SHOPPING EXPERIENCE
PROMOTIONS
USER HISTORY
Data Acquisition
Data Processing
Smart Access API
Search-Driven Compliance and Surveillance
Detect and investigate activity for regulatory compliance, from one unified view.
DATABASE
ACCURATE REAL-TIME INFORMATION
CONTEXTUALLY-ENRICHED INFORMATION
MESSAGES
LOGS
DATA EXPLORATION AND VISUALIZATION
Data Acquisition
Indexing & Streaming
Smart Access API
Search-Driven Customer Service
Resolve customer issues quickly with immediate access to relevant answers.
CUSTOMER SELF-SERVICE
KNOWLEDGE BASE
PROACTIVE ALERTS AND RECOMMENDATIONS
EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS
CRM
SUPPORT TICKETS & ISSUE TRACKING
Data Acquisition
Data Processing
Smart Access API
Fusion and Recommenders
Lucidworks Fusion Is Search-Driven Everything
• Drive next-generation relevance via Content, Collaboration and Context
• Harness best-in-class open source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
CATALOG
DYNAMIC NAVIGATION AND LANDING PAGES
INSTANT INSIGHTS AND ANALYTICS
PERSONALIZED SHOPPING EXPERIENCE
PROMOTIONS
USER HISTORY
Data Acquisition
Indexing & Streaming
Smart Access API
Recommendations & Alerts
Analytics & Insights
Extreme Relevancy
Access data from anywhere to build intelligent, data-driven applications.
Fusion Architecture
• REST API
• Apache Spark: Worker, Worker, Cluster Mgr.
• Apache Solr: Shards, Shards
• HDFS (Optional)
• Apache Zookeeper (ZK 1 … ZK N): Shared Config Mgmt, Leader Election, Load Balancing
• Connectors: DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD
• Core Services: Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Admin UI
• SECURITY BUILT-IN
• Lucidworks View
• Fusion
• Recommenders API
• Machine Learning pipeline stages
• Scheduling
• Solr:
• More Like This + Signals
• Spark:
• MLlib, Mahout, custom
Key Platform Tech
• Solr ships with a built-in query parser, MoreLikeThis, which takes a given document and:
• Extracts nontrivial terms from specified fields in it
• Builds an “OR” query to search for the closest matches (similar to a cosine similarity computation)
• Has many knobs for filtering non-useful terms out of the query (“data cleaning”)
• TF-IDF is great, but other similarity metrics are possible: LSI, LDA, W2V
Content-focused
{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
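A minimal sketch of how the query above might be assembled and sent to a Solr `/select` handler. The local parameters `qf`, `mintf`, `mindf` and `minwl` come from the slide; the helper names, host, collection name and document id are hypothetical, and no request is actually issued here.

```python
from urllib.parse import urlencode

def mlt_query(doc_id, fields, mintf=2, mindf=5, minwl=3):
    """Build the q parameter for Solr's MoreLikeThis query parser."""
    local_params = (
        f"qf={','.join(fields)} mintf={mintf} mindf={mindf} minwl={minwl}"
    )
    return "{!mlt " + local_params + "}" + doc_id

def select_url(base_url, collection, doc_id, fields, rows=5):
    """Build a /select URL asking for the documents most like doc_id."""
    params = {"q": mlt_query(doc_id, fields), "rows": rows, "fl": "id,score"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"

# Hypothetical document id, host and collection name.
q = mlt_query("doc-123", ["body", "suggest", "subject", "title"])
url = select_url(
    "http://localhost:8983/solr", "my_collection", "doc-123", ["body", "title"]
)
```

Raising `mintf`, `mindf` or `minwl` drops rare or very short terms from the generated query, which is the kind of tuning the slide refers to.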
“People who bought X also bought Y” / “Movies recommended for you”
Collaborative Filtering
Search User/Item Index
Top K users who’ve interacted with this Item
Search and Rollup on User/Item Index
Top Y docs
Current Doc
Filter by context
Profit
User/Item Index
Offline Tasks
User/Item Signals
Math!
• Fusion CF-based “documents like this” pipeline stages:
• Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
• Sub-query: search that index again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
• Roll-up: topN(sum(score_i * weight_i)), summed per doc_id
• Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
Collaborative Filtering: step by step in Fusion
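The steps above can be sketched in memory as a two-hop lookup. This is an illustration only, not the Fusion pipeline stages themselves: the signal data is made up, the stored signal weight stands in for the Solr score `score_i`, excluding the current document is an assumption, and the final fetch from the primary index is omitted.

```python
from collections import defaultdict

# Hypothetical aggregated signals: (user_id, doc_id, weight).
signals = [
    ("u1", "doc_a", 3.0),
    ("u1", "doc_b", 2.0),
    ("u2", "doc_a", 1.0),
    ("u2", "doc_c", 4.0),
    ("u3", "doc_b", 5.0),
    ("u3", "doc_d", 1.0),
]

def similar_items(signals, doc_id, top_k=10, top_n=5):
    """Two-hop 'documents like this' over (user, doc, weight) signals."""
    # Hop 1: top-K (user_id, weight) pairs for the current document.
    users = sorted(
        ((u, w) for u, d, w in signals if d == doc_id),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_k]
    user_weight = dict(users)

    # Hop 2 and roll-up: for every other doc those users engaged with,
    # accumulate sum(score_i * weight_i) per doc_id.
    totals = defaultdict(float)
    for u, d, score in signals:
        if u in user_weight and d != doc_id:
            totals[d] += score * user_weight[u]

    # Keep the top-N doc_ids by rolled-up score.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

recs = similar_items(signals, "doc_a")
```

For `doc_a`, users `u1` and `u2` are found in the first hop, so `doc_b` and `doc_c` are the candidates; `doc_d` is never reached because `u3` did not engage with `doc_a`.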
• Both content-based and CF recommenders use features of the documents to generate a similarity metric
• Content uses the tokens in the document
• CF uses the ids of users who have engaged with it
• The two metrics can be combined in a weighted sum, allowing a “slider” between them
• Fancy similarity techniques which can be done to a (doc, token) matrix can often be done on a (doc, userId) matrix, or even a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: harder to maintain, harder to A/B test variations
Multi-modal
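A minimal sketch of the “slider” idea, assuming each document is represented by two sparse vectors: token weights for the content side and user-id weights for the collaborative side. The `alpha` parameter, the function names and the sample documents are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def blended_similarity(doc1, doc2, alpha):
    """alpha=0 is pure content similarity, alpha=1 is pure CF similarity."""
    content = cosine(doc1["tokens"], doc2["tokens"])
    collab = cosine(doc1["users"], doc2["users"])
    return (1.0 - alpha) * content + alpha * collab

# Hypothetical documents: token weights and engaged-user weights.
doc_x = {"tokens": {"solr": 2.0, "spark": 1.0}, "users": {"u1": 1.0, "u2": 1.0}}
doc_y = {"tokens": {"solr": 1.0, "fusion": 1.0}, "users": {"u1": 1.0, "u2": 1.0}}

content_only = blended_similarity(doc_x, doc_y, alpha=0.0)
collab_only = blended_similarity(doc_x, doc_y, alpha=1.0)
halfway = blended_similarity(doc_x, doc_y, alpha=0.5)
```

Here the two documents share only one token but exactly the same audience, so moving `alpha` toward 1 raises their similarity. Each value of `alpha` is also a separate variant to A/B test, which is part of the maintenance cost noted above.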
• Basics:
• 26 Apache Projects registered so far plus LW web properties
• 93 datasources* including email, GitHub, JIRA*, website and wiki
• Fusion 2.4
• Signals everywhere
• UI based on Lucidworks View
• ASF Mail archives mirrored at: http://asfmail.lucidworks.io
Demo
http://searchhub.lucidworks.com
Implementation Details
http://github.com/lucidworks/searchhub
Branch: GH-28-doc-view
Key Source Code
UI
Angular Directives:
perdocument
recommendations
Offline Tasks
Spark Jobs:
mail_thread_signal_creation_job.json
SimpleTwoHopRecommender.scala
Fusion Pipelines
Query:
lucidfind-recommendations
cf-similar-items-batch-rec
cf-similar-items-rec
• Ensemble and Click-based approaches
• https://github.com/lucidworks/searchhub/issues/40
• https://github.com/lucidworks/searchhub/issues/28
• https://github.com/lucidworks/searchhub/issues/22
• Deploy live
• User registrations
• https://github.com/lucidworks/searchhub/issues/30
Future Work
Resources
Fusion: http://www.lucidworks.com/products/fusion
Search Hub: http://searchhub.lucidworks.com
Company: http://www.lucidworks.com
Our blog: http://www.lucidworks.com/blog
Twitter: @gsingers, @pbrane