Presented by Glenn Engstrand, Zoosk, Inc. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
Although a large-scale social media solution needs Big Data to be implemented effectively, Hadoop is not always the right tool. This implementation description details how Zoosk uses Solr/Lucene as a NoSQL solution to meet the near-real-time Big Data needs of a social news feed as Zoosk evolves into a Romantic Social Network.
Using Lucene/Solr to Surface the Big Data of Social Media
Glenn Engstrand
Senior Software Engineer
Platform Team
Zoosk, Inc.
Big Data
In Scope
● volume
● rate
● shelf life
Out of Scope
● OLAP
● IDF
Social Media
● social graph
● sharing
● affinity
● activity
Zoosk
● romantic
● engagement
● big
Solr
● search
● distributed
● cached
● NoSQL
Search
● queries
● ranking
● fuzzy
● explicit
● dynamic
● location based
● non-deterministic
Distributed
● sharding
● replication
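As a rough illustration of the two ideas on this slide, the sketch below routes a document to a shard with a stable hash and spreads reads across that shard's replicas round-robin. The host names, the `TOPOLOGY` map, and the helper names are hypothetical; the talk does not say how Zoosk assigns documents to shards.

```python
import hashlib

# Hypothetical topology: shard number -> read replicas holding a copy of it.
TOPOLOGY = {
    0: ["shard0-a:8983", "shard0-b:8983"],
    1: ["shard1-a:8983", "shard1-b:8983"],
}


def shard_for(doc_id, num_shards):
    """Sharding: map a document id to a shard with a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def pick_replica(replicas, request_no):
    """Replication: spread reads over the copies of one shard, round-robin."""
    return replicas[request_no % len(replicas)]


def route(doc_id, request_no):
    """Return the (shard, host) pair that should serve this read."""
    shard = shard_for(doc_id, len(TOPOLOGY))
    return shard, pick_replica(TOPOLOGY[shard], request_no)
```

A stable hash keeps a given id on the same shard across processes, which Python's built-in `hash()` on strings does not guarantee.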
Cached
● document
● filter
● query result
● window size
● max docs cached
● auto warming
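The window size and max docs cached settings can be pictured with a small calculation. This is a simplified model, not Solr's actual code: it assumes the searcher collects a superset of the requested range rounded up to a multiple of the window size, and only caches the entry if that superset fits under the max-docs limit. The function name is hypothetical.

```python
def cached_window(start, rows, window_size, max_docs_cached):
    """Return (docs_collected, cacheable) for one paged request.

    Simplified model: round the highest requested position up to the next
    multiple of window_size, then check it against max_docs_cached.
    """
    needed = start + rows
    collected = ((needed + window_size - 1) // window_size) * window_size
    return collected, collected <= max_docs_cached
```

With a window size of 50, a request for documents 10 through 19 collects the first 50 ids, so the next few pages of the same query can be answered from the cache.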
NoSQL
● Consistent
● Available
● Partition Tolerant
● ACID vs BASE
Solr @ Zoosk
● user profile search
● online
● email
● news feed
● find your partner
Best Practices
Service Oriented Architecture
Updating the Index
Searching the Index
SOA
● well defined end points
● thin wrappers on the client stack
● well instrumented server stack
● real-time graphing of operational metrics
Writes
● asynchronous
● multi-threaded
● multi-process
● scalability
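One way to picture asynchronous, multi-threaded writes is a queue that callers push documents onto while worker threads drain it in batches. This is a minimal sketch under assumptions: `AsyncIndexer` and the `send_batch` callback are hypothetical names, the callback stands in for whatever actually posts documents to the index, and the multi-process aspect is not shown.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to flush and exit


class AsyncIndexer:
    """Accept documents without blocking the caller; index them in batches."""

    def __init__(self, send_batch, workers=2, batch_size=100):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, doc):
        """Enqueue a document and return immediately."""
        self._queue.put(doc)

    def _run(self):
        batch = []
        while True:
            doc = self._queue.get()
            if doc is _STOP:
                break
            batch.append(doc)
            if len(batch) >= self._batch_size:
                self._send_batch(batch)
                batch = []
        if batch:  # flush the partial batch on shutdown
            self._send_batch(batch)

    def close(self):
        """Stop every worker after the queued documents are sent."""
        for _ in self._threads:
            self._queue.put(_STOP)
        for thread in self._threads:
            thread.join()
```

In this sketch a partial batch is only flushed at shutdown; a production writer would usually also flush on a timer.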
Reads
● performance
● latency
● throughput
News Feed Structure
● romantic moments
● text and photos
● you and your friends
● automatic moments
● likes
● most recent
● comments
● most recent
● totals
News Feed Demo
Architecture
● front end
● middle tier
● data stores
Front End
● Jetty
● RabbitMQ
Middle Tier
● Solr
● Spring
● Ehcache
Solr
● Function Queries
● Request Handlers
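To make the two bullets concrete, the sketch below builds a request to Solr's `/select` request handler that uses a function query to favor recent documents. The `recip(ms(NOW,field),3.16e-11,1,1)` recency boost is the standard example from the Solr documentation; the field names `owner_id` and `created_at`, the base URL, and the function name are assumptions, not Zoosk's schema.

```python
from urllib.parse import urlencode


def news_feed_query(base_url, user_id, rows=20):
    """Build a /select URL that boosts newer documents (hypothetical schema)."""
    params = {
        "q": "owner_id:%d" % user_id,  # hypothetical field
        "defType": "edismax",
        # Function query: score multiplier decays as the document ages.
        "boost": "recip(ms(NOW,created_at),3.16e-11,1,1)",
        "rows": rows,
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)
```

For example, `news_feed_query("http://localhost:8983/solr/feed", 42)` yields a URL that can be fetched with any HTTP client.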
Spring
● Dependency Injection
● Method Interception
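Method interception wraps a call with extra behavior without touching the method body, which is how timing or caching can be layered onto a service. The sketch below is a plain Python decorator used as an analogy; it is not Spring's API, and the `timed` name and `metrics` dictionary are hypothetical.

```python
import functools
import time


def timed(metrics):
    """Intercept calls to a function and record how long each one took."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                metrics.setdefault(func.__name__, []).append(elapsed)

        return wrapper

    return decorator
```

The collected timings are the kind of raw data that could feed real-time operational graphs.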
Ehcache
● topology
● expiration
● eviction
● overflow
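Expiration and eviction are separate policies: the first drops entries that are too old, the second drops entries when the cache is full. The sketch below shows both with a time-to-live and least-recently-used eviction. It is an illustration in plain Python, not Ehcache's API or configuration; the class name is hypothetical, and topology and overflow to disk are not modeled.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Small cache with time-based expiration and LRU eviction."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = OrderedDict()  # key -> (value, expiry time)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:  # expiration
            del self._data[key]
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._max:  # eviction
            self._data.popitem(last=False)  # drop least recently used
```

Passing the clock in as a parameter keeps the expiration logic testable without sleeping.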
Data Stores
● Lucene
● MySQL
● HOWL
Lucene
● Index Reader
● Segments
MySQL
● moment cluster
● member cluster
● user cluster
● global
HOWL (High-speed ObjectWeb Logger)
● record size
● block size
● flush time
Deployment
● web tier
● search appliance
● workers
Web Tier
● gate keeper
● web API layer
Search Appliance
● Solr
● read slaves
● write masters
Workers
● synchronizer service
● re-indexer batch job
● recover from transaction log
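Recovering from a transaction log amounts to replaying every logged write that the index has not yet applied. The sketch below assumes a JSON-lines log with a monotonically increasing sequence number per record; this format and the function names are hypothetical and are not HOWL's on-disk format or the workers' actual code.

```python
import json


def encode_record(seq, doc):
    """Serialize one write as a single log line (hypothetical format)."""
    return json.dumps({"seq": seq, "doc": doc})


def replay(log_lines, last_indexed_seq, index_doc):
    """Re-apply every logged write newer than last_indexed_seq.

    Stops at the first unparsable line, treating it as a torn final write.
    Returns the number of documents replayed.
    """
    replayed = 0
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            break
        if record["seq"] > last_indexed_seq:
            index_doc(record["doc"])
            replayed += 1
    return replayed
```

Skipping records at or below the last indexed sequence number makes the replay safe to run more than once.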
Load Test Demo
Thanks!
http://zooskdev.wordpress.com/
https://www.zoosk.com/careers.php
http://www.linkedin.com/in/gengstrand