Using Lucene/Solr to Surface the Big Data of Social Media

Glenn Engstrand, Senior Software Engineer, Platform Team, Zoosk, Inc.

Presented by Glenn Engstrand, Zoosk, Inc. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012 Although you need Big Data to effectively implement a large-scale social media solution, Hadoop is not always the right tool. This implementation description details how Zoosk uses Solr/Lucene as a NoSQL solution to meet the near-real-time Big Data needs of a social news feed in its evolution into a Romantic Social Network.

Big Data

In Scope

● volume

● rate

● shelf life

Out of Scope

● OLAP

● IDF

Social Media

● social graph

● sharing

● affinity

● activity

Zoosk

● romantic

● engagement

● big

Solr

● search

● distributed

● cached

● NoSQL

Search

● queries

● ranking

● fuzzy

● explicit

● dynamic

● location based

● non-deterministic

Distributed

● sharding

● replication
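
As a rough illustration of how sharding and replication fit together, the sketch below routes a document id to a shard with a stable hash and then lists the replicas that could serve reads for that shard. The function names, the shard and replica naming scheme, and the choice of MD5 as the hash are assumptions made for this example; the slides do not say how Zoosk partitions its index.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index using a stable hash.

    A stable hash (rather than Python's per-process hash()) keeps the
    mapping identical across processes and restarts.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(shard, replicas_per_shard):
    """List hypothetical replica names that hold a copy of one shard."""
    return ["shard%d-replica%d" % (shard, r) for r in range(replicas_per_shard)]


if __name__ == "__main__":
    shard = shard_for("member:12345", num_shards=4)
    print(shard, replicas_for(shard, replicas_per_shard=2))
```

Writes for a document go to the one shard that owns its id, while reads can be spread over that shard's replicas.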

Cached

● document

● filter

● query result

  ● window size

  ● max docs cached

● auto warming
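
The window size and max docs cached settings of the query result cache can be illustrated with a small sketch. The idea is that a request for one page of results fetches a superset rounded up to a multiple of the window size, so that nearby pages are served from the cache. This is a simplified model written for this example, not Solr's code; the class name, `search_fn`, and the default values are assumptions.

```python
def superset_size(start, rows, window_size):
    """Round the number of documents needed up to a multiple of window_size."""
    needed = max(start + rows, 1)
    return ((needed - 1) // window_size + 1) * window_size


class WindowedResultCache:
    """Toy query result cache that stores a window of ordered doc ids per query."""

    def __init__(self, search_fn, window_size=50, max_docs_cached=200):
        self.search_fn = search_fn          # hypothetical: (query, limit) -> list of ids
        self.window_size = window_size
        self.max_docs_cached = max_docs_cached
        self.entries = {}                   # query -> (limit fetched, doc ids)

    def page(self, query, start, rows):
        limit = superset_size(start, rows, self.window_size)
        entry = self.entries.get(query)
        if entry is None or entry[0] < limit:
            # Cache miss, or the cached window is too small for this page.
            entry = (limit, self.search_fn(query, limit))
            if limit <= self.max_docs_cached:
                self.entries[query] = entry  # oversized results are not cached
        return entry[1][start:start + rows]
```

With a window size of 50, asking for rows 0 to 9 and then rows 10 to 19 of the same query runs the underlying search only once.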

NoSQL

● Consistent

● Available

● Partition Tolerant

● ACID vs BASE

Solr @ Zoosk

● user profile search

● online

● email

● news feed

● find your partner

Best Practices

Service Oriented Architecture

Updating the Index

Searching the Index

SOA

● well defined end points

● thin wrappers on the client stack

● well instrumented server stack

● real-time graphing of operational metrics

Writes

● asynchronous

● multi-threaded

● multi-process

● scalability
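
One way to picture asynchronous, multi-threaded writes is a queue that decouples the caller from the index update: the caller enqueues a document and returns immediately, while worker threads drain the queue. The sketch below is a minimal single-process version under that assumption. `AsyncIndexWriter` and `index_fn` are hypothetical names, and the multi-process case (for example, a message broker between processes) is deliberately left out.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker thread to exit


class AsyncIndexWriter:
    """Queue documents and index them on background worker threads."""

    def __init__(self, index_fn, workers=2):
        self.index_fn = index_fn            # hypothetical: called once per document
        self.pending = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]
        for t in self.threads:
            t.start()

    def submit(self, doc):
        """Return immediately; the document is indexed later by a worker."""
        self.pending.put(doc)

    def _run(self):
        while True:
            doc = self.pending.get()
            if doc is _STOP:
                return
            self.index_fn(doc)

    def close(self):
        """Process everything already queued, then stop all workers."""
        for _ in self.threads:
            self.pending.put(_STOP)
        for t in self.threads:
            t.join()
```

Because the queue is first in, first out, every document submitted before `close()` is handed to a worker before any worker sees the stop sentinel. A production version would also need error handling around `index_fn`, which this sketch omits.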

Reads

● performance

● latency

● throughput

News Feed Structure

● romantic moments

● text and photos

● you and your friends

● automatic moments

● likes

● most recent

● comments

● most recent

● totals

News Feed Demo

Architecture

● front end

● middle tier

● data stores

Front End

● Jetty

● RabbitMQ

Middle Tier

● Solr

● Spring

● Ehcache

Solr

● Function Queries

● Request Handlers
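
Function queries let a score depend on a computed value rather than on text relevance alone. A common use in a feed is a recency boost, which Solr can express with its `recip(x,m,a,b)` function, defined as `a / (m*x + b)`. The sketch below reproduces that arithmetic outside Solr to show the shape of the curve. The constant `3.16e-11` (roughly one over the number of milliseconds in a year) follows the commonly cited Solr example; the `moments` data and the way the boost is multiplied into the score are assumptions for illustration, not the ranking the talk describes.

```python
def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


MS_PER_YEAR = 3.16e10  # approximate milliseconds in one year


def recency_boost(age_ms):
    """1.0 for a brand-new item, about 0.5 at one year old, decaying after."""
    return recip(age_ms, 1.0 / MS_PER_YEAR, 1.0, 1.0)


def rank(moments):
    """Order hypothetical (id, relevance, age_ms) tuples by boosted score."""
    return sorted(moments, key=lambda m: m[1] * recency_boost(m[2]), reverse=True)


if __name__ == "__main__":
    one_day = 24 * 60 * 60 * 1000
    feed = [("old-but-relevant", 1.0, 2 * MS_PER_YEAR), ("fresh", 0.6, one_day)]
    print([m[0] for m in rank(feed)])
```

Here the fresher moment outranks the older one even though its base relevance is lower, because two years of age cuts the older score to about a third.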

Spring

● Dependency Injection

● Method Interception

Ehcache

● topology

● expiration

● eviction

● overflow
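
Expiration and eviction are two separate ways an entry leaves a cache: expiration removes entries that are too old, and eviction removes entries when the cache is full. The sketch below combines a time-to-live with least-recently-used eviction in plain Python to make the distinction concrete. It does not use or imitate the Ehcache API; the class name and parameters are assumptions, and topology and overflow to disk are not modeled.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """In-memory cache with time-based expiration and LRU eviction."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock                  # injectable so tests can control time
        self.items = OrderedDict()          # key -> (value, expires_at)

    def get(self, key):
        item = self.items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.items[key]             # expiration: entry is too old
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return value

    def put(self, key, value):
        self.items[key] = (value, self.clock() + self.ttl_seconds)
        self.items.move_to_end(key)
        while len(self.items) > self.max_entries:
            self.items.popitem(last=False)  # eviction: drop least recently used
```

Passing a fake `clock` makes the expiration path easy to exercise without sleeping.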

Data Stores

● Lucene

● MySQL

● HOWL

Lucene

● Index Reader

● Segments

MySQL

● moment cluster

● member cluster

● user cluster

● global

High-speed ObjectWeb Logger (HOWL)

● record size

● block size

● flush time

Deployment

● web tier

● search appliance

● workers

Web Tier

● gate keeper

● web API layer

Search Appliance

● Solr

  ● read slaves

  ● write masters

Workers

● synchronizer service

● re-indexer batch job

● recover from transaction log

Load Test Demo

Thanks!

http://zooskdev.wordpress.com/

https://www.zoosk.com/careers.php

http://www.linkedin.com/in/gengstrand