Using Lucene/Solr to Surface the Big Data of Social Media

Presented by Glenn Engstrand, Zoosk, Inc. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Although you need Big Data to effectively implement a large-scale social media solution, Hadoop is not always the right tool. This implementation description details how Zoosk uses Solr/Lucene as a NoSql solution to meet the near real-time Big Data needs of a social news feed as Zoosk evolves into a Romantic Social Network.

Glenn Engstrand

Senior Software Engineer

Platform Team

Zoosk, Inc.

Big Data

In Scope

● volume

● rate

● shelf life

Out of Scope

● OLAP

● IDF

Social Media

● social graph

● sharing

● affinity

● activity

Zoosk

● romantic

● engagement

● big

Solr

● search

● distributed

● cached

● NoSql

Search

● queries

● ranking

● fuzzy

● explicit

● dynamic

● location based

● non-deterministic

Distributed

● sharding

● replication
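The two bullets above can be illustrated with a minimal sketch. It assumes hash-based routing of document ids to shards and round-robin reads across replicas; the deck does not say which scheme Zoosk uses, and the names `shard_for`, `route`, and `ReplicaSet` are hypothetical.

```python
import hashlib
import itertools


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def route(doc_ids, num_shards):
    """Group document ids by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


class ReplicaSet:
    """One shard: writes go to the master, reads rotate across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def write_target(self):
        return self.master

    def read_target(self):
        return next(self._slaves)
```

Sharding splits the index so each node holds a fraction of the documents; replication copies each shard so read traffic can be spread over several nodes.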

Cached

● document

● filter

● query result

● window size

● max docs cached

● auto warming
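The window size and max docs cached settings of the query result cache can be modeled roughly as follows. This is a simplified sketch of how the window is commonly described (a requested page is rounded up to a multiple of the window size so that nearby pages hit the cache), not Solr's actual code. The function names are hypothetical, and the "skip caching when over the limit" rule is an assumption.

```python
import math


def cached_window(start: int, rows: int, window_size: int) -> range:
    """Result positions collected when rows [start, start + rows) are requested.

    The upper bound is rounded up to the next multiple of window_size.
    """
    end = start + rows
    upper = math.ceil(end / window_size) * window_size
    return range(0, upper)


def should_cache(num_docs_collected: int, max_docs_cached: int) -> bool:
    """Assumed rule: only cache entries that stay within the configured limit."""
    return num_docs_collected <= max_docs_cached
```

With a window size of 50, a request for results 10 through 19 collects positions 0 through 49, so the following pages can be served from the cache without running the query again.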

NoSql

● Consistent

● Available

● Partition Tolerant

● ACID vs BASE

Solr @ Zoosk

● user profile search

● online

● email

● news feed

● find your partner

Best Practices

Service Oriented Architecture

Updating the Index

Searching the Index

SOA

● well defined end points

● thin wrappers on the client stack

● well instrumented server stack

● real-time graphing of operational metrics

Writes

● asynchronous

● multi-threaded

● multi-process

● scalability
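A minimal sketch of the asynchronous, multi-threaded write path: callers enqueue documents and return immediately, while a pool of worker threads applies them to the index. The `AsyncIndexer` class is hypothetical, and a plain dictionary stands in for the real search index. The deck also lists multi-process writes, which this sketch does not cover.

```python
import queue
import threading


class AsyncIndexer:
    """Accept documents without blocking and index them on worker threads."""

    def __init__(self, num_workers=4):
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self.index = {}  # stand-in for the real search index (assumption)
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, doc):
        """Enqueue a document; the caller does not wait for indexing."""
        self._queue.put(doc)

    def _run(self):
        while True:
            doc = self._queue.get()
            if doc is None:  # sentinel: shut this worker down
                return
            with self._lock:
                self.index[doc["id"]] = doc

    def close(self):
        """Drain the queue: one sentinel per worker, then wait for all of them."""
        for _ in self._workers:
            self._queue.put(None)
        for worker in self._workers:
            worker.join()
```

Because the queue is first in, first out, every submitted document is dequeued before any sentinel, so `close()` returns only after all pending writes have been applied.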

Reads

● performance

● latency

● throughput

News Feed Structure

● romantic moments

● text and photos

● you and your friends

● automatic moments

● likes

● most recent

● comments

● most recent

● totals

News Feed Demo

Architecture

● front end

● middle tier

● data stores

Front End

● Jetty

● RabbitMQ

Middle Tier

● Solr

● Spring

● Ehcache

Solr

● Function Queries

● Request Handlers
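Function queries let a ranking formula be computed per document at query time. The deck does not show which functions Zoosk uses. As an illustration only, the sketch below reimplements Solr's `recip(x, m, a, b) = a / (m*x + b)` in Python and applies it as a recency boost, using the constant 3.16e-11 (roughly one over the number of milliseconds in a year). The `rank` helper and the moment fields `score` and `created_ms` are hypothetical.

```python
def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost of 1.0 for a brand-new item, about 0.5 for a one-year-old item."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


def rank(moments, now_ms):
    """Order moments by base score multiplied by the recency boost."""
    return sorted(
        moments,
        key=lambda m: m["score"] * recency_boost(now_ms - m["created_ms"]),
        reverse=True,
    )
```

For a news feed, a boost like this lets fresh moments outrank older ones with the same base relevance.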

Spring

● Dependency Injection

● Method Interception

Ehcache

● topology

● expiration

● eviction

● overflow
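Expiration and eviction can be sketched as a small in-memory cache with a time-to-live per entry and least-recently-used eviction when the cache is full. This is not the Ehcache API. The class `TtlLruCache` and its parameters are hypothetical, and overflow to disk and cache topology are not modeled.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """In-memory cache with time-based expiration and LRU eviction."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (expires_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: drop and report a miss
            del self._data[key]
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value
```

Expiration bounds how stale a cached entry can get, while eviction bounds how much memory the cache uses.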

Data Stores

● Lucene

● MySql

● HOWL

Lucene

● Index Reader

● Segments

MySql

● moment cluster

● member cluster

● user cluster

● global

High-speed ObjectWeb Logger

● record size

● block size

● flush time

Deployment

● web tier

● search appliance

● workers

Web Tier

● gate keeper

● web API layer

Search Appliance

● Solr

● read slaves

● write masters

Workers

● synchronizer service

● re-indexer batch job

● recover from transaction log
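Recovery from a transaction log amounts to replaying recorded operations in order against an empty index. The sketch below assumes a log of one JSON record per line with `put` and `delete` operations. The record layout and the function names are hypothetical and are not HOWL's format.

```python
import io
import json


def append(log_file, op, doc_id, doc=None):
    """Append one operation to the log as a single JSON line."""
    log_file.write(json.dumps({"op": op, "id": doc_id, "doc": doc}) + "\n")


def replay(log_file):
    """Rebuild the index state by applying logged operations in order."""
    index = {}
    for line in log_file:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["op"] == "put":
            index[record["id"]] = record["doc"]
        elif record["op"] == "delete":
            index.pop(record["id"], None)
    return index


# Usage: write three operations, then recover the state from the log.
log = io.StringIO()
append(log, "put", "m1", {"text": "first moment"})
append(log, "put", "m2", {"text": "second moment"})
append(log, "delete", "m1")
log.seek(0)
recovered = replay(log)
```

After the replay, `recovered` holds only `m2`, because the delete of `m1` was logged after its put.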

Load Test Demo

Thanks!

http://zooskdev.wordpress.com/

https://www.zoosk.com/careers.php

http://www.linkedin.com/in/gengstrand
