Elgg solr presentation

Elgg Search Scalability & Solr Integration

Matt Beckett

Community search moved to Solr Jun 17, 2014

Matt Beckett

Elgg Core Team Member

Lead Dev Arck Interactive

Scuba Diver

Hello and welcome.

My name is Matt Beckett; you may know me from such places as the internet and underwater. I have been involved with Elgg since April 2011 and quickly became a very productive plugin writer for various clients, notably Athabasca University. I have been a member of the Elgg core team since October 2013. I'm also the lead developer at Arck Interactive, one of the top Elgg dev outfits.

Sorry for the shameless plug, but every time I say Arck Interactive Paul gives me a raise ;)

Outline

Bundled Elgg Search

Scalability issues

Birth of the Elgg Solr Plugin

What is Solr?

Elgg-Solr integration

Customization

Case Study

Before we dive into the code, let's back up a bit and take a look at the history of search in Elgg. I came to Elgg when it was at version 1.7.8, and search was a bundled core plugin. It has been since 1.7.0. According to the code attribution it was a collaborative effort between Curverider and The MITRE Corporation (oh, also, whenever I say The MITRE Corporation they send me a contract offer worth more than Paul's last raise, so this should be a profitable trip!)

The core plugin brought some important features to search capability: a standardized hook-based framework and a nice way to customize results display with simple view overrides.

The plugin is mostly unchanged from that point to now.

Elgg Search

Bundled core plugin

Provides customizable UI

Search logic is hookable

Works out of the box

Bundled with Elgg: the ability to search is something people expect from a social framework, and there it is, supported in core.

Works as advertised: you type in a query, and you get results matching that query. No magic involved and not much unexpected.

No setup/config: it comes enabled by default and there's nothing else to it. No APIs, external services or technical debt.

Elgg Search Scalability

Large sites run into slow search times

Can affect performance of all areas of site

Combination of MySQL and Elgg data normalization

Elgg Community - 2014

MyISAM: a big part of the standard Elgg performance improvements is converting database tables to InnoDB for row-level transactions. Benchmarks have consistently shown this to be a faster overall schema, but until recently InnoDB did not support full text search.

Not scalable with DB size: we saw this on the Elgg community. Back in 2014 we had to switch over to Google search because searches were timing out after more than 30 seconds...

Tag search: we have the ability to register multiple names for tag metadata, and each one causes the query to become heavier.

SLOW: filtering results by arbitrary metadata

So that's core search in Elgg, so what is Solr?

What is Solr?

Java-based search engine

Single purpose and built for speed

Flat XML document structure

File content searching

Flexible setup options (same/different server, load balancing)

First and foremost, Solr is a Java-based search engine. It's a single-purpose application built for fast searching of XML documents. The documents have arbitrary fields, so they can be fit to model your data. It also has a file parser that allows for indexing the content of a wide array of file types.

Being Java-based, it's OS independent and can be deployed on the same web server as other applications such as Elgg, or load balanced across multiple servers.
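To make the "flat XML document" idea concrete, here is a minimal sketch that serializes one record into Solr's XML update format (`<add><doc><field name="...">`). The function name and the field names (`id`, `title`, `tags`) are made up for illustration and are not the plugin's schema; the point is only that every value, including each element of a multi-valued field, becomes a flat `<field>` element.

```php
<?php
// Sketch only: hypothetical helper, not part of Elgg or the elgg_solr plugin.
// Builds a Solr XML update message for a single flat document.
function solr_add_xml(array $doc): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->appendChild($dom->createElement('add'));
    $docEl = $add->appendChild($dom->createElement('doc'));

    foreach ($doc as $name => $value) {
        // A multi-valued field is expressed by repeating the <field> element.
        foreach ((array) $value as $v) {
            $field = $dom->createElement('field');
            $field->setAttribute('name', $name);
            // createTextNode escapes characters such as & and <.
            $field->appendChild($dom->createTextNode((string) $v));
            $docEl->appendChild($field);
        }
    }

    // Serialize just the <add> element, without the XML declaration.
    return $dom->saveXML($add);
}

// Illustrative field names only.
$xml = solr_add_xml([
    'id' => 123,
    'title' => 'Dive trip & BBQ',
    'tags' => ['scuba', 'social'],
]);
echo $xml, "\n";
```

There is no nesting to speak of: relationships and metadata that live in separate tables on the Elgg side end up as extra fields on the same document.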

Solr Plugin Design

Generic for use in any Elgg project

Utilize existing:
- Pagehandlers
- Views
- Hooks

So why did I choose Solr? I didn't. The Solr plugin was originally started as a solution by Billy Gunn for one of our clients with a large database that was experiencing some major performance issues with search. Solr is, however, an official FOSS project of the Apache Software Foundation and is used by many big players, which means it's well tested, well maintained, and well supported. Those are all good qualities to look for when pulling a new service into a project. Billy did the original implementation for the client; I then took over and made some improvements, eventually rewriting it and making it generic enough for general release as an open source Elgg plugin.

Indexing

Mirroring an ElggEntity in Solr

Hookable custom field management

Flatten data structure

Match Solr entity with ElggEntity by GUID

Event-based synchronization

How it works

[Diagram labels: ElggEntity, Annotation, Elgg DB; create/update; Event (x3); Cached GUID; Shutdown; Solr Index]
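One way to read the flow above: entity and annotation events only record which GUIDs changed, and the actual index write happens once, at the end of the request. The sketch below simulates that in plain PHP. The `IndexQueue` class, its method names, and the in-memory arrays standing in for the Elgg DB and the Solr index are all hypothetical; this is not the plugin's code, and the exact flow is an interpretation of the diagram labels.

```php
<?php
// Sketch only: in-memory stand-ins, no Elgg or Solr calls.
final class IndexQueue
{
    /** @var array<int, bool> GUIDs touched during this request */
    private array $pending = [];

    /** @var array<int, array> stand-in for the Solr index, keyed by GUID */
    public array $index = [];

    /** @var callable takes a GUID, returns a flat document or null */
    private $loader;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    // Event handler: remember the GUID only. Repeated events for the same
    // entity collapse into a single pending entry.
    public function onEntityEvent(int $guid): void
    {
        $this->pending[$guid] = true;
    }

    // Load the final state of each touched entity and mirror it by GUID.
    public function flush(): void
    {
        foreach (array_keys($this->pending) as $guid) {
            $doc = ($this->loader)($guid);
            if ($doc === null) {
                unset($this->index[$guid]); // entity is gone, drop its document
            } else {
                $this->index[$guid] = $doc;
            }
        }
        $this->pending = [];
    }
}

// Stand-in for the Elgg DB.
$db = [42 => ['id' => 42, 'title' => 'Draft']];
$queue = new IndexQueue(function (int $guid) use (&$db): ?array {
    return $db[$guid] ?? null;
});

$queue->onEntityEvent(42);         // create event
$db[42]['title'] = 'Final title';  // same entity updated later in the request
$queue->onEntityEvent(42);         // update event
$queue->flush();                   // would run at shutdown in a real request

echo $queue->index[42]['title'], "\n";
```

Deferring the write means an entity that is saved several times in one request is indexed once, in its final state. In plain PHP the flush could be hooked up with `register_shutdown_function([$queue, 'flush'])`.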

Searching

Pagehandler & hook calling handled by core plugin

Default hooks unregistered

Hook parameters interpreted into Solr Query notation

All default parameters handled automagically

Search Hook Parameters

$params['select'] = [
    'start' => (int) $offset,
    'rows' => (int) $limit,
    'fields' => (array) $fields, // field names to match against
];
$params['sorts'] = [
    'score' => 'desc',
    'time_created' => 'desc',
];

Search Hook Parameters

$params['qf'] = 'title^1.5 description^1 location^1';

$params['hlfields'] = ['title','description'];

$params['fragsize'] = 200;

Search Hook Parameters

$params['fq'] = ['type' => 'type:object','subtype' => 'subtype:blog'];

e.g. $params['fq'] = ['profile_pic' => 'profile_pic:true'];
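To show what "interpreted into Solr query notation" can amount to, here is a rough sketch that turns the hook parameters from these slides into a raw Solr query string. The plugin itself goes through Solarium, so this is not its implementation; the function name, the default values, and the choice of the `edismax` parser are assumptions. The request parameter names (`q`, `qf`, `fq`, `sort`, `start`, `rows`, `hl`, `hl.fl`, `hl.fragsize`) are standard Solr ones.

```php
<?php
// Sketch only: hypothetical helper, not the plugin's Solarium-based code.
function build_solr_query_string(string $query, array $params): string
{
    $select = $params['select'] ?? [];
    $pairs = [
        ['q', $query],
        ['defType', 'edismax'], // assumption: a parser that honours qf boosts
        ['start', (string) ($select['start'] ?? 0)],
        ['rows', (string) ($select['rows'] ?? 10)],
    ];

    if (isset($params['qf'])) {
        $pairs[] = ['qf', $params['qf']];
    }

    // Each filter becomes its own repeated fq parameter.
    foreach ($params['fq'] ?? [] as $filter) {
        $pairs[] = ['fq', $filter];
    }

    // ['score' => 'desc', ...] becomes "score desc,...".
    $sorts = [];
    foreach ($params['sorts'] ?? [] as $field => $direction) {
        $sorts[] = "$field $direction";
    }
    if ($sorts) {
        $pairs[] = ['sort', implode(',', $sorts)];
    }

    if (!empty($params['hlfields'])) {
        $pairs[] = ['hl', 'true'];
        $pairs[] = ['hl.fl', implode(',', $params['hlfields'])];
        $pairs[] = ['hl.fragsize', (string) ($params['fragsize'] ?? 100)];
    }

    // Encoded by hand because http_build_query() cannot repeat a key like fq.
    $encoded = [];
    foreach ($pairs as [$key, $value]) {
        $encoded[] = rawurlencode($key) . '=' . rawurlencode($value);
    }
    return implode('&', $encoded);
}

$qs = build_solr_query_string('scuba', [
    'select' => ['start' => 0, 'rows' => 10],
    'sorts' => ['score' => 'desc', 'time_created' => 'desc'],
    'qf' => 'title^1.5 description^1 location^1',
    'hlfields' => ['title', 'description'],
    'fragsize' => 200,
    'fq' => ['type' => 'type:object', 'subtype' => 'subtype:blog'],
]);
echo $qs, "\n";
```

Because the filters live in an array keyed by name, another hook handler can add, replace, or remove a single filter (for example the `profile_pic` one) without touching the rest of the query.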

How it works

[Diagram labels: User, Query, Search Pagehandler, Hook, Solarium, Solr Index, Elgg DB, Results]

Code time, finally

$event = new \ElggObject();
$event->subtype = 'event';
$event->access_id = ACCESS_PUBLIC;
$event->title = $title;
$event->description = $description;
$event->location = $location;
$event->start_time = time(); // starting now
$event->end_time = strtotime('+3 days'); // ending in 3 days

Helper Plugin

Dynamic fields

_i : integer
_is : array of integers
_s : string (title)
_ss : array of strings (tags, etc)
_t : general text (description)
_txt : array of texts
_b : boolean
_bs : array of booleans
_f : float
_fs : array of floats
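As an illustration of how these suffixes might be applied, the sketch below picks a dynamic field name from the PHP type of a value. The function name and the `$text` flag are hypothetical, and how the helper plugin actually chooses a suffix is not shown in the slides; only the suffix table above is taken from them.

```php
<?php
// Sketch only: hypothetical helper that maps a value to one of the
// dynamic field suffixes listed above.
function dynamic_field_name(string $name, $value, bool $text = false): string
{
    $isList = is_array($value);
    // Inspect the first element of a list; treat an empty list as strings.
    $sample = $isList ? ($value === [] ? '' : reset($value)) : $value;

    if (is_bool($sample)) {
        $suffix = $isList ? '_bs' : '_b';
    } elseif (is_int($sample)) {
        $suffix = $isList ? '_is' : '_i';
    } elseif (is_float($sample)) {
        $suffix = $isList ? '_fs' : '_f';
    } elseif ($text) {
        // Caller asked for tokenized general text rather than an exact string.
        $suffix = $isList ? '_txt' : '_t';
    } else {
        $suffix = $isList ? '_ss' : '_s';
    }

    return $name . $suffix;
}

// Fields resembling the event example from the previous slide.
echo dynamic_field_name('start_time', time()), "\n";               // start_time_i
echo dynamic_field_name('location', 'Vancouver'), "\n";            // location_s
echo dynamic_field_name('tags', ['scuba', 'social']), "\n";        // tags_ss
echo dynamic_field_name('description', 'Long text', true), "\n";   // description_t
```

With names like `start_time_i`, a filter query such as `start_time_i:[* TO 1700000000]` could then select events by range without any schema change on the Solr side.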

Case Study: EN MIRG

Executive Networks

Member Information Report Generator

Staff facilitated communication

Multiple reports with varying conditions

Solr to the rescue!

Conclusions

MySQL: 41.89 seconds
Solr: 0.29 seconds
Solr === Fast (144x faster in this case)

Todos

https://github.com/arckinteractive/elgg_solr

Code cleanup

Multi-threaded reindex

Index auto-correction

Other ideas?