Just the Job – Employing Apache Solr for Recruitment Search
Charlie Hull, Flax [email protected] @FlaxSearch 19th October 2011

See conference video: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Using a case study of a major European executive recruitment company, we show how we used Apache Lucene/Solr to build powerful, flexible, accurate and scalable search services over tens of millions of CVs and candidate records, allowing the company to completely restructure its IT provision for both local and national offices.

Page 1: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Just the Job – Employing Apache Solr for Recruitment Search

Charlie Hull, [email protected] @FlaxSearch 19th October 2011

Page 5: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

What I Will Cover

• Who are Flax?
• The Project & The Solution
• How we did it
  • A flexible pipeline in two parts
  • Transforming the UI
  • Performance
  • Issues
  • Results & benefits
• Conclusions & Lessons Learned
  • Learning to love open source search

Page 6: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Who are Flax?

• Search engine specialists with decades of experience
• Based in Cambridge, U.K.
• Customers include Financial Times, Durrants Ltd., Accenture, University of Cambridge
• UK Authorised Partner of Lucid Imagination
• We also run a Search Meetup. Start your own and add it to www.searchmeetups.com!

Page 9: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

The Project

• The client: Reed Specialist Recruitment
• The data
  • Hundreds of millions of items to search
  • Hundreds of fields in the database schema (which will change in the future)
  • CVs (résumés) in Word, PDF formats
  • Multiple languages
• The problem
  • Search takes several minutes
  • 3000+ users familiar with the old system
  • No foundation for innovation

Page 12: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

The Solution – Apache Solr

• Flexible and extendable
  • This is only the first wave of development
  • A need for complex business rules to drive the search – Boosts & FunctionQueries
• Economically scalable
  • Much more data to come
  • Too hard to predict future cost of commercial, closed source alternatives
• Great support available - from [logo] and [logo]

Page 15: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

A flexible pipeline - in two parts

1. Indexer
  • Reads an XML settings file
  • Extracts data from Oracle
  • Processes if necessary
  • Adds to a Solr index

2. Config tool
  • Creates a Solr schema from the Indexer settings
  • Verifies types and checks for conflicts
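
The Indexer half of the pipeline can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: `copy_action`, `index_row` and the `email_domain` process are hypothetical names, the settings are plain Python objects rather than the XML settings file, and the row is a dict rather than an Oracle result. It is not the code Flax wrote.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]   # one database row, keyed by column name
Doc = Dict[str, Any]   # one Solr document, keyed by field name


def copy_action(column: str, field: str) -> Callable[[Row, Doc], None]:
    """Hypothetical action: copy one column straight into one field."""
    def run(row: Row, doc: Doc) -> None:
        if row.get(column) is not None:
            doc[field] = row[column]
    return run


def index_row(row: Row,
              actions: List[Callable[[Row, Doc], None]],
              processes: Dict[str, Callable[[Doc], Any]]) -> Doc:
    """Build one document: run every action, then every process."""
    doc: Doc = {}
    for action in actions:
        action(row, doc)
    for field, process in processes.items():
        value = process(doc)
        if value is not None:
            doc[field] = value
    return doc


# Example settings (hypothetical; the real ones live in an XML file).
actions = [copy_action("EMAIL", "email"), copy_action("SURNAME", "surname")]
processes = {"email_domain": lambda d: d.get("email", "").split("@")[-1] or None}

doc = index_row({"EMAIL": "a@example.com", "SURNAME": "Smith"}, actions, processes)
```

Keeping actions (column to field) separate from processes (derived fields) means a schema change only touches the settings, not the indexing loop.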

Page 16: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

[Diagram: The Indexer. Labels: CV, Oracle DB, xml, Actions, Processes, Solr Index]

Page 17: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

[Diagram: The Indexer, highlighting CopyAction. Labels: CV, Oracle DB, xml, Solr Index]

Page 18: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

[Diagram: The Indexer, highlighting CVAction, CVTikaSource and CVSolrSource. Labels: CV, Oracle DB, xml, Solr Index]

Page 19: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

[Diagram: The Indexer, highlighting MostRecentDateProcess. Labels: CV, Oracle DB, xml, Solr Index]

Page 21: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

[Diagram: The Indexer & The Config Tool. Labels: CV, Oracle DB, xml, Actions, Processes, Solr Index, Verify & Generate, Solr schema.xml]

Page 22: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

The pipeline in code...

Actions

    <action ref="copyAction" column="EMAIL" field="email" />

Processes

    <process-map>
      <process field="boost_date">
        <beans:bean class="...MostRecentDateProcess">
          ...
          <beans:value>updateddate</beans:value>
          <beans:value>createddate</beans:value>
          ...
      </process>
    </process-map>

Page 23: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

The pipeline in code...

Actions

    <action ref="copyAction" column="EMAIL" field="email"
            type="string" indexed="true" stored="true"/>

Processes

    <process-map>
      <process field="boost_date" type="tdate" indexed="true" stored="false">
        <beans:bean class="...MostRecentDateProcess">
          ...
          <beans:value>updateddate</beans:value>
          <beans:value>createddate</beans:value>
          ...
      </process>
    </process-map>
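
The class name `MostRecentDateProcess` and its two values suggest a process that picks the newest of several date columns for the `boost_date` field. A minimal Python sketch of that idea follows. The function name, the dict-based document and the handling of missing dates are assumptions made for illustration, not the original implementation.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional


def most_recent_date(doc: Dict[str, Any], fields: Iterable[str]) -> Optional[datetime]:
    """Return the latest non-missing date among the named fields, or None."""
    dates = [doc[name] for name in fields if doc.get(name) is not None]
    return max(dates) if dates else None


# Hypothetical record with the two date columns named in the settings.
doc = {
    "createddate": datetime(2009, 3, 1),
    "updateddate": datetime(2011, 10, 19),
}
boost_date = most_recent_date(doc, ["updateddate", "createddate"])
```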

Page 24: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

...and a Solr schema

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema>
      <fields>
        <field name="email" type="string" indexed="true" stored="true" />
        <field name="boost_date" type="tdate" indexed="true" stored="false"/>
      </fields>
    </schema>
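
How a config tool might turn such settings into a schema can be sketched in Python with the standard library. This is an illustration under stated assumptions: the `<settings>` wrapper element, the `collect_fields` and `build_schema` names and the conflict rule (same field declared twice with different attributes) are all hypothetical, and the `beans:` elements are left out to keep the input well-formed.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical settings file modelled on the snippets above.
SETTINGS = """
<settings>
  <action ref="copyAction" column="EMAIL" field="email"
          type="string" indexed="true" stored="true"/>
  <process-map>
    <process field="boost_date" type="tdate" indexed="true" stored="false"/>
  </process-map>
</settings>
"""

ATTRS = ("type", "indexed", "stored")


def collect_fields(settings_xml: str) -> dict:
    """Gather field declarations from actions and processes, rejecting conflicts."""
    fields = {}
    for elem in ET.fromstring(settings_xml).iter():
        if elem.tag not in ("action", "process"):
            continue
        name = elem.get("field")
        spec = {attr: elem.get(attr) for attr in ATTRS}
        if name is None or None in spec.values():
            raise ValueError(f"incomplete declaration on <{elem.tag}>")
        if name in fields and fields[name] != spec:
            raise ValueError(f"conflicting definitions for field {name!r}")
        fields[name] = spec
    return fields


def build_schema(fields: dict) -> str:
    """Render the collected fields as a minimal Solr-style schema document."""
    schema = ET.Element("schema")
    container = ET.SubElement(schema, "fields")
    for name, spec in fields.items():
        ET.SubElement(container, "field", {"name": name, **spec})
    return ET.tostring(schema, encoding="unicode")


schema_xml = build_schema(collect_fields(SETTINGS))
```

Like the slide, the output omits the type definitions a complete Solr schema would also need.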

Page 25: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Transforming the UI

Page 32: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Performance

• Many factors can affect search performance...
• ...so we built a test framework
  • Randomly generated queries based on terms in the index
  • Average query times & number of results recorded
  • Allows for direct comparison of boost functions, for example
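
A test framework along these lines could look roughly like the Python sketch below. It is only an illustration: `random_queries`, `benchmark` and `toy_search` are hypothetical names, and the toy in-memory search stands in for a real request to Solr so that the example runs on its own.

```python
import random
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence


def random_queries(terms: Sequence[str], count: int, max_terms: int = 3,
                   seed: int = 0) -> List[str]:
    """Build `count` queries of 1..max_terms terms sampled from the index vocabulary."""
    rng = random.Random(seed)  # seeded so every configuration sees the same queries
    return [" ".join(rng.sample(list(terms), rng.randint(1, max_terms)))
            for _ in range(count)]


def benchmark(search: Callable[[str], int], queries: List[str]) -> Dict[str, float]:
    """Run each query once; report mean latency in seconds and mean hit count."""
    timings, hits = [], []
    for query in queries:
        start = time.perf_counter()
        found = search(query)
        timings.append(time.perf_counter() - start)
        hits.append(found)
    return {"mean_seconds": mean(timings), "mean_hits": mean(hits)}


# Stand-in for a real search request: counts documents sharing any query term.
CORPUS = ["java developer london", "solr search engineer", "python developer"]


def toy_search(query: str) -> int:
    wanted = set(query.split())
    return sum(1 for doc in CORPUS if wanted & set(doc.split()))


vocabulary = sorted({term for doc in CORPUS for term in doc.split()})
queries = random_queries(vocabulary, count=20)
report = benchmark(toy_search, queries)
```

Running `benchmark` once per configuration over the same seeded query list gives directly comparable numbers, for example for two different boost functions.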

Page 33: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Performance...much improved!

• Sub-second searches
• Only a single server required
• So fast that the thin client hardware had to be upgraded as it became a bottleneck!
• Still work to be done on improving indexing speed

Page 35: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Issues

• Users don't always understand their new freedoms
  • Training can be required on free text search, faceting...
  • Any issues reduce user confidence in new systems
• Solr features can conflict with each other
  • Make sure you understand how features interact, e.g. recency over relevance, synonyms, stopwords
  • Get the basics working first
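
The recency-over-relevance interaction can be made concrete with a small Python sketch. It reuses the formula of Solr's `recip(x, m, a, b)` function, a / (m*x + b), with the constants from the commonly quoted `recip(ms(NOW, date), 3.16e-11, 1, 1)` recency boost. Whether this project used that exact function is an assumption, and the candidates and scores are made up.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x, m, a, b) function: a / (m*x + b)."""
    return a / (m * x + b)


def boosted(relevance: float, age_years: float) -> float:
    """Multiply a relevance score by a recency factor that is about 0.5 at one year old."""
    return relevance * recip(age_years * MS_PER_YEAR, 3.16e-11, 1, 1)


# (label, relevance score, age in years): hypothetical candidates
candidates = [("strong match, old CV", 2.0, 3.0),
              ("weaker match, new CV", 1.2, 0.1)]

by_relevance = sorted(candidates, key=lambda c: c[1], reverse=True)
by_boosted = sorted(candidates, key=lambda c: boosted(c[1], c[2]), reverse=True)
```

With these numbers the two orderings disagree: the recency factor lifts the newer, weaker match above the older, stronger one, which is the kind of interaction worth checking before layering on further features.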

Page 37: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Results & benefits

• Project delivered on time and under budget
• Now live across 350 offices UK & worldwide
• 24/7/365 support provided by Lucid Imagination

A very happy client!

Page 39: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Conclusions & Lessons Learned

• What we learned
  • A flexible pipeline is essential
  • Get the basics working first - watch out for feature conflict
• What Reed learned
  • User training is important - even if the new system is “simpler”
  • To love Open Source Search...

Page 40: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

Conclusions & Lessons Learned

"The transition to Solr was the latest step in our strategy to develop a truly world-class search application. We believe it provides a robust architecture that meets our future aims, it will scale economically and is a welcome addition to our existing suite of Open Source systems."

Page 41: Just the Job: Employing Solr for Recruitment Search -Charlie Hull

The End

Thanks for listening! For more information please contact me:

Charlie Hull, Managing Director
[email protected]
http://www.flax.co.uk/blog
@FlaxSearch