DESCRIPTION
Motivation for using solr as a NoSQL backend
Tareque Hossain Sr. Software Engineer
The Power
What about it?
• We always associate solr with searching
• solr can also serve as your non-relational data layer
NoSQL ? solr ?
Why solr?
• Hey solr is already part of my stack
• I love solr
• It’s fast, scalable and there are some great python interfaces out there
When would you consider it?
• You have a DB that’s frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want a decently scalable data layer
What’s not so cool?
• Doesn’t support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Searching and indexing at the same time brings down performance
But..
• You don’t have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
So what’s the use case?
• We deal with medical survey data
• Say:
  – About 300 multiple choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing
What a survey question looks like
When were you diagnosed with the following types of Arthritis?

                     | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago | ☑              | ☐                    | ☐                   | ☐                   | ☐
More than a year ago | ☐              | ☐                    | ☑                   | ☐                   | ☐
Storing a single response
When were you diagnosed with the following types of Arthritis?

                     | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago | 1              | 0                    | 0                   | 0                   | 0
More than a year ago | 0              | 0                    | 1                   | 0                   | 0
Aggregating over 2000 responses
When were you diagnosed with the following types of Arthritis?

                     | Osteoarthritis | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Other
Less than a year ago | 63             | 155                  | 19                  | 27                  | 268
More than a year ago | 190            | 46                   | 8                   | 213                 | 325
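A minimal sketch of what this aggregation amounts to, done in plain Python rather than in solr: each response is a set of boolean fields, and the aggregate is the number of responses with each field set. The field names (q1_r0_c0 for question 1, row 0, column 0) are a hypothetical naming scheme, and the three responses are made up for illustration, not taken from the survey.

```python
from collections import Counter


def aggregate(documents):
    """Count, per boolean field, how many documents have it set to True."""
    totals = Counter()
    for doc in documents:
        for field, value in doc.items():
            if value is True:
                totals[field] += 1
    return totals


# Three made-up responses (illustrative only, not real survey data).
# Field names follow a hypothetical q<question>_r<row>_c<column> scheme.
documents = [
    {"q1_r0_c0": True, "q1_r1_c2": True},
    {"q1_r0_c0": True, "q1_r1_c2": False},
    {"q1_r0_c0": False, "q1_r1_c2": True},
]

totals = aggregate(documents)
```

In the setup described here, solr's faceting does this counting on the server side, so the application never has to loop over the documents itself.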
The Document Structure
• Each survey response = solr document
• Up to 3000 boolean variables per document indicating chosen answers
• Added meta information: age, profession, interests
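One way such a document could be put together is sketched below. It only builds the dictionary that would be indexed and does not talk to solr. The function name build_document, the q<question>_r<row>_c<column> field names and the meta values are all assumptions made for illustration; the slides do not show the actual schema.

```python
def build_document(response_id, question_id, matrix, meta):
    """Flatten one answer matrix into boolean fields plus meta information."""
    doc = {"id": response_id}
    doc.update(meta)  # e.g. age, profession, interests
    for r, row in enumerate(matrix):
        for c, chosen in enumerate(row):
            # Hypothetical field name: question, row and column indices.
            doc[f"q{question_id}_r{r}_c{c}"] = bool(chosen)
    return doc


# The single stored response from the arthritis question.
matrix = [
    [1, 0, 0, 0, 0],  # Less than a year ago
    [0, 0, 1, 0, 0],  # More than a year ago
]

# Meta values are invented for the example.
doc = build_document(
    "resp-1",
    1,
    matrix,
    {"age": 54, "profession": "teacher", "interests": ["gardening"]},
)
```

A full survey response would repeat the inner loop for every question, which is how a single document ends up with thousands of boolean fields.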
Querying
• Filter by age, interest, profession
• Facet across boolean field
• Result: what group of people chose what group of answers
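As a rough sketch, a query of this kind can be expressed with solr's standard request parameters: fq for each filter and facet.field for each boolean field. The helper name facet_query_params, the field names and the filter values are assumptions for illustration; the code only builds the query string and sends nothing.

```python
from urllib.parse import urlencode


def facet_query_params(filters, facet_fields):
    """Build request parameters: match all documents, filter, then facet."""
    params = [("q", "*:*"), ("wt", "json"), ("facet", "true")]
    params += [("fq", fq) for fq in filters]            # one fq per filter
    params += [("facet.field", f) for f in facet_fields]  # one per field
    return params


# Hypothetical filters and field names.
params = facet_query_params(
    filters=["age:[40 TO 60]", "profession:teacher"],
    facet_fields=["q1_r0_c0", "q1_r1_c2"],
)
query_string = urlencode(params)
```

The resulting string would be appended to the select URL of whatever solr core holds the survey documents.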
Why solr is awesome..
• Faceting across boolean field uses very little memory
• Combining 3000 fields for 2000 documents takes 1 ~ 2 ms
• Allowed us to reduce API response time from anywhere between 2 and 15 seconds (sucked!) to an almost constant ~50 ms
Good to know..
• sunburnt: Awesome python solr interface github.com/tow/sunburnt
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only need facets, specify rows=0
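With rows=0 the response carries no documents, only the facet counts. The sketch below reads those counts from a JSON response, assuming solr's default layout in which each facet field is a flat list alternating value and count. The sample response is hand-written with invented numbers, and the field name is hypothetical.

```python
import json


def facet_counts(response_text):
    """Turn flat [value, count, value, count, ...] lists into dicts."""
    body = json.loads(response_text)
    fields = body["facet_counts"]["facet_fields"]
    return {
        name: dict(zip(flat[0::2], flat[1::2]))
        for name, flat in fields.items()
    }


# Hand-written sample shaped like a rows=0 JSON response (values invented).
sample = json.dumps({
    "response": {"numFound": 3, "start": 0, "docs": []},
    "facet_counts": {
        "facet_fields": {"q1_r0_c0": ["true", 2, "false", 1]},
    },
})

counts = facet_counts(sample)
```

Skipping the document bodies keeps the payload small, which matters when a single request facets across thousands of fields.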
Questions?
• wisertogether.com
• slideshare.net/tarequeh/the-solr-power
• @tarequeh