Page 1: ElasticSearch with Tire

ElasticSearch with Tire

@AbookYun, Polydice Inc.

1Wednesday, February 6, 13

Page 2: ElasticSearch with Tire

It’s all about Search

• How does search work?

• ElasticSearch

• Tire


Page 3: ElasticSearch with Tire

How does search work?

A collection of articles

• Article.find(1).to_json
  { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }

• Article.find(2).to_json
  { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }

• Article.find(3).to_json
  { title: "Three", content: "Ruby is a song by English rock band." }

Page 4: ElasticSearch with Tire

How does search work?

How do you search?

Article.where("content like ?", "%ruby%")
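A `LIKE '%ruby%'` query has to test every row for a substring. A minimal sketch of that behaviour in plain Ruby, with no database involved; the `ARTICLES` array and the `like_scan` helper are hypothetical stand-ins for the table and the SQL query:

```ruby
# Hypothetical in-memory stand-in for the articles table.
ARTICLES = [
  { id: 1, content: 'The ruby is a pink to blood-red colored gemstone.' },
  { id: 2, content: 'Ruby is a dynamic, reflective, general-purpose ' \
                    'object-oriented programming language.' },
  { id: 3, content: 'Ruby is a song by English rock band.' }
]

# Rough equivalent of `content LIKE '%term%'`: check every row for a substring.
# Case is ignored here; whether LIKE ignores case depends on the database.
def like_scan(rows, term)
  needle = term.downcase
  rows.select { |row| row[:content].downcase.include?(needle) }
end

puts like_scan(ARTICLES, 'ruby').map { |row| row[:id] }.inspect
puts like_scan(ARTICLES, 'rub').map { |row| row[:id] }.inspect
```

Both calls return all three articles: the scan matches any fragment, gives no ranking, and its cost grows with the number of rows.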

Page 5: ElasticSearch with Tire

How does search work?

The inverted index

T0 = "it is what it is"
T1 = "what is it"
T2 = "it is a banana"

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

A term search for the terms "what", "is" and "it":
{0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
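The T0..T2 example above can be reproduced in a few lines of Ruby. This is only a sketch: the helper names `build_inverted_index` and `term_search` are made up for illustration, and the texts are split on whitespace with no further analysis.

```ruby
require 'set'

# The three tiny "documents", keyed by their ids.
TEXTS = {
  0 => 'it is what it is',
  1 => 'what is it',
  2 => 'it is a banana'
}

# Build the inverted index: term => set of ids of documents containing it.
def build_inverted_index(texts)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  texts.each do |id, text|
    text.split.each { |term| index[term] << id }
  end
  index
end

# Documents containing ALL the terms: intersect the sets of each term.
def term_search(index, *terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&)
end

INVERTED = build_inverted_index(TEXTS)
puts term_search(INVERTED, 'what', 'is', 'it').to_a.sort.inspect
# → [0, 1]
```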

Page 6: ElasticSearch with Tire

How does search work?

The inverted index

TOKEN        ARTICLES
ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Page 7: ElasticSearch with Tire

How does search work?

The inverted index

Article.search("ruby")

ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Page 8: ElasticSearch with Tire

How does search work?

The inverted index

Article.search("song")

ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Page 9: ElasticSearch with Tire

module SimpleSearch
  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # Split content by words into "tokens", downcase every word,
    # then reject stop words, digits and empty strings
    content.split(/\W/)
           .map { |word| word.downcase }
           .reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  def store document_id, tokens
    # Record the document under each of its tokens, without duplicates
    tokens.each do |token|
      ((INDEX[token] ||= []) << document_id).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # Print the documents stored in the index for this token (none if unknown)
    (INDEX[token] || []).each { |document| puts " * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

  extend self
end

Page 10: ElasticSearch with Tire

How does search work?

Indexing documents

SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."

Page 11: ElasticSearch with Tire

How does search work?

Indexing documents

SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."

Indexed document article1 with tokens:
["ruby", "language", "java", "also", "language"]

Indexed document article2 with tokens:
["ruby", "song"]

Indexed document article3 with tokens:
["ruby", "stone"]

Indexed document article4 with tokens:
["java", "language"]

Page 12: ElasticSearch with Tire

How does search work?

Index

print SimpleSearch::INDEX

{
  "ruby" => ["article1", "article2", "article3"],
  "language" => ["article1", "article4"],
  "java" => ["article1", "article4"],
  "also" => ["article1"],
  "song" => ["article2"],
  "stone" => ["article3"]
}

Page 13: ElasticSearch with Tire

How does search work?

Search the index

SimpleSearch.search "ruby"

Results for token 'ruby':
 * article1
 * article2
 * article3

Page 14: ElasticSearch with Tire

How does search work?

Search is ...

Inverted Index
{ "ruby": [1,2,3], "language": [1,4] }

+

Relevance Scoring

• How many matching terms does this document contain?

• How frequently does each term appear in all your documents?

• ... other complicated algorithms.
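The first two questions are the idea behind classic TF-IDF scoring. The sketch below is a textbook simplification, not Lucene's actual formula; the `tf`, `idf` and `rank` helpers are hypothetical, and `DOCS` reuses the tokens from the indexing example.

```ruby
# Tokenised documents from the indexing example.
DOCS = {
  'article1' => %w[ruby language java also language],
  'article2' => %w[ruby song],
  'article3' => %w[ruby stone],
  'article4' => %w[java language]
}

# Term frequency: how often the term occurs in this document.
def tf(term, tokens)
  tokens.count(term)
end

# Inverse document frequency: terms found in fewer documents weigh more.
def idf(term, docs)
  containing = docs.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(docs.size.to_f / containing)
end

# Score every document for the query; return [id, score] pairs, best first.
def rank(query_terms, docs)
  scores = docs.map do |id, tokens|
    [id, query_terms.sum { |term| tf(term, tokens) * idf(term, docs) }]
  end
  scores.select { |_, score| score > 0 }.sort_by { |_, score| -score }
end

rank(%w[ruby language], DOCS).each do |id, score|
  puts format('%-9s %.3f', id, score)
end
```

For the query "ruby language", article1 comes first (it contains "language" twice plus "ruby"), then article4, because "language" is rarer than "ruby" and therefore weighs more.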


Page 15: ElasticSearch with Tire

ElasticSearch

ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.

http://github.com/elasticsearch/elasticsearch


Page 16: ElasticSearch with Tire

ElasticSearch

Terminology

Relational DB   ElasticSearch
Database        Index
Table           Type
Row             Document
Column          Field
Schema          Mapping
Index           *Everything
SQL             Query DSL

Page 17: ElasticSearch with Tire

ElasticSearch

RESTful

# Add document
curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'

# Delete document
curl -XDELETE 'http://localhost:9200/articles/article/1'

# Search
curl -XGET 'http://localhost:9200/articles/_search?q=One'
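The same three calls can be made from Ruby with the standard library alone. A sketch, assuming a node on localhost:9200 and the index/type URL layout used in the curl examples; the helper names are made up for illustration. The requests are only built here; `perform` sends one and should be called only with ElasticSearch actually running.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumes a local node on the default port, as in the curl examples.
BASE = URI('http://localhost:9200')

# Build (but do not send) the request that adds or replaces a document.
def put_document(index, type, id, document)
  request = Net::HTTP::Put.new("/#{index}/#{type}/#{id}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  request
end

# Build the request that deletes a document.
def delete_document(index, type, id)
  Net::HTTP::Delete.new("/#{index}/#{type}/#{id}")
end

# Build a URI search request (?q=...).
def search_request(index, query)
  Net::HTTP::Get.new("/#{index}/_search?q=#{URI.encode_www_form_component(query)}")
end

# Send a request to the node and return the HTTP response.
def perform(request)
  Net::HTTP.start(BASE.host, BASE.port) { |http| http.request(request) }
end

request = put_document('articles', 'article', 1, { 'title' => 'One' })
puts "#{request.method} #{request.path} #{request.body}"
# → PUT /articles/article/1 {"title":"One"}
```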

Page 18: ElasticSearch with Tire

ElasticSearch

JSON in / JSON out

# Query
curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
  "query": { "term": { "title": "One" } }
}'

# Results
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "hits": [{
      "_index": "articles",
      "_type": "article",
      "_id": "1",
      "_source": { "title": "One", "content": "Ruby is a pink to blood-red colored gemstone." }
    }]
  }
}
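Because the response is plain JSON, reading the hits takes only the standard `json` library. A small sketch: `RESPONSE` is the result shown above pasted in as a string, and `titles_by_id` is a hypothetical helper.

```ruby
require 'json'

# The search response from above, as it would arrive over HTTP.
RESPONSE = <<~BODY
  {
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "hits": {
      "total": 1,
      "hits": [
        {
          "_index": "articles",
          "_type": "article",
          "_id": "1",
          "_source": {
            "title": "One",
            "content": "Ruby is a pink to blood-red colored gemstone."
          }
        }
      ]
    }
  }
BODY

# Map each hit's _id to the title stored in its _source.
def titles_by_id(raw_json)
  hits = JSON.parse(raw_json).fetch('hits').fetch('hits')
  hits.map { |hit| [hit['_id'], hit['_source']['title']] }.to_h
end

titles_by_id(RESPONSE).each { |id, title| puts "#{id}: #{title}" }
# → 1: One
```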

Page 19: ElasticSearch with Tire

ElasticSearch

Distributed

Automatic Discovery Protocol

Nodes: Node 1, Node 2, Node 3, Node 4 (one node acts as master)

The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.

The responsibility of the master node is to maintain the global cluster state, and to act when nodes join or leave the cluster by reassigning shards.

Page 20: ElasticSearch with Tire

ElasticSearch

Distributed

Index A

By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.

Shards:   A1  A2  A3  A4  A5
Replicas: A1' A2' A3' A4' A5'
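How does a document end up in one of the five shards? ElasticSearch hashes a routing value (the document id by default) and takes it modulo the number of primary shards. The sketch below shows only that idea: CRC32 is a stand-in for the hash function ElasticSearch uses internally, and `shard_for` is a hypothetical helper.

```ruby
require 'zlib'

NUMBER_OF_SHARDS = 5 # the default mentioned above

# Pick a shard number (0-based) for a document id.
# Stand-in hash function (CRC32); ElasticSearch uses its own internally.
def shard_for(document_id, number_of_shards = NUMBER_OF_SHARDS)
  Zlib.crc32(document_id.to_s) % number_of_shards
end

# Show how ten document ids spread over the shards A1..A5.
(1..10).group_by { |id| shard_for(id) }.sort.each do |shard, ids|
  puts "A#{shard + 1}: #{ids.inspect}"
end
```

The same id always maps to the same shard, so a document can be found again without asking every shard. It also suggests why the number of shards is chosen when the index is created: changing it would send existing ids to different shards.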

Page 21: ElasticSearch with Tire

ElasticSearch

Query DSL

Queries

- query_string
- term
- wildcard
- boosting
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...

Filters

- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...

Page 22: ElasticSearch with Tire

ElasticSearch

Query DSL

Queries (with relevance, without cache)

- query_string
- term
- wildcard
- boosting
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...

Filters (with cache, without relevance)

- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...
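Queries and filters can be combined: the scored part goes into a query, the yes/no part into a filter. A sketch of such a request body as a plain Ruby hash, using the `filtered` query from the list above (it belongs to the ElasticSearch versions of this talk's era and was removed in later releases). The `filtered_search` helper and the `tags` field are assumptions for illustration.

```ruby
require 'json'

# Build a request body: full-text query for relevance, term filter for tags.
def filtered_search(text, tag)
  {
    'query' => {
      'filtered' => {
        # Scored: contributes to relevance
        'query'  => { 'query_string' => { 'query' => text } },
        # Not scored: only includes or excludes documents, and can be cached
        'filter' => { 'term' => { 'tags' => tag } }
      }
    }
  }
end

puts JSON.pretty_generate(filtered_search('T*', 'foo'))
```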

Page 23: ElasticSearch with Tire

ElasticSearch

Facets

curl -X DELETE "http://localhost:9200/articles"

curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'

curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : { "query_string" : {"query" : "T*"} },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 24: ElasticSearch with Tire

ElasticSearch

Facets

"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total": 5,
    "other": 0,
    "terms" : [
      { "term" : "foo", "count" : 2 },
      { "term" : "bar", "count" : 2 },
      { "term" : "baz", "count" : 1 }
    ]
  }
}
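Where do these counts come from? The query "T*" matches only "Two" and "Three", and the terms facet counts the tags of those two documents. A sketch of the same computation in plain Ruby; `terms_facet` and `title_matches` are hypothetical helpers, and the prefix check is a stand-in for the `query_string` query.

```ruby
# The three documents indexed in the facets example.
ARTICLES = [
  { 'title' => 'One',   'tags' => %w[foo] },
  { 'title' => 'Two',   'tags' => %w[foo bar] },
  { 'title' => 'Three', 'tags' => %w[foo bar baz] }
]

# Count how often each value of `field` occurs in the given documents.
def terms_facet(documents, field)
  counts = Hash.new(0)
  documents.each do |doc|
    Array(doc[field]).each { |term| counts[term] += 1 }
  end
  # Most frequent terms first; ties keep first-seen order.
  ordered = counts.sort_by.with_index { |(_term, count), i| [-count, i] }
  {
    'total' => counts.values.sum,
    'terms' => ordered.map { |term, count| { 'term' => term, 'count' => count } }
  }
end

# Stand-in for the "T*" query: titles starting with the prefix.
def title_matches(documents, prefix)
  documents.select { |doc| doc['title'].start_with?(prefix) }
end

FACET = terms_facet(title_matches(ARTICLES, 'T'), 'tags')
FACET['terms'].each { |entry| puts "#{entry['term']}: #{entry['count']}" }
puts "total: #{FACET['total']}"
# → foo: 2
# → bar: 2
# → baz: 1
# → total: 5
```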

Page 25: ElasticSearch with Tire

ElasticSearch

Mapping

curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '{
  "article": {
    "properties": {
      "tags": { "type": "string", "analyzer": "keyword" },
      "title": { "type": "string", "analyzer": "snowball", "boost": 10.0 },
      "content": { "type": "string", "analyzer": "snowball" }
    }
  }
}'

curl -XGET 'http://localhost:9200/articles/article/_mapping'

Page 26: ElasticSearch with Tire

ElasticSearch

Analyzer

curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '{
  "article": {
    "properties": {
      "title": { "type": "string", "analyzer": "trigrams" }
    }
  }
}'

curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

c u p e r t i n o

c u p
u p e
p e r
. . .
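The splitting shown above is easy to reproduce in Ruby. A sketch of what a trigram analyzer produces; note that `trigrams` is not a built-in analyzer name, so the mapping presumably relies on a custom nGram analyzer defined in the index settings. The `ngrams` helper is made up for illustration.

```ruby
# Split a word into overlapping n-grams (trigrams by default).
def ngrams(text, size = 3)
  text.downcase.chars.each_cons(size).map(&:join)
end

puts ngrams('cupertino').inspect
# → ["cup", "upe", "per", "ert", "rti", "tin", "ino"]

# Every trigram of a fragment is also a trigram of the full word,
# which is what makes partial matches possible.
puts (ngrams('pertin') - ngrams('cupertino')).empty?
# → true
```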

Page 27: ElasticSearch with Tire

Tire

A rich Ruby API and DSL for the ElasticSearch search engine.

http://github.com/karmi/tire/


Page 28: ElasticSearch with Tire

Tire

ActiveRecord Integration

# New rails application
$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

# Callback
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end

# Create an article
Article.create :title => "I Love Elasticsearch",
               :content => "...",
               :author => "Captain Nemo",
               :published_on => Time.now

# Search
Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort { by :published_on, 'desc' }
end
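Under the hood, the search block is translated into a JSON request for ElasticSearch. The sketch below writes out, as a plain Ruby hash, roughly what that body looks like, so it runs without the tire gem; the `search_body` helper is hypothetical and the exact output of Tire may differ in detail.

```ruby
require 'json'

# Approximate request body for the Article.search block above.
def search_body
  {
    'query'  => { 'query_string' => { 'query' => 'love' } },
    'facets' => {
      'timeline' => {
        'date_histogram' => { 'field' => 'published_on', 'interval' => 'month' }
      }
    },
    'sort'   => [{ 'published_on' => 'desc' }]
  }
end

puts JSON.pretty_generate(search_body)
```

Each DSL method maps onto one key: `query { string ... }` to a `query_string` query, `facet ... { date ... }` to a `date_histogram` facet, and `sort { by ... }` to the `sort` list.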

Page 29: ElasticSearch with Tire

Tire

ActiveRecord Integration

class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # Setting
  settings :number_of_shards => 3,
           :number_of_replicas => 2,
           :analysis => {
             :analyzer => {
               :url_analyzer => {
                 'tokenizer' => 'lowercase',
                 'filter' => ['stop', 'url_ngram']
               }
             }
           }

  # Mapping
  mapping do
    indexes :title, :analyzer => :not_analyzer, :boost => 100
    indexes :content, :analyzer => 'snowball'
  end
end

Page 31: ElasticSearch with Tire

Thanks
