Transcript
Page 1: Fazendo mágica com ElasticSearch

PEDROFRANCESCHI @pedroh96

[email protected] github.com/pedrofranceschi

Fazendo mágica com ElasticSearch (Doing Magic with ElasticSearch)

Page 2: Fazendo mágica com ElasticSearch

October 2010

Page 3: Fazendo mágica com ElasticSearch

Filters

Full text search

Sort

Highlight

Facets

Pagination

Page 4: Fazendo mágica com ElasticSearch

You will need to search data.

Page 5: Fazendo mágica com ElasticSearch

You will need to understand data.

Page 6: Fazendo mágica com ElasticSearch

(My)SQL is not the solution.

(… and neither is NoSQL)

Page 7: Fazendo mágica com ElasticSearch

What is ElasticSearch?

Page 8: Fazendo mágica com ElasticSearch

ElasticSearch

• “Open Source Distributed Real Time Search & Analytics”

• RESTful API for indexing/searching JSON documents (“NoSQL”)

• It is NOT a database

• Apache Lucene

• Just works (and scales)

• Full text search, aggregations, scripting, etc, etc, etc.

Page 9: Fazendo mágica com ElasticSearch

MySQL        ElasticSearch
Database     Index
Table        Type
Row          Document
Column       Field
Schema       Mapping
Partition    Shard

Names?

Page 10: Fazendo mágica com ElasticSearch

How do you use ElasticSearch?

Page 11: Fazendo mágica com ElasticSearch

PUT data

$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "pedroh96",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

URL parts: endpoint (http://localhost:9200), index (twitter), type (tweet), document ID (1). The request body is the document.

Response:

{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
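The same indexing call can be sketched from Ruby using only the standard library. This is an illustrative sketch, not something from the talk: `build_index_request` is a hypothetical helper name, and the request is only built, not sent, so it runs without a node on localhost:9200.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a PUT request that indexes
# `document` under /index/type/id, mirroring the curl call on the slide.
def build_index_request(base_url, index, type, id, document)
  uri = URI.join(base_url, "/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(document)
  request
end

document = {
  user: 'pedroh96',
  post_date: '2009-11-15T14:12:12',
  message: 'trying out Elasticsearch'
}
request = build_index_request('http://localhost:9200', 'twitter', 'tweet', 1, document)

puts "#{request.method} #{request.path}"
puts request.body

# Sending it needs a running node, so it is left commented out:
# response = Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
#   http.request(request)
# end
```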

Page 12: Fazendo mágica com ElasticSearch

GET data

$ curl -XGET 'http://localhost:9200/twitter/tweet/1'

URL parts: endpoint (http://localhost:9200), index (twitter), type (tweet), document ID (1).

Response:

{
    "_id": "1",
    "_index": "twitter",
    "_source": {
        "message": "trying out Elasticsearch",
        "post_date": "2009-11-15T14:12:12",
        "user": "pedroh96"
    },
    "_type": "tweet",
    "_version": 1,
    "found": true
}

The "_source" field holds the document.

Page 13: Fazendo mágica com ElasticSearch

GET data

$ curl -XGET 'http://localhost:9200/twitter/_search' -d '{ query: . . . }'

URL parts: endpoint (http://localhost:9200), index (twitter), search operator (_search). The request body is the search query.

Page 14: Fazendo mágica com ElasticSearch

ActiveRecords

class Tweet < ActiveRecord::Base
end

Page 15: Fazendo mágica com ElasticSearch

ActiveRecords

require 'elasticsearch/model'

class Tweet < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end

Page 16: Fazendo mágica com ElasticSearch

Tweet.import

Page 17: Fazendo mágica com ElasticSearch

Tweet.search("pedroh96")

Page 18: Fazendo mágica com ElasticSearch

Why use ElasticSearch?

Page 19: Fazendo mágica com ElasticSearch

DISCLAIMER

Page 20: Fazendo mágica com ElasticSearch

Post.where(:author => "pedroh96")

vs

Post.search(query: { match: { author: "pedroh96" }})

Just Another Query Language?

Page 21: Fazendo mágica com ElasticSearch

1) Full text search

Page 22: Fazendo mágica com ElasticSearch

ActiveRecords

$ rails g scaffold Post title:string source:string

Page 23: Fazendo mágica com ElasticSearch

Post.find(5)

GET /posts/5

:-)

ActiveRecords

Page 24: Fazendo mágica com ElasticSearch

Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")

“Amazon to Buy Video Site Twitch for More Than $1B”

ActiveRecords

:-)

Page 25: Fazendo mágica com ElasticSearch

Post.where(["title LIKE ?", "%Amazon%"])

“amazon”

???

ActiveRecords

Page 26: Fazendo mágica com ElasticSearch

Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"])

“amazon source:online.wsj.com”

??????
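To see why the SQL route gets awkward, here is a rough sketch of what the application would have to do by hand to turn the search-box text into `LIKE` conditions. It is an illustration only: `parse_search_box`, `to_sql_conditions`, and the `ALLOWED_FILTER_FIELDS` whitelist are hypothetical names, and whether `LIKE` matches "amazon" against "Amazon" still depends on the column collation.

```ruby
# Only whitelisted fields may become SQL column names (assumption: the Post
# model exposes a `source` column, as in the scaffold above).
ALLOWED_FILTER_FIELDS = %w[source].freeze

# Split the search-box text into free-text terms and field:value filters.
def parse_search_box(input)
  terms = []
  filters = {}
  input.split.each do |token|
    field, value = token.split(':', 2)
    if ALLOWED_FILTER_FIELDS.include?(field) && value && !value.empty?
      filters[field] = value
    else
      terms << token # unknown or malformed filters are treated as plain text
    end
  end
  { terms: terms, filters: filters }
end

# Build an ActiveRecord-style conditions array: [sql_fragment, *bind_values].
def to_sql_conditions(parsed, text_column: 'title')
  clauses = []
  binds = []
  parsed[:terms].each do |term|
    clauses << "#{text_column} LIKE ?"
    binds << "%#{term}%"
  end
  parsed[:filters].each do |field, value|
    clauses << "#{field} = ?"
    binds << value
  end
  [clauses.join(' AND '), *binds]
end

parsed = parse_search_box('amazon source:online.wsj.com')
conditions = to_sql_conditions(parsed)
p conditions
```

The resulting array could be passed to `Post.where(conditions)`, but relevance ranking, highlighting, and multi-field matching would all still be missing.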

ActiveRecords

Page 27: Fazendo mágica com ElasticSearch

Post.search("amazon")

“amazon”

:-)

ElasticSearch

Page 28: Fazendo mágica com ElasticSearch

“amazon source:online.wsj.com”

ElasticSearch

search = Post.search("amazon source:online.wsj.com")

:-)

Page 29: Fazendo mágica com ElasticSearch

“amazon source:online.wsj.com”

ElasticSearch

search = Post.search( query:{ match: { _all: "amazon source:online.wsj.com", } })

Full-text search

Page 30: Fazendo mágica com ElasticSearch

“amazon source:online.wsj.com”

ElasticSearch

search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } })

Full-text search

Title boost

Page 31: Fazendo mágica com ElasticSearch

“amazon source:online.wsj.com”

ElasticSearch

search = Post.search( query:{ multi_match: { query: "amazon source:online.wsj.com", fields: ['title^10', 'source'] } }, highlight: { fields: { title: {} } })

Title highlight

Full-text search

Title boost

Page 32: Fazendo mágica com ElasticSearch

ElasticSearch

> search.results[0].highlight.title
=> ["Twitch officially acquired by <em>Amazon</em>"]

Title highlight

Page 33: Fazendo mágica com ElasticSearch
Page 34: Fazendo mágica com ElasticSearch

2) Aggregations (faceting)

Page 35: Fazendo mágica com ElasticSearch

Geo distance aggregation

Page 36: Fazendo mágica com ElasticSearch
Page 37: Fazendo mágica com ElasticSearch

ActiveRecords

$ rails g scaffold Coordinate latitude:decimal longitude:decimal

Page 38: Fazendo mágica com ElasticSearch

ActiveRecords

class Coordinate < ActiveRecord::Base
end

Page 39: Fazendo mágica com ElasticSearch

ActiveRecords

class Coordinate < ActiveRecord::Base
  def distance_to(coordinate)
    # From http://en.wikipedia.org/wiki/Haversine_formula
    rad_per_deg = Math::PI/180 # PI / 180
    rkm = 6371 # Earth radius in kilometers
    rm = rkm * 1000 # Radius in meters

    dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg # Delta, converted to rad
    dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg

    lat1_rad = coordinate.latitude.to_f * rad_per_deg
    lat2_rad = self.latitude.to_f * rad_per_deg
    lon1_rad = coordinate.longitude.to_f * rad_per_deg
    lon2_rad = self.longitude.to_f * rad_per_deg

    a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2
    c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))

    rm * c # Delta in meters
  end
end

> c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
> c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
> c1.distance_to(c2)
=> 66.07749735875552

Page 40: Fazendo mágica com ElasticSearch

ActiveRecords

origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)

# The buckets must exist before the loop that fills them
buckets = [
  { :to => 100, :coordinates => [] },
  { :from => 100, :to => 300, :coordinates => [] },
  { :from => 300, :coordinates => [] }
]

Coordinate.all.each do |coordinate|
  distance = origin.distance_to(coordinate)

  buckets.each do |bucket|
    # A missing :from means 0; a missing :to means no upper bound
    if distance >= (bucket[:from] || 0) and distance < (bucket[:to] || Float::INFINITY)
      bucket[:coordinates] << coordinate
    end
  end
end

??????
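The manual approach can be tried without Rails. The following is a minimal sketch under stated assumptions: coordinates are plain `[lat, lon]` arrays instead of ActiveRecord rows, each range treats `from` as inclusive and `to` as exclusive, `bucketize` and `haversine_m` are hypothetical helper names, and the second and third points are made up for illustration.

```ruby
EARTH_RADIUS_M = 6_371_000.0

# Great-circle distance in meters between two points (Haversine formula).
def haversine_m(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# Put each point into the first range that contains its distance to the origin.
def bucketize(origin, points, ranges)
  buckets = ranges.map { |range| range.merge(points: []) }
  points.each do |point|
    distance = haversine_m(*origin, *point)
    bucket = buckets.find do |b|
      distance >= (b[:from] || 0) && distance < (b[:to] || Float::INFINITY)
    end
    bucket[:points] << point if bucket
  end
  buckets
end

RANGES = [{ to: 100 }, { from: 100, to: 300 }, { from: 300 }].freeze
origin = [-23.5532636, -46.6528908]
points = [
  [-23.5538488, -46.6530035], # second coordinate from the slide, about 66 m away
  [-23.5550000, -46.6528908], # hypothetical point, roughly 190 m south
  [-23.5600000, -46.6528908]  # hypothetical point, roughly 750 m south
]

buckets = bucketize(origin, points, RANGES)
buckets.each { |b| puts "#{b[:from] || 0}..#{b[:to] || 'inf'}: #{b[:points].size}" }
```

Every row is loaded and measured in application code here, which is the work the geo distance aggregation moves into the search engine.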

Page 41: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    rings_around_rubyconf: {                 # Aggregation name
      geo_distance: {                        # Aggregation type
        field: "location",                   # Field holding the location
        origin: "-23.5532636, -46.6528908",  # Origin coordinates
        ranges: [                            # Buckets to aggregate into
          { to: 100 },
          { from: 100, to: 300 },
          { from: 300 }
        ]
      }
    }
  }
}

search = Coordinate.search(query)

:-)

Page 42: Fazendo mágica com ElasticSearch

(Extended) stats aggregation

Page 43: Fazendo mágica com ElasticSearch

ActiveRecords

$ rails g scaffold Grade subject:string grade:decimal

Page 44: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    grades_stats: {        # Aggregation name
      extended_stats: {    # Aggregation type
        field: "grade",    # Field name
      }
    }
  }
}

search = Grade.search(query)

Page 45: Fazendo mágica com ElasticSearch

ElasticSearch

> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
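For intuition, the same figures can be reproduced in plain Ruby. This is a sketch, not how ElasticSearch computes them internally. The slide does not list the individual grades; the values 4.6, 9.5 and 10.0 are an assumption chosen because they match the reported count, min, max, sum and sum of squares. `extended_stats` is a hypothetical helper name, and the variance is the population variance.

```ruby
# Compute the fields reported by the extended stats output above.
def extended_stats(values)
  count = values.size
  sum = values.sum
  sum_of_squares = values.sum { |v| v * v }
  avg = sum / count
  # Population variance; clamp at 0 to absorb floating-point rounding.
  variance = [sum_of_squares / count - avg**2, 0.0].max
  {
    count: count,
    min: values.min,
    max: values.max,
    sum: sum,
    avg: avg,
    sum_of_squares: sum_of_squares,
    variance: variance,
    std_deviation: Math.sqrt(variance)
  }
end

grades = [4.6, 9.5, 10.0] # assumed values, see the note above
stats = extended_stats(grades)
stats.each { |name, value| puts "#{name}=#{value.round(4)}" }
```

The variance and standard deviation come out near 5.936 and 2.436, which suggests the slide output was truncated to two decimals rather than rounded.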

Page 46: Fazendo mágica com ElasticSearch

(Extended) stats aggregation +

Scripting

Page 47: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    grades_stats: {
      extended_stats: {
        field: "grade",
      }
    }
  }
}

Page 48: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    grades_stats: {        # Aggregation name
      extended_stats: {    # Aggregation type
        field: "grade",    # Field name
        script: "_value < 7.0 ? _value * correction : _value",  # JavaScript that computes the new grade
        params: {
          correction: 1.2
        }
      }
    }
  }
}

search = Grade.search(query)

Page 49: Fazendo mágica com ElasticSearch

ElasticSearch

> search.response.aggregations.grades_stats
=> #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
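The effect of the script can be checked by hand. In this sketch the grades 4.6, 9.5 and 10.0 are again an assumption (the slides only show aggregate values), and the lambda mirrors the script expression: grades below 7.0 are multiplied by the correction factor, all others are left alone.

```ruby
CORRECTION = 1.2

# Same rule as the script: boost only the grades below 7.0.
adjust = ->(value) { value < 7.0 ? value * CORRECTION : value }

grades = [4.6, 9.5, 10.0] # assumed values
adjusted = grades.map(&adjust)

sum = adjusted.sum
avg = sum / adjusted.size

puts "min=#{adjusted.min.round(2)} sum=#{sum.round(2)} avg=#{avg.round(2)}"
```

Only the 4.6 grade changes (to about 5.52), which is why the maximum stays at 10.0 while the minimum, sum and average all move.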

Page 50: Fazendo mágica com ElasticSearch

Term aggregation

Page 51: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    subjects: {            # Aggregation name
      terms: {             # Aggregation type
        field: "subject"   # Field name
      }
    }
  }
}

search = Grade.search(query)

Page 52: Fazendo mágica com ElasticSearch

ElasticSearch

> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 key="math">,
  #<Hashie::Mash doc_count=1 key="grammar">,
  #<Hashie::Mash doc_count=1 key="physics">
]>

Page 53: Fazendo mágica com ElasticSearch

Combined aggregations (term + stats)

Page 54: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    subjects: {
      terms: {
        field: "subject"
      }
    }
  }
}

search = Grade.search(query)

Page 55: Fazendo mágica com ElasticSearch

ElasticSearch

query = {
  aggregations: {
    subjects: {                # Parent aggregation name
      terms: {
        field: "subject"       # Field for the parent aggregation
      },
      aggregations: {
        grade_stats: {         # Child aggregation name
          stats: {
            field: "grade"     # Field for the child aggregation
          }
        }
      }
    }
  }
}

search = Grade.search(query)

Page 56: Fazendo mágica com ElasticSearch

ElasticSearch

> search.response.aggregations.subjects
=> #<Hashie::Mash buckets=[
  #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
  #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
]>
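Conceptually, a terms aggregation with a nested stats aggregation is a group-by followed by per-group statistics. The sketch below shows that idea in plain Ruby. `GradeRow` is a hypothetical stand-in for the model, the four rows are reconstructed from the bucket statistics above, and breaking ties between equally sized buckets by key is an assumption made to get a deterministic order.

```ruby
GradeRow = Struct.new(:subject, :grade)

# Group rows by subject (the "terms" part), then compute stats per group
# (the nested "stats" part).
def stats_by_subject(rows)
  buckets = rows.group_by(&:subject).map do |subject, group|
    values = group.map(&:grade)
    {
      key: subject,
      doc_count: group.size,
      grade_stats: {
        count: values.size,
        min: values.min,
        max: values.max,
        sum: values.sum,
        avg: values.sum / values.size
      }
    }
  end
  # Largest buckets first; ties broken alphabetically (assumption).
  buckets.sort_by { |bucket| [-bucket[:doc_count], bucket[:key]] }
end

rows = [
  GradeRow.new('math', 8.0),
  GradeRow.new('math', 10.0),
  GradeRow.new('grammar', 4.6),
  GradeRow.new('physics', 9.5)
]

result = stats_by_subject(rows)
result.each { |b| puts "#{b[:key]}: docs=#{b[:doc_count]} avg=#{b[:grade_stats][:avg]}" }
```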

Page 57: Fazendo mágica com ElasticSearch

Top Hits

More like this

Histogram

Scripted metrics

Geo bounds

Stemmer (synonyms)

IPv4 ranges

. . .

Page 58: Fazendo mágica com ElasticSearch

3) Scoring

Page 59: Fazendo mágica com ElasticSearch

ActiveRecords

$ rails g scaffold Post title:string source:string likes:integer

Page 60: Fazendo mágica com ElasticSearch

“amazon”

ElasticSearch

search = Post.search(
  query: {
    match: { _all: "amazon" }   # Full-text search
  }
)

> search.results.results[0]._score
=> 0.8174651

Page 61: Fazendo mágica com ElasticSearch

“amazon”

ElasticSearch

search = Post.search(
  query: {
    custom_score: {
      query: {
        match: { _all: "amazon" }                 # Full-text search
      },
      script: "_score * doc['likes'].value"       # Likes influence the score
    }
  }
)

> search.results.results[0]._score
=> 31.8811388
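What the script does to the ranking can be sketched outside ElasticSearch: each hit's text score is multiplied by its number of likes, and hits are re-sorted by the product. In this sketch `Hit` and `rerank` are hypothetical names, the 39 likes are inferred from the two scores on the slides (31.8811388 / 0.8174651 is about 39), and the second hit is invented for contrast. Later ElasticSearch releases express this kind of scoring with a `function_score` query instead of `custom_score`.

```ruby
Hit = Struct.new(:title, :text_score, :likes)

# Multiply each text score by the like count and sort best-first.
def rerank(hits)
  hits.map { |hit| [hit, hit.text_score * hit.likes] }
      .sort_by { |_hit, score| -score }
end

hits = [
  # Hypothetical hit: slightly better text match, but few likes.
  Hit.new('Amazon to Buy Video Site Twitch for More Than $1B', 0.95, 2),
  # Text score from the slide; the like count is inferred.
  Hit.new('Twitch officially acquired by Amazon', 0.8174651, 39)
]

ranked = rerank(hits)
ranked.each { |hit, score| puts format('%.4f  %s', score, hit.title) }
```

A popular post with a slightly weaker text match ends up ahead of a better match with few likes, which is the behavior the slide describes.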

Page 62: Fazendo mágica com ElasticSearch

GET http://localhost:9200/post/_search?explain

"_explanation": {
  "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
  "value": 0.076713204,
  "details": [
    {
      "description": "fieldWeight in 0, product of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "tf(freq=1.0), with freq of:",
          "value": 1,
          "details": [
            {
              "description": "termFreq=1.0",
              "value": 1
            }
          ]
        },
        {
          "description": "idf(docFreq=1, maxDocs=1)",
          "value": 0.30685282
        },
        {
          "description": "fieldNorm(doc=0)",
          "value": 0.25
        }
      ]
    }
  ]
}

Score explained
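The numbers in the explanation can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the method names are hypothetical, and the field norm is taken directly from the explanation rather than derived.

```ruby
# Term frequency component: square root of the raw frequency.
def tf(freq)
  Math.sqrt(freq)
end

# Inverse document frequency component.
def idf(doc_freq, max_docs)
  1.0 + Math.log(max_docs.to_f / (doc_freq + 1))
end

# fieldWeight is the product of tf, idf and the field norm.
def field_weight(freq:, doc_freq:, max_docs:, field_norm:)
  tf(freq) * idf(doc_freq, max_docs) * field_norm
end

weight = field_weight(freq: 1.0, doc_freq: 1, max_docs: 1, field_norm: 0.25)
puts "idf=#{idf(1, 1).round(8)} weight=#{weight.round(9)}"
```

With one matching document in an index of one document, idf is 1 + ln(1/2), about 0.3069, and multiplying by the field norm of 0.25 gives the reported weight of about 0.0767.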

Page 63: Fazendo mágica com ElasticSearch

4) Indexing responses

Page 64: Fazendo mágica com ElasticSearch

$ rails g scaffold Post title:string source:string likes:integer

Page 65: Fazendo mágica com ElasticSearch

class PostsController < ApplicationController

  # ...

  def show
    @post = Post.find(params[:id])

    render json: @post
  end

  # ...

end

SELECT * FROM Posts WHERE id = params[:id]

Page 66: Fazendo mágica com ElasticSearch

class PostsController < ApplicationController

  # ...

  def show
    @post = Post.search(query: { match: { id: params[:id] }})

    render json: @post
  end

  # ...

end

GET http://localhost:9200/posts/posts/params[:id]

Page 67: Fazendo mágica com ElasticSearch

ActiveRecords

require 'elasticsearch/model'

class Post < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  belongs_to :author

  # Includes a parent (the author) in the indexed JSON
  def as_indexed_json(options={})
    self.as_json(
      include: { author: { only: [:name, :bio] },
    })
  end
end

Page 68: Fazendo mágica com ElasticSearch

Exposing ElasticSearch

Page 69: Fazendo mágica com ElasticSearch
Page 70: Fazendo mágica com ElasticSearch

http://localhost:9200/pagarme/_search

https://api.pagar.me/1/search

Page 71: Fazendo mágica com ElasticSearch

Pagar.me infrastructure

Router (api.pagar.me)

Test environment (customer sandbox): API server (Node.js), ElasticSearch, MySQL (transactions and relational data)

Production environment: API server (Node.js), ElasticSearch, MySQL (transactions and relational data)

MongoDB (customer data and non-relational data)

Page 72: Fazendo mágica com ElasticSearch

Exposing ElasticSearch

• ElasticSearch endpoint -> endpoint accessed by the customer…

• … but be careful: the data must be restricted to the customer's account (of course)

• Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)

• Security: disable ElasticSearch scripts
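The two precautions in this list can be sketched as a thin layer in front of the search endpoint. This is an illustrative sketch, not Pagar.me's implementation: the `account_id` field, the `scope_to_account` and `contains_script?` helpers, and the use of a `bool` query with a `filter` clause (available in ElasticSearch 2.0 and later) are all assumptions. Rejecting scripts here would complement, not replace, disabling them on the cluster.

```ruby
require 'json'

class ScriptNotAllowed < StandardError; end

# True if any key anywhere in the request body mentions a script.
def contains_script?(node)
  case node
  when Hash
    node.any? { |key, value| key.to_s.include?('script') || contains_script?(value) }
  when Array
    node.any? { |item| contains_script?(item) }
  else
    false
  end
end

# Wrap the customer's query so it can only match that customer's documents.
def scope_to_account(client_body, account_id)
  raise ScriptNotAllowed, 'scripts are disabled' if contains_script?(client_body)

  user_query = client_body['query'] || { 'match_all' => {} }
  client_body.merge(
    'query' => {
      'bool' => {
        'must' => user_query,
        'filter' => { 'term' => { 'account_id' => account_id } }
      }
    }
  )
end

body = JSON.parse('{"query": {"match": {"status": "paid"}}, "size": 5}')
scoped = scope_to_account(body, 42)
puts JSON.pretty_generate(scoped)
```

The customer's own query is preserved under `must`, so aggregations and scoring keep working, while the `filter` clause is added by the server and cannot be removed by the caller.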

Page 73: Fazendo mágica com ElasticSearch

GET /search

• A single endpoint for all GETs

• All data indexed and ready to use (no joins)

• Complex queries built on the front end (Angular.js)

• Front-end development does not depend on the back end

Page 74: Fazendo mágica com ElasticSearch
Page 75: Fazendo mágica com ElasticSearch
Page 76: Fazendo mágica com ElasticSearch

Overall…

Page 77: Fazendo mágica com ElasticSearch
Page 78: Fazendo mágica com ElasticSearch

1) There is a tool for every task.

2) A hammer is always the right tool.

3) Every tool is also a hammer.

Page 79: Fazendo mágica com ElasticSearch

MySQL

!=

NoSQL

!=

ElasticSearch

Page 80: Fazendo mágica com ElasticSearch

PEDROFRANCESCHI @pedroh96

[email protected] github.com/pedrofranceschi

Thank you! :)

Page 81: Fazendo mágica com ElasticSearch

PEDROFRANCESCHI @pedroh96

[email protected] github.com/pedrofranceschi

Questions?

Page 82: Fazendo mágica com ElasticSearch

PEDROFRANCESCHI @pedroh96

[email protected] github.com/pedrofranceschi

Fazendo mágica com ElasticSearch