Doing Magic with ElasticSearch (Fazendo mágica com ElasticSearch)

DESCRIPTION

When an application starts to grow large and complex, searching its models becomes a complicated task. Running searches directly against the database is slow, inefficient, and allows little or no flexibility over how the search is performed. Enter ElasticSearch, a search engine used by companies such as GitHub, Twitter, and Foursquare to index and search literally millions of documents in real time. In this talk, I will explain when, how, and why to use ElasticSearch to easily index your models and run complex searches on them.

Text of Doing Magic with ElasticSearch

2. October 2010

3. Filters, full-text search, sort, highlight, facets, pagination

4. You will need to search data.

5. You will need to understand data.

6. (My)SQL is not the solution. (Neither is NoSQL.)

7. What is ElasticSearch?

8. ElasticSearch
  • Open source, distributed, real-time search and analytics
  • RESTful API for indexing and searching JSON documents (NoSQL)
  • NOT a database
  • Apache Lucene
  • Just works (and scales)
  • Full-text search, aggregations, scripting, etc.

9. Names?

    MySQL        ElasticSearch
    Database     Index
    Table        Type
    Row          Document
    Column       Field
    Schema       Mapping
    Partition    Shard

10. How do you use ElasticSearch?

11. PUT data

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
      "user" : "pedroh96",
      "post_date" : "2009-11-15T14:12:12",
      "message" : "trying out Elasticsearch"
    }'

    URL parts: endpoint (http://localhost:9200), index (twitter), type (tweet), document ID (1). The -d body is the document.

    Response:

    {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_version" : 1,
      "created" : true
    }

12. GET data

    $ curl -XGET 'http://localhost:9200/twitter/tweet/1'

    URL parts: endpoint, index (twitter), type (tweet), document ID (1). The stored document comes back under _source:

    {
      "_id": "1",
      "_index": "twitter",
      "_source": {
        "message": "trying out Elasticsearch",
        "post_date": "2009-11-15T14:12:12",
        "user": "pedroh96"
      },
      "_type": "tweet",
      "_version": 1,
      "found": true
    }

13. GET data

    $ curl -XGET 'http://localhost:9200/twitter/_search' -d { query: . . . }

    URL parts: endpoint, index (twitter), search operator (_search). The -d body is the search query.

14. ActiveRecords

    class Tweet < ActiveRecord::Base
    end

15. ActiveRecords

    require 'elasticsearch/model'

    class Tweet < ActiveRecord::Base
      include Elasticsearch::Model
      include Elasticsearch::Model::Callbacks
    end

16. Tweet.import

17. Tweet.search("pedroh96")

18. Why use ElasticSearch?

19. DISCLAIMER

20. Just another query language?

    Post.where(:author => "pedroh96")

    vs

    Post.search(query: { match: { author: "pedroh96" }})

21. 1) Full-text search

22. ActiveRecords

    $ rails g scaffold Post title:string source:string

23. ActiveRecords

    GET /posts/5

    Post.find(5)    :-)

24. ActiveRecords

    Search: Amazon to Buy Video Site Twitch for More Than $1B

    Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")    :-)

25. ActiveRecords

    Search: amazon

    Post.where(["title LIKE ?", "%Amazon%"])    ???

26. ActiveRecords

    Search: amazon source:online.wsj.com

    Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"])    ??????

27. ElasticSearch

    Search: amazon

    Post.search("amazon")    :-)

28. ElasticSearch

    Search: amazon source:online.wsj.com

    search = Post.search("amazon source:online.wsj.com")    :-)

29. ElasticSearch

    Search: amazon source:online.wsj.com

    search = Post.search(query: {
      match: {
        _all: "amazon source:online.wsj.com",   # full-text search
      }
    })

30. ElasticSearch

    Search: amazon source:online.wsj.com

    search = Post.search(query: {
      multi_match: {
        query: "amazon source:online.wsj.com",  # full-text search
        fields: ['title^10', 'source']          # title boost
      }
    })

31. ElasticSearch

    Search: amazon source:online.wsj.com

    search = Post.search(query: {
      multi_match: {
        query: "amazon source:online.wsj.com",  # full-text search
        fields: ['title^10', 'source']          # title boost
      }
    },
    highlight: {
      fields: {
        title: {}                               # title highlight
      }
    })

32. ElasticSearch (title highlight)

    > search.results[0].highlight.title
    => ["Twitch officially acquired by Amazon"]

33. 2) Aggregations (faceting)

34. Geo distance aggregation

35. ActiveRecords

    $ rails g scaffold Coordinate latitude:decimal longitude:decimal

36. ActiveRecords

    class Coordinate < ActiveRecord::Base
    end

37. ActiveRecords

    class Coordinate < ActiveRecord::Base
      def distance_to(coordinate)
        # From http://en.wikipedia.org/wiki/Haversine_formula
        rad_per_deg = Math::PI/180  # PI / 180
        rkm = 6371                  # Earth radius in kilometers
        rm = rkm * 1000             # Radius in meters

        dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg  # Delta, converted to rad
        dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg

        lat1_rad = coordinate.latitude.to_f * rad_per_deg
        lat2_rad = self.latitude.to_f * rad_per_deg
        lon1_rad = coordinate.longitude.to_f * rad_per_deg
        lon2_rad = self.longitude.to_f * rad_per_deg

        a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2
        c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))

        rm * c  # Delta in meters
      end
    end

    > c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
    > c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
    > c1.distance_to(c2)
    => 66.07749735875552

38. ActiveRecords

    origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)

    buckets = [
      { :to => 100, :coordinates => [] },
      { :from => 100, :to => 300, :coordinates => [] },
      { :from => 300, :coordinates => [] }
    ]

    Coordinate.all.each do |coordinate|
      distance = origin.distance_to(coordinate)

      buckets.each do |bucket|
        if distance < bucket[:to] and distance > (bucket[:from] || 0)
          bucket[:coordinates] [...]

[The rest of slide 38 and slides 39 to 42 are missing from this transcript.]

43. ElasticSearch

    > search.response.aggregations.grades_stats
    => # [output not preserved in this transcript]

44. (Extended) stats aggregation + scripting

45. ElasticSearch

    query = {
      aggregations: {
        grades_stats: {
          extended_stats: {
            field: "grade",
          }
        }
      }
    }

46. ElasticSearch

    query = {
      aggregations: {
        grades_stats: {          # aggregation name
          extended_stats: {      # aggregation type
            field: "grade",      # field name
            # JavaScript that computes the new grade
            script: "_value < 7.0 ? _value * correction : _value",
            params: {
              correction: 1.2
            }
          }
        }
      }
    }

    search = Grade.search(query)

47. ElasticSearch

    > search.response.aggregations.grades_stats
    => # [output not preserved in this transcript]

48. Term aggregation

49. ElasticSearch

    query = {
      aggregations: {
        subjects: {              # aggregation name
          terms: {               # aggregation type
            field: "subject"     # field name
          }
        }
      }
    }

    search = Grade.search(query)

50. ElasticSearch

    > search.response.aggregations.subjects
    => # [output not preserved in this transcript]

51. Combined aggregations (term + stats)

52. ElasticSearch

    query = {
      aggregations: {
        subjects: {
          terms: {
            field: "subject"
          }
        }
      }
    }

    search = Grade.search(query)

53. ElasticSearch

    query = {
      aggregations: {
        subjects: {              # parent aggregation name
          terms: {
            field: "subject"     # field for the parent aggregation
          },
          aggregations: {
            grade_stats: {       # child aggregation name
              stats: {
                field: "grade"   # field for the child aggregation
              }
            }
          }
        }
      }
    }

    search = Grade.search(query)

54. ElasticSearch

    > search.response.aggregations.subjects
    => # [output only partly preserved: one bucket each for key="math", key="grammar", and key="physics"]

55. Top hits, More like this, Histogram, Scripted metrics, Geo bounds, Stemmer (synonyms), IPv4 ranges, ...

56. 3) Scoring

57. ActiveRecords

    $ rails g scaffold Post title:string source:string likes:integer

58. ElasticSearch

    Search: amazon

    search = Post.search(query: {
      match: {
        _all: "amazon",          # full-text search
      }
    })

    search.results.results[0]._score
    => 0.8174651

59. ElasticSearch

    Search: amazon

    search = Post.search(query: {
      custom_score: {
        query: {
          match: {
            _all: "amazon",      # full-text search
          }
        },
        script: "_score * doc['likes'].value"   # likes influence the score
      }
    })

    search.results.results[0]._score
    => 31.8811388

60. Score explained

    GET http://localhost:9200/post/_search?explain

    "_explanation": {
      "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
      "value": 0.076713204,
      "details": [
        {
          "description": "fieldWeight in 0, product of:",
          "value": 0.076713204,
          "details": [
            {
              "description": "tf(freq=1.0), with freq of:",
              "value": 1,
              "details": [
                {
                  "description": "termFreq=1.0",
                  "value": 1
                }
              ]
            },
            {
              "description": "idf(docFreq=1, maxDocs=1)",
              "value": 0.30685282
            },
            {
              "description": "fieldNorm(doc=0)",
              "value": 0.25
            }
          ]
        }
      ]
    }

61. 4) Indexing responses

62. $ rails g scaffold Post title:string source:string likes:integer

63. SELECT * FROM Posts WHERE id = params[:id]

    class PostsController < ApplicationController

      # ...

      def show
        @post = Post.find(params[:id])

        render json: @post
      end

      # ...

    end

64. GET http://localhost:9200/posts/posts/params[:id]

    class PostsController < ApplicationController

      # ...

      def show
        @post = Post.search(query: { match: { id: params[:id] }})

        render json: @post
      end

      # ...

    end

65. ActiveRecords

    require 'elasticsearch/model'

    class Post < ActiveRecord::Base
      include Elasticsearch::Model
      include Elasticsearch::Model::Callbacks

      belongs_to :author

      # Includes a parent (the author) in the indexed JSON
      def as_indexed_json(options={})
        self.as_json(
          include: { author: { only: [:name, :bio] },
        })
      end
    end

66. Exposing ElasticSearch

67. http://localhost:9200/pagarme/_search -> https://api.pagar.me/1/search

68. Pagar.me infrastructure
[Diagram: a router (api.pagar.me) in front of two environments, the test environment (customer sandbox) and the production environment. Components shown: API server (Node.js), ElasticSearch, MySQL (transactions and relational data), MongoDB (customer data and non-relational data).]

69. Exposing ElasticSearch
  • ElasticSearch endpoint -> endpoint accessed by the customer
  • But be careful: data must be restricted to the customer's own account (of course)
  • Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.)
  • Security: disable ElasticSearch scripts

70. GET /search
  • A single endpoint for all GETs
  • All data indexed and ready to use (no joins)
  • Complex queries built on the front end (Angular.js)
  • Front-end development does not depend on the back end

71. Overall

72. 1) There is a tool for every task.
    2) A hammer is always the right tool.
    3) Every tool is also a hammer.

73. MySQL != NoSQL != ElasticSearch

74. Thank you! :)
    [email protected]@pagar.me
    github.com/pedrofranceschi
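Note on slide 38: the hand-written bucketing loop is cut off in this transcript, and as written its comparison would fail on the open-ended last bucket, where `:to` is nil. The following is a minimal sketch of what such a loop could look like, not the speaker's original code. It assumes a plain `Struct` in place of the ActiveRecord `Coordinate` model, and the helper name `bucket_by_distance`, the half-open ranges, and the two extra sample points are illustrative choices that do not appear in the deck.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # Earth radius in meters, same value as slide 37

# Plain-Ruby stand-in for the deck's Coordinate model (assumption: no database).
Coordinate = Struct.new(:latitude, :longitude) do
  # Haversine great-circle distance to another coordinate, in meters.
  def distance_to(other)
    rad = Math::PI / 180
    dlat = (other.latitude - latitude) * rad
    dlon = (other.longitude - longitude) * rad
    a = Math.sin(dlat / 2)**2 +
        Math.cos(latitude * rad) * Math.cos(other.latitude * rad) * Math.sin(dlon / 2)**2
    2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
  end
end

# Groups coordinates into distance rings around origin (hypothetical helper).
# Each range is half-open: from <= distance < to. A missing bound is open-ended.
def bucket_by_distance(origin, coordinates, ranges)
  buckets = ranges.map { |range| range.merge(coordinates: []) }
  coordinates.each do |coordinate|
    distance = origin.distance_to(coordinate)
    bucket = buckets.find do |b|
      distance >= (b[:from] || 0) && distance < (b[:to] || Float::INFINITY)
    end
    bucket[:coordinates] << coordinate if bucket
  end
  buckets
end

origin = Coordinate.new(-23.5532636, -46.6528908)
points = [
  Coordinate.new(-23.5538488, -46.6530035), # the pair from slide 37, about 66 m apart
  Coordinate.new(-23.5550000, -46.6528908), # illustrative point, roughly 190 m south
  Coordinate.new(-23.5600000, -46.6528908)  # illustrative point, roughly 750 m south
]
ranges = [{ to: 100 }, { from: 100, to: 300 }, { from: 300 }]

buckets = bucket_by_distance(origin, points, ranges)
buckets.each do |b|
  puts "#{b[:from] || 0}..#{b[:to] || 'inf'} m: #{b[:coordinates].size} coordinate(s)"
end
```

Even in this small form, every coordinate has to be loaded into Ruby and measured one by one, which is the cost the deck contrasts with letting ElasticSearch compute the aggregation.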