Intro to elasticsearch

Your Data, Your Search!

2016-06-27

Outline

  • Information retrieval
  • Indexing & Searching
  • Elasticsearch

Information retrieval

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

A search engine is a software system that is designed to search for information. It is one implementation of IR.

What is a search engine?

A search engine is:

  • An index engine for documents
  • A search engine on indexes

A search engine is more powerful for searching: it is designed for it!

Search Engine Architecture

Problems

  • How to store the data?
  • How to index the data?
  • How to search the data?

How to store the data?

Inverted list

How to INDEX the data?

Consider the following two files:

File1: Students should be allowed to go out with their friends, but not allowed to drink beer.

File2: My friend Jerry went to school to see his students but found them drunk which is not allowed.

Step 1: Tokenizer

  • Split the document into words
  • Remove the punctuation
  • Remove stop words (the, a, this, that, etc.)
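
The tokenizer step can be sketched in a few lines of Python. This is only a minimal illustration, not how Lucene implements it: the `tokenize` function and the `STOP_WORDS` set are names made up for this example, and the stop-word list contains just the four words named on the slide.

```python
import re

# Illustrative stop-word list; the slide names only "the, a, this, that".
STOP_WORDS = {"the", "a", "this", "that"}

def tokenize(text):
    """Split text into words, dropping punctuation and stop words."""
    # \w+ matches runs of letters/digits, so punctuation is discarded.
    words = re.findall(r"\w+", text)
    return [w for w in words if w.lower() not in STOP_WORDS]

file1 = ("Students should be allowed to go out with their friends, "
         "but not allowed to drink beer.")
print(tokenize(file1))
```

A real analyzer would use a much longer stop-word list and language-aware word splitting.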


Step 2: Linguistic processor

  • Lowercase
  • Stemming: cars -> car, etc.
  • Lemmatization: drove -> drive, etc.
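
As a rough sketch of what the linguistic processor does, the snippet below lowercases a token, looks it up in a tiny lemma table, and otherwise strips a common suffix. Both the `LEMMAS` table and the `stem` rule are toy stand-ins invented for this example; production stemmers (Porter, Snowball) and lemmatizers are far more careful.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"drove": "drive", "went": "go", "found": "find", "drunk": "drink"}

def stem(word):
    """Very naive suffix stripping; real stemmers apply many more rules."""
    for suffix in ("ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(token):
    """Lowercase, then lemmatize via the table, else fall back to stemming."""
    word = token.lower()
    if word in LEMMAS:
        return LEMMAS[word]
    return stem(word)

for token in ["Students", "allowed", "cars", "drove", "friends"]:
    print(token, "->", normalize(token))
```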



Step 3: Index

Term       Document ID
student    1
allow      1
go         1
their      1
friend     1
allow      1

  • Dictionary
  • Sort
  • Posting list
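
To make the dictionary and posting-list idea concrete, here is a small sketch that builds an inverted index over the two example files. The analysis is deliberately simplified (lowercasing only, no stop words or stemming), and `terms` and `build_index` are names chosen for this example rather than anything from Lucene.

```python
import re
from collections import defaultdict

docs = {
    1: "Students should be allowed to go out with their friends, "
       "but not allowed to drink beer.",
    2: "My friend Jerry went to school to see his students "
       "but found them drunk which is not allowed.",
}

def terms(text):
    # Simplified analysis: lowercase words only.
    return re.findall(r"\w+", text.lower())

def build_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            postings[term].add(doc_id)
    # Sorted dictionary of terms, each with a sorted posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

index = build_index(docs)
print(index["allowed"])   # documents containing "allowed"
print(index["drink"])     # documents containing "drink"
```

Sorting the terms and the document IDs is what later makes lookups and list merging cheap.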

How to SEARCH the data?

Step 1: User search query

Suppose you have the following query:

lucene AND learned NOT hadoop

Step 2: Lexical & syntax analysis

  • Identify words and keywords
      - Words: lucene, learned, hadoop
      - Keywords: AND, NOT
  • Build a syntax tree






Step 3: Search

  • Search in the inverted list
  • Sort, conjunction, disjunction
  • Scorer
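
The query `lucene AND learned NOT hadoop` can be evaluated against posting lists with plain set operations. The sketch below is a simplification: instead of building a syntax tree it walks the query left to right, it does no scoring, and the posting lists in `index` are invented purely for illustration.

```python
# Hypothetical posting lists (term -> set of document IDs), for illustration only.
index = {
    "lucene": {1, 2, 3},
    "learned": {2, 3, 4},
    "hadoop": {3, 5},
}

KEYWORDS = {"AND", "OR", "NOT"}

def search(query, index):
    """Evaluate a flat boolean query left to right (no parentheses, no precedence)."""
    result = None
    op = "AND"
    for token in query.split():
        if token in KEYWORDS:
            op = token
            continue
        postings = index.get(token.lower(), set())
        if result is None:
            result = set(postings)   # first term seeds the result
        elif op == "AND":
            result &= postings       # conjunction: keep docs in both lists
        elif op == "OR":
            result |= postings       # disjunction: keep docs in either list
        elif op == "NOT":
            result -= postings       # exclusion: drop docs in this list
        op = "AND"  # implicit operator between bare terms (an assumption here)
    return sorted(result or [])

print(search("lucene AND learned NOT hadoop", index))
```

With these made-up posting lists, only document 2 contains both "lucene" and "learned" without "hadoop".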

Elasticsearch

  • Full-text search
  • RESTful API
  • Real-time search and analytics engine
  • Open source
  • High availability
  • Schema free
  • JSON over HTTP
  • Lucene based
  • Distributed

Elasticsearch

  • Distributed and highly available search engine.
      - Each index is fully sharded with a configurable number of shards.
      - Each shard can have one or more replicas.
      - Read / search operations are performed on any of the replica shards.
  • Multi tenant with multi types.
      - Support for more than one index.
      - Support for more than one type per index.
      - Index level configuration (number of shards, index storage, ...).
  • Document oriented.
      - No need for upfront schema definition.
      - Schema can be defined per type for customization of the indexing process.
  • Various set of APIs.
      - HTTP RESTful API.
      - Native Java API.
      - All APIs perform automatic node operation rerouting.
  • (Near) real time search.
  • Reliable, asynchronous write behind for long term persistency.
  • Built on top of Lucene.
      - Each shard is a fully functional Lucene index.
      - All the power of Lucene easily exposed through simple configuration / plugins.
  • Per operation consistency.
      - Single document level operations are atomic, consistent, isolated and durable.
  • Open source under the Apache License, version 2 ("ALv2").

Elasticsearch terminology

  • Cluster
  • Node
  • Index
  • Shard


A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes.

A cluster is identified by a unique name, which by default is "elasticsearch".


A node is an Elasticsearch instance (a Java process).

A node is created when an Elasticsearch instance is started.

By default, a node is assigned a random Marvel character name.


An index is a collection of documents that have somewhat similar characteristics, e.g. customer data or a product catalog.

It is crucial when performing indexing, search, update, and delete operations against the documents in it.

You can define as many indexes as you want in a single cluster.


A document is the most basic unit of information that can be indexed.

It is expressed in JSON as key: value pairs, e.g. {"user": "nullcon"}.

Every document is associated with a type and a unique ID.


Every index can be split into multiple shards so that its data can be distributed. The shard is the atomic part of an index, which can be distributed over the cluster if you add more nodes.
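
One way to picture how documents end up on different shards is hash-based routing: hash a routing value (by default the document ID) and take it modulo the number of primary shards. The sketch below assumes that scheme and uses `zlib.crc32` only because it ships with Python; it is a stand-in, not the hash function Elasticsearch itself uses, and `pick_shard` is a hypothetical helper.

```python
import zlib

def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash chosen for this sketch, not Elasticsearch's own.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

for doc_id in ["1", "2", "3", "4"]:
    print("document", doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the result depends on the shard count, a scheme like this only stays stable if the number of primary shards does not change after documents are indexed.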

A terminology comparison

Relational database    Elasticsearch
Database               Index
Table                  Type
Row                    Document
Column                 Field
Schema                 Mapping
Index                  Everything is indexed
SQL                    Query DSL
SELECT * FROM tb       GET http://
UPDATE tb SET          PUT http://

Playing with Elasticsearch

REST API: http://host:port/[index]/[type]/[_action/id]

HTTP methods: GET, POST, PUT, DELETE
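
The URL pattern above can be assembled with a small helper. `es_url` is a hypothetical function written for this example; it only joins the optional path segments and does not talk to a server.

```python
def es_url(host, port, index=None, doc_type=None, action_or_id=None):
    """Build http://host:port/[index]/[type]/[_action/id], skipping missing parts."""
    parts = [p for p in (index, doc_type, action_or_id) if p]
    return "http://{}:{}/{}".format(host, port, "/".join(parts))

print(es_url("localhost", 9200, "my_index", "test", "_search"))
print(es_url("localhost", 9200, "my_index", action_or_id="_search"))
print(es_url("localhost", 9200, "my_index", "test", "1"))
```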

Search:

    curl -XGET http://localhost:9200/my_index/test/_search
    curl -XGET http://localhost:9200/my_index/_search
    curl -XGET http://localhost:9200/_search

Metadata:

    curl -XGET http://localhost:9200/my_index/_status

Documents:

    curl -XPUT http://localhost:9200/my_index/test/1
    curl -XGET http://localhost:9200/my_index/test/1
    curl -XDELETE http://localhost:9200/my_index/test/1

Example: Index

    curl -XPUT http://localhost:9200/my_index/test/1 -d '{"name": "joeywen", "value": 100}'
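
The same index request can be prepared from Python with the standard library. This sketch only builds the request object and prints it, so it runs without a cluster; `build_index_request` is a hypothetical helper, and the `Content-Type` header is an addition of this example that the curl command does not set explicitly.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare a PUT request that would index `document` under the given ID."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = build_index_request("http://localhost:9200", "my_index", "test", 1,
                          {"name": "joeywen", "value": 100})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it (requires a running node): urllib.request.urlopen(req)
```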

Example: Search

    curl -XGET http://localhost:9200/my_index/_search -d '{"query": {"match_all": {}}}'

  • Total number of docs
  • Relevance
  • Search time
  • Max score
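
These four values can be read straight out of the JSON search response. The snippet below parses a made-up response whose shape follows the older API used in this deck (where `hits.total` is a plain integer); the numbers are illustrative, not real measurements, and `summarize` is a hypothetical helper.

```python
import json

# Made-up response for illustration only; field layout assumes the older
# response format in which hits.total is an integer.
sample = """
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {"_index": "my_index", "_type": "test", "_id": "1", "_score": 1.0,
       "_source": {"name": "joeywen", "value": 100}}
    ]
  }
}
"""

def summarize(response):
    """Pick out the fields highlighted on the slide."""
    hits = response["hits"]
    return {
        "search_time_ms": response["took"],          # search time
        "total_docs": hits["total"],                 # total number of docs
        "max_score": hits["max_score"],              # max score
        "relevance": [h["_score"] for h in hits["hits"]],  # per-document relevance
    }

summary = summarize(json.loads(sample))
print(summary)
```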

Creating, indexing, or deleting a single document