Elasticsearch Overview


“Elasticsearch is open source cross-platform developed completely in Java. It’s built on top of Lucene which provide full text search on high volumes of data quickly and easily do analysis based on indexing. It is schema free and provide NRT(Near real Time) search results.”

Advantage of Elasticsearch

Full Text Search 

Elasticserach built on top of Lucene which provide full-featured  library to search full-text on any open source.

Schema Free 

Elasticsearch stores documents in JSON format and based on it detects words and type to make it searchable.

Restful API 

Elastisearch is easily accessible over browser by using URL and also support for Restful API to perform Operation. For read more on Elasticsearch REST follow link for Elasticsearch REST JAVA API Overview.

Operation Persistence

Elasticsearch cluster keep records of all transaction level changes for schema if anything get change in data for index and track of availability of Nodes in cluster so that make data easily available if any fail-over of any node.

Area of use Elasticsearch?

  • It’s useful in application where need to do analysis, statics and need to find out anomalies on data based on pattern.
  • It’s useful where need to send alerts when particular condition matched like stock market, exception from logs etc.
  • It’s useful with application where log analysis and issue solution provide because of full search in billions of records in milliseconds.
  • It’s compatible with application like Filebeat, Logstash and Kibana for storage of high volume data for analysis and visualize in form of chart and dashboards.

Basic Concepts and Terminology

Cluster

Cluster is a collection of one or more nodes which provide capabilities to search text on scattered data on nodes. It’s identified by unique name with in network so that all associated nodes will join together by cluster name.

For more info on Cluster configuration and query follow link Elasticsearch Cluster.

Elasticsearch Cluster

Elasticsearch Cluster

In above screen elasticsearch cluster “FACING_ISSUE_IN_IT” having three master and four data node.

Node

Node is a Elasticsearch server which associate with a cluster. It’s store data , help cluster for indexing data and search query. It’s identified by unique name in Cluster if name is not provided elasticsearch will generate random Universally Unique Identifier(UUID) on time of server start.

A Cluster can have one or more Nodes .If first node start that will have Cluster with single node and when other node will start will add with that cluster.

For more info on Node Configuration, Master Node, Data Node, Ingest node follow link Elasticsearch Node.

Data Node storage

Data Node Documents Storage

In above screen trying to represent data of two indexes like I1 and I2.Where Index I1 is having two type of documents T1 and T2 while index I2 is having only type T2 and these shards are distributes over all nodes in cluster. This data node is having documents of shard (S1) for  Index I1 and shard (S3) for Index I2. It’s also keeping replica of documents of shards S2 of Index I2 and I1 which are store some other nodes in cluster.

Index

An Index is collection of documents with same characteristics which stores on nodes in distributed fashion and its identify by unique name on which perform different operation like search query, update and delete for documents. A cluster can have as many indexes with unique name.

A document store in Index and assigned a type to it and an Index can have multiple types of documents.

For more info on Index Creation, Mapping Template , CRUD follow link Elasticsearch Index.

Shards

Shards are partitions of indexes scattered on nodes. It provide capability to store large amount (billions) of documents for same index to store in cluster even one disk of node is not capable to store it.

Replica

Replica is copy of shard which store on different node. A shard can have zero or more replica. If shard on one node then replica of shard will store on another node.

Benefits of Shards and Replica

  • Shards splits indexes in horizontal partition for high volumes of data.
  • It perform operations parallel to each shards or replica on multiple node for index so that increase system performance and throughput.
  • Recovered easily in case of fail-over of node because data replica exist on another node because replica always store on different node where shards exist.

Note

When we create index by default elasticseach index configure as 5 shards and 1 replica but we can configure it from config/elasticsearch.yml file or by passing shards and replica values in mapping when index create.

Once index created we can’t change shards configuration but modify in replica. If need to update in shards only option is re-indexing.

Each Shard itself a Lucene index and it can keep max 2,147,483,519 (= Integer.MAX_VALUE – 128) documents. For merging of search results and failover taken care by elasticsearch cluster.

For more info on elasticsearch Shards and Replica follow Elasticsearch Shards and Replica configuration.

Document

Each Record store in index is called a document which store in JSON object. JSON data exchange is fast over internet and easy to handle on browser side display.

Read More

To read more on Elasticsearch Configuration, Sample Elasticsearch REST Clients, Search Queries Types with example follow link Elasticsearch Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

About Saurabh Gupta

My Name is Saurabh Gupta, I have approx. 10 Year of experience in Information Technology World manly in Java/J2EE. During this time I have worked with multiple organization with different client, so many technology, frameworks etc.
This entry was posted in Elasticsearch, ELK, REST and tagged , , , , , , , , , , , , , , , , . Bookmark the permalink.