How to Configure Filebeat, Kafka, Logstash Input, Elasticsearch Output and Kibana Dashboard


Filebeat, Kafka, Logstash, Elasticsearch and Kibana integration is used by large organizations where applications are deployed in production on hundreds or thousands of servers scattered across different locations, and data from these servers needs to be analyzed in real time.

This integration is mostly used for log-level analysis, tracking issues and data anomalies, alerting on particular events, and meeting accountability requirements.

Together these technologies provide a scalable architecture in which each component is decoupled from the others and can be scaled individually.

Why These Technologies?

Filebeat :

  • Lightweight agent for shipping logs.
  • Forward and centralize files and logs.
  • Robust (does not miss a single beat)

Kafka:

  • Open source distributed streaming and message broker platform.
  • Processes stream data or transaction logs in real time.
  • Fault-tolerant, high-throughput, low-latency platform for handling real-time data feeds.

Logstash:

  • Open source, server-side data processing pipeline that accepts data from different sources simultaneously.
  • Parses, formats and transforms data and sends it to different outputs.

Elasticsearch:

  • Open source, distributed, cross-platform search and analytics engine.
  • Built on top of Lucene, which provides full-text search and near real-time (NRT) search results.
  • Supports RESTful search through the Elasticsearch REST API.

Kibana:

  • Open source.
  • Provides a window to view Elasticsearch data in the form of different charts and dashboards.
  • Provides an easy way to search and operate on data with respect to time intervals.
  • Dashboards can easily be embedded in any web application.

How the Data Flow Works

In this integration, Filebeat is installed on every server where your application is deployed; it reads the latest log changes from these servers and ships them to the Kafka topic configured for this application.

Logstash subscribes to the log lines from the Kafka topic, parses them, applies the relevant changes and formatting, excludes and includes fields, and then sends this processed data to Elasticsearch indexes, which act as the central location for data from the different servers.

Kibana is linked with the Elasticsearch indexes, which helps in analyzing the data through searches, charts and dashboards.

FKLEK Integration

Design Architecture

In the architecture configured below, the application is deployed on three servers and each server's current log file is named App1.log. Our goal is to read real-time data from these servers and analyze it.

FKLEK Arch Integration

Steps for Installation, Configuration and Start

Here we will first install Kafka and Elasticsearch, which run individually; the rest of the tools will be installed and run in sequence to test the data flow. Initially, install everything on the same machine and test with the sample data using the steps below; at the end of this post I will explain what changes you need to make for your own servers.

  • Kafka Installation, Configuration and Start
  • Elasticsearch Installation, Configuration and Start
  • Filebeat Installation, Configuration and Start
  • Logstash Installation, Configuration and Start
  • Kibana Installation, Start and Display

Pre-Requisite

The Filebeat, Logstash, Elasticsearch and Kibana versions should be compatible with each other; it is best to use the latest versions from https://www.elastic.co/downloads.

  • Java 8+
  • Linux Server
  • Filebeat 5.XX
  • Kafka 2.11.XX
  • Logstash 5.XX
  • Elasticsearch 5.XX
  • Kibana 5.XX

Note: Make sure JDK 8 is installed and the JAVA_HOME environment variable points to the JDK 8 home directory on every machine where you want to install Elasticsearch, Logstash, Kibana or Kafka.

Windows: My Computer -> right click -> Properties -> Advanced System Settings -> Environment Variables -> System Variables

Java_Home

Set JAVA_HOME

Linux: Go to your home directory and add the line below to your shell profile.

export JAVA_HOME=/opt/app/facingissuesonit/jdk1.8.0_66
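To confirm the variable is set in a new shell, you can run the standard checks below (the path above is just this example's location):

echo $JAVA_HOME
java -version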

Sample Data

For testing we will use the sample log lines below, which contain debug lines as well as a stack trace; the grok parsing in this example is designed for this format. For real-time testing with actual data you can point to your server log files, but you will have to modify the grok pattern in the Logstash configuration accordingly.

2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}

Create an App1.log file on the same machine where Filebeat will be installed and copy the log lines above into it.

Kafka Installation, Configuration and Start

Download the latest version of Kafka from the link below and use the command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://kafka.apache.org/downloads

tar -zxvf kafka_2.11-0.10.0.0.tgz

For more configuration and start options follow Setup Kafka Cluster for Single Server/Broker

After downloading and untarring/unzipping, you will have the files and directory structure below.

ls -l
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr  3 05:18 bin
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May  8 11:05 config
drwxr-xr-x 74 facingissuesonit Saurabh   4096 May 27 20:00 kafka-logs
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 libs
-rw-r--r--  1 facingissuesonit Saurabh  28824 Apr  3 05:17 LICENSE
drwxr-xr-x  2 facingissuesonit Saurabh 487424 May 27 20:00 logs
-rw-r--r--  1 facingissuesonit Saurabh    336 Apr  3 05:18 NOTICE
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 site-docs

For more details about these files, configuration options and other integration options, follow the Kafka Tutorial.

Make the changes below in the files config/zookeeper.properties and config/server.properties.

config/zookeeper.properties

clientPort=2181
config/server.properties:

broker.id=0
listeners=PLAINTEXT://:9092
log.dir=/kafka-logs
zookeeper.connect=localhost:2181

Now Kafka is configured and ready to run. Use the commands below to start ZooKeeper and the Kafka server as background processes.

screen -d -m bin/zookeeper-server-start.sh config/zookeeper.properties
screen -d -m bin/kafka-server-start.sh config/server.properties

To verify that Kafka installed successfully, you can check that the Kafka process is running on Linux with "ps -ef|grep kafka", or follow the consumer and producer steps in Setup Kafka Cluster for Single Server/Broker.
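As an additional sanity check, a minimal produce/consume round trip from the Kafka home directory could look like the commands below (the topic name test-topic is only an example; these are the standard Kafka 0.10.x console tools).

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test-topic --from-beginning

Type a few lines into the producer console; the same lines should appear in the consumer console.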

Elasticsearch Installation, Configuration and Start

Download the latest version of Elasticsearch from the link below and use the command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/elasticsearch

tar -zxvf elasticsearch-5.4.0.tar.gz

It will show the files and directory structure below for Elasticsearch.

drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 25 19:20 bin
drwxr-xr-x  3 facingissuesonit Saurabh   4096 May 13 17:27 config
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr 24 15:56 data
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 lib
-rw-r--r--  1 facingissuesonit Saurabh  11358 Apr 17 10:50 LICENSE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May 28 05:00 logs
drwxr-xr-x 12 facingissuesonit Saurabh   4096 Apr 17 10:55 modules
-rw-r--r--  1 facingissuesonit Saurabh 194187 Apr 17 10:55 NOTICE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 plugins
-rw-r--r--  1 facingissuesonit Saurabh   9540 Apr 17 10:50 README.textile

Before starting Elasticsearch, we need to make some basic changes in the config/elasticsearch.yml file for the cluster and node name. You can configure these based on your application or organization name.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

Now the Elasticsearch configuration is ready and it is time to start Elasticsearch. We can use the command below to run Elasticsearch in the background.

screen -d -m bin/elasticsearch

To check that Elasticsearch started successfully, you can use the URL below in a browser to see the cluster status. You will get a result like the one below.

http://localhost:9200/_cluster/health?pretty

or, if network.host is configured, as below:

http://elasticseverIp:9200/_cluster/health?pretty
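The same check can also be done from the command line with curl (assuming Elasticsearch is reachable on the default port 9200):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'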

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Filebeat Installation, Configuration and Start

Download the latest version of Filebeat from the link below and use the command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/beats/filebeat

tar -zxvf filebeat-<version>.tar.gz

For more configuration and start options, follow Filebeat Download, Installation and Start/Run.

After downloading and untarring/unzipping, you will have the files and directory structure below.

ls -l
-rwxr-xr-x 1 facingissuesonit Saurabh 14908742 Jan 11 14:11 filebeat
-rw-r--r-- 1 facingissuesonit Saurabh    31964 Jan 11 14:11 filebeat.full.yml
-rw-r--r-- 1 facingissuesonit Saurabh     3040 Jan 11 14:11 filebeat.template-es2x.json
-rw-r--r-- 1 facingissuesonit Saurabh     2397 Jan 11 14:11 filebeat.template.json
-rw-r--r-- 1 facingissuesonit Saurabh     4196 Jan 11 14:11 filebeat.yml
-rw-r--r-- 1 facingissuesonit Saurabh      811 Jan 11 14:10 README.md
drwxr-xr-x 2 facingissuesonit Saurabh     4096 Jan 11 14:11 scripts

For more details about these files, configuration options and other integration options, follow the Filebeat Tutorial.

Now Filebeat is installed and we need to make the changes below in the filebeat.full.yml file.

  • Inside the prospectors section, change paths to your log file location as below:
paths:
  - /opt/app/facingissuesonit/App1.log
  • Comment out the default Elasticsearch output properties as below:
#output.elasticsearch:
#hosts: ["localhost:9200"]
  • Configure the multiline options as below so that all stack trace lines that do not start with a date are treated as part of a single event:
multiline.pattern: ^\d
multiline.negate: true
multiline.match: after

To learn more about Filebeat multiline configuration, follow Filebeat Multiline Configuration Changes for Object, StackTrace and XML.

  • Inside the Kafka output section, update the hosts and topic properties. If Kafka is on the same machine, use localhost; otherwise update it with the IP of the Kafka machine.
output.kafka:
 hosts: ["localhost:9092"]
 topic: APP-1-TOPIC

For more on logging configuration, follow the link Filebeat, Logging Configuration.

Now Filebeat is configured and ready to start with the command below. It will continuously read the configured prospector for the file App1.log and publish log line events to Kafka. It will also create the topic APP-1-TOPIC in Kafka if it does not exist.

./filebeat -e -c filebeat.full.yml -d "publish"

On the console it will display output as below for the sample lines.

2017/05/28 00:24:27.991828 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 194,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991907 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 375,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
  "offset": 718,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.992Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}",
  "offset": 902,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}

From the Filebeat debug statements above you can see that publish event 3 contains the multiline statements with the stack trace exception, and each published event has fields like the following.

@timestamp: timestamp at which the data was shipped.

beat.hostname: name of the Filebeat machine from which the data is shipped.

beat.version: the Filebeat version installed on the server, which helps with compatibility checks on the target end.

message: the log line from the log file, or merged multiline log lines.

offset: byte offset of the event in the source file.

source: the file name from which the logs were read.
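Once the console output looks correct, Filebeat can also be left running in the background like the other components, for example by wrapping the same command in screen:

screen -d -m ./filebeat -e -c filebeat.full.yml -d "publish"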

Now it is time to check whether the data was published to the Kafka topic. For this, go to the directory below and you will see two files, an .index file and a .log file, which maintain the data offsets and messages.

{Kafka_home}/kafka-logs/APP-1-TOPIC
          00000000000000000000.log
          00000000000000000000.index
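You can also read the shipped messages directly from the topic with the Kafka console consumer (a standard Kafka tool, run from the Kafka home directory):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic APP-1-TOPIC --from-beginning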

Now your server log lines are in the Kafka topic, ready to be read and parsed by Logstash and sent to Elasticsearch for analysis and search.

Logstash Installation, Configuration and Start

Download the latest version of Logstash from the link below and use the command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/logstash

tar -zxvf logstash-5.4.0.tar.gz

It will show the files and directory structure below.

drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 bin
-rw-r--r-- 1 facingissuesonit Saurabh 111569 Mar 22 23:49 CHANGELOG.md
drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 config
-rw-r--r-- 1 facingissuesonit Saurabh   2249 Mar 22 23:49 CONTRIBUTORS
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 12:07 data
-rw-r--r-- 1 facingissuesonit Saurabh   3945 Mar 22 23:55 Gemfile
-rw-r--r-- 1 facingissuesonit Saurabh  21544 Mar 22 23:49 Gemfile.jruby-1.9.lock
drwxr-xr-x 5 facingissuesonit Saurabh   4096 Apr 20 11:27 lib
-rw-r--r-- 1 facingissuesonit Saurabh    589 Mar 22 23:49 LICENSE
drwxr-xr-x 2 facingissuesonit Saurabh   4096 May 21 00:00 logs
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-event-java
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-plugin-api
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-queue-jruby
-rw-r--r-- 1 facingissuesonit Saurabh  28114 Mar 22 23:56 NOTICE.TXT
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 vendor

Before starting Logstash, we need to create a configuration file that takes input data from Kafka, parses it into the respective fields and sends it to Elasticsearch. Create the file logstash-app1.conf in the Logstash bin directory with the content below.

/bin/logstash-app1.conf

input {
     kafka {
            bootstrap_servers => 'localhost:9092'
            topics => ["APP-1-TOPIC"]
            codec => json {}
          }
}
filter
{
#parse log line
      grok
	{
	match => {"message" => "\A%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+(?<logger>(?:[a-zA-Z0-9-]+\.)*[A-Za-z0-9$]+)\s+(-\s+)?(?=(?<msgnr>[A-Z]+[0-9]{4,5}))*%{DATA:message}({({[^}]+},?\s*)*})?\s*$(?<stacktrace>(?m:.*))?" }
	}  

    #Remove unused fields
    #mutate { remove_field =>["beat","@version" ]}
}
output {
    #Output result sent to elasticsearch and dynamically create array
    elasticsearch {
        index  => "app1-logs-%{+YYYY.MM.dd}"
        hosts => ["localhost:9200"]
        sniffing => false
  	}

     #Sysout logs
     stdout
       {
         codec => rubydebug
       }
}

To test your configuration file, you can use the command below.


./logstash -t -f logstash-app1.conf

If the command above reports that the configuration is OK, run the command below to start reading and parsing data from the Kafka topic.


./logstash -f logstash-app1.conf

To design your own grok pattern for your log line format, you can use the links below, which help you build patterns incrementally and also provide some sample grok patterns.

http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/
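Another quick way to iterate on a pattern locally is to pipe a sample line through Logstash with an inline configuration. The sketch below uses a simplified pattern (not the full one above) purely for illustration:

echo '2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678' | \
./logstash -e 'input { stdin {} } filter { grok { match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}" } } } output { stdout { codec => rubydebug } }'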

The Logstash console will show the parsed data as below, and you can remove unused fields before storing in Elasticsearch by uncommenting the mutate section in the configuration file.

{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 194,
      "loglevel" => "WARN",
        "logger" => "CreateSomethingActivationKey",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "WhateverException for User 49-123-345678 "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,662"
}
{
         "msgnr" => "ERR1700",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 375,
      "loglevel" => "INFO",
        "logger" => "LMLogger",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "ERR1700 - u:null failures: 0  - Technical error "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,663"
}
{
        "offset" => 718,
        "logger" => "SomeCallLogger",
    "input_type" => "log",

       "message" => [
        [0] "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
        [1] "ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  "
    ],
          "type" => "log",
         "msgnr" => "ESS10005",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
    "stacktrace" => "\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
      "loglevel" => "ERROR",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
      "@version" => "1",
     "timestamp" => "2013-02-28 09:57:56,668"
}
{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 903,
      "loglevel" => "INFO",
        "logger" => "EntryFilter",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",

       "message" => [
        [0] "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}\n",
        [1] "Fresh on request /portalservices/foobarwhatever "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 10:04:35,723"
}

To verify on the Elasticsearch side that your data was sent successfully, you can open the URL http://localhost:9200/_cat/indices in your browser; it will display the created index with the current date.

yellow open app1-logs-2017.05.28                             Qjs6XWiFQw2zsiVs9Ks6sw 5 1         4     0  47.3kb  47.3kb
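You can also query the new index directly, for example to pull back one of the parsed ERROR events (index name taken from the listing above):

curl 'http://localhost:9200/app1-logs-2017.05.28/_search?q=loglevel:ERROR&pretty'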

Kibana Installation, Configuration and Start

Download the latest version of Kibana from the link below and use the command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/kibana

tar -zxvf kibana-5.4.0.tar.gz

It will show the files and directory structure below for Kibana.

ls -l
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 14:23 bin
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 18:58 config
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 11:54 data
-rw-r--r--   1 facingissuesonit Saurabh    562 Apr 17 12:04 LICENSE.txt
drwxr-xr-x   6 facingissuesonit Saurabh   4096 Apr 17 12:04 node
drwxr-xr-x 485 facingissuesonit Saurabh  20480 Apr 17 12:04 node_modules
-rw-r--r--   1 facingissuesonit Saurabh 660429 Apr 17 12:04 NOTICE.txt
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 optimize
-rw-r--r--   1 facingissuesonit Saurabh    702 Apr 17 12:04 package.json
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 12:29 plugins
-rw-r--r--   1 facingissuesonit Saurabh   4909 Apr 17 12:04 README.txt
drwxr-xr-x  10 facingissuesonit Saurabh   4096 Apr 17 12:04 src
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 ui_framework
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 17 12:04 webpackShims

Before starting Kibana, we need to make some basic changes in the config/kibana.yml file; uncomment the properties below and update them accordingly.

server.port: 5601
server.host: localhost
elasticsearch.url: "http://localhost:9200"

Now the Kibana configuration is ready and it is time to start Kibana. We can use the command below to run Kibana in the background.

screen -d -m bin/kibana

Kibana takes some time to start, and we can test it by using the URL below in a browser.

http://localhost:5601/
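If you prefer the command line, Kibana also exposes a status endpoint that should return JSON once the server is up (assuming the default port configured above):

curl http://localhost:5601/api/status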

To check this data in Kibana, open the URL above in a browser, go to the Management tab in the left-side menu -> Index Patterns -> click on Add New.

Enter the index name or pattern and the time field name as in the screen below, and click the Create button.

Kibana index setting

Index Pattern Settings

Now go to the Discover tab and select the app1-logs-* index pattern; it will display data as below.

kibana discover data

Now make the changes below according to your application's specifics.

Filebeat :

  • Update the prospector path to the current log file in your log directory.
  • Move Kafka to a different machine, because Kafka will be the single location that receives shipped data from the different servers. Update localhost with the IP of the Kafka server in the hosts property of the Kafka output section of the filebeat.full.yml file.
  • Copy the same Filebeat setup to all servers where your application is deployed and logs need to be read.
  • Start the Filebeat instance on each server.

Elasticsearch :

  • Uncomment the network.host property in the elasticsearch.yml file so that Elasticsearch can be accessed by IP address.

Logstash:

  • Update localhost in the input section of the logstash-app1.conf file with the Kafka machine IP.
  • Change the grok pattern in the filter section according to your log format. You can use http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ to design it incrementally.
  • Update localhost in the Elasticsearch output section with the IP if Elasticsearch is moved to a different machine.

Kibana:

  • Update localhost in the elasticsearch.url property of the kibana.yml file with the Elasticsearch IP if Kibana is on a different machine.

Conclusion :

This tutorial covered the points below:

  • Installation of Filebeat, Kafka, Logstash, Elasticsearch and Kibana.
  • Filebeat is configured to ship logs to the Kafka message broker.
  • Logstash is configured to read log lines from the Kafka topic, parse them and ship them to Elasticsearch.
  • Kibana shows this Elasticsearch information in the form of charts and dashboards so that users can analyze it.

Read More

To read more on Filebeat, Kafka and Elasticsearch configuration, follow the links above; for Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/products

 


Logstash, JDBC Input configuration tutorial with sql_last_value and tracking_column as numeric or timestamp


The Logstash JDBC input plug-in works like an adapter to send your database details to Elasticsearch so that they can be used for full-text search, queries and analysis, and shown in the form of charts and dashboards in Kibana.

In the example below I will explain how to create a Logstash configuration file using the JDBC input plug-in for an Oracle database, with output to Elasticsearch.

Logstash JDBC Input configuration for Elasticsearch Output

Pre-requisite:

  • Logstash 5.xx installed
  • Elasticsearch 5.xx installed
  • Java 7/8 Installed

Sample Data:

The sample data below is from the defect_detail table, where defect_id is a numeric value that increments continuously in ascending order.

defect_id | owned_by | severity   | status           | summary                     | application | created_by | creation_date   | modified_by | modified_date  | assigned_to
530812    | Ramesh   | Severity 3 | Cancelled        | Customer call 5 time        | TEST-APP    | Saurabh    | 7/3/2017 15:44  | Gaurav      | 8/19/2017 6:22 | Development
530828    | Neha     | Severity 1 | Cancelled        | Dealer Code Buyer on behalf | TEST-APP-5  | Rajan      | 7/3/2017 16:20  | Nilam       | 8/17/2017 9:29 | Development
540829    | Ramesh   | Severity 1 | Retest Completed | Client Not want Bulk call   | TEST-APP-4  | Rajiv      | 7/24/2017 11:29 | Raghav      | 8/5/2017 20:00 | IST

Configuration File :

The configuration file below is set up to read data from an Oracle database; it will execute the query every 15 minutes and read records after the last run value of defect_id. We should always use ORDER BY on the column whose last run value is tracked, here the numeric column defect_id.

If you are using any other database like MySQL, SQL Server, DB2 etc., change jdbc_driver_library and jdbc_connection_string according to that database. Every database has its own query format, so update the query accordingly.

Copy the content below and create the file in the bin directory as /bin/logstash-jdbc-defect.conf.

input
{
jdbc {
#Path to the downloaded jdbc driver, added to the class path
jdbc_driver_library => "../jar/ojdbc6.jar"
# ORACLE driver class
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
# ORACLE database jdbc connection string, format jdbc:oracle:thin:@hostname:PORT/SERVICE
jdbc_connection_string => "jdbc:oracle:thin:@hostname:1521/service"
#The user and password to connect to the database
jdbc_user => "username"
jdbc_password => "password"
#Use when the password needs to be read from a file
#jdbc_password_filepath => "/opt/app/password-path-location"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
#Cron schedule for how frequently the query executes against the database
schedule => "*/15 * * * *"
#Use below if the query is big and you want to store it in a separate file
#statement_filepath => "../query/remedy-tickets-details.sql"
#Inline query; records after the last run are selected by comparing with sql_last_value, which can be numeric or timestamp
statement => "select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id"
#Below is the configuration for using the last run value
clean_run => true
use_column_value => true
tracking_column => "defect_id"
#Logstash by default considers sql_last_value as numeric; if it is a timestamp, configure it explicitly as timestamp
#tracking_column_type => "timestamp"
record_last_run => true
#This file keeps a record of sql_last_value so that the next query run can use the last run value
last_run_metadata_path => "logstash_jdbc_last_run_t_data.txt"
#Define the type of data from the database
type => "t-data"
#Configure the timezone according to the database location
#jdbc_default_timezone => "UTC"

}
}
filter
{
#To map your creation_date column to the elasticsearch @timestamp, use the date filter below
mutate
{
convert => [ "creation_date", "string" ]
}
#The date pattern tells the date filter that creation_date is in the format "MM/dd/yyyy HH:mm"
#and in timezone America/New_York, so that it is adjusted accordingly when stored in elasticsearch in UTC

date {
match => ["creation_date","MM/dd/yyyy HH:mm"]
timezone => "America/New_York"
}
}
output
{
#output to elasticsearch
elasticsearch {
index => "defect-data-%{+YYYY.MM}"
hosts => ["elasticsearch-server:9200"]
document_type => "t-type"
#Use document_id if you want to prevent duplicate records in elasticsearch
document_id => "%{defect_id}"
}
#Output to console
stdout { codec => rubydebug }
}

I have tried to give descriptive information in the comments corresponding to each property in the configuration file. If you need to go deeper or want more information, just drop a comment or send an email and we will discuss it in detail.

Date Filter: This filter maps CREATION_DATE to the @timestamp value of the index for each document; it states that CREATION_DATE has the pattern “MM/dd/yyyy HH:mm” so that the conversion to a timestamp follows the same format.

Execution :

 [logstash-installation-dir]/bin/logstash -f logstash-jdbc-defect.conf

To learn about validating and starting Logstash with other options, follow the link Logstash Installation, Configuration and Start.

Logstash Console Output

Notice that, because of the date filter, the index @timestamp value is generated based on the value of CREATION_DATE, and the Elasticsearch output index name defect-data-%{+YYYY.MM} will create an index for every month based on the @timestamp value, e.g. defect-data-2017.07 for the sample data. If the data in your database changes and the defect id increases, you will see the new defects on your console every 15 minutes, as set up in the configuration file.

Result :

select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id
#
{
               "severity" => "Severity 3",
                "summary" => "Customer call 5 time but no response",
               "owned_by" => "Ramesh",
          "creation_date" => "7/3/2017 15:44",
          "modified_date" => "8/19/2017 6:22",
                   "type" => "t-data",
             "created_by" => "Saurabh",
             "@timestamp" => 2017-07-03T19:44:00.000Z,
            "modified_by" => "Gaurav",
               "@version" => "1",
              "defect_id" => 530812,
            "application" => "TEST-APP",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Dealer Code Buyer on behalf",
               "owned_by" => "Neha",
          "creation_date" => "7/3/2017 16:20",
          "modified_date" => "8/17/2017 9:29",
                   "type" => "t-data",
             "created_by" => "Rajan",
             "@timestamp" => 2017-07-03T20:20:00.000Z,
            "modified_by" => "Nilam",
               "@version" => "1",
              "defect_id" => 530828,
            "application" => "TEST-APP5",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Client Not want  Bulk call",
               "owned_by" => "Ramesh",
          "creation_date" => "7/24/2017 11:29",
          "modified_date" => "8/5/2017 20:00",
                   "type" => "t-data",
             "created_by" => "Rajiv",
             "@timestamp" => 2017-07-24T15:29:00.000Z,
            "modified_by" => "Raghav",
               "@version" => "1",
              "defect_id" => 540829,
            "application" => "TEST-APP4",
                 "status" => "Retest Complete",
            "assigned_to" => "IST - Integrated System Test"
}
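To confirm on the Elasticsearch side that the monthly index was created and documents were indexed, a quick check (host name as used in the configuration above) could be:

curl 'http://elasticsearch-server:9200/_cat/indices/defect-data-*?v'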

Summary

The details above covered the points below:

  • Logstash JDBC Input from Oracle Database.
  • JDBC Input changes for sql_last_value for numeric and timestamp
  • Read password and multi-line query from separate file.
  • Date Filter to get Index Timestamp value based on fields and pattern.
  • Dynamic index name for each month by appending a date format.
  • Duplicate insert record prevention on Elasticsearch.
  • Start Logstash on background for configuration file.
  • Send Logstash output to Elasticsearch and Console.

Read More

To read more on Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

 


Logstash, JDBC Input with sql_last_value as numeric or timestamp Example


The Logstash JDBC input plug-in works like an adapter to send your database details to Elasticsearch so that they can be used for full-text search, queries and analysis, and shown in the form of charts and dashboards in Kibana.

In the example below I will explain how to create a Logstash configuration file using the JDBC input plug-in for an Oracle database, with output to Elasticsearch.

Pre-requisite:

  • Logstash 5.xx installed
  • Elasticsearch 5.xx installed
  • Java 7/8 Installed

Sample Data:

The sample data below is from the defect_detail table, where defect_id is a numeric value that increments continuously in ascending order.

defect_id | owned_by | severity   | status           | summary                     | application | created_by | creation_date   | modified_by | modified_date  | assigned_to
530812    | Ramesh   | Severity 3 | Cancelled        | Customer call 5 time        | TEST-APP    | Saurabh    | 7/3/2017 15:44  | Gaurav      | 8/19/2017 6:22 | Development
530828    | Neha     | Severity 1 | Cancelled        | Dealer Code Buyer on behalf | TEST-APP-5  | Rajan      | 7/3/2017 16:20  | Nilam       | 8/17/2017 9:29 | Development
540829    | Ramesh   | Severity 1 | Retest Completed | Client Not want Bulk call   | TEST-APP-4  | Rajiv      | 7/24/2017 11:29 | Raghav      | 8/5/2017 20:00 | IST

Configuration File :

The configuration file below is set up to read data from an Oracle database; it will execute the query every 15 minutes and read records after the last run value of defect_id. We should always use ORDER BY on the column whose last run value is tracked, here the numeric column defect_id.

If you are using any other database like MySQL, SQL Server, DB2 etc., change jdbc_driver_library and jdbc_connection_string according to that database. Every database has its own query format, so update the query accordingly.

Copy the content below and create the file in the bin directory as /bin/logstash-jdbc-defect.conf.

input
{
jdbc {
#Path to the downloaded jdbc driver, added to the class path
jdbc_driver_library => "../jar/ojdbc6.jar"
# ORACLE driver class
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
# ORACLE database jdbc connection string, format jdbc:oracle:thin:@hostname:PORT/SERVICE
jdbc_connection_string => "jdbc:oracle:thin:@hostname:1521/service"
#The user and password to connect to the database
jdbc_user => "username"
jdbc_password => "password"
#Use when the password needs to be read from a file
#jdbc_password_filepath => "/opt/app/password-path-location"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
#Cron schedule for how frequently the query executes against the database
schedule => "*/15 * * * *"
#Use below if the query is big and you want to store it in a separate file
#statement_filepath => "../query/remedy-tickets-details.sql"
#Inline query; records after the last run are selected by comparing with sql_last_value, which can be numeric or timestamp
statement => "select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id"
#Below is the configuration for using the last run value
clean_run => true
use_column_value => true
tracking_column => "defect_id"
#Logstash by default considers sql_last_value as numeric; if it is a timestamp, configure it explicitly as timestamp
#tracking_column_type => "timestamp"
record_last_run => true
#This file keeps a record of sql_last_value so that the next query run can use the last run value
last_run_metadata_path => "logstash_jdbc_last_run_t_data.txt"
#Define the type of data from the database
type => "t-data"
#Configure the timezone according to the database location
#jdbc_default_timezone => "UTC"

}
}
filter
{
#To map your creation_date column to the elasticsearch @timestamp, use the date filter below
mutate
{
convert => [ "creation_date", "string" ]
}
#The date pattern tells the date filter that creation_date is in the format "MM/dd/yyyy HH:mm"
#and in timezone America/New_York, so that it is adjusted accordingly when stored in elasticsearch in UTC

date {
match => ["creation_date","MM/dd/yyyy HH:mm"]
timezone => "America/New_York"
}
}
output
{
#output to elasticsearch
elasticsearch {
index => "defect-data-%{+YYYY.MM}"
hosts => ["elasticsearch-server:9200"]
document_type => "t-type"
#Use document_id if you want to prevent duplicate records in elasticsearch
document_id => "%{defect_id}"
}
#Output to console
stdout { codec => rubydebug }
}

I have tried to give descriptive information in the comments corresponding to each property in the configuration file. If you need to go deeper or want more information, just drop a comment or send an email and we will discuss it in detail.

Date Filter: This filter maps CREATION_DATE to the @timestamp value of the index for each document; it states that CREATION_DATE has the pattern “MM/dd/yyyy HH:mm” so that the conversion to a timestamp follows the same format.

Execution :

 [logstash-installation-dir]/bin/logstash -f logstash-jdbc-defect.conf

To learn about validating and starting Logstash with other options, follow the link Logstash Installation, Configuration and Start.

Logstash Console Output

Notice that, because of the date filter, the index @timestamp value is generated based on the value of CREATION_DATE, and the Elasticsearch output index name defect-data-%{+YYYY.MM} will create an index for every month based on the @timestamp value, e.g. defect-data-2017.07 for the sample data. If the data in your database changes and the defect id increases, you will see the new defects on your console every 15 minutes, as set up in the configuration file.

Result :

select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id
#
{
               "severity" => "Severity 3",
                "summary" => "Customer call 5 time but no response",
               "owned_by" => "Ramesh",
          "creation_date" => "7/3/2017 15:44",
          "modified_date" => "8/19/2017 6:22",
                   "type" => "t-data",
             "created_by" => "Saurabh",
             "@timestamp" => 2017-07-03T19:44:00.000Z,
            "modified_by" => "Gaurav",
               "@version" => "1",
              "defect_id" => 530812,
            "application" => "TEST-APP",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Dealer Code Buyer on behalf",
               "owned_by" => "Neha",
          "creation_date" => "7/3/2017 16:20",
          "modified_date" => "8/17/2017 9:29",
                   "type" => "t-data",
             "created_by" => "Rajan",
             "@timestamp" => 2017-07-03T20:20:00.000Z,
            "modified_by" => "Nilam",
               "@version" => "1",
              "defect_id" => 530828,
            "application" => "TEST-APP5",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Client Not want  Bulk call",
               "owned_by" => "Ramesh",
          "creation_date" => "7/24/2017 11:29",
          "modified_date" => "8/5/2017 20:00",
                   "type" => "t-data",
             "created_by" => "Rajiv",
             "@timestamp" => 2017-07-24T15:29:00.000Z,
            "modified_by" => "Raghav",
               "@version" => "1",
              "defect_id" => 540829,
            "application" => "TEST-APP4",
                 "status" => "Retest Complete",
            "assigned_to" => "IST - Integrated System Test"
}

Summary

The details above covered the points below:

  • Logstash JDBC Input from Oracle Database.
  • JDBC Input changes for sql_last_value for numeric and timestamp
  • Read password and multi-line query from separate file.
  • Date Filter to get Index Timestamp value based on fields and pattern.
  • Dynamic index name for each month by appending a date format.
  • Duplicate insert record prevention on Elasticsearch.
  • Start Logstash on background for configuration file.
  • Send Logstash output to Elasticsearch and Console.

Read More

To read more on Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html


Logstash, JDBC Input Plug-in Configuration Example with Oracle Database and Output to Elasticsearch


The Logstash JDBC input plug-in works like an adapter to send your database details to Elasticsearch so that they can be used for full-text search, queries and analysis, and shown in the form of charts and dashboards in Kibana.

In the example below I will explain how to create a Logstash configuration file using the JDBC input plug-in for an Oracle database, with output to Elasticsearch.

Logstash JDBC Input configuration for Elasticsearch Output

Pre-requisite:

  • Logstash 5.xx installed
  • Elasticsearch 5.xx installed
  • Java 7/8 Installed

Sample Data:

The sample data below is from the defect_detail table, where defect_id is a numeric value that increments continuously in ascending order.

defect_id | owned_by | severity   | status           | summary                     | application | created_by | creation_date   | modified_by | modified_date  | assigned_to
530812    | Ramesh   | Severity 3 | Cancelled        | Customer call 5 time        | TEST-APP    | Saurabh    | 7/3/2017 15:44  | Gaurav      | 8/19/2017 6:22 | Development
530828    | Neha     | Severity 1 | Cancelled        | Dealer Code Buyer on behalf | TEST-APP-5  | Rajan      | 7/3/2017 16:20  | Nilam       | 8/17/2017 9:29 | Development
540829    | Ramesh   | Severity 1 | Retest Completed | Client Not want Bulk call   | TEST-APP-4  | Rajiv      | 7/24/2017 11:29 | Raghav      | 8/5/2017 20:00 | IST

Configuration File :

The configuration file below is set up to read data from an Oracle database; it will execute the query every 15 minutes and read records after the last run value of defect_id. We should always use ORDER BY on the column whose last run value is tracked, here the numeric column defect_id.

If you are using any other database like MySQL, SQL Server, DB2 etc., change jdbc_driver_library and jdbc_connection_string according to that database. Every database has its own query format, so update the query accordingly.

Copy the content below and create the file in the bin directory as /bin/logstash-jdbc-defect.conf.

input
{
jdbc {
#Path to the downloaded jdbc driver, added to the class path
jdbc_driver_library => "../jar/ojdbc6.jar"
# ORACLE Driver Class
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
# ORACLE database jdbc connection string ,  jdbc:oracle:thin:@hostname:PORT/SERVICE
jdbc_connection_string => "jdbc:oracle:thin:@hostname:1521/service"
#The user and password to connect to database
jdbc_user => "username"
jdbc_password => "password"
#Use when need to read password from file
#jdbc_password_filepath => "/opt/app/password-path-location"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
#Configure Cron to How frequent want execute query in database
schedule => "*/15 * * * *"
#Use below if query is big and want to store in separate file
#statement_filepath =>"../query/remedy-tickets-details.sql"
#Use for Inline query and if want to execute record after last run compare with value sql_last_value that can be numeric or timestamp
statement => "select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id"
#Below is configuration when want to use last run value
clean_run=>true
use_column_value => true
tracking_column => "defect_id"
#Logstash by default consider last_sql_value as numeric if it's timestamp configure specifically as timestamp
#tracking_column_type => "timestamp"
record_last_run => true
#This file keep record of sql_last_value so that when next time query run can utilize last run values
last_run_metadata_path =>"logstash_jdbc_last_run_t_data.txt"
#Define type of data from database
type => "t-data"
#Configure Timestamp according to database location
#jdbc_default_timezone => "UTC"

}
}
filter
{
#To map your creation_date column with elasticsearch @timestamp use below Date filter
mutate
{
convert => [ "creation_date", "string" ]
}
#Date pattern represent to date filter this creation_date is on format "MM/dd/yyyy HH:mm"
#and from timezone America/New_York so that when store in elasticsearch in UTC will adjust accordingly

date {
match => ["creation_date","MM/dd/yyyy HH:mm"]
timezone => "America/New_York"
}
}
output
{
#output to elasticsearch
elasticsearch {
index => "defect-data-%{+YYYY.MM}"
hosts => ["elasticsearch-server:9200"]
document_type => "t-type"
#Use document_id if you want to prevent duplicate records in elasticsearch
document_id => "%{defect_id}"
}
#Output to console
stdout { codec => rubydebug}
}

I have tried to give descriptive information in the comments corresponding to each property in the configuration file. If you need to go deeper or want more information, just drop a comment or send an email and we will discuss it in detail.

Date Filter: This filter maps CREATION_DATE to the @timestamp value of the index for each document; it states that CREATION_DATE has the pattern “MM/dd/yyyy HH:mm” so that the conversion to a timestamp follows the same format.

Execution :

 [logstash-installation-dir]/bin/logstash -f logstash-jdbc-defect.conf

To learn about validating and starting Logstash with other options, follow the link Logstash Installation, Configuration and Start.

Logstash Console Output

Notice that, because of the date filter, the index @timestamp value is generated based on the value of CREATION_DATE, and the Elasticsearch output index name defect-data-%{+YYYY.MM} will create an index for every month based on the @timestamp value, e.g. defect-data-2017.07 for the sample data. If the data in your database changes and the defect id increases, you will see the new defects on your console every 15 minutes, as set up in the configuration file.

Result :

select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id>:sql_last_value order by defect_id
#
{
               "severity" => "Severity 3",
                "summary" => "Customer call 5 time but no response",
               "owned_by" => "Ramesh",
          "creation_date" => "7/3/2017 15:44",
          "modified_date" => "8/19/2017 6:22",
                   "type" => "t-data",
             "created_by" => "Saurabh",
             "@timestamp" => 2017-07-03T19:44:00.000Z,
            "modified_by" => "Gaurav",
               "@version" => "1",
              "defect_id" => 530812,
            "application" => "TEST-APP",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Dealer Code Buyer on behalf",
               "owned_by" => "Neha",
          "creation_date" => "7/3/2017 16:20",
          "modified_date" => "8/17/2017 9:29",
                   "type" => "t-data",
             "created_by" => "Rajan",
             "@timestamp" => 2017-07-03T20:20:00.000Z,
            "modified_by" => "Nilam",
               "@version" => "1",
              "defect_id" => 530828,
            "application" => "TEST-APP5",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Client Not want  Bulk call",
               "owned_by" => "Ramesh",
          "creation_date" => "7/24/2017 11:29",
          "modified_date" => "8/5/2017 20:00",
                   "type" => "t-data",
             "created_by" => "Rajiv",
             "@timestamp" => 2017-07-24T15:29:00.000Z,
            "modified_by" => "Raghav",
               "@version" => "1",
              "defect_id" => 540829,
            "application" => "TEST-APP4",
                 "status" => "Retest Complete",
            "assigned_to" => "IST - Integrated System Test"
}
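Because document_id is set to %{defect_id}, re-running the pipeline updates existing documents instead of creating duplicates. A single defect can be fetched by its id to confirm this (index, type and id values taken from the configuration and output above):

curl 'http://elasticsearch-server:9200/defect-data-2017.07/t-type/530812?pretty'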

Summary

The details above covered the points below:

  • Logstash JDBC Input from Oracle Database.
  • JDBC Input changes for sql_last_value for numeric and timestamp
  • Read password and multi-line query from separate file.
  • Date Filter to get Index Timestamp value based on fields and pattern.
  • Dynamic index name for each month by appending a date format.
  • Duplicate insert record prevention on Elasticsearch.
  • Start Logstash on background for configuration file.
  • Send Logstash output to Elasticsearch and Console.

Read More

To read more on Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

Logstash, File Input, CSV Filter and Elasticsearch Output


This Logstash example with the File input plugin, CSV filter and Elasticsearch output plugin will read data from a CSV file; Logstash will parse this data and store it in Elasticsearch.

Pre-Requisite

  • Logstash 5.X
  • Elasticsearch 5.X

The Logstash configuration file below is based on the data in the CSV file. You can modify this configuration file according to the data in your own CSV file.

Logstash File Input CSV Filter Elasticsearch Output

Sample Data 

transactions-sample-data.txt

TRANSACTION_COUNT|TRANSACTION_DATE|TRANSACTION_TYPE|SERVER
18|07/24/2017|New Customer|SVR-1
9|07/25/2017|Online Customer|SVR-2
9|07/26/2017|Agents|SVR-3
12|07/24/2017|In Store|SVR-1
13|07/25/2017|New Customer|SVR-2
18|07/26/2017|Online Customer|SVR-3
21|07/24/2017|Agents|SVR-2
13|07/25/2017|In Store|SVR-3
15|07/26/2017|New Customer|SVR-4

Logstash Configuration File

Create the Logstash configuration file [logstash-installation-dir]/bin/transaction-test.conf and paste in the content below.

input {
    file {
        path => "/opt/app/facinissuesonit/transactions-sample-data.txt"
        start_position => "beginning"
    }
}
filter {
    csv {
        #add mapping of column names; values are assigned to them by position
        columns => ["TRANSACTION_COUNT","TRANSACTION_DATE","TRANSACTION_TYPE","SERVER"]
        separator => "|"
        remove_field => ["message"]
        }
#Date filter is used to convert the date to @timestamp so that charts in Kibana will show per date
    date {
        match => ["TRANSACTION_DATE", "MM/dd/yyyy"]
    }
#Remove first header line to insert in elasticsearch
    if  [TRANSACTION_TYPE] =~ "TRANSACTION_TYPE"
{
drop {}
}
}
output {

	elasticsearch {
   # Create Index based on date
		index => "app-transactions-%{+YYYY.MM.dd}"
    		hosts => ["elasticsearver:9200"]
  		}
#Console output
stdout
         {
         codec => rubydebug
        # debug => true
         }
}

Information about configuration file :

File Input Plugin: reads data from the file, and because start_position is set to “beginning” it will always read the file from the start.

CSV Filter: this filter reads each line's message, splits it on “|”, maps the values to the corresponding columns by position, and finally removes the message field because the data is now parsed.

Date Filter: this filter maps TRANSACTION_DATE to the @timestamp value of the index for each document; it states that TRANSACTION_DATE has the pattern “MM/dd/yyyy” so that the conversion to a timestamp follows the same format.

drop: removes the header line if the field content matches the field name.

Run the Logstash configuration with the command below.

 [logstash-installation-dir]/bin/logstash -f transaction-test.conf

To learn about validating and starting Logstash with other options, follow the link Logstash Installation, Configuration and Start.

Logstash Console Output

Notice that, because of the date filter, the index @timestamp value is generated based on the value of TRANSACTION_DATE, and the Elasticsearch output index name app-transactions-%{+YYYY.MM.dd} will create 3 indexes based on the @timestamp values: app-transactions-2017.07.24, app-transactions-2017.07.25 and app-transactions-2017.07.26 for the sample data.

{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/24/2017",
"@timestamp" => 2017-07-24T04:00:00.000Z,
"SERVER" => "SVR-1",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "New Customer",
"TRANSACTION_COUNT" => "18"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/25/2017",
"@timestamp" => 2017-07-25T04:00:00.000Z,
"SERVER" => "SVR-2",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "Online Customer",
"TRANSACTION_COUNT" => "9"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/26/2017",
"@timestamp" => 2017-07-26T04:00:00.000Z,
"SERVER" => "SVR-3",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "Agents",
"TRANSACTION_COUNT" => "9"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/24/2017",
"@timestamp" => 2017-07-24T04:00:00.000Z,
"SERVER" => "SVR-1",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "In Store",
"TRANSACTION_COUNT" => "12"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/25/2017",
"@timestamp" => 2017-07-25T04:00:00.000Z,
"SERVER" => "SVR-2",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "New Customer",
"TRANSACTION_COUNT" => "13"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/26/2017",
"@timestamp" => 2017-07-26T04:00:00.000Z,
"SERVER" => "SVR-3",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "Online Customer",
"TRANSACTION_COUNT" => "18"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/24/2017",
"@timestamp" => 2017-07-24T04:00:00.000Z,
"SERVER" => "SVR-2",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "Agents",
"TRANSACTION_COUNT" => "21"
}
{
"path" => "/opt/app/facinissuesonit/transactions-sample-data.txt",
"TRANSACTION_DATE" => "07/25/2017",
"@timestamp" => 2017-07-25T04:00:00.000Z,
"SERVER" => "SVR-3",
"@version" => "1",
"host" => "facingissuesonit.saurabh.com",
"TRANSACTION_TYPE" => "In Store",
"TRANSACTION_COUNT" => "13"
}
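After the run you can cross-check that the daily indexes were created, for example with the _cat/indices API (a minimal sketch, reusing the Elasticsearch host placeholder from the output configuration above):

curl -XGET 'http://elasticsearver:9200/_cat/indices/app-transactions-*?v'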

Summary

The above covered the following points:

  • Logstash file input reading.
  • How to apply the CSV filter for "|"-separated data and map values to fields.
  • How to drop the header line if it exists in the CSV file.
  • Date filter to derive the index @timestamp value from a field and a pattern.
  • Dynamic index name for each day by appending a date format.
  • Start Logstash in the background with a configuration file.

Read More

To read more on Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html


Logstash Installation, Configuration and Start


Download the latest version of Logstash from the link below and use the below command to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/logstash

tar -zxvf logstash-5.4.0.tar.gz

It will show the below file and directory structure.

drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 bin
-rw-r--r-- 1 facingissuesonit Saurabh 111569 Mar 22 23:49 CHANGELOG.md
drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 config
-rw-r--r-- 1 facingissuesonit Saurabh   2249 Mar 22 23:49 CONTRIBUTORS
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 12:07 data
-rw-r--r-- 1 facingissuesonit Saurabh   3945 Mar 22 23:55 Gemfile
-rw-r--r-- 1 facingissuesonit Saurabh  21544 Mar 22 23:49 Gemfile.jruby-1.9.lock
drwxr-xr-x 5 facingissuesonit Saurabh   4096 Apr 20 11:27 lib
-rw-r--r-- 1 facingissuesonit Saurabh    589 Mar 22 23:49 LICENSE
drwxr-xr-x 2 facingissuesonit Saurabh   4096 May 21 00:00 logs
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-event-java
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-plugin-api
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-queue-jruby
-rw-r--r-- 1 facingissuesonit Saurabh  28114 Mar 22 23:56 NOTICE.TXT
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 vendor

Before starting Logstash, we need to create a configuration file that takes input from different sources like file, csv, jdbc, json, kafka, filebeat etc., parses the data into the respective fields, and sends it to outputs like elasticsearch, file, kafka etc.

A Logstash configuration file follows the below syntax; here I have created the file logstash-app1.conf in the Logstash bin directory. Please follow Logstash Tutorial for more input, filter and output plugin examples.

[logstash-installation-dir]/bin/logstash-app1.conf

input {
     kafka {
            ....
          }
    jdbc {
            ....
          }
}
filter
{
#parse log line or data...
      grok
	{
	....
	}
}
output {
    #Output result sent to elasticsearch
    elasticsearch {
       ....
  	}
     #Sysout to console
     stdout
       {
         codec => rubydebug
       }
}

To test your configuration file you can use the below command.


./logstash -t -f logstash-app1.conf

If the result of the above command is OK, there is no syntax or compile-time issue with the configuration file. Now run the below command to start reading and parsing data from the different sources.


./logstash -f logstash-app1.conf

To run Logstash in the background use the below command, so that the Logstash process keeps running after you close your console.


screen -d -m ./logstash -f logstash-app1.conf
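If you later want to check on or stop this background run, standard screen commands can be used (a quick reference, not specific to Logstash):

screen -ls                # list running screen sessions
screen -r <session-id>    # reattach to the Logstash session; press Ctrl+A then D to detach again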

Summary

The above covered the following points:

  • How to install Logstash in a Linux environment.
  • Configuration file syntax and validation.
  • Start Logstash with a configuration file.
  • Start Logstash in the background with a configuration file.

Read More

To read more on Logstash configuration, input plugins, filter plugins, output plugins, Logstash customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.


Elasticsearch Installation in Linux


For Elasticsearch installation on Linux, download the .tar file from the below download link.

Latest Version : elasticsearch-5.4.0

Download Link : https://www.elastic.co/downloads/elasticsearch

Pre-Requisite :

  • Java 8 +
  • Set the JAVA_HOME environment variable to your JDK directory as below, either by running the export manually or by setting it in .bash_profile in your home directory.
export JAVA_HOME=/opt/app/FACING_ISSUE_IN_IT/JAVA/jdk1.8.0_66

How to Install Elasticsearch?

Untar the downloaded .tar file in the directory where you want to install Elasticsearch by using the below command.

tar -zxvf elasticsearch-5.4.0.tar.gz

Your untarred directory structure will look as below.

Elasticsearch installation directory window

Below is a description of these directories.

Directory : Description
bin : Binary script directory which keeps the files used to start Elasticsearch and install plugins.
config : Keeps the elasticsearch.yml configuration file and jvm.options for JVM-related settings such as heap size.
data : Default directory where Elasticsearch keeps node data; it can be changed via path.data in elasticsearch.yml.
lib : Keeps all jar files for Elasticsearch.
logs : Default directory where Elasticsearch writes logs; it can be changed via path.logs in elasticsearch.yml.
modules : Keeps all functionality and processors required for data.
plugins : All installed plugins are stored in this directory.

Elasticsearch Configuration

Before starting Elasticsearch, we need to make some basic changes in the config/elasticsearch.yml file.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

cluster.name : The Elasticsearch cluster name is a unique name within the network which links all the nodes together.

node.name : Each node in the cluster has a unique name by which the cluster identifies it.

http.port : The default HTTP port for Elasticsearch is 9200. You can update it if needed.

network.host : Set this property when Elasticsearch needs to be accessed from other machines or by IP. For more on network.host follow the link Why network.host?
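As an illustration, once network.host is set to a non-local address, remote access can be verified from another machine with a simple curl call to the root endpoint (the IP below is a placeholder):

curl -XGET 'http://<elasticsearch-server-ip>:9200/?pretty'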

Now Elasticsearch is ready to start.

How to start Elasticsearch?

There are multiple ways to start Elasticsearch, as below.

  • Run in foreground
  • Run in background
  • Run in background with commandline arguments

Run in Foreground

To run Elasticsearch on a Linux server as a foreground process and see the output in the console, use the below command in the Elasticsearch home directory. When you see "started" as in the below screen, Elasticsearch has started successfully.

./bin/elasticsearch

elasticsearch start

In the above screen Elasticsearch has started on port 9200.

Note: To stop Elasticsearch use CTRL+C.

Run in Background

Use the "screen -d -m" option to run Elasticsearch in the background, or pass -d to run it as a daemon.

screen -d -m ./bin/elasticsearch
or
./bin/elasticsearch -d

Elasticsearch in Background by Command-line configuration

Instead of hard-coding values in the elasticsearch.yml file, we can also pass all the above configurable fields from the command line as given below. For more detail follow the link Elasticsearch Cluster with multi node on same machine.

./bin/elasticsearch -d -Ecluster.name=FACING-ISSUE-IN-IT
 -Enode.name=TEST-NODE-1 -Ehttp.port=9200

Here -E passes the name of the setting whose value needs to be set.

-d runs Elasticsearch as a daemon in the background.

Note: To stop Elasticsearch running in the background, find the process as below and kill the process id.
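For example (a minimal sketch of the usual commands):

ps -ef | grep elasticsearch    # find the Elasticsearch process id
kill <processId>               # stop the process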

Now Elasticsearch is running and it is time to test it.

To check the Elasticsearch cluster status, copy the below link into your browser address bar. You will get a result like below.

http://localhost:9200/_cluster/health?pretty

or, if network.host is configured, as below:

http://elasticseverIp:9200/_cluster/health?pretty
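The same check can also be done from the command line with curl (assuming the default port 9200):

curl -XGET 'http://localhost:9200/_cluster/health?pretty'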

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

To learn more about clusters follow the link Cluster Configuration and Health.

In the same way you can get details about your configured node from the below URLs.

http://localhost:9200/_nodes?pretty

or, if network.host is configured, as below:

http://elasticsearverip:9200/_nodes?pretty

To learn more about nodes follow the link Node Type, Configuration and Status. Follow the link for Elasticsearch Tutorials.

Read More

To read more on Elasticsearch configuration, sample Elasticsearch REST clients and query type configurations with examples, follow Elasticsearch Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.


Elasticsearch Installation in Window


For Elasticsearch installation on Windows, download the .zip file from the below link.

Latest Version : elasticsearch-5.4.0

Download Link : https://www.elastic.co/downloads/elasticsearch

Pre-Requisite : Java 8 +

How to Install Elasticsearch?

Unzip the downloaded .zip file in the directory where you want to install Elasticsearch; your unzipped directory structure will look as below.

Elasticsearch installation directory window

Below is a description of these directories.

Directory : Description
bin : Binary script directory which keeps the files used to start Elasticsearch and install plugins.
config : Keeps the elasticsearch.yml configuration file and jvm.options for JVM-related settings such as heap size.
data : Default directory where Elasticsearch keeps node data; it can be changed via path.data in elasticsearch.yml.
lib : Keeps all jar files for Elasticsearch.
logs : Default directory where Elasticsearch writes logs; it can be changed via path.logs in elasticsearch.yml.
modules : Keeps all functionality and processors required for data.
plugins : All installed plugins are stored in this directory.

Elasticsearch Configuration

Before starting Elasticsearch, we need to make some basic changes in the config/elasticsearch.yml file.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

cluster.name : The Elasticsearch cluster name is a unique name within the network which links all the nodes together.

node.name : Each node in the cluster has a unique name by which the cluster identifies it.

http.port : The default HTTP port for Elasticsearch is 9200. You can update it if needed.

network.host : Set this property when Elasticsearch needs to be accessed from other machines or by IP. For more on network.host follow the link Why network.host?

Now Elasticsearch is ready to start.

How to start Elasticsearch?

Run the elasticsearch.bat file inside the /bin directory and you will get the below console output when it has started successfully.

elasticsearch start

As you can see from the above screen, Elasticsearch has started successfully on port 9200.

To check the Elasticsearch cluster status, copy the below link into your browser address bar. You will get a result like below.

http://localhost:9200/_cluster/health?pretty

or, if network.host is configured, as below:

http://elasticseverIp:9200/_cluster/health?pretty

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

To learn more about clusters follow the link Cluster Configuration and Health.

In the same way you can get details about your configured node from the below URLs.

http://localhost:9200/_nodes?pretty

or, if network.host is configured, as below:

http://elasticsearverip:9200/_nodes?pretty

To learn more about nodes follow the link Node Type, Configuration and Status. Go to the below link for Elasticsearch Tutorials.

Read More

To read more on Elasticsearch configuration, sample Elasticsearch REST clients and query type configurations with examples, follow Elasticsearch Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.


Elasticsearch Overview


"Elasticsearch is an open source, cross-platform search engine developed completely in Java. It is built on top of Lucene, which provides full-text search on high volumes of data and makes it quick and easy to do analysis based on indexing. It is schema free and provides NRT (Near Real Time) search results."

Advantage of Elasticsearch

Full Text Search 

Elasticsearch is built on top of Lucene, which provides a full-featured open source library for full-text search.

Schema Free 

Elasticsearch stores documents in JSON format and, based on the document, detects the fields and their types to make them searchable.

Restful API 

Elasticsearch is easily accessible from a browser by URL and also supports a RESTful API to perform operations. To read more on Elasticsearch REST, follow the link Elasticsearch REST JAVA API Overview.

Operation Persistence

The Elasticsearch cluster keeps a record of all transaction-level changes, including schema changes for an index, and tracks the availability of nodes in the cluster so that data remains available if any node fails over.

Where to use Elasticsearch?

  • It is useful in applications that need to do analysis and statistics and to find anomalies in data based on patterns.
  • It is useful where alerts need to be sent when a particular condition is matched, for example in the stock market or for exceptions in logs.
  • It is useful for log analysis and issue resolution, thanks to full-text search over billions of records in milliseconds.
  • It is compatible with applications like Filebeat, Logstash and Kibana for storing high volumes of data for analysis and visualizing them in the form of charts and dashboards.

Basic Concepts and Terminology

Cluster

A cluster is a collection of one or more nodes which together provide the capability to search text across data scattered over the nodes. It is identified by a unique name within the network so that all associated nodes join together by the cluster name.

For more info on cluster configuration and queries, follow the link Elasticsearch Cluster.

Elasticsearch Cluster

In the above screen, the Elasticsearch cluster "FACING_ISSUE_IN_IT" has three master nodes and four data nodes.

Node

A node is an Elasticsearch server which is associated with a cluster. It stores data and helps the cluster with indexing data and serving search queries. It is identified by a unique name in the cluster; if a name is not provided, Elasticsearch generates a random Universally Unique Identifier (UUID) at server start time.

A cluster can have one or more nodes. When the first node starts, it forms a cluster with a single node, and as other nodes start they join that cluster.

For more info on node configuration, master nodes, data nodes and ingest nodes, follow the link Elasticsearch Node.

Data Node storage

Data Node Documents Storage

The above screen represents the data of two indexes, I1 and I2. Index I1 has two types of documents, T1 and T2, while index I2 has only type T2, and their shards are distributed over all the nodes in the cluster. This data node holds the documents of shard S1 of index I1 and shard S3 of index I2. It also keeps replicas of the documents of shard S2 of indexes I2 and I1, which are stored on other nodes in the cluster.

Index

An index is a collection of documents with similar characteristics which is stored on nodes in a distributed fashion. It is identified by a unique name used to perform operations such as search, update and delete on its documents. A cluster can have as many indexes as needed, each with a unique name.

A document is stored in an index and assigned a type, and an index can have multiple types of documents.

For more info on index creation, mapping templates and CRUD, follow the link Elasticsearch Index.

Shards

Shards are partitions of an index scattered over the nodes. They provide the capability to store a large number (billions) of documents for the same index in the cluster, even when a single node's disk is not capable of storing it all.

Replica

A replica is a copy of a shard which is stored on a different node. A shard can have zero or more replicas. If a shard is on one node, its replica is stored on another node.

Benefits of Shards and Replica

  • Shards split an index into horizontal partitions to handle high volumes of data.
  • Operations run in parallel across shards and replicas on multiple nodes for an index, which increases system performance and throughput.
  • Data is recovered easily on node fail-over because a replica is always stored on a different node than its shard.

Note

When we create an index, by default Elasticsearch configures it with 5 shards and 1 replica, but we can configure this in the config/elasticsearch.yml file or by passing shard and replica values in the index settings when the index is created.
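For illustration, the shard and replica counts can also be passed in the index settings at creation time; a minimal sketch with a hypothetical index name and values:

curl -XPUT 'http://localhost:9200/app1-logs-2017.06.08?pretty' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'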

Once an index is created we cannot change its shard configuration, but we can modify the number of replicas. If the number of shards needs to change, the only option is re-indexing.

Each shard is itself a Lucene index and can hold at most 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. Merging of search results and fail-over are taken care of by the Elasticsearch cluster.

For more info on Elasticsearch shards and replicas, follow Elasticsearch Shards and Replica configuration.

Document

Each record stored in an index is called a document and is stored as a JSON object. JSON data exchange is fast over the internet and easy to handle for browser-side display.
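As a quick illustration, a document is just a JSON object indexed through a REST call; a sketch with a hypothetical index, type and id:

curl -XPUT 'http://localhost:9200/app-transactions-2017.07.24/transaction/1?pretty' -d'
{
  "SERVER": "SVR-1",
  "TRANSACTION_TYPE": "New Customer",
  "TRANSACTION_COUNT": 18
}'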

Read More

To read more on Elasticsearch configuration, sample Elasticsearch REST clients and search query types with examples, follow Elasticsearch Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.


Elasticsearch REST Index Manager Auto Client for CRUD


The Elasticsearch 5 REST Java Index Manager Auto Client helps manage index lifecycles from the client side by configuring when to keep indexes open, close them or delete them, with no third-party tool required.

The below steps for automatic index management will save the time spent managing indexes manually and will take care of the index lifecycle based on the configured times.

Pre-requisite

  • Java 8 or above is required.
  • Add dependencies for the Elasticsearch REST client and JSON mapping in your pom.xml or add them to your classpath.
  • The index name format should be like IndexName-yyyy.MM.dd, for example app1-logs-2017.06.08; if you have a different date format, change the below code accordingly.

We will follow the below steps to create this client and run it automatically:

  • Create a Java Maven project ElasticsearchAutoIndexManager.
  • Add the ElasticSearchIndexManagerClient class to the project.
  • Test.
  • Create an auto-run jar for the project.
  • Create a script file to run the jar.
  • Create a crontab configuration to schedule it and receive email alerts.

Create Java Maven Project ElasticsearchAutoIndexManager

Create a console-based Java Maven project named ElasticsearchAutoIndexManager as in the below screenshot. To know more about console-based Java Maven projects, follow the link How to create Maven Java Console Project?

Elasticsearch REST Auto Client

Add the below dependencies in the pom.xml file.

<!--Elasticsearch REST jar-->
<dependency>
			<groupId>org.elasticsearch.client</groupId>
			<artifactId>rest</artifactId>
			<version>5.1.2</version>
</dependency>
<!--Jackson jar for mapping json to Java -->
<dependency>
			<groupId>com.fasterxml.jackson.core</groupId>
			<artifactId>jackson-databind</artifactId>
			<version>2.8.5</version>
</dependency>

Add the below ElasticSearchIndexManagerClient class in the com.facingissuesonit.es package and change the below constant fields as per your server info and requirements.

Set INDEX_NO_ACTION_TIME so that no action is taken until an index is this many days old. For example, setting it to 2 means an index remains searchable in the system for up to two days.

private static final int INDEX_NO_ACTION_TIME = 2; 

Set INDEX_CLOSE_TIME so that indexes older than this are put in closed status, meaning they exist on the Elasticsearch server but are not searchable. For example, setting it to 5 means indexes older than five days are closed and kept until the index delete time is reached.

private static final int INDEX_CLOSE_TIME = 5; 

Set INDEX_DELETE_TIME to decide when to delete indexes that match this criterion. For example, setting it to 15 means all indexes older than 15 days are deleted.

private static final int INDEX_DELETE_TIME = 15;

private static final String ELASTICSEARCH_SERVER = "ServerHost";

private static final int ELASTICSEARCH_SERVER_PORT = 9200;

Note: Set the proxy server and login credential information if required; otherwise comment those lines out.

package com.facingissuesonit.es;

import java.io.IOException;
import java.io.InputStream;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

import com.fasterxml.jackson.databind.ObjectMapper;

public class ElasticSearchIndexManagerClient {
	private static final int INDEX_NO_ACTION_TIME = 2;
	private static final int INDEX_CLOSE_TIME = 5;
	private static final int INDEX_DELETE_TIME = 15;
	private static final String ELASTICSEARCH_SERVER = "ServerHost";
	private static final int ELASTICSEARCH_SERVER_PORT = 9200;
	public static void main(String[] args) {
		RestClient client;
		String indexName = "", indexDateStr = "";
		LocalDate indexDate = null;
		long days = 0;
		final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
		final LocalDate todayLocalDate = LocalDate.now();

		try {
			ElasticSearchIndexManagerClient esManager=new ElasticSearchIndexManagerClient();
			//Get Connection from Elasticsearch
			client=esManager.getElasticsearchConnectionClient();
			if(client!=null)
			{
				IndexDetail[] indexList = esManager.getIndexDetailList(client);

				if (indexList != null && indexList.length > 0) {
					for (IndexDetail indexDetail : indexList) {
						indexName = indexDetail.getIndexName();
						System.out.println(indexName);
						indexDateStr = indexName.substring(indexName.lastIndexOf("-") + 1);
						//Below code computes the number of days between the index creation date and the current date
						try {
							indexDate = LocalDate.parse(indexDateStr.replace('.', '-'), formatter);
							days = ChronoUnit.DAYS.between(indexDate, todayLocalDate);
							esManager.performAction(indexDetail, days,client);
						} catch (Exception ex) {
							System.out.println("Index is not having formatted date as required : yyyy.MM.dd :"+indexName);
						}
					}
				}
			}
		} catch (Exception ex) {
			System.out.println("Exception found while index management");
			ex.printStackTrace();
			System.exit(1);
		} finally {
			System.out.println("Index Management successfully completed");
			System.exit(0);
		}
	}
	//Get Elasticsearch Connection
	private RestClient getElasticsearchConnectionClient() {
		final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
		credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("userid", "password"));

		RestClient client = RestClient
				.builder(new HttpHost(ELASTICSEARCH_SERVER,ELASTICSEARCH_SERVER_PORT))
				.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {

					public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
					return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
							.setProxy(new HttpHost("ProxyHost", 8080)); // proxy host and port are placeholders; remove setProxy() if no proxy is required

					}
				}).setMaxRetryTimeoutMillis(60000).build();
		return client;
	}
	//Get the list of indexes from the Elasticsearch server
	public IndexDetail[] getIndexDetailList(RestClient client)
	{
		IndexDetail[] indexDetails=null;
		HttpEntity in=null;
		try
		{
		ObjectMapper jacksonObjectMapper = new ObjectMapper();
		Response response = client.performRequest("GET", "/_cat/indices?format=json&pretty", Collections.singletonMap("pretty", "true"));
		in =response.getEntity();
		indexDetails=jacksonObjectMapper.readValue(in.getContent(), IndexDetail[].class);
		System.out.println("Index found :"+indexDetails.length);
		}
		catch(IOException ex)
		{
			ex.printStackTrace();
		}

		return indexDetails;
	}
	//This method decides what action to take based on the index creation date and the configured days for no action, close and delete
	private  void performAction(IndexDetail indexDetail, long days,RestClient client) {
		String indexName = indexDetail.getIndexName();
		if (days >= INDEX_NO_ACTION_TIME) {
			if (!(indexDetail.getStatus() != null && indexDetail.getStatus().equalsIgnoreCase("close"))) {
				// Close index condition
				if (days >= INDEX_CLOSE_TIME) {
					System.out.println("Close Index :" + indexName);
					closeIndex(indexName,client);
				}
			}
			// Delete index condition
			if (days >= INDEX_DELETE_TIME) {
				if (!(indexDetail.getStatus() != null && indexDetail.getStatus().equalsIgnoreCase("close"))) {
					System.out.println("Delete Index :" + indexName);
					deleteIndex(indexName,client);
				} else {
					System.out.println("Delete Close Index :" + indexName);
					deleteCloseIndex(indexName,client);
				}
			}
		}
	}

	// Operation on Indexes
		private  void closeIndex(String indexName,RestClient client) {

			flushIndex(indexName,client);
			postDocuments(indexName + "/_close", client);
			System.out.println("Close Index :" + indexName);
		}

		private  void deleteIndex(String indexName,RestClient client) {
			flushIndex(indexName,client);
			deleteDocument(indexName,client);
			System.out.println("Delete Index :" + indexName);
		}

		private  void deleteCloseIndex(String indexName,RestClient client) {
			openIndex(indexName,client);
			flushIndex(indexName,client);
			deleteDocument(indexName, client);
			System.out.println("Delete Close Index :" + indexName);
		}

		private  void openIndex(String indexName,RestClient client) {
			postDocuments(indexName + "/_open", client);
			System.out.println("Open Index :" + indexName);
		}

		private  void flushIndex(String indexName,RestClient client) {
			postDocuments(indexName + "/_flush", client);
			System.out.println("Flush Index :" + indexName);
			try {
				Thread.sleep(3000);
			} catch (InterruptedException ex) {
				ex.printStackTrace();
			}
		}
		//POST performs an action on an index endpoint (open, flush, close)
		public InputStream postDocuments(String endPoint,RestClient client)
		{
			InputStream in=null;
			Response response=null;
			try
			{
				response = client.performRequest("POST", endPoint, Collections.<String, String>emptyMap());
				in =response.getEntity().getContent();
			}
			catch(IOException ex)
			{
				System.out.println("Exception in post Documents :");
				ex.printStackTrace();
			}
			return in;
		}
		//DELETE performs the index deletion
		public InputStream deleteDocument(String endPoint,RestClient client)
		{
			InputStream in=null;
			try
			{

		    Response response = client.performRequest("DELETE", endPoint, Collections.singletonMap("pretty", "true"));
			in =response.getEntity().getContent();
			}
			catch(IOException ex)
			{
				System.out.println("Exception in delete Documents :");
				ex.printStackTrace();
			}
			return in;
		}

}

The above code is pretty straightforward and goes step by step. Let me explain the below operations.

Open: An index with open status is available for searching, and operations like close and delete can be performed on an index while it is open. To open an index we can use the below curl command.

curl -XPOST 'http://<elasticsearch-host>:9200/indexName-2017.06.10/_open'

Flush: This operation is required before executing close and delete operations on an index, so that all running transactions on the index complete.

curl -XPOST 'http://<elasticsearch-host>:9200/indexName-2017.06.10/_flush'

Close: A closed index persists on the Elasticsearch server but is not available for searching. To close an index we can use the below curl command.

curl -XPOST 'http://<elasticsearch-host>:9200/indexName-2017.06.10/_close'

Delete: The delete operation removes an index completely from the server.

curl -XDELETE 'http://<elasticsearch-host>:9200/indexName-2017.06.10'

Now our code is ready to take care of indexes based on the configured times. We can test it by running the above code.

The below steps make your index manager code auto-runnable in a Linux environment.

Create Auto Runnable Jar

Export the ElasticsearchAutoIndexManager project as an auto-runnable jar, selecting ElasticSearchIndexManagerClient as the launch class. To learn about runnable jar creation, follow the link How to make and auto run /executable jar with dependencies?

Create Script File to Execute  Autorun Jar

Create a script file named IndexManager.sh as below and save it.

#!/bin/bash
~/JAVA/jdk1.8.0_66/bin/java  -jar /opt/app/facingissuesonit/automate/IndexManagerClient.jar
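Make the script executable so that cron can run it (standard chmod usage; adjust the path if yours differs):

chmod +x /opt/app/facingissuesonit/automate/IndexManager.sh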

Create Cron Tab configuration for schedule and receive email alert

Linux provides crontab for executing scheduled jobs/scripts. Using crontab, we will execute the runnable jar through the above script file.

  • Use the command crontab -e to create and edit entries in the crontab.
  • Make the below cron entry in this editor to execute the IndexManager.sh script every night at 1 AM.
  • If you want execution alerts with the console logs sent to you and your team, also add your email id as shown below.
  • Save the crontab with ESC then :wq.
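A sample crontab entry for this schedule could look as below (a sketch; the email address is a placeholder and the MAILTO line is optional):

MAILTO="your-team@example.com"
0 1 * * * /opt/app/facingissuesonit/automate/IndexManager.sh >> /opt/app/facingissuesonit/automate/IndexManager.log 2>&1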

Below are some more examples of crontab expressions.

0 * * * *      : Run at the start of every hour
* * * * *      : Run every minute
30 4 * * *     : Run at 4:30 AM every day
5 10,22 * * *  : Run twice a day, at 10:05 and 22:05
5 0 * * *      : Run at 00:05, just after midnight

Read More

To read more on Elasticsearch REST, sample clients and configurations with examples, follow Elasticsearch REST Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.
