Category Archives: Filebeat

Elasticsearch Ingest Node Vs Logstash Vs Filebeat

Elasticsearch Ingest Node

Ingest node use to pre-process documents before the actual document indexing
happens. The ingest node intercepts bulk and index requests, it applies transformations, and it then passes the documents back to the index or bulk APIs.

Logstash

Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to different output sources like Elasticsearch, Kafka Queues, Databases etc.

Filebeat

Filebeat is lightweight log shipper which reads logs from thousands of logs files and forward those log lines to centralize system like Kafka topics to further processing on Logstash, directly to Logstash or Elasticsearch search.

There is overlap in functionality between Elasticsearch Ingest Node , Logstash and Filebeat.All have there weakness and strength based on architectures and area of uses. You cam also integrate all of these Filebeat, Logstash and Elasticsearch Ingest node by minor configuration to optimize performance and analyzing of data.

Below are some key points to compare Elasticsearch Ingest Node , Logstash and Filebeat.

Elasticsearch Ingest Node Vs Logstash Vs Filebeat

PointsElasticsearch Ingest NodeLogstashFilebeat
Data In and OutAs ingest node runs as pipeline within the indexing flow in Elasticsearch, data has to be pushed to it
through bulk or indexing requests and configure pipeline processors process documents before indexing of actively writing data
to Elasticsearch.
Logstash supports wide variety of input and output plugins. It can act as middle server to accept pushed data from clients over TCP, UDP and HTTP and filebeat, message queues and databases.
It parse and process data for variety of output sources e.g elasticseach, message queues like Kafka and RabbitMQ or long term data analysis on S3 or HDFS.
Filebeat specifically to shipped logs files data to Kafka, Logstash or Elasticsearch.
QueuingElasticsearch Ingest Node is not having any built in queuing mechanism in to pipeline processing.
If the data nodes are not bale to accept data, the ingest node will stop accepting data as well.
Logstash provide persistent queuing feature mechanism features by storing on disk.Filebeat provide queuing mechanism with out data loss.
Back-pressureClients pushing data to ingest node need to be able to handle back-pressure by queuing data In case elasticsearch is not reachable or able to accept data for extended period otherwise there would be data loss.Logstash provide at least once delivery guarantees and buffer data locally through ingestion spikes.Filebeat designed architecture like that with out losing single bit of log line if out put systems like kafka, Logstash or Elasticsearch not available
Data ProcessingIngest node comes around 20 different processors, covering the functionality of
the most commonly used Logstash plugins.

Ingest node have some limitation like pipeline can only work in the context of a single event. Processors are
also generally not able to call out to other systems or read data from disk. It's also not having filters as in beats and logstash.
Logstash has a larger selection of plugins to choose from. This includes
plugins to add or transform content based on lookups in configuration files,
Elasticsearch, Beats or relational databases.
Logstash support filtering out and dropping events based on
configurable criteria.
Beats support filtering out and dropping events based on
configurable criteria.
ConfigurationEach document can only be processed by a single pipeline when passing through the ingest node.

Logstash supports to define multiple logically separate pipelines by conditional control flow s to handle complex and multiple data formats.

Logstash is easier to measuring and optimizing performance of the pipeline to supports monitoring and resolve potential issues quickly by excellent pipeline viewer UI.
Minor configuration to read , shipping and filtering of data. But limitation with parsing.
SpecializationIngest Node pipeline processed data before doing indexing on elasticsearch.Its middle server to parse process and filter data from multiple input plugins and send processes data to output plugins.Specific to read and shipped logs from different servers to centralize location on Elasticsearch, Kafka and if require parsing processed through Logstash.
IntegrationLogstash supports sending data to an Ingest Pipeline.Ingest node can accept data from Filebeat and Logstash etc,Filebeat can send data to Logstash , Elasticsearch Ingest Node or Kafka.
PerformancePlease follow below link to check performance of each on different cases:Elasticsearch Ingest Node , Logstash and Filebeat Performance comparison.

Know More

To know more about Elasticsearch Ingest Node, Logstash or Filebeat follow below links:

Advertisements

How to Configure Filebeat, Kafka, Logstash Input , Elasticsearch Output and Kibana Dashboard

Filebeat, Kafka, Logstash, Elasticsearch and Kibana Integration is used for big organizations where applications deployed in production on hundreds/thousands of servers and scattered around different locations and need to do analysis on data from these servers on real time.

This integration helps mostly for log level analysis , tracking issues, anomalies with data and alerts on events of particular occurrence and where accountability measures.

By using these technology provide scalable architecture to enhance systems and decoupled of each other individually.

Why these Technology?

Filebeat :

  • Lightweight agent for shipping logs.
  • Forward and centralize files and logs.
  • Robust (Not miss a single beat)

Kafka:

  • Open source distributed, Steam Processing, Message Broker platform.
  • process stream data or transaction logs on real time.
  • fault-tolerant, high throughput, low latency platform for dealing real time data feeds.

Logstash:

  •  Open source, server-side data processing pipeline that accept data from a different  sources simultaneously.
  • Parse, Format, Transform data and send to different output sources.

Elasticsearch:

  • Elasticsearch is open source, distributed cross-platform.
  • Built on top of Lucene which provide full text search and provide NRT(Near real Time) search results.
  • Support RESTFUL search  by Elasticsearch REST

Kibana:

  • Open source
  • Provide window to view Elasticsearch data in form different charts and dashboard.
  • Provide way  searches and operation of data easily with respect to time interval.
  • Easily Imported by  any web application by embedded dashboards.

How Data flow works ?

In this integration filebeat will install in all servers where your application is deployed and filebeat will read and ship  latest logs changes from these servers to Kafka topic as configured for this application.

Logstash will subscribe log lines from kafka topic and perform parsing on these lines make relevant changes, formatting, exclude and include fields then send this processed data to Elasticsearch Indexes as centralize location from different servers.

Kibana  is linked with  Elasticsearch indexes which will help to do analysis by search, charts and dashboards .

FKLEK Integration

Design Architecture

In below configured architecture considering my application is deployed on three servers and each server having current log file name as App1.log . Our goal is read real time data from these servers and do analysis on these data.

FKLEK Arch Integration

Steps to Installation, Configuration and Start

Here first we will install Kafka and Elasticsearch run individually rest of tools will install and run sequence to test with data flow.  Initially install all in same machine  and test with sample data with below steps and at end of this post will tell about what changes need to make according to your servers.

  • Kafka Installation, Configuration and Start
  • Elasticsearch Installation,Configuration and Start
  • Filebeat Installation,Configuration and Start
  • Logstash Installation,Configuration and Start
  • Kibana Installation,Start and display.

Pre-Requisite

These Filebeat,Logstash, Elasticsearch and Kibana versions should be compatible better use latest from  https://www.elastic.co/downloads.

  • Java 8+
  • Linux Server
  • Filebeat 5.XX
  • Kafka 2.11.XX
  • Logstash 5.XX
  • Elasticsearch 5.XX
  • Kibana 5.XX

Note  : Make sure JDK 8 should be install  and JAVA_HOME environment variable point to JDK 8 home directory  wherever you want in install Elasticsearch, Logstash,Kibana and Kafka.

Window   : My computer ->right click-> Properties -> Advance System Settings->System Variable

Java_Home
Set JAVA_HOME

Linux : Go to your home directory/ sudo directory and below line as below .

export JAVA_HOME=/opt/app/facingissuesonit/jdk1.8.0_66

Sample Data

For testing we will use these sample log line which is having debug as well as stacktrace of logs and grok parsing of this example is designed according to it. For real time testing and actual data you can point to your server log files but you have to modify grok pattern in Logstash configuration accordingly.

2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}

Create  App1.log file  in same machine where filebeat need to install and copy above logs lines in App1.log file.

Kafka Installation , Configuration and Start

Download latest version of Kafka from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://kafka.apache.org/downloads

tar -zxvf kafka_2.11-0.10.0.0

For more configuration and start options follow Setup Kafka Cluster for Single Server/Broker

After download and untar/unzip file it will have below files and directory structure.

ls- l
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr  3 05:18 bin
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May  8 11:05 config
drwxr-xr-x 74 facingissuesonit Saurabh   4096 May 27 20:00 kafka-logs
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 libs
-rw-r--r--  1 facingissuesonit Saurabh  28824 Apr  3 05:17 LICENSE
drwxr-xr-x  2 facingissuesonit Saurabh 487424 May 27 20:00 logs
-rw-r--r--  1 facingissuesonit Saurabh    336 Apr  3 05:18 NOTICE
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 site-docs

For more details about all these files,configuration option and other integration options follow Kafka Tutorial.

Make below changes in files config/zookeeper.properties and config/server.properties

config/zookeeper.properties

clientPort=2181
config/server.properties:

broker.id=0
listeners=PLAINTEXT://:9092
log.dir=/kafka-logs
zookeeper.connect=localhost:2181

Now Kafka is configured and ready to run. Use below command to start zookeeper and Kafka server as  background process.

screen -d -m bin/zookeeper-server-start.sh config/zookeeper.properties
screen -d -m bin/kafka-server-start.sh config/server.properties

To test  Kafka  install successfully you can check by running Kafka process on Linux “ps -ef|grep kafka” or steps for consumer and producer to/from topic in Setup Kafka Cluster for Single Server/Broker.

Elasticsearch Installation,Configuration and Start

Download latest version of Elasticsearch from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/elasticsearch

tar -zxvf elasticsearch-5.4.0.tar.gz

It will show below files and directory structure for Elasticsearch.

drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 25 19:20 bin
drwxr-xr-x  3 facingissuesonit Saurabh   4096 May 13 17:27 config
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr 24 15:56 data
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 lib
-rw-r--r--  1 facingissuesonit Saurabh  11358 Apr 17 10:50 LICENSE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May 28 05:00 logs
drwxr-xr-x 12 facingissuesonit Saurabh   4096 Apr 17 10:55 modules
-rw-r--r--  1 facingissuesonit Saurabh 194187 Apr 17 10:55 NOTICE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 plugins
-rw-r--r--  1 facingissuesonit Saurabh   9540 Apr 17 10:50 README.textile

Before going to start Elasticsearch need to make some basic changes in config/elasticsearch.yml file for cluster  and node name. You can configure it based on you application or organization name.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

Now we are ready with elasticsearch configuration and time start elasticsearch. We can use below command to run elasticsearch in background.

screen -d -m  /bin/elasticsearch

For  checking elasticsearch starts successfully you can use below url on browser  to know cluster status . You will get result like below.

http://localhost:9200/_cluster/health?pretty

or as below if network.host configured

http://elasticseverIp:9200/_cluster/health?pretty

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Filebeat Installation, Configuration and Start

Download latest version of filebeat from  below link and use  command to untar  and installation in Linux server. or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/beats/filebeat

tar -zxvf filebeat-<version>.tar.gz

For more configuration and start options follow Filebeat Download,Installation and Start/Run

After download and untar/unzip file it will have below files and directory structure.

ls- l
-rwxr-xr-x 1 facingissuesonit Saurabh 14908742 Jan 11 14:11 filebeat
-rw-r--r-- 1 facingissuesonit Saurabh    31964 Jan 11 14:11 filebeat.full.yml
-rw-r--r-- 1 facingissuesonit Saurabh     3040 Jan 11 14:11 filebeat.template-es2x.json
-rw-r--r-- 1 facingissuesonit Saurabh     2397 Jan 11 14:11 filebeat.template.json
-rw-r--r-- 1 facingissuesonit Saurabh     4196 Jan 11 14:11 filebeat.yml
-rw-r--r-- 1 facingissuesonit Saurabh      811 Jan 11 14:10 README.md
drwxr-xr-x 2 facingissuesonit Saurabh     4096 Jan 11 14:11 scripts

For more details about all these files,configuration option and other integration options follow Filebeat Tutorial.

Now filebeat is installaed and need to make below changes in filebeat.full.yml file

  • Inside prospectors section change paths to your log file location as
paths:
-/opt/app/facingissuesonit/App1.log
  • Comment out Elasticsearch Output default properties as below
#output.elasticsearch:
#hosts: ["localhost:9200"]
  • Configure multiline option as below so that all stacktrace line which are not starting with date  can we consider as single line.
multiline.pattern: ^\d
multiline.negate: true
multiline.match: after

For learn more on filebeat multiline configuration follow Filebeat Multiline Configuration Changes for Object, StackTrace and XML

  • Inside Kafka Output section update these properties hosts and topic. if Kafka on same machine then use localhost else update with IP of kafka machine.
output.kafka:
 hosts: ["localhost:9092"]
 topic: APP-1-TOPIC

For more on Logging configuration follow link Filebeat, Logging Configuration.

Now filebeat is configured and ready to start with  below command, it will read from configured prospector for file App1.log continiously and publish log line events to Kafka . It will also create topic as APP-1-TOPIC in Kafka if not exist.

./filebeat -e -c filebeat.full.yml -d "publish"

On console it will display output as below for sample lines.

2017/05/28 00:24:27.991828 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 194,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991907 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 375,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
  "offset": 718,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.992Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}",
  "offset": 902,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}

Now you can see from above filebeat debug statements publish event 3 is having multiline statements with stacktrace exception and each debug will have these fields like.

@timestamp:  Timestamp of data shipped.

beat.hostname : filebeat machine name from where data is shipping.

beat.version: which version of filebeat installed on server that help for compatibility check on target end.

message : Log line from logs file or multline log lines

offset: it’s represent inode value in source file

source :  it’s file name from where logs were read

Now time to check data is publish to Kafka topic or not. For this go to below directory  and you will see two files as xyz.index and xyz.log for maintaining data offset and messages.

{Kafka_home}/kafk_logs/APP-1-TOPIC
          00000000000000000000.log
          00000000000000000000.index

Now your server log lines are in Kafka topic for reading and parsing  by Logstash and send it to elasticsearch for doing analysis/search on this data.

Logstash Installation, Configuration and Start

Download latest version of Logstash from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/logstash

tar -zxvf logstash-5.4.0.tar.gz

It will show below file and directory structure.

drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 bin
-rw-r--r-- 1 facingissuesonit Saurabh 111569 Mar 22 23:49 CHANGELOG.md
drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 config
-rw-r--r-- 1 facingissuesonit Saurabh   2249 Mar 22 23:49 CONTRIBUTORS
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 12:07 data
-rw-r--r-- 1 facingissuesonit Saurabh   3945 Mar 22 23:55 Gemfile
-rw-r--r-- 1 facingissuesonit Saurabh  21544 Mar 22 23:49 Gemfile.jruby-1.9.lock
drwxr-xr-x 5 facingissuesonit Saurabh   4096 Apr 20 11:27 lib
-rw-r--r-- 1 facingissuesonit Saurabh    589 Mar 22 23:49 LICENSE
drwxr-xr-x 2 facingissuesonit Saurabh   4096 May 21 00:00 logs
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-event-java
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-plugin-api
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-queue-jruby
-rw-r--r-- 1 facingissuesonit Saurabh  28114 Mar 22 23:56 NOTICE.TXT
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 vendor

Before going to start Logstash need to create configuration file for taking input data from Kafka and parse these data in respected fields and send it elasticsearch. Create file logstash-app1.conf in logstash bin directory with below content.

/bin/logstash-app1.conf

input {
     kafka {
            bootstrap_servers => 'localhost:9092'
            topics => ["APP-1-TOPIC"]
            codec => json {}
          }
}
filter
{
//parse log line
      grok
	{
	match => {"message" => "\A%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+(?<logger>(?:[a-zA-Z0-9-]+\.)*[A-Za-z0-9$]+)\s+(-\s+)?(?=(?<msgnr>[A-Z]+[0-9]{4,5}))*%{DATA:message}({({[^}]+},?\s*)*})?\s*$(?<stacktrace>(?m:.*))?" }
	}  

    #Remove unused fields
    #mutate { remove_field =>["beat","@version" ]}
}
output {
    #Output result sent to elasticsearch and dynamically create array
    elasticsearch {
        index  => "app1-logs-%{+YYYY.MM.dd}"
        hosts => ["localhost:9200"]
        sniffing => false
  	}

     #Sysout logs
     stdout
       {
         codec => rubydebug
       }
}

To test your configuration file you can use below command.


./logstash -t -f logstash-app1.conf

If  we get result OK from above command run below to start reading and parsing data from Kafka topic.


./logstash -f logstash-app1.conf

For design your own grok pattern for you logs line formatting you can follow below link that will help to generate incrementally and also provide some sample logs grok.

http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/

Logstash console will show parse data as below  and you can remove unsed fields for storing in elasticsearch by uncomment mutate section from configuration file.

{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 194,
      "loglevel" => "WARN",
        "logger" => "CreateSomethingActivationKey",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "WhateverException for User 49-123-345678 "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,662"
}
{
         "msgnr" => "ERR1700",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 375,
      "loglevel" => "INFO",
        "logger" => "LMLogger",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "ERR1700 - u:null failures: 0  - Technical error "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,663"
}
{
        "offset" => 718,
        "logger" => "SomeCallLogger",
    "input_type" => "log",

       "message" => [
        [0] "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
        [1] "ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  "
    ],
          "type" => "log",
         "msgnr" => "ESS10005",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
    "stacktrace" => "\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
      "loglevel" => "ERROR",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
      "@version" => "1",
     "timestamp" => "2013-02-28 09:57:56,668"
}
{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 903,
      "loglevel" => "INFO",
        "logger" => "EntryFilter",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",

       "message" => [
        [0] "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}\n",
        [1] "Fresh on request /portalservices/foobarwhatever "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 10:04:35,723"
}

To test on elasticsearch end your data sent  successfully  you can use this url
http://localhost:9200/_cat/indices  on your browser and will display created index with current date.

yellow open app1-logs-2017.05.28                             Qjs6XWiFQw2zsiVs9Ks6sw 5 1         4     0  47.3kb  47.3kb

Kibana Installation, Configuration and Start

Download latest version of Kibana from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/kibana

tar -zxvf kibana-5.4.0.tar.gz

It will show below files and directory structure for kibana.

ls -l
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 14:23 bin
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 18:58 config
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 11:54 data
-rw-r--r--   1 facingissuesonit Saurabh    562 Apr 17 12:04 LICENSE.txt
drwxr-xr-x   6 facingissuesonit Saurabh   4096 Apr 17 12:04 node
drwxr-xr-x 485 facingissuesonit Saurabh  20480 Apr 17 12:04 node_modules
-rw-r--r--   1 facingissuesonit Saurabh 660429 Apr 17 12:04 NOTICE.txt
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 optimize
-rw-r--r--   1 facingissuesonit Saurabh    702 Apr 17 12:04 package.json
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 12:29 plugins
-rw-r--r--   1 facingissuesonit Saurabh   4909 Apr 17 12:04 README.txt
drwxr-xr-x  10 facingissuesonit Saurabh   4096 Apr 17 12:04 src
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 ui_framework
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 17 12:04 webpackShims

Before going to start Kibana need to make some basic changes in config/kibana.yml file make below changes after uncomment these properties file.

server.port: 5601
server.host: localhost
elasticsearch.url: "http://localhost:9200"

Now we are ready with Kibana configuration and time start Kibana. We can use below command to run Kibana in background.

screen -d -m  /bin/kibana

Kibana take time to start and we can test it by using below url in browser

http://localhost:5601/

For checking this data  in Kibana open above url in browser go to management tab on left side menu -> Index Pattern -> Click on Add New

Enter Index name or pattern and time field name as in below screen  and click on create button.

Kibana index setting
Index Pattern Settings

Now go to Discover Tab and select index as app1-log* will display data as below.

kibana discover data

Now make below changes according to  your application specification .

Filebeat :

  • update prospector path to your log directory current file
  •  Move Kafka on different machine because Kafka will single location where receive shipped data from different servers. Update localhost with same IP of kafka server in Kafka output section of filebeat.full.yml file  for hosts properties.
  • Copy same filebeat setup on all servers from where you application deployed and need to read logs.
  • Start all filebeat instances on each Server.

Elasticsearch :

  • Uncomment network.host properties from elasticsearch.yml file for accessing by  IP address.

Logstash:

  • Update localhost in logstash-app1.conf file input section with Kafka machine IP.
  • change grok pattern in filter section according to your logs format. You can take help from below url for incrementally design. http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/
  • Update localhost output section for elasticsearch with IP if moving on different machine.

Kibana:

  • update localhost in kibana.yml file for elasticsearch.url properties with IP if kibana on different machine.

Conclusion :

In this tutorial considers below points :

  • Installation of Filebeat, Kafka, Logstash, Elasticsearch and Kibana.
  • Filebeat is configured to shipped logs to Kafka Message Broker.
  • Logstash configured to read logs line from Kafka topic , Parse and shipped to Elasticsearch.
  • Kibana show these Elasticsearch information in form of chart and dashboard to users for doing analysis.

Read More

To read more on Filebeat, Kafka, Elasticsearch  configurations follow the links and Logstash Configuration,Input Plugins, Filter Plugins, Output Plugins, Logstash Customization and related issues follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Reference  :

 https://www.elastic.co/products

 

Integrate Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Filebeat, Kafka, Logstash, Elasticsearch and Kibana Integration is used for big organizations where applications deployed in production on hundreds/thousands of servers and scattered around different locations and need to do analysis on data from these servers on real time.

This integration helps mostly for log level analysis , tracking issues, anomalies with data and alerts on events of particular occurrence and where accountability measures.

By using these technology provide scalable architecture to enhance systems and decoupled of each other individually.

Why these Technology?

Filebeat :

  • Lightweight agent for shipping logs.
  • Forward and centralize files and logs.
  • Robust (Not miss a single beat)

Kafka:

  • Open source distributed, Steam Processing, Message Broker platform.
  • process stream data or transaction logs on real time.
  • fault-tolerant, high throughput, low latency platform for dealing real time data feeds.

Logstash:

  •   Open source, server-side data processing pipeline that accept data from a different  sources simultaneously.
  • Parse, Format, Transform data and send to different output sources.

Elasticsearch:

  • Elasticsearch is open source, distributed cross-platform.
  • Built on top of Lucene which provide full text search and provide NRT(Near real Time) search results.
  • Support RESTFUL search  by Elasticsearch REST

Kibana:

  • Open source
  • Provide window to view Elasticsearch data in form different charts and dashboard.
  • Provide way  searches and operation of data easily with respect to time interval.
  • Easily Imported by  any web application by embedded dashboards.

How Data flow works ?

In this integration filebeat will install in all servers where your application is deployed and filebeat will read and ship  latest logs changes from these servers to Kafka topic as configured for this application.

Logstash will subscribe log lines from kafka topic and perform parsing on these lines make relevant changes, formatting, exclude and include fields then send this processed data to Elasticsearch Indexes as centralize location from different servers.

Kibana  is linked with  Elasticsearch indexes which will help to do analysis by search, charts and dashboards .

FKLEK Integration

Design Architecture

In below configured architecture considering my application is deployed on three servers and each server having current log file name as App1.log . Our goal is read real time data from these servers and do analysis on these data.

FKLEK Arch Integration

Steps to Installation, Configuration and Start

Here first we will install Kafka and Elasticsearch run individually rest of tools will install and run sequence to test with data flow.  Initially install all in same machine  and test with sample data with below steps and at end of this post will tell about what changes need to make according to your servers.

  • Kafka Installation, Configuration and Start
  • Elasticsearch Installation,Configuration and Start
  • Filebeat Installation,Configuration and Start
  • Logstash Installation,Configuration and Start
  • Kibana Installation,Start and display.

Pre-Requisite

These Filebeat,Logstash, Elasticsearch and Kibana versions should be compatible better use latest from  https://www.elastic.co/downloads.

  • Java 8+
  • Linux Server
  • Filebeat 6.XX
  • Kafka 2.11.XX
  • Logstash 6.XX
  • Elasticsearch 6.XX
  • Kibana 6.XX

Note  : Make sure JDK 8 should be install  and JAVA_HOME environment variable point to JDK 8 home directory  wherever you want in install Elasticsearch, Logstash,Kibana and Kafka.

Window   : My computer ->right click-> Properties -> Advance System Settings->System Variable

Java_Home
Set JAVA_HOME

Linux : Go to your home directory/ sudo directory and below line as below .

export JAVA_HOME=/opt/app/facingissuesonit/jdk1.8.0_66

Sample Data

For testing we will use these sample log line which is having debug as well as stacktrace of logs and grok parsing of this example is designed according to it. For real time testing and actual data you can point to your server log files but you have to modify grok pattern in Logstash configuration accordingly.

2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}

Create  App1.log file  in same machine where filebeat need to install and copy above logs lines in App1.log file.

Kafka Installation , Configuration and Start

Download latest version of Kafka from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://kafka.apache.org/downloads

tar -zxvf kafka_2.11-0.10.0.0

For more configuration and start options follow Setup Kafka Cluster for Single Server/Broker

After download and untar/unzip file it will have below files and directory structure.

ls- l
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr  3 05:18 bin
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May  8 11:05 config
drwxr-xr-x 74 facingissuesonit Saurabh   4096 May 27 20:00 kafka-logs
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 libs
-rw-r--r--  1 facingissuesonit Saurabh  28824 Apr  3 05:17 LICENSE
drwxr-xr-x  2 facingissuesonit Saurabh 487424 May 27 20:00 logs
-rw-r--r--  1 facingissuesonit Saurabh    336 Apr  3 05:18 NOTICE
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 site-docs

For more details about all these files,configuration option and other integration options follow Kafka Tutorial.

Make below changes in files config/zookeeper.properties and config/server.properties

config/zookeeper.properties

clientPort=2181
config/server.properties:

broker.id=0
listeners=PLAINTEXT://:9092
log.dir=/kafka-logs
zookeeper.connect=localhost:2181

Now Kafka is configured and ready to run. Use below command to start zookeeper and Kafka server as  background process.

screen -d -m bin/zookeeper-server-start.sh config/zookeeper.properties
screen -d -m bin/kafka-server-start.sh config/server.properties

To test  Kafka  install successfully you can check by running Kafka process on Linux “ps -ef|grep kafka” or steps for consumer and producer to/from topic in Setup Kafka Cluster for Single Server/Broker.

Elasticsearch Installation,Configuration and Start

Download latest version of Elasticsearch from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/elasticsearch

tar -zxvf elasticsearch-5.4.0.tar.gz

It will show below files and directory structure for Elasticsearch.

drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 25 19:20 bin
drwxr-xr-x  3 facingissuesonit Saurabh   4096 May 13 17:27 config
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr 24 15:56 data
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 lib
-rw-r--r--  1 facingissuesonit Saurabh  11358 Apr 17 10:50 LICENSE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May 28 05:00 logs
drwxr-xr-x 12 facingissuesonit Saurabh   4096 Apr 17 10:55 modules
-rw-r--r--  1 facingissuesonit Saurabh 194187 Apr 17 10:55 NOTICE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 plugins
-rw-r--r--  1 facingissuesonit Saurabh   9540 Apr 17 10:50 README.textile

Before going to start Elasticsearch need to make some basic changes in config/elasticsearch.yml file for cluster  and node name. You can configure it based on you application or organization name.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

Now we are ready with elasticsearch configuration and time start elasticsearch. We can use below command to run elasticsearch in background.

screen -d -m  /bin/elasticsearch

For  checking elasticsearch starts successfully you can use below url on browser  to know cluster status . You will get result like below.

http://localhost:9200/_cluster/health?pretty

or as below if network.host configured

http://elasticseverIp:9200/_cluster/health?pretty

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Filebeat Installation, Configuration and Start

Download latest version of filebeat from  below link and use  command to untar  and installation in Linux server. or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/beats/filebeat

tar -zxvf filebeat-<version>.tar.gz

For more configuration and start options follow Filebeat Download,Installation and Start/Run

After download and untar/unzip file it will have below files and directory structure.

ls- l
-rwxr-xr-x 1 facingissuesonit Saurabh 14908742 Jan 11 14:11 filebeat
-rw-r--r-- 1 facingissuesonit Saurabh    31964 Jan 11 14:11 filebeat.full.yml
-rw-r--r-- 1 facingissuesonit Saurabh     3040 Jan 11 14:11 filebeat.template-es2x.json
-rw-r--r-- 1 facingissuesonit Saurabh     2397 Jan 11 14:11 filebeat.template.json
-rw-r--r-- 1 facingissuesonit Saurabh     4196 Jan 11 14:11 filebeat.yml
-rw-r--r-- 1 facingissuesonit Saurabh      811 Jan 11 14:10 README.md
drwxr-xr-x 2 facingissuesonit Saurabh     4096 Jan 11 14:11 scripts

For more details about all these files,configuration option and other integration options follow Filebeat Tutorial.

Now filebeat is installaed and need to make below changes in filebeat.full.yml file

  • Inside prospectors section change paths to your log file location as
paths:
-/opt/app/facingissuesonit/App1.log
  • Comment out Elasticsearch Output default properties as below
#output.elasticsearch:
#hosts: ["localhost:9200"]
  • Configure multiline option as below so that all stacktrace line which are not starting with date  can we consider as single line.
multiline.pattern: ^\d
multiline.negate: true
multiline.match: after

For learn more on filebeat multiline configuration follow Filebeat Multiline Configuration Changes for Object, StackTrace and XML

  • Inside Kafka Output section update these properties hosts and topic. if Kafka on same machine then use localhost else update with IP of kafka machine.
output.kafka:
 hosts: ["localhost:9092"]
 topic: APP-1-TOPIC

For more on Logging configuration follow link Filebeat, Logging Configuration.

Now filebeat is configured and ready to start with  below command, it will read from configured prospector for file App1.log continiously and publish log line events to Kafka . It will also create topic as APP-1-TOPIC in Kafka if not exist.

./filebeat -e -c filebeat.full.yml -d "publish"

On console it will display output as below for sample lines.


2017/05/28 00:24:27.991828 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 194,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991907 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 375,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
  "offset": 718,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.992Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}",
  "offset": 902,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}

Now you can see from above filebeat debug statements publish event 3 is having multiline statements with stacktrace exception and each debug will have these fields like.

@timestamp:  Timestamp of data shipped.

beat.hostname : filebeat machine name from where data is shipping.

beat.version: which version of filebeat installed on server that help for compatibility check on target end.

message : Log line from logs file or multline log lines

offset: it’s represent inode value in source file

source :  it’s file name from where logs were read

Now time to check data is publish to Kafka topic or not. For this go to below directory  and you will see two files as xyz.index and xyz.log for maintaining data offset and messages.

{Kafka_home}/kafk_logs/APP-1-TOPIC
          00000000000000000000.log
          00000000000000000000.index

Now your server log lines are in Kafka topic for reading and parsing  by Logstash and send it to elasticsearch for doing analysis/search on this data.

Logstash Installation, Configuration and Start

Download latest version of Logstash from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/logstash

tar -zxvf logstash-5.4.0.tar.gz

It will show below file and directory structure.

drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 bin
-rw-r--r-- 1 facingissuesonit Saurabh 111569 Mar 22 23:49 CHANGELOG.md
drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 config
-rw-r--r-- 1 facingissuesonit Saurabh   2249 Mar 22 23:49 CONTRIBUTORS
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 12:07 data
-rw-r--r-- 1 facingissuesonit Saurabh   3945 Mar 22 23:55 Gemfile
-rw-r--r-- 1 facingissuesonit Saurabh  21544 Mar 22 23:49 Gemfile.jruby-1.9.lock
drwxr-xr-x 5 facingissuesonit Saurabh   4096 Apr 20 11:27 lib
-rw-r--r-- 1 facingissuesonit Saurabh    589 Mar 22 23:49 LICENSE
drwxr-xr-x 2 facingissuesonit Saurabh   4096 May 21 00:00 logs
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-event-java
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-plugin-api
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-queue-jruby
-rw-r--r-- 1 facingissuesonit Saurabh  28114 Mar 22 23:56 NOTICE.TXT
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 vendor

Before going to start Logstash need to create configuration file for taking input data from Kafka and parse these data in respected fields and send it elasticsearch. Create file logstash-app1.conf in logstash bin directory with below content.

/bin/logstash-app1.conf


input {
     kafka {
            bootstrap_servers => 'localhost:9092'
            topics => ["APP-1-TOPIC"]
            codec => json {}
          }
}
filter
{
//parse log line
      grok
    {
    match => {"message" => "\A%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+(?(?:[a-zA-Z0-9-]+\.)*[A-Za-z0-9$]+)\s+(-\s+)?(?=(?[A-Z]+[0-9]{4,5}))*%{DATA:message}({({[^}]+},?\s*)*})?\s*$(?(?m:.*))?" }
    }  

    #Remove unused fields
    #mutate { remove_field =>["beat","@version" ]}
}
output {
    #Output result sent to elasticsearch and dynamically create array
    elasticsearch {
        index  => "app1-logs-%{+YYYY.MM.dd}"
        hosts => ["localhost:9200"]
        sniffing => false
    }

     #Sysout logs
     stdout
       {
         codec => rubydebug
       }
}

To test your configuration file you can use below command.


./logstash -t -f logstash-app1.conf

If  we get result OK from above command run below to start reading and parsing data from Kafka topic.


./logstash -f logstash-app1.conf

For design your own grok pattern for you logs line formatting you can follow below link that will help to generate incrementally and also provide some sample logs grok.

http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/

Logstash console will show parse data as below  and you can remove unsed fields for storing in elasticsearch by uncomment mutate section from configuration file.


{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 194,
      "loglevel" => "WARN",
        "logger" => "CreateSomethingActivationKey",
          "beat" => {
        "hostname" => "zlp02870",
            "name" => "zlp02870",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "WhateverException for User 49-123-345678 "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,662"
}
{
         "msgnr" => "ERR1700",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 375,
      "loglevel" => "INFO",
        "logger" => "LMLogger",
          "beat" => {
        "hostname" => "zlp02870",
            "name" => "zlp02870",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "ERR1700 - u:null failures: 0  - Technical error "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,663"
}
{
        "offset" => 718,
        "logger" => "SomeCallLogger",
    "input_type" => "log",

       "message" => [
        [0] "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
        [1] "ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  "
    ],
          "type" => "log",
         "msgnr" => "ESS10005",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
    "stacktrace" => "\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
      "loglevel" => "ERROR",
          "beat" => {
        "hostname" => "zlp02870",
            "name" => "zlp02870",
         "version" => "5.1.2"
    },
      "@version" => "1",
     "timestamp" => "2013-02-28 09:57:56,668"
}
{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 903,
      "loglevel" => "INFO",
        "logger" => "EntryFilter",
          "beat" => {
        "hostname" => "zlp02870",
            "name" => "zlp02870",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",

       "message" => [
        [0] "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}\n",
        [1] "Fresh on request /portalservices/foobarwhatever "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 10:04:35,723"
}

To test on elasticsearch end your data sent  successfully  you can use this url
http://localhost:9200/_cat/indices  on your browser and will display created index with current date.

yellow open app1-logs-2017.05.28                             Qjs6XWiFQw2zsiVs9Ks6sw 5 1         4     0  47.3kb  47.3kb

Kibana Installation, Configuration and Start

Download latest version of Kibana from below link and use command to untar and installation in Linux server or if window just unzip downloaded file.

Download Link : https://www.elastic.co/downloads/kibana

tar -zxvf kibana-5.4.0.tar.gz

It will show below files and directory structure for kibana.

ls -l
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 14:23 bin
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 18:58 config
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 11:54 data
-rw-r--r--   1 facingissuesonit Saurabh    562 Apr 17 12:04 LICENSE.txt
drwxr-xr-x   6 facingissuesonit Saurabh   4096 Apr 17 12:04 node
drwxr-xr-x 485 facingissuesonit Saurabh  20480 Apr 17 12:04 node_modules
-rw-r--r--   1 facingissuesonit Saurabh 660429 Apr 17 12:04 NOTICE.txt
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 optimize
-rw-r--r--   1 facingissuesonit Saurabh    702 Apr 17 12:04 package.json
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 12:29 plugins
-rw-r--r--   1 facingissuesonit Saurabh   4909 Apr 17 12:04 README.txt
drwxr-xr-x  10 facingissuesonit Saurabh   4096 Apr 17 12:04 src
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 ui_framework
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 17 12:04 webpackShims

Before going to start Kibana need to make some basic changes in config/kibana.yml file make below changes after uncomment these properties file.

server.port: 5601
server.host: localhost
elasticsearch.url: "http://localhost:9200"

Now we are ready with Kibana configuration and time start Kibana. We can use below command to run Kibana in background.

screen -d -m  /bin/kibana

Kibana take time to start and we can test it by using below url in browser

http://localhost:5601/

For checking this data  in Kibana open above url in browser go to management tab on left side menu -> Index Pattern -> Click on Add New

Enter Index name or pattern and time field name as in below screen  and click on create button.

Kibana index setting
Index Pattern Settings

Now go to Discover Tab and select index as app1-log* will display data as below.

kibana discover data

Now make below changes according to  your application specification .

Filebeat :

  • update prospector path to your log directory current file
  •  Move Kafka on different machine because Kafka will single location where receive shipped data from different servers. Update localhost with same IP of kafka server in Kafka output section of filebeat.full.yml file  for hosts properties.
  • Copy same filebeat setup on all servers from where you application deployed and need to read logs.
  • Start all filebeat instances on each Server.

Elasticsearch :

  • Uncomment network.host properties from elasticsearch.yml file for accessing by  IP address.

Logstash:

  • Update localhost in logstash-app1.conf file input section with Kafka machine IP.
  • change grok pattern in filter section according to your logs format. You can take help from below url for incrementally design. http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/
  • Update localhost output section for elasticsearch with IP if moving on different machine.

Kibana:

  • update localhost in kibana.yml file for elasticsearch.url properties with IP if kibana on different machine.

Know More

To know more about Elasticsearch, Logstash, Kibana , Kafka configuration and related issues follow below links:

Elasticsearch Interview Questions and Answers

 

 

 

Integrate Filebeat with Kafka

Kafka can consume messages published by Filebeat based on configuration  filebeat.yml file for Kafka Output.

Filebeat Kafka Output Configuration

Filebeat.yml  required below fields to connect and publish message to Kafka for configured topic. Kafka will create Topics dynamically based on filebeat requirement.

output.kafka:
#The list of Kafka broker addresses from where to fetch the cluster metadata.
#The cluster metadata contain the actual Kafka brokers events are published to.
hosts: &lt;strong&gt;[&quot;localhost:9092&quot;]&lt;/strong&gt;

# The Kafka topic used for produced events. The setting can be a format string
topic: &lt;strong&gt;Topic-Name&lt;/strong&gt;

# Authentication details. Password is required if username is set.
#username: ''
#password: ''

For more information about filebeat Kafka Output  configuration option refers below Links.

Read More on Kafka

Integration

Integrate Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Sample filebeat.yml file for Prospectors ,Kafka Output and Logging Configuration

You can copy same file in filebeat.yml  and run  after making below change as per your environment directory structure and follow steps mentioned for Filebeat Download,Installation and Start/Run

  • Change on Prospectors section for your logs file directory and file name
  • Configure Multiline pattern as per your logs format as of now set as generic hopefully will work with all pattern
  • Change on Logstash Output section for Host ,Port, Topic and other settings if required
  • Change on logging directory as per you machine directory.

Sample filebeat.yml file

#=============Filebeat prospectors ===============

filebeat.prospectors:

# Here we can define multiple prospectors and shipping method and rules  as per #requirement and if need to read logs from multiple file from same patter directory #location can use regular pattern also.

#Filebeat support only two types of input_type log and stdin

##############input type logs configuration#####################

- input_type: log

# Paths of the files from where logs will read and use regular expression if need to read #from multiple files
paths:
- /opt/app/app1/logs/app1-debug*.log*
# make this fields_under_root as true if you want filebeat json out for read files in root.
fields_under_root: true

### Multiline configuration for handeling stacktrace, Object, XML etc if that is the case #and multiline is enabled with below configuration will shipped output for these case in #multiline

# The regexp Pattern that has to be matched. The example pattern matches all lines #starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize #according to your logs line format
#multiline.pattern: '^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)'

# Default is false.Defines if the pattern match  should be negated or not.
#multiline.negate: true

# multiline.match define if pattern not match with above pattern where these line need #to append.Possible values  are "after" or "before".

#multiline.match: after

# if you will set this max line after these number of multiline all will ignore
#multiline.max_lines: 50

#==========Kafka output Configuration ============================
output.kafka:
# Below enable flag is for enable or disable output module will discuss more on filebeat #moodule section
#enabled: true

# Here mentioned all your Kafka broker host and port to fetch cluster metadata which #contains published events for kafka brokers.

hosts: ["kafkahost:port"]

# We can define topic for Kafka broker where events will published.
topic: QC-LOGS

# Default no key setting. But we can use formatted key settings.
#key: ''

#Default partition strategy is 'hash' using key values set. If not set key value will #randomly distribute publish events.

#partition.hash:

# Default value  is false. If reach_only enabled event will publish only reachable kafka #brokers.
#reachable_only: false

# Configure alternative event field names used to compute the hash value.
# If empty `output.kafka.key` setting will be used.
# Default value is empty list.
#hash: []

# If authentication set on Kafka broker end below fileds are required.
#username: ''
#password: ''

#Kafka Broker version to configure so that filebeat can check compatibility with that.
#version: 0.8.2

#Meta data information is required for broker event publishing so that filbeat can take  #decision based on status of brokers.

#metadata:

#Defaults value for max 3 retries selection of available brokers.
#retry.max: 3

# Default value is 250ms. Will wait for specified time before make next retries.
#retry.backoff: 250ms

# Will update meta data information  in every 10 minutes.
#refresh_frequency: 10m

# It shows no of worker will run for each configure kafka broker.
#worker: 1

#Default value is 3. If set less than 0 filebeat will retry continuously as logs as events not #publish.
#max_retries: 3

# The Default value is 2048.It shows max number of batch events will publish to Kafka in #one request.
#bulk_max_size: 2048

#The default value is 30 second. It will timeout if not hear any response from Kafka #broker with in specified time.
#timeout: 30s
# Default is value is 10 seconds. During this max duration broker will wait for #number #of required acknowledgement.
#broker_timeout: 10s

# Default value is 256 for buffered message for Kafka broker.
#channel_buffer_size: 256

# Default value is 0 seconds  as keep alive is disabled and if this value set will keep alive #active network connection for that time.
#keep_alive: 0

# Default value for compression is gzip. We can also set other compression codec like #snappy, gzip or none.
compression: gzip

#Default value is 1000000 bytes . If Json value is more than configured max message #bytes event will dropped.
max_message_bytes: 1000000

#Default Value is 1 for ACK for reliability. Possible values can be :

#0=no response , Message can be lost on some error happens

#1=wait for local commit

#-1=wait for all replicas to commit.
#required_acks: 1

# Waiting Interval between new events and previous events for read logs.
#flush_interval: 1s

# The configurable ClientID used for logging, debugging, and auditing
# purposes. The default is "beats".

#Default value is beat. We can set values for this field that will help for analysis and #auditing purpose.
#client_id: beats

# Configure SSL setting id required for Kafk broker
#ssl.enabled: true

# Optional SSL configuration options. SSL is off by default.
# List of root certificates for HTTPS server verifications

#SSL configuration is Optional and OFF by default . It required for server verification if #HTTPS root certificate .
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

#Default value is full. SSL configuration verfication mode is required if SSL is configured .#We can use value as 'none' for testing purpose but in this mode it can accept any #certificate.
#ssl.verification_mode: full

# List of supported/valid TLS versions. By default all TLS versions 1.0 up to
# 1.2 are enabled.

#By Default  it support all TLS versions after 1.0 to 1.2. We can also mentioned version in #below array
#ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

# Define path for certificate for SSL
#ssl.certificate: "/etc/pki/client/cert.pem"

# Define path for Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"

# If data is configured and shipped encrypted form. Need to add passphrase for decrypting the Certificate Key otherwise optional
#ssl.key_passphrase: ''

# Configure encryption cipher suites to be used for SSL connections
#ssl.cipher_suites: []

# Configure encryption curve types for ECDHE based cipher suites
#ssl.curve_types: []
#====================Logging ==============================

# Default log level is info if set above or below will record top this hierarchy #automatically. Available log levels are: critical, error, warning, info, debug

logging.level: debug
# Possible values for selectors are "beat", "publish" and  "service" if you want to enable #for all select value as "*". This selector decide on command line when  start filebeat.
logging.selectors: ["*"]

# The default value is false.If make it true will send out put to syslog.
logging.to_syslog: false
# The default is true. all non-zero metrics  reading are output on shutdown.
logging.metrics.enabled: true

# Period of matrics for log reading counts from log files and it will send complete report #when shutdown filebeat
logging.metrics.period: 30s
# Set this flag as true to enable logging in files if not set that will disable.
logging.to_files: true
logging.files:
# Path of directory where logs file will write if not set default directory will home #directory.
path: /tmp

# Name of files where logs will write
name: filebeat-app.log
# Log File will rotate if reach max size and will create new file. Default value is 10MB
rotateeverybytes: 10485760 # = 10MB

# This will keep recent maximum log files in directory for rotation and remove oldest #files.
keepfiles: 7
# Will enable logging for that level only. Available log levels are: critical, error, warning, #info, debug
level: debug

Sample filebeat.yml File

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Sample filebeat.yml file for Prospectors ,Logstash Output and Logging Configuration

You can copy same file in filebeat.yml  and run  after making below change as per your environment directory structure and follow steps mentioned for Filebeat Download,Installation and Start/Run

  • Change on Prospectors section for your logs file directory and file name
  • Configure Multiline pattern as per your logs format as of now set as generic hopefully will work with all pattern
  • Change on Logstash Output section for Host ,Port and other settings if required
  • Change on logging directory as per you machine directory.

Sample filebeat.yml file

#=============Filebeat prospectors ===============

filebeat.prospectors:

# Here we can define multiple prospectors and shipping method and rules  as per #requirement and if need to read logs from multiple file from same patter directory #location can use regular pattern also.

#Filebeat support only two types of input_type log and stdin

##############input type logs configuration#####################

- input_type: log

# Paths of the files from where logs will read and use regular expression if need to read #from multiple files
paths:
- /opt/app/app1/logs/app1-debug*.log*
# make this fields_under_root as true if you want filebeat json out for read files in root.
fields_under_root: true

### Multiline configuration for handeling stacktrace, Object, XML etc if that is the case #and multiline is enabled with below configuration will shipped output for these case in #multiline

# The regexp Pattern that has to be matched. The example pattern matches all lines #starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize #according to your logs line format
#multiline.pattern: '^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)'

# Default is false.Defines if the pattern match  should be negated or not.
#multiline.negate: true

# multiline.match define if pattern not match with above pattern where these line need #to append.Possible values  are "after" or "before".

#multiline.match: after

# if you will set this max line after these number of multiline all will ignore
#multiline.max_lines: 50
#=========Logstash Output Configuration=======================
output.logstash:
# Below enable flag is for enable or disable output module will discuss more on filebeat #module section.
#enabled: true

#  Here mentioned all your logstash server host and port to publish events. Default port #for logstash is 5044 if Logstash listener start with different port then use same here.
#hosts: ["logstashserver:5044"]

# It shows no of worker will run for each configure Logstash host.
#worker: 1

#Filebeat provide gzip compression level which varies from 1 to 9. As compression level #increase processing speed will reduce but network speed increase.By default #compression level disable and value is 0.
#compression_level: 3

# Default value is false.  If set to true will check status of hosts if unresponsive will send #to another available host. if false filebeat will select random host and send events to it.
#loadbalance: true

# Default value is 0 means pipeline disabled. Configure value decide of pipeline  batches #to send to logstash asynchronously and wait for response. If pipeline value is written #means output will blocking.
#pipelining: 0

#Filebeat use SOCKS5 protocol to communicate with Logstash servers. If any proxy #configure for this protocol on server end then we can overcome by setting below #details.

# SOCKS5 proxy URL
#proxy_url: socks5://userid:pwd@socks5-server:2233

# Default value is false means resolve host name resolution on  proxy server. If value is #set as true Logstash host name resolution locally for proxy.
#proxy_use_local_resolver: false

# Configure SSL setting id required for Logstash broker if SSL is configured
#ssl.enabled: true

# Optional SSL configuration options. SSL is off by default.
# List of root certificates for HTTPS server verifications

#SSK configuration is Optional and OFF by default . It required for server verification if #HTTPS root certificate .
#ssl.certificate_authorities: ["/app/pki/root/ca.pem"]

#Default value is full. SSL configuration verfication mode is required if SSL is configured #We can use value as 'none' for testing purpose but in this mode it can accept any #certificate.
#ssl.verification_mode: full

# List of supported/valid TLS versions. By default all TLS versions 1.0 up to
# 1.2 are enabled.

#By Default  it support all TLS versions after 1.0 to 1.2. We can also mentioned version in #below array
#ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

# Define path for certificate for SSL
#ssl.certificate: "/etc/pki/client/cert.pem"

# Define path for Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"

# If data is configured and shipped encrypted form. Need to add passphrase for #decrypting the Certificate Key otherwise optional
#ssl.key_passphrase: ''

# Configure encryption cipher suites to be used for SSL connections
#ssl.cipher_suites: []

# Configure encryption curve types for ECDHE based cipher suites
#ssl.curve_types: []
#====================Logging ==============================

# Default log level is info if set above or below will record top this hierarchy #automatically. Available log levels are: critical, error, warning, info, debug

logging.level: debug
# Possible values for selectors are "beat", "publish" and  "service" if you want to enable #for all select value as "*". This selector decide on command line when  start filebeat.
logging.selectors: ["*"]

# The default value is false.If make it true will send out put to syslog.
logging.to_syslog: false
# The default is true. all non-zero metrics  reading are output on shutdown.
logging.metrics.enabled: true

# Period of matrics for log reading counts from log files and it will send complete report #when shutdown filebeat
logging.metrics.period: 30s
# Set this flag as true to enable logging in files if not set that will disable.
logging.to_files: true
logging.files:
# Path of directory where logs file will write if not set default directory will home #directory.
path: /tmp

# Name of files where logs will write
name: filebeat-app.log
# Log File will rotate if reach max size and will create new file. Default value is 10MB
rotateeverybytes: 10485760 # = 10MB

# This will keep recent maximum log files in directory for rotation and remove oldest #files.
keepfiles: 7
# Will enable logging for that level only. Available log levels are: critical, error, warning, #info, debug
level: debug

Sample filebeat.yml File

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Sample filebeat.yml file for Prospectors, Elasticsearch Output and Logging Configuration

Filebeat.yml file with Prospectors, Multiline,Elasticsearch Output and Logging Configuration

You can copy same file in filebeat.yml and run after making below change as per your environment directory structure and follow steps mentioned for Filebeat Download,Installation and Start/Run

  • Change on Prospectors section for your logs file directory and file name
  • Configure Multiline pattern as per your logs format as of now set as generic hopefully will work with all pattern
  • Change on Elasticsearch output section for Host ,Port and other setting if required
  • Change on logging directory as per you machine directory.

Sample filebeat.yml file

#=============Filebeat prospectors ===============

filebeat.prospectors:

# Here we can define multiple prospectors and shipping method and rules  as per #requirement and if need to read logs from multiple file from same patter directory #location can use regular pattern also.

#Filebeat support only two types of input_type log and stdin

##############input type logs configuration#####################

- input_type: log

# Paths of the files from where logs will read and use regular expression if need to read #from multiple files
paths:
- /opt/app/app1/logs/app1-debug*.log*
# make this fields_under_root as true if you want filebeat json out for read files in root.
fields_under_root: true

### Multiline configuration for handeling stacktrace, Object, XML etc if that is the case #and multiline is enabled with below configuration will shipped output for these case in #multiline

# The regexp Pattern that has to be matched. The example pattern matches all lines #starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize #according to your logs line format
#multiline.pattern: '^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)'

# Default is false.Defines if the pattern match  should be negated or not.
#multiline.negate: true

# multiline.match define if pattern not match with above pattern where these line need #to append.Possible values  are "after" or "before".

#multiline.match: after

# if you will set this max line after these number of multiline all will ignore
#multiline.max_lines: 50
<h4>#==========Elasticsearch Output Configuration=======================</h4>
output.elasticsearch:
# We can configure this flag the output as module.
#enabled: true

#Define elasticsearch elasticsearch HTTP client server host and port. default port for #elasticsearch is 9200
hosts: ["elasticsearver:9200"]

# Filebeat provide gzip compression level which varies from 1 to 9. As compression level #increase processing speed will reduce but network speed increase.By default #compression level disable and value is 0.
compression_level: 0

# Optional protocol by default HTTP. If requires set https and basic auth credentials for #credentials if any.
#protocol: "https"
#username: "userid"
#password: "pwd"

# we can configure number of worker for each host publishing events to elasticseach #which will do load balancing.
#worker: 1

# Optional index name. The default is "filebeat" plus date and generates filebeat-{YYYY.MM.DD} keys.
index: "app1-%{+yyyy.MM.dd}"

# Optional ingest node pipeline. By default no pipeline will be used.
#pipeline: ""

# Optional HTTP Path
#path: "/elasticsearch"

# Proxy server url
#proxy_url: http://proxy:3128

# Default value is 3. When max retry reach specified limit and evens not published all #events will drop. Filebeat also provide option to retry until all events are published by #setting value as less than 0.
#max_retries: 3

#Default values is 50. If filebeat is generating events more than configure batch max size it will split events in configure size batches and send to elasticsearch. As much as batch size will increase performance will improve but require more buffring. It can cause other issue like connection, errors, timeout for requests.
#bulk_max_size: 50

#Default value is 90 seconds. If no response http request will timeout.
#timeout: 90

# waiting time for new events for bulk requests. If bulk request max size sent before this #specified time, new bulk index request created.
#flush_interval: 1s

# We can update elasticsearch index template from filebeat which will define settings #and mappings to determine field analysis.

# Set to false to disable template loading.
#template.enabled: true

# Template name. By default the template name is filebeat.
#template.name: "app1"

# Path to template file
#template.path: "${path.config}/app1.template.json"

#Set template.overwrite as true and if need to update template file version as 2.x then set #path of Latest template file with below configuration.
#template.overwrite: false
#template.versions.2x.enabled: true
#template.versions.2x.path: "${path.config}/filebeat.template-es2x.json"

# Configure SSL setting id required for Kafk broker
#ssl.enabled: true

# Optional SSL configuration options. SSL is off by default.
# List of root certificates for HTTPS server verifications

#SSL configuration is Optional and OFF by default . It required for server verification if #HTTPS root certificate .
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

#Default value is full. SSL configuration verfication mode is required if SSL is configured .#We can use value as 'none' for testing purpose but in this mode it can accept any #certificate.
#ssl.verification_mode: full

# List of supported/valid TLS versions. By default all TLS versions 1.0 up to
# 1.2 are enabled.

#By Default  it support all TLS versions after 1.0 to 1.2. We can also mentioned version in #below array
#ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

# Define path for certificate for SSL
#ssl.certificate: "/etc/pki/client/cert.pem"

# Define path for Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"

# If data is configured and shipped encrypted form. Need to add passphrase for decrypting the Certificate Key otherwise optional
#ssl.key_passphrase: ''

# Configure encryption cipher suites to be used for SSL connections
#ssl.cipher_suites: []

# Configure encryption curve types for ECDHE based cipher suites
#ssl.curve_types: []
#====================Logging ==============================

# Default log level is info if set above or below will record top this hierarchy #automatically. Available log levels are: critical, error, warning, info, debug

logging.level: debug
# Possible values for selectors are "beat", "publish" and  "service" if you want to enable #for all select value as "*". This selector decide on command line when  start filebeat.
logging.selectors: ["*"]

# The default value is false.If make it true will send out put to syslog.
logging.to_syslog: false
# The default is true. all non-zero metrics  reading are output on shutdown.
logging.metrics.enabled: true

# Period of matrics for log reading counts from log files and it will send complete report #when shutdown filebeat
logging.metrics.period: 30s
# Set this flag as true to enable logging in files if not set that will disable.
logging.to_files: true
logging.files:
# Path of directory where logs file will write if not set default directory will home #directory.
path: /tmp

# Name of files where logs will write
name: filebeat-app.log
# Log File will rotate if reach max size and will create new file. Default value is 10MB
rotateeverybytes: 10485760 # = 10MB

# This will keep recent maximum log files in directory for rotation and remove oldest #files.
keepfiles: 7
# Will enable logging for that level only. Available log levels are: critical, error, warning, #info, debug
level: debug

Read More on Filebeat

To Know more about YAML follow link YAML Tutorials.

Sample filebeat.yml File

Integration

Integrate Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Sample filebeat.yml file for Prospectors,Multiline and Logging Configuration

You can copy same file in filebeat.yml  and run  after making below change as per your environment directory structure and follow steps mentioned for Filebeat Download,Installation and Start/Run

  • Change on Prospectors section for your logs file directory and file name
  • Configure Multiline pattern as per your logs format as of now set as generic hopefully will work with all pattern
  • Change on Kafka output section for Host ,Port and topic name as required
  • Change on logging directory as per you machine directory.

Sample filebeat.yml file

#=============Filebeat prospectors ===============

filebeat.prospectors:

# Here we can define multiple prospectors and shipping method and rules  as per
#requirement and if need to read logs from multiple file from same patter directory #location can use regular pattern also.

#Filebeat support only two types of input_type log and stdin

# #############input type logs configuration#####################

- input_type: log

# Paths of the files from where logs will read and use regular expression if need to read #from multiple files
paths:
- /opt/app/app1/logs/app1-debug*.log*
# make this fields_under_root as true if you want filebeat json out for read files in root.
fields_under_root: true

### Multiline configuration for handeling stacktrace, Object, XML etc if that is the case #and multiline is enabled with below configuration will shipped output for these case in #multiline

# The regexp Pattern that has to be matched. The example pattern matches all lines #starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize #according to your logs line format
multiline.pattern: '^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)'

# Default is false.Defines if the pattern match  should be negated or not.
multiline.negate: true

# multiline.match define if pattern not match with above pattern where these line need #to append.Possible values  are "after" or "before".

multiline.match: after

# if you will set this max line after these number of multiline all will ignore
#multiline.max_lines: 50

#==========Kafka output Configuration ============================
output.kafka:
# Below enable flag is for enable or disable output module will discuss more on filebeat #module section
#enabled: true

# Here mentioned all your Kafka broker host and port to fetch cluster metadata which #contains published events for kafka brokers.

hosts: ["kafkahost:port"]

# We can define topic for Kafka broker where events will published.
topic: QC-LOGS

# Default no key setting. But we can use formatted key settings.
#key: ''

#Default partition strategy is 'hash' using key values set. If not set key value will #randomly distribute publish events.

#partition.hash:

# Default value  is false. If reach_only enabled event will publish only reachable kafka #brokers.
#reachable_only: false

# Configure alternative event field names used to compute the hash value.
# If empty `output.kafka.key` setting will be used.
# Default value is empty list.
#hash: []

# If authentication set on Kafka broker end below fileds are required.
#username: ''
#password: ''

#Kafka Broker version to configure so that filebeat can check compatibility with that.
#version: 0.8.2

#Meta data information is required for broker event publishing so that filbeat can take  #decision based on status of brokers.

#metadata:

#Defaults value for max 3 retries selection of available brokers.
#retry.max: 3

# Default value is 250ms. Will wait for specified time before make next retries.
#retry.backoff: 250ms

# Will update meta data information  in every 10 minutes.
#refresh_frequency: 10m

# It shows no of worker will run for each configure kafka broker.
#worker: 1

#Default value is 3. If set less than 0 filebeat will retry continuously as logs as events not #publish.
#max_retries: 3

# The Default value is 2048.It shows max number of batch events will publish to Kafka in #one request.
#bulk_max_size: 2048

#The default value is 30 second. It will timeout if not hear any response from Kafka #broker with in specified time.
#timeout: 30s
# Default is value is 10 seconds. During this max duration broker will wait for #number #of required acknowledgement.
#broker_timeout: 10s

# Default value is 256 for buffered message for Kafka broker.
#channel_buffer_size: 256

# Default value is 0 seconds  as keep alive is disabled and if this value set will keep alive #active network connection for that time.
#keep_alive: 0

# Default value for compression is gzip. We can also set other compression codec like #snappy, gzip or none.
compression: gzip

#Default value is 1000000 bytes . If Json value is more than configured max message #bytes event will dropped.
max_message_bytes: 1000000

#Default Value is 1 for ACK for reliability. Possible values can be :

#0=no response , Message can be lost on some error happens

#1=wait for local commit

#-1=wait for all replicas to commit.
#required_acks: 1

# Waiting Interval between new events and previous events for read logs.
#flush_interval: 1s

# The configurable ClientID used for logging, debugging, and auditing
# purposes. The default is "beats".

#Default value is beat. We can set values for this field that will help for analysis and #auditing purpose.
#client_id: beats

# Configure SSL setting id required for Kafk broker
#ssl.enabled: true

# Optional SSL configuration options. SSL is off by default.
# List of root certificates for HTTPS server verifications

#SSK configuration is Optional and OFF by default . It required for server verification if #HTTPS root certificate .
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

#Default value is full. SSL configuration verfication mode is required if SSL is configured #We can use value as 'none' for testing purpose but in this mode it can accept any #certificate.
#ssl.verification_mode: full

# List of supported/valid TLS versions. By default all TLS versions 1.0 up to
# 1.2 are enabled.

#By Default  it support all TLS versions after 1.0 to 1.2. We can also mentioned version in #below array
#ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

# Define path for certificate for SSL
#ssl.certificate: "/etc/pki/client/cert.pem"

# Define path for Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"

# If data is configured and shipped encrypted form. Need to add passphrase for #decrypting the Certificate Key otherwise optional
#ssl.key_passphrase: ''

# Configure encryption cipher suites to be used for SSL connections
#ssl.cipher_suites: []

# Configure encryption curve types for ECDHE based cipher suites
#ssl.curve_types: []
#====================Logging ==============================

# Default log level is info if set above or below will record top this hierarchy #automatically. Available log levels are: critical, error, warning, info, debug

logging.level: debug
# Possible values for selectors are "beat", "publish" and  "service" if you want to enable #for all select value as "*". This selector decide on command line when  start filebeat.
logging.selectors: ["*"]

# The default value is false.If make it true will send out put to syslog.
logging.to_syslog: false
# The default is true. all non-zero metrics  reading are output on shutdown.
logging.metrics.enabled: true

# Period of matrics for log reading counts from log files and it will send complete report #when shutdown filebeat
logging.metrics.period: 30s
# Set this flag as true to enable logging in files if not set that will disable.
logging.to_files: true
logging.files:
# Path of directory where logs file will write if not set default directory will home #directory.
path: /tmp

# Name of files where logs will write
name: filebeat-app.log
# Log File will rotate if reach max size and will create new file. Default value is 10MB
rotateeverybytes: 10485760 # = 10MB

# This will keep recent maximum log files in directory for rotation and remove oldest #files.
keepfiles: 7
# Will enable logging for that level only. Available log levels are: critical, error, warning, #info, debug
level: debug

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Sample filebeat.yml file for Prospectors and Logging Configuration

Filebeat.yml file  with Prospectors, Kafka Output and Logging Configuration

You can  copy same file in filebeat.yml  and run after making below change as per your environment directory structure and follow steps mentioned for  Filebeat Download,Installation and Start/Run

  • Change on Prospectors section for your logs file directory and file name
  • Configure Multiline pattern as per your logs format as of now set as generic hopefully will work with all pattern
  • Change on Kafka output section for Host ,Port and topic name as required
  • Change on logging directory as per you machine directory.

Below is Sample file:

#=============Filebeat prospectors ===============

filebeat.prospectors:

#Here we can define multiple prospectors and shipping method and rules  as per
#requirement and if need to read logs from multiple file from same patter directory #location can use regular pattern also.

#Filebeat support only two types of input_type log and stdin

- input_type: log

# Paths of the files from where logs will read and use regular expression if need to read #from multiple files
paths:
- /opt/app/app1/logs/app1-debug*.log*
# make this fields_under_root as true if you want filebeat json out for read files in root.
fields_under_root: true

### Multiline configuration for handeling stacktrace, Object, XML etc if that is the case #and multiline is enabled with below configuration will shipped output for these case in #multiline

# The regexp Pattern that has to be matched. The example pattern matches all lines #starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize #according to your logs line format
#multiline.pattern: '^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)'

# Default is false.Defines if the pattern match  should be negated or not.
#multiline.negate: true

# multiline.match define if pattern not match with above pattern where these line need #to append.Possible values  are "after" or "before".

#multiline.match: after

# if you will set this max line after these number of multiline all will ignore
#multiline.max_lines: 50

#==========Kafka output Configuration ============================
output.kafka:
# Below enable flag is for enable or disable output module will discuss more on filebeat #module section
#enabled: true

# Here mentioned all your Kafka broker host and port to fetch cluster metadata which #contains published events for kafka brokers.

hosts: ["kafkahost:port"]

# We can define topic for Kafka broker where events will published.
topic: QC-LOGS

# Default no key setting. But we can use formatted key settings.
#key: ''

#Default partition strategy is 'hash' using key values set. If not set key value will #randomly distribute publish events.

#partition.hash:

# Default value  is false. If reach_only enabled event will publish only reachable kafka #brokers.
#reachable_only: false

# Configure alternative event field names used to compute the hash value.
# If empty `output.kafka.key` setting will be used.
# Default value is empty list.
#hash: []

# If authentication set on Kafka broker end below fileds are required.
#username: ''
#password: ''

#Kafka Broker version to configure so that filebeat can check compatibility with that.
#version: 0.8.2

#Meta data information is required for broker event publishing so that filbeat can take  #decision based on status of brokers.

#metadata:

#Defaults value for max 3 retries selection of available brokers.
#retry.max: 3

# Default value is 250ms. Will wait for specified time before make next retries.
#retry.backoff: 250ms

# Will update meta data information  in every 10 minutes.
#refresh_frequency: 10m

# It shows no of worker will run for each configure kafka broker.
#worker: 1

#Default value is 3. If set less than 0 filebeat will retry continuously as logs as events not #publish.
#max_retries: 3

# The Default value is 2048.It shows max number of batch events will publish to Kafka in #one request.
#bulk_max_size: 2048

#The default value is 30 second. It will timeout if not hear any response from Kafka #broker with in specified time.
#timeout: 30s
# Default is value is 10 seconds. During this max duration broker will wait for #number #of required acknowledgement.
#broker_timeout: 10s

# Default value is 256 for buffered message for Kafka broker.
#channel_buffer_size: 256

# Default value is 0 seconds  as keep alive is disabled and if this value set will keep alive #active network connection for that time.
#keep_alive: 0

# Default value for compression is gzip. We can also set other compression codec like #snappy, gzip or none.
compression: gzip

#Default value is 1000000 bytes . If Json value is more than configured max message #bytes event will dropped.
max_message_bytes: 1000000

#Default Value is 1 for ACK for reliability. Possible values can be :

#0=no response , Message can be lost on some error happens

#1=wait for local commit

#-1=wait for all replicas to commit.
#required_acks: 1

# Waiting Interval between new events and previous events for read logs.
#flush_interval: 1s

# The configurable ClientID used for logging, debugging, and auditing
# purposes. The default is "beats".

#Default value is beat. We can set values for this field that will help for analysis and #auditing purpose.
#client_id: beats

# Configure SSL setting id required for Kafk broker
#ssl.enabled: true

# Optional SSL configuration options. SSL is off by default.
# List of root certificates for HTTPS server verifications

#SSK configuration is Optional and OFF by default . It required for server verification if #HTTPS root certificate .
#ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

#Default value is full. SSL configuration verfication mode is required if SSL is configured #.We can use value as 'none' for testing purpose but in this mode it can accept any #certificate.
#ssl.verification_mode: full

# List of supported/valid TLS versions. By default all TLS versions 1.0 up to
# 1.2 are enabled.

#By Default  it support all TLS versions after 1.0 to 1.2. We can also mentioned version in #below array
#ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

# Define path for certificate for SSL
#ssl.certificate: "/etc/pki/client/cert.pem"

# Define path for Client Certificate Key
#ssl.key: "/etc/pki/client/cert.key"

# If data is configured and shipped encrypted form. Need to add passphrase for #decrypting the Certificate Key otherwise optional
#ssl.key_passphrase: ''

# Configure encryption cipher suites to be used for SSL connections
#ssl.cipher_suites: []

# Configure encryption curve types for ECDHE based cipher suites
#ssl.curve_types: []
#====================Logging ==============================

# Default log level is info if set above or below will record top this hierarchy #automatically. Available log levels are: critical, error, warning, info, debug

logging.level: debug
# Possible values for selectors are "beat", "publish" and  "service" if you want to enable #for all select value as "*". This selector decide on command line when  start filebeat.
logging.selectors: ["*"]

# The default value is false.If make it true will send out put to syslog.
logging.to_syslog: false
# The default is true. all non-zero metrics  reading are output on shutdown.
logging.metrics.enabled: true

# Period of matrics for log reading counts from log files and it will send complete report #when shutdown filebeat
logging.metrics.period: 30s
# Set this flag as true to enable logging in files if not set that will disable.
logging.to_files: true
logging.files:
# Path of directory where logs file will write if not set default directory will home #directory.
path: /tmp

# Name of files where logs will write
name: filebeat-app.log
# Log File will rotate if reach max size and will create new file. Default value is 10MB
rotateeverybytes: 10485760 # = 10MB

# This will keep recent maximum log files in directory for rotation and remove oldest #files.
keepfiles: 7
# Will enable logging for that level only. Available log levels are: critical, error, warning, #info, debug
level: debug

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat, Commandline Arguments Configuration

Environment and Commandline arguments setting to filebeat.yml

We can use environment variables and arguments from command line  references in the filebeat.yml file to set values or fields that need to be configurable during deployment. To configure fields and values use like this:

${VAR1}

Where VAR1 is the name of the environment variable or passing values from command line.

These configured values will replaced on start up by the value of environment variable or command line arguments. These configured fields are case sensitive if value not found will fail on parsing so better configure values with default value as given below.

${VAR1:default_value}

We pass command line arguments to set configured values with option -E with argument name.

We have configured in below file variable fields for server, kafkaHost and topic in below sample configuration file that way we can set other fields also as per requirement.

Use below command to run this sample configuration file:

./filebeat -c filebeat.yml -d publish -E server=server1 -E kafkaHost=IP:PORT -E topicName=QC-TEST

For example above commandline arguments will pass to filebeat.yml file for below settings.

Prospectors Settings :

filebeat.prospectors:
- input_type: log
paths:
- /opt/app/${server}/logs/app1-debug*.log*
fields_under_root: true

Kafka Output Settings:

output.kafka:
#enabled: true
hosts: ["${kafkaHost}"]
topic: ${topicName}

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat, Logging Configuration

Logging is important with any application/tool/software same way filebeat provide option for logging and it’s configuration.

Filebeat provide three ways of configuration for log output: syslog, file and stderr

Default Configuration :

Windows :  file output

Linux or others: syslog

Below are example of configuration for logging in file and syslog and how to run. You can also get Sample file for Logging Configuration at end of this blog.

Logging Configuration for output to file:


logging.level: debug
logging.selectors: ["*"]
logging.metrics.enabled: true
logging.to_files: true
logging.files:
path: /tmp
name: mybeat-app1.log
rotateeverybytes: 10485760
keepfiles: 8
level: debug

To start/run filebeat from command line use below command which will send output to logging files . For other option to run filebeat follow link Ways to run filebeat

./filebeat  -c filebeat.yml -d “publish”

Logging Configuration for output to syslog:

logging.level: debug
logging.selectors: ["*"]
logging.metrics.enabled: true
logging.to_syslog: true

To start/run filebeat from command line use below command which will send output to syslog .For other option to run filebeat follow link Ways to run filebeat

./filebeat -e -c filebeat.yml -d “publish”

Filebeat Logging Configuration in Detail :

  • level :  Default log level is “info” . It can accept different log level like debug,info,warning,error or critical. If debug level is selected the no selectors configured by default consider as *.
  • selectors[]: Filebeat provide different selectors “beat”,”publish”,”service” which will tell filebeat where these debugging need to apply. If need to configure all sectors can use [“*”].  We can also override selectors by using -d  command line option for setting debug level.
  • to_syslog : By default to_syslog  is false which will send all output to sys_log.We can remove this flag from configuration file if need to configure for logging as file and set to_files as true and set other options for file.

Filebeat Performance analysis: We can enable metrics options if need analysis filebeat and what was change with previous interval.  It will record internally and will logged when filebeat will shotdown.

  • matrics.enabled : default value is true.
  • matrics.period: default value is 30s (Seconds) . Period of internal metrics log.

Filebeat Logging to`file: Will log all output to file. If not configure other fields for to files will take default values for below options.

  • to_files : default value is true.
  • path : set path of your directory where want to log files
  • Name:  Default name generate as mybeat.log . if you want according to your application change name accordingly.
  • Rotateeverybytes : Default Maximum size of log file is 10MB and if it will reach to max limit will generate new log file with rotation.
  • keepfiles: Default value is 7 and accept value on range 2 to 1024 only.It will always keep  latest files in directory and delete older one while rotation.

Sample File for Logging Configuration

Sample filebeat.yml file for Logging Configuration

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat, Kafka Output Configuration

If need  to shipped server logs lines  directly to Kafka. Follow below steps:

Pre-Requisite :

  • Start Kafka  before start filebeat to listen publish events and configure filebeat with same kafka server port

Kafka Output  Required Configuration :

  • Comment out output.elasticsearch output section and uncomment output.kafka section
  • Set enabled value is true to make kafka output as enabled
  • Set host  of server where Kafka is running for listening  by default port for Kafka is 9092 if any change use same port value.
output.kafka:
 enabled:true
 #configure topic as per your application need
 hosts:["kafkaserver:9092"]
 topic:QC-TEST

Kafka Credentials Settings: Set below credentials if any for Kafka broker.

 username:"userid"
 password:"password"

Other Optional Configurations:

Kafka Output Compression Configuration:

Default value for compression is gzip. We can also set other compression codec like snappy, gzip or none.

compression:gzip

Logstash Output Performance Configuration:

worker:  we can configure number of worker for each host publishing events to elasticsearch which will do load balancing.

Kafka Broker Topic Partition Configuration:

key: Default no key setting. But we can use formatted key settings.

partition.hash: Default partition strategy is ‘hash’ using key values set. If not set key value will randomly distribute publish events.

reachable_only: Default value  is false. If reach_only enabled event will publish only reachable kafka brokers.

hash: [] Default value is empty list. Configure alternative event field names used to compute the hash value. If empty output.kafka.key setting will be used.

version: Kafka Broker version to configure so that filebeat can check compatibility with that.

Meta Data Configuration: Meta data information is required for broker event publishing so that filebeat can take  #decision based on status of brokers.

metadata:

retry.max: Defaults value for max 3 retries selection of available brokers.

retry.backoff: Default value is 250ms. Will wait for specified time before make next retries.
refresh_frequency: Will update meta data information  in every 10 minutes.

max_retries: Default value is 3. If set less than 0 filebeat will retry continuously as logs as events not publish.

bulk_max_size: The Default value is 2048.It shows max number of batch events will publish to Kafka in one request.

Kafka Reliability Setting:

#Default Value is 1 for ACK for reliability. Possible values can be :

#0=no response , Message can be lost on some error happens

#1=wait for local commit

#-1=wait for all replicas to commit.
required_acks: 1
timeout: The default value is 30 second. It will timeout if not hear any response from Kafka broker with in specified time.
broker_timeout: Default is value is 10 seconds. During this max duration broker will wait for number #of required acknowledgement.
channel_buffer_size: Default value is 256 for buffered message for Kafka broker.
keep_alive: Default value is 0 seconds  as keep alive is disabled and if this value set will keep alive active network connection for that time.
max_message_bytes: Default value is 1000000 bytes . If Json value is more than configured max message bytes event will dropped.

flush_interval: Waiting Interval between new events and previous events for read logs.

client_id: Default value is beat. We can set values for this field that will help for analysis and auditing purpose.

Sample configuration file

Sample filebeat.yml file for Kafka Output Configuration

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To Know more about YAML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat, Logstash Output Configuration

If need  to shipped server logs lines  directly to Logstash. Follow below steps:

Pre-Requisite :

  • Create Logstash Configuration file  with input section mentioned same port as configured in filebeat for logstash listener. Default port for logstash is 5044.
  • Start Logstash with same configuration file.

Logstash Output  Required Configuration :

  • Comment out output.elasticsearch output section and uncomment output.logstash section
  • Set enabled value is true to make logstash output as enabled
  • Set host  of server where Logstash is running for listening  by default port for Logstash is 5044 if any change use same port value.
output.logstash:
 enabled:true
#use localhost if on same machine and same port                                                                    useby  logstash listener
 hosts:["logstashserver:5044"]

Other Optional Configurations:

Logstash Output Compression Configuration:

Filebeat provide gzip compression level which varies from 1 to 9. As compression level increase processing speed will reduce but network speed increase.By default compression level disable and value is 0.

compress_level:0

Logstash Output Performance Configuration:

worker:  we can configure number of worker for each host publishing events to elasticseach which will do load balancing.

loadbalance: Default value is false.  If set to true will check status of hosts if unresponsive will send to another available host. if false filebeat will select random host and send events to it.

pipelining: Default value is 0 means pipeline disabled. Configure value decide of pipeline  batches to send to logstash asynchronously and wait for response. If pipeline value is written means output will blocking.

Logstash Output Proxy Configuration: Filebeat use SOCKS5 protocol to communicate with logstash servers. If any proxy configure for this protocol on server end then we can overcome by setting below details.

proxy_url:socks5://userid:pwd@socks5-server:2233

proxy_use_local_resolver: Default value is false means resolve host name resolution on  proxy server. If value is set as true Logstash host name resolution locally for proxy.

Sample configuration file

Sample filebeat.yml file for Logstash Output

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To know more about YML follow link YAML Tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat,Elasticsearch Output Configuration

If we need  to shipped server logs lines  directly to elasticseach  over HTTP by filebeat . We have set below fields for elasticsearch output according to your elasticsearch server configuration and follow below steps.

  1.  Uncomment output.elasticsearch in filebeat.yml file Elasticsearch
  2.  Set host and port in hosts line
  3.  Set index name as you want. If it’s not set filebeat will create default index as “filebeat-%{+yyyy.MM.dd}” .
output.elasticsearch :

   enabled:true
   hosts:["localhost:9200"]
   index:app1-logs-%{+yyyy.MM.dd}"

Elasticsearch server credentials configuration if any 

  1.  Set user name and password
  2.  Set protocol if https because default protocol is http
    username:userid
    password:pwd

Elasticsearch Index Template Configuration: We can update elasticsearch index template from filebeat which will define settings and mappings to determine field analysis.

Auto Index Template Loading: Filebeat package will load default template filebeat.template.json to elasticsearch if no any template configuration for template and will not overwrite template.

Customize Index Template Loading: We can upload our user define template and update version also by using below configuration.

#(if set as false template need to upload manually)
template.enabled:true
#default value is filebeat
template.name:"app1"
#default value is filebeat.template.json.
template.path:"app1.template.json"
#default value is false
template.overwrite:false 

By default, template.overwrite value is false and will not overwrite index template if already exist on elasticsearch.  For overwriting index template make this flag as true in filebeat.yml configuraton file.

Latest Template Version Loading from Filebeat: Set template.overwrite as true and if need to update template file version as 2.x then set path of Latest template file with below configuration.


template.overwrite:true
template.versions.2x.enabled: true
template.versions.2x.path:"${path.config}/app1.template-es2x.json"

Manually Index Template Loading : for manually index loading please refer Elasticsearch Index Template Management.

Compress Elasticsearch Output :  Filebeat provide gzip compression level which varies from 1 to 9. As compression level increase processing speed will reduce but network speed increase.By default compression level disable and value is 0.


compress_level: 0

Other configuration Options:

bulk_max_size : Default values is 50. If filebeat is generating events more than configure batch max size it will split events in configure size batches and send to elasticsearch. As much as batch size will increase performance will improve but require more buffring. It can cause other issue like connection, errors, timeout for requests.

Never set value of bulk size as 0 because there would not be any buffering for events and filebeat will send each event directly to elasticsearch.

timeout: Default value is 90 seconds. If no response http request will timeout.

flush_interval: waiting time for new events for bulk requests. If bulk request max size sent before this specified time, new bulk index request created.

max_retries: Default value is 3. When max retry reach specified limit and evens not published all events will drop. Filebeat also provide option to retry until all events are published by setting value as less than 0.

worker:  we can configure number of worker for each host publishing events to elasticseach which will do load balancing.

 Sample Filebeat Configuration file:

Sample filebeat.yml file to Integrate Filebeat with Elasticsearch

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To know more about YAML follow link YAML tutorials.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat Multiline Configuration Changes for Object, StackTrace and XML

Multiline configuration is required if need to handle multilines on filebeat server end. That will help for logs type like stackTrace for exception, print objects, XML, JSON etc. where standard log4j format does’t work  so this type of lines can be combined with previous line where log4j format was applied.

Below are filebeat configuration for multiline.

multiline.pattern: The regexp Pattern that has to be matched. The example pattern matches all lines starting with [DEBUG,ALERT,TRACE,WARNING log level that can be customize according to your logs line format. But that is generic one that will help most of cases.
multiline.pattern: ‘^[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)’

Default is false for negate. Defines if the pattern match should be negated or not.
multiline.negate: true

multiline.match define if pattern not match with above pattern where these line need to append. Possible values are “after” or “before”.

multiline.match: after

If you will set this max line after these number of multiline all will ignore
multiline.max_lines: 50

For Example :

multiline.pattern: ‘^\[([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)’

multiline.negate: true
multiline.match: after
#multiline.max_lines: 50

Sample Configuration file for multiline configuration.

Sample filebeat.yml file for Prospectors,Multiline and Logging Configuration

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To know more about YAML follow link as YAML Tutorial.

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat Prospectors Configuration Changes for Read Log files

Filebeat Prospectors Configuration

Filebeat can read logs from multiple files parallel and apply different condition, pass additional fields for different files, multiline and include_line, exclude_lines etc. based on different log files.

Filebeat allows multiline prospectors on same filebeat.yml file.

How to decide number of prospectors in configuration file?

We can decide number of prospectors after categorizing same type logs file based on their format format and operation need to perform based on business need. I have find out some steps to divide in prospectors.

  • Read only: First decide what are files/files from need to read by filebeat. If that’s need to read and shipped output to some other system then only one prospectors is enough. If require any below case then require more prospectors for each category.
  • Multiline : If require multiline handling on filebeat end then divide selected files from above step to different category based on same file log format and where same multiline pattern  can  apply. Go to link for more information about Filebeat Configuration Changes for Multiline Logs Handling
  • Fields Handling: If need to pass some additional fields over shipping data from filebeat to Output System. If field detail are same for prospectors then no more prospectors required if different then again sub categories according to required field detail and define more prospectors.

How to define Prospectors?

Filebeat allow two type of prospector’s input_type log and stdin. Prospector setting start from filebeat.prospectors and each prospector implement with input_type. Here in below example will consider as input type of log.

Multiline Prospectors Example:

filebeat.prospectors:


#Prospectors 1 : Only reading logs line
input_type: log
paths:
- /var/app1/backend/debug-log.log
- /var/app1/frontend/debug-log.log
- /var/app1/backend/server.log
- /var/app1/frontend/server.log

#Prospector 2 : reading and sending some additional field
input_type: log
paths:
-/var/app2/log/*-debug.log
fields:
apache: true

#Prospectors 3 : reading, multiline and sending some additional fields
input_type: log
paths:
-/var/app2/log/*-debug.log

multiline.pattern: '^\['
multiline.negate: true
multiline.match: after
#multiline.max_lines: 50

fields:
tz: EST

fields_under_root: true

Above example having three prospectors as given below

Prospector 1: reading logs files and shipped to output system.
Prospector 2: reading logs files and also sending additional fields like apache.
Prospectors 3: reading logs, multiline and also sending additional field for timezone.

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To know more about YAML follow link as YAML Tutorial

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat Download,Installation and Start/Run

Filebeat Latest Version : 6.2.4

Filebeat Download Link :  https://www.elastic.co/downloads/beats/filebeat

 Download filebeat from above link according to your Operating System and copy to directory where you want to install.

 Installation on Linux :  Go to directory where tar file was copied and use below    command to install it.


tar –zxvf filebeat--linux-x86.tar.gz

Installation on Windows: Go to directory where zip file was copied and unzip  file.


unzip file filebeat--window-xxx.zip file

Before Test and Run filebeat installation need to make below configuration changes in filbeat.yml file for prospectors,Output ,logging etc. Prospectors changes are required rest of changes optional and decide based on application requirements.

Required Change:

 Optional Change : Based on your Application Requirement

 Run/Start Filebeat On Linux:


./filebeat -e -c filebeat.yml -d "publish"

For running filebeat in background add “screen –d –m” as given below:


screen -d -m ./filebeat -e -c filebeat.yml -d "publish"

For Logging filebeat output to log file remove –e option from command as given below and follow link Filebeat Configuration Changes for Logging  for more info.


./filebeat  -c filebeat.yml -d "publish"

screen -d -m ./filebeat  -c filebeat.yml -d "publish"

Filebeat 5 added new features of passing command line arguments while start filebeat. This is really helpful because no change required in filebeat.yml configuration file specifics to servers and and pass server specific information over command line. If in future your servers scaling and changes in output port and machine IP for elasticsearch or kafka or logstash. Then configuration team need to update only command line arguments for specific information and no change in configuration file.

Run/Start Filebeat with command line Arguments in Forground:


./filebeat -c filebeat.yml -d publish -E server= -E file=app1.log  -E tz=CDT -E                     kafkaHost=IP:PORT

Run In Background :


screen –d –m ./filebeat -c filebeat.yml -d publish -E server= -E  file=app1.log  -E tz=CDT -E kafkaHost=IP:PORT

Here -E option represents argument values are passing from command line will set in respective position in filebeat.yml configuration file. Follow link Filebeat Commandline Arguments setting in configuration file .

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues. To know more about YAML/YML follow YAML Tutorials

Leave you feedback to enhance more on this topic so that make it more helpful for others.

Filebeat Overview

Filebeat is a light weight agent on server for log data shipping, which can monitors log files, log directories changes and forward log lines to different target Systems like Logstash, Kafka ,elasticsearch or files etc.

Filebeat work like tail command in Unix/Linux.

Latest Filebeat Version :   6.2.4

Why Filebeat ?

  • Lightweight agent for shipping logs.
  • Forward and centralize files and logs.
  • Robust (Not miss a single beat)

How Filebeat Work?

Filebeat starts prospectors to locate corresponding to each log file path mentioned in filebeat configuration file. Filebeat start a periodic harvester, which identify changes on file based on inode value, do tail to read change logs and send it to spooler to aggregate it. Processors (If configure) will perform different operation based on condition in spooler. Spooler send this aggregated data to target Systems like Logstash, Kafka, Elasticsearch or files etc.

filebeatflow.png
Filebeat Architeture

Filebeat Download Link: Filebeat Download

Integration

Complete Integration Example Filebeat, Kafka, Logstash, Elasticsearch and Kibana

Read More

To read more on Filebeat topics, sample configuration files and integration with other systems with example follow link Filebeat Tutorial  and  Filebeat Issues.

Leave you feedback to enhance more on this topic so that make it more helpful for others.