Factorial of a Number Java Program


The factorial of a number n, denoted by n!, is the product of all positive integers less than or equal to n.

Here n is always a non-negative integer, and the value of 0! is 1, according to the convention for an empty product.

Example :

0! = 1
1! = 1
5! = 5 * 4 * 3 * 2 * 1 = 120
12! = 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 479001600

Below are examples to calculate the factorial of a number. They cover all the common cases: calculation with a loop, calculation of big factorials that overflow int, and calculation through recursion.

Factorial of a Number using a Java loop

import java.util.Scanner;

class Factorial {
	public static void main(String args[]) {
		int n, c, fact = 1;

		System.out.println("Enter an integer to calculate it's factorial");
		Scanner in = new Scanner(System.in);

		n = in.nextInt();

		if (n < 0)
			System.out.println("Number should be non-negative.");
		else {
			for (c = 1; c <= n; c++)
				fact = fact * c;

			System.out.println("Factorial of " + n + " is = " + fact);
		}
	}
}

Factorial of a Big Number using a loop and BigInteger

If the factorial value is too big and crosses the integer limit, use BigInteger instead of int.

import java.math.BigInteger;
import java.util.Scanner;

class Factorial {
	public static void main(String args[]) {
		int n, c;
		BigInteger inc = new BigInteger("1");
		BigInteger fact = new BigInteger("1");

		Scanner input = new Scanner(System.in);

		System.out.println("Input an integer");
		n = input.nextInt();

		for (c = 1; c <= n; c++) {
			fact = fact.multiply(inc);
			inc = inc.add(BigInteger.ONE);
		}

		System.out.println(n + "! = " + fact);
	}
}

Factorial of a Number using Java Recursion

import java.util.Scanner;

public class FactorialByRecursion {

	public static void main(String[] args) {
		int n;
		System.out.println("Enter an integer to calculate it's factorial");
		Scanner in = new Scanner(System.in);

		n = in.nextInt();

		if (n < 0)
			System.out.println("Number should be non-negative.");
		else {
			System.out.println("Factorial of " + n + " is = " + factorial(n));
		}

	}

	private static int factorial(int num) {
		// Recursion terminating condition: 0! is 1
		if (num == 0)
			return 1;
		else
			return num * factorial(num - 1);
	}

}
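Note that with an int return type the result overflows for n > 12. As a minimal sketch (not part of the original post), the recursive version can be combined with BigInteger to handle large values:

import java.math.BigInteger;
import java.util.Scanner;

public class BigFactorialByRecursion {

	public static void main(String[] args) {
		Scanner in = new Scanner(System.in);
		System.out.println("Enter an integer to calculate its factorial");
		int n = in.nextInt();

		if (n < 0)
			System.out.println("Number should be non-negative.");
		else
			System.out.println("Factorial of " + n + " is = " + factorial(n));
	}

	// Recursion terminating condition: 0! is 1
	private static BigInteger factorial(int num) {
		if (num == 0)
			return BigInteger.ONE;
		return BigInteger.valueOf(num).multiply(factorial(num - 1));
	}
}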

Armstrong Number Java Program


An Armstrong number is a number that is equal to the sum of its digits, each raised to the power of the total number of digits in the number.

Example :

3³ + 7³ + 1³ = 371

1⁴ + 6⁴ + 3⁴ + 4⁴ = 1634

Below is a Java program to check for an Armstrong number.

import java.util.Scanner;

class ArmstrongNumber {
	public static void main(String args[]) {
		int n, sum = 0, temp, remainder, digits = 0;

		Scanner in = new Scanner(System.in);
		System.out.println("Input a number to check if it is an Armstrong number");
		n = in.nextInt();

		temp = n;

		// Count number of digits
		while (temp != 0) {
			digits++;
			temp = temp / 10;
		}

		temp = n;

		// Sum each digit raised to the power of the digit count
		while (temp != 0) {
			remainder = temp % 10;
			sum = sum + power(remainder, digits);
			temp = temp / 10;
		}

		if (n == sum)
			System.out.println(n + " is an Armstrong number.");
		else
			System.out.println(n + " is not an Armstrong number.");
	}

	// Integer exponentiation: n raised to the power r
	static int power(int n, int r) {
		int c, p = 1;

		for (c = 1; c <= r; c++)
			p = p * n;

		return p;
	}
}

Bubble Sort/ Sinking Sort Java Program


Bubble Sort, also called Sinking Sort, is a comparison sort algorithm in which smaller or larger elements "bubble" to the top of the list through repeated swaps of adjacent elements.

Complexity

The complexity of bubble sort is O(n²) in both the average and worst cases, and O(n) in the best case, when the elements are already sorted (this requires the early-exit check shown after the program below). Here n is the number of elements.

Drawback

It is not efficient for reverse-ordered collections or when the number of elements is large.

import java.util.Scanner;

public class BubbleSort {
	public static void main(String []args) {
	    int n, c, d, swap;
	    //For User Input
	    Scanner in = new Scanner(System.in);

	    System.out.println("Input number of integers to sort");
	    n = in.nextInt();

	    int array[] = new int[n];

	    System.out.println("Enter " + n + " integers");

	    for (c = 0; c < n; c++)
	      array[c] = in.nextInt();

	    for (c = 0; c < (n - 1); c++) {
	      for (d = 0; d < n - c - 1; d++) {
	        /* In case of descending order use < */
	        if (array[d] > array[d+1]) {
	          swap       = array[d];
	          array[d]   = array[d+1];
	          array[d+1] = swap;
	        }
	      }
	    }
	    System.out.println("Sorted list of numbers");

	    for (c = 0; c < n; c++)
	      System.out.println(array[c]);
	  }
	}
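The version above always performs the full set of passes. As a minimal sketch (not from the original post), the O(n) best case mentioned earlier can be achieved by tracking whether a pass performed any swap and exiting early when it did not:

public class BubbleSortOptimized {
	// Bubble sort with an early-exit flag: stops as soon as a full pass
	// makes no swap, giving O(n) on already-sorted input.
	public static void bubbleSort(int[] array) {
		for (int c = 0; c < array.length - 1; c++) {
			boolean swapped = false;
			for (int d = 0; d < array.length - c - 1; d++) {
				if (array[d] > array[d + 1]) {
					int swap = array[d];
					array[d] = array[d + 1];
					array[d + 1] = swap;
					swapped = true;
				}
			}
			if (!swapped)
				break; // no swaps in this pass, the array is already sorted
		}
	}

	public static void main(String[] args) {
		int[] data = { 5, 1, 4, 2, 8 };
		bubbleSort(data);
		for (int value : data)
			System.out.println(value);
	}
}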


Exception javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher


This exception generally occurs while decrypting encrypted characters that were used for URL parameter encryption: AES ciphertext is always a multiple of 16 bytes, so if the Base64-encoded string is passed to Cipher.doFinal() without being decoded first, its length is no longer a multiple of the block size and decryption fails.

Exception in thread "main" javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:936)
	at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:847)
	at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
	at javax.crypto.Cipher.doFinal(Cipher.java:2165)
	at security.EncryptionDecryptionURLParam.main(EncryptionDecryptionURLParam.java:51)

Solution :

Decode the Base64 text back to the raw ciphertext bytes before calling doFinal(), as in the statements below. Follow the example below for more detail.

Don't use:

 byte[] decryptedPassword = cipher.doFinal(decodeStr.getBytes());

Use:

byte[] base64decodedTokenArr = Base64.decodeBase64(decodeStr.getBytes());
byte[] decryptedPassword = cipher.doFinal(base64decodedTokenArr);

Example: see Java Encryption and Decryption of URL Parameters below for the full flow.

Issues Solution

For solutions to more Java/JDBC issues, follow the link JAVA/JDBC Issues.

 


Exception java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/ECB/PKCS7Padding


The default Java 8 security providers don't support the transformation “AES/ECB/PKCS7Padding”.

Exception :

Exception in thread "main" java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/ECB/PKCS7Padding
	at javax.crypto.Cipher.getInstance(Cipher.java:540)
	at security.EncryptionDecryptionURLParam.main(EncryptionDecryptionURLParam.java:31)

Solution :

The default Java 8 providers don't support “AES/ECB/PKCS7Padding”; use “AES/ECB/PKCS5Padding” instead, as in the given example for encryption and decryption (for AES's 16-byte blocks the two padding schemes behave identically).

Examples:
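As a minimal sketch of the fix (the all-zero key is for illustration only):

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class Pkcs5PaddingExample {
	public static void main(String[] args) throws Exception {
		byte[] keyBytes = new byte[16]; // 128-bit AES key, all zeros for illustration
		SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

		// "AES/ECB/PKCS7Padding" would throw NoSuchAlgorithmException here;
		// PKCS5Padding behaves identically for AES's 16-byte blocks.
		Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
		cipher.init(Cipher.ENCRYPT_MODE, key);

		byte[] cipherText = cipher.doFinal("facingissuesonit.com".getBytes());
		System.out.println("encrypted length: " + cipherText.length);
	}
}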

 

Issues Solution

For solutions to more Java/JDBC issues, follow the link JAVA/JDBC Issues.


How to do Encryption and Decryption of plain text/passwords in Java


Java code for encryption and decryption of plain text. The code below encrypts plain text with a key using the algorithm “AES/ECB/PKCS5Padding” and then decrypts it back to plain text.

Pre-Requisite :

  • Java 7 or 8
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
public class EncryptionDecryption {

	public static void main(String[] args) throws Exception {
	    byte[] input = "facingissuesonit.com".getBytes();
	    byte[] keyBytes = new byte[] { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09,
	        0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };

	    SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

	    Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");

	    System.out.println(new String(input));

	    // Encryption pass
	    cipher.init(Cipher.ENCRYPT_MODE, key);

	    byte[] cipherText = new byte[cipher.getOutputSize(input.length)];
	    int ctLength = cipher.update(input, 0, input.length, cipherText, 0);
	    ctLength += cipher.doFinal(cipherText, ctLength);
	    System.out.println(new String(cipherText));
	    System.out.println(ctLength);

	    // Decryption pass
	    cipher.init(Cipher.DECRYPT_MODE, key);
	    byte[] plainText = new byte[cipher.getOutputSize(ctLength)];
	    int ptLength = cipher.update(cipherText, 0, ctLength, plainText, 0);
	    ptLength += cipher.doFinal(plainText, ptLength);
	    System.out.println(new String(plainText));
	    System.out.println(ptLength);
	  }

}

More Sample Code

For more Java and JDBC code samples, follow the links below.

 


Java Encryption and Decryption of URL Parameters


Java code for encryption and decryption of URL parameters. The code below encrypts a parameter passed as a token, consisting of fixed text + timestamp + session ID, with a key using the algorithm “AES/ECB/PKCS5Padding”, and then makes the result URL-safe.

Pre-requisite :

  • Java 8
  • commons-codec-1.8.jar
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.Date;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

import org.apache.commons.codec.binary.Base64;
public class EncryptionDecryptionURLParam {
	public static final String FORMAT = "yyyy-MM-dd'T'HH:mm:ssZ";
	public static void main(String[] args) throws Exception {
		 SimpleDateFormat sdf = new SimpleDateFormat(FORMAT);
	     String timestamp = sdf.format(new Date());

	     String constantValue="FacingIssuesOnIT";
	     String sessionId="ABCDEFGHIJKLMNOPQRSTUVWXYZ";

		 String tokenStr = constantValue+"$"+timestamp+"/06$"+sessionId;

		 System.out.println(tokenStr);

	    byte[] keyBytes = new byte[] { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09,
	        0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };

	    Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");

	    SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

	    // encryption url
	    cipher.init(Cipher.ENCRYPT_MODE, key);

	    byte[] cipherText = cipher.doFinal(tokenStr.getBytes());
        System.out.println("encrypted token size:" + cipherText.length);
        //Encode Character which are not allowed on URL
        String encodedTxt = Base64.encodeBase64URLSafeString(cipherText);

        System.out.println("EncodedEncryptedToken : " + encodedTxt);

	    //decryption url
        cipher.init(Cipher.DECRYPT_MODE, key);
        String decodeStr = URLDecoder.decode(
        		encodedTxt,
        StandardCharsets.UTF_8.toString());
        System.out.println("URL Decoder String :"+decodeStr);
        //Decode URl safe to base 64
        byte[] base64decodedTokenArr = Base64.decodeBase64(decodeStr.getBytes());

       byte[] decryptedPassword = cipher.doFinal(base64decodedTokenArr);
        //byte[] decryptedPassword = cipher.doFinal(decodeStr.getBytes());
       String  decodeTxt=new String(decryptedPassword);
       System.out.println("Token after decryption: " + decodeTxt);

	  }

}
 

More Sample Code

For more Java and JDBC code samples, follow the links below.

 


Elasticsearch Interview Questions and Answers


These top 50 frequently asked Elasticsearch interview questions are collected based on my interview experience with ELK (Elasticsearch, Logstash and Kibana) at different organizations. I have divided these questions into three categories, as below.

  • Elasticsearch Overview Questions and Answers.
  • Basic Concepts and Terminology Questions and Answers.
  • Advanced and Practical Questions and Answers.

Elasticsearch Overview Questions and Answers

1. What is Elasticsearch?

“Elasticsearch is an open-source, cross-platform, scalable, full-text search and analytics engine based on Apache Lucene. It helps with NRT (Near Real Time) analysis and full-text search on big volumes of data in a distributed, clustered environment.”

  • Elasticsearch is developed by Elastic in the Java language.
  • Elasticsearch stores records in the form of JSON documents as keys and values.
  • It is schema-free by default; if required, a schema can be added through mappings from the client application.
  • It is accessed by HTTP over the browser, or by applications through the Elasticsearch REST client API or the Elasticsearch Transport client.
  • Elastic also provides applications and plug-ins that make Elasticsearch more useful, like Kibana for search and analysis with different charts and dashboards.

2. What are the advantages of Elasticsearch?

  • Elasticsearch is implemented on Java, which makes it compatible on almost every platform.
  • Elasticsearch is Near Real Time (NRT), in other words after one second the added document is searchable in this engine.
  • Elasticsearch cluster is distributed, which makes it easy to scale and integrate in any big organizations.
  • Creating full backups of data is easy by using the concept of a gateway, which is present in Elasticsearch.
  • Elasticsearch REST uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
  • Elasticsearch supports almost every document type except those that do not support text rendering.
  • Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.

3. What are the Disadvantages of Elasticsearch?

  • Elasticsearch does not have multi-format support for handling request and response data (only JSON), unlike Apache Solr, where it is possible in CSV, XML and JSON formats.
  • Elasticsearch can have split-brain situations, though only in rare cases.

4. What are the differences and similarities between NoSQL MongoDB and Elasticsearch?

Elasticsearch is an Apache Lucene-based RESTful NRT (Near Real Time) search and analytics engine, while MongoDB is an open-source document-oriented database management system.

Similarities

Certain features are common between both products like Document-oriented Store, Schema free, Distributed Data Storage, High-Availability, Sharding, Replication etc.

Differences

There are many differences between the two products, as below:

Indexing
  • Elasticsearch: Uses Apache Lucene for indexing. Real-time indexing and searching power from Lucene, which allows creation of an index on every field of a document by default.
  • MongoDB: Based on a traditional B+ Tree. You define the index, which improves query performance but affects write operations.

Language
  • Elasticsearch: Implemented in Java.
  • MongoDB: Implemented in C++.

Documents
  • Elasticsearch: Stores JSON documents.
  • MongoDB: Stores them in BSON (Binary JSON) format (though it looks the same as a JSON document to the end user).

REST Interface
  • Elasticsearch: RESTful.
  • MongoDB: Not RESTful.

Map Reduce
  • Elasticsearch: Does not support MapReduce.
  • MongoDB: Allows MapReduce operations.

Huge Data
  • Elasticsearch: Store and search huge data.
  • MongoDB: Store and retrieve huge data.

5. What are common areas of use for Elasticsearch?

  • It's useful in applications that need analysis and statistics, and need to find anomalies in data based on patterns.
  • It's useful where alerts need to be sent when a particular condition is matched, like stock markets, exceptions from logs, etc.
  • It's useful in applications that provide log analysis and issue resolution, thanks to full-text search over billions of records in milliseconds.
  • It's compatible with applications like Filebeat, Logstash and Kibana for storing high volumes of data for analysis and visualizing it in the form of charts and dashboards.

6. What operations can be performed on Elasticsearch Documents?

Elasticsearch performs some basic operations like:

  • Indexing
  • Searching
  • Fetching
  • Updating
  • Deleting documents.

Basic Concepts and Terminology Questions and Answers

7. What is an Elasticsearch Cluster?

A cluster is a collection of one or more nodes that together provide the capability to search text across data scattered over the nodes. It's identified by a unique name within the network, so that all associated nodes join together under that cluster name.

Operation persistence: The cluster also keeps records of all transaction-level changes (such as schema changes for an index) and tracks the availability of nodes in the cluster, so that data remains easily available on fail-over of any node.

(Figure: Elasticsearch cluster “FACING_ISSUE_IN_IT” with three master and four data nodes.)

8. What is an Elasticsearch Node?

A node is an Elasticsearch server that participates in a cluster. It stores data and helps the cluster with indexing data and serving search queries. It's identified by a unique name in the cluster; if no name is provided, Elasticsearch generates a random Universally Unique Identifier (UUID) at server start time.

A cluster can have one or more nodes. The first node to start forms a cluster with a single node, and other nodes that start afterwards join that cluster.

(Figure: Data node document storage.)

The figure represents the data of two indexes, I1 and I2. Index I1 has two types of documents, T1 and T2, while index I2 has only type T2, and these shards are distributed over all nodes in the cluster. This data node holds documents of shard S1 for index I1 and shard S3 for index I2. It also keeps replicas of documents of shards S2 of indexes I1 and I2, whose primaries are stored on other nodes in the cluster.

9. What are the types of nodes in Elasticsearch?

Within an Elasticsearch cluster, each node knows the other nodes, and configuration decides the role/responsibility of each individual node. Below are the Elasticsearch node types.

  • Master-Eligible Node.
  • Data Node.
  • Ingest Node.
  • Tribe Node/Coordinating Node.

10. What is Master Node and Master Eligible Node in Elasticsearch?

The master node controls cluster-wide operations like creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. The master is elected from nodes configured with node.master: true (the default).

The number of master-eligible nodes needed for an election is decided by the configuration below:

discovery.zen.minimum_master_nodes: number (default 1)

This number should be set to (master_eligible_nodes / 2) + 1; for example, with three master-eligible nodes it should be (3 / 2) + 1 = 2.

11. What is Data Node in Elasticsearch?

Data nodes hold the shards/replicas that contain the indexed documents. Data nodes perform data-related operations such as CRUD, search, aggregation, etc. Set node.data: true (the default) to make a node a data node.

Data node operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded. The main benefit of having dedicated data nodes is the separation of the master and data roles.

12. What is Ingest Node in Elasticsearch?

Ingest nodes can execute pre-processing (an ingest pipeline) on a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes, marking node.master and node.data as false and setting node.ingest: true.

13. What is Tribe Node and Coordinating Node in Elasticsearch?

A tribe node is a special type of node that coordinates connections to multiple clusters and performs search and other operations across all connected clusters. A tribe node is configured through tribe.* settings.

A coordinating node behaves like a smart load balancer: take away the ability to handle master duties, to hold data, and to pre-process documents, and you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing.

Every node is implicitly a coordinating node. This means that a node that has all three of node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled. As a result, such a node needs to have enough memory and CPU in order to deal with the gather phase.
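As a sketch, a dedicated coordinating-only node carries the following role settings in elasticsearch.yml:

node.master: false
node.data: false
node.ingest: false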

14. What is an Index in Elasticsearch?

An index is a collection of documents with similar characteristics which is stored on nodes in a distributed fashion. It is identified by a unique name, which is used to perform different operations on its documents, like insert, search query, update and delete. A cluster can have as many indexes as needed, each with a unique name.

A document stored in an index is assigned a type, and an index can have multiple types of documents.

15. What are Shards in Elasticsearch?

Shards are partitions of an index scattered over nodes in order to make it scale. They provide the capability to store a large number (billions) of documents for the same index in the cluster, even when no single node's disk is capable of storing them all. Shards also maintain an inverted index of document tokens to make full-text search fast.

16. What is a Replica in Elasticsearch?

A replica is a copy of a shard which is stored on a different node. A shard can have zero or more replicas. If a shard is on one node, its replicas are stored on other nodes.

17. What are the benefits of Shards and Replicas in Elasticsearch?

  • Shards split an index into horizontal partitions to handle high volumes of data.
  • Operations run in parallel on shards and replicas across multiple nodes for an index, which increases system performance and throughput.
  • Data is recovered easily on fail-over of a node, because replicas exist on other nodes (a replica is always stored on a different node than its shard).

Some Important Points:

When we create an index, Elasticsearch by default configures it with 5 shards and 1 replica, but we can configure this from the config/elasticsearch.yml file or by passing shard and replica values in the settings when the index is created, as in the sketch below.

Once an index is created, the shard count can't be changed, only the replica count. If the shard count needs to change, the only option is re-indexing.

Each shard is itself a Lucene index, and it can keep at most 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. Merging of search results and failover are taken care of by the Elasticsearch cluster.
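A minimal sketch (the index name app1-logs is illustrative) of setting shards and replicas at index-creation time:

PUT /app1-logs
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}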

18. What is a Document in Elasticsearch?

Each record stored in an index is called a document and is stored as a JSON object. A document is similar to a row in RDBMS terms; the only difference is that each document can have a different number of fields and a different structure, but common fields should have the same data type.

19. What is a Type in Elasticsearch?

A type is a logical category/grouping/partition of an index whose semantics are completely up to the user, and documents of the same type are expected to have the same set of fields.

Elasticsearch => Indices => Types => Documents with Fields/Properties

20. What is a Document Type in Elasticsearch?

A document type can be seen as the document schema/mapping definition, which has the mapping of all the fields in the document along with their data types.

21. What is indexing in Elasticsearch?

The process of storing data in an index is called indexing in Elasticsearch. Data in Elasticsearch is divided into write-once, read-many segments. Whenever an update/modification is attempted, a new version of the document is written to the index.

22. What is an inverted index in Elasticsearch?

The inverted index is the backbone of Elasticsearch and is what makes full-text search fast. An inverted index consists of a list of all unique words that occur in the documents and, for each word, a list of the document numbers and positions in which it appears.

For example, there are two documents with content as follows:

1: FacingIssuesOnIT is for ELK.

2: If ELK check FacingIssuesOnIT.

To build the inverted index, each document is split into words (also called terms or tokens) to create the sorted index below.

Term                   Doc_1  Doc_2
-------------------------
FacingIssuesOnIT    |   X   |  X
is                  |   X   |
for                 |   X   |  
ELK                 |   X   |  X
If                  |       |  X
check               |       |  X

Now when we do a full-text search for a string, documents are ranked based on the existence and occurrence counts of the matching terms.

Books usually have an inverted index in their last pages: based on a word, we can find the page on which the word occurs.

23. What is an Analyzer in Elasticsearch?

While indexing data in Elasticsearch, data is transformed internally by the analyzer defined for the index, and then indexed. An analyzer is built from character filters, a tokenizer and token filters. The following types of built-in analyzers are available in Elasticsearch 5.6.

  • Standard Analyzer: Divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
  • Simple Analyzer: Divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
  • Whitespace Analyzer: Divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
  • Stop Analyzer: Like the Simple Analyzer, but also supports removal of stop words.
  • Keyword Analyzer: A “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
  • Pattern Analyzer: Uses a regular expression to split the text into terms. It supports lowercasing and stop words.
  • Language Analyzers: Elasticsearch provides many language-specific analyzers like English or French.
  • Fingerprint Analyzer: A specialist analyzer which creates a fingerprint that can be used for duplicate detection.

24. What is a Tokenizer in Elasticsearch?

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. Inverted indexes are created and updated using these token values, recording the order or position of each term and the start and end character offsets of the original word which the term represents.

An analyzer must have exactly one tokenizer.
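As a quick sketch, the _analyze API shows the tokens a tokenizer produces (the sample text is illustrative):

POST /_analyze
{
  "tokenizer": "standard",
  "text": "FacingIssuesOnIT is for ELK"
}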

25. What is a Character Filter in an Elasticsearch Analyzer?

A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠‎١٢٣٤٥٦٧٨‎٩‎) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like <b> from the stream.

An analyzer may have zero or more character filters, which are applied in order.

26. What are Token Filters in an Elasticsearch Analyzer?

A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like “the” from the token stream, and a synonym token filter introduces synonyms into the token stream.

Token filters are not allowed to change the position or character offsets of each token.

An analyzer may have zero or more token filters, which are applied in order.

27. What are the types of Token Filters in an Elasticsearch Analyzer?

Elasticsearch has a number of built-in token filters which can be used in custom analyzers.

28. What is the use of the attributes enabled, index and store?

The enabled attribute applies to various Elasticsearch-specific/created fields such as _index and _size. User-supplied fields do not have an enabled attribute.

Store means the data is stored by Lucene, and Lucene will return this data if asked. Stored fields are not necessarily searchable. By default, fields are not stored, but the full source is. Since you usually want the defaults (which make sense), simply do not set the store attribute.

The index attribute is used for searching. Only indexed fields can be searched. The reason for the differentiation is that indexed fields are transformed during analysis, so you cannot retrieve the original data from them if that is required.

29. What is the query language of Elasticsearch?

Elasticsearch uses the Apache Lucene query language, which is called Query DSL.

30. Does Elasticsearch have a schema?

Yes, Elasticsearch can have mappings, which can be used to enforce a schema on documents. We define an Elasticsearch index schema by defining mappings, as in the sketch below.
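A minimal sketch of a mapping in ES 5.x syntax (index, type and field names are illustrative):

PUT /app1-logs
{
  "mappings": {
    "log": {
      "properties": {
        "timestamp": { "type": "date" },
        "loglevel":  { "type": "keyword" },
        "message":   { "type": "text" }
      }
    }
  }
}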

Advanced and Practical Interview Questions and Answers

31. What scripting languages are supported by Elasticsearch?

Elasticsearch supports custom scripting in Lucene Expressions, Groovy, Python, JavaScript and Painless.

32. What is Painless and what are its benefits in Elasticsearch?

Painless is a simple, secure scripting language designed specifically for use with Elasticsearch 5.x. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts. Painless can be used anywhere scripts are used in Elasticsearch (see the sketch after the benefits list below).

Benefits of Painless :

  • Fast performance: Painless scripts run several times faster than the alternatives.
  • Safety: Fine-grained whitelist with method call/field granularity.
  • Optional typing: Variables and parameters can use explicit types or the dynamic def type.
  • Syntax: Extends Java’s syntax to provide Groovy-style scripting language features that make scripts easier to write.
  • Optimizations: Designed specifically for Elasticsearch scripting.
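As a sketch, an inline Painless script in an update request looks like this (the index, type, ID and field are illustrative; in 5.x the script body goes in the inline field):

POST /app1-logs/log/1/_update
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.views += 1"
  }
}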

33. How to store Elasticsearch node data in an external directory?

By default the Elasticsearch data path location is $ES_HOME/data. Keeping data on a path external to the Elasticsearch directory is beneficial while doing an upgrade or any other modification of Elasticsearch, so that there is no data loss.

There are two ways to point to an external path:

First: Set a static path in the elasticsearch.yml file, as below.

path.data: /opt/app/FacingIssuesOnIT/data

Second: By passing an argument from the command line while starting Elasticsearch.

./bin/elasticsearch -Epath.data=/opt/app/FacingIssuesOnIT/data

34. What are Snapshot and Restore in Elasticsearch?

Snapshot: A snapshot is a copy or backup of individual indices or of an entire cluster, stored in a remote repository like a shared file system, S3, or HDFS. Snapshots are not archival, because they can only be restored to versions of Elasticsearch that can read the index.

Steps to create a snapshot:

  • Set up the backup repository
PUT /_snapshot/facingIssueOnIT_bkp
{
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "/mount/backups/facingIssueOnIT_bkp"
  }
}
  • Check status
GET /_snapshot/facingIssueOnIT_bkp
or
GET /_snapshot/_all
{
  "facingIssueOnIT_bkp": {
    "type": "fs",
    "settings": {
      "compress": true,
      "location": "/mount/backups/facingIssueOnIT_bkp"
    }
  }
}
  • After registering the repository, create a snapshot of the cluster or of indexes as below
For Cluster
PUT /_snapshot/facingIssueOnIT_bkp/snapshot_1?wait_for_completion=true

For indexes
PUT /_snapshot/facingIssueOnIT_bkp/snapshot_1
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}

wait_for_completion=true makes the call block until the snapshot completes; to run it in the background, set it to false.

Restore: Restore is used to bring backed-up/snapshotted indexes back into the cluster. Restore can be done at the cluster level and at the index level.

Cluster Level
POST /_snapshot/facingIssueOnIT_bkp/snapshot_1/_restore
Index Level
POST /_snapshot/facingIssueOnIT_bkp/snapshot_1/_restore
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": true,
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1"
}

35. What is the Elasticsearch REST API and what is it used for?

Elasticsearch provides a very comprehensive and powerful REST API that you can use to interact with your cluster. Among the things that can be done with the API are the following:

  • Check your cluster, node, and index health, status, and statistics
  • Administer your cluster, node, and index data and metadata
  • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
  • Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others

To learn more on Elasticsearch REST API follow link Elasticsearch Tutorial

36. How to check Elasticsearch cluster health?

To know about cluster health, call the below URL with curl or in your browser.

GET /_cat/health?v
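For example, with curl (assuming the default host and port used earlier):

curl -XGET 'http://localhost:9200/_cat/health?v'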

37. What are the types of cluster health status?

  • Green means everything is good (the cluster is fully functional).
  • Yellow means all data is available but some replicas are not yet allocated (the cluster is fully functional).
  • Red means some data is not available for whatever reason.
  • Note: even if a cluster is red, it is still partially functional (i.e. it will continue to serve search requests from the available shards), but you will likely need to fix it ASAP since you have missing data.

38. How to know the number of nodes?

GET /_cat/nodes?v

Response:

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           10           5   5    4.46                        mdi      *      PB2SGZY

Here, we can see our one node named “PB2SGZY”, which is the single node that is currently in our cluster.

39. How to get the list of available indices in an Elasticsearch cluster?

GET /_cat/indices?v

40. How to create an index?

PUT /customer?pretty
GET /_cat/indices?v

41. How to delete an index and its records?

DELETE /customer?pretty
GET /_cat/indices?v

and 

PUT /customer
PUT /customer/external/1
{
  "name": "John Doe"
}
GET /customer/external/1
DELETE /customer

If we study the above commands carefully, we can actually see a pattern of how we access data in Elasticsearch. That pattern can be summarized as follows:

<REST Verb> /<Index>/<Type>/<ID>

This REST access pattern is so pervasive throughout all the API commands that if you can simply remember it, you will have a good head start at mastering Elasticsearch.

42. How to update a record and document field values in an index?

We’ve previously seen how we can index a single document. Let’s recall that command again:

PUT /customer/external/1?pretty
{
  "name": "John Doe"
}

Again, the above will index the specified document into the customer index, external type, with the ID of 1. If we then executed the above command again with a different (or same) document, Elasticsearch will replace (i.e. reindex) a new document on top of the existing one with the ID of 1:

PUT /customer/external/1?pretty
{
  "name": "Jane Doe"
}

The above changes the name of the document with the ID of 1 from “John Doe” to “Jane Doe”. If, on the other hand, we use a different ID, a new document will be indexed and the existing document(s) already in the index remains untouched.

PUT /customer/external/2?pretty
{
  "name": "Jane Doe"
}

The above indexes a new document with an ID of 2.

When indexing, the ID part is optional. If not specified, Elasticsearch will generate a random ID and then use it to index the document. The actual ID Elasticsearch generates (or whatever we specified explicitly in the previous examples) is returned as part of the index API call.

This example shows how to index a document without an explicit ID:

POST /customer/external?pretty
{
  "name": "Jane Doe"
}

Note that in the above case, we are using the POST verb instead of PUT since we didn’t specify an ID.

 

Read More

To read more on Elasticsearch configuration, sample Elasticsearch REST clients, and search query types with examples, follow the links Elasticsearch Tutorial and Elasticsearch Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.


How to Configure Filebeat, Kafka, Logstash Input, Elasticsearch Output and Kibana Dashboard


Filebeat, Kafka, Logstash, Elasticsearch and Kibana integration is used in big organizations where applications are deployed in production on hundreds or thousands of servers scattered across different locations, and analysis needs to be done on data from these servers in real time.

This integration helps mostly with log-level analysis, tracking issues and anomalies in data, and alerting on events of a particular occurrence, and wherever accountability measures are needed.

Using these technologies provides a scalable architecture that enhances systems and keeps them decoupled from each other.

Why these Technologies?

Filebeat :

  • Lightweight agent for shipping logs.
  • Forward and centralize files and logs.
  • Robust (doesn't miss a single beat).

Kafka:

  • Open-source distributed, stream-processing, message broker platform.
  • Processes stream data or transaction logs in real time.
  • Fault-tolerant, high-throughput, low-latency platform for dealing with real-time data feeds.

Logstash:

  • Open-source, server-side data processing pipeline that accepts data from different sources simultaneously.
  • Parses, formats and transforms data, and sends it to different output destinations.

Elasticsearch:

  • Elasticsearch is an open-source, distributed, cross-platform search engine.
  • Built on top of Lucene, it provides full-text search with NRT (Near Real Time) search results.
  • Supports RESTful search through the Elasticsearch REST API.

Kibana:

  • Open source.
  • Provides a window to view Elasticsearch data in the form of different charts and dashboards.
  • Provides ways to search and operate on data easily with respect to time intervals.
  • Dashboards can easily be embedded into any web application.

How does the data flow work?

In this integration, Filebeat is installed on all servers where your application is deployed, and Filebeat reads and ships the latest log changes from these servers to the Kafka topic configured for this application.

Logstash subscribes to log lines from the Kafka topic, parses these lines, makes the relevant changes and formatting, excludes and includes fields, and then sends the processed data to Elasticsearch indexes as a centralized location for data from the different servers.

Kibana is linked to the Elasticsearch indexes and helps with analysis through search, charts and dashboards.

(Figure: FKLEK integration data flow.)

Design Architecture

In the architecture configured below, the application is deployed on three servers, and each server has a current log file named App1.log. Our goal is to read real-time data from these servers and analyze it.

(Figure: FKLEK integration architecture.)

Steps for Installation, Configuration and Start

First we will install Kafka and Elasticsearch, which run independently; the rest of the tools will be installed and run in sequence to test the data flow. Initially install everything on the same machine, test with the sample data using the steps below, and at the end of this post I will describe what changes to make according to your servers.

  • Kafka Installation, Configuration and Start
  • Elasticsearch Installation,Configuration and Start
  • Filebeat Installation,Configuration and Start
  • Logstash Installation,Configuration and Start
  • Kibana Installation,Start and display.

Pre-Requisite

These Filebeat, Logstash, Elasticsearch and Kibana versions should be compatible; it is better to use the latest versions from https://www.elastic.co/downloads.

  • Java 8+
  • Linux Server
  • Filebeat 5.XX
  • Kafka 2.11.XX
  • Logstash 5.XX
  • Elasticsearch 5.XX
  • Kibana 5.XX

Note: Make sure JDK 8 is installed and the JAVA_HOME environment variable points to the JDK 8 home directory on every machine where you want to install Elasticsearch, Logstash, Kibana or Kafka.

Windows: My Computer -> right click -> Properties -> Advanced System Settings -> System Variables, then set JAVA_HOME.

Linux: Go to your home directory/profile and add the line below.

export JAVA_HOME=/opt/app/facingissuesonit/jdk1.8.0_66

Sample Data

For testing we will use these sample log lines, which include debug lines as well as a stacktrace; the grok parsing in this example is designed according to them. For real-time testing with actual data you can point to your server log files, but you will have to modify the grok pattern in the Logstash configuration accordingly.

2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}
2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}

Create an App1.log file on the same machine where Filebeat is to be installed, and copy the above log lines into the App1.log file.

Kafka Installation , Configuration and Start

Download the latest version of Kafka from the link below; use the command to untar and install it on a Linux server, or just unzip the downloaded file on Windows.

Download Link : https://kafka.apache.org/downloads

tar -zxvf kafka_2.11-0.10.0.0.tgz

For more configuration and start options follow Setup Kafka Cluster for Single Server/Broker

After downloading and untarring/unzipping, it will have the below files and directory structure.

ls -l
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr  3 05:18 bin
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May  8 11:05 config
drwxr-xr-x 74 facingissuesonit Saurabh   4096 May 27 20:00 kafka-logs
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 libs
-rw-r--r--  1 facingissuesonit Saurabh  28824 Apr  3 05:17 LICENSE
drwxr-xr-x  2 facingissuesonit Saurabh 487424 May 27 20:00 logs
-rw-r--r--  1 facingissuesonit Saurabh    336 Apr  3 05:18 NOTICE
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr  3 05:17 site-docs

For more details about all these files, configuration options and other integration options, follow the Kafka Tutorial.

Make the below changes in the files config/zookeeper.properties and config/server.properties.

config/zookeeper.properties

clientPort=2181
config/server.properties:

broker.id=0
listeners=PLAINTEXT://:9092
log.dir=/kafka-logs
zookeeper.connect=localhost:2181

Now Kafka is configured and ready to run. Use the below commands to start ZooKeeper and the Kafka server as background processes.

screen -d -m bin/zookeeper-server-start.sh config/zookeeper.properties
screen -d -m bin/kafka-server-start.sh config/server.properties

To test that Kafka installed successfully, check for the running Kafka process on Linux with “ps -ef | grep kafka”, or run the console producer and consumer against the topic (see the sketch below), as in Setup Kafka Cluster for Single Server/Broker.
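As a quick sketch (the topic name matches the one configured for Filebeat later), the console tools shipped with Kafka can verify the broker end to end:

# Publish a test message to the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic APP-1-TOPIC

# Read messages back from the beginning of the topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic APP-1-TOPIC --from-beginning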

Elasticsearch Installation,Configuration and Start

Download the latest version of Elasticsearch from the link below; use the command to untar and install it on a Linux server, or just unzip the downloaded file on Windows.

Download Link : https://www.elastic.co/downloads/elasticsearch

tar -zxvf elasticsearch-5.4.0.tar.gz

It will show the below files and directory structure for Elasticsearch.

drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 25 19:20 bin
drwxr-xr-x  3 facingissuesonit Saurabh   4096 May 13 17:27 config
drwxr-xr-x  3 facingissuesonit Saurabh   4096 Apr 24 15:56 data
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 lib
-rw-r--r--  1 facingissuesonit Saurabh  11358 Apr 17 10:50 LICENSE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 May 28 05:00 logs
drwxr-xr-x 12 facingissuesonit Saurabh   4096 Apr 17 10:55 modules
-rw-r--r--  1 facingissuesonit Saurabh 194187 Apr 17 10:55 NOTICE.txt
drwxr-xr-x  2 facingissuesonit Saurabh   4096 Apr 17 10:55 plugins
-rw-r--r--  1 facingissuesonit Saurabh   9540 Apr 17 10:50 README.textile

Before starting Elasticsearch, make some basic changes in the config/elasticsearch.yml file for the cluster and node name. You can configure these based on your application or organization name.

cluster.name: FACING-ISSUE-IN-IT
node.name: TEST-NODE-1
#network.host: 0.0.0.0
http.port: 9200

Now we are ready with the Elasticsearch configuration, and it's time to start Elasticsearch. We can use the below command to run Elasticsearch in the background.

screen -d -m ./bin/elasticsearch

To check that Elasticsearch started successfully, use the below URL in a browser to get the cluster status. You will get a result like the one below.

http://localhost:9200/_cluster/health?pretty

or as below if network.host is configured:

http://elasticseverIp:9200/_cluster/health?pretty

Result :

{
  "cluster_name" : "FACING-ISSUE-IN-IT",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Filebeat Installation, Configuration and Start

Download the latest version of Filebeat from the link below; use the command to untar and install it on a Linux server, or just unzip the downloaded file on Windows.

Download Link : https://www.elastic.co/downloads/beats/filebeat

tar -zxvf filebeat-<version>.tar.gz

For more configuration and start options follow Filebeat Download,Installation and Start/Run

After downloading and untarring/unzipping, it will have the below files and directory structure.

ls- l
-rwxr-xr-x 1 facingissuesonit Saurabh 14908742 Jan 11 14:11 filebeat
-rw-r--r-- 1 facingissuesonit Saurabh    31964 Jan 11 14:11 filebeat.full.yml
-rw-r--r-- 1 facingissuesonit Saurabh     3040 Jan 11 14:11 filebeat.template-es2x.json
-rw-r--r-- 1 facingissuesonit Saurabh     2397 Jan 11 14:11 filebeat.template.json
-rw-r--r-- 1 facingissuesonit Saurabh     4196 Jan 11 14:11 filebeat.yml
-rw-r--r-- 1 facingissuesonit Saurabh      811 Jan 11 14:10 README.md
drwxr-xr-x 2 facingissuesonit Saurabh     4096 Jan 11 14:11 scripts

For more details about all these files, configuration options and other integration options, follow the Filebeat Tutorial.

Now Filebeat is installed; make the below changes in the filebeat.full.yml file.

  • Inside the prospectors section, change paths to your log file location, as:
paths:
  - /opt/app/facingissuesonit/App1.log
  • Comment out the default Elasticsearch output properties as below
#output.elasticsearch:
#hosts: ["localhost:9200"]
  • Configure the multiline option as below, so that stacktrace lines which do not start with a date are treated as part of a single event.
multiline.pattern: ^\d
multiline.negate: true
multiline.match: after

To learn more about Filebeat multiline configuration, follow Filebeat Multiline Configuration Changes for Object, StackTrace and XML.

  • Inside the Kafka output section, update the hosts and topic properties. If Kafka is on the same machine then use localhost, else update it with the IP of the Kafka machine.
output.kafka:
 hosts: ["localhost:9092"]
 topic: APP-1-TOPIC

For more on logging configuration, follow the link Filebeat, Logging Configuration.

Now Filebeat is configured and ready to start with the below command. It will continuously read from the configured prospector for the file App1.log and publish log line events to Kafka. It will also create the topic APP-1-TOPIC in Kafka if it does not exist.

./filebeat -e -c filebeat.full.yml -d "publish"

On the console it will display output as below for the sample lines.

2017/05/28 00:24:27.991828 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 194,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991907 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
  "offset": 375,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.991Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
  "offset": 718,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}
2017/05/28 00:24:27.991984 client.go:184: DBG  Publish: {
  "@timestamp": "2017-05-28T00:24:22.992Z",
  "beat": {
    "hostname": "sg02870",
    "name": "sg02870",
    "version": "5.1.2"
  },
  "input_type": "log",
  "message": "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}",
  "offset": 902,
  "source": "/opt/app/facingissuesonit/App1.log",
  "type": "log"
}

From the above Filebeat debug statements you can see that publish event 3 contains the multiline statement with the stacktrace exception, and that each event has these fields:

@timestamp: timestamp when the data was shipped.

beat.hostname: name of the Filebeat machine from which the data is shipping.

beat.version: version of Filebeat installed on the server, which helps with compatibility checks on the target end.

message: a log line from the log file, or multiple log lines combined.

offset: the byte offset of the line within the source file.

source: the file name from which the logs were read.

Now it's time to check whether the data was published to the Kafka topic. Go to the below directory and you will see two files, xyz.index and xyz.log, which maintain the data offset and the messages.

{Kafka_home}/kafka-logs/APP-1-TOPIC
          00000000000000000000.log
          00000000000000000000.index

Now your server log lines are in the Kafka topic, ready to be read and parsed by Logstash and sent to Elasticsearch for analysis/search.

Logstash Installation, Configuration and Start

Download the latest version of Logstash from the link below; use the command to untar and install it on a Linux server, or just unzip the downloaded file on Windows.

Download Link : https://www.elastic.co/downloads/logstash

tar -zxvf logstash-5.4.0.tar.gz

It will show the below files and directory structure.

drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 bin
-rw-r--r-- 1 facingissuesonit Saurabh 111569 Mar 22 23:49 CHANGELOG.md
drwxr-xr-x 2 facingissuesonit Saurabh   4096 Apr 20 11:27 config
-rw-r--r-- 1 facingissuesonit Saurabh   2249 Mar 22 23:49 CONTRIBUTORS
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 12:07 data
-rw-r--r-- 1 facingissuesonit Saurabh   3945 Mar 22 23:55 Gemfile
-rw-r--r-- 1 facingissuesonit Saurabh  21544 Mar 22 23:49 Gemfile.jruby-1.9.lock
drwxr-xr-x 5 facingissuesonit Saurabh   4096 Apr 20 11:27 lib
-rw-r--r-- 1 facingissuesonit Saurabh    589 Mar 22 23:49 LICENSE
drwxr-xr-x 2 facingissuesonit Saurabh   4096 May 21 00:00 logs
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-event-java
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-plugin-api
drwxr-xr-x 3 facingissuesonit Saurabh   4096 Apr 20 11:27 logstash-core-queue-jruby
-rw-r--r-- 1 facingissuesonit Saurabh  28114 Mar 22 23:56 NOTICE.TXT
drwxr-xr-x 4 facingissuesonit Saurabh   4096 Apr 20 11:27 vendor

Before starting Logstash, we need to create a configuration file that takes input data from Kafka, parses it into the respective fields, and sends it to Elasticsearch. Create the file logstash-app1.conf in the Logstash bin directory with the below content.

/bin/logstash-app1.conf

input {
     kafka {
            bootstrap_servers => 'localhost:9092'
            topics => ["APP-1-TOPIC"]
            codec => json {}
          }
}
filter
{
      #Parse the log line
      grok
	{
	match => {"message" => "\A%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+(?<logger>(?:[a-zA-Z0-9-]+\.)*[A-Za-z0-9$]+)\s+(-\s+)?(?=(?<msgnr>[A-Z]+[0-9]{4,5}))*%{DATA:message}({({[^}]+},?\s*)*})?\s*$(?<stacktrace>(?m:.*))?" }
	}  

    #Remove unused fields
    #mutate { remove_field =>["beat","@version" ]}
}
output {
    #Output result sent to elasticsearch and dynamically create array
    elasticsearch {
        index  => "app1-logs-%{+YYYY.MM.dd}"
        hosts => ["localhost:9200"]
        sniffing => false
  	}

     #Sysout logs
     stdout
       {
         codec => rubydebug
       }
}

To test your configuration file you can use the below command.


./logstash -t -f logstash-app1.conf

If we get the result OK from the above command, run the below command to start reading and parsing data from the Kafka topic.


./logstash -f logstash-app1.conf

To design your own grok patterns for your log line format, you can follow the links below; they help you build patterns incrementally and also provide some sample grok patterns for logs.

http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/

The Logstash console will show parsed data as below, and you can remove unused fields before storing them in Elasticsearch by uncommenting the mutate section in the configuration file.

{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 194,
      "loglevel" => "WARN",
        "logger" => "CreateSomethingActivationKey",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,662 WARN  CreateSomethingActivationKey - WhateverException for User 49-123-345678 {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "WhateverException for User 49-123-345678 "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,662"
}
{
         "msgnr" => "ERR1700",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 375,
      "loglevel" => "INFO",
        "logger" => "LMLogger",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",
        "source" => "/opt/app/facingissuesonit/App1.log",
       "message" => [
        [0] "2013-02-28 09:57:56,663 INFO  LMLogger - ERR1700 - u:null failures: 0  - Technical error {{rid,US8cFAp5eZgAABwUItEAAAAI_dev01_443}{realsid,60A9772A136B9912B6FF0C3627A47090.dev1-a}}",
        [1] "ERR1700 - u:null failures: 0  - Technical error "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 09:57:56,663"
}
{
        "offset" => 718,
        "logger" => "SomeCallLogger",
    "input_type" => "log",

       "message" => [
        [0] "2013-02-28 09:57:56,668 ERROR SomeCallLogger - ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  {}\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
        [1] "ESS10005 Cpc portalservices: Exception caught while writing log messege to MEA Call:  "
    ],
          "type" => "log",
         "msgnr" => "ESS10005",
    "@timestamp" => 2017-05-28T23:47:42.160Z,
    "stacktrace" => "\njava.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist\n\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)\n\tat oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)",
      "loglevel" => "ERROR",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
      "@version" => "1",
     "timestamp" => "2013-02-28 09:57:56,668"
}
{
    "@timestamp" => 2017-05-28T23:47:42.160Z,
        "offset" => 903,
      "loglevel" => "INFO",
        "logger" => "EntryFilter",
          "beat" => {
        "hostname" => "zlp0287k",
            "name" => "zlp0287k",
         "version" => "5.1.2"
    },
    "input_type" => "log",
      "@version" => "1",

       "message" => [
        [0] "2013-02-28 10:04:35,723 INFO  EntryFilter - Fresh on request /portalservices/foobarwhatever {{rid,US8dogp5eZgAABwXPGEAAAAL_dev01_443}{realsid,56BA2AD41D9BB28AFCEEEFF927EE61C2.dev1-a}}\n",
        [1] "Fresh on request /portalservices/foobarwhatever "
    ],
          "type" => "log",
     "timestamp" => "2013-02-28 10:04:35,723"
}

To verify on the Elasticsearch end that your data was sent successfully, open the URL http://localhost:9200/_cat/indices in your browser; it will display the created index with the current date.

yellow open app1-logs-2017.05.28                             Qjs6XWiFQw2zsiVs9Ks6sw 5 1         4     0  47.3kb  47.3kb

Kibana Installation, Configuration and Start

Download the latest version of Kibana from the link below, then use the command below to untar and install it on a Linux server; on Windows, just unzip the downloaded file.

Download Link : https://www.elastic.co/downloads/kibana

tar -zxvf kibana-5.4.0.tar.gz

It will show the below files and directory structure for Kibana.

ls -l
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 14:23 bin
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 18:58 config
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 25 11:54 data
-rw-r--r--   1 facingissuesonit Saurabh    562 Apr 17 12:04 LICENSE.txt
drwxr-xr-x   6 facingissuesonit Saurabh   4096 Apr 17 12:04 node
drwxr-xr-x 485 facingissuesonit Saurabh  20480 Apr 17 12:04 node_modules
-rw-r--r--   1 facingissuesonit Saurabh 660429 Apr 17 12:04 NOTICE.txt
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 optimize
-rw-r--r--   1 facingissuesonit Saurabh    702 Apr 17 12:04 package.json
drwxr-xr-x   2 facingissuesonit Saurabh   4096 May 22 12:29 plugins
-rw-r--r--   1 facingissuesonit Saurabh   4909 Apr 17 12:04 README.txt
drwxr-xr-x  10 facingissuesonit Saurabh   4096 Apr 17 12:04 src
drwxr-xr-x   3 facingissuesonit Saurabh   4096 Apr 17 12:04 ui_framework
drwxr-xr-x   2 facingissuesonit Saurabh   4096 Apr 17 12:04 webpackShims

Before starting Kibana we need to make some basic changes in the config/kibana.yml file: uncomment the below properties and set them as follows.

server.port: 5601
server.host: localhost
elasticsearch.url: "http://localhost:9200"

Now we are ready with the Kibana configuration and it is time to start Kibana. We can use the below command to run Kibana in the background.

screen -d -m bin/kibana

Kibana takes some time to start, and we can test it by opening the below URL in a browser.

http://localhost:5601/
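If the page does not respond right away, you can also poll Kibana's status endpoint from the shell until it comes up (the /api/status endpoint is available in Kibana 5.x):

curl -s http://localhost:5601/api/status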

For checking this data in Kibana, open the above URL in a browser, go to the Management tab in the left-side menu -> Index Patterns -> click on Add New.

Enter the index name or pattern and the time field name as in the below screen, and click on the Create button.

Index Pattern Settings

Now go to the Discover tab and select the index pattern app1-log*; it will display the data as below.

kibana discover data

Now make the below changes according to your application's specification.

Filebeat :

  • Update the prospector path to your application's current log directory and file.
  • Move Kafka to a different machine, because Kafka will be the single location that receives shipped data from different servers. Replace localhost with the Kafka server's IP for the hosts property in the Kafka output section of the filebeat.full.yml file, as in the sketch after this list.
  • Copy the same Filebeat setup to all servers where your application is deployed and logs need to be read.
  • Start all Filebeat instances on each server.
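A minimal sketch of the relevant filebeat.full.yml sections after these changes; the log path, the IP 192.168.1.10 and the topic name are hypothetical placeholders for your own values:

filebeat.prospectors:
- input_type: log
  paths:
    - /opt/app/yourapp/logs/*.log    # your application log directory
output.kafka:
  hosts: ["192.168.1.10:9092"]       # Kafka server IP instead of localhost
  topic: app1-topic                  # hypothetical topic, must match the Logstash Kafka input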

Elasticsearch :

  • Uncomment the network.host property in the elasticsearch.yml file so that Elasticsearch can be accessed by IP address, as below.
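For example (a one-line sketch; 192.168.1.20 is a hypothetical IP for your Elasticsearch server):

network.host: 192.168.1.20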

Logstash:

  • Update localhost in the input section of the logstash-app1.conf file with the Kafka machine's IP, as in the sketch after this list.
  • Change the grok pattern in the filter section according to your log format. For designing patterns incrementally, you can take help from http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/
  • Update localhost in the output section for Elasticsearch with its IP if it is moving to a different machine.
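A sketch of the changed sections of logstash-app1.conf, reusing the hypothetical IPs above (Kafka on 192.168.1.10, Elasticsearch on 192.168.1.20) and the same hypothetical topic name as in the Filebeat output:

input {
  kafka {
    bootstrap_servers => "192.168.1.10:9092"   # Kafka machine IP instead of localhost
    topics => ["app1-topic"]                   # hypothetical topic name
  }
}
output {
  elasticsearch {
    hosts => ["192.168.1.20:9200"]             # Elasticsearch machine IP instead of localhost
  }
}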

Kibana:

  • Update localhost in the kibana.yml file for the elasticsearch.url property with the IP if Kibana is on a different machine, as below.
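For example, in kibana.yml (again with the hypothetical Elasticsearch IP):

elasticsearch.url: "http://192.168.1.20:9200"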

Conclusion :

This tutorial covered the below points:

  • Installation of Filebeat, Kafka, Logstash, Elasticsearch and Kibana.
  • Filebeat configured to ship logs to the Kafka message broker.
  • Logstash configured to read log lines from the Kafka topic, parse them and ship them to Elasticsearch.
  • Kibana showing this Elasticsearch data in the form of charts and dashboards to users for analysis.

Read More

To read more on Filebeat, Kafka and Elasticsearch configurations, follow the links; for Logstash Configuration, Input Plugins, Filter Plugins, Output Plugins, Logstash Customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/products

Posted in Elasticsearch, ELK, Example, Filebeat, JSON, Kafka, Kibana, Logstash, Zookeeper | Tagged , , , , , , , ,

Logstash , JDBC Input configuration tutorial with sql_last_value and tracking_column as numeric or timestamp


The Logstash JDBC Input plug-in works like an adapter: it sends your database records to Elasticsearch so they can be utilized for full-text search, querying and analysis, and shown in the form of charts and dashboards in Kibana.

In the below example I will explain how to create a Logstash configuration file using the JDBC Input plug-in for an Oracle database, with output to Elasticsearch.

Logstash JDBC Input configuration for Elasticsearch Output

Pre-requisite:

  • Logstash and Elasticsearch installed and running.
  • The JDBC driver jar for your database downloaded (here the Oracle driver ojdbc6.jar).

Sample Data:

The below sample data is from the defect_detail table, where defect_id is a numeric value that increments continuously in ascending order.

defect_id  owned_by  severity    status            summary                      application  created_by  creation_date    modified_by  modified_date    assigned_to
530812     Ramesh    Severity 3  Cancelled         Customer call 5 time         TEST-APP     Saurabh     7/3/2017 15:44   Gaurav       8/19/2017 6:22   Development
530828     Neha      Severity 1  Cancelled         Dealer Code Buyer on behalf  TEST-APP-5   Rajan       7/3/2017 16:20   Nilam        8/17/2017 9:29   Development
540829     Ramesh    Severity 1  Retest Completed  Client Not want Bulk call    TEST-APP-4   Rajiv       7/24/2017 11:29  Raghav       8/5/2017 20:00   IST

Configuration File :

The below configuration file is set up to read data from the Oracle database, execute the query every 15 minutes, and read only records after the last run value of defect_id. We should always order by the column whose last run value is used; here that is defect_id, a numeric column.

If you are using any other database like MySQL, SQL Server, DB2 etc., change jdbc_driver_library and jdbc_connection_string according to that database. Every database also has its own query format, so update the query accordingly; see the sketch below.
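For example, for MySQL these properties would look roughly like the below sketch (the jar file name, hostname and schema are placeholders, not values from this tutorial):

jdbc_driver_library => "../jar/mysql-connector-java-5.1.42.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://hostname:3306/your_schema"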

Copy the below content and create the file in the bin directory as bin/logstash-jdbc-defect.conf

input {
  jdbc {
    # Path to the downloaded JDBC driver jar, added to the classpath
    jdbc_driver_library => "../jar/ojdbc6.jar"
    # Oracle driver class
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # Oracle JDBC connection string: jdbc:oracle:thin:@hostname:PORT/SERVICE
    jdbc_connection_string => "jdbc:oracle:thin:@hostname:1521/service"
    # The user and password to connect to the database
    jdbc_user => "username"
    jdbc_password => "password"
    # Use when the password needs to be read from a file
    #jdbc_password_filepath => "/opt/app/password-path-location"
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
    # Cron expression for how frequently to execute the query (every 15 minutes here)
    schedule => "*/15 * * * *"
    # Use below if the query is big and you want to store it in a separate file
    #statement_filepath => "../query/remedy-tickets-details.sql"
    # Inline query; on each run only records after the last run value are read,
    # compared against :sql_last_value, which can be numeric or a timestamp
    statement => "select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id > :sql_last_value order by defect_id"
    # Below is the configuration for using the last run value
    # (keep clean_run false so sql_last_value persists between runs; true would reset it)
    clean_run => false
    use_column_value => true
    tracking_column => "defect_id"
    # Logstash by default treats sql_last_value as numeric; if it is a timestamp, configure that explicitly
    #tracking_column_type => "timestamp"
    record_last_run => true
    # This file keeps a record of sql_last_value so the next run can pick up from it
    last_run_metadata_path => "logstash_jdbc_last_run_t_data.txt"
    # Type tag for the data from this database
    type => "t-data"
    # Configure the timezone according to the database location
    #jdbc_default_timezone => "UTC"
  }
}
filter {
  # Convert creation_date to string so the date filter can parse it
  mutate {
    convert => [ "creation_date", "string" ]
  }
  # creation_date is in the format "MM/dd/yyyy HH:mm" and in the America/New_York
  # timezone, so when stored in Elasticsearch in UTC it will be adjusted accordingly
  date {
    match => [ "creation_date", "MM/dd/yyyy HH:mm" ]
    timezone => "America/New_York"
  }
}
output {
  # Output to Elasticsearch
  elasticsearch {
    index => "defect-data-%{+YYYY.MM}"
    hosts => ["elasticsearch-server:9200"]
    document_type => "t-type"
    # Use document_id if you want to prevent duplicate records in Elasticsearch
    document_id => "%{defect_id}"
  }
  # Output to console
  stdout { codec => rubydebug }
}
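If the tracking column is a timestamp rather than a numeric id (for example a modified_date column like the one in the sample table), the last-run related properties would change roughly as in this sketch; the query and column name here are illustrative, not part of the configuration above:

statement => "select * from defect_detail where modified_date > :sql_last_value order by modified_date"
use_column_value => true
tracking_column => "modified_date"
tracking_column_type => "timestamp"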

I have tried to give descriptive information in the comments corresponding to each property in the configuration file. If you need more depth or more information, just drop a comment or send an email and we will discuss it in detail.
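As a side note on last_run_metadata_path: the plugin persists the last value as serialized YAML, so after a run over the sample data the file should contain something like the below (the value assumes the run reached defect_id 540829; the exact formatting may vary by plugin version):

--- 540829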

Date Filter : This filter maps CREATION_DATE to the @timestamp value of each indexed document, and tells Logstash that CREATION_DATE has the pattern "MM/dd/yyyy HH:mm" so the conversion to a timestamp follows the same format. For example, creation_date 7/3/2017 15:44 in America/New_York (UTC-4 in July) becomes @timestamp 2017-07-03T19:44:00.000Z, as in the output below.

Execution :

 [logstash-installation-dir]/bin/logstash -f logstash-jdbc-defect.conf
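To start Logstash in the background for this configuration file, the same screen approach used earlier for Kibana works here as well:

screen -d -m [logstash-installation-dir]/bin/logstash -f logstash-jdbc-defect.conf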

To learn about validating the configuration and other options for starting Logstash, follow the link Logstash Installation, Configuration and Start.

Logstash Console Output

If you noticed, by using the Date filter the index @timestamp value is generated based on the value of CREATION_DATE, and the Elasticsearch output's index name defect-data-%{+YYYY.MM} will create an index for every month based on the @timestamp value, e.g. defect-data-2017.07 for the sample data. If the data changes in your database and the defect id increases, you will see new defects on your console every 15 minutes, as set up in the configuration file.

Result :

select defect_id,owned_by,severity,status,summary,application,created_by,creation_date,modified_by,modified_date,assigned_to from defect_detail where defect_id > :sql_last_value order by defect_id
{
               "severity" => "Severity 3",
                "summary" => "Customer call 5 time but no response",
               "owned_by" => "Ramesh",
          "creation_date" => "7/3/2017 15:44",
          "modified_date" => "8/19/2017 6:22",
                   "type" => "t-data",
             "created_by" => "Saurabh",
             "@timestamp" => 2017-07-03T19:44:00.000Z,
            "modified_by" => "Gaurav",
               "@version" => "1",
              "defect_id" => 530812,
            "application" => "TEST-APP",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Dealer Code Buyer on behalf",
               "owned_by" => "Neha",
          "creation_date" => "7/3/2017 16:20",
          "modified_date" => "8/17/2017 9:29",
                   "type" => "t-data",
             "created_by" => "Rajan",
             "@timestamp" => 2017-07-03T20:20:00.000Z,
            "modified_by" => "Nilam",
               "@version" => "1",
              "defect_id" => 530828,
            "application" => "TEST-APP-5",
                 "status" => "Cancelled",
            "assigned_to" => "Development"
}
{
               "severity" => "Severity 1",
                "summary" => "Client Not want  Bulk call",
               "owned_by" => "Ramesh",
          "creation_date" => "7/24/2017 11:29",
          "modified_date" => "8/5/2017 20:00",
                   "type" => "t-data",
             "created_by" => "Rajiv",
             "@timestamp" => 2017-07-24T15:29:00.000Z,
            "modified_by" => "Raghav",
               "@version" => "1",
              "defect_id" => 540829,
            "application" => "TEST-APP-4",
                 "status" => "Retest Completed",
            "assigned_to" => "IST - Integrated System Test"
}

Summary

The above detail covered the below points:

  • Logstash JDBC Input from an Oracle database.
  • JDBC Input changes for sql_last_value as numeric or timestamp.
  • Reading the password and a multi-line query from separate files.
  • Date Filter to derive the index @timestamp value from a field and pattern.
  • Dynamic index name for each month by appending a date format.
  • Preventing duplicate record insertion in Elasticsearch.
  • Starting Logstash in the background for a configuration file.
  • Sending Logstash output to Elasticsearch and the console.

Read More

To read more on Logstash Configuration, Input Plugins, Filter Plugins, Output Plugins, Logstash Customization and related issues, follow Logstash Tutorial and Logstash Issues.

Hope this blog was helpful for you.

Leave your feedback to enhance this topic further and make it more helpful for others.

Reference  :

 https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

Posted in Centralize logging, Date, Elasticsearch, ELK, Logstash | Tagged , , , , , , , , , ,