Tika Reference API


Java programmers can integrate the Tika library into their applications by using the Tika facade class and the other classes described below.

Tika Class

The Tika facade class hides the underlying complexity and provides simple methods for using Tika's functionality.

package: org.apache.tika

Constructors

The following are the constructors of the Tika class:

Constructor | Description
Tika () | Creates a Tika facade that uses the default configuration.
Tika (Detector detector) | Creates a Tika facade that uses the given detector instance.
Tika (Detector detector, Parser parser) | Creates a Tika facade that uses the given detector and parser instances.
Tika (Detector detector, Parser parser, Translator translator) | Creates a Tika facade that uses the given detector, parser, and translator instances.
Tika (TikaConfig config) | Creates a Tika facade that uses the given TikaConfig object.

Methods and Description

The following are the important methods of the Tika facade class:

Method | Description
String parseToString (File file) | Parses the given file and returns the extracted text content as a String. By default, the length of the returned string is limited.
int getMaxStringLength () | Returns the maximum length of strings returned by the parseToString methods.
void setMaxStringLength (int maxStringLength) | Sets the maximum length of strings returned by the parseToString methods.
Reader parse (File file) | Parses the given file and returns the extracted text content as a java.io.Reader object.
String detect (InputStream stream, Metadata metadata) | Accepts an InputStream and the document's Metadata as parameters and returns the name of the detected media type.
String translate (InputStream text, String targetLanguage) | Accepts an InputStream of text and a String naming the target language. Returns the text translated into the target language, attempting to auto-detect the source language.
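
As a rough illustration of the facade methods listed above, the following sketch detects a media type and extracts length-limited text. It assumes the Tika libraries are on the classpath (tika-core for detection, plus the parser modules for actual text extraction). The class name TikaFacadeSketch and its helper methods are made up for this example, and it uses the InputStream overload of parseToString rather than the File overload shown in the table.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// Hypothetical example class, not part of Tika.
public class TikaFacadeSketch {

    // Detects the media type from the leading bytes of a document.
    static String detectType(byte[] data) throws IOException {
        Tika tika = new Tika(); // default configuration
        try (InputStream stream = new ByteArrayInputStream(data)) {
            return tika.detect(stream, new Metadata());
        }
    }

    // Extracts text, capping the returned string at maxLength characters.
    static String extractText(byte[] data, int maxLength)
            throws IOException, TikaException {
        Tika tika = new Tika();
        tika.setMaxStringLength(maxLength);
        try (InputStream stream = new ByteArrayInputStream(data)) {
            return tika.parseToString(stream);
        }
    }

    public static void main(String[] args) throws Exception {
        // A PDF file starts with the marker "%PDF-".
        byte[] pdfHeader = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectType(pdfHeader));

        byte[] text = "Apache Tika extracts text and metadata."
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(text, 11));
    }
}
```

If only tika-core is available, parseToString is likely to return an empty string because no format-specific parsers are registered; detection should still work.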

Parser Interface

This interface is implemented by all the parser classes in Tika.

package: org.apache.tika.parser

Methods

The following is the important method of the Tika Parser interface:

Method | Description
void parse (InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Parses the given document input stream into a sequence of XHTML SAX events. After parsing, the document metadata is stored in the given Metadata object and the extracted document content has been passed to the given ContentHandler.
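
A minimal sketch of calling the parse method is shown below. The names AutoDetectParser and BodyContentHandler are not mentioned in the text above; they are a parser implementation and a content handler that ship with Tika, chosen here only for illustration. The class ParserSketch and its helper method are made up, and Tika is assumed to be on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class, not part of Tika.
public class ParserSketch {

    // Parses the bytes and returns { detected content type, extracted body text }.
    static String[] parse(byte[] data) throws Exception {
        Parser parser = new AutoDetectParser();            // picks a parser by detected type
        BodyContentHandler handler = new BodyContentHandler(); // collects body text from SAX events
        Metadata metadata = new Metadata();                // filled in during parsing

        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return new String[] { metadata.get("Content-Type"), handler.toString() };
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "Plain text handed to the parser."
                .getBytes(StandardCharsets.UTF_8);
        String[] result = parse(data);
        System.out.println("Content-Type: " + result[0]);
        System.out.println("Body: " + result[1]);
    }
}
```

The body text is only populated when the format-specific parser modules are available; with tika-core alone the handler will probably stay empty while the content type is still detected.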

Metadata Class

The Metadata class implements various interfaces, such as CreativeCommons, Geographic, HttpHeaders, Message, MSOffice, ClimateForcast, TIFF, TikaMetadataKeys, TikaMimeKeys, and Serializable, to support various data models.

package: org.apache.tika.metadata

Constructors

Constructor | Description
Metadata () | Constructs new, empty metadata.

Methods

Method | Description
void add (Property property, String value) | Adds a value to the given metadata property.
void add (String name, String value) | Adds a metadata entry in the form of a key/value pair.
String get (Property property) | Returns the property's value (if any).
String get (String name) | Returns the key's value (if any).
Date getDate (Property property) | Returns the value of the given date-typed metadata property as a Date.
String[] getValues (Property property) | Returns all the values associated with the given metadata property.
String[] getValues (String name) | Returns all the values of the given metadata key.
String[] names () | Returns the key names of all metadata elements in the Metadata object.
void set (Property property, Date date) | Sets the date value of the given metadata property.
void set (Property property, String[] values) | Sets multiple values for a metadata property.
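
The string-keyed methods above can be sketched as follows. The key names "author" and "title" and the class MetadataSketch are arbitrary choices for this example, and tika-core is assumed to be on the classpath.

```java
import java.util.Arrays;

import org.apache.tika.metadata.Metadata;

// Hypothetical example class, not part of Tika.
public class MetadataSketch {

    // Builds a Metadata object with one multi-valued key and one single-valued key.
    static Metadata buildSample() {
        Metadata metadata = new Metadata();
        metadata.add("author", "Alice");
        metadata.add("author", "Bob");   // add() appends a second value to the same key
        metadata.add("title", "Quarterly report");
        return metadata;
    }

    public static void main(String[] args) {
        Metadata metadata = buildSample();

        // names() makes no ordering promise, so sort before printing.
        String[] names = metadata.names();
        Arrays.sort(names);
        for (String name : names) {
            System.out.println(name + " = " + Arrays.toString(metadata.getValues(name)));
        }

        // get() returns the first value of a key, or null if the key is absent.
        System.out.println("first author: " + metadata.get("author"));
        System.out.println("missing key: " + metadata.get("subject"));
    }
}
```

Using getValues rather than get is the safer choice whenever a key may hold more than one value.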

LanguageIdentifier Class

This class is used to identify the language of the given content.

package: org.apache.tika.language

Constructors

Constructor | Description
LanguageIdentifier (LanguageProfile profile) | Creates a language identifier from the given LanguageProfile.
LanguageIdentifier (String content) | Creates a language identifier from the given text content.

Methods

Method | Description
String getLanguage () | Returns the language identified for the content of the current LanguageIdentifier object.
