Java programmers can integrate the Tika library into their applications by using the Tika facade class and the other classes described below.
Tika Class
The Tika facade class hides the complexity of the underlying library and provides simple methods for accessing Tika's functionality.
package: org.apache.tika
Constructors
The following are the constructors of the Tika class:
Constructor | Description |
Tika () | Creates a Tika facade using the default configuration. |
Tika (Detector detector) | Creates the Tika facade class by accepting the detector instance as a parameter. |
Tika (Detector detector, Parser parser) | Creates a Tika facade class by accepting the detector and parser instances as parameters. |
Tika (Detector detector, Parser parser, Translator translator) | Creates the Tika facade class by accepting the detector, the parser, and the translator instance as parameters. |
Tika (TikaConfig config) | Creates a Tika facade class by accepting the object of the TikaConfig class as a parameter. |
Methods and Description
The following are the important methods of the Tika facade class:
Method | Description |
String parseToString (File file) | Parses the given file and returns the extracted text content as a String. By default, the length of the returned string is limited. |
int getMaxStringLength () | Returns the maximum length of the strings returned by the parseToString methods. |
void setMaxStringLength (int maxStringLength) | Sets the maximum length of the strings returned by the parseToString methods. |
Reader parse (File file) | Parses the given file and returns the extracted text content as a java.io.Reader object. |
String detect (InputStream stream, Metadata metadata) | Accepts an InputStream and a Metadata object as parameters and returns the name of the detected document type. |
String translate (InputStream text, String targetLanguage) | Accepts an InputStream containing the text and a String naming the target language. Returns the text translated into the target language, attempting to auto-detect the source language. |
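As a rough illustration of the facade methods listed above, the following sketch detects the type of a small text file and extracts its text with a length cap. It assumes the Tika jars (including the parser modules) are on the classpath; the class name TikaFacadeSketch and the helper sampleFile are hypothetical names introduced only for this example. Without the parser modules, the extracted text may come back empty.

```java
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class TikaFacadeSketch {

    // Returns the extracted text, limited to at most maxLength characters.
    static String extractText(File file, int maxLength) throws IOException, TikaException {
        Tika tika = new Tika();              // default configuration
        tika.setMaxStringLength(maxLength);  // cap the size of the returned String
        return tika.parseToString(file);
    }

    // Returns the detected document type name, such as "text/plain".
    static String detectType(File file) throws IOException {
        Tika tika = new Tika();
        // BufferedInputStream supports mark/reset, which detection relies on.
        try (InputStream stream = new BufferedInputStream(Files.newInputStream(file.toPath()))) {
            return tika.detect(stream, new Metadata());
        }
    }

    // Hypothetical helper: writes sample text to a temporary file for the demo.
    static File sampleFile() throws IOException {
        File file = File.createTempFile("tika-sample", ".txt");
        file.deleteOnExit();
        String text = "Hello from Tika. This is a short sample document.";
        Files.write(file.toPath(), text.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException, TikaException {
        File file = sampleFile();
        System.out.println("Type: " + detectType(file));
        System.out.println("Text: " + extractText(file, 20));
    }
}
```

A single Tika instance can be reused across many files; the sketch creates a new one per call only to keep each helper independent.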
Parser Interface
This interface is implemented by all the parser classes of the Tika package.
package: org.apache.tika.parser
Methods
The following is the important method of the Tika Parser interface:
Methods | Description |
parse (InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Parses the given document input stream into a sequence of XHTML SAX events. After parsing, the document metadata is held in the Metadata object and the extracted document content in the ContentHandler object. |
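The parse method above can be sketched as follows. This example uses AutoDetectParser and BodyContentHandler, two Tika classes that are not covered in this section, as one possible Parser implementation and one possible ContentHandler; the class name ParserSketch and the method parseBody are hypothetical. It assumes the Tika parser modules are on the classpath.

```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ParserSketch {

    // Parses the document bytes; fills the given Metadata and returns the body text.
    static String parseBody(byte[] document, Metadata metadata)
            throws IOException, SAXException, TikaException {
        Parser parser = new AutoDetectParser();                 // picks a parser by detected type
        BodyContentHandler handler = new BodyContentHandler();  // collects the body text
        try (InputStream stream = new ByteArrayInputStream(document)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        byte[] document = "A short plain-text document.".getBytes(StandardCharsets.UTF_8);
        String body = parseBody(document, metadata);
        System.out.println("Content-Type: " + metadata.get("Content-Type"));
        System.out.println("Body: " + body.trim());
    }
}
```

The same Metadata object is passed in and read back afterwards, which mirrors how the parse method reports metadata through its arguments rather than through a return value.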
Metadata Class
The Metadata class implements various interfaces, such as CreativeCommons, Geographic, HttpHeaders, Message, MSOffice, ClimateForcast, TIFF, TikaMetadataKeys, TikaMimeKeys, and Serializable, to support various data models.
package: org.apache.tika.metadata
Constructors
Constructor | Description |
Metadata() | Constructs new, empty metadata. |
Methods
Methods | Description |
add (Property property, String value) | Adds a new metadata entry for the given property in the form of a key/value pair. |
add (String name, String value) | Adds a new metadata entry for the given key name in the form of a key/value pair. |
String get (Property property) | Returns the property’s value (if any). |
String get (String name) | Returns the key’s value (if any). |
Date getDate (Property property) | Returns the value of the given metadata property as a Date. |
String[] getValues (Property property) | Returns all the values of the given metadata property. |
String[] getValues (String name) | Returns all the values of a given metadata key. |
String[] names() | Returns all the key names of metadata elements in a metadata object. |
set (Property property, Date date) | Sets the date value of the given metadata property. |
set (Property property, String[] values) | Sets multiple values for a metadata property. |
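The String-keyed methods from the table above can be tried out with a minimal sketch such as the one below. The key names "author" and "title" and their values are made up for illustration, as are the class name MetadataSketch and the method buildSample.

```java
import org.apache.tika.metadata.Metadata;

import java.util.Arrays;

public class MetadataSketch {

    // Builds a Metadata object with one multi-valued key and one single-valued key.
    static Metadata buildSample() {
        Metadata metadata = new Metadata();
        metadata.add("author", "Alice");  // hypothetical key and value
        metadata.add("author", "Bob");    // a second add() keeps both values
        metadata.add("title", "Sample");
        return metadata;
    }

    public static void main(String[] args) {
        Metadata metadata = buildSample();
        String[] names = metadata.names();
        Arrays.sort(names);               // sort so the printed order is stable
        for (String name : names) {
            System.out.println(name + " = " + Arrays.toString(metadata.getValues(name)));
        }
        // get() returns a single value even when the key has several.
        System.out.println("first author: " + metadata.get("author"));
    }
}
```

Use getValues rather than get whenever a key may carry more than one value, since get only returns one of them.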
LanguageIdentifier Class
This class is used to identify the language of the given content.
package: org.apache.tika.language
Constructors
Constructor | Description |
LanguageIdentifier (LanguageProfile profile) | Instantiates the language identifier for the given LanguageProfile. |
LanguageIdentifier (String content) | Instantiates the language identifier for text content. |
Methods
Methods | Description |
String getLanguage () | Returns the language identified for the content of the current LanguageIdentifier object. |
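A minimal sketch of the String constructor and getLanguage follows. It assumes a Tika 1.x dependency, where LanguageIdentifier is available in org.apache.tika.language; newer releases may no longer ship this class. The class name LanguageSketch and the method identify are hypothetical.

```java
import org.apache.tika.language.LanguageIdentifier;

public class LanguageSketch {

    // Returns the language code that Tika identifies for the given text.
    static String identify(String text) {
        // Assumption: Tika 1.x, where this constructor uses the built-in language profiles.
        LanguageIdentifier identifier = new LanguageIdentifier(text);
        return identifier.getLanguage();
    }

    public static void main(String[] args) {
        String sample = "This is a short English sentence used only for illustration.";
        System.out.println("Language: " + identify(sample));
    }
}
```

Identification tends to be more reliable on longer passages, so very short strings are best treated with some caution.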