
[Solved] org.apache.tika.mime.MimeTypeException


MimeTypeException is a subclass of TikaException. It occurs when the selected parser does not match the document's MIME type, or when the MIME type is not supported by Tika. It is also thrown when a media type name is not valid, for example by MimeTypes.forName(String).

public class MimeTypeException extends TikaException

Constructors

  • MimeTypeException(String message)
    Constructs a MimeTypeException with the specified detail message.
  • MimeTypeException(String message, Throwable cause)
    Constructs a MimeTypeException with the specified detail message and root cause.
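
As a rough illustration of where this exception shows up in practice, the sketch below looks up media type names in Tika's default registry and catches MimeTypeException for a name that is not a syntactically valid media type. It assumes tika-core is on the classpath; the class name MimeTypeLookupSketch and the helper describe are made up for this example.

```java
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;

public class MimeTypeLookupSketch {

    // Hypothetical helper: resolves a media type name and reports the result as text.
    public static String describe(String typeName) {
        MimeTypes registry = MimeTypes.getDefaultMimeTypes();
        try {
            // forName throws MimeTypeException when the name is not a valid media type.
            MimeType type = registry.forName(typeName);
            return type.getName() + " (preferred extension: " + type.getExtension() + ")";
        } catch (MimeTypeException e) {
            return "invalid media type name: " + typeName;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("application/pdf"));
        System.out.println(describe("not a media type"));
    }
}
```

Catching the exception close to the lookup keeps a single malformed type name from aborting the processing of a whole batch of documents.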

References

https://tika.apache.org/1.22/api/org/apache/tika/mime/MimeTypeException.html

TIKA Supported Document Formats


Tika supports the document formats listed below. For each format, the table gives the parser class and the MIME types that parser handles.

Format | Parser | MIME Type
HyperText Markup Language | HtmlParser | text/html, application/vnd.wap.xhtml+xml, application/x-asp, application/xhtml+xml
XML and derived formats | DcXMLParser | application/xml, image/svg+xml
Microsoft Office document formats | OfficeParser | application/x-tika-msoffice-embedded; format=ole10_native, application/msword, application/vnd.visio, application/vnd.ms-project, application/x-tika-msworks-spreadsheet, application/x-mspublisher, application/vnd.ms-powerpoint, application/x-tika-msoffice, application/sldworks, application/x-tika-ooxml-protected, application/vnd.ms-excel, application/vnd.ms-outlook
Microsoft Office document formats | OOXMLParser | application/vnd.ms-powerpoint.template.macroenabled.12, application/vnd.ms-excel.addin.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.template, application/vnd.ms-excel.sheet.binary.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/vnd.ms-powerpoint.slide.macroenabled.12, application/vnd.ms-visio.drawing, application/vnd.ms-powerpoint.slideshow.macroenabled.12, application/vnd.ms-powerpoint.presentation.macroenabled.12, application/vnd.openxmlformats-officedocument.presentationml.slide, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.ms-word.template.macroenabled.12, application/vnd.ms-word.document.macroenabled.12, application/vnd.ms-powerpoint.addin.macroenabled.12, application/vnd.openxmlformats-officedocument.spreadsheetml.template, application/vnd.ms-xpsdocument, application/vnd.ms-visio.drawing.macroenabled.12, application/vnd.ms-visio.template.macroenabled.12, model/vnd.dwfx+xps, application/vnd.openxmlformats-officedocument.presentationml.template, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-visio.stencil, application/vnd.ms-visio.template, application/vnd.openxmlformats-officedocument.presentationml.slideshow, application/vnd.ms-visio.stencil.macroenabled.12, application/vnd.ms-excel.template.macroenabled.12
Microsoft Office document formats | OldExcelParser | application/vnd.ms-excel.workspace.3, application/vnd.ms-excel.workspace.4, application/vnd.ms-excel.sheet.2, application/vnd.ms-excel.sheet.3, application/vnd.ms-excel.sheet.4
Microsoft Office document formats | SpreadsheetMLParser |
Microsoft Office document formats | WordMLParser | application/vnd.ms-wordml
Microsoft Office document formats | Word2006MLParser | application/vnd.ms-word2006ml
Microsoft Office document formats | MSOwnerFileParser | application/x-ms-owner
OpenDocument Format | OpenDocumentParser | application/x-vnd.oasis.opendocument.presentation, application/vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.text-web, application/x-vnd.oasis.opendocument.image, application/vnd.oasis.opendocument.graphics-template, application/vnd.oasis.opendocument.text-web, application/x-vnd.oasis.opendocument.spreadsheet-template, application/vnd.oasis.opendocument.spreadsheet-template, application/vnd.sun.xml.writer, application/x-vnd.oasis.opendocument.graphics-template, application/vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.spreadsheet, application/x-vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.image, application/x-vnd.oasis.opendocument.text, application/x-vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.text-master, application/vnd.oasis.opendocument.presentation, application/x-vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.text-master
iWorks document formats | IWorkPackageParser | application/vnd.apple.keynote, application/vnd.apple.iwork, application/vnd.apple.numbers, application/vnd.apple.pages
WordPerfect document formats | WordPerfectParser | application/vnd.wordperfect; version=5.1, application/vnd.wordperfect; version=5.0, application/vnd.wordperfect; version=6.x
WordPerfect document formats | QuattroProParser | application/x-quattro-pro; version=9
Portable Document Format | PDFParser | application/pdf
Electronic Publication Format | EpubParser | application/x-ibooks+zip, application/epub+zip
Electronic Publication Format | FictionBookParser | application/x-fictionbook+xml
Rich Text Format | RTFParser | application/rtf
Compression and packaging formats | CompressorParser | application/zlib, application/x-gzip, application/x-bzip2, application/x-compress, application/x-java-pack200, application/x-lzma, application/deflate64, application/x-lz4, application/x-snappy, application/x-brotli, application/gzip, application/x-bzip, application/x-xz
Compression and packaging formats | PackageParser | application/x-tar, application/java-archive, application/x-arj, application/x-archive, application/zip, application/x-cpio, application/x-tika-unix-dump, application/x-7z-compressed
Compression and packaging formats | RarParser | application/x-rar-compressed
Compression and packaging formats | AppleSingleFileParser | application/applefile
Text formats | TXTParser |
Feed and Syndication formats | FeedParser | application/atom+xml, application/rss+xml
Feed and Syndication formats | IptcAnpaParser | text/vnd.iptc.anpa
Help formats | ChmParser | application/vnd.ms-htmlhelp, application/x-chm, application/chm
Audio formats | AudioParser | audio/vnd.wave, audio/x-wav, audio/basic, audio/x-aiff
Audio formats | MidiParser | application/x-midi, audio/midi
Audio formats | Mp3Parser | audio/mpeg
Audio formats | Mp4Parser | video/x-m4v, application/mp4, video/3gpp, video/3gpp2, video/quicktime, audio/mp4, video/mp4
Audio formats | VorbisParser | audio/vorbis
Audio formats | OpusParser | audio/opus, audio/ogg; codecs=opus
Audio formats | SpeexParser | audio/ogg; codecs=speex, audio/speex
Audio formats | FlacParser (org.gagravarr.tika.FlacParser) | audio/x-oggflac, audio/x-flac
Image formats | ImageParser | image/png, image/vnd.wap.wbmp, image/x-jbig2, image/bmp, image/x-xcf, image/gif, image/x-icon, image/x-ms-bmp
Image formats | JpegParser | image/jpeg
Image formats | TiffParser | image/tiff
Image formats | PSDParser | image/vnd.adobe.photoshop
Image formats | BPGParser | image/bpg, image/x-bpg
Image formats | WebPParser | image/webp
Image formats | ICNSParser | image/icns
Image formats | TesseractOCRParser |
Image formats | WMFParser | image/wmf
Image formats | EMFParser | image/emf
Video formats | FLVParser | video/x-flv
Video formats | Mp4Parser | video/x-m4v, application/mp4, video/3gpp, video/3gpp2, video/quicktime, audio/mp4, video/mp4
Video formats | OggParser | audio/ogg, application/kate, application/ogg, video/daala, video/x-ogguvs, video/x-ogm, audio/x-oggpcm, video/ogg, video/x-dirac, video/x-oggrgb, video/x-oggyuv
Video formats | TheoraParser | video/theora
Video formats | PooledTimeSeriesParser |
Java class files and archives | ClassParser | application/java-vm
Source code | SourceCodeParser | text/x-c++src, text/x-groovy, text/x-java-source
Mail formats | MboxParser | application/mbox
Mail formats | RFC822Parser | message/rfc822
Mail formats | OutlookPSTParser | application/vnd.ms-outlook-pst
Mail formats | OfficeParser | application/x-tika-msoffice-embedded; format=ole10_native, application/msword, application/vnd.visio, application/vnd.ms-project, application/x-tika-msworks-spreadsheet, application/x-mspublisher, application/vnd.ms-powerpoint, application/x-tika-msoffice, application/sldworks, application/x-tika-ooxml-protected, application/vnd.ms-excel, application/vnd.ms-outlook
Mail formats | TNEFParser | application/vnd.ms-tnef, application/x-tnef, application/ms-tnef
CAD formats | DWGParser | image/vnd.dwg
Font formats | TrueTypeParser | application/x-font-ttf
Font formats | AdobeFontMetricParser | application/x-font-adobe-metric
Scientific formats | DIFParser | application/dif+xml
Scientific formats | GDALParser | application/x-gsc, image/x-ozi, application/x-pds, image/eir, application/x-usgs-dem, application/aaigrid, application/x-bag, application/elas, application/x-rs2, application/x-tsx, application/x-lcp, image/geotiff, application/x-mbtiles, application/x-cappi, application/x-netcdf, application/x-gsag, application/x-epsilon, application/x-ace2, application/jaxa-pal-sar, image/x-pcraster, application/x-msgn, image/arg, application/x-hdf, image/x-mff, application/x-kro, image/x-hdf5-image, image/x-dimap, image/x-srp, image/big-gif, application/x-envi, application/x-cosar, application/x-ntv2, image/bmp, application/x-doq2, application/x-bt, application/x-kml, application/x-gmt, application/x-rst, application/vrt, application/pcisdk, application/x-ctg, application/x-e00-grid, application/x-rik, image/ida, image/x-mff2, application/sdts-raster, application/x-snodas, image/jp2, image/sar-ceos, application/terragen, application/x-wcs, application/leveller, application/x-ingr, application/x-gtx, image/sgi, application/x-pnm, image/raster, application/fits, application/x-r, image/gif, application/x-envi-hdr, application/x-http, application/x-rmf, application/x-ecrg-toc, application/aig, application/x-rpf-toc, image/adrg, application/x-srtmhgt, application/x-generic-bin, application/jdem, image/x-airsar, application/x-webp, application/x-ngs-geoid, application/x-pcidsk, image/x-fujibas, application/x-wms, application/x-map, image/ceos, application/xpm, application/x-zmap, image/envisat, application/x-ers, application/x-doq1, application/x-isis2, application/x-nwt-grd, application/x-ppi, image/ilwis, application/x-isis3, application/x-nwt-grc, application/x-blx, application/gff, application/x-ndf, image/jpeg, application/x-geo-pdf, application/x-l1b, image/fit, application/x-gsbg, application/x-sdat, application/x-ctable2, application/x-grib, application/x-coasp, application/x-dipex, application/grass-ascii-grid, image/fits, application/x-til, application/x-dods, image/png, application/x-gxf, application/x-gs7bg, application/x-cpg, application/x-lan, application/x-xyz, image/bsb, application/x-p-aux, application/dted, application/x-rasterlite, image/nitf, image/hfa, application/x-fast, application/x-los-las
Scientific formats | GeographicInformationParser | text/iso19139+xml
Scientific formats | GeoParser | application/geotopic
Scientific formats | GribParser | application/x-grib2
Scientific formats | HDFParser | application/x-hdf
Scientific formats | ISArchiveParser | application/x-isatab
Scientific formats | NetCDFParser | application/x-netcdf
Scientific formats | MatParser | application/x-matlab-data
Executable programs and libraries | ExecutableParser | application/x-msdownload, application/x-sharedlib, application/x-elf, application/x-object, application/x-executable, application/x-coredump
Crypto formats | Pkcs7Parser | application/pkcs7-signature, application/pkcs7-mime
Crypto formats | TSDParser |
Database formats | SQLite3Parser |
Database formats | JackcessParser | application/x-msaccess
Database formats | DBFParser | application/x-dbf
Natural Language Processing | SentimentParser |
Natural Language Processing | JournalParser |
Image and Video object recognition | Tika recognition package |
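
One way to read the table is as a lookup from MIME type to parser name. The sketch below shows that idea with a handful of rows copied from the table, using only the Java standard library. ParserLookup and parserFor are hypothetical names for this illustration and are not part of Tika, which resolves parsers internally.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ParserLookup {

    // A few rows copied from the table above; the full table is much longer.
    private static final Map<String, String> PARSER_BY_MIME = new LinkedHashMap<>();
    static {
        PARSER_BY_MIME.put("text/html", "HtmlParser");
        PARSER_BY_MIME.put("application/pdf", "PDFParser");
        PARSER_BY_MIME.put("application/rtf", "RTFParser");
        PARSER_BY_MIME.put("application/epub+zip", "EpubParser");
        PARSER_BY_MIME.put("audio/mpeg", "Mp3Parser");
    }

    // Returns the parser name listed for the base MIME type, ignoring
    // parameters such as "; charset=UTF-8" and letter case.
    public static Optional<String> parserFor(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(PARSER_BY_MIME.get(base));
    }

    public static void main(String[] args) {
        System.out.println(parserFor("application/pdf").orElse("no parser listed"));
        System.out.println(parserFor("text/html; charset=UTF-8").orElse("no parser listed"));
        System.out.println(parserFor("application/x-unknown").orElse("no parser listed"));
    }
}
```

A type missing from the map corresponds to a MIME type with no row in the table, which is the situation the MimeTypeException post above describes as "not supported".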

References

https://tika.apache.org/1.22/formats.html

TIKA Reference API


Java programmers can integrate the Tika library into their applications by using the Tika facade class and the other classes described below.

Tika Class

The Tika facade class hides the complexity of the underlying API and provides simple methods for using Tika's functionality.

package: org.apache.tika

Constructors

The Tika class has the following constructors:

Constructor | Description
Tika() | Creates a Tika facade using the default configuration.
Tika(Detector detector) | Creates a Tika facade using the given detector instance.
Tika(Detector detector, Parser parser) | Creates a Tika facade using the given detector and parser instances.
Tika(Detector detector, Parser parser, Translator translator) | Creates a Tika facade using the given detector, parser, and translator instances.
Tika(TikaConfig config) | Creates a Tika facade using the given TikaConfig object.

Methods and Description

The following are the important methods of the Tika facade class:

Method | Description
String parseToString(File file) | Parses the given file and returns the extracted text content as a String. By default, the length of the returned string is limited to 100,000 characters.
int getMaxStringLength() | Returns the maximum length of strings returned by the parseToString methods.
void setMaxStringLength(int maxStringLength) | Sets the maximum length of strings returned by the parseToString methods.
Reader parse(File file) | Parses the given file and returns the extracted text content as a java.io.Reader object.
String detect(InputStream stream, Metadata metadata) | Accepts an InputStream and a Metadata object and returns the detected media type name of the document.
String translate(InputStream text, String targetLanguage) | Accepts an InputStream and a String naming the target language. Returns the text translated into that language, attempting to auto-detect the source language.
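
To make the facade methods above more concrete, here is a minimal sketch that detects a type from a file name, lowers the string limit, and extracts text from a temporary file. It assumes tika-core is on the classpath, and text extraction additionally needs parser implementations (for example the tika-parsers module), otherwise parseToString may return an empty string. The class name TikaFacadeSketch and the helper detectByName are invented for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaFacadeSketch {

    // Hypothetical helper: detects a media type from the file name alone.
    public static String detectByName(String fileName) {
        return new Tika().detect(fileName);
    }

    public static void main(String[] args) throws IOException, TikaException {
        Tika tika = new Tika();

        // Cap the amount of text returned by parseToString.
        tika.setMaxStringLength(10_000);

        // Write a small sample file so the example does not depend on existing documents.
        File sample = File.createTempFile("tika-sample", ".txt");
        sample.deleteOnExit();
        Files.write(sample.toPath(), "Hello from Tika".getBytes(StandardCharsets.UTF_8));

        System.out.println("Type by name:    " + detectByName("report.pdf"));
        System.out.println("Type by content: " + tika.detect(sample));
        System.out.println("Extracted text:  " + tika.parseToString(sample).trim());
    }
}
```

Detection by name only inspects the extension, so it is cheap but easy to fool; passing the file or stream lets Tika look at the content as well.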

Parser Interface

This interface is implemented by all the parser classes of the Tika package.

package: org.apache.tika.parser

Methods

This is the most important method of the Tika Parser interface:

Method | Description
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Parses the given document input stream into a sequence of XHTML SAX events. After parsing, the document metadata is available in the Metadata object and the extracted document content has been passed to the ContentHandler.
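
The parse call described above can be sketched roughly as follows, using AutoDetectParser as the Parser implementation and BodyContentHandler to collect the text. This assumes tika-core is on the classpath; actual text is only extracted when a parser for the detected type is available (for example from tika-parsers). ParserInterfaceSketch and its parse helper are hypothetical names, and the helper wraps checked exceptions only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ParserInterfaceSketch {

    // Hypothetical helper: parses the bytes, appends extracted text to textOut,
    // and returns the metadata filled in during parsing.
    public static Metadata parse(byte[] data, StringBuilder textOut) {
        Parser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        } catch (IOException | SAXException | TikaException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        textOut.append(handler.toString());
        return metadata;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        byte[] data = "Plain text sample".getBytes(StandardCharsets.UTF_8);
        Metadata metadata = parse(data, text);
        System.out.println("Content-Type: " + metadata.get("Content-Type"));
        System.out.println("Text: " + text.toString().trim());
    }
}
```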

Metadata Class

The Metadata class implements various interfaces, such as CreativeCommons, Geographic, HttpHeaders, Message, MSOffice, ClimateForcast, TIFF, TikaMetadataKeys, TikaMimeKeys, and Serializable, to support various data models.

package: org.apache.tika.metadata

Constructors

Constructor | Description
Metadata() | Constructs new, empty metadata.

Methods

Method | Description
void add(Property property, String value) | Adds a new metadata property in the form of a key/value pair.
void add(String name, String value) | Adds a new metadata property in the form of a key/value pair.
String get(Property property) | Returns the property's value (if any).
String get(String name) | Returns the key's value (if any).
Date getDate(Property property) | Returns the value of the given metadata property as a Date.
String[] getValues(Property property) | Returns all the values associated with the given metadata property.
String[] getValues(String name) | Returns all the values of the given metadata key.
String[] names() | Returns the key names of all metadata elements in the Metadata object.
void set(Property property, Date date) | Sets the date value of the given metadata property.
void set(Property property, String[] values) | Sets multiple values for a metadata property.
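
A short sketch of the String-keyed methods from the table above, assuming tika-core is on the classpath. The key names "keyword" and "title" and the class name MetadataSketch are made up for illustration and are not predefined Tika properties.

```java
import org.apache.tika.metadata.Metadata;

public class MetadataSketch {

    // Builds a small Metadata object with one multi-valued and one single-valued key.
    public static Metadata sample() {
        Metadata metadata = new Metadata();
        metadata.add("keyword", "tika");      // first value for the key
        metadata.add("keyword", "parsing");   // add appends a second value
        metadata.set("title", "Sample document");
        return metadata;
    }

    public static void main(String[] args) {
        Metadata metadata = sample();
        for (String name : metadata.names()) {
            // getValues returns every value stored under the key.
            System.out.println(name + " = " + String.join(", ", metadata.getValues(name)));
        }
        // get returns only the first value, or null when the key is absent.
        System.out.println("first keyword: " + metadata.get("keyword"));
        System.out.println("missing key:   " + metadata.get("author"));
    }
}
```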

LanguageIdentifier Class

This class is used to identify the language of the given content.

package: org.apache.tika.language

Constructors

Constructor | Description
LanguageIdentifier(LanguageProfile profile) | Creates a language identifier for the given LanguageProfile.
LanguageIdentifier(String content) | Creates a language identifier for the given text content.

Methods

Method | Description
String getLanguage() | Returns the language identified for the content of the current LanguageIdentifier object.