Apache Tika supports the document formats listed below. For each format, the table gives the parser that handles it and the MIME types that parser accepts.
| Format | Parser | MIME types |
|---|---|---|
| HyperText Markup Language | HtmlParser | text/html, application/vnd.wap.xhtml+xml, application/x-asp, application/xhtml+xml |
| XML and derived formats | DcXMLParser | application/xml, image/svg+xml |
| Microsoft Office document formats | OfficeParser | |
| | OOXMLParser | application/vnd.ms-powerpoint.template.macroenabled.12, application/vnd.ms-excel.addin.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.template, application/vnd.ms-excel.sheet.binary.macroenabled.12, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/vnd.ms-powerpoint.slide.macroenabled.12, application/vnd.ms-visio.drawing, application/vnd.ms-powerpoint.slideshow.macroenabled.12, application/vnd.ms-powerpoint.presentation.macroenabled.12, application/vnd.openxmlformats-officedocument.presentationml.slide, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.ms-word.template.macroenabled.12, application/vnd.ms-word.document.macroenabled.12, application/vnd.ms-powerpoint.addin.macroenabled.12, application/vnd.openxmlformats-officedocument.spreadsheetml.template, application/vnd.ms-xpsdocument, application/vnd.ms-visio.drawing.macroenabled.12, application/vnd.ms-visio.template.macroenabled.12, model/vnd.dwfx+xps, application/vnd.openxmlformats-officedocument.presentationml.template, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-visio.stencil, application/vnd.ms-visio.template, application/vnd.openxmlformats-officedocument.presentationml.slideshow, application/vnd.ms-visio.stencil.macroenabled.12, application/vnd.ms-excel.template.macroenabled.12 |
| | OldExcelParser | application/vnd.ms-excel.workspace.3, application/vnd.ms-excel.workspace.4, application/vnd.ms-excel.sheet.2, application/vnd.ms-excel.sheet.3, application/vnd.ms-excel.sheet.4 |
| | SpreadsheetMLParser | |
| | WordMLParser | application/vnd.ms-wordml |
| | Word2006MLParser | application/vnd.ms-word2006ml |
| | MSOwnerFileParser | application/x-ms-owner |
| OpenDocument Format | OpenDocumentParser | application/x-vnd.oasis.opendocument.presentation, application/vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.text-web, application/x-vnd.oasis.opendocument.image, application/vnd.oasis.opendocument.graphics-template, application/vnd.oasis.opendocument.text-web, application/x-vnd.oasis.opendocument.spreadsheet-template, application/vnd.oasis.opendocument.spreadsheet-template, application/vnd.sun.xml.writer, application/x-vnd.oasis.opendocument.graphics-template, application/vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.spreadsheet, application/x-vnd.oasis.opendocument.chart, application/x-vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.image, application/x-vnd.oasis.opendocument.text, application/x-vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.image-template, application/x-vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.presentation-template, application/vnd.oasis.opendocument.text, application/vnd.oasis.opendocument.text-template, application/vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.chart-template, application/x-vnd.oasis.opendocument.formula-template, application/x-vnd.oasis.opendocument.text-master, application/vnd.oasis.opendocument.presentation, application/x-vnd.oasis.opendocument.graphics, application/vnd.oasis.opendocument.formula, application/vnd.oasis.opendocument.text-master |
| iWork document formats | IWorkPackageParser | application/vnd.apple.keynote, application/vnd.apple.iwork, application/vnd.apple.numbers, application/vnd.apple.pages |
| WordPerfect document formats | WordPerfectParser | application/vnd.wordperfect; version=5.1, application/vnd.wordperfect; version=5.0, application/vnd.wordperfect; version=6.x |
| | QuattroProParser | application/x-quattro-pro; version=9 |
| Portable Document Format | PDFParser | application/pdf |
| Electronic Publication Format | EpubParser | application/x-ibooks+zip, application/epub+zip |
| | FictionBookParser | application/x-fictionbook+xml |
| Rich Text Format | RTFParser | application/rtf |
| Compression and packaging formats | CompressorParser | application/zlib, application/x-gzip, application/x-bzip2, application/x-compress, application/x-java-pack200, application/x-lzma, application/deflate64, application/x-lz4, application/x-snappy, application/x-brotli, application/gzip, application/x-bzip, application/x-xz |
| | PackageParser | application/x-tar, application/java-archive, application/x-arj, application/x-archive, application/zip, application/x-cpio, application/x-tika-unix-dump, application/x-7z-compressed |
| | RarParser | application/x-rar-compressed |
| | AppleSingleFileParser | application/applefile |
| Text formats | TXTParser | |
| Feed and Syndication formats | FeedParser | application/atom+xml, application/rss+xml |
| | IptcAnpaParser | text/vnd.iptc.anpa |
| Help formats | ChmParser | application/vnd.ms-htmlhelp, application/x-chm, application/chm |
| Audio formats | AudioParser | audio/vnd.wave, audio/x-wav, audio/basic, audio/x-aiff |
| | MidiParser | application/x-midi, audio/midi |
| | Mp3Parser | audio/mpeg |
| | Mp4Parser | video/x-m4v, application/mp4, video/3gpp, video/3gpp2, video/quicktime, audio/mp4, video/mp4 |
| | VorbisParser | audio/vorbis |
| | OpusParser | audio/opus, audio/ogg; codecs=opus |
| | SpeexParser | audio/ogg; codecs=speex, audio/speex |
| | FlacParser | audio/x-oggflac, audio/x-flac |
| Image formats | ImageParser | image/png, image/vnd.wap.wbmp, image/x-jbig2, image/bmp, image/x-xcf, image/gif, image/x-icon, image/x-ms-bmp |
| | JpegParser | image/jpeg |
| | TiffParser | image/tiff |
| | PSDParser | image/vnd.adobe.photoshop |
| | BPGParser | image/bpg, image/x-bpg |
| | WebPParser | image/webp |
| | ICNSParser | image/icns |
| | TesseractOCRParser | |
| | WMFParser | image/wmf |
| | EMFParser | image/emf |
| Video formats | FLVParser | video/x-flv |
| | Mp4Parser | video/x-m4v, application/mp4, video/3gpp, video/3gpp2, video/quicktime, audio/mp4, video/mp4 |
| | OggParser | audio/ogg, application/kate, application/ogg, video/daala, video/x-ogguvs, video/x-ogm, audio/x-oggpcm, video/ogg, video/x-dirac, video/x-oggrgb, video/x-oggyuv |
| | TheoraParser | video/theora |
| | PooledTimeSeriesParser | |
| Java class files and archives | ClassParser | application/java-vm |
| Source code | SourceCodeParser | text/x-c++src, text/x-groovy, text/x-java-source |
| Mail formats | MboxParser | application/mbox |
| | RFC822Parser | message/rfc822 |
| | OutlookPSTParser | application/vnd.ms-outlook-pst |
| | OfficeParser | application/x-tika-msoffice-embedded; format=ole10_native, application/msword, application/vnd.visio, application/vnd.ms-project, application/x-tika-msworks-spreadsheet, application/x-mspublisher, application/vnd.ms-powerpoint, application/x-tika-msoffice, application/sldworks, application/x-tika-ooxml-protected, application/vnd.ms-excel, application/vnd.ms-outlook |
| | TNEFParser | application/vnd.ms-tnef, application/x-tnef, application/ms-tnef |
| CAD formats | DWGParser | image/vnd.dwg |
| Font formats | TrueTypeParser | application/x-font-ttf |
| | AdobeFontMetricParser | application/x-font-adobe-metric |
| Scientific formats | DIFParser | application/dif+xml |
| | GDALParser | application/x-gsc, image/x-ozi, application/x-pds, image/eir, application/x-usgs-dem, application/aaigrid, application/x-bag, application/elas, application/x-rs2, application/x-tsx, application/x-lcp, image/geotiff, application/x-mbtiles, application/x-cappi, application/x-netcdf, application/x-gsag, application/x-epsilon, application/x-ace2, application/jaxa-pal-sar, image/x-pcraster, application/x-msgn, image/arg, application/x-hdf, image/x-mff, application/x-kro, image/x-hdf5-image, image/x-dimap, image/x-srp, image/big-gif, application/x-envi, application/x-cosar, application/x-ntv2, image/bmp, application/x-doq2, application/x-bt, application/x-kml, application/x-gmt, application/x-rst, application/vrt, application/pcisdk, application/x-ctg, application/x-e00-grid, application/x-rik, image/ida, image/x-mff2, application/sdts-raster, application/x-snodas, image/jp2, image/sar-ceos, application/terragen, application/x-wcs, application/leveller, application/x-ingr, application/x-gtx, image/sgi, application/x-pnm, image/raster, application/fits, application/x-r, image/gif, application/x-envi-hdr, application/x-http, application/x-rmf, application/x-ecrg-toc, application/aig, application/x-rpf-toc, image/adrg, application/x-srtmhgt, application/x-generic-bin, application/jdem, image/x-airsar, application/x-webp, application/x-ngs-geoid, application/x-pcidsk, image/x-fujibas, application/x-wms, application/x-map, image/ceos, application/xpm, application/x-zmap, image/envisat, application/x-ers, application/x-doq1, application/x-isis2, application/x-nwt-grd, application/x-ppi, image/ilwis, application/x-isis3, application/x-nwt-grc, application/x-blx, application/gff, application/x-ndf, image/jpeg, application/x-geo-pdf, application/x-l1b, image/fit, application/x-gsbg, application/x-sdat, application/x-ctable2, application/x-grib, application/x-coasp, application/x-dipex, application/grass-ascii-grid, image/fits, application/x-til, application/x-dods, image/png, application/x-gxf, application/x-gs7bg, application/x-cpg, application/x-lan, application/x-xyz, image/bsb, application/x-p-aux, application/dted, application/x-rasterlite, image/nitf, image/hfa, application/x-fast, application/x-los-las |
| | GeographicInformationParser | text/iso19139+xml |
| | GeoParser | application/geotopic |
| | GribParser | application/x-grib2 |
| | HDFParser | application/x-hdf |
| | ISArchiveParser | application/x-isatab |
| | NetCDFParser | application/x-netcdf |
| | MatParser | application/x-matlab-data |
| Executable programs and libraries | ExecutableParser | application/x-msdownload, application/x-sharedlib, application/x-elf, application/x-object, application/x-executable, application/x-coredump |
| Crypto formats | Pkcs7Parser | application/pkcs7-signature, application/pkcs7-mime |
| | TSDParser | |
| Database formats | SQLite3Parser | |
| | JackcessParser | application/x-msaccess |
| | DBFParser | application/x-dbf |
| Natural Language Processing | SentimentParser | |
| | JournalParser | |
| Image and video object recognition | Tika recognition package | |
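The table is keyed by MIME type, and some rows register a type with parameters: OpusParser claims `audio/ogg; codecs=opus` while OggParser claims plain `audio/ogg`. The sketch below is a simplified, JDK-only model of how such a table can be used to pick a parser: try the exact (normalized) type first, then fall back to the base type with its parameters stripped. It is not Tika's code. In Tika the real dispatch goes through `AutoDetectParser` and its media-type registry, which do more than this. The class `ParserLookupSketch`, its methods, and the small hand-copied registry are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of MIME-type-based parser selection.
// NOT Apache Tika's implementation; it only uses a few rows from the table.
public class ParserLookupSketch {
    // A tiny subset of the table: MIME type -> parser name.
    private static final Map<String, String> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("application/pdf", "PDFParser");
        REGISTRY.put("audio/ogg", "OggParser");
        REGISTRY.put("audio/ogg; codecs=opus", "OpusParser");
        REGISTRY.put("audio/ogg; codecs=speex", "SpeexParser");
        REGISTRY.put("application/vnd.wordperfect; version=5.1", "WordPerfectParser");
    }

    // Lower-cases the whole string, trims each part, and removes spaces around '='
    // so that "Audio/OGG ;codecs = opus" becomes "audio/ogg; codecs=opus".
    static String normalize(String mimeType) {
        String[] parts = mimeType.toLowerCase(Locale.ROOT).split(";");
        StringBuilder sb = new StringBuilder(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            sb.append("; ").append(parts[i].trim().replaceAll("\\s*=\\s*", "="));
        }
        return sb.toString();
    }

    // Exact match first; if there is none, retry with the parameters stripped.
    static Optional<String> findParser(String mimeType) {
        String type = normalize(mimeType);
        String exact = REGISTRY.get(type);
        if (exact != null) {
            return Optional.of(exact);
        }
        int semi = type.indexOf(';');
        if (semi >= 0) {
            return Optional.ofNullable(REGISTRY.get(type.substring(0, semi).trim()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findParser("audio/ogg; codecs=opus").orElse("none"));
        System.out.println(findParser("audio/ogg; codecs=vorbis").orElse("none"));
        System.out.println(findParser("Application/PDF").orElse("none"));
        System.out.println(findParser("image/x-unknown").orElse("none"));
    }
}
```

Running `main` prints `OpusParser`, `OggParser`, `PDFParser` and `none`, one per line: the Opus type matches its own row, the Vorbis-in-Ogg type falls back to the generic Ogg row, and the unknown type finds nothing. Returning `Optional` rather than `null` makes the "no parser registered" case explicit to the caller.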