[Solved] org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes


ZeroByteFileException is a subclass of TikaException. It occurs when AutoDetectParser is used to extract the content of a file that has zero bytes. In this case, the auto-detect parser throws ZeroByteFileException.


public class ZeroByteFileException extends TikaException

Constructors

  • ZeroByteFileException(String msg): This constructor creates the exception with the given message.

ZeroByteFileException Example

Here is an example that parses the content and metadata of a text file by using AutoDetectParser. It throws an exception because the file has no content (zero bytes).

package com.fiot.tika.exceptions.handling;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.txt.TXTParser;

import org.xml.sax.SAXException;

public class TikaTextParserExample {

   public static void main(final String[] args) throws IOException, SAXException, TikaException {

      // handler collects the extracted text, metadata collects the document properties
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      ParseContext pcontext = new ParseContext();

      // auto detect document parser: detects the file type and delegates to a matching parser
      Parser parser = new AutoDetectParser();

      // try-with-resources closes the stream even when parse() throws
      try (FileInputStream inputstream = new FileInputStream(new File("C:\\Users\\Saurabh Gupta\\Desktop\\TIKA\\BLANK-FILE.txt"))) {
         parser.parse(inputstream, handler, metadata, pcontext);
      }
      System.out.println("Contents of the text document:" + handler.toString());
      System.out.println("Metadata of the text document:");
      String[] metadataNames = metadata.names();

      for(String name : metadataNames) {
         System.out.println(name + " : " + metadata.get(name));
      }
   }
}

Output


Exception in thread "main" org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:122)
    at com.fiot.tika.exceptions.handling.TikaTextParserExample.main(TikaTextParserExample.java:29)

Solutions

To handle ZeroByteFileException there are two ways:

  1. Always check the file size before parsing the file.
  2. If you already know the content type of the file, use the specific parser for it. For example, in the above case, replace the AutoDetectParser line with the text parser instance below and no exception will occur.

Parser parser = new TXTParser();
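
The first solution above (checking the size before parsing) can be sketched with plain JDK classes. This is only a minimal sketch and not part of Tika: the class name EmptyInputCheck and the helper hasContent are hypothetical names chosen for this example. The second overload is an assumption about how to handle the case where only a stream is available and the size is not known up front: it peeks at one byte and pushes it back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for this post, not a Tika API.
class EmptyInputCheck {

    // True when the path is a regular file that holds at least one byte.
    static boolean hasContent(Path path) throws IOException {
        return Files.isRegularFile(path) && Files.size(path) > 0;
    }

    // True when the stream can deliver at least one byte.
    // The byte that was read is pushed back, so the caller still sees the full stream.
    static boolean hasContent(PushbackInputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            return false; // end of stream reached immediately: zero bytes
        }
        in.unread(first);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path blank = Files.createTempFile("blank", ".txt");   // temp files are created empty
        Path filled = Files.createTempFile("filled", ".txt");
        try {
            Files.write(filled, "hello".getBytes(StandardCharsets.UTF_8));

            System.out.println("blank file has content:  " + hasContent(blank));
            System.out.println("filled file has content: " + hasContent(filled));

            PushbackInputStream emptyStream =
                    new PushbackInputStream(new ByteArrayInputStream(new byte[0]));
            System.out.println("empty stream has content: " + hasContent(emptyStream));
        } finally {
            // clean up the temp files created for the demo
            Files.deleteIfExists(blank);
            Files.deleteIfExists(filled);
        }
    }
}
```

In the example above, such a check would go right before parser.parse(...), so that empty files are skipped or logged instead of being handed to AutoDetectParser.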

References

https://tika.apache.org/1.22/api/org/apache/tika/exception/ZeroByteFileException.html
