[Solved] org.apache.tika.exception.TikaMemoryLimitException


TikaMemoryLimitException is a subclass of TikaException. This exception generally occurs when a document contains a large number of nested or embedded files.

For example:

  1. Maven JARs, where one JAR contains a POM that references other dependencies
  2. Git objects
  3. Word documents with lots of embedded files

Parsing these nested/embedded files requires a large amount of memory. When the parser's memory consumption reaches the upper limit, it throws this exception.
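To make the idea of a memory limit concrete, here is a minimal sketch using only the JDK. It is not Tika's implementation: MemoryLimitExceededException and readWithLimit are hypothetical names invented for this illustration. The sketch buffers a stream in memory and fails as soon as the buffered size would exceed a given maximum.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a memory-limit exception; this is NOT Tika's class.
class MemoryLimitExceededException extends IOException {
    MemoryLimitExceededException(String msg) {
        super(msg);
    }
}

class BoundedBufferSketch {

    // Copies the stream into memory and fails once buffering would exceed maxBytes.
    static byte[] readWithLimit(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new MemoryLimitExceededException(
                        "Tried to buffer more than " + maxBytes + " bytes");
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] embedded = new byte[10_000]; // pretend this is one embedded file

        // Fits under the limit, so the whole payload is buffered.
        byte[] copy = readWithLimit(new ByteArrayInputStream(embedded), 20_000);
        System.out.println("Buffered bytes: " + copy.length);

        // Exceeds the limit, so the read is aborted with an exception.
        try {
            readWithLimit(new ByteArrayInputStream(embedded), 5_000);
        } catch (MemoryLimitExceededException e) {
            System.out.println("Limit hit: " + e.getMessage());
        }
    }
}
```

With many nested files, each level may need its own buffer, which is why the total can grow quickly.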

Solutions

  1. Set the memory usage limit for Tika as high as possible, at least more than 1 GB.
  2. Make it a common practice to shield the input stream with a CloseShieldInputStream so that parsing can fail when the maximum limit is reached.

Generally in Tika, these allocations come from TikaInputStream.get(InputStream, TemporaryResources), which checks whether the InputStream is one of the following types to decide whether it supports mark:

  • BufferedInputStream
  • ByteArrayInputStream

Unfortunately, because it is common practice to wrap InputStreams in CloseShieldInputStreams, the wrapped stream is no longer one of these types, and this exception can occur even if mark is in fact supported.
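The difference between checking the stream type and asking the stream itself can be sketched with plain JDK classes. This is an illustration under stated assumptions, not Tika's code: ShieldedStream is a hypothetical stand-in for a close-shielding wrapper, and bufferByType and bufferByCapability are made-up helper names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical stand-in for a close-shielding wrapper: it delegates everything
// to the wrapped stream except close(), which it ignores.
class ShieldedStream extends FilterInputStream {
    ShieldedStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Intentionally does not close the wrapped stream.
    }
}

class MarkSupportSketch {

    // Type-based check: recognizes only the two concrete classes.
    static boolean knownMarkableType(InputStream in) {
        return in instanceof BufferedInputStream || in instanceof ByteArrayInputStream;
    }

    // Allocates a new buffer whenever the type check does not recognize the stream.
    static InputStream bufferByType(InputStream in) {
        return knownMarkableType(in) ? in : new BufferedInputStream(in);
    }

    // Alternative: allocates a new buffer only if the stream reports no mark support.
    static InputStream bufferByCapability(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[16]);
        InputStream shielded = new ShieldedStream(raw);

        System.out.println("type check passes:   " + knownMarkableType(shielded));
        System.out.println("markSupported():     " + shielded.markSupported());
        System.out.println("extra buffer (type): " + (bufferByType(shielded) != shielded));
        System.out.println("extra buffer (cap.): " + (bufferByCapability(shielded) != shielded));
    }
}
```

In this sketch the wrapper hides the concrete type, so the type-based helper allocates an extra BufferedInputStream even though the wrapped stream already supports mark. Repeated for every nested file, such allocations add up.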

public class TikaMemoryLimitException extends TikaException

Constructors

  • TikaMemoryLimitException(String msg)

References

https://tika.apache.org/1.22/api/org/apache/tika/exception/TikaMemoryLimitException.html
