Java: Read Text from an Image

The Tess4J library (net.sourceforge.tess4j), a Java wrapper for the Tesseract OCR engine, lets you read and extract text from an image. It makes life easier for developers building applications that need to read images.

Examples of Reading/Extracting Text from an Image

  1. In a hospital: some hospitals scan the prescriptions their doctors write and keep the extracted details as part of the patient record. If you forget to bring your prescription to a later visit and the doctor asks about it, the hospital can look up the record by your mobile number, name, or date and reprint the prescribed details.
  2. In Big Data: when analysis is needed for cases like the one above, details can be extracted from images and shown in reports.

How Does Reading Text from an Image Work?

Extracting text from an image means locating the text components and then extracting the geometric shape components. The text components are extracted together with the geometric components, and the relationships between them are built up from flow lines between components. The extracted components form metadata (in XML format) that can be stored in a knowledge base or shared with others.
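To make the idea of extracted components as XML metadata more concrete, here is a minimal sketch using only the Java standard library. The class names, fields, and XML layout are hypothetical and chosen only for illustration; they are not a format produced by Tesseract or Tess4J, and the flow-line relationships between components are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: names and XML layout are illustrative only,
// not a Tesseract or Tess4J output format.
class TextComponentSketch {

    /** One recognized piece of text plus the rectangle it occupies in the image. */
    static final class TextComponent {
        final String text;
        final int x, y, width, height;

        TextComponent(String text, int x, int y, int width, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        /** Serializes this component as a single XML element. */
        String toXml() {
            return "<component x=\"" + x + "\" y=\"" + y + "\" width=\"" + width
                    + "\" height=\"" + height + "\">" + escape(text) + "</component>";
        }
    }

    /** Escapes the characters that are not allowed in XML element text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps all components in a <page> element, one component per line. */
    static String toXml(List<TextComponent> components) {
        StringBuilder sb = new StringBuilder("<page>\n");
        for (TextComponent c : components) {
            sb.append("  ").append(c.toXml()).append('\n');
        }
        return sb.append("</page>").toString();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for text found on a scanned prescription.
        List<TextComponent> components = new ArrayList<>();
        components.add(new TextComponent("Dr. Smith", 40, 20, 180, 32));
        components.add(new TextComponent("Take 1 tablet & rest", 40, 70, 320, 28));
        System.out.println(toXml(components));
    }
}
```

Keeping the position next to the text is what would allow a later step to relate components to each other, for example by reading order.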

Environment Setup

Download tessdata from the Git repository below and rename the downloaded folder to tessdata. Place this folder in your application's root directory, as shown below.

https://github.com/tesseract-ocr/tessdata

Read Text from Image Directory
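A quick way to confirm the setup is to check that the tessdata folder contains the language file you plan to use. The sketch below uses only the Java standard library. It assumes the program is started from the application root and that English is the language in use. Tesseract language files are named after the language code, so English is eng.traineddata. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for checking the tessdata setup; not part of Tess4J.
class TessdataCheck {

    /** Path of the language file for a language code, e.g. "eng" -> tessdata/eng.traineddata. */
    static Path trainedDataFile(Path tessdataDir, String language) {
        return tessdataDir.resolve(language + ".traineddata");
    }

    /** True if the folder contains the trained data file for the given language. */
    static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(trainedDataFile(tessdataDir, language));
    }

    public static void main(String[] args) {
        // Assumption: the working directory is the application root.
        Path tessdataDir = Paths.get("tessdata");
        String language = "eng"; // assumed language; change as needed

        Path expected = trainedDataFile(tessdataDir, language).toAbsolutePath();
        if (hasLanguage(tessdataDir, language)) {
            System.out.println("Found " + expected);
        } else {
            System.out.println("Missing " + expected + "; check the folder name and location.");
        }
    }
}
```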

Dependency

Add the dependency below to your application's pom.xml.


    <dependency> 
        <groupId>net.sourceforge.tess4j</groupId> 
        <artifactId>tess4j</artifactId> 
        <version>3.2.1</version> 
    </dependency>

 

Java Code to Read Text from Image

In this example, you will see the complete steps to read/extract text from an image.

Sample Image

test image

Java Code

The image below shows the complete Java code to extract text from the image, along with the output for the sample image.

Java Code to Read Text from Image
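For readers who cannot view the image, here is a minimal sketch of what such a program could look like with Tess4J. It is not necessarily the code shown in the image. It assumes the dependency above is on the classpath, a tessdata folder holding eng.traineddata, and a hypothetical input file named test.png. The class name is also an assumption.

```java
import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class ReadTextFromImage {

    public static void main(String[] args) {
        // Hypothetical input file; pass another image path as the first argument.
        File image = new File(args.length > 0 ? args[0] : "test.png");

        ITesseract tesseract = new Tesseract();

        // Location of the language data. Assumption: depending on the Tesseract
        // version behind Tess4J, this should be either the tessdata folder itself
        // or its parent directory; adjust it if the language data is not found.
        tesseract.setDatapath(args.length > 1 ? args[1] : "tessdata");

        // "eng" selects eng.traineddata.
        tesseract.setLanguage("eng");

        try {
            // Runs OCR on the image file and returns the recognized text.
            String text = tesseract.doOCR(image);
            System.out.println(text);
        } catch (TesseractException e) {
            System.err.println("OCR failed: " + e.getMessage());
        }
    }
}
```

Recognition quality depends heavily on the input image, so the printed text may need cleanup before it is stored or searched.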