Showing posts with the label Apache Tika

Parse Text from PDF with OCR using Apache Tika

Apache Tika is open-source software that can extract text from PDF and image files using OCR. This example uses a Java Maven project to work with Apache Tika. First, add the following dependencies to pom.xml:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.18</version>
    </dependency>
    <dependency>
        <groupId>com.levigo.jbig2</groupId>
        <artifactId>levigo-jbig2-imageio</artifactId>
        <version>2.0</version>
    </dependency>
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.4.0</version>
    </dependency>
    <depende...