Monday, April 14, 2014

Java program to read doc/docx file

Apache POI can be used to read doc/docx file. To know more click here.

You can download latest version of jar files(i.e. poi-bin-3.10-FINAL-20140208.zip) form below link.
http://poi.apache.org/download.html

Add below mentioned Jar files in your classpath -
  • poi-3.10-FINAL-20140208.jar
  • poi-ooxml-3.10-FINAL-20140208.jar
  • poi-ooxml-schemas-3.10-FINAL-20140208.jar
  • poi-scratchpad-3.10-FINAL-20140208.jar
  • xmlbeans-2.3.0.jar
  • dom4j-1.6.1.jar
Please see the self explanatory java code below



Below are the know errors and exception

1. When tried to read docx file using HWPFDocument

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocument.verifyAndBuildPOIFS(HWPFDocument.java:132)
at org.apache.poi.hwpf.HWPFDocument.(HWPFDocument.java:145)


2. When tried to read doc file using XWPFDocument

org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:41)
at org.apache.poi.xwpf.usermodel.XWPFDocument.(XWPFDocument.java:121)
at com.test.DocReader.main(DocReader.java:21)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:199)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
... 2 more

3. Missing xmlbeans-2.3.0.jar in classpath

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at com.test.DocReader.readDocxFile(DocReader.java:42)
at com.test.DocReader.main(DocReader.java:57)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 2 more

4. Missing dom4j-1.6.1.jar in classpath

Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
at org.apache.poi.openxml4j.opc.OPCPackage.(OPCPackage.java:141)
at org.apache.poi.openxml4j.opc.Package.(Package.java:37)
at org.apache.poi.openxml4j.opc.ZipPackage.(ZipPackage.java:83)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:272)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
at org.apache.poi.xwpf.usermodel.XWPFDocument.(XWPFDocument.java:121)
at com.test.DocReader.readDocxFile(DocReader.java:42)
at com.test.DocReader.main(DocReader.java:57)
Caused by: java.lang.ClassNotFoundException: org.dom4j.DocumentException
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 9 more

5. Missing poi-ooxml-schemas-3.10-FINAL-20140208.jar in classpath

Exception in thread "main" java.lang.NoClassDefFoundError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTHdrFtr
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
at java.lang.Class.getConstructor0(Class.java:2699)
at java.lang.Class.getDeclaredConstructor(Class.java:1985)
at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:59)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:426)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:155)
at org.apache.poi.xwpf.usermodel.XWPFDocument.(XWPFDocument.java:124)
at com.test.DocReader.readDocxFile(DocReader.java:42)
at com.test.DocReader.main(DocReader.java:57)
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHdrFtr
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 10 more