Parsing and Mapping a Docx file with Java

DOCX is a standard document format, first introduced with Microsoft Office 2007. It stores a document as a set of individual folders and files inside a zip archive. The main content is located in the file document.xml in the folder word. It contains the actual text and some styling information for the entire document. Java provides us with the class ZipFile (in the package java.util.zip) for reading such archives. We create a new instance and pass our docx file as a parameter to the constructor. To the method public ZipEntry getEntry(String name) we pass the name of the entry that we want to read, here word/document.xml. We then pass the returned ZipEntry to the method getInputStream(ZipEntry entry), which gives us the input stream of that specific entry, so that we can read its contents.
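The steps described above can be sketched roughly as follows. This is a minimal example, not a definitive implementation: the class name DocxReader and the helper readDocumentXml are hypothetical names chosen for illustration, the code assumes Java 9 or later (for InputStream.readAllBytes), and it assumes that document.xml is encoded as UTF-8, which is the usual case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named for illustration only.
public class DocxReader {

    // Name of the main document part inside the zip archive.
    static final String DOCUMENT_ENTRY = "word/document.xml";

    // Opens the docx as a zip archive and returns the raw XML of the main document part.
    static String readDocumentXml(Path docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            // getEntry returns null if the archive has no entry with that name.
            ZipEntry entry = zip.getEntry(DOCUMENT_ENTRY);
            if (entry == null) {
                throw new IOException("No " + DOCUMENT_ENTRY + " found in " + docx);
            }
            // The stream must be read before the ZipFile is closed.
            try (InputStream in = zip.getInputStream(entry)) {
                // Assumption: the XML is UTF-8 encoded.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java DocxReader <file.docx>");
            return;
        }
        System.out.println(readDocumentXml(Paths.get(args[0])));
    }
}
```

The try-with-resources blocks close both the entry stream and the archive, even if reading fails. The result is the unparsed XML as a string; mapping it to objects would be a separate step.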

Evgenij Reznik

