As I recorded in my last blog entry, OpenXML4J is a Java library under development to open, access, and modify Office 2007 applications' data files in Office Open XML format. I also mentioned that Apache POI has provided useful Excel manipulation support in Java for versions of Excel prior to Excel 2007, but its HSSF library does not support Office Open XML formats (ECMA-376).
It looks like the Apache POI team has started working on support for Office 2007 applications. This work is often referenced as OOXML support for Apache POI. It sounds like they have a nice initial design involving a single interface with two different implementations, one for Office 2007 and one for Office applications pre-dating Office 2007. Another potential promising development is the talk of the Apache POI team and the OpenXML4J team working together. It would be nice to have one area of Java where we don't have multiple and completely disparate APIs and frameworks for doing the same thing.
The article Using Java to Crack Office 2007 shows how one might use Java directly to access Office 2007 files. While Office Open XML format is significantly more approachable than the previous formats used in Office applications, this article demonstrates that it would still be nice to have a library to abstract away some of the details. This is what I hope future versions of Apache POI, or OpenXML4J, or preferably both with the same interchangeable API will offer us in the near future.
Here are some other interesting web links to additional information regarding Apache POI support for Microsoft Office 2007 applications.
Design of OOXML Support Code (16 January 2008)
Initial OOXML Support (30 December 2007)
Using POI with Excel 2007 (30 May 2007)
Fun with Interfaces (Problems in pre-J2SE 5)
Dustin's Software Development Cogitations and Speculations My observations and thoughts concerning software development (general development, Java, JavaFX, Groovy, Flex, ...). Select posts from this blog are syndicated on DZone and Java Code Geeks and were formerly syndicated on JavaWorld.
Showing posts with label POI. Show all posts
Showing posts with label POI. Show all posts
Monday, February 4, 2008
Saturday, February 2, 2008
OpenXML4J: Reading Office 2007 File Properties
Happy Groundhog Day (movie)!
Apache POI is an excellent library for reading, writing, and manipulating Excel spreadsheets. Apache POI's HSSF ("Horrible Spreadsheet Format") support is particularly good and useful. The most significant disadvantage of Apache POI is that it does not support the new format used with Microsoft Office 2007 applications.
For a while, the disadvantage of Office 2007's new file format will not be as large on the effectiveness of Apache POI as it might be because of the following reasons:
However, as time goes on, the ability to read, write, and manipulate Office 2007 files in Java will become more desirable and eventually a necessity. While the new format used in Microsoft 2007 applications (Office Open XML - ECMA-376) is simply composed of XML files zipped up in a zip file, any Java application accessing these file types would need to extract or build the zip file and would also need to understand the underlying XML format. To answer this need, work has begun on an open source library called OpenXML4J. This library recently transitioned from pre-Alpha to Alpha status (on 21 January 2008). The license for this library is currently your choice of BSD license or Apache v2 license.
In this blog entry, I will focus on using OpenXML4J to extract properties from Office 2007 application files. The code sample that will be shown below will extract properties from three types of Office 2007 files (PowerPoint, Word, and Excel files). As a lazy developer, I chose to use some existing files for this. I use Office 2003, but I was able to save these three files in the Office Open XML format in PowerPoint, Word, and Excel using the downloaded Microsoft conversion pack. For the PowerPoint and Word examples, I saved my presentation and paper (respectively) for RMOUG Training Days 2008 ("Excel with Apache POI and Oracle Database"). For the Excel example, I am using a simple spreadsheet containing important dates for submitting papers, submitting presentations, and actually presenting the materials for various conferences this spring. For this example, the contents of the various files are not all that important except for the Microsoft Office properties associated with each.
The next three screen shots show the properties window for the three application data files. The contents of these windows are important because they are what we will be extracting directly from the file using Java and OpenXML4J. As with all images in this blog, please click on any image to see a larger version of the image.
PowerPoint: marx-poi.pptx Properties

Word: marx-poi.docx Properties

Excel: 2008Conferences.xlsx Properties

Before moving onto the example that demonstrates use of OpenXML4J to extract properties from Office 2007 application data files, I will first demonstrate the XML contents of the new Office Open XML formats. In the following three examples, I have used Java's jar command (
Contents of
Contents of
Contents of
All of the above has been background information on the new Office Open XML format employed by Office 2007. Now, here is the Java code that uses OpenXML4J to extract the properties from a PowerPoint presentation, a Word document, and an Excel spreadsheet.
As you can see, the OpenXML4J API for accessing properties is extremely straightforward. I did not get all properties, but did show examples of obtaining the more important Office properties. The code sample also demonstrates the need to catch the checked exception
The next screen shot (click on it to see larger image) shows the output from running the code above against the three Office 2007-compatible (Office Open XML format) documents described earlier in this blog.

This output demonstrates some useful things to know when using OpenXML4J. For example, I needed to place a log4j JAR file and a dom4j JAR file on the classpath when using the OpenXML4J library. In this case, I knew that the Spring framework download with dependencies contains these libraries and so I simply used those versions. The screen shot also shows that I used the Alpha version of OpenXML4J released on 21 January 2008 (
There is, of course, much more to OpenXML4J than the extraction of properties from Office 2007 files and we can expect much more to come as well as it moves to production support and quality.
Apache POI is an excellent library for reading, writing, and manipulating Excel spreadsheets. Apache POI's HSSF ("Horrible Spreadsheet Format") support is particularly good and useful. The most significant disadvantage of Apache POI is that it does not support the new format used with Microsoft Office 2007 applications.
For a while, the disadvantage of Office 2007's new file format will not be as large on the effectiveness of Apache POI as it might be because of the following reasons:
- Not everyone has migrated to Office 2007 and so there are still many earlier versions of Office products in existence and new files are still being created in the older POI-supported format.
- Those who have migrated to Office 2007 can still save their files in the old format when necessary to share with others or for POI's use. Likewise, they canalso read the old format and thus read POI-generated files.
- Files that are created with POI can be opened up in an earlier version of Office products and converted to Office 2007 format using the freely provided converter.
However, as time goes on, the ability to read, write, and manipulate Office 2007 files in Java will become more desirable and eventually a necessity. While the new format used in Microsoft 2007 applications (Office Open XML - ECMA-376) is simply composed of XML files zipped up in a zip file, any Java application accessing these file types would need to extract or build the zip file and would also need to understand the underlying XML format. To answer this need, work has begun on an open source library called OpenXML4J. This library recently transitioned from pre-Alpha to Alpha status (on 21 January 2008). The license for this library is currently your choice of BSD license or Apache v2 license.
In this blog entry, I will focus on using OpenXML4J to extract properties from Office 2007 application files. The code sample that will be shown below will extract properties from three types of Office 2007 files (PowerPoint, Word, and Excel files). As a lazy developer, I chose to use some existing files for this. I use Office 2003, but I was able to save these three files in the Office Open XML format in PowerPoint, Word, and Excel using the downloaded Microsoft conversion pack. For the PowerPoint and Word examples, I saved my presentation and paper (respectively) for RMOUG Training Days 2008 ("Excel with Apache POI and Oracle Database"). For the Excel example, I am using a simple spreadsheet containing important dates for submitting papers, submitting presentations, and actually presenting the materials for various conferences this spring. For this example, the contents of the various files are not all that important except for the Microsoft Office properties associated with each.
The next three screen shots show the properties window for the three application data files. The contents of these windows are important because they are what we will be extracting directly from the file using Java and OpenXML4J. As with all images in this blog, please click on any image to see a larger version of the image.
PowerPoint: marx-poi.pptx Properties

Word: marx-poi.docx Properties

Excel: 2008Conferences.xlsx Properties

Before moving onto the example that demonstrates use of OpenXML4J to extract properties from Office 2007 application data files, I will first demonstrate the XML contents of the new Office Open XML formats. In the following three examples, I have used Java's jar command (
jar tvf
) to display the contents of these files.Contents of
marx-poi.pptx
13756 Tue Jan 01 00:00:00 MST 1980 [Content_Types].xml
737 Tue Jan 01 00:00:00 MST 1980 _rels/.rels
598 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide11.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide12.xml.rels
598 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide13.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide15.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide10.xml.rels
598 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide16.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide17.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide14.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide8.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide18.xml.rels
909 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide1.xml.rels
462 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide2.xml.rels
462 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide3.xml.rels
598 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide4.xml.rels
462 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide5.xml.rels
462 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide6.xml.rels
641 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide7.xml.rels
462 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide9.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide20.xml.rels
5977 Tue Jan 01 00:00:00 MST 1980 ppt/_rels/presentation.xml.rels
829 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide31.xml.rels
839 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide32.xml.rels
882 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide33.xml.rels
1263 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide34.xml.rels
1080 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide35.xml.rels
893 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide36.xml.rels
1237 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide37.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide30.xml.rels
1025 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide29.xml.rels
951 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide28.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide21.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide22.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide23.xml.rels
648 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide24.xml.rels
688 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide25.xml.rels
659 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide26.xml.rels
647 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide27.xml.rels
463 Tue Jan 01 00:00:00 MST 1980 ppt/slides/_rels/slide19.xml.rels
4714 Tue Jan 01 00:00:00 MST 1980 ppt/presentation.xml
2750 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide26.xml
3103 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide30.xml
2200 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide16.xml
6513 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide15.xml
2682 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide31.xml
5865 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide14.xml
2206 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide13.xml
3046 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide25.xml
13352 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide12.xml
6083 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide17.xml
5929 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide18.xml
4080 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide29.xml
3007 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide24.xml
5206 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide23.xml
4207 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide27.xml
4579 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide22.xml
3483 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide21.xml
4177 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide28.xml
6277 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide20.xml
7659 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide19.xml
7534 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide11.xml
4180 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide32.xml
4283 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide4.xml
3439 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide35.xml
3378 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide6.xml
2118 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide2.xml
3049 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide33.xml
3335 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide5.xml
2754 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide36.xml
3992 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide37.xml
2806 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide3.xml
2955 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide7.xml
3515 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide9.xml
2836 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide10.xml
3605 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide1.xml
35794 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide8.xml
4031 Tue Jan 01 00:00:00 MST 1980 ppt/slides/slide34.xml
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide11.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide10.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout12.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide37.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide5.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide2.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide9.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide4.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide6.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide3.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide7.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide8.xml.rels
447 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide1.xml.rels
20729 Tue Jan 01 00:00:00 MST 1980 ppt/slideMasters/slideMaster1.xml
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide13.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide30.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide29.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide28.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide27.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide31.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide32.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide33.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide34.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide35.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide36.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide26.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide25.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide24.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide17.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide16.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide15.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide14.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide18.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide19.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide20.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide21.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide22.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide23.xml.rels
448 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/_rels/notesSlide12.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout11.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout9.xml.rels
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide11.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide10.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide9.xml
2337 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide8.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide7.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide6.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide12.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide13.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide14.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide19.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide18.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide17.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide16.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide15.xml
1696 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide5.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide4.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide3.xml
1736 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout6.xml
6603 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout5.xml
4030 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout4.xml
3076 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout3.xml
2376 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout2.xml
7618 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout1.xml
1439 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout7.xml
4188 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout8.xml
4164 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout9.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide2.xml
1616 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide1.xml
2203 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout12.xml
2656 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout11.xml
2431 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/slideLayout10.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide20.xml
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout10.xml.rels
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide22.xml
2284 Tue Jan 01 00:00:00 MST 1980 ppt/slideMasters/_rels/slideMaster1.xml.rels
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide37.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide36.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide35.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide34.xml
447 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout1.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout2.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout3.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout8.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout7.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout6.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout5.xml.rels
311 Tue Jan 01 00:00:00 MST 1980 ppt/slideLayouts/_rels/slideLayout4.xml.rels
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide33.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide21.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide31.xml
1734 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide27.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide26.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide25.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide24.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide23.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide28.xml
2685 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide32.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide29.xml
1617 Tue Jan 01 00:00:00 MST 1980 ppt/notesSlides/notesSlide30.xml
521299 Tue Jan 01 00:00:00 MST 1980 ppt/media/image2.jpeg
20941 Tue Jan 01 00:00:00 MST 1980 ppt/media/image7.png
1557708 Tue Jan 01 00:00:00 MST 1980 ppt/media/image4.jpeg
114888 Tue Jan 01 00:00:00 MST 1980 ppt/media/image5.png
292 Tue Jan 01 00:00:00 MST 1980 ppt/notesMasters/_rels/notesMaster1.xml.rels
9241 Tue Jan 01 00:00:00 MST 1980 ppt/theme/theme1.xml
69148 Tue Jan 01 00:00:00 MST 1980 ppt/media/image3.jpeg
43096 Tue Jan 01 00:00:00 MST 1980 docProps/thumbnail.wmf
25927 Tue Jan 01 00:00:00 MST 1980 ppt/media/image6.png
6962 Tue Jan 01 00:00:00 MST 1980 ppt/theme/theme2.xml
93711 Tue Jan 01 00:00:00 MST 1980 ppt/media/image1.jpeg
10986 Tue Jan 01 00:00:00 MST 1980 ppt/media/image8.wmf
9070 Tue Jan 01 00:00:00 MST 1980 ppt/notesMasters/notesMaster1.xml
376 Tue Jan 01 00:00:00 MST 1980 ppt/presProps.xml
1045 Tue Jan 01 00:00:00 MST 1980 ppt/viewProps.xml
182 Tue Jan 01 00:00:00 MST 1980 ppt/tableStyles.xml
986 Tue Jan 01 00:00:00 MST 1980 docProps/core.xml
3736 Tue Jan 01 00:00:00 MST 1980 docProps/app.xml
Contents of
marx-poi.docx
2143 Tue Jan 01 00:00:00 MST 1980 [Content_Types].xml
590 Tue Jan 01 00:00:00 MST 1980 _rels/.rels
10765 Tue Jan 01 00:00:00 MST 1980 word/_rels/document.xml.rels
135202 Tue Jan 01 00:00:00 MST 1980 word/document.xml
319 Tue Jan 01 00:00:00 MST 1980 word/_rels/footer2.xml.rels
887 Tue Jan 01 00:00:00 MST 1980 word/footer2.xml
939 Tue Jan 01 00:00:00 MST 1980 word/footer1.xml
319 Tue Jan 01 00:00:00 MST 1980 word/_rels/footer1.xml.rels
822 Tue Jan 01 00:00:00 MST 1980 word/header1.xml
946 Tue Jan 01 00:00:00 MST 1980 word/endnotes.xml
952 Tue Jan 01 00:00:00 MST 1980 word/footnotes.xml
2751 Tue Jan 01 00:00:00 MST 1980 word/media/image1.png
6992 Tue Jan 01 00:00:00 MST 1980 word/theme/theme1.xml
122008 Tue Jan 01 00:00:00 MST 1980 word/media/image2.png
356 Tue Jan 01 00:00:00 MST 1980 word/_rels/settings.xml.rels
6519 Tue Jan 01 00:00:00 MST 1980 word/settings.xml
654 Tue Jan 01 00:00:00 MST 1980 word/webSettings.xml
1828 Tue Jan 01 00:00:00 MST 1980 word/fontTable.xml
750 Tue Jan 01 00:00:00 MST 1980 docProps/app.xml
1089 Tue Jan 01 00:00:00 MST 1980 docProps/core.xml
1680 Tue Jan 01 00:00:00 MST 1980 word/numbering.xml
23802 Tue Jan 01 00:00:00 MST 1980 word/styles.xml
Contents of
2008Conferences.xlsx
1780 Tue Jan 01 00:00:00 MST 1980 [Content_Types].xml
588 Tue Jan 01 00:00:00 MST 1980 _rels/.rels
980 Tue Jan 01 00:00:00 MST 1980 xl/_rels/workbook.xml.rels
625 Tue Jan 01 00:00:00 MST 1980 xl/workbook.xml
6995 Tue Jan 01 00:00:00 MST 1980 xl/theme/theme1.xml
605 Tue Jan 01 00:00:00 MST 1980 xl/worksheets/_rels/sheet1.xml.rels
518 Tue Jan 01 00:00:00 MST 1980 xl/worksheets/sheet2.xml
518 Tue Jan 01 00:00:00 MST 1980 xl/worksheets/sheet3.xml
1882 Tue Jan 01 00:00:00 MST 1980 xl/drawings/vmlDrawing1.vml
1748 Tue Jan 01 00:00:00 MST 1980 xl/sharedStrings.xml
3998 Tue Jan 01 00:00:00 MST 1980 xl/styles.xml
3530 Tue Jan 01 00:00:00 MST 1980 xl/worksheets/sheet1.xml
994 Tue Jan 01 00:00:00 MST 1980 xl/comments1.xml
364 Tue Jan 01 00:00:00 MST 1980 xl/printerSettings/printerSettings1.bin
1028 Tue Jan 01 00:00:00 MST 1980 docProps/core.xml
866 Tue Jan 01 00:00:00 MST 1980 docProps/app.xml
All of the above has been background information on the new Office Open XML format employed by Office 2007. Now, here is the Java code that uses OpenXML4J to extract the properties from a PowerPoint presentation, a Word document, and an Excel spreadsheet.
package marx.openxml4j;
import org.openxml4j.exceptions.InvalidFormatException;
import org.openxml4j.opc.Package;
import org.openxml4j.opc.PackageAccess;
import org.openxml4j.opc.PackageProperties;
import org.openxml4j.util.Nullable;
/**
* Example demonstrating how to use OpenXML4J to read properties from Office
* 2007 formatted files (Excel, Word, and PowerPoint).
*/
public class OpenXML4JExample
{
/**
* Read an Excel document with the Office Open XML format used for Office
* 2007 products.
*
* @param aFilePathAndName
*/
public void readOfficeDocumentProperties(final String aFilePathAndName)
{
try
{
Package pkg = Package.open(aFilePathAndName, PackageAccess.READ);
PackageProperties properties = pkg.getPackageProperties();
final Nullable<String> title = properties.getTitleProperty();
final Nullable<String> language = properties.getLanguageProperty();
final Nullable<String> category = properties.getCategoryProperty();
final Nullable<String> keywords = properties.getKeywordsProperty();
final Nullable<String> subject = properties.getSubjectProperty();
final Nullable<String> description = properties.getDescriptionProperty();
final Nullable<String> version = properties.getVersionProperty();
final Nullable<String> creator = properties.getCreatorProperty();
if ( title.hasValue() )
{
System.out.println( aFilePathAndName + " has a title of '"
+ title.getValue() + "'");
}
if ( language.hasValue() )
{
System.out.println( aFilePathAndName + " uses the language "
+ language.getValue() );
}
if ( category.hasValue() )
{
System.out.println( aFilePathAndName + " is in the category of '"
+ category.getValue() + "'");
}
if ( keywords.hasValue() )
{
System.out.println( aFilePathAndName + " has the keywords: "
+ keywords.getValue() );
}
if ( subject.hasValue() )
{
System.out.println( aFilePathAndName + "'s subject is "
+ subject.getValue() );
}
if ( description.hasValue() )
{
System.out.println( aFilePathAndName + "'s description is "
+ description.getValue() );
}
if ( version.hasValue() )
{
System.out.println( aFilePathAndName + " is version "
+ version.getValue() );
}
if ( creator.hasValue() )
{
System.out.println( aFilePathAndName + "'s creator is "
+ creator.getValue() );
}
}
catch (InvalidFormatException invalidFormatEx) // checked exception
{
System.err.println( "Invalid format (looking for Office 2007 format):"
+ invalidFormatEx.getMessage() );
}
}
/**
* Print a separator to stdout to delineate example file being run.
*
* @param aSeparatorTitle Title to place in separator header.
*/
public static void printSeparatorHeader(final String aSeparatorTitle)
{
System.out.println(
"-------------------------------------------------------------------");
System.out.println(
"-- " + aSeparatorTitle );
System.out.println(
"-------------------------------------------------------------------");
}
/**
* Run test showing how to extract Office 2007 format properties from a
* PowerPoint presentation, a Word document, and an Excel spreadsheet using
* the OpenXML4J library (currently in Alpha).
*
* @param aCommandLineArgs Command-line arguments.
*/
public static void main(final String[] aCommandLineArgs)
{
String officeFilePathAndName;
OpenXML4JExample me = new OpenXML4JExample();
// Example Microsoft Office 2007 PowerPoint presentation
officeFilePathAndName = "C:\\sample2007\\marx-poi.pptx";
printSeparatorHeader("PowerPoint Example: " + officeFilePathAndName);
me.readOfficeDocumentProperties(officeFilePathAndName);
// Example Microsoft Office 2007 Word document
officeFilePathAndName = "C:\\sample2007\\marx-poi.docx";
printSeparatorHeader("Word Example: " + officeFilePathAndName);
me.readOfficeDocumentProperties(officeFilePathAndName);
// Example Microsoft Office 2007 Excel spreadsheet
officeFilePathAndName = "C:\\sample2007\\2008Conferences.xlsx";
printSeparatorHeader("Excel Example: " + officeFilePathAndName);
me.readOfficeDocumentProperties(officeFilePathAndName);
}
}
As you can see, the OpenXML4J API for accessing properties is extremely straightforward. I did not get all properties, but did show examples of obtaining the more important Office properties. The code sample also demonstrates the need to catch the checked exception
org.openxml4j.exceptions.InvalidFormatException
.The next screen shot (click on it to see larger image) shows the output from running the code above against the three Office 2007-compatible (Office Open XML format) documents described earlier in this blog.

This output demonstrates some useful things to know when using OpenXML4J. For example, I needed to place a log4j JAR file and a dom4j JAR file on the classpath when using the OpenXML4J library. In this case, I knew that the Spring framework download with dependencies contains these libraries and so I simply used those versions. The screen shot also shows that I used the Alpha version of OpenXML4J released on 21 January 2008 (
openxml4j-bin-alpha-080121.jar
).There is, of course, much more to OpenXML4J than the extraction of properties from Office 2007 files and we can expect much more to come as well as it moves to production support and quality.
Monday, December 24, 2007
Using the Apache POI POIBrowser
The Javadoc-generated documentation for POIBrowser package demonstrates how to run the POIBrowser class that is distributed with Apache POI. The current version of this package documentation that is available online (http://poi.apache.org/apidocs/org/apache/poi/contrib/poibrowser/package-summary.html#package_description) demonstrates running POIBrowser with version 2.5.1. In this blog entry, I intend to demonstrate running POIBrowser on Windows (Vista) with Apache POI 3.0.1. It is pretty much the same as running the 2.5.1 version, but with the obvious change of specifying the JAR files with the new version numbers in them. I'll also add some other notes related to running the POI Browser.
The first screen shot here (click on it to see zoom in on it) shows how I can run the POIBrowser. I intentionally included a "dir" command and its results in the screen shot to show which directory I was in. This directory was formed by downloading and unzipping the binary distribution of Apache POI and then later downloading and expanding the source distribution into the

As seen in the above screen shot, I need to supply the Microsoft Office files that I wanted browsed with POIBrowser as command-line arguments to the POIBrowser. I have placed some example Microsoft Office files in my directory
The files in the test directory are named
To run POIBrowser on these three files, the following command was used:
The POIBrowser is a basic Swing application and running it as shown above leads to a simple Swing HMI with the title "POI Browser 0.09" and an item that can be clicked on to drop down other choices with the label "POI Filesystems."
The next image (click on it to see larger version), shows how the HMI looks after clicking on the icon to the left of "POI Filesystems."

The files that were passed on the command-line to POIBrowser are listed in inverse order on the tool. Each of these can be drilled down into to view more details about that file. For example, the next snapshot (click on image to see it larger) shows some of the summary details available on the PowerPoint document. The Word document shows similar details.

There is lots of summary and document summary information on both the PowerPoint document and on the Word document in my example, but, as the next image shows, there is not much for the Excel spreadsheet. This isn't due to any Excel limitation, but is instead due to the fact that the viewed Excel document happens to have been generated by my POI-based sample application, which did not bother trying to populate this information. The PowerPoint and Word documents in this example were generated using PowerPoint and Word respectively and so had this information populated.

The primary and most helpful use of POIBrowser is to actually look at the source code to see how the Apache POI API (specifically POI-HPSF) can be used to access property settings of Microsoft Office documents.
The first screen shot here (click on it to see zoom in on it) shows how I can run the POIBrowser. I intentionally included a "dir" command and its results in the screen shot to show which directory I was in. This directory was formed by downloading and unzipping the binary distribution of Apache POI and then later downloading and expanding the source distribution into the
src
and alt-src
directories. I like to have the source directories with this open source product because the examples provide good illustrations of how to use Apache POI APIs.
As seen in the above screen shot, I need to supply the Microsoft Office files that I wanted browsed with POIBrowser as command-line arguments to the POIBrowser. I have placed some example Microsoft Office files in my directory
C:\test
(I wanted to act on copies rather than original files). It is important to note that Apache POI does not work on the new Office Open XML format introduced with the Microsoft Office 2007 products. The files in this test directory were created with Office 2003 products and so POI can access those files' contents.The files in the test directory are named
marx-poi.doc
(Microsoft Word 2003), marx-poi.ppt
(Microsoft PowerPoint 2003), and EmployeesReport.xls
(Microsoft Excel 2003). These files are all in-work files related to my presentation at the upcoming RMOUG Training Days 2008 on "Excel with Apache POI and Oracle Database." The PowerPoint file is the slide presentation, the Word document is the associated white paper, and the Excel spreadsheet was generated from the Oracle database-provided HR schema using Apache POI.To run POIBrowser on these three files, the following command was used:
java -cp poi-3.0.1-FINAL-20070705.jar;poi-contrib-3.0.1-FINAL-20070705.jar org.apache.poi.contrib.poibrowser.POIBrowser C:\test\marx-poi.doc C:\test\EmployeesReport.xls C:\test\marx-poi.ppt
The POIBrowser is a basic Swing application and running it as shown above leads to a simple Swing HMI with the title "POI Browser 0.09" and an item that can be clicked on to drop down other choices with the label "POI Filesystems."
The next image (click on it to see larger version), shows how the HMI looks after clicking on the icon to the left of "POI Filesystems."

The files that were passed on the command-line to POIBrowser are listed in inverse order on the tool. Each of these can be drilled down into to view more details about that file. For example, the next snapshot (click on image to see it larger) shows some of the summary details available on the PowerPoint document. The Word document shows similar details.

There is lots of summary and document summary information on both the PowerPoint document and on the Word document in my example, but, as the next image shows, there is not much for the Excel spreadsheet. This isn't due to any Excel limitation, but is instead due to the fact that the viewed Excel document happens to have been generated by my POI-based sample application, which did not bother trying to populate this information. The PowerPoint and Word documents in this example were generated using PowerPoint and Word respectively and so had this information populated.

The primary and most helpful use of POIBrowser is to actually look at the source code to see how the Apache POI API (specifically POI-HPSF) can be used to access property settings of Microsoft Office documents.
Saturday, October 27, 2007
Apache POI: Use HSSFWorkbook to Create New Cell Style
The Busy Developer's Guide to HSSF Features (PDF format) is an excellent resource for learning how to use Apache POI (formerly Jakarta POI) in conjunction with Java applications for manipulating Microsoft Excel files. This document lives up to its name (for Busy Developers) and provides a concise but highly useful introduction to the POI HSSF API. It would be nice if all open source products had guides this easy for beginners to use.
This guide points out (in the Creating Date Cells section) that new cell styles (HSSFCellStyle) must be created from a workbook (HSSFWorkbook) or else any changes to the HSSFCellStyle will impact other (perhaps even ALL) cells in the spreadsheet generated with Apache POI.
For example, the following code may not work as one might think from looking at it:
From first glance at the above code, it is easy to believe that the various "set" methods called on the passed-in HSSFCell will only impact that particular HSSFCell instance because
The following example shows this in action:
Notice that in the first example, a Cell Style instance is obtained using a getCellStyle method (
In the second example, a completely new instantiation of cell style is obtained with a call to
In essence, there are two opposite approaches to updating cell styles here. The first, getting an existing style from any average cell, works with the assumption that all cells have an existing and same style. Changing that style once therefore changes all the cells' styles. The latter approach, creating a new cell style from a workbook, makes the opposite assumption (that no cells are affected by the newly created style) and cells to which the style should apply must be explicitly associated with the new style.
This significant distinction between acquiring a cell style from a cell or creating a new cell style instance via a workbook is documented in the Quick Start Guide as mentioned above and is not a big issue once it is understood. However, this is the type of issue that is a good example of how important it can be to know more about an API than simply what a favorite IDE lists as available through class and method name completion.
UPDATE (21 December 2007): The HSSFCellUtil Javadoc-based API documentation explains the reason for the different behaviors in acquiring and setting a Cell Style. According to the class description for HSSFCellUtil , Excel has a limitation on the number of styles it can support and so it makes sense to have a single style apply to all cells within a sheet and be controlled by that setting (HSSFCell.setCellStyle()). The HSSFWorkbook.createCellStyle() method actually creates a truly different cell style for individual cell styling needs. The Javadoc method documentation for the
This guide points out (in the Creating Date Cells section) that new cell styles (HSSFCellStyle) must be created from a workbook (HSSFWorkbook) or else any changes to the HSSFCellStyle will impact other (perhaps even ALL) cells in the spreadsheet generated with Apache POI.
For example, the following code may not work as one might think from looking at it:
/**
* Set the style of the supplied cell to be default
* header cell style.
*
* @param aHeaderCell Cell to which default header
* style should be applied.
*/
public static void setDefaultHeaderCellStyle(
final HSSFCell aHeaderCell )
{
final HSSFCellStyle cellStyle = aHeaderCell.getCellStyle();
cellStyle.setAlignment( HSSFCellStyle.ALIGN_CENTER );
cellStyle.setFillPattern( HSSFCellStyle.SPARSE_DOTS );
cellStyle.setBorderBottom( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderLeft( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderRight( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderTop( HSSFCellStyle.BORDER_DOUBLE );
}
From first glance at the above code, it is easy to believe that the various "set" methods called on the passed-in HSSFCell will only impact that particular HSSFCell instance because
getCellStyle()
was called on that instance. However, this code, as shown, actually impacts all cells on the sheet from which this cell came. To ensure that only specific cells are impacted by a custom cell style, one should get the HSSFCellStyle from the workbook as suggested in the Quick Start Guide.The following example shows this in action:
/**
* Create a new cell style designed for header-oriented cells.
* The HSSFWorkbook is required to obtain a newly generated
* cell style instance so that cells other than those to which
* this style is set will not be impacted.
*
* @param aWorkbook Workbook for which cell style should apply.
*/
public static HSSFCellStyle createDefaultHeaderCellStyle(
final HSSFWorkbook aWorkbook )
{
final HSSFCellStyle cellStyle = aWorkbook.createCellStyle();
cellStyle.setAlignment( HSSFCellStyle.ALIGN_CENTER );
cellStyle.setFillPattern( HSSFCellStyle.SPARSE_DOTS );
cellStyle.setBorderBottom( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderLeft( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderRight( HSSFCellStyle.BORDER_DOUBLE );
cellStyle.setBorderTop( HSSFCellStyle.BORDER_DOUBLE );
return cellStyle;
}
Notice that in the first example, a Cell Style instance is obtained using a getCellStyle method (
HSSFCell.getCellStyle()
). This gets an already instantiated cell style that applies to many more cells than just the cell upon which the getCellStyle() method was called. This means that any changes to the returned cell style impact all cells tied to that cell style.In the second example, a completely new instantiation of cell style is obtained with a call to
HSSFWorkbook.createCellStyle()
. This creates a new instance independent of any cells. In fact, to make this newly instantiated cell style apply to any cells, one must call setCellStyle()
on the cell (HSSFCell.setCellStyle()
) to which the newly created style should apply.In essence, there are two opposite approaches to updating cell styles here. The first, getting an existing style from any average cell, works with the assumption that all cells have an existing and same style. Changing that style once therefore changes all the cells' styles. The latter approach, creating a new cell style from a workbook, makes the opposite assumption (that no cells are affected by the newly created style) and cells to which the style should apply must be explicitly associated with the new style.
This significant distinction between acquiring a cell style from a cell or creating a new cell style instance via a workbook is documented in the Quick Start Guide as mentioned above and is not a big issue once it is understood. However, this is the type of issue that is a good example of how important it can be to know more about an API than simply what a favorite IDE lists as available through class and method name completion.
UPDATE (21 December 2007): The HSSFCellUtil Javadoc-based API documentation explains the reason for the different behaviors in acquiring and setting a Cell Style. According to the class description for HSSFCellUtil , Excel has a limitation on the number of styles it can support and so it makes sense to have a single style apply to all cells within a sheet and be controlled by that setting (HSSFCell.setCellStyle()). The HSSFWorkbook.createCellStyle() method actually creates a truly different cell style for individual cell styling needs. The Javadoc method documentation for the
HSSFCell.setCellStyle(HSSFCellStyle)
method does warn that the passed-in HSSFCellStyle
"should be an HSSFCellStyle created/retreived from the HSSFWorkbook."
Thursday, October 25, 2007
My Planned Presentations for RMOUG Training Days 2008
I received notification that Rocky Mountain Oracle Users Group (RMOUG) has accepted two of my abstracts for presentation at RMOUG's Training Days 2008 conference. The presentations are "Introduction to XQuery: A Multipurpose XML Tool" and "Excel with Apache POI and Oracle Database." Training Days will be February 13-14, 2008, at the Colorado Convention Center.
Presentations from several recent RMOUG Training Days conferences are available at the following links:
Other Pages / Blogs on RMOUG Training Days 2008
This portion of this blog entry will be updated regularly as new pages and blogs reference RMOUG Training Days 2008 or RMOUG Training Days in general.
Oracle 2008 Denver Conferences: RMOUG and IOUG Collaborate 2008 (Burleson Consulting)
JL Computer Consultancy Appearances
RMOUG Training Days (W3W3)
Presentations from several recent RMOUG Training Days conferences are available at the following links:
Other Pages / Blogs on RMOUG Training Days 2008
This portion of this blog entry will be updated regularly as new pages and blogs reference RMOUG Training Days 2008 or RMOUG Training Days in general.
Oracle 2008 Denver Conferences: RMOUG and IOUG Collaborate 2008 (Burleson Consulting)
JL Computer Consultancy Appearances
RMOUG Training Days (W3W3)
Subscribe to:
Posts (Atom)