Monday, December 24, 2007

Using the Apache POI POIBrowser

The Javadoc-generated documentation for the POIBrowser package demonstrates how to run the POIBrowser class that is distributed with Apache POI. The current version of this package documentation available online (http://poi.apache.org/apidocs/org/apache/poi/contrib/poibrowser/package-summary.html#package_description) demonstrates running POIBrowser with version 2.5.1. In this blog entry, I demonstrate running POIBrowser on Windows (Vista) with Apache POI 3.0.1. The process is essentially the same as for version 2.5.1; the only change is that the JAR file names carry the new version number. I'll also add some other notes related to running POIBrowser.

The first screen shot here (click on it to zoom in) shows how I run POIBrowser. I intentionally included a "dir" command and its results in the screen shot to show which directory I was in. This directory was formed by downloading and unzipping the binary distribution of Apache POI and then later downloading and expanding the source distribution into the src and alt-src directories. I like to keep the source directories alongside the binaries for this open source product because the examples provide good illustrations of how to use the Apache POI APIs.



As seen in the above screen shot, I supply the Microsoft Office files that I want to browse as command-line arguments to POIBrowser. I have placed some example Microsoft Office files in my directory C:\test (I wanted to act on copies rather than on the original files). It is important to note that Apache POI (as of version 3.0.1) does not work with the new Office Open XML format introduced with the Microsoft Office 2007 products. The files in this test directory were created with Office 2003 products, so POI can access their contents.
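
Because the format distinction above decides whether POIBrowser can open a file at all, it can be handy to check a file before passing it in. The following is only a rough sketch and is not part of Apache POI: the class name OfficeFormatSniffer and its methods are hypothetical names of my own. It relies on two well-known signatures: OLE2 compound documents (the Office 97-2003 binary formats) begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while Office Open XML files are ZIP packages that begin with "PK". A ZIP signature alone does not prove a file is Office Open XML, so the result is reported simply as "ZIP-based." The sketch uses try-with-resources and therefore assumes Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of Apache POI): guesses a file's container format from its first bytes. */
final class OfficeFormatSniffer {

    enum Container { OLE2, ZIP_BASED, UNKNOWN }

    // Signature found at the start of every OLE2 compound document.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature ("PK" followed by 0x03 0x04).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a buffer holding the leading bytes of a file. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP_BASED;
        return Container.UNKNOWN;
    }

    /** Reads up to eight leading bytes of the file and classifies them. */
    static Container sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // A single read may return fewer bytes than requested, so loop until full or EOF.
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OfficeFormatSniffer C:\test\marx-poi.doc C:\test\EmployeesReport.xls
        for (String arg : args) {
            System.out.println(arg + " -> " + sniff(Paths.get(arg)));
        }
    }
}
```

Files reported as OLE2 are the kind POIBrowser is meant to read; anything reported as ZIP-based or UNKNOWN is probably not worth passing to it.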

The files in the test directory are named marx-poi.doc (Microsoft Word 2003), marx-poi.ppt (Microsoft PowerPoint 2003), and EmployeesReport.xls (Microsoft Excel 2003). These files are all in-work files related to my presentation at the upcoming RMOUG Training Days 2008 on "Excel with Apache POI and Oracle Database." The PowerPoint file is the slide presentation, the Word document is the associated white paper, and the Excel spreadsheet was generated from the Oracle database-provided HR schema using Apache POI.

To run POIBrowser on these three files, the following command was used:


java -cp poi-3.0.1-FINAL-20070705.jar;poi-contrib-3.0.1-FINAL-20070705.jar org.apache.poi.contrib.poibrowser.POIBrowser C:\test\marx-poi.doc C:\test\EmployeesReport.xls C:\test\marx-poi.ppt
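
The semicolon between the two JAR names in the command above is the Windows classpath separator; on Unix-like systems the separator is a colon. As a small illustration, the sketch below assembles the same command using java.io.File.pathSeparator so that the classpath comes out right for whichever platform it runs on. The class name PoiBrowserCommand and its build method are hypothetical names of my own, the JAR names are the ones from my 3.0.1 download, and String.join assumes Java 8 or later.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: assembles the POIBrowser launch command for the current platform. */
final class PoiBrowserCommand {

    /** Builds the argument list: java -cp <jars joined by separator> <main class> <documents...> */
    static List<String> build(String pathSeparator, List<String> jars, List<String> documents) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(String.join(pathSeparator, jars));
        command.add("org.apache.poi.contrib.poibrowser.POIBrowser");
        command.addAll(documents);
        return command;
    }

    public static void main(String[] args) {
        // JAR names as shipped in the Apache POI 3.0.1 binary distribution used in this post.
        List<String> jars = Arrays.asList(
                "poi-3.0.1-FINAL-20070705.jar",
                "poi-contrib-3.0.1-FINAL-20070705.jar");
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        List<String> command = build(File.pathSeparator, jars, Arrays.asList(args));
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list could also be handed to a ProcessBuilder instead of being printed, if launching POIBrowser from another Java program is preferred.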


POIBrowser is a basic Swing application, and running it as shown above brings up a simple Swing HMI titled "POI Browser 0.09" that contains a single expandable item labeled "POI Filesystems."

The next image (click on it to see a larger version) shows how the HMI looks after clicking on the icon to the left of "POI Filesystems."



The files that were passed on the command line to POIBrowser are listed in reverse order in the tool. Each of these can be expanded to view more details about that file. For example, the next snapshot (click on the image to see it larger) shows some of the summary details available for the PowerPoint document. The Word document shows similar details.



There is a lot of summary and document summary information for both the PowerPoint document and the Word document in my example, but, as the next image shows, there is not much for the Excel spreadsheet. This isn't due to any Excel limitation. Rather, the Excel document being viewed was generated by my POI-based sample application, which did not bother to populate this information. The PowerPoint and Word documents in this example were created with PowerPoint and Word respectively, and so had this information populated.



The primary and most helpful use of POIBrowser is as sample code: reading its source shows how the Apache POI API (specifically POI-HPSF) can be used to access the property settings of Microsoft Office documents.
