Tuesday, February 8, 2011

Groovy Scripts Master Their Own Classpath with RootLoader

Groovy's RootLoader is a handy class that lets a Groovy script declare its external classpath dependencies within the script itself. The documentation for RootLoader states, "It's possible to add urls to the classpath at runtime through addURL(URL)." Because a script can add to its own classpath at runtime, the script developer can write a Groovy script that takes care of its own bootstrapping and no longer needs to be started from a shell script that supplies the classpath with the -classpath (or -cp) option.

This post is not my first to focus on using Groovy's RootLoader. My previous post Viewing Groovy Application's Classpath demonstrated using RootLoader to detect all of the resources already on Groovy's classpath. In this post, I look at using this same RootLoader to allow a Groovy script to bootstrap itself with appropriate classpath dependencies.
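
As a quick reminder of what that looks like, here is a minimal sketch (not the code from the earlier post). The helper name rootLoaderUrls is hypothetical. The sketch relies on RootLoader being a URLClassLoader and returns an empty list when no RootLoader is found in the classloader chain, which can happen when a script is not started through the normal groovy launcher.

```groovy
// Hypothetical helper: returns the URLs currently known to the RootLoader,
// or an empty list if no RootLoader is in the classloader chain.
List rootLoaderUrls() {
    def rootLoader = this.class.classLoader.rootLoader
    rootLoader == null ? [] : rootLoader.URLs.toList()
}

// Print each classpath entry on its own line.
rootLoaderUrls().each { println it }
```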

I am going to use a simple Groovy script with a dependency on the Oracle JDBC driver to illustrate. That script is called printEmployees.groovy and is shown next.

printEmployees.groovy
// printEmployees.groovy
import groovy.sql.Sql

// Connect to the Oracle 'hr' sample schema; the JDBC driver JAR must be on the classpath.
sql = Sql.newInstance("jdbc:oracle:thin:@localhost:1521:orcl", "hr", "hr",
                      "oracle.jdbc.pool.OracleDataSource")
// Print one line per employee.
sql.eachRow("SELECT employee_id, last_name, first_name FROM employees") {
   println "The employee's name is ${it.first_name} ${it.last_name}."
}

Supposing that the appropriate Oracle JDBC driver is located at C:\app\Dustin\product\11.1.0\db_1\jdbc\lib\ojdbc6.jar, this Groovy script could be run by specifying the Oracle JDBC driver on the command line like this:

groovy -cp C:\app\Dustin\product\11.1.0\db_1\jdbc\lib\ojdbc6.jar printEmployees

The next screen snapshot shows the beginning of the script's output when this is done against the 'hr' sample schema supplied with the Oracle database.


The above output is far more useful than what appears when the classpath is omitted. The next screen snapshot shows the result of not telling the Groovy script where to find the Oracle JDBC driver.


Using Groovy's -classpath (or -cp) option is not the only way to tell the Groovy script about a resource it requires. In Dustin's Blog (I like the name!), the Dustin Whitney post Groovy Classpath concisely describes another way to place a resource on a Groovy script's classpath. He simply states (I have added the emphasis): "You can place jars in your ${user.home}/.groovy/lib directory to have them automatically loaded into your classpath."

The next screen snapshot demonstrates that I have now tried this by placing the Oracle JDBC JAR file in that directory (C:\Users\Dustin\.groovy\lib in my case).


The next screen snapshot shows that the script can now be run without explicitly specifying the classpath on the command line. I added a line of Groovy to the original script to print out the location of "user.home": println "${System.getProperty('user.home')}". It prints out C:\Users\Dustin.
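
To check what Groovy would pick up from that directory, a short script can list the JARs it contains. This is only a sketch: the helper name jarsIn is hypothetical, and it simply looks for file names ending in .jar under ${user.home}/.groovy/lib rather than asking Groovy what it actually loaded.

```groovy
// Hypothetical helper: lists the JAR files in a directory. Returns an empty
// list if the directory does not exist or cannot be read.
List<File> jarsIn(File dir) {
    def files = dir.listFiles()   // null when dir is not a readable directory
    files == null ? [] : files.toList().findAll { it.name.endsWith('.jar') }
}

def userLib = new File(System.getProperty('user.home'), '.groovy/lib')
println "JARs in ${userLib}:"
jarsIn(userLib).each { println "  ${it.name}" }
```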


There is another directory common to all Groovy scripts in which dependent JARs can also be placed. This is the %GROOVY_HOME%\lib or $GROOVY_HOME/lib directory. Although I don't show it here, placing the Oracle JAR file in the Groovy distribution's lib directory makes it available to all Groovy scripts run from that distribution regardless of the user running the Groovy script.

There is a single file provided with the Groovy installation that controls which directories have their contents placed automatically on the classpath. The file is named groovy-starter.conf and lives in the %GROOVY_HOME%\conf or $GROOVY_HOME/conf directory. The next screen snapshot shows this in my current environment.


Within this file, there are two lines that set the directories where Groovy automatically looks for classpath entries. There are comments indicating which line "load[s] required libraries" (load !{groovy.home}/lib/*.jar) and which line "load[s] user specific libraries" (load !{user.home}/.groovy/lib/*.jar). A typical configuration is to have the user specific line commented out, but the comment character (#) can be removed so that the user directory's contents will be automatically on the classpath. As a complete example, here is the current groovy-starter.conf file in my environment:

##############################################################################
##                                                                          ##
##  Groovy Classloading Configuration                                       ##
##                                                                          ##
##############################################################################

##
## $Revision: 9225 $ $Date: 2007-11-15 21:17:45 +0100 (Do, 15. Nov 2007) $
##
## Note: do not add classes from java.lang here. No rt.jar and on some
##       platforms no tools.jar
##
## See http://groovy.codehaus.org/api/org/codehaus/groovy/tools/LoaderConfiguration.html
## for the file format

    # load required libraries
    load !{groovy.home}/lib/*.jar

    # load user specific libraries
    load !{user.home}/.groovy/lib/*.jar
    
    # tools.jar for ant tasks
    load ${tools.jar}

Note also that this configuration file refers to the URL http://groovy.codehaus.org/api/org/codehaus/groovy/tools/LoaderConfiguration.html, where the syntax of this file and its makeup are more comprehensively defined. The RootLoader receives prominent mention in this document.
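
To see at a glance which load lines are active in a groovy-starter.conf file, the file's text can be filtered with a few lines of Groovy. The sketch below uses an inline copy of the load lines from my file instead of reading the real one, and the helper name activeLoadEntries is hypothetical. It only separates active load lines from comments; it does not expand properties or wildcards the way Groovy's own launcher does.

```groovy
// Sample text mirroring the listing above. Triple single quotes keep Groovy
// from trying to interpolate ${tools.jar}.
def conf = '''
    # load required libraries
    load !{groovy.home}/lib/*.jar

    # load user specific libraries
    load !{user.home}/.groovy/lib/*.jar

    # tools.jar for ant tasks
    load ${tools.jar}
'''

// Hypothetical helper: returns the path patterns of the active "load" lines.
List<String> activeLoadEntries(String text) {
    text.readLines()
        .collect { it.trim() }
        .findAll { it.startsWith('load ') }   // lines starting with '#' drop out here
        .collect { it.substring(5).trim() }
}

activeLoadEntries(conf).each { println it }
```

Commenting out the user specific line with # would remove it from the result, which matches how the file is described above.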

The script works without explicitly specifying the classpath on the command line when the dependent JAR is placed either in the user's .groovy/lib directory or in the Groovy distribution's $GROOVY_HOME/lib directory. The difference between the two is that the former limits the dependent JAR to the classpaths of Groovy scripts run by that specific user, while the latter makes the JAR available to all Groovy scripts run from that distribution. In either case, the dependent JAR ends up on the classpath of every Groovy script run by that user or from that Groovy installation; it cannot be limited to a single script.

The downside (or upside depending on perspective) is that this affects all Groovy scripts. It's akin to setting the environment variable CLASSPATH or to placing the JAR in a directory used for all Java applications rather than explicitly setting the classpath when running Java applications. In general, the Java equivalent is considered bad form and the Groovy version suffers the same potential drawbacks. Incidentally, setting the CLASSPATH environment variable to include whatever is required by the Groovy script works for Groovy as well as for Java.

Specifying the classpath with the groovy command's -classpath (or -cp) option requires either that the person running the script type it in or that a shell or other "outer" script invoke the Groovy script. Even the approach of placing a dependent JAR file in the .groovy/lib subdirectory of the user directory requires the person running the script to have placed the JAR there. The best solution is often the one that allows the script to contain its own classpath references. This is discussed next.

When it is less than desirable to have a shell script kick off a Groovy script, and it is also undesirable to pollute other Groovy scripts' classpaths by placing dependent JARs in the CLASSPATH environment variable, in the %GROOVY_HOME%\lib directory, or in the user's .groovy/lib directory, the most desirable solution may be to dynamically add a dependency to the classpath using RootLoader. The next Groovy code listing shows the previously shown script amended to use the RootLoader to dynamically load the Oracle JDBC driver JAR and append it to the script's classpath.

// printEmployees.groovy
import groovy.sql.Sql

// Add the Oracle JDBC driver JAR to the classpath at runtime via the RootLoader.
this.class.classLoader.rootLoader.addURL(
   new URL("file:///C:/app/Dustin/product/11.1.0/db_1/jdbc/lib/ojdbc6.jar"))

// Connect to the Oracle 'hr' sample schema using the driver just added.
sql = Sql.newInstance("jdbc:oracle:thin:@localhost:1521:orcl", "hr", "hr",
                      "oracle.jdbc.pool.OracleDataSource")
// Print one line per employee.
sql.eachRow("SELECT employee_id, last_name, first_name FROM employees") {
   println "The employee's name is ${it.first_name} ${it.last_name}."
}

The above script does not require the Oracle JDBC driver JAR to be explicitly specified on the command line and does not require the JAR to be in any Groovy-specific directory. This frees the script developer and the script's users from these command-line and directory dependencies and removes the risk of polluting other Groovy scripts' classpaths.
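
The one-line addURL call assumes that the JAR exists and that a RootLoader is actually present. The getRootLoader() method is documented as returning null when no RootLoader is found among the classloader's parents, so a more defensive variant may be worth the extra lines. The following is only a sketch under those assumptions: the helper name addJarToRootLoader is hypothetical, and building the URL with File.toURI().toURL() is my substitution for the hand-written file: URL.

```groovy
// Hypothetical helper: tries to add a JAR to the RootLoader's classpath.
// Returns true on success, false if the file or the RootLoader is missing.
boolean addJarToRootLoader(String jarPath) {
    def jarFile = new File(jarPath)
    if (!jarFile.isFile()) {
        System.err.println "JAR not found: ${jarFile}"
        return false
    }
    def rootLoader = this.class.classLoader.rootLoader
    if (rootLoader == null) {
        System.err.println "No RootLoader available; pass the JAR with -cp instead."
        return false
    }
    // File.toURI().toURL() builds a well-formed file: URL on any platform.
    rootLoader.addURL(jarFile.toURI().toURL())
    true
}

// Path taken from the example above; adjust it for your installation.
if (!addJarToRootLoader('C:/app/Dustin/product/11.1.0/db_1/jdbc/lib/ojdbc6.jar')) {
    System.err.println "Oracle JDBC driver was not added to the classpath."
}
```

With this guard in place, a missing JAR or a missing RootLoader produces a readable message instead of a NullPointerException or a later driver-loading failure.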


Conclusion

I often use shell scripts to start my Groovy scripts. Not only can the shell scripts properly specify the -classpath or -cp option when running the Groovy script, but they can also set the JAVA_OPTS environment variable appropriately (such as for JVM heap sizing). However, there are times when it seems unnecessary or less than desirable to need two scripts (a shell script and the Groovy script) to accomplish a single script's job. In such cases, the ability to dynamically append resources to the script's classpath via RootLoader is welcome.

6 comments:

Dustin said...

The three paragraphs, screen snapshot, and groovy-starter.conf listing related to the discussion of setting up Groovy configuration to look in certain directories for classpath entries were all added after the original blog post (later the same day).

Dustin said...

I have posted far more detailed information on using JAVA_OPTS in Groovy in the post Groovy Uses JAVA_OPTS.

Wolfgang Schell said...

Hi Dustin,

thanks for this article!

You are probably aware of Grape (http://groovy.codehaus.org/Grape). Is there a specific reason, you didn't use @Grab?

Regards,

Wolfgang

Dustin said...

Wolfgang,

I was aware of Grape's existence, but that was about the limit of my knowledge of Grape. Thanks to your post, I see that Grape and the @Grab annotation are useful for accessing dependent libraries in Ivy or Maven or other repositories. That's a nice way to access these libraries from a centralized repository. Thanks for pointing out this approach.

Dustin

Kenneth said...

On a linux machine, I keep getting "Caught: java.lang.NullPointerException: Cannot invoke method addURL() on null object" on the line: this.class.classLoader.rootLoader.addURL(new URL("file:////home/klee/zips/jdbc_software/ojdbc6.jar"))

Have you ever seen this or know of a solution?

Thanks,

Ken

Dustin said...

Kenneth,

I have not seen that problem myself, but I have read of others running into it. It sounds like it's typically an issue related to conflicting classloaders. The Javadoc documentation for the getRootLoader() method describes when the method may return null: "Iterates through the classloader parents until it finds a loader with a class named 'org.codehaus.groovy.tools.RootLoader'. If there is no such class null will be returned."

It appears that people have had trouble with this when running in Eclipse, for example.