Monday, August 20, 2018

Apache Commons ArrayUtils.toString(Object) versus JDK Arrays.toString(Object[])

Apache Commons Lang provides an ArrayUtils class that includes the method toString(Object), which "Outputs an array as a String." In this post, I look at situations in which this method might still be useful now that the JDK provides the Arrays.toString(Object[]) method (and several overloaded versions of that method on the Arrays class for arrays of primitive types).

At one point, a reason to use the Apache Commons Lang ArrayUtils.toString(Object) method might have been that there was no alternative provided by the JDK. Arrays.toString(Object[]) was introduced with J2SE 5 (late 2004), whereas Apache Commons Lang has had ArrayUtils.toString(Object) since at least Lang 2.0 (late 2003). Although there's just over a year's difference between those releases, many organizations are more willing to upgrade libraries than JDK versions, so it's possible some organizations elected to use the Lang API because they had not yet adopted JDK 5 even after its General Availability release. By today, however, it's likely that only a very small percentage of Java deployments are not using JDK 5 or later, so this no longer seems to be a reason to use the Apache Commons Lang method instead of the JDK method in newly written code.

Another reason Apache Commons Lang's ArrayUtils.toString(Object) might be selected over the JDK's Arrays.toString(Object[]) is the format of the string constructed from the array of elements. This doesn't seem like a very compelling motivation because the respective outputs are not that different.

For my examples in this post, I'm assuming an array of Strings defined like this:

/** Array of {@code String}s used in these demonstrations. */
private static final String[] strings
   = {"Dustin", "Inspired", "Actual", "Events"};

The table below compares the String representations generated when the array defined above is passed to the JDK's Arrays.toString(Object[]) and to Apache Commons Lang's ArrayUtils.toString(Object).

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)
Comparing Single String Output of Typical Java Array

Input Array | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
{"Dustin", "Inspired", "Actual", "Events"} | [Dustin, Inspired, Actual, Events] | {Dustin,Inspired,Actual,Events}

The table illustrates that both methods' generated Strings are very similar in substantive terms, but there are cosmetic differences in their output. The JDK version surrounds the array contents of the generated string with square brackets while the Apache Commons Lang version surrounds the array contents with curly braces. The other obvious difference is that the JDK version separates the array elements with a comma and a space while the Apache Commons Lang version separates them with just a comma and no space.
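The JDK half of this comparison can be reproduced with nothing but the standard library. The following is a minimal sketch; the class name JdkArrayToStringDemo and its method are my own names for illustration, and the Apache Commons Lang call is deliberately left out so the example has no third-party dependency.

```java
import java.util.Arrays;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class JdkArrayToStringDemo {
    /** Same array of Strings used in the demonstrations in this post. */
    static final String[] STRINGS = {"Dustin", "Inspired", "Actual", "Events"};

    /** Returns the JDK's single-String representation of the array. */
    static String jdkRepresentation() {
        return Arrays.toString(STRINGS);
    }

    public static void main(String[] args) {
        // Square brackets, with elements separated by a comma and a space.
        System.out.println(jdkRepresentation());
    }
}
```

Running main prints [Dustin, Inspired, Actual, Events], matching the JDK column of the table.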

If the Apache Commons Lang ArrayUtils.toString(Object) allowed the "style" of its output to be customized, that might strengthen the argument that its style of representation is an advantage. However, as can be seen in the method's implementation, it always uses ToStringStyle.SIMPLE_STYLE.

Another minor difference between the two approaches being discussed here for presenting a Java array as a single String representation is the handling of null passed to the methods. Both methods return a non-null, non-empty String when passed null, but the contents of that string differ depending on the implementation invoked. The JDK's Arrays.toString(Object[]) returns the string "null" when null is passed to it while the Apache Commons Lang's ArrayUtils.toString(Object) returns the string "{}" when null is passed to it.

The "{}" returned by ArrayUtils.toString(Object) is easy to understand and in some ways is more aesthetically pleasing for presenting the string version of null provided for an array. However, it could be argued that the "{}" implies an empty array instead of a null. The Apache Commons Lang version does indeed return the same "{}" string for an empty array as well (and that matches exactly how one would declare an empty array with an array initializer). The JDK's Arrays.toString(Object[]) method provides the string "null" for null input and provides "[]" for an empty array input.
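The JDK's distinction between a null array and an empty array is easy to confirm. This is a small sketch using only the standard library; the class name JdkNullVersusEmptyDemo and its method names are hypothetical and chosen just for this illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class JdkNullVersusEmptyDemo {
    /** Passes a null array reference to Arrays.toString(Object[]). */
    static String ofNull() {
        String[] nullArray = null;
        return Arrays.toString(nullArray);
    }

    /** Passes an empty (zero-length) array to Arrays.toString(Object[]). */
    static String ofEmpty() {
        return Arrays.toString(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(ofNull());   // the four-character string "null"
        System.out.println(ofEmpty());  // "[]"
    }
}
```

Because the two results differ, a reader of a log line produced by the JDK method can tell a missing array apart from an empty one.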

It could be argued that the JDK approach of presenting the string version of a null array parameter as "null" is more consistent with what a Java developer might expect given other similar situations in which a String representation of null is provided. Both the implicit String conversion of null (see Section 5.1.11 of the Java SE 10 Language Specification for more details) and the String returned by calling String.valueOf(Object) on a null parameter present the string "null". The implicit String conversion of null for an array type results in the "null" string as well.
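These other conversions of null can be checked directly. The sketch below assumes nothing beyond java.lang; the class name NullStringConversionDemo and its method names are hypothetical.

```java
/** Illustrative sketch only; the class and method names are hypothetical. */
final class NullStringConversionDemo {
    /** Implicit String conversion of a null array reference via concatenation. */
    static String viaConcatenation() {
        String[] nullArray = null;
        return "" + nullArray;
    }

    /** Explicit conversion of a null reference via String.valueOf(Object). */
    static String viaValueOf() {
        // Typed as Object so that the valueOf(Object) overload is selected;
        // a bare null literal would select valueOf(char[]) instead.
        Object nothing = null;
        return String.valueOf(nothing);
    }

    public static void main(String[] args) {
        System.out.println(viaConcatenation()); // "null"
        System.out.println(viaValueOf());       // "null"
    }
}
```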

Another difference between ArrayUtils.toString(Object) and Arrays.toString(Object[]) is the type of the parameter expected by each method. The ArrayUtils.toString(Object) implementation expects an Object and so accepts just about anything one wants to provide to it. The JDK's Arrays.toString(Object[]) forces an array (or null) to be provided to it, and non-array types cannot be provided to it. It's debatable which approach is better, but I personally generally prefer more strictly typed APIs that only allow what they advertise (and so help enforce their contract). In this case, because the desired functionality is to pass in an array and have a String representation of that array returned, I prefer the more definitively typed method that expects an array. On the other hand, one might argue that they prefer the method that accepts a general Object because then any arbitrary object (such as a Java collection) can be passed to the method.
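The stricter typing on the JDK side shows up as a family of overloads, one for Object[] and one for each primitive array type. Here is a brief sketch of a few of them; the class name TypedOverloadsDemo and its method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class TypedOverloadsDemo {
    /** Resolves to the Arrays.toString(int[]) overload. */
    static String ofInts() {
        return Arrays.toString(new int[] {1, 2, 3});
    }

    /** Resolves to the Arrays.toString(double[]) overload. */
    static String ofDoubles() {
        return Arrays.toString(new double[] {1.5, 2.0});
    }

    /** Resolves to the Arrays.toString(boolean[]) overload. */
    static String ofBooleans() {
        return Arrays.toString(new boolean[] {true, false});
    }

    public static void main(String[] args) {
        System.out.println(ofInts());     // "[1, 2, 3]"
        System.out.println(ofDoubles());  // "[1.5, 2.0]"
        System.out.println(ofBooleans()); // "[true, false]"
        // Passing a non-array such as a java.util.List does not compile,
        // because no Arrays.toString overload accepts it:
        // Arrays.toString(java.util.Arrays.asList("a", "b"));
    }
}
```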

In general, I don't like the idea of using a method on a class called ArrayUtils to build a String representation of anything other than an array. I have seen this method used on Java collections, but that is unnecessary as Java collections already provide reasonable toString() implementations (arrays cannot override Object's toString() and that's why they require these external methods to do so for them). It's also unnecessary to use ArrayUtils.toString(Object) to ensure that a null is handled without a NullPointerException because Objects.toString(Object) and String.valueOf(Object) handle that scenario nicely and don't pretend to be an "array" method (in fact, they don't help much with arrays).
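A short sketch of those built-in alternatives follows. It uses only the standard library; the class name BuiltInToStringDemo and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class BuiltInToStringDemo {
    /** Collections already provide a readable toString() of their own. */
    static String ofList() {
        List<String> list = Arrays.asList("Dustin", "Inspired", "Actual", "Events");
        return list.toString();
    }

    /** Objects.toString(Object) handles null without a NullPointerException. */
    static String ofNullReference() {
        Object nothing = null;
        return Objects.toString(nothing);
    }

    /** String.valueOf(Object) on an array falls back to Object's toString(). */
    static String ofArrayViaValueOf() {
        String[] strings = {"Dustin"};
        return String.valueOf((Object) strings);
    }

    public static void main(String[] args) {
        System.out.println(ofList());            // "[Dustin, Inspired, Actual, Events]"
        System.out.println(ofNullReference());   // "null"
        // Type name plus a hash code that varies between runs,
        // which is why these methods don't help much with arrays.
        System.out.println(ofArrayViaValueOf());
    }
}
```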

The difference in the parameter type expected by each method leads into the motivation that I believe is most compelling for choosing the third-party ArrayUtils.toString(Object) over the built-in Arrays.toString(Object[]), though the advantage is significant in only one specific case: multi-dimensional Java arrays. The JDK's Arrays.toString(Object[]) is designed for a single-dimensional Java array only. The Apache Commons Lang ArrayUtils.toString(Object), however, nicely supports presenting a single String representation even of multi-dimensional Java arrays. Its method-level Javadoc advertises this advantage: "Multi-dimensional arrays are handled correctly, including multi-dimensional primitive arrays." To illustrate the differences in these methods' output for a multi-dimensional array, I'll be using this ridiculously contrived example:

/** Two-dimensional array of {@code String}s used in demonstrations. */
private static final String[][] doubleDimStrings
   = {{"Dustin"}, {"Inspired", "Actual", "Events"}};

The output from passing that two-dimensional array of Strings to the respective methods is shown in the following table.

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)
Comparing Single String Output of Two-Dimensional Array

Input Array | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
{{"Dustin"}, {"Inspired", "Actual", "Events"}} | [[Ljava.lang.String;@135fbaa4, [Ljava.lang.String;@45ee12a7] | {{Dustin},{Inspired,Actual,Events}}

The table just shown demonstrates that the JDK's Arrays.toString() is not particularly helpful once a Java array has more than a single dimension. The Apache Commons Lang's ArrayUtils.toString(Object) is able to present a nice single String representation even of the multi-dimensional array.
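To make the difference concrete, here is a sketch that shows the JDK output for the two-dimensional array alongside a recursive rendering. The nestedToString helper is hypothetical and written by hand with java.lang.reflect.Array; it is not how Apache Commons Lang implements ArrayUtils.toString(Object), and it only imitates the curly-brace output shown in the table (for instance, it renders a top-level null as "null" rather than "{}").

```java
import java.lang.reflect.Array;
import java.util.Arrays;

/** Illustrative sketch only; the class and method names are hypothetical. */
final class NestedArrayDemo {
    /** Same two-dimensional array of Strings used in this post. */
    static final String[][] DOUBLE_DIM_STRINGS
        = {{"Dustin"}, {"Inspired", "Actual", "Events"}};

    /** JDK result: each inner array is rendered via Object's toString(). */
    static String jdkRepresentation() {
        return Arrays.toString(DOUBLE_DIM_STRINGS);
    }

    /**
     * Hypothetical helper (NOT the Commons Lang implementation) that
     * recursively renders nested arrays with curly braces and bare commas.
     */
    static String nestedToString(Object value) {
        if (value == null || !value.getClass().isArray()) {
            // Non-array elements (and null) use the ordinary String conversion.
            return String.valueOf(value);
        }
        StringBuilder builder = new StringBuilder("{");
        int length = Array.getLength(value);
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                builder.append(',');
            }
            // Array.get boxes primitives, so primitive arrays work as well.
            builder.append(nestedToString(Array.get(value, i)));
        }
        return builder.append('}').toString();
    }

    public static void main(String[] args) {
        // Inner arrays appear as type names with hash codes that vary by run.
        System.out.println(jdkRepresentation());
        System.out.println(nestedToString(DOUBLE_DIM_STRINGS));
    }
}
```

The second line printed is {{Dustin},{Inspired,Actual,Events}}, the same text as the Apache Commons Lang column of the table, which illustrates that descending into each nested array is what the plain Arrays.toString(Object[]) call does not do.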

I have intentionally steered clear of comparing the two alternatives covered in this post in terms of performance because I have rarely found the performance difference of these types of methods to matter in my daily work. However, if this functionality were needed in a case where every millisecond counted, then it might be worth trying each in realistic scenarios to choose the one that works best. My intuition tells me that the JDK implementation would generally perform better (especially if working with arrays of primitives and able to use one of the overloaded Arrays.toString() methods intended for primitives), but my intuition has been wrong before when it comes to questions of performance.

The following table summarizes my post's discussion on characteristics of Apache Commons Lang's (version 3.7) ArrayUtils.toString(Object) and the JDK's (JDK 10) Arrays.toString(Object[]).

JDK Arrays.toString(Object[]) vs. ACL ArrayUtils.toString(Object)

Input Type | JDK Arrays.toString(Object[]) | Apache Commons Lang ArrayUtils.toString(Object)
Single-Dimension Array | "[Dustin, Inspired, Actual, Events]" | "{Dustin,Inspired,Actual,Events}"
Double-Dimension Array | "[[Ljava.lang.String;@135fbaa4, [Ljava.lang.String;@45ee12a7]" | "{{Dustin},{Inspired,Actual,Events}}"
null | "null" | "{}"
Empty Single-Dimension Array | "[]" | "{}"

This post has looked at some possible motivations for choosing the third-party Apache Commons Lang's ArrayUtils.toString(Object) over the built-in JDK's Arrays.toString(Object[]) for generating single String representations of arrays. I find that the most obvious situation in which to choose this particular third-party library method over the built-in alternative is when working with multi-dimensional arrays.
