Monday, September 17, 2018

Java Subtlety with Arrays of Primitives and Variable Arguments

An interesting question was posed in a comment on the DZone-syndicated version of my recent blog post "Arrays.hashCode() Vs. Objects.hash()". The comment's author set up examples similar to those used in my blog post and showed different results than I saw. I appreciate the comment author taking the time to post this as it brings up a subtle nuance in Java that I think is worth a blog post.

The comment author showed the following valid Java statements:

int[] arr = new int[]{1,2,3,4};
System.out.println(Arrays.hashCode(arr));
System.out.println(Objects.hash(1,2,3,4));
System.out.println(Arrays.hashCode(new Integer[]{new Integer(1),new Integer(2),new Integer(3),new Integer(4)}));
System.out.println(Objects.hash(new Integer(1),new Integer(2),new Integer(3),new Integer(4)));

The author of the comment mentioned that the results from running the code just shown were exactly the same for all four statements. This differed from my examples where the result from calling Arrays.hashCode(int[]) on an array of primitive int values was different than calling Objects.hash(Object...) on that same array of primitive int values.

One response to the original feedback comment accurately pointed out that hash codes generated on different JVMs are not guaranteed to be the same. In fact, the Javadoc comment for the Object.hashCode() method states (I added the emphasis):

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Having stated all of this, the hash codes calculated for integers will typically be consistent from run to run. What is more interesting is that every one of the original commenter's examples produced exactly the same value. I would not necessarily expect those values to match the values from my examples, but it is surprising that all of the commenter's examples returned the same result.
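To illustrate why integer-based hash codes tend to be stable, here is a minimal sketch (the class and method names are my own invention for this illustration). It relies on the documented behavior that Integer.hashCode() returns the wrapped int value and that Arrays.hashCode(int[]) depends only on the array's contents, and contrasts these with the identity-based hash code an array inherits from Object.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original examples.
class HashStabilityDemo
{
   /** Hash code of a boxed Integer, documented to equal its int value. */
   static int boxedHash(final int value)
   {
      return Integer.valueOf(value).hashCode();
   }

   /** Content-based hash of an int array; depends only on the elements. */
   static int contentHash(final int[] values)
   {
      return Arrays.hashCode(values);
   }

   public static void main(final String[] arguments)
   {
      final int[] arr = {1, 2, 3, 4};
      System.out.println("Integer 42 hash code:   " + boxedHash(42));     // 42
      System.out.println("Arrays.hashCode(int[]): " + contentHash(arr));  // 955331
      // Identity-based value inherited from Object; may differ between runs and JVMs.
      System.out.println("int[].hashCode():       " + arr.hashCode());
   }
}
```

Only the last line's value should be expected to change from one run or JVM to another.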

The difference between the commenter's examples and mine comes down to how each invoked Objects.hash(Object...) for an array of primitive int values. In my example, I passed the same local array to all the method calls. The commenter's example passed an explicit array of primitive int values to Arrays.hashCode(int[]), but passed the individual int elements to Objects.hash(Object...) rather than the array itself. When I add an example to the commenter's set that does pass the array of primitive int values to the Objects.hash(Object...) method, the generated hash code differs from all of the others. That enhanced code is shown next.

final int[] arr = new int[]{1,2,3,4};
out.println("Arrays.hashCode(int[]):              " + Arrays.hashCode(arr));
out.println("Objects.hash(int, int, int, int):    " + Objects.hash(1,2,3,4));
out.println("Objects.hash(int[]):                 " + Objects.hash(arr));
out.println("Objects.hashCode(Object):            " + Objects.hashCode(arr));
out.println("int[].hashCode():                    " + arr.hashCode());
out.println("Arrays.hashCode(Int, Int, Int, Int): " + Arrays.hashCode(new Integer[]{1,2,3,4}));
out.println("Objects.hash(Int, Int, Int, Int):    " + Objects.hash(1,2,3,4));

Running the adapted and enhanced version of the code provided by the commenter leads to this output (the examples I added are the third, fourth, and fifth lines):

Arrays.hashCode(int[]):              955331
Objects.hash(int, int, int, int):    955331
Objects.hash(int[]):                 897913763
Objects.hashCode(Object):            897913732
int[].hashCode():                    897913732
Arrays.hashCode(Int, Int, Int, Int): 955331
Objects.hash(Int, Int, Int, Int):    955331

Comparing the output to the code that generated it quickly shows that Arrays.hashCode(int[]) generates the same hash code value as Objects.hash(Object...) when the elements of the array of int values are passed to that latter method as individual elements. However, we can also see that when the array of primitive int values is passed in its entirety (as a single array instead of as the individual elements of the array), the Objects.hash(Object...) method generates an entirely different hash code. The other two examples that I added, Objects.hashCode(Object) and int[].hashCode(), show what the "direct" hash code is on the array of primitive int values, obtained either by calling .hashCode() directly on the array or by getting the equivalent result via Objects.hashCode(Object). [It's not a coincidence that the hash code generated by Objects.hash(Object...) for the array of primitive int values is exactly 31 greater than the "direct" hash code generated for the array of primitive int values.]

All of this points to the real issue here: it is typically best not to pass an array of primitives to a method that accepts variable arguments (one that declares its last parameter with an ellipsis). SonarSource Rules Explorer (Java) provides more details on this in RSPEC-3878. What is particularly relevant in that rule description is the question related to ambiguity, "Is the array supposed to be one object or a collection of objects?"

The answer to the question just posed is that when the array of primitive int values is passed to the variable arguments accepting method Objects.hash(Object...), the entire array is treated as a single Object. In contrast, when an array of reference objects (such as Integer) is passed to that same method, the method sees as many objects as there are elements in the array. This is demonstrated by the next code listing and associated output.

package dustin.examples.hashcodes;

import static java.lang.System.out;

/**
 * Demonstrates the difference in handling of arrays by methods that
 * accept variable arguments (ellipsis) when the arrays have primitive
 * elements and when arrays have reference object elements.
 */
public class ArraysDemos
{
   private static void printEllipsisContents(final Object ... objects)
   {
      out.println("==> Ellipsis Object... - Variable Arguments (" + objects.length + " elements): " + objects.getClass() + " - " + objects);
   }

   private static void printArrayContents(final Object[] objects)
   {
      out.println("==> Array Object[] - Variable Arguments (" + objects.length + " elements): " + objects.getClass() + " - " + objects);
   }

   private static void printArrayContents(final int[] integers)
   {
      out.println("==> Array int[] - Variable Arguments (" + integers.length + " elements): " + integers.getClass() + " - " + integers);
   }

   public static void main(final String[] arguments)
   {
      final int[] primitiveIntegers = ArraysCreator.createArrayOfInts();
      final Integer[] referenceIntegers = ArraysCreator.createArrayOfIntegers();
      out.println("\nint[]");
      printEllipsisContents(primitiveIntegers);
      printArrayContents(primitiveIntegers);
      out.println("\nInteger[]");
      printEllipsisContents(referenceIntegers);
      printArrayContents(referenceIntegers);
   }
}

int[]
==> Ellipsis Object... - Variable Arguments (1 elements): class [Ljava.lang.Object; - [Ljava.lang.Object;@2752f6e2
==> Array int[] - Variable Arguments (10 elements): class [I - [I@1cd072a9

Integer[]
==> Ellipsis Object... - Variable Arguments (10 elements): class [Ljava.lang.Integer; - [Ljava.lang.Integer;@7c75222b
==> Array Object[] - Variable Arguments (10 elements): class [Ljava.lang.Integer; - [Ljava.lang.Integer;@7c75222b

The example code and associated output just shown demonstrate that the method expecting variable arguments sees an array of primitive values passed to it as a single-element array. On the other hand, the same method sees an array of reference object types passed to it as an array with the same number of elements as the array that was passed.
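The listing above depends on an ArraysCreator class that is not shown, so here is a smaller self-contained sketch of the same behavior. The class name, method name, and four-element arrays are hypothetical choices for this illustration only.

```java
// Hypothetical demo class; names and array contents are illustrative only.
class VarargsCountDemo
{
   /** Returns how many elements the varargs method actually receives. */
   static int countArguments(final Object... objects)
   {
      return objects.length;
   }

   public static void main(final String[] arguments)
   {
      final int[] primitiveIntegers = {1, 2, 3, 4};
      final Integer[] referenceIntegers = {1, 2, 3, 4};

      // An int[] is not an Object[], so it is wrapped as the sole element.
      System.out.println("int[]:      " + countArguments(primitiveIntegers));          // 1
      // An Integer[] is an Object[]; the cast makes the non-varargs call explicit.
      System.out.println("Integer[]:  " + countArguments((Object[]) referenceIntegers)); // 4
      // Individual int values are autoboxed into a four-element Object[].
      System.out.println("1, 2, 3, 4: " + countArguments(1, 2, 3, 4));                 // 4
   }
}
```

The explicit cast to Object[] does not change the result for the Integer[] case; it only states which interpretation is intended.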

Returning to the hash code generation examples with this in mind, it makes sense that Objects.hash(Object...) generates a different hash code for an array of primitive int values than Arrays.hashCode(int[]) does. Similarly, we now can explain why the arrays of object references lead to the same hash code regardless of which of those methods is called.

I mentioned earlier that it's not a coincidence that the hash code generated by Objects.hash(Object...) is exactly 31 higher than the "direct" hash code of the overall array. This was not surprising because the OpenJDK implementation of Objects.hash(Object...) delegates to Arrays.hashCode(Object[]), which starts with a result of 1 and, for each element, multiplies the running result by the prime number 31 and then adds that element's hash code. The hash code value provided by Objects.hash(Object...) for an array of primitive int values is exactly what that implementation would lead us to expect: the direct hash value of the overall array plus 31. When that hash code method only loops over a single element (which is the case for an array of primitives passed to a method expecting variable arguments), its calculation is essentially 31 * 1 + <directHashValueOfOverallArray>. With the output shown earlier, that is 31 + 897913732 = 897913763.
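The arithmetic can be checked with a small sketch that re-implements the 31-based accumulation by hand. The class and its accumulate method are hypothetical helpers written for this illustration; they are not the JDK's source code.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class; accumulate(...) is a hand-written stand-in, not JDK code.
class HashArithmeticDemo
{
   /** Starts at 1 and, per element hash, computes 31 * result + elementHash. */
   static int accumulate(final int... elementHashes)
   {
      int result = 1;
      for (final int elementHash : elementHashes)
      {
         result = 31 * result + elementHash;
      }
      return result;
   }

   public static void main(final String[] arguments)
   {
      final int[] arr = {1, 2, 3, 4};
      // Four elements: (((1*31 + 1)*31 + 2)*31 + 3)*31 + 4 = 955331
      System.out.println(accumulate(1, 2, 3, 4) == Arrays.hashCode(arr));   // true
      // One element (the array itself): 31 * 1 + arr.hashCode()
      System.out.println(accumulate(arr.hashCode()) == Objects.hash(arr));  // true
   }
}
```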

It's worth noting here that even though an array of reference objects had its hash code calculated to the same result as when the elements were passed to the method accepting variable arguments, it is still probably best to avoid passing an array of reference objects to such a method. The javac compiler provides this warning when this occurs: "warning: non-varargs call of varargs method with inexact argument type for last parameter" and adds these useful details about potential ways to address this: "cast to Object for a varargs call" or "cast to Object[] for a non-varargs call and to suppress this warning". Of course, with JDK 8 and later, it's fairly straightforward to process an array in various other ways before providing it to a method expecting variable arguments.
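As one possible way to do that processing, the following sketch boxes an int[] with the JDK 8 streams API before handing it to Objects.hash(Object...). The class and method names are hypothetical, and this is only one of several reasonable approaches.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical demo class; one of several ways to prepare an array for varargs.
class BoxBeforeVarargsDemo
{
   /** Boxes each int so a varargs method receives one Object per element. */
   static Object[] box(final int[] values)
   {
      return Arrays.stream(values).boxed().toArray();
   }

   public static void main(final String[] arguments)
   {
      final int[] arr = {1, 2, 3, 4};
      // The argument is exactly an Object[], so this is an unambiguous
      // non-varargs call that hashes the four boxed elements.
      final int hash = Objects.hash(box(arr));
      System.out.println(hash == Arrays.hashCode(arr));  // true
   }
}
```

When only a hash code is needed, calling Arrays.hashCode(int[]) directly avoids the boxing altogether.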

I added a final paragraph to my original post (and its DZone-syndicated version) to attempt to quickly address this, but I have used this post to express this information in greater detail. The specific lessons learned here can be summarized as "Favor the appropriate overloaded Arrays.hashCode method for an array of primitives instead of using Objects.hash(Object...)" and "Favor Arrays.hashCode(Object[]) for arrays of reference types instead of using Objects.hash(Object...)." The more general guidelines are to be wary of passing an array of primitive values to a method expecting variable arguments of type Object if the number of elements the invoked method "sees" is important in any way and to be wary of passing an array of reference objects to a method expecting variable arguments to avoid compiler warnings and the ambiguity being warned about.
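To make the first lesson concrete, here is a minimal sketch of a value class with an int[] field. The Scores class is hypothetical and exists only to show where the choice between the two methods matters.

```java
import java.util.Arrays;

// Hypothetical value class used only to illustrate the guideline.
final class Scores
{
   private final int[] values;

   Scores(final int... values)
   {
      this.values = values.clone();
   }

   @Override
   public boolean equals(final Object other)
   {
      return other instanceof Scores
          && Arrays.equals(values, ((Scores) other).values);
   }

   @Override
   public int hashCode()
   {
      // Content-based. Objects.hash(values) would instead hash the array's
      // identity, so equal Scores instances could get different hash codes.
      return Arrays.hashCode(values);
   }

   public static void main(final String[] arguments)
   {
      final Scores first = new Scores(1, 2, 3, 4);
      final Scores second = new Scores(1, 2, 3, 4);
      System.out.println(first.equals(second));                   // true
      System.out.println(first.hashCode() == second.hashCode());  // true
   }
}
```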
