Wednesday, June 20, 2018

Java's Ternary is Tricky with Autoboxing/Unboxing

The comments section of the DZone-syndicated version of my post "JDK 8 Versus JDK 10: Ternary/Unboxing Difference" had an interesting discussion regarding the "why" of the "fix" for how Java handles autoboxing/unboxing in conjunction with use of the ternary operator (AKA "conditional operator"). This post expands on that discussion with a few more details.

One of the points made in the discussion is that the logic for how primitives and reference types are handled in a ternary operator when autoboxing or unboxing is required can be less than intuitive. For compelling evidence of this, one only needs to look at the number of bugs written for perceived problems with Java's conditional operator's behavior when autoboxing and unboxing are involved:

  • JDK-6211553 : Unboxing in conditional operator might cause null pointer exception
    • The "EVALUATION" section states, "This is not a bug." It then explains that the observed behavior that motivated the writing of the bug "is very deliberate since it makes the type system compositional." That section also provides an example of a scenario that justifies this.
  • JDK-6303028 : Conditional operator + autoboxing throws NullPointerException
    • The "EVALUATION" section states, "This is not a bug." This section also provides this explanation:
      The type of the conditional operator 
      
      (s == null) ? (Long) null : Long.parseLong(s)
      
      is the primitive type long, not java.lang.Long.
      This follows from the JLS, 3rd ed, page 511:
      
      "Otherwise, binary numeric promotion (5.6.2) is applied to the operand
      types, and the type of the conditional expression is the promoted type of the
      second and third operands. Note that binary numeric promotion performs
      unboxing conversion (5.1.8) and value set conversion (5.1.13)."
      
      In particular, this means that (Long)null is subjected to unboxing conversion.
      This is the source of the null pointer exception.
      
  • JDK-8150614 : conditional operators, null argument only for return purpose, and nullpointerexception
    • The "Comments" section explains "The code is running afoul of the complicated rules for typing of the ?: operator" and references the pertinent section of the Java Language Specification for the current version at time of that writing (https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.25).
    • I like the explanation on this one as well: "The code in the bug has one branch of the ?: typed as an Integer (with the 'replace' variable") and the other branch typed as an int from Integer.parseInt. In that case, first an unboxing Integer -> int conversion will occur before a boxing to the final result, leading to the NPE. To avoid this, case the result of parseInt to Integer."
    • The "Comments" section concludes, "Closing as not a bug."
  • JDK-6777143 : NullPointerException occured at conditional operator
    • The "EVALUATION" section of this bug report provides interesting explanation with a historical perspective:
      It is because of NPEs that JLS 15.25 says 'Note that binary numeric promotion performs unboxing conversion'. The potential for NullPointerExceptions and OutOfMemoryErrors in 1.5 where they could never have occurred in 1.4 was well known to the JSR 201 Expert Group. It could have made unboxing conversion from the null type infer the target type from the context (and have the unboxed value be the default value for that type), but inference was not common before 1.5 expanded the type system and it's certainly not going to happen now.
  • JDK-6360739 : Tertiary operator throws NPE due to reduntant casting

It's no wonder it's not intuitive to many of us! Section 15.25 ("Conditional Operator ? :") of the Java Language Specification is the defining authority regarding the behavior of the ternary operator with regards to many influences, including autoboxing and unboxing. This is the section referenced in several of the bug reports cited above and in some of the other resources that I referenced in my original post. It's worth noting that this section of the PDF version of the Java SE 10 Language Specification is approximately 9 pages!

In the DZone comments on my original post, Peter Schuetze and Greg Brown reference Table 15.25-D from the Java Language Specification for the most concise explanation of the misbehavior in JDK 8 that was rectified in JDK 10. I agree with them that this table is easier to understand than the accompanying text illustrated by the table. That table shows the type of the overall ternary operation based on the types of the second expression and third expression (where second expression is the expression between the ? and : and the third expression is the expression following the : as shown next):

    first expression ? second expression : third expression

The table's rows represent the type of the second expression and the table's columns represent the type of the third expression. One can find where the types meet in the table to know the overall type of the ternary operation. When one finds the cell of the table that correlates to row of primitive double and column of reference Double, the cell indicates that the overall type is primitive double. This is why the example shown in my original post should throw a NullPointerException, but was in violation of the specification in JDK 8 when it did not do so.

I sometimes wonder if autoboxing and unboxing are a case of the "cure being worse than the disease." However, I have found autoboxing and unboxing to be less likely to lead to subtle errors if I'm careful about when and how I use those features. A J articulates it well in his comment on the DZone version of my post: "The practical takeaway I got from this article is: when presented with an incomprehensible error, if you see that you are relying on autoboxing in that area of code (i.e., automatic type conversion), do the type conversion yourself manually. Then you will be sure the conversion is being done right."

Saturday, June 16, 2018

JDK 11: Beginning of the End for Java Serialization?

In the blog post "Using Google's Protocol Buffers with Java," I quoted Josh Bloch's Third Edition of Effective Java, in which he wrote, "There is no reason to use Java serialization in any new system you write." Bloch recommends using "cross-platform structured-data representations" instead of Java's deserialization. The proposed JDK 11 API documentation will include a much stronger statement about use of Java deserialization and this is briefly covered in this post.

The second draft of the "Java SE 11 (18.9) (JSR 384)" specification includes an "A2 Annex" called "API Specification differences" that includes the changes coming to the Javadoc-based documentation for package java.io. The new java.io package documentation will include this high-level warning comment:

Warning: Deserialization of untrusted data is inherently dangerous and should be avoided. Untrusted data should be carefully validated according to the "Serialization and Deserialization" section of the Secure Coding Guidelines for Java SE.

At the time of the writing of this post, the referenced Secure Coding Guidelines for Java SE states that it is currently Version 6.0 and was "Updated for Java SE 9."

The intended package-level documentation for package java.io in JDK 11 will also provide links to the following additional references (but likely to be JDK 11-based references):

The former reference link to the "Java Object Serialization" (JDK 8) document will be removed from java.io's package documentation.

In addition to the java.io package documentation that is being updated in JDK 11 related to the dangers of Java deserialization, the java.io.Serializable interface's Javadoc comment is getting a similar high-level warning message.

These changes to the Javadoc-based documentation in JDK 11 are not surprising given various announcements over the past few years related to Java serialization and deserialization. "RFR 8197595: Serialization javadoc should link to security best practices" specifically spelled out the need to add this documentation. A recent InfoWorld article called "Oracle plans to dump risky Java serialization" and an ADT Magazine article called "Removing Serialization from Java Is a 'Long-Term Goal' at Oracle" quoted Mark Reinhold's statement at Devoxx UK 2018 that adding serialization to Java was a "horrible mistake in 1997."

There has been talk of removing Java serialization before. JEP 154: Remove Serialization was created with the intent to "deprecate, disable, and ultimately remove the Java SE Platform's serialization facility." However, that JEP's status is now "Closed / Withdrawn." Still, as talk of removing Java serialization picks up, it seems prudent to consider alternatives to Java serialization for all new systems, which is precisely what Bloch recommends in Effective Java's Third Edition. All this being stated, Apostolos Giannakidis has written in the blog post "Serialization is dead! Long live serialization!" that "deserialization vulnerabilities are not going away" because "Java's native serialization is not the only flawed serialization technology."

Additional References

Thursday, June 14, 2018

JDK 8 BigInteger Exact Narrowing Conversion Methods

In the blog post "Exact Conversion of Long to Int in Java," I discussed using Math.toIntExact(Long) to exactly convert a Long to an int or else throw an ArithmeticException if this narrowing conversion is not possible. That method was introduced with JDK 8, which also introduced similar narrowing conversion methods to the BigInteger class. Those BigInteger methods are the topic of this post.

BigInteger had four new "exact" methods added to it in JDK 8:

As described above, each of these four "exact" methods added to BigInteger with JDK 8 allow for the BigInteger's value to be narrowed to the data type in the method name, if that is possible. Because all of these types (byte, short, int, and long) have smaller ranges than BigInteger, it's possible in any of these cases to have a value in BigDecimal with a magnitude larger than that which can be represented by any of these four types. In such a case, all four of these "Exact" methods throw an ArithmeticException rather than quietly "forcing" the bigger value into the smaller representation (which is typically a nonsensical number for most contexts).

Examples of using these methods can be found on GitHub. When those examples are executed, the output looks like this:

===== Byte =====
125 => 125
126 => 126
127 => 127
128 => java.lang.ArithmeticException: BigInteger out of byte range
129 => java.lang.ArithmeticException: BigInteger out of byte range
===== Short =====
32765 => 32765
32766 => 32766
32767 => 32767
32768 => java.lang.ArithmeticException: BigInteger out of short range
32769 => java.lang.ArithmeticException: BigInteger out of short range
===== Int =====
2147483645 => 2147483645
2147483646 => 2147483646
2147483647 => 2147483647
2147483648 => java.lang.ArithmeticException: BigInteger out of int range
2147483649 => java.lang.ArithmeticException: BigInteger out of int range
===== Long =====
9223372036854775805 => 9223372036854775805
9223372036854775806 => 9223372036854775806
9223372036854775807 => 9223372036854775807
9223372036854775808 => java.lang.ArithmeticException: BigInteger out of long range
9223372036854775809 => java.lang.ArithmeticException: BigInteger out of long range

The addition of these "exact" methods to BigInteger with JDK 8 is a welcome one because errors associated with numeric narrowing and overflow can be subtle. It's nice to have an easy way to get an "exact" narrowing or else have the inability to do that narrowing exactly made obvious via an exception.

Tuesday, June 12, 2018

JDK 8 Versus JDK 10: Ternary/Unboxing Difference

A recent Nicolai Parlog (@nipafx) tweet caught my attention because it referenced an interesting StackOverflow discussion on a changed behavior between JDK 8 and JDK 10 and asked "Why?" The issue cited on the StackOverflow thread by SerCe ultimately came down to the implementation being changed between JDK 8 and JDK 10 to correctly implement the Java Language Specification.

The following code listing is (very slightly) adapted from the original example provided by SerCe on the StackOverflow thread.

Adapted Example That Behaves Differently in JDK 10 Versus JDK 8

public static void demoSerCeExample()
{
   try
   {
      final Double doubleValue = false ? 1.0 : new HashMap<String, Double>().get("1");
      out.println("Double Value: " + doubleValue);
   }
   catch (Exception exception)
   {
      out.println("ERROR in 'demoSerCeExample': " + exception);
   }
}

When the above code is compiled and executed with JDK 8, it generates output like this: Double Value: null

When the above code is compiled and executed with JDK 10, it generates output like this: ERROR in 'demoSerCeExample': java.lang.NullPointerException

In JDK 8, the ternary operator returned null for assigning to the local variable doubleValue, but in JDK 10 a NullPointerException is instead thrown for the same ternary statement.

Two tweaks to this example lead to some interesting observations. First, if the literal constant 1.0 expressed in the ternary operator is specified instead as Double.valueOf(1.0), both JDK 8 and JDK 10 set the local variable to null rather than throwing a NullPointerException. Second, if the local variable is declared with primitive type double instead of reference type Double, the NullPointerException is always thrown regardless of Java version and regardless of whether Double.valueOf(double) is used. This second observation makes sense, of course, because no matter how the object or reference is handled by the ternary operator, it must be dereferenced at some point to be assigned to the primitive double type and that will always result in a NullPointerException in the example.

The following table summarizes these observations:

Complete Ternary Statement Setting of Local Variable doubleValue
JDK 8 JDK 10
Double doubleValue
   =  false
    ? 1.0
    : new HashMap<String, Double>().get("1");
  
null NullPointerException
double doubleValue
   =  false
    ? 1.0
    : new HashMap<String, Double>().get("1");
  
NullPointerException NullPointerException
Double doubleValue
   =  false
    ? Double.valueOf(1.0)
    : new HashMap<String, Double>().get("1");
  
null null
double doubleValue
   =  false
    ? Double.valueOf(1.0)
    : new HashMap<String, Double>().get("1");
  
NullPointerException NullPointerException

The only approach that avoids NullPointerException in both versions of Java for this general ternary example is the version that declares the local variable as a reference type Double (no unboxing is forced) and uses Double.valueOf(double) so that reference Double is used throughout the ternary rather than primitive double. If the primitive double is implied by specifying only 1.0, then the Double returned by the Java Map is implicitly unboxed (dereferenced) in JDK 10 and that leads to the exception. According to Brian Goetz, JDK 10 brings the implementation back into compliance with the specification.

Exact Conversion of Long to Int in Java

With all the shiny things (lambda expressions, streams, Optional, the new Date/Time API, etc.) to distract my attention that came with JDK 8, I did not pay much attention to the addition of the method Math.toIntExact(). However, this small addition can be pretty useful in its own right.

The Javadoc documentation for Math.toIntExact​(long) states, "Returns the value of the long argument; throwing an exception if the value overflows an int." This is particularly useful in situations where one is given or already has a Long and needs to call an API that expects an int. It's best, of course, if the APIs could be changed to use the same datatype, but sometimes this is out of one's control. When one needs to force a Long into an int there is potential for integer overflow because the numeric value of the Long may have a greater magnitude than the int can accurately represent.

If one is told that a given Long will never be larger than what an int can hold, the static method Math.toIntExact(Long) is particularly useful because it will throw an unchecked ArithmeticException if that "exceptional" situation arises, making it obvious that the "exceptional" situation occurred.

When Long.intValue() is used to get an integer from a Long, no exception is thrown if integer overflow occurs. Instead, an integer is provided, but this value will rarely be useful due to the integer overflow. In almost every conceivable case, it's better to encounter a runtime exception that alerts one to the integer overflow than to have the software continue using the overflow number incorrectly.

As a first step in illustrating the differences between Long.intValue() and Math.toIntExact(Long), the following code generates a range of Long values from 5 less than Integer.MAX_VALUE to 5 more than Integer.MAX_VALUE.

Generating Range of Longs that Includes Integer.MAX_VALUE

/**
 * Generate {@code Long}s from range of integers that start
 * before {@code Integer.MAX_VALUE} and end after that
 * maximum integer value.
 *
 * @return {@code Long}s generated over range includes
 *    {@code Integer.MAX_VALUE}.
 */
public static List<Long> generateLongInts()
{
   final Long maximumIntegerAsLong = Long.valueOf(Integer.MAX_VALUE);
   final Long startingLong = maximumIntegerAsLong - 5;
   final Long endingLong = maximumIntegerAsLong + 5;
   return LongStream.range(startingLong, endingLong).boxed().collect(Collectors.toList());
}

The next code listing shows two methods that demonstrate the two previously mentioned approaches for getting an int from a Long.

Using Long.intValue() and Math.toIntExact(Long)

/**
 * Provides the {@code int} representation of the provided
 * {@code Long} based on an invocation of the provided
 * {@code Long} object's {@code intValue()} method.
 *
 * @param longRepresentation {@code Long} for which {@code int}
 *    value extracted with {@code intValue()} will be returned.
 * @return {@code int} value corresponding to the provided
 *    {@code Long} as provided by invoking the method
 *    {@code intValue()} on that provided {@code Long}.
 * @throws NullPointerException Thrown if the provided long
 *    representation is {@code null}.
 */
public static void writeLongIntValue(final Long longRepresentation)
{
   out.print(longRepresentation + " =>       Long.intValue() = ");
   try
   {
      out.println(longRepresentation.intValue());
   }
   catch (Exception exception)
   {
      out.println("ERROR - " + exception);
   }
}

/**
 * Provides the {@code int} representation of the provided
 * {@code Long} based on an invocation of {@code Math.toIntExact(Long)}
 * on the provided {@code Long}.
 *
 * @param longRepresentation {@code Long} for which {@code int}
 *    value extracted with {@code Math.toIntExact(Long)} will be
 *    returned.
 * @return {@code int} value corresponding to the provided
 *    {@code Long} as provided by invoking the method
 *    {@code Math.toIntExact)Long} on that provided {@code Long}.
 * @throws NullPointerException Thrown if the provided long
 *    representation is {@code null}.
 * @throws ArithmeticException Thrown if the provided {@code Long}
 *    cannot be represented as an integer without overflow.
 */
public static void writeIntExact(final Long longRepresentation)
{
   out.print(longRepresentation + " => Math.toIntExact(Long) = ");
   try
   {
      out.println(Math.toIntExact(longRepresentation));
   }
   catch (Exception exception)
   {
      out.println("ERROR: " + exception);
   }
}

When the above code is executed with the range of Longs constructed in the earlier code listing (full code available on GitHub), the output looks like this:

2147483642 =>       Long.intValue() = 2147483642
2147483642 => Math.toIntExact(Long) = 2147483642
2147483643 =>       Long.intValue() = 2147483643
2147483643 => Math.toIntExact(Long) = 2147483643
2147483644 =>       Long.intValue() = 2147483644
2147483644 => Math.toIntExact(Long) = 2147483644
2147483645 =>       Long.intValue() = 2147483645
2147483645 => Math.toIntExact(Long) = 2147483645
2147483646 =>       Long.intValue() = 2147483646
2147483646 => Math.toIntExact(Long) = 2147483646
2147483647 =>       Long.intValue() = 2147483647
2147483647 => Math.toIntExact(Long) = 2147483647
2147483648 =>       Long.intValue() = -2147483648
2147483648 => Math.toIntExact(Long) = ERROR: java.lang.ArithmeticException: integer overflow
2147483649 =>       Long.intValue() = -2147483647
2147483649 => Math.toIntExact(Long) = ERROR: java.lang.ArithmeticException: integer overflow
2147483650 =>       Long.intValue() = -2147483646
2147483650 => Math.toIntExact(Long) = ERROR: java.lang.ArithmeticException: integer overflow
2147483651 =>       Long.intValue() = -2147483645
2147483651 => Math.toIntExact(Long) = ERROR: java.lang.ArithmeticException: integer overflow

The highlighted rows indicate the code processing a Long with value equal to Integer.MAX_VALUE. After that, the Long representing one more than Integer.MAX_VALUE is shown with the results of attempting to convert that Long to an int using Long.intValue() and Math.toIntExact(Long). The Long.intValue() approach encounters an integer overflow, but does not throw an exception and instead returns the negative number -2147483648. The Math.toIntExact(Long) method does not return a value upon integer overflow and instead throws an ArithmeticException with the informative message "integer overflow."

The Math.toIntExact(Long) method is not as significant as many of the features introduced with JDK 8, but it can be useful in avoiding the types of errors related to integer overflow that can sometimes be tricky to diagnose.

Monday, June 11, 2018

Peeking Inside Java Streams with Stream.peek

For a Java developer new to JDK 8-introduced pipelines and streams, the peek(Consumer) method provided by the Stream interface can be a useful tool to help visualize how streams operations behave. Even Java developers who are more familiar with Java streams and aggregation operations may occasionally find Stream.peek(Consumer) useful for understanding the implications and interactions of complex intermediate stream operations.

The Stream.peek(Consumer) method expects a Consumer, which is essentially a block of code that accepts a single argument and returns nothing. The peek(Consumer) method returns the same elements of the stream that were passed to it, so there will be no changes to the contents of the stream unless the block of code passed to the peek(Consumer) method mutates the objects in the stream. It's likely that the vast majority of the uses of Stream.peek(Consumer) are read-only printing of the contents of the objects in the stream at the time of invocation of that method.

The Javadoc-based API documentation for Stream.peek(Consumer) explains this method's behaviors in some detail and provides an example of its usage. That example is slightly adapted in the following code listing:

final List<String> strings
   = Stream.of("one", "two", "three", "four")
      .peek(e-> out.println("Original Element: " + e))
      .filter(e -> e.length() > 3)
      .peek(e -> out.println("Filtered value: " + e))
      .map(String::toUpperCase)
      .peek(e -> out.println("Mapped value: " + e))
      .collect(Collectors.toList());
out.println("Final Results: " + strings);

When the above code is executed, its associated output looks something like this:

Original Element: one
Original Element: two
Original Element: three
Filtered value: three
Mapped value: THREE
Original Element: four
Filtered value: four
Mapped value: FOUR
Final Results: [THREE, FOUR]

The output tells the story of the stream operations' work on the elements provided to them. The first invocation of the intermediate peek operation will write each element in the original stream out to system output with the prefix "Original Element:". Instances of the intermediate peek operation that occur later are not executed for every original String because each of these peek operations occur after at least once filtering has taken place.

The peek-enabled output also clearly shows the results of executing the intermediate operation map on each String element to its upper case equivalent. The collect operation is a terminating operation and so no peek is placed after that. Strategic placement of peek operations provides significant insight into the stream processing that takes place.

The Javadoc for Stream.peek(Consumer) states that "this method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline." This is exactly what the example and output shown above demonstrate and is likely the most common application of Stream.peek(Consumer).

Stream.peek(Consumer)'s Javadoc documentation starts with this descriptive sentence, "Returns a stream consisting of the elements of this stream, additionally performing the provided action on each element as elements are consumed from the resulting stream." In the previous example, the action performed on each element as it was consumed was to merely write its string representation to standard output. However, the action taken can be anything that can be specified as a Consumer (any code block accepting a single argument and returning no arguments). The next example demonstrates how peek(Consumer) can even be used to change contents of objects on the stream.

In the first example in this post, peek(Consumer) could not change the stream elements because those elements were Java Strings, which are immutable. However, if the stream elements are mutable, the Consumer passed to peek(Consumer) can alter the contents of those elements. To illustrate this, I'll use the simple class MutablePerson shown next.

MutablePerson.java

package dustin.examples.jdk8.streams;

/**
 * Represents person whose name can be changed.
 */
public class MutablePerson
{
   private String name;

   public MutablePerson(final String newName)
   {
      name = newName;
   }

   public String getName()
   {
      return name;
   }

   public void setName(final String newName)
   {
      name = newName;
   }

   @Override
   public String toString()
   {
      return name;
   }
}

The next code listing shows how Stream.peek(Consumer) can change the results of the stream operation when the elements in that stream are mutable.

final List<MutablePerson> people
   = Stream.of(
      new MutablePerson("Fred"),
      new MutablePerson("Wilma"),
      new MutablePerson("Barney"),
      new MutablePerson("Betty"))
   .peek(person -> out.println(person))
   .peek(person -> person.setName(person.getName().toUpperCase()))
   .collect(Collectors.toList());
out.println("People: " + people);

When the above code is executed, it produces output that looks like this:

Fred
Wilma
Barney
Betty
People: [FRED, WILMA, BARNEY, BETTY]

This example shows that the Consumer passed to peek did change the case of the peoples' names to all uppercase. This was only possible because the objects being processed are mutable. Some have argued that using peek to mutate the elements in a stream might be an antipattern and I find myself uncomfortable with this approach (but I also generally don't like having methods' arguments be "output parameters"). The name of the peek method advertises one's just looking (and not touching), but the Consumer argument it accepts advertises that something could be changed (Consumer's Javadoc states, "Unlike most other functional interfaces, Consumer is expected to operate via side-effects"). The blog post "Idiomatic Peeking with Java Stream API" discusses potential issues associated with using Stream.peek(Consumer) with mutating operations.

Steam.peek(Consumer) is a useful tool for understanding how stream operations are impacting elements.

Saturday, June 9, 2018

[JDK 11] Class Loader Hierarchy Details Coming to jcmd

I've been a fan of the diagnostic command-line tool jcmd since hearing about jcmd at JavaOne 2012. I've used this tool extensively since then and have blogged multiple times about this tool:

After numerous years of developing with Java, it's my opinion that the classloader is the source of some of the most difficult defects encountered during development and debugging. Given this observation and given my interest in jcmd, I am very interested in JDK-8203682 ["Add jcmd 'VM.classloaders' command to print out class loader hierarchy, details"].

The "Description" for JDK-8203682 states, "It would be helpful, as a complement to VM.classloader_stats, to have a command to print out the class loader hierarchy and class loader details." In other words, this command to be added to jcmd would include display of classloaders in hierarchical fashion similar to that which classes are displayed by jcmd's VM.class_hierarchy command.

JDK-8203682 shows its "Status" as "Resolved" and its "Fix Version" as "11". JDK-8203682 contains three text file attachments that depict the output of jcmd <pid> VM.classloaders: example-with-classes.txt, example-with-classes-verbose.txt, and example-with-reflection-and-noinflation.txt. Additional information is available in the announcement of the change set and in the change set itself.

When dealing with classloader-related issues in Java, any details can be helpful. The addition of the VM.classloaders command to jcmd will make this command-line tool even more valuable and insightful.