Monday, May 16, 2016

On the Virtues of Avoiding Parsing or Basing Logic on toString() Result

With Java or any other programming language I've used significantly, I have found that there are occasionally things that can be done in the language, but generally should not be done. Often, these misuses of the language seem harmless and perhaps beneficial when a developer first uses them, but later that same developer or another developer runs into associated issues that are costly to overcome or change. An example of this, and subject of this blog post, is using the results of a toString() call in Java to make a logic choice or to be parsed for content.

In 2010, I wrote in Java toString() Considerations, that I generally prefer it when toString() methods are explicitly available for classes and when they contain the relevant public state of an object of that class. I still feel this way. However, I expect a toString() implementation to be sufficient for a human to read the content of the object via logged statement or debugger and not to be something that is intended to be parsed by code or script. Using the String returned by a toString() method for any type of conditional or logic processing is too fragile. Likewise, parsing the toString()'s returned String for details about the instance's state is also fragile. I warned about (even unintentionally) requiring developers to parse toString() results in the previously mentioned blog post.

Developers may choose to change a toString()'s generated String for a variety of reasons including adding existing fields to the output that may not have been represented before, adding more data to existing fields that were already represented, adding text for newly added fields, removing representation of fields no longer in the class, or changing format for aesthetic reasons. Developers might also change spelling and grammar issues of a toString()'s generated String. If the toString()'s provided String is simply used by humans analyzing an object's state in log messages, these changes are not likely to be an issue unless they remove information of substance. However, if code depends on the entire String or parses the String for certain fields, it can be easily broken by these types of changes.

For the purpose of illustration, consider the following initial version of a Movie class:

package dustin.examples.strings;

/**
 * Motion Picture, Version 1.
 */
public class Movie
{
   private String movieTitle;

   public Movie(final String newMovieTitle)
   {
      this.movieTitle = newMovieTitle;
   }

   public String getMovieTitle()
   {
      return this.movieTitle;
   }

   @Override
   public String toString()
   {
      return this.movieTitle;
   }
}

In this simple and somewhat contrived example, there's only one attribute and so it's not unusual that the class's toString() simply returns that class's single String attribute as the class's representation.

The next code listing contains an unfortunate decision (lines 22-23) to base logic on the Movie class's toString() method.

/**
 * This is a contrived class filled with some ill-advised use
 * of the {@link Movie#toString()} method.
 */
public class FavoriteMoviesFilter
{
   private final static List<Movie> someFavoriteMovies;

   static
   {
      final ArrayList<Movie> tempMovies = new ArrayList<>();
      tempMovies.add(new Movie("Rear Window"));
      tempMovies.add(new Movie("Pink Panther"));
      tempMovies.add(new Movie("Ocean's Eleven"));
      tempMovies.add(new Movie("Ghostbusters"));
      tempMovies.add(new Movie("Taken"));
      someFavoriteMovies = Collections.unmodifiableList(tempMovies);
   }

   public static boolean isMovieFavorite(final String candidateMovieTitle)
   {
      return someFavoriteMovies.stream().anyMatch(
         movie -> movie.toString().equals(candidateMovieTitle));
   }
}

This code might appear to work for a while despite some underlying issues with it when more than one movie shares the same title. However, even before running into those issues, a risk of using toString() in the equality check might be realized if a developer decides he or she wants to change the format of the Movie.toString() representation to what is shown in the next code listing.

@Override
public String toString()
{
   return "Movie: " + this.movieTitle;
}

Perhaps the Movie.toString() returned value was changed to make it clearer that the String being provided is associated with an instance of the Movie class. Regardless of the reason for the change, the previously listed code that uses equality on the movie title is now broken. That code needs to be changed to use contains instead of equals as shown in the next code listing.

public static boolean isMovieFavorite(final String candidateMovieTitle)
{
   return someFavoriteMovies.stream().anyMatch(
      movie -> movie.toString().contains(candidateMovieTitle));
}

When it's realized that the Movie class needs more information to make movies differentiable, a developer might add the release year to the movie class. The new Movie class is shown next.

package dustin.examples.strings;

/**
 * Motion Picture, Version 2.
 */
public class Movie
{
   private String movieTitle;

   private int releaseYear;

   public Movie(final String newMovieTitle, final int newReleaseYear)
   {
      this.movieTitle = newMovieTitle;
      this.releaseYear = newReleaseYear;
   }

   public String getMovieTitle()
   {
      return this.movieTitle;
   }

   public int getReleaseYear()
   {
      return this.releaseYear;
   }

   @Override
   public String toString()
   {
      return "Movie: " + this.movieTitle;
   }
}

Adding a release year helps to differentiate between movies with the same title. This helps to differentiate remakes from originals as well. However, the code that used the Movie class to find favorites will still show all movies with the same title regardless of the year the movies were released. In other words, the 1960 version of Ocean's Eleven (6.6 rating on IMDB currently) will be seen as a favorite alongside the 2001 version of Ocean's Eleven (7.8 rating on IMDB currently) even though I much prefer the newer version. Similarly, the 1988 made-for-TV version of Rear Window (5.6 rating currently on IMDB) would be returned as a favorite alongside the 1954 version of of Rear Window (directed by Alfred Hitchcock, starring James Stewart and Grace Kelly, and rated 8.5 currently in IMDB) even though I much prefer the older version.

I think that a toString() implementation should generally include all publicly available details of an object. However, even if the Movie's toString() method is enhanced to include release year, the client code still will not differentiate based on year because it only performs a contain on movie title.

@Override
public String toString()
{
   return "Movie: " + this.movieTitle + " (" + this.releaseYear + ")";
}

The code above shows release year added to Movie's toString() implementation. The code below shows how the client needs to be changed to respect release year properly.

public static boolean isMovieFavorite(
   final String candidateMovieTitle,
   final int candidateReleaseYear)
{
   return someFavoriteMovies.stream().anyMatch(
      movie ->   movie.toString().contains(candidateMovieTitle)
              && movie.getReleaseYear() == candidateReleaseYear);
}

It is difficult for me to think of cases where it is a good idea to parse a toString() method or base a condition or other logic on the results of a toString() method. In just about any example I think about, there is a better way. In my example above, it'd be better to add equals() (and hashCode()) methods to Movie and then use equality checks against instances of Movie instead of using individual attributes. If individual attributes do need to be compared (such as in cases where object equality is not required and only a field or two needs to be equal), then the appropriate getXXX methods could be employed.

As a developer, if I want users of my classes (which will often end up including myself) to not need to parse toString() results or depend on a certain result, I need to ensure that my classes make any useful information available from toString() available from other easily accessible and more programmatic-friendly sources such as "get" methods and equality and comparison methods. If a developer does not want to expose some data via public API, then it's likely that developer probably doesn't really want to expose it in the returned toString() result either. Joshua Bloch, in Effective Java, articulates this in bold-emphasized text, "... provide programmatic access to all of the information contained in the value returned by toString()."

In Effective Java, Bloch also includes discussion about whether a toString() method should have an advertised format of the String representation it provides. He points out that this representation, if advertised, must be the same from then on out if it's a widely used class to avoid the types of runtime breaks I have demonstrated in this post. He also advises that if the format is not guaranteed to stay the same, that the Javadoc include a statement to that effect as well. In general, because Javadoc and other comments are often more ignored than I'd like and because of the "permanent" nature of an advertised toString() representation, I prefer to not rely on toString() to provide a specific format needed by clients, but instead provide a method specific for that purpose that clients can call. This leaves me the flexibility to change my toString() as the class changes.

An example from the JDK illustrates my preferred approach and also illustrates the dangers of prescribing a particular format to an early version of toString(). BigDecimal's toString() representation was changed between JDK 1.4.2 and Java SE 5 as described in "Incompatibilities in J2SE 5.0 (since 1.4.2)": "The J2SE 5.0 BigDecimal's toString() method behaves differently than in earlier versions." The Javadoc for the 1.4.2 version of BigDecimal.toString() simply states in the method overview: "Returns the string representation of this BigDecimal. The digit-to- character mapping provided by Character.forDigit(int, int) is used. A leading minus sign is used to indicate sign, and the number of digits to the right of the decimal point is used to indicate scale. (This representation is compatible with the (String) constructor.)" The same method overview documentation for BigDecimal.toString() in Java SE 5 and later versions is much more detailed. It's such a lengthy description that I won't show it here.

When BigDecimal.toString() was changed with Java SE 5, other methods were introduced to present different String representations: toEngineeringString() and toPlainString(). The newly introduced method toPlainString() provides what BigDecimal's toString() provided through JDK 1.4.2. My preference is to provide methods that provide specific String representations and formats because those methods can have the specifics of the format described in their names and Javadoc comments and changes and additions to the class are not as likely to impact those methods as they are to impact the general toString() method.

There are some simple classes that might fit the case where an originally implemented toString() method will be fixed once and for all and will "never" change. Those might be candidates for parsing the returned String or basing logic on the String, but even in those cases I prefer to provide an alternate method with an advertised and guaranteed format and leave the toString() representation some flexibility for change. It's not a big deal to have the extra method because, while they return the same thing, the extra method can be simply a one-line method calling the toString. Then, if the toString() does change, the calling method's implementation can be changed to be what toString() formerly provided and any users of that extra method won't see any changes.

Parsing of a toString() result or basing logic on the result of a toString() call are most likely to be done when that particular approach is perceived as the easiest way for a client to access particular data. Making that data available via other, specific publicly available methods should be preferred and class and API designers can help by ensuring that any even potentially useful data that will be in the String provided by toString() is also available in a specific alternate programmatically-accessible method. In short, my preference is to leave toString() as a method for seeing general information about an instance in a representation that is subject to change and provide specific methods for specific pieces of data in representations that are much less likely to change and are easier to programmatically access and base decisions on than a large String that potentially requires format-specific parsing.

Saturday, May 7, 2016

HotSpot Incremental Java Garbage Collector

In my recent blog post Determining the Active HotSpot Garbage Collector, I described different approaches that can be used to determine the garbage collector that is being used by HotSpot JVM (Java process) when it is not obvious from the command-line arguments (flags) passed to the Java launcher. For significant Java applications, I prefer to explicitly specify the appropriate garbage collector flag with the Java launcher to leave out any doubt as to which collector is being used: -XX:+UseParallelGC (or -XX:+UseParallelOldGC) for the parallel/throughput collector, -XX:+UseConcMarkSweepGC for the Concurrent Mark Sweep (CMS) collector, -XX:+UseG1GC for Garbage-First Garbage Collector, and -XX:+UseSerialGC for Serial Garbage Collector.

When none of these are garbage collector JVM flags are explicitly specified, the virtual machine selects a garbage collector and which collector has been selected by the VM can be identified using JDK command-line tool jcmd or its related JMX MBean implementation DiagnosticCommandMBean (often via JConsole or VisualVM). In some cases, a particular garbage collector is specified because of the existence of a particular HotSpot JVM flag not in my previous list. In this post, I look briefly at one of these: -Xincgc.

The Oracle-provided JDK Tools and Utilities documentation for the Java launcher java briefly describes -Xincgc. The documentation for Java SE 6's java executable and for Java SE 7's java executable describe -Xincgc: "Enable the incremental garbage collector. The incremental garbage collector, which is off by default, will reduce the occasional long garbage-collection pauses during program execution. The incremental garbage collector will at times execute concurrently with the program and during such times will reduce the processor capacity available to the program."

Java SE 8's version of the java documentation states of -Xincgc: "Enables incremental garbage collection. This option was deprecated in JDK 8 with no replacement." The incremental collector is deprecated as of Java 8, which is confirmed in the "Deprecated APIs" section of the Compatibility Guide for JDK 8: "The -Xincgc option is deprecated." More details on why this and other garbage collection combinations have been deprecated in JDK 8 can be found in JEP 173 ("Retire Some Rarely-Used GC Combinations") and in JEP 214 ("Remove GC Combinations Deprecated in JDK 8").

It appears that -Xincgc will not be available after JDK 8. For versions of the JVM that support -Xincgc, a question might be what it means in terms of garbage collector to run the incremental garbage collector. The previously referenced Compatibility Guide for JDK 8 states that "Incremental CMS" is one of the now deprecated "garbage collector combinations." It has also been stated that "-Xincgc simply translates" in Java 6 to "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode." The next two screenshots demonstrate this is the case in Java 8 for a small, simple Java application. The first snapshot depicts the running of the same executable JAR (-jar) with and without -Xincgc specified. The second snapshot depicts the different VM-selected command-line flags for this same application that are a result of having or not having -Xincgc specified.

These screenshots demonstrate that the Java 8 VM used against this simple executable JAR application chooses the parallel/throughput collector (-XX:+UseParallelGC ) when -Xincgc is not specified and chooses the CMS collector (-XX:+UseConcMarkSweepGC) with additional flag -XX:+CMSIncrementalMode when -Xincgc is explicitly specified. It's also worth pointing out that when running the Java 8 HotSpot Java launcher with -Xincgc, a deprecation warning message is presented (I've added the emphasis): "Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future version."

I decided it was worth posting on a deprecated JVM flag related to garbage collection because there are some high-level observations that can be made from this discussion:

  • Besides the HotSpot VM flags that directly specify a particular garbage collector, other flags can imply a garbage collector.
  • The jcmd tool is useful for identifying which garbage collector is in use not only in cases where no JVM flags are provided, but also for cases where a flag is used that implies a particular collector (such as -Xincgc implying CMS collector in this example).
  • The incremental CMS Collector is going away. Anyone considering upgrading to Java 9 might find it prudent to inspect the VM flags their applications use to identify cases where the incremental CMS collector (-Xincgc) is currently being used. If running on Java 8, there should already be a deprecation warning as demonstrated in this post.
  • This is a reminder that while Java and the JVM remain largely backwards compatible language and platform, there are features and capaibilities that are removed from time to time (typically rarely used things or things that are significantly flawed).

Tuesday, May 3, 2016

The End of Java.net Forge and Project Kenai

The overwhelming popularity of GitHub continues to lead to the demise of other, often much older project hosting sites. The recent confirmation of a 2015 announcement that "Kenai.com site will be closing permanently on April 28, 2017" and associated Java.net Forge Sunset is not terribly surprising given the previous shuttering of once-popular project-hosting sites such as Codehaus and Google Code. Back in 2010, there was the blog post announcement that Project Kenai was moving to Java.net after originally apparently communicating plans to close Kenai to outside projects.

Some projects have already moved (starting in 2010 with the previously mentioned announcement that was changed later) or have begun moving off of Kenai and Java.net and most of them, not surprisingly, seem to be moving to GitHub. Examples of these include Java Native Access (JNA), jawr, and MonkeyBars.

It has been proposed on Twitter that "It would be a nice move from @oracle to move the JSR content from http://kenai.com and http://java.net to #github" with some brief Twitter discussion on what this would entail.