Monday, January 5, 2015

Stream-Powered Collections Functionality in JDK 8

This post applies the Streams introduced in JDK 8 to Collections to accomplish commonly needed collection operations more concisely. Along the way, it demonstrates and briefly explains several key aspects of using Java Streams. Although JDK 8 Streams also offer potential performance benefits through parallelization support, that is not the focus of this post.

The Sample Collection and Collection Entries

For purposes of this post, instances of Movie will be stored in a collection. The following code snippet is for the simple Movie class used in these examples.

Movie.java
package dustin.examples.jdk8.streams;

import java.util.Objects;

/**
 * Basic characteristics of a motion picture.
 * 
 * @author Dustin
 */
public class Movie
{
   /** Title of movie. */
   private String title;

   /** Year of movie's release. */
   private int yearReleased;

   /** Movie genre. */
   private Genre genre;

   /** MPAA Rating. */
   private MpaaRating mpaaRating;

   /** Position in the imdb.com Top 250 (1 is the top-ranked movie). */
   private int imdbTopRating;

   public Movie(final String newTitle, final int newYearReleased,
                final Genre newGenre, final MpaaRating newMpaaRating,
                final int newImdbTopRating)
   {
      this.title = newTitle;
      this.yearReleased = newYearReleased;
      this.genre = newGenre;
      this.mpaaRating = newMpaaRating;
      this.imdbTopRating = newImdbTopRating;
   }

   public String getTitle()
   {
      return this.title;
   }

   public int getYearReleased()
   {
      return this.yearReleased;
   }

   public Genre getGenre()
   {
      return this.genre;
   }

   public MpaaRating getMpaaRating()
   {
      return this.mpaaRating;
   }

   public int getImdbTopRating()
   {
      return this.imdbTopRating;
   }

   @Override
   public boolean equals(Object other)
   {
      if (!(other instanceof Movie))
      {
         return false;
      }
      final Movie otherMovie = (Movie) other;
      return   Objects.equals(this.title, otherMovie.title)
            && this.yearReleased == otherMovie.yearReleased
            && Objects.equals(this.genre, otherMovie.genre)
            && Objects.equals(this.mpaaRating, otherMovie.mpaaRating)
            && this.imdbTopRating == otherMovie.imdbTopRating;
   }

   @Override
   public int hashCode()
   {
      return Objects.hash(this.title, this.yearReleased, this.genre, this.mpaaRating, this.imdbTopRating);
   }

   @Override
   public String toString()
   {
      return "Movie: " + this.title + " (" + this.yearReleased + "), " + this.genre + ", " + this.mpaaRating + ", "
            + this.imdbTopRating;
   }
}

Multiple instances of Movie are placed in a Java Set. The code that does this is shown below because it also shows the values set in these instances. This code declares the "movies" as a static field on the class and then uses a static initialization block to populate that field with five instances of Movie.

Populating movies Set with Instances of Movie Class
private static final Set<Movie> movies;

static
{
   final Set<Movie> tempMovies = new HashSet<>();
   tempMovies.add(new Movie("Raiders of the Lost Ark", 1981, Genre.ACTION, MpaaRating.PG, 31));
   tempMovies.add(new Movie("Star Wars: Episode V - The Empire Strikes Back", 1980, Genre.SCIENCE_FICTION, MpaaRating.PG, 12));
   tempMovies.add(new Movie("Inception", 2010, Genre.SCIENCE_FICTION, MpaaRating.PG13, 13));
   tempMovies.add(new Movie("Back to the Future", 1985, Genre.SCIENCE_FICTION, MpaaRating.PG, 49));
   tempMovies.add(new Movie("The Shawshank Redemption", 1994, Genre.DRAMA, MpaaRating.R, 1));
   movies = Collections.unmodifiableSet(tempMovies);
}
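
The Genre and MpaaRating types used by Movie are not shown in this post. A minimal sketch, assuming they are plain enums that hold only the constants appearing in the listings here (the real definitions may differ):

```java
/** Hypothetical movie genres; only the constants used in this post's listings. */
enum Genre
{
   ACTION,
   DRAMA,
   SCIENCE_FICTION
}

/** Hypothetical MPAA ratings; NA is assumed to mean "not rated". */
enum MpaaRating
{
   PG,
   PG13,
   R,
   NA
}
```

Because enum constants are singletons, comparing them with == (as the filtering examples in this post do) is safe.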

A First Look at JDK 8 Streams with Filtering

One type of functionality commonly performed on collections is filtering. The next code listing shows how to filter the "movies" Set for all movies that are rated PG. I'll highlight some observations that can be made from this code after the listing.

Filtering Movies with PG Rating
/**
 * Demonstrate using .filter() on Movies stream to filter by PG ratings
 * and collect() as a Set.
 */
private void demonstrateFilteringByRating()
{
   printHeader("Filter PG Movies");
   final Set<Movie> pgMovies =
      movies.stream().filter(movie -> movie.getMpaaRating() == MpaaRating.PG)
            .collect(Collectors.toSet());
   out.println(pgMovies);
}

One thing this first example shares with every other example in this post is the invocation of the stream() method on the collection. This method returns an object implementing the java.util.stream.Stream interface. Each returned Stream uses the collection on which stream() was invoked as its data source. From that point on, all operations are on the Stream rather than on the collection that supplies the Stream's data.

In the code listing above, the filter(Predicate) method is called on the Stream based on the "movies" Set. In this case, the Predicate is given by the lambda expression movie -> movie.getMpaaRating() == MpaaRating.PG. This fairly readable representation tells us that the predicate matches each movie in the underlying data that has an MPAA rating of PG.

The Stream.filter(Predicate) method is an intermediate operation, meaning that it returns an instance of Stream that can be further operated on by other operations. In this case, there is another operation, collect(Collector), that is called upon the Stream returned by Stream.filter(Predicate). The Collectors class features numerous static methods that each provide an implementation of Collector that can be provided to this collect(Collector) method. In this case, Collectors.toSet() is used to get a Collector that will instruct the stream results to be arranged in a Set. The Stream.collect(Collector) method is a terminal operation, meaning that it's the end of the line and does NOT return a Stream instance and so no more Stream operations can be executed after this collect has been executed.

When the above code is executed, it generates output like the following:

===========================================================
= Filter PG Movies
===========================================================
[Movie: Raiders of the Lost Ark (1981), ACTION, PG, 31, Movie: Back to the Future (1985), SCIENCE_FICTION, PG, 49, Movie: Star Wars: Episode V - The Empire Strikes Back (1980), SCIENCE_FICTION, PG, 12]
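
The distinction between intermediate and terminal operations can be observed directly: an intermediate operation such as filter only describes the pipeline, and the predicate does not run until a terminal operation asks for results. The following is a small sketch of that behavior, not code from the original example; the class name LazyFilterSketch, its method, and the list of rating strings are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterSketch
{
   /**
    * Returns three numbers: predicate calls made before collect(),
    * predicate calls made after collect(), and the number of elements collected.
    */
   static int[] predicateCallsBeforeAndAfterCollect()
   {
      final List<String> ratings = Arrays.asList("PG", "PG13", "PG", "R");
      final AtomicInteger calls = new AtomicInteger();

      // Intermediate operation: builds the pipeline but filters nothing yet.
      final Stream<String> pgOnly = ratings.stream().filter(rating ->
      {
         calls.incrementAndGet();
         return rating.equals("PG");
      });
      final int before = calls.get();

      // Terminal operation: pulls every element through the filter.
      final List<String> result = pgOnly.collect(Collectors.toList());
      final int after = calls.get();

      return new int[] {before, after, result.size()};
   }

   public static void main(final String[] arguments)
   {
      final int[] counts = predicateCallsBeforeAndAfterCollect();
      System.out.println("Predicate calls before collect(): " + counts[0]);
      System.out.println("Predicate calls after collect():  " + counts[1]);
      System.out.println("Elements collected:               " + counts[2]);
   }
}
```

Running this sketch reports 0 predicate calls before collect(), 4 afterwards, and 2 collected elements.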

Filtering for Single (First) Result

/**
 * Demonstrate using .filter() on Movies stream to filter by #1 imdb.com
 * rating and using .findFirst() to get first (presumably only) match.
 */
private void demonstrateSingleResultImdbRating()
{
   printHeader("Display One and Only #1 IMDB Movie");
   final Optional<Movie> topMovie =
      movies.stream().filter(movie -> movie.getImdbTopRating() == 1).findFirst();
   out.println(topMovie.isPresent() ? topMovie.get() : "none");
}

This example shares many similarities with the previous one. Like the previous code listing, it uses Stream.filter(Predicate), but this time the predicate is the lambda expression movie -> movie.getImdbTopRating() == 1. In other words, the Stream resulting from this filter should contain only instances of Movie for which getImdbTopRating() returns 1. The terminal operation Stream.findFirst() is then executed against the Stream returned by Stream.filter(Predicate). It returns an Optional holding the first entry encountered in the stream and, because our underlying Set of Movie instances has only one instance with an IMDb Top 250 rating of 1, that instance is the first and only entry available in the stream resulting from the filter.

When this code listing is executed, its output appears as shown next:

===========================================================
= Display One and Only #1 IMDB Movie
===========================================================
Movie: The Shawshank Redemption (1994), DRAMA, R, 1
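
The isPresent()/get() check on the Optional can also be written with Optional.map and Optional.orElse. The sketch below shows that alternative under some assumptions: RankedTitle is a hypothetical stand-in for Movie carrying only a title and a rank, and the class and method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class FindFirstSketch
{
   /** Hypothetical stand-in for Movie with only the fields needed here. */
   static final class RankedTitle
   {
      final String title;
      final int rank;

      RankedTitle(final String title, final int rank)
      {
         this.title = title;
         this.rank = rank;
      }
   }

   /** Returns the title of the first entry with the given rank, or "none". */
   static String titleWithRank(final List<RankedTitle> entries, final int rank)
   {
      final Optional<RankedTitle> match =
         entries.stream().filter(entry -> entry.rank == rank).findFirst();
      // map() transforms the value only if one is present; orElse() supplies the fallback.
      return match.map(entry -> entry.title).orElse("none");
   }

   public static void main(final String[] arguments)
   {
      final List<RankedTitle> entries = Arrays.asList(
         new RankedTitle("Inception", 13),
         new RankedTitle("The Shawshank Redemption", 1));
      System.out.println(titleWithRank(entries, 1));
      System.out.println(titleWithRank(entries, 2));
   }
}
```

With the two entries in main, the first call prints The Shawshank Redemption and the second prints none.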

The next code listing illustrates use of Stream.map(Function).

/**
 * Demonstrate using .map to get only specified attribute from each
 * element of collection.
 */
private void demonstrateMapOnGetTitleFunction()
{
   printHeader("Just the Movie Titles, Please");
   final List<String> titles = movies.stream().map(Movie::getTitle).collect(Collectors.toList());
   out.println(titles.size() + " titles (in " + titles.getClass() +"): " + titles);
}

The Stream.map(Function) method acts upon the Stream against which it is called (in our case, the Stream based on the underlying Set of Movie objects) and applies the provided Function to each element of that Stream, returning a new Stream made up of the results. In this case, the Function is represented by Movie::getTitle, which is an example of a JDK 8-introduced method reference. I could have used the lambda expression movie -> movie.getTitle() instead of the method reference Movie::getTitle for the same results. The Method References documentation explains that this is exactly the situation a method reference is intended to address:

You use lambda expressions to create anonymous methods. Sometimes, however, a lambda expression does nothing but call an existing method. In those cases, it's often clearer to refer to the existing method by name. Method references enable you to do this; they are compact, easy-to-read lambda expressions for methods that already have a name.

As you might guess from its use in the code above, Stream.map(Function) is an intermediate operation. This code listing applies the terminal operation Stream.collect(Collector) just as the first example did, but in this case Collectors.toList() is passed to it, so the resulting data structure is a List rather than a Set.

When the above code listing is run, its output looks like this:

===========================================================
= Just the Movie Titles, Please
===========================================================
5 titles (in class java.util.ArrayList): [Inception, The Shawshank Redemption, Raiders of the Lost Ark, Back to the Future, Star Wars: Episode V - The Empire Strikes Back]
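
The equivalence of a method reference and the corresponding lambda expression in Stream.map(Function) can be checked with a small sketch. It maps plain title strings to their lengths rather than using the Movie class; the class MapSketch and its method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapSketch
{
   /** Maps each title to its length using a method reference. */
   static List<Integer> lengthsViaMethodReference(final List<String> titles)
   {
      return titles.stream().map(String::length).collect(Collectors.toList());
   }

   /** Maps each title to its length using the equivalent lambda expression. */
   static List<Integer> lengthsViaLambda(final List<String> titles)
   {
      return titles.stream().map(title -> title.length()).collect(Collectors.toList());
   }

   public static void main(final String[] arguments)
   {
      final List<String> titles = Arrays.asList("Inception", "Back to the Future");
      System.out.println(lengthsViaMethodReference(titles));
      System.out.println(lengthsViaLambda(titles));
   }
}
```

Both calls print [9, 18]. Because the source here is a List, the order of the results follows the order of the titles, which is not guaranteed for a stream over a HashSet.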

Reduction (to Single Boolean) Operations anyMatch and allMatch

The next example does not use Stream.filter(Predicate), Stream.map(Function), or even the terminating operation Stream.collect(Collector) that were used in most of the previous examples. In this example, the reduction and terminating operations Stream.allMatch(Predicate) and Stream.anyMatch(Predicate) are applied directly on the Stream based on our Set of Movie objects.

/**
 * Demonstrate .anyMatch and .allMatch on stream.
 */
private void demonstrateAnyMatchAndAllMatchReductions()
{
   printHeader("anyMatch and allMatch");
   out.println("All movies in IMDB Top 250? " + movies.stream().allMatch(movie -> movie.getImdbTopRating() <= 250));
   out.println("All movies rated PG? " + movies.stream().allMatch(movie -> movie.getMpaaRating() == MpaaRating.PG));
   out.println("Any movies rated PG? " + movies.stream().anyMatch(movie -> movie.getMpaaRating() == MpaaRating.PG));
   out.println("Any movies not rated? " + movies.stream().anyMatch(movie -> movie.getMpaaRating() == MpaaRating.NA));
}

The code listing demonstrates that Stream.anyMatch(Predicate) and Stream.allMatch(Predicate) each return a boolean indicating, as their names respectively imply, whether the Stream has at least one entry matching the predicate or all of the entries matching the predicate. In this case, all movies come from the imdb.com Top 250, so that "allMatch" will return true. Not all of the movies are rated PG, however, so that "allMatch" returns false. Because at least one movie is rated PG, the "anyMatch" for PG rating predicate returns true, but the "anyMatch" for N/A rating predicate returns false because not even one movie in the underlying Set had a MpaaRating.NA rating. The output from running this code is shown next.

===========================================================
= anyMatch and allMatch
===========================================================
All movies in IMDB Top 250? true
All movies rated PG? false
Any movies rated PG? true
Any movies not rated? false
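
One detail worth knowing about these matching operations is how they behave on an empty stream: allMatch returns true, anyMatch returns false, and the related Stream.noneMatch(Predicate) returns true. The sketch below illustrates this with plain rating strings instead of the Movie class; the class MatchSketch and its methods are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

class MatchSketch
{
   /** True if every rating is "PG" (also true when there are no ratings at all). */
   static boolean allPg(final Collection<String> ratings)
   {
      return ratings.stream().allMatch(rating -> rating.equals("PG"));
   }

   /** True if at least one rating is "PG". */
   static boolean anyPg(final Collection<String> ratings)
   {
      return ratings.stream().anyMatch(rating -> rating.equals("PG"));
   }

   /** True if no rating is "PG" (also true when there are no ratings at all). */
   static boolean nonePg(final Collection<String> ratings)
   {
      return ratings.stream().noneMatch(rating -> rating.equals("PG"));
   }

   public static void main(final String[] arguments)
   {
      final Collection<String> mixed = Arrays.asList("PG", "PG13", "R");
      final Collection<String> empty = Collections.emptyList();
      System.out.println("mixed: all=" + allPg(mixed) + " any=" + anyPg(mixed) + " none=" + nonePg(mixed));
      System.out.println("empty: all=" + allPg(empty) + " any=" + anyPg(empty) + " none=" + nonePg(empty));
   }
}
```

For the mixed list this prints all=false any=true none=false, and for the empty list it prints all=true any=false none=true.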

Easy Identification of Minimum and Maximum

The final example of applying the power of Stream to collection manipulation in this post demonstrates use of Stream.reduce(BinaryOperator) with two different instances of BinaryOperator: Integer::min and Integer::max.

private void demonstrateMinMaxReductions()
{
   printHeader("Oldest and Youngest via reduce");
   // Specifying both Function for .map and BinaryOperator for .reduce with lambda expressions
   final Optional<Integer> oldestMovie = movies.stream().map(movie -> movie.getYearReleased()).reduce((a,b) -> Integer.min(a,b));
   out.println("Oldest movie was released in " + (oldestMovie.isPresent() ? oldestMovie.get() : "Unknown"));
   // Specifying both Function for .map and BinaryOperator for .reduce with method references
   final Optional<Integer> youngestMovie = movies.stream().map(Movie::getYearReleased).reduce(Integer::max);
   out.println("Youngest movie was released in " + (youngestMovie.isPresent() ? youngestMovie.get() : "Unknown"));
}

This convoluted example illustrates using Integer.min(int,int) to find the release year of the oldest movie in the underlying Set and using Integer.max(int,int) to find the release year of the newest movie in the Set. This is accomplished by first using Stream.map to get a new Stream of Integers made up of the release year of each Movie in the original Stream. The Stream.reduce(BinaryOperator) operation is then executed on this Stream of Integers with the static Integer methods used as the BinaryOperator.

For this code listing, I intentionally used lambda expressions for the Function and BinaryOperator in calculating the oldest movie (Integer.min(int,int)) and used method references instead of lambda expressions for the Function and BinaryOperator in calculating the newest movie (Integer.max(int,int)). This shows that either lambda expressions or method references can be used in many cases.

The output from running the above code is shown next:

===========================================================
= Oldest and Youngest via reduce
===========================================================
Oldest movie was released in 1980
Youngest movie was released in 2010
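
The same minimum and maximum can also be found without an explicit reduce by converting to an IntStream with Stream.mapToInt and calling its min() and max() methods, which return an OptionalInt. This is an alternative sketch rather than the approach used in the listing above; it works on a plain list of release years, and the class MinMaxSketch and its method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

class MinMaxSketch
{
   /** Smallest year in the list, or an empty OptionalInt for an empty list. */
   static OptionalInt oldestYear(final List<Integer> years)
   {
      return years.stream().mapToInt(Integer::intValue).min();
   }

   /** Largest year in the list, or an empty OptionalInt for an empty list. */
   static OptionalInt newestYear(final List<Integer> years)
   {
      return years.stream().mapToInt(Integer::intValue).max();
   }

   public static void main(final String[] arguments)
   {
      final List<Integer> years = Arrays.asList(1981, 1980, 2010, 1985, 1994);
      final OptionalInt oldest = oldestYear(years);
      final OptionalInt newest = newestYear(years);
      System.out.println("Oldest: " + (oldest.isPresent() ? String.valueOf(oldest.getAsInt()) : "Unknown"));
      System.out.println("Newest: " + (newest.isPresent() ? String.valueOf(newest.getAsInt()) : "Unknown"));
   }
}
```

For the release years used in this post, the sketch prints Oldest: 1980 and Newest: 2010.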

Conclusion

JDK 8 Streams introduce a powerful mechanism for working with Collections. This post has focused on the readability and conciseness that working against Streams brings as compared to working against Collections directly, but Streams offer potential performance benefits as well. This post has used common collections-handling idioms as examples of the conciseness that Streams bring to Java. Along the way, some key concepts associated with using JDK 8 Streams have also been discussed. The most challenging parts of using JDK 8 Streams are getting used to new concepts and new syntax (such as lambda expressions and method references), but these are quickly learned after playing with a couple of examples. A Java developer with even light experience with the concepts and syntax can explore the Stream API's methods for a much longer list of operations that can be executed against Streams (and hence against the collections underlying those Streams) than illustrated in this post.

Additional Resources

The purpose of this post was to provide a light first look at JDK 8 streams based on simple but fairly common collections manipulation examples. For a deeper dive into JDK 8 streams and for more ideas on how JDK 8 streams make Collections manipulation easier, see the following articles:

2 comments:

Magnus said...

(youngestMovie.isPresent() ? youngestMovie.get() : "Unknown") can be expressed more succinctly as youngestMovie.orElse("Unknown")

Unknown said...

@Magnus
Unfortunately, it's not that easy because Optional<Integer> does not accept a String parameter for orElse().

Alternatives:
1) Convert to Optional<String>
youngestMovie.map(Object::toString).orElse("Unknown");

2) Convert to Optional<Object>. This is the more general case.
Optional<Object> youngestMovieObject = youngestMovie.map(obj -> obj);
System.out.println(youngestMovieObject.orElse("Unknown"));