Wednesday, December 23, 2015

Discovering a Trove of Java Primitives Collection Handling

While reading the blog post 5 Tips for Reducing Your Java Garbage Collection Overhead, I was reminded of the existence of a small Java collection library called Trove that "provides high speed regular and primitive collections for Java." I am especially interested in the ability to apply Trove to allow for collections of primitives rather than requiring elements in collections to be full-fledged reference objects. I look at Trove in greater detail in this post.

The JDK's standard collections respect generics and require object references for their elements and do not allow for storing of primitives in the collections. Even code that appears to be placing primitives in the standard JDK collections is actually placing object references in the collection via autoboxing. The advantage of this generics approach is the ability to have the same classes and methods work on objects of many different types. The cost is the need to store full reference objects even when leaner primitives could be stored.

The Trove library has an LPGL license and is relatively small (well under 10 MB) as shown in the next screen snapshot of the Downloads page:

The small download contains more than just the necessary library in JAR format. It also contains documentation and source. The library JAR itself (trove-3.1a1.jar in this case) is around 2.5 MB in size.

One of the reasons that Trove is easy to use is that it largely mimics the JDK collections' interfaces in its own collections' APIs. The next code listing demonstrates how adding values to a List implementation is essentially the same API calls whether using a JDK 7 List (ArrayList in this case) or the Trove-provided TDoubleArrayList.

Adding Elements to JDK's ArrayList and Trove's TDoubleArrayList

/**
 * Demonstrate standard JDK {@code ArrayList<Double>}
 * with some JDK 8 functionality.
 */
public void demonstrateJdkArrayListForDoubles()
{
   final ArrayList<Double> doubles = new ArrayList<>();
   doubles.add(15.5);
   doubles.add(24.4);
   doubles.add(36.3);
   doubles.add(67.6);
   doubles.add(10.0);
   out.println("JDK ArrayList<Double>:");
   out.println("\tDoubles List: " + doubles);
   out.println("\tMaximum double: " + doubles.stream().max(Double::compare));
   out.println("\tMinimum double: " + doubles.stream().min(Double::compare));
   out.println("\tSum of doubles: " + doubles.stream().mapToDouble(Double::doubleValue).sum());
}

/**
 * Demonstrate use of TDoubleArrayList and show how
 * similar using it is to using {@code ArrayList<Double>}.
 */
public void demonstrateTroveArrayListForDoubles()
{
   // Demonstrate adding elements to TDoubleArrayList is
   // exactly like adding elements to ArrayList<Double>.
   final TDoubleArrayList doubles = new TDoubleArrayList();
   doubles.add(15.5);
   doubles.add(24.4);
   doubles.add(36.3);
   doubles.add(67.6);
   doubles.add(10.0);
   out.println("Trove TDoubleArrayList:");  // TDoubleArrayList overrides toString()
   out.println("\tDoubles List: " + doubles);
   out.println("\tMaximum double: " + doubles.max());
   out.println("\tMinimum double: " + doubles.min());
   out.println("\tSum of doubles: " + doubles.sum());
}

The above code listing also demonstrates how easy it is with the Trove implementation of an array list to access the maximum, minimum, and sum of the collection of doubles. One of the advantage of these collections written to a specific primitive data type (double in this case) is that methods that apply specifically to that data type can be provided in the implementation. While it might not make a lot of sense for a collection of String or a collection of an arbitrary object return maximum, minimum, and sums, the meaning of these methods is obvious for a collection devoted to doubles such as TDoubleArrayList. The above listing does indicate how the same can be accomplished with JDK 8 using streams.

One subtle difference that may not be obvious (due to autoboxing) when looking at the code listing above is that the JDK implementation ArrayList stores reference Double objects while the Trove TDoubleArrayList implementation stores primitive doubles. Trove supplies implementations of lists, sets, and maps for various numeric types such as bytes, characters, shorts, integers, longs, floats, and doubles.

One of the interesting data structures/collections provided by Trove is the TDoubleArrayStack. Backed by the just-demonstrated TDoubleArrayList, the TDoubleArrayStack does not expose add methods in its API for adding elements. Rather, its methods reflect semantics one would expect in a last-in-first-out (LIFO) stack implementation: push(double) to add, pop() to access and remove the mostly recently added entry, and peek() to see the most recently added entry without removing it. Application of this stack implementation is shown in the next code listing. There are stack implementations for other numeric data types as well.

Trove's TDoubleArrayStack

/**
 * Demonstrate Trove's Double Array Stack.
 *
 * Trove's TDoubleArrayStack allows access to its
 * contents via push, pop, and peek.
 */
public void demonstrateTroveDoubleArrayStack()
{
   final TDoubleArrayStack stack = new TDoubleArrayStack();
   stack.push(15.5);
   stack.push(17.3);
   stack.push(16.6);
   stack.push(2.2);
   out.println("Trove Array Stack of Doubles");
   out.println("\tPeek: " + stack.peek() + "; After Size: " + stack.size());
   out.println("\tPop:  " + stack.pop() + "; After Size: " + stack.size());
   out.println("\tPeek: " + stack.peek() + "; After Size: " + stack.size());
}

Although not shown here, Trove also supports first-in-first-out (FIFO) queue structures for Java's primitive types in its gnu.trove.queue package. Classes in this package provide methods adhering to queue semantics: offer, poll, and peek.

The java.util.Collections class provides much useful functionality when working with JDK collections. Trove provides a subset of java.util.Collections's functionality for working with Trove-based collections in its own class called gnu.trove.TCollections. Specifically, at the time of this writing, the TCollections class provides support for synchronized and unmodified Trove collections. The next code listing demonstrates using TCollections and also demonstrates using a Trove collection oriented toward a data type other than double (int in this case) and to a different data structure type (linked list).

TCollections and TIntLinkedList Demonstrated

/**
 * Demonstrate one of Trove's "equivalent"s of the
 * java.util.Collections class.
 */
public void demonstrateTroveCollectionsClass()
{
   final TIntLinkedList integers = new TIntLinkedList();
   integers.add(5);
   integers.add(7);
   integers.add(3);
   integers.add(1);
   final TIntList unmodifiableIntegers = TCollections.unmodifiableList(integers);
   try
   {
      unmodifiableIntegers.add(15);
   }
   catch (Exception ex)
   {
      out.println("\tException caught: " + ex);
   }
}

When one wishes to iterate over a Trove-based collection, one can access it via a traditional iterator as shown in the next code listing. Although the collection and associated iterator work on long values in this example, Trove provides similar collections and iterators for Java's other primitive data types.

Using Trove Iterator to Iterate Trove Collection

/**
 * Demonstrate "traditional" iteration of a
 * Trove collection.
 */
public void demonstrateIterationWithIterator()
{
   final TLongHashSet longs = new TLongHashSet();
   longs.add(15);
   longs.add(6);
   longs.add(12);
   longs.add(13);
   longs.add(2);
   TLongIterator longIterator = longs.iterator();
   while (longIterator.hasNext())
   {
      final long longValue = longIterator.next();
      out.println(longValue);
   }
}

An alternate approach for iterating a Trove collection is to use a Procedure. This is demonstrated in the following two code listings. The first listing demonstrates a custom long-oriented Procedure and the second listing demonstrates applying that custom Procedure to iteration on a TLongLinkedList via its forEach method.

Using Trove Procedure to Iterate Trove Collection

/**
 * Demonstrate iteration of a Trove collection
 * using a Procedure.
 */
public void demonstrateIterationWithProcedure()
{
   final TLongLinkedList longs = new TLongLinkedList();
   longs.add(15);
   longs.add(6);
   longs.add(12);
   longs.add(13);
   longs.add(2);
   longs.forEach(new StandardOutputLongProcedure());
}

Procedure Implementation Used in Previous Iteration Example


package dustin.examples.trove;

import static java.lang.System.out;

import gnu.trove.procedure.TLongProcedure;

/**
 * Simple implementation of TLongProcedure that
 * iterates over Trove collection of {@code long}
 * values and writes those values to standard
 * output with one value per line.
 */
public class StandardOutputLongProcedure implements TLongProcedure
{
   @Override
   public boolean execute(long longValue)
   {
      out.println(longValue);
      return true;
   }
}

It's worth noting that Trove collections tend to provide forEachDescending methods as well to provide iteration in reverse order.

Additional Observations Related to GNU Trove

  • GNU Trove is a library providing "high speed regular and primitive collections for Java" and should not be confused with Trove that is a "Database as a Service for OpenStack."
  • The Trove collections and data structures all have names prefixed with "T" (for Trove). In fact, all the classes and interfaces in Trove start with "T" except HashingStrategy, IdentityHashingStrategy, and Version.
  • Trove collections generally provide a constructor accepting an array of their underlying data type and provide toArray() methods to provide their data elements in form of array of primitives.
  • Trove collections generally provide explicitly overridden toString() implementations that allow for the individual data elements to be easily written similar to JDK collections and differently than Java arrays (which require Arrays.toString() methods).
  • Additional Trove details can be found in the Overview, the FAQ, and the message forums. Other resources include Enhance Collection Performance with this Treasure Trove, Java HashMap Performance, High Performance Libraries in Java, and TROVE – High Performance Collections for Java.
  • Trove's Java packages are organized generally by data structure type with all primitive type specific implementations for a given data structure type in the same package. For example, packages are named like gnu.trove.list, gnu.trove.set, and so on.
  • Because each Trove collection is specific to a particular primitive data type, each collection does not require a generic parameter and has none of the issues related to generics (such as erasure). This approach also allows each collection to support methods specific to the data type that is stored in that collection. For example, collections of numeric types can provide sum methods while collections specific to character types can provide grep methods.

Conclusion

In many common uses, the JDK-provided collections will perform sufficiently well and storing the object references may not be an issue. However, there are cases where the ability to use Trove collections and particularly to store primitives rather than object references may provide necessary advantage. The advantage of storing primitives rather than their equivalent object references in collections becomes more obvious as the collection gets larger.

4 comments:

TheAlchemist said...

I much prefer fastutil (http://fastutil.di.unimi.it/), because it's still in active development, has more features, supports large sizes (> 2^32), and has better documentation.

Attila-Mihaly Balazs said...

I second @TheAlchemist's recommendation for fastutil! It's a great library.

@DustinMarx said...

TheAlchemist (and Attila-Mihaly),

Thanks for the reference. I'll check out fastutil.

By the way, I like the name of your blog ("Rantings of a Bitter Coder").

Dustin

Attila-Mihaly Balazs said...

Yep, great blog name, subscribed!