Monday, July 30, 2018

Memory-Hogging Enum.values() Method

I'm a big fan of Java's enum. It seemed like we waited forever to get it, but when we did finally get it (J2SE 5), the enum was so much better than that provided by C and C++ that it seemed to me "well worth the wait." As good as the Java enum is, it's not without issues. In particular, the Java enum's method values() returns a new copy of an array representing its possible values each and every time it is called.

The Java Language Specification spells out enum behavior. In The Java Language Specification Java SE 10 Edition, it is Section 8.9 that covers enums. Section 8.9.3 ("Enum Members") lists two " implicitly declared methods": public static E[] values() and public static E valueOf(String name). Example 8.9.3-1 ("Iterating Over Enum Constants With An Enhanced for Loop") demonstrates calling Enum.values() to iterate over an enum. The problem, however, is that Enum.values() returns an array and arrays in Java are mutable [Section 10.9 ("An Array of Characters Is Not a String") of the Java Language Specification reminds us of this when differentiating between a Java string and a Java array of characters.]. Java enums are tightly immutable, so it makes sense that the enum must return a clone of the array returned by the values() method each time that method is called to ensure that the array associated with the enum is not changed.

A recent post on the OpenJDK compiler-dev mailing list titled "about Enum.values() memory allocation" observes that "Enum.values() allocates a significant amount of memory when called in a tight loop as it clones the constant values array." The poster of that message adds that this "is probably for immutability" and states, "I can understand that." This message also references a March 2012 message and associated thread on this same mailing list.

The two threads on the compiler-dev mailing list include a few interesting currently available work-arounds for this issue.

Brian Goetz's message on this thread starts with the sentence, "This is essentially an API design bug; because values() returns an array, and arrays are mutable, it must copy the array every time." [Goetz also teases the idea of "frozen arrays" (Java arrays made immutable) in that message.]

This issue is not a new one. William Shields's December 2009 post "Mutability, Arrays and the Cost of Temporary Objects in Java" states, "The big problem with all this is that Java arrays are mutable." Shields explains old and well-known problems of mutability in the Java Date class before writing about the particular issue presented b Enum.values():

Java enums have a static method called values() which returns an array of all instances of that enum. After the lessons of the Date class, this particular decision was nothing short of shocking. A List would have been a far more sensible choice. Internally this means the array of instances must be defensively copied each time it is called...

Other references to this issue include "Enums.values() method" (Guava thread) and "Java’s Enum.values() Hidden Allocations" (shows caching the array returned by Enum.values()). There is also a JDK bug written on this: JDK-8073381 ("need API to get enum's values without creating a new array").

Some of the currently available work-arounds discussed in this post are illustrated in the next code listing, which is a simple Fruit enum that demonstrates caching the enum's values in three different formats.

Fruit.java Enum with Three Cached Sets of 'Values'

package dustin.examples.enums;

import java.util.EnumSet;
import java.util.List;

/**
 * Fruit enum that demonstrates some currently available
 * approaches for caching an enum's values so that a new
 * copy of those values does not need to be instantiated
 * each time .values() is called.
 */
public enum Fruit
{
   APPLE("Apple"),
   APRICOT("Apricot"),
   BANANA("Banana"),
   BLACKBERRY("Blackberry"),
   BLUEBERRY("Blueberry"),
   BOYSENBERRY("Boysenberry"),
   CANTALOUPE("Cantaloupe"),
   CHERRY("Cherry"),
   CRANBERRY("Cranberry"),
   GRAPE("Grape"),
   GRAPEFRUIT("Grapefruit"),
   GUAVA("Guava"),
   HONEYDEW("Honeydew"),
   KIWI("Kiwi"),
   KUMQUAT("Kumquat"),
   LEMON("Lemon"),
   LIME("Lime"),
   MANGO("Mango"),
   ORANGE("Orange"),
   PAPAYA("Papaya"),
   PEACH("Peach"),
   PEAR("Pear"),
   PLUM("Plum"),
   RASPBERRY("Raspberry"),
   STRAWBERRY("Strawberry"),
   TANGERINE("Tangerine"),
   WATERMELON("Watermelon");

   private String fruitName;

   Fruit(final String newFruitName)
   {
      fruitName = newFruitName;
   }

   /** Cached fruits in immutable list. */
   private static final List<Fruit> cachedFruitsList = List.of(Fruit.values());

   /** Cached fruits in EnumSet. */
   private static final EnumSet<Fruit> cachedFruitsEnumSet = EnumSet.allOf(Fruit.class);

   /** Cached fruits in original array form. */
   private static final Fruit[] cachedFruits = Fruit.values();

   public static List<Fruit> cachedListValues()
   {
      return cachedFruitsList;
   }

   public static EnumSet<Fruit> cachedEnumSetValues()
   {
      return cachedFruitsEnumSet;
   }

   public static Fruit[] cachedArrayValues()
   {
      return cachedFruits;
   }
}

The fact that Enum.values() must clone its array each time it is called is really not a big deal in many situations. That stated, it's not difficult to envision cases where it would be useful to invoke Enum.values() repeatedly in a "tight loop" and then the copying of the enum values into an array each time would start to have a noticeable impact on memory used and the issues associated with greater memory use. It would be nice to have a standard approach to accessing an enum's values in a more memory efficient manner. The two threads previously mentioned discuss some ideas for potentially implementing this capability.

1 comment:

@DustinMarx said...

Joe Darcy's message on the compiler-dev mailing list provides some historical context on the design of Enum's values() method: "At the time in JDK 5.0, there was a concern about using java.util APIs in java.lang.* classes. Now that both java.lang and java.util packages are ensconced in the java.base module, such concerns are no longer germane."