Tuesday, August 7, 2018

JDK 12, Merging Collectors, and the Challenge of Naming

It appears likely that a new method will be available on the java.util.streams.Collectors class in JDK 12 that will, according to the new method's proposed Javadoc-based documentation, "Return a Collector that passes the input elements to two specified collectors and merges their results with the specified merge function." The currently proposed name of this new Collectors method is pairing, but that new method's name has been the source of significant discussion.

The naming of this method has solicited wide discussion on the OpenJDK core-libs-dev mailing list. Although it would be easy at first thought to label this an example of bike-shedding (or Parkinson's Law of Triviality), my experience has been that proper naming can be more important than it might first seem. I've seen many situations in which there was nothing wrong with the logic of a particular implementation, but problems ensued related to use of that implementation due to miscommunication or bad assumptions tied to poorly named code constructs. For a major API in the JDK, it's not so surprising after all that the name of this method parameter would be so seriously considered.

The discussion began with Peter Levart's post "BiCollector" (June 17), in which he opened with the question, "Have you ever wanted to perform a collection of the same Stream into two different targets using two Collectors?" Levart included an example of an implementation of such a "BiCollector" and asked if this was the type of thing that might be added to the JDK. Not surprisingly, it turns out that this is desired by others and some alternate existing implementations (Kirk Pepperdine and Tagir Valeev's streamex implementation) were mentioned.

After discussion regarding the multiple implementations of the "BiCollector," Tagir Valeev created an OpenJDK "preliminary webrev of my own implementation" and put it out for review (June 15). In that post, Valeev specifically called out that he had made up the name "pairing" for the method and added, "as I'm not a native English speaker I cannot judge whether it's optimal, so better ideas are welcome." That "opened the flood gates!"

Although there was some interesting and significant discussion surrounding other implementation details of the proposed "BiCollector" (now in proposed code as "Collectors.pairing(...)," the naming of the method received the most contributions. In a June 21 post, Valeev summarized the proposed names with accompanying comments about each recommendation and I have reproduced that list (but without the insightful comments) here:

For those interested in arguments "for" and "against" the above proposed names, I recommend viewing Valeev's original post. Most of the posts linked to above with the name suggestions provide arguments for their favored name and there is some interesting insight into what OpenJDK contributors think what aspects in a method name might aid or hinder understanding of the method.

After the excitement of naming the method, discussion died down for a while on this addition to the Collectors API until Valeev posted a "ping message" today with a link to the latest webrev for review (changes @since 11 to @since 12). In response to this "ping" message, there is feedback regarding the name of the last argument to the proposed method (currently named "finisher"), which is another reminding of the importance of naming to many of us.

Other posts on this topic on the core-libs-dev mailing list remind us that for this new method to be added to the Collectors public API, a few things still need to happen that include a sponsor volunteering to review and sponsor the change set as well as the need for a CSR (Compatibility & Specification Review) and "a couple of reviewers that are fully aware of Streams design."

A Brian Goetz post on this thread summarizes why naming this proposed method is so difficult:

The essential challenge in naming here is that this Collector does two (or maybe three) things: duplicate the stream into two identical streams ("tee"), sends each element to the two collectors ("collecting"), and then combines the results ("finishing"). So all the one-word names (pairing, teeing, unzipping, biMapping) only emphasize one half of the operation, and names that capture the full workflow accurately (teeingAndCollectingAndThen) are unwieldy.

That same Goetz post argues against "merging" (or its derivatives) for the method's name because "names along the lines of 'merging' may incorrectly give the idea that the merge is happening elementwise, rather than duplicating the streams, collecting, and merging the results."

I find several of the proposed method names to be reasonable, but there a couple that I believe (hope) were made out of an attempt at humor.

JDK-8205461 ["Create Collector which merges results of two other collectors"] is the "Enhancement" "bug" describing this issue. Its description currently begins with, "Add a new Collector into Collectors class which merges results of two other collectors" before explicitly stating, "One API method should be added (name is subject to discussion)." If you've ever wanted to name a method in a public JDK API, this might be your opportunity!

I have used this blog post in attempt to accomplish two things:

  1. Create awareness of this method that is likely to be available in the public API as of JDK 12
  2. Present an example of why naming is important and why it can be as difficult as the technical implementation
    • Proper naming can be difficult for anyone, even those of us who are native English speakers!

Although one or more names in the implementation may change, the currently proposed implementation is logically likely very close to what will eventually be delivered in conjunction with JDK-8205461.

3 comments:

@DustinMarx said...

The method discussed in this post was added as "teeing" to the Collectors class in JDK 12 via JDK 12 Early Access Build #13.

@DustinMarx said...

"Introducing JDK 12 Teeing" is a recently published post with details regarding use of Collectors.teeing(-). Another useful overview is "Teeing Collector in Java 12."

@DustinMarx said...

"Teeing, a hidden gem in the Java API" has recently been posted.