Monday, April 15, 2019

April 2019 Update on Java Records

After Project Valhalla's "Value Types/Objects", the language feature I am perhaps the most excited to see come to Java is Project Amber's "Data Classes" (AKA "Records"). I wrote the post "Updates on Records (Data Classes for Java)" about this time last year and use this post to provide an update on my understanding of where the "records" proposal is now.

A good starting point for the current state of the "records" design work is Brian Goetz's February 2019 version of "Data Classes and Sealed Types for Java." In addition to explaining why "plain data carriers" should be implementable with less overhead than traditional Java classes and summarizing the design decisions made toward that goal, the document introduces noted Java developer personas Algebraic Annie, Boilerplate Billy, JavaBean Jerry, POJO Patty, Tuple Tommy, and Values Victor.

Here are some key observations that Goetz makes in the "Data Classes and Sealed Types for Java" document.

  • "Java asks all classes ... to pay equally for the cost of encapsulation -- but not all classes benefit equally from it."
  • Because "the cost of establishing and defending these boundaries ... is constant across classes, but the benefit is not, the cost may sometimes be out of line with the benefit."
  • "This is what Java developers mean by 'too much ceremony' -- not that the ceremony has no value, but that they're forced to invoke it even when it does not offer sufficient value."
  • "The encapsulation model that Java provides -- where the representation is entirely decoupled from construction, state access, and equality -- is just more than many classes need."
  • "... we prefer to start with a semantic goal: modeling data as data."
  • "The API for a data class models the state, the whole state, and nothing but the state. One consequence of this is that data classes are transparent; they give up their data freely to all requestors."
  • "We propose to surface data classes in the form of records; like an enum, a record is a restricted form of class. It declares its representation, and commits to an API that matches that representation. We pair this with another abstraction, sealed types, which can assert control over which other types may be its subclasses."
  • "Records use the same tactic as enums for aligning the boilerplate-to-information ratio: offer a constrained version of a more general feature that enables standard members to be derived. ... For records, we make a similar trade; we give up the flexibility to decouple the classes API from its state description, in return for getting a highly streamlined declaration (and more)."
  • Restrictions on currently proposed records include: "record fields cannot be mutable; no fields other than those in the state description are permitted; and records cannot extend other types or be extended."
  • "... an approach that is focused exclusively on boilerplate reduction for arbitrary code is guaranteed to merely create a new kind of boilerplate."
  • "...records are not intended to replace JavaBeans, or other mutable aggregates..."
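
To make the "boilerplate-to-information ratio" point concrete, here is a minimal sketch that contrasts a hand-written data carrier with the equivalent record. It assumes the record syntax that was later finalized in Java 16, which postdates the design document quoted above and may differ from the 2019 draft. The names PointClass, Point, and RecordVersusClassDemo are hypothetical and chosen only for illustration.

```java
import java.util.Objects;

// Hand-written "plain data carrier": every member below restates the two fields.
final class PointClass {
    private final int x;
    private final int y;

    PointClass(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PointClass)) return false;
        PointClass other = (PointClass) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() {
        return "PointClass[x=" + x + ", y=" + y + "]";
    }
}

// Record form (Java 16+ syntax, an assumption relative to the 2019 proposal):
// the state description is the whole declaration, and the constructor,
// accessors, equals, hashCode, and toString are derived from it.
record Point(int x, int y) { }

class RecordVersusClassDemo {
    public static void main(String[] args) {
        System.out.println(new PointClass(1, 2));
        System.out.println(new Point(1, 2));
        // Equality is based on state in both versions.
        System.out.println(new Point(1, 2).equals(new Point(1, 2)));
    }
}
```

The record cannot declare extra instance fields or extend another class, which matches the restrictions listed above; that is the flexibility given up in exchange for the one-line declaration.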

One section of Goetz's post provides an overview of likely use cases for records. These use cases (each described in the Goetz post) include multiple return values (something for which Java developers seem to frequently use custom or library-provided tuples), Data Transfer Objects (DTOs), compound map keys, messages, and value wrappers.
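
Two of these use cases, multiple return values and compound map keys, can be sketched briefly. The example below is only an illustration under the assumption of the Java 16 record syntax (finalized after this post was written); MinMax, RouteKey, RecordUseCasesDemo, and the stored value are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record used to carry two results back from one method.
record MinMax(int min, int max) { }

// Hypothetical record used as a compound map key; the derived equals and
// hashCode make two keys with equal components interchangeable.
record RouteKey(String origin, String destination) { }

class RecordUseCasesDemo {
    // Returns two results at once without an ad hoc tuple or an int[] array.
    static MinMax minMax(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new MinMax(min, max);
    }

    public static void main(String[] args) {
        MinMax result = minMax(List.of(7, 3, 9, 4));
        System.out.println(result.min() + " .. " + result.max());

        Map<RouteKey, Integer> lookup = new HashMap<>();
        lookup.put(new RouteKey("A", "B"), 100); // illustrative value only
        // A freshly constructed key with equal components finds the same entry.
        System.out.println(lookup.get(new RouteKey("A", "B")));
    }
}
```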

Goetz specifically addresses a question raised by the records proposal: "Why not 'just' do tuples?" He answers it with multiple reasons for using the data class/record concept rather than simply adding tuples to Java. I'm generally not a fan of tuples because I think they reduce the readability of Java code and, especially if the values in the tuple have the same data type, can lead to subtle errors. Goetz articulates similar thinking: "Classes and class members have meaningful names; tuples and tuple components do not. A central aspect of Java's philosophy is that names matter; a Person with properties firstName and lastName is clearer and safer than a tuple of String and String." I prefer getFirstName() and getLastName() to getLeft() and getRight() or to getX() and getY(), so this resonates with me.
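
The readability argument can be sketched as follows. Pair is a hypothetical stand-in for a custom or library-provided tuple, and Person uses the Java 16 record syntax (an assumption, since that syntax was finalized after this post). Note that record accessors in that syntax are named firstName() and lastName() rather than getFirstName() and getLastName().

```java
// Hypothetical generic tuple, similar in spirit to library-provided pairs.
final class Pair<L, R> {
    private final L left;
    private final R right;

    Pair(L left, R right) {
        this.left = left;
        this.right = right;
    }

    L getLeft() { return left; }
    R getRight() { return right; }
}

// Nominal alternative: the components carry their meaning in their names.
record Person(String firstName, String lastName) { }

class TupleVersusRecordDemo {
    // With a tuple, the reader must remember which slot holds which name.
    static String greetTuple(Pair<String, String> name) {
        return "Hello, " + name.getLeft() + " " + name.getRight();
    }

    // With a record, each use site states which component it reads.
    static String greetPerson(Person person) {
        return "Hello, " + person.firstName() + " " + person.lastName();
    }

    public static void main(String[] args) {
        // Both calls compile even if the two Strings are swapped by mistake;
        // the named accessors only make such a mistake easier to spot.
        System.out.println(greetTuple(new Pair<>("Ada", "Lovelace")));
        System.out.println(greetPerson(new Person("Ada", "Lovelace")));
    }
}
```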

Another section of the Goetz document that I want to emphasize is the section headlined "Are records the same as value types?" This section compares Project Valhalla's value types to Project Amber's data classes. Goetz writes, "Value types are primarily about enabling flat and dense layout of objects in memory. In exchange for giving up object identity ..., the runtime gains the ability to optimize the heap layout and calling conventions for values. With records, in exchange for giving up the ability to decouple a classes API from its representation, we gain a number of notational and semantic benefits. ... some values may still benefit from state encapsulation, and some records may still benefit from identity, so they are not the exact same trade."
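
As a small illustration of the "not the exact same trade" point, the sketch below shows that a record (in the Java 16 syntax, which is an assumption relative to the 2019 proposal) gets state-based equality while remaining an ordinary object with identity. Temperature and RecordIdentityDemo are hypothetical names.

```java
// Hypothetical single-component record acting as a value wrapper.
record Temperature(double celsius) { }

class RecordIdentityDemo {
    public static void main(String[] args) {
        Temperature a = new Temperature(21.5);
        Temperature b = new Temperature(21.5);

        // equals is derived from the state description, so equal state means equal records.
        System.out.println(a.equals(b));

        // Each 'new' still creates a distinct object, so the references differ:
        // the record gave up API/representation decoupling, not identity.
        System.out.println(a == b);
    }
}
```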

There has been more discussion on the amber-spec-experts mailing list this week regarding what to call "data classes." Naming is important and notoriously difficult in software development, and I appreciate this discussion because the arguments for various names have helped me understand what "data classes" are currently envisioned to be and not to be. Here are some excerpts from this enlightening thread:

  • Rémi Forax likes "named tuples" because data classes are immutable and have some commonality with "nominal tuples."
  • Brian Goetz likes starting with "records are just nominal tuples" to "avoid picking that fight" between different groups of people with "two categories of preconceived notions" of what a tuple is.
  • Kevin Bourrillion adds, "Records have semantics, which makes them 'worlds' different from tuples. ... I think it's fair to say that all a record 'holds' is a 'tuple', but it's so much more. Record is to tuple as enum is to int."
  • Guy Steele adds, "Java `record` is to C `struct` as Java `enum` is to C `enum`."
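
One way to read the "records have semantics" remark is that a record can enforce invariants and offer behavior beyond holding its components. The following is a minimal sketch of that idea, assuming the compact constructor syntax from the record feature as finalized in Java 16 (after this thread took place). Range, length(), and RecordSemanticsDemo are hypothetical names used only for illustration.

```java
// Hypothetical record whose components must satisfy an invariant.
record Range(int lo, int hi) {
    // Compact constructor: validates the arguments before the fields are assigned.
    Range {
        if (lo > hi) {
            throw new IllegalArgumentException(
                "lo (" + lo + ") must not exceed hi (" + hi + ")");
        }
    }

    // Behavior derived from the state, something a bare tuple would not offer.
    int length() {
        return hi - lo;
    }
}

class RecordSemanticsDemo {
    public static void main(String[] args) {
        System.out.println(new Range(1, 5).length());
        try {
            new Range(5, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```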

I continue looking forward to getting "data classes" in Java at some point in the future and appreciate the effort being put into ensuring their successful adoption when added. When transitioning from C++ to Java, I missed the enum greatly, but the wait was worth it when Java introduced its own (more powerful and safer) enum. I hope for a similar feeling about Java data classes/records when we get to start using them.

3 comments:

@DustinMarx said...

Joe Darcy has posted "Design notes of and request for feedback on preliminary javax.lang.model support for records and sealed types."

@DustinMarx said...

Draft JEP "Records and Sealed Types" was created late last week. The "Summary" includes these definitions: "Records provide a compact syntax for declaring classes which are transparent holders for shallowly immutable data; sealed types provide a means for declaring classes and interfaces that can restrict who their subtypes are."

@DustinMarx said...

In the post "on implementing state components as a first class concept," Brian Goetz writes a simple but significant sentence: "Records are classes."