Saturday, June 15, 2019

History and Motivations Behind Java's Maligned Serialization

Issues related to Java's serialization mechanism are well-advertised. The entire last chapter of Effective Java 1st Edition (Chapter 10) and of Effective Java 2nd Edition (Chapter 11) is dedicated to the subject of serialization in Java. The final chapter of Effective Java 3rd Edition (Chapter 12) is still devoted to serialization, but includes a new item (Item 85) that goes even further to emphasize two assertions related to Java serialization:

  • "The best way to avoid serialization exploits is to never deserialize anything."
  • "There is no reason to use Java serialization in any new system you write."

In the recently released document "Towards Better Serialization," Brian Goetz "explores a possible direction for improving serialization in the Java Platform." Although the main intention of this document is to propose a potential new direction for Java serialization, it is an "exploratory document only and does not constitute a plan for any specific feature." This means that it is an interesting read for the direction Java serialization might take, but there is also significant value in reading this document for a summary of Java serialization as it currently exists and how we got to this place. That is the main theme of the rest of my post, in which I'll reference and summarize sections of "Towards Better Serialization" that I feel best articulate the current issues with Java's serialization mechanism and why we have these issues.

Goetz opens his document's "Motivation" section with an attention-grabbing paragraph on the "paradox" of Java serialization:

Java's serialization facility is a bit of a paradox. On the one hand, it was probably critical to Java's success --- Java would probably not have risen to dominance without it, as serialization enabled the transparent remoting that in turn enabled the success of Java EE. On the other hand, Java's serialization makes nearly every mistake imaginable, and poses an ongoing tax (in the form of maintenance costs, security risks, and slower evolution) for library maintainers, language developers, and users.

The other paragraph in the "Motivation" section of the Goetz document distinguishes between the general concept of serialization and the specific design of Java's current serialization mechanism:

To be clear, there's nothing wrong with the concept of serialization; the ability to convert an object into a form that can be easily transported across JVMs and reconstituted on the other side is a perfectly reasonable idea. The problem is with the design of serialization in Java, and how it fits (or more precisely, does not fit) into the object model.

Goetz states that "Java's serialization [mistakes] are manifold" and he outlines the "partial list of sins" committed by Java's serialization design. I highly recommend reading the original document for the concise and illustrative descriptions of these "sins" that I only summarize here.

  • "Pretends to be a library feature, but isn't."
    • "Serialization pretends to be a library feature. ... In reality, though, serialization extracts object state and recreates objects via privileged, extralinguistic mechanisms, bypassing constructors and ignoring class and field accessibility."
  • "Pretends to be a statically typed feature, but isn't."
    • "Serializability is a function of an object's dynamic type, not its static type."
    • "implements Serializable doesn't actually mean that instances are serializable, just that they are not overtly serialization-hostile."
  • "The compiler won't help you" identify "all sorts of mistakes one can make when writing serializable classes"
  • "Magic methods and fields" that are "not specified by any base class or interface" nonetheless "affect the behavior of serialization"
  • "Woefully imperative."
  • "Tightly coupled to encoding."
  • "Unfortunate stream format" that is "neither compact, nor efficient, nor human-readable."
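
The "magic methods and fields" item can be made concrete with a small sketch. The Temperature class and the roundTrip helper below are hypothetical names invented for this illustration (they are not from Goetz's document). The sketch assumes only standard java.io serialization: serialVersionUID and the private readObject method are not declared by Serializable or any base class, yet the serialization runtime looks them up by name and uses them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class MagicMethodsDemo {

    /** Hypothetical value class used only for illustration. */
    static final class Temperature implements Serializable {
        // "Magic" field: recognized by name and type, not declared by any interface.
        private static final long serialVersionUID = 1L;

        private final double kelvin;
        // Not part of the serialized form; set only by readObject below.
        private transient boolean restoredByReadObject;

        Temperature(double kelvin) {
            if (kelvin < 0) {
                throw new IllegalArgumentException("kelvin must be >= 0");
            }
            this.kelvin = kelvin;
        }

        double kelvin() {
            return kelvin;
        }

        boolean restoredByReadObject() {
            return restoredByReadObject;
        }

        // "Magic" method: private and absent from Serializable, yet the
        // serialization runtime finds it reflectively and calls it.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // The constructor's check has to be repeated here by hand.
            if (kelvin < 0) {
                throw new InvalidObjectException("kelvin must be >= 0");
            }
            restoredByReadObject = true;
        }
    }

    /** Serializes and then deserializes a value entirely in memory. */
    static Temperature roundTrip(Temperature original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Temperature copy = roundTrip(new Temperature(293.15));
        System.out.println(copy.kelvin() + " restored via readObject: "
                + copy.restoredByReadObject());
    }
}
```

Nothing in the compiler checks the signature of readObject. If it were misspelled or declared with the wrong parameter type, the class would still compile and the method would simply never be called.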

Goetz also outlines the ramifications of these Java serialization design decisions (see the original document for more background on each of these "serious problems"):

  • "Cripples library maintainers."
    • "Library designers must think very carefully before publishing a serializable class --- as doing so potentially commits you to maintaining compatibility with all the instances that have ever been serialized."
  • "Makes a mockery of encapsulation."
    • "Serialization constitutes an invisible but public constructor, and an invisible but public set of accessors for your internal state."
  • "Readers cannot verify correctness merely by reading the code."
    • "But because serialization constitutes a hidden public constructor, you have to also reason about the state that objects might be in based on previous versions of the code."
    • "By bypassing constructors, serialization completely subverts the integrity of the object model."
  • "Too hard to reason about security."
    • "The variety and subtlety of security exploits that target serialization is impressive; no ordinary developer can keep them all in their head at once."
  • "Impedes language evolution."
    • "Complexity in programming languages comes from unexpected interactions between features, and serialization interacts with nearly everything."
    • "Serialization is an ongoing tax on evolving the language."
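
The "hidden public constructor" point can be observed directly. The following is a minimal sketch with hypothetical names (Range, roundTrip, constructorCalls), assuming only standard java.io serialization. It counts how often the class's own constructor runs and shows that deserialization produces a second instance without running it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ConstructorBypassDemo {

    /** Hypothetical class whose invariant (low <= high) lives in its constructor. */
    static final class Range implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static state is not serialized; it only counts constructor invocations.
        static int constructorCalls = 0;

        private final int low;
        private final int high;

        Range(int low, int high) {
            if (low > high) {
                throw new IllegalArgumentException("low must be <= high");
            }
            this.low = low;
            this.high = high;
            constructorCalls++;
        }

        int low() {
            return low;
        }

        int high() {
            return high;
        }
    }

    /** Serializes and then deserializes a Range entirely in memory. */
    static Range roundTrip(Range original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Range) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range original = new Range(1, 10);
        Range copy = roundTrip(original);
        System.out.println("distinct instances: " + (copy != original));
        // Two instances now exist, but the constructor ran only once.
        System.out.println("constructor calls: " + Range.constructorCalls); // prints "constructor calls: 1"
    }
}
```

Because the constructor is skipped, its low <= high check never runs on the deserialized instance; any protection against a hand-crafted byte stream would have to be duplicated elsewhere.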

Perhaps my favorite section of Goetz's "Towards Better Serialization" document is the section "The underlying mistake" because the items that Goetz outlines in this section are common reasons for mistakes in other Java code I've written, read, and worked with. In other words, while Goetz is specifically discussing how these design decisions led to problems for Java's serialization mechanism, I have (unsurprisingly) found these general design decisions to cause problems in other areas as well.

Goetz opens the section "The underlying mistake" with this statement: "Many of the design errors listed above stem from a common source --- the choice to implement serialization by 'magic' rather than giving deconstruction and reconstruction a first-class place in the object model itself." I have found "magic" code written by other developers, and even by myself at an earlier date, to often be confusing and difficult to reason about. I've definitely realized that clean, explicit code is often preferable.

Goetz adds, "Worse, the magic does its best to remain invisible to the reader." Invisible "magic" designs often seem clever when we first implement them, but then cause developers who must read, maintain, and change the code a lot of pain when they suddenly need some visibility to the underlying magic.

Goetz cites Edsger W. Dijkstra and writes, "Serialization, as it is currently implemented, does the exact opposite of minimizing the gap between the text of the program and its computational effect; we could be forgiven for mistakenly assuming that our objects are always initialized by the constructors written in our classes, but we shouldn't have to be."

Goetz concludes "The underlying mistake" section with a paragraph that begins, "In addition to trying to be invisible, serialization also tries to do too much." Although Goetz is writing particularly about Java's serialization currently "serializing programs [rather than] merely serializing data," I have seen this issue countless times in a more general sense. It is tempting for us developers to design and implement code that performs every little feature we think might be useful to someone at some point even if the vast majority of (or even all currently known) users and use cases only require a simpler subset of the functionality.

Given that the objective of "Towards Better Serialization" is to "explore a possible direction for improving serialization in the Java Platform," it's not surprising that the document goes into significant detail about design and even implementation details that might influence Java's future serialization mechanism. In addition, the Project Amber mailing lists (amber-dev and amber-spec-experts) also have significant discussion on possible future directions of Java serialization. However, the purpose of my post here is not to look at the future of Java's serialization, but to instead focus on how this document has nicely summarized Java's current serialization mechanism and its history.

Although the previously mentioned Project Amber mailing list messages focus on the potential future of Java's serialization mechanism, there are some interesting comments in these posts about Java's current serialization that add to what Goetz summarized in "Towards Better Serialization." Here are some of the most interesting:

  • Goetz's post that announced "Towards Better Serialization" states that the proposal "addresses the risks of serialization at their root" and "brings object serialization into the light, where it needs to be in order to be safer."
  • Another Brian Goetz post reiterates, by implication, that a big part of the problem with Java's serialization today is that it constructs objects without invoking a constructor: "our main security goal [is to allow] deserialization [to] proceed through constructors."
  • Stuart Marks's post states, "The line of reasoning about convenience in the proposal is not that convenience itself is evil, but that in pursuit of convenience, the original design adopted extralinguistic mechanisms to achieve it. This weakens some of the fundamentals of the Java platform, and it has led directly to several bugs and security holes, several of which I've fixed personally."
    • Marks outlines some specific examples of subtle bugs in the JDK due to serialization-related design decisions.
    • Marks outlines the explicit and specific things a constructor must do ("bunch of special characteristics") that are circumvented when current deserialization is used.
    • He concludes, "THIS is the point of the proposal. Bringing serialization into the realm of well-defined language constructs, instead of using extralinguistic 'magic' mechanisms, is a huge step forward in improving quality and security of Java programs."
  • Kevin Bourrillion's post states, "Java's implementation of serialization has been a gaping wound for a long time" and adds that "every framework to support other wire formats has always had to start from scratch."

I highly recommend reading "Towards Better Serialization" to anyone interested in Java serialization regardless of whether their primary interest is Java's current serialization mechanism or what it might one day become. It's an interesting document from both perspectives.

Monday, June 10, 2019

JDK 13: VM.events Added to jcmd

CSR (Compatibility and Specification Review) JDK-8224601 ["Provide VM.events diagnostic command"] is implemented in JDK 13 as of JDK 13 Early-Access Build #24 (dated 2019/6/6) and was added via Enhancement JDK-8224600 ["Provide VM.events command"]. The CSR's "Summary" describes this enhancement: "Add a VM.events command to jcmd to display event logs." The CSR's "Solution" states, "Add a command to jcmd to print out event logs. The proposed name is 'VM.events'."

The "Problem" section of CSR JDK-8224601 explains the value achieved from adding VM.events to the already multi-functioning jcmd: "Event logs are a valuable problem analysis tool. Right now the only way to see them is via hs-err file in case the VM died, or as part of the VM.info output."

To demonstrate jcmd's new VM.events in action, I downloaded JDK 13 Early Access Build #24 and used it to compile a simple, contrived Java application. I then ran the jcmd tool delivered with that same JDK 13 Early Access Build #24 against the running application.

The first screen snapshot shown here demonstrates using jcmd to detect the PID of the simple Java application and using jcmd <pid> help to see what jcmd options are available for that particular running Java process. The presence of VM.events is highlighted.

The next screen snapshot demonstrates applying jcmd <pid> help VM.events to see the usage (including available options) for the newly added VM.events command.

The final screen snapshot demonstrates application of jcmd's new VM.events command by showing the top (most) portion of the output from running that command without any options.

The options for the VM.events command are to narrow down results to a specified log to be printed or to limit the number of events shown. By not specifying any options, I was implicitly requesting the default of all logs and all events.
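
For readers who prefer to script the steps described above, here is a rough Java sketch that launches jcmd <pid> VM.events with no options, as in the run described in this post. The class and method names (VmEventsDemo, vmEventsCommand) are hypothetical, and the sketch assumes that a jcmd executable from a JDK that includes VM.events (JDK 13 Early-Access Build #24 or later) is on the PATH. If no PID is passed as an argument, it targets its own JVM.

```java
import java.io.IOException;
import java.util.List;

final class VmEventsDemo {

    /** Builds the command line "jcmd <pid> VM.events" with no options. */
    static List<String> vmEventsCommand(long pid) {
        return List.of("jcmd", Long.toString(pid), "VM.events");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Use the PID given on the command line, or fall back to this JVM's own PID.
        long pid = args.length > 0
                ? Long.parseLong(args[0])
                : ProcessHandle.current().pid();

        // Assumption: "jcmd" resolves to a JDK build that supports VM.events.
        Process jcmd = new ProcessBuilder(vmEventsCommand(pid))
                .inheritIO() // show jcmd's output in this process's console
                .start();
        int exitCode = jcmd.waitFor();
        System.out.println("jcmd exited with code " + exitCode);
    }
}
```

Running jcmd <pid> help VM.events, as shown earlier in the post, remains the authoritative way to see which options a given JDK build accepts.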

In the last screen snapshot, we can see that the types of JVM events rendered in the output include "compilation events", "deoptimization events", garbage collection events, classes unloaded, classes redefined, and classes loaded.

I have been a big fan of jcmd for a number of years and believe it is still generally an underappreciated command-line tool for many Java developers. The addition of the VM.events command in JDK 13 makes the tool even more useful for diagnosing a wider variety of issues.

Tuesday, May 28, 2019

Google Plus's Demise Impacts Software Development

I didn't think much about it when Google announced in late 2018 "sunsetting the consumer version of Google+." Although I had a Google+ account, I didn't really use it and didn't think I'd miss it. Although I had recognized that the demise of other online resources (Google Code in 2015, Codehaus in 2015, GeoCities in 2009, Google's Knol in 2012, Dr. Dobb's in 2014) would impact my ability as a software developer to access previously published online material, I didn't consider that this could be the case for Google+. Although some of the resources mentioned earlier have left read-only content in place, it appears that Google+ content has already been removed.

The "consumer (personal) version of Google+" was shut down in early April of this year, and I have now run into my first link that leads only to the notice "Google+ is no longer available for consumer (personal) and brand accounts" rather than to the original content. The article I was attempting to read was Jean-Baptiste 'JBQ' Quéru's 2011 post "Dizzying but invisible depth."

The "Dizzying but invisible depth" post is one that I highly recommend because it describes well how too many layers of abstraction eventually make it difficult or impossible for any one person or even any one team to understand how the entire system works. Here is one paragraph from that post:

Once you start to understand how our modern devices work and how they're created, it's impossible to not be dizzy about the depth of everything that's involved, and to not be in awe about the fact that they work at all, when Murphy's law says that they simply shouldn't possibly work.

After spending a couple of minutes trying to find an alternate version of this post, I decided to spend my time using the Wayback Machine (Internet Archive) to access the post. The Internet Archive Wayback Machine has been useful to me before and it was again this time. I was able to find a snapshot of the "Dizzying but invisible depth" post. I saved a copy of it this time.

The post I referenced here was available on the Internet Archive Wayback Machine and may be available from some other site. However, there may be other software development related resources previously hosted on Google+ that are no longer available. This is a reminder to me of the fragility of online resources. Because of this fragility, I try to save at least the links to articles and posts of interest to me so that I have a chance of accessing the article or post via the Internet Archive Wayback Machine if the original site goes away. In the case of particularly useful posts and articles, I will sometimes go as far as saving it to a PDF for future reference.

Tuesday, May 21, 2019

Explicit No-Arguments Constructor Versus Default Constructor

Most developers new to Java quickly learn that a "default constructor" is implicitly created (by javac) for their Java classes when they don't specify at least one explicit constructor. Section 8.8.9 of the Java Language Specification succinctly states, "If a class contains no constructor declarations, then a default constructor is implicitly declared." That section further describes characteristics of the implicitly created default constructor including it having no parameters, having no throws clause, and invoking the constructor of its super class that similarly accepts no arguments. A Java developer can choose to explicitly implement a no-arguments constructor that is similar to the default constructor (such as accepting no arguments and having no throws clause). In this post, I look at some reasons a developer might decide to implement an explicit no-arguments constructor rather than relying on the implicit default constructor.

Some Reasons to Explicitly Specify No-Arguments Constructors

Preclude Instantiation of a Class

A common reason for implementing an explicit no-arguments constructor is to preclude the default constructor from being implicitly created with public accessibility. This is an unnecessary step if the class has other explicit constructors (that accept parameters) because the presence of any explicit constructor will prevent the implicit default constructor from being generated. However, if there is no other explicit constructor present (such as in a "utility" class with all static methods), the implicit default constructor can be precluded by implementing an explicit no-arguments constructor with private access. Section 8.8.10 of the Java Language Specification describes use of all private explicit constructors to prevent instantiation of a class.
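
As a minimal sketch of this idiom, the hypothetical utility class below (TemperatureConversions is a name made up for this example) declares a single private no-arguments constructor so that no default constructor is generated and no code outside the class can instantiate it.

```java
final class TemperatureConversions {

    // Explicit private no-arguments constructor: suppresses the implicit
    // default constructor and prevents instantiation from outside the class.
    private TemperatureConversions() {
        throw new AssertionError("utility class; not instantiable");
    }

    /** Converts degrees Celsius to degrees Fahrenheit. */
    static double celsiusToFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        System.out.println(celsiusToFahrenheit(100.0)); // prints 212.0
        // new TemperatureConversions() would not compile outside this class.
    }
}
```

Throwing from the private constructor is optional; it only guards against accidental instantiation from inside the class itself.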

Force Class Instantiation via Builder or Static Initialization Factory

Another reason to explicitly implement a private no-arguments constructor is to force instantiation of an object of that class via static initialization factory methods or builders instead of constructors. The first two items of Effective Java (Third Edition) outline advantages of using static initialization factory methods and builders over direct use of constructors.
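
A short sketch of this approach, using a hypothetical EventLog class invented for illustration: the no-arguments constructor is explicit and private, so callers must go through the named static factory methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class EventLog {

    private final List<String> entries = new ArrayList<>();

    // Explicit private no-arguments constructor: callers cannot use "new EventLog()".
    private EventLog() {
    }

    /** Static factory for an empty log. */
    static EventLog empty() {
        return new EventLog();
    }

    /** Static factory for a log that starts with the given entries. */
    static EventLog of(String... initialEntries) {
        EventLog log = new EventLog();
        log.entries.addAll(Arrays.asList(initialEntries));
        return log;
    }

    /** Read-only view of the entries. */
    List<String> entries() {
        return Collections.unmodifiableList(entries);
    }

    public static void main(String[] args) {
        System.out.println(EventLog.of("started", "stopped").entries());
    }
}
```

The factory names (empty, of) say what kind of instance is being created, which is one of the advantages of static factories over constructors that Effective Java describes.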

Multiple Constructors Required Including No-arguments Constructor

An obvious reason for implementing a no-arguments constructor that might be as common or even more common than the reason discussed above is when a no-arguments constructor is needed, but so are constructors that expect arguments. In this case, because of the presence of other constructors expecting arguments, a no-arguments constructor must be explicitly created because a default constructor is never implicitly created for a class that already has one or more explicit constructors.
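
The rule is easy to verify with reflection. In this sketch, the three nested classes and the hasNoArgsConstructor helper are hypothetical names used only for illustration; the reflection calls are standard java.lang.reflect API.

```java
import java.lang.reflect.Constructor;

final class DefaultConstructorDemo {

    /** Declares no constructor, so javac supplies a default constructor. */
    static class NoDeclaredConstructor {
    }

    /** Declares only a constructor with a parameter, so no default constructor is generated. */
    static class ArgsOnly {
        final String name;

        ArgsOnly(String name) {
            this.name = name;
        }
    }

    /** Needs both forms, so the no-arguments constructor must be written explicitly. */
    static class Both {
        final String name;

        Both() {
            this("unnamed");
        }

        Both(String name) {
            this.name = name;
        }
    }

    /** Returns true if the class has a constructor that takes no parameters. */
    static boolean hasNoArgsConstructor(Class<?> type) {
        for (Constructor<?> constructor : type.getDeclaredConstructors()) {
            if (constructor.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgsConstructor(NoDeclaredConstructor.class)); // true
        System.out.println(hasNoArgsConstructor(ArgsOnly.class));              // false
        System.out.println(hasNoArgsConstructor(Both.class));                  // true
    }
}
```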

Document Object Construction with Javadoc

Another reason for explicitly implementing a no-arguments constructor rather than relying on the implicitly created default constructor is to express Javadoc comments on the constructor. This is the stated justification for JDK-8224174 ("java.lang.Number has a default constructor") that is now part of JDK 13 and is also expressed in currently unresolved JDK-8071961 ("Add javac lint warning when a default constructor is created"). Recently written CSR JDK-8224232 ("java.lang.Number has a default constructor") elaborates on this point: "Default constructors are inappropriate for well-documented APIs."

Preference for Explicit Over Implicit

Some developers generally prefer explicit specification over implicit creation. There are several areas in Java in which a choice can be made between explicit specification or the implicit counterpart. Developers might prefer an explicit no-arguments constructor over an implicit constructor if they value the communicative aspect or presumed greater readability of an explicit constructor.

Replacing Default Constructors with Explicit No-Arguments Constructors in the JDK

There are cases in the JDK in which implicit default constructors have been replaced with explicit no-arguments constructors. These include the following:

  • JDK-8071959 ("java.lang.Object uses implicit default constructor"), which was addressed in JDK 9, replaced java.lang.Object's "default constructor" with an explicit no-arguments constructor. Reading the "Description" of this issue made me smile: "When revising some documentation on java.lang.Object (JDK-8071434), it was noted that the class did *not* have an explicit constructor and instead relied on javac to create an implicit default constructor. How embarrassing!"
  • JDK-8177153 ("LambdaMetafactory has default constructor"), which was addressed in JDK 9, replaced an implicit default constructor with an explicit (and private) no-arguments constructor.
  • JDK-8224174 ("java.lang.Number has a default constructor"), which is planned for JDK 13, will replace java.lang.Number's implicit default constructor with an explicit no-arguments constructor.

Potential javac lint Warning Regarding Default Constructors

It is possible that one day javac will have an available lint warning to point out classes with default constructors. JDK-8071961 ("Add javac lint warning when a default constructor is created"), which is not currently targeted for any specific JDK release, states: "JLS section 8.8.9 documents that if a class does not declare at least one constructor, the compiler will generate a constructor by default. While this policy may be convenient, for formal classes it is a poor programming practice, if for no other reason that the default constructor will have no javadoc. Use of a default constructor may be a reasonable javac lint warning."

Conclusion

Relying on default constructors to be created at compile time is definitely convenient, but there are situations in which it may be preferable to explicitly specify a no-arguments constructor even when explicit specification is not required.

Tuesday, May 14, 2019

Java Text Blocks

In the 13 May 2019 post "RFR: Multi-line String Literal (Preview) JEP [EG Draft]" on the OpenJDK amber-spec-experts mailing list, Jim Laskey announced a draft feature JEP named "Text Blocks (Preview)" (JDK-8222530).

Laskey's post opens with (I've added the links), "After some significant tweaks, reopening the JEP for review" and he is referring to the draft JEP that was started after the closing/withdrawing of JEP 326 ["Raw String Literals (Preview)"] (JDK-8196004). Laskey explains the most recent change to the draft JEP, "The most significant change is the renaming to Text Blocks (I'm sure it will devolve over time Text Literals or just Texts.) This is primarily to reflect the two-dimensionality of the new literal, whereas String literals are one-dimensional." This post-"raw string literals" draft JEP previously referred to "multi-line string literals" and now refers to "text blocks."

The draft JEP "Text Blocks (Preview)" provides a detailed overview of the proposed preview feature. Its "Summary" section states:

Add text blocks to the Java language. A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in predictable ways, and gives the developer control over format when desired. This will be a preview language feature.

This is a follow-on effort to explorations begun in JEP 326, Raw String Literals (Preview).

The draft JEP currently lists three "Goals" of the JEP and I've reproduced the first two here:

  1. "Simplify the task of writing Java programs by making it easy to express strings that span several lines of source code, while avoiding escape sequences in common cases."
  2. "Enhance the readability of strings in Java programs that denote code written in non-Java languages."

The "Non-Goals" of this draft JEP are also interesting and the two current non-goals are reproduced here:

  1. "It is not a goal to define a new reference type (distinct from java.lang.String) for the strings expressed by any new construct."
  2. "It is not a goal to define new operators (distinct from +) that take String operands."

The current "Description" of the draft JEP states:

A text block is a new kind of literal in the Java language. It may be used to denote a string anywhere that a string literal may be used, but offers greater expressiveness and less accidental complexity.

A text block consists of zero or more content characters, enclosed by opening and closing delimiters.

The draft JEP describes use of "fat delimiters" ("three double quote characters": """) in the opening delimiter and closing delimiter that mark the beginning and ending of a "text block." As currently proposed, the text block actually begins on the line following the line terminator of the line with the opening delimiter (which might include spaces). The content of the text block ends with the final character before the closing delimiter.

The draft JEP describes "text block" treatment of some special characters. It states, 'The content may include " characters directly, unlike the characters in a string literal.' It also states that \" and \n are "permitted, but not necessary or recommended" in a text block. There is a section of this draft JEP that shows examples of "ill-formed text blocks."

There are numerous implementation details covered in the draft JEP. These include "compile-time processing" of line terminators (which are "normalized" to "LF (\u000A)"), incidental white space (differentiation of "incidental white space from essential white space" and use of String::indent for custom indentation management), and escape sequences ("any escape sequences in the content are interpreted" per Java Language Specification and use of String::translateEscapes for custom escape processing).
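
To make the delimiter and white space rules concrete, here is a small sketch. It uses the text block syntax as the feature was later finalized in JDK 15, so it needs JDK 15 or newer to compile, and details may differ from the draft JEP as it stood when this post was written. The class and method names are hypothetical. Both methods are intended to return the same string.

```java
final class TextBlockDemo {

    /** The text block form: no escapes needed for quotes or line breaks. */
    static String asTextBlock() {
        return """
            <p class="greeting">
                Hello, "text blocks"
            </p>
            """;
    }

    /** The same content written as a traditional string literal. */
    static String asStringLiteral() {
        return "<p class=\"greeting\">\n"
                + "    Hello, \"text blocks\"\n"
                + "</p>\n";
    }

    public static void main(String[] args) {
        System.out.print(asTextBlock());
        System.out.println(asTextBlock().equals(asStringLiteral())); // prints true
    }
}
```

The twelve leading spaces shared by every content line and by the closing delimiter are treated as incidental white space and removed, while the four extra spaces before "Hello" are essential and kept. Because the closing delimiter sits on its own line, the string ends with a line terminator.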

Newly named "Java Text Blocks" look well-suited for the stated goals and the current proposal is the result of significant engineering effort. The draft JEP is approachable and worth reading for many details I did not cover here. Because this is still a draft JEP, it has not been proposed as a candidate JEP yet and has not been targeted to any specific Java release.

Monday, April 29, 2019

A New Era for Determining Equivalence in Java?

Liam Miller-Cushon has published a document simply called "Equivalence" in which he proposes "to create a library solution to help produce readable, correct, and performant implementations of equals() and hashCode()." In this post, I summarize some reasons why I believe this proposal is worth reading for most Java developers even if the proposal never gets implemented and why the proposal's implementation would benefit all Java developers if realized.

Miller-Cushon opens his proposal with a single-sentence paragraph: "Correctly implementing equals() and hashCode() requires too much ceremony." The proposal points out that today's powerful Java IDEs do a nice job of generating these methods, but that there is still code to be read and maintained. The proposal also mentions that "over time these methods become a place for bugs to hide." I have been on the wrong end more than once of particularly insidious bugs caused by an error in one of these methods and these can be tricky to detect.

All three editions of "Effective Java" provide detailed explanation and examples for how to write effective implementations of these methods, but it's still easy to get them wrong. The JDK 7 (Project Coin)-introduced methods Objects.equals(Object, Object) and Objects.hash(Object...) have helped considerably (especially in terms of readability and dealing with nulls properly), but there are still errors made in implementations of Object.equals(Object) and Object.hashCode().
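
For reference, this is roughly what a hand-written implementation looks like today with those two JDK 7 helper methods. PersonName is a hypothetical class made up for this example, and the sketch uses instanceof rather than getClass(), which is one of the trade-offs a real implementation has to decide.

```java
import java.util.Objects;

final class PersonName {

    private final String first;
    private final String middle; // may be null
    private final String last;

    PersonName(String first, String middle, String last) {
        this.first = first;
        this.middle = middle;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PersonName)) {
            return false; // also covers other == null
        }
        PersonName that = (PersonName) other;
        // Objects.equals is null-safe, so a null middle name needs no special case.
        return Objects.equals(first, that.first)
                && Objects.equals(middle, that.middle)
                && Objects.equals(last, that.last);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals to honor the equals/hashCode contract.
        return Objects.hash(first, middle, last);
    }

    public static void main(String[] args) {
        PersonName a = new PersonName("Ada", null, "Lovelace");
        PersonName b = new PersonName("Ada", null, "Lovelace");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // prints true
    }
}
```

Even in this short form, adding a fourth field later means remembering to update both methods, which is exactly the kind of place where bugs tend to hide.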

Even if this "Equivalence" proposal never comes to fruition, there is some value in reading Miller-Cushon's document. One obvious benefit of this document is its capturing of "Examples of bugs in equals and hashCode implementations." There are currently nine bullets in this section describing the "wide array of bugs in implementations of equals and hashCode methods" that were often identified only when "static analysis to prevent these issues" was performed. These examples serve as a good reminder of the things to be careful about when writing implementations of these methods and also remind us of the value of static analysis (note that Miller-Cushon is behind the static analysis tool error-prone).

Reading of the "Equivalence" document can also be enlightening for those wanting to better understand the related issues one should think about when developing the equivalence concept in Java. Through sets of questions in the "Requirements" and "Design Questions" sections, the document considers trade-offs and implementation choices that would need to be made. These cover topics such as how to handle nulls, instanceof versus getClass(), and the relationship to Comparator. Many of these considerations should probably be made today by Java developers implementing or maintaining their own implementations of equals(Object) and hashCode().

The "Related reading" section of the "Equivalence" document provides links to some interesting reading that includes the 2009 classic article "How to Write an Equality Method in Java" and Rémi Forax's ObjectSupport class (which delegates to ObjectSupports in some cases).

The "Equivalence" proposal was presented on the OpenJDK amber-spec-experts mailing list in a post titled "A library for implementing equals and hashCode" and some of the feedback on that mailing list has led to updates to the document. One particularly interesting sentence for me in this discussion is Brian Goetz's statement, "That people routinely implement equals/hashCode explicitly is something we would like to put in the past." That seems like a welcome change!

Saturday, April 27, 2019

Two JEPs Proposed for JDK 13: Enhancing AppCDS and ZGC

Two JDK Enhancement Proposals (JEPs) were proposed for JDK 13 this week on the OpenJDK jdk-dev mailing list. Mark Reinhold posted these proposals in messages with titles that indicate the JEP topic: "JEP proposed to target JDK 13: 350: Dynamic CDS Archives" and "JEP proposed to target JDK 13: 351: ZGC: Uncommit Unused Memory".

The "Summary" of proposed JEP 350 ["Dynamic CDS Archives"] states, "Extend application class-data sharing to allow the dynamic archiving of classes at the end of Java application execution. The archived classes will include all loaded application classes and library classes that are not present in the default, base-layer CDS archive." JEP 310 introduced "Application Class-Data Sharing" (AKA "AppCDS") via JDK-8185996 and in conjunction with JDK 10.

JEP 351 ["ZGC: Uncommit Unused Memory"]'s "Summary" section states simply, "Enhance ZGC to return unused heap memory to the operating system." The "Motivation" section adds more background details, "ZGC does not currently uncommit and return memory to the operating system, even when that memory has been unused for a long time. This behavior is not optimal for all types of applications and environments, especially those where memory footprint is a concern." "ZGC" refers to the "Z Garbage Collector" and more details regarding it can be found on the OpenJDK ZGC page and on the ZGC Wiki page. The main project page states, "The goal of this project is to create a scalable low latency garbage collector capable of handling heaps ranging from a few gigabytes to multi terabytes in size, with GC pause times not exceeding 10ms."

Both proposed JEPs will be officially targeted for JDK 13 next week if no objections are raised or if any raised objections are "satisfactorily answered."