Thursday, October 6, 2011

JavaOne 2011: Serialization: Tips, Tricks, and Techniques

I returned to the Hilton San Francisco for the last two sessions of JavaOne 2011. The first of these was Steve Poole's (IBM) "Serialization: Tips, Tricks, and Techniques" (24606) in Golden Gate 6/7/8. This session fit one of the "themes" of the sessions I attended this year: a focus on core Java principles such as class loading, HotSpot JVM performance tuning, JVM bytecode, and garbage collection. This was probably the biggest "theme" of my session choices, with the other most common theme being alternative JVM languages. Since the beginning of Java, serialization has been the source of some of Java's greatest mysteries and secrets and has provided easy fodder for interview questions. For evidence of how tricky Java serialization is, look no further than the fact that Josh Bloch devotes an entire chapter of Effective Java (Chapter 11 in the Second Edition) to five items covering its use.

Poole started by talking about IBM's Java Technology Edition Version 7 (IBM's Java 7 JVM), which was made generally available on 19 September 2011. Poole has been the IBM representative on JSR 270 (Java SE 6) and JSR 337 (Java SE 8). His "recent work focus" has been on IBM and OpenJDK and on IBM's Java 7 implementation.

Poole had a mostly filled room, and he observed that "he wouldn't be here" for the last presentation of the conference (in this room). He stated his talk would cover "basic use and abuse of serialization as personified by 'implements Serializable'." Poole stated that his presentation focused on "traditional serialization," but that much of it is equally applicable to "other serialization stream technologies."

Poole made the point that serialization matters because it "underpins" numerous technologies such as RMI, EJB, and JPA and "is becoming more important in Cloud-related technologies." His slide provided a basic definition of serialization: "Serialization converts between data held in object graphs and a linear stream of bytes." The same slide provided an important piece of overall advice: building serialization in from the start is much easier than retrofitting it later. Poole then covered the basic requirements of a Serializable class.
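A minimal sketch of the commonly cited requirements (not necessarily Poole's exact slide content), assuming standard `java.io` serialization to an in-memory byte array. The class names `BasicSerializationDemo` and `Point` and the `roundTrip` helper are hypothetical. The class implements `Serializable`, declares an explicit `serialVersionUID`, keeps only serializable non-transient fields, and relies on its first non-serializable superclass (`Object`) having an accessible no-arg constructor. The cached label is `transient`, so it is not written and comes back as `null`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BasicSerializationDemo {

    // Implements Serializable, every non-transient field is serializable (int), and the
    // first non-serializable superclass (Object) has an accessible no-arg constructor,
    // which deserialization invokes instead of Point's own constructor.
    static class Point implements Serializable {
        // Declared explicitly so compatibility does not depend on the default value
        // computed from the class's structure.
        private static final long serialVersionUID = 1L;

        private final int x;
        private final int y;
        // transient: not written to the stream; restored as the default value (null).
        private transient String cachedLabel;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int x() { return x; }
        int y() { return y; }
        boolean isLabelCached() { return cachedLabel != null; }

        String label() {
            if (cachedLabel == null) {
                cachedLabel = "(" + x + ", " + y + ")";
            }
            return cachedLabel;
        }
    }

    // Writes obj to an in-memory byte stream and reads it back (hypothetical helper).
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        p.label(); // fills the transient cache on the original only
        Point copy = roundTrip(p);
        System.out.println(copy.x() + "," + copy.y() + " cached=" + copy.isLabelCached());
        // prints "3,4 cached=false"
    }
}
```

The copy still computes its label on demand; only the cache was skipped.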

Poole talked about the basic methods used in serializing and deserializing. He pointed out that readObject() cannot read in data that writeObject() never wrote out. Poole recommended treating readObject() like a constructor, which makes sense because both create instances of a given class. This means, for example, that a class's readObject() method should not call overridable methods of that same class, because the object may not yet be fully constructed. Poole's advice is, "Don't think of readObject() as a method, but think of it as a constructor."
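A minimal sketch of a matched `writeObject`/`readObject` pair, assuming a hypothetical `Reading` class with one `transient` field that is written by hand. The `readObject` method reads back exactly what `writeObject` wrote, in the same order; reading anything beyond that would fail with an exception, because that data is simply not in the stream. In keeping with the "treat it like a constructor" advice, `readObject` only assigns fields and calls no overridable methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ReadObjectDemo {

    static final class Reading implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String sensorId;
        // transient, so default serialization skips it; writeObject writes it explicitly.
        private transient double celsius;

        Reading(String sensorId, double celsius) {
            this.sensorId = sensorId;
            this.celsius = celsius;
        }

        String sensorId() { return sensorId; }
        double celsius() { return celsius; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // writes the non-transient fields (sensorId)
            out.writeDouble(celsius);  // extra data, written explicitly
        }

        // Treated like a constructor: read back exactly what writeObject wrote, in the
        // same order, and call no overridable methods on the half-built object
        // (the class is final, so none of its methods can be overridden anyway).
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // restores sensorId
            celsius = in.readDouble();  // pairs with out.writeDouble(celsius)
        }
    }

    // Hypothetical helper: serialize to memory and deserialize again.
    static Reading roundTrip(Reading r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(r);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Reading) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reading copy = roundTrip(new Reading("t-1", 21.5));
        System.out.println(copy.sensorId() + " " + copy.celsius()); // prints "t-1 21.5"
    }
}
```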

Poole also warned against assuming that every bit of data brought in via readObject() is valid. All incoming data should be treated as untrusted until verified. Poole stated that he has seen numerous problems resulting from assuming good data on deserialization. You cannot assume that the data being deserialized was produced by the corresponding, well-behaved writeObject() method.
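A minimal sketch of defensive validation in `readObject`, assuming a hypothetical `Range` class whose invariant is `min <= max`. The constructor's check does not run during deserialization, so `readObject` repeats it and throws `InvalidObjectException`. To simulate a stream that did not come from a legitimate `Range`, the hypothetical `swapTrailingInts` helper relies on a detail of the default serial form: for a class with no `writeObject` method, the object's field values come last, so the final eight bytes here are the two `int` fields, and swapping them swaps `min` and `max`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class DefensiveReadObjectDemo {

    static final class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int min;
        private final int max;

        Range(int min, int max) {
            if (min > max) throw new IllegalArgumentException(min + " > " + max);
            this.min = min;
            this.max = max;
        }

        int min() { return min; }
        int max() { return max; }

        // The constructor is not called on deserialization, so re-check the invariant:
        // the bytes may not describe a Range that was ever legitimately constructed.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (min > max) {
                throw new InvalidObjectException("min " + min + " > max " + max);
            }
        }
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Simulated tampering: Range has no writeObject, so its stream ends with the raw
    // values of its two int fields; swapping the last two 4-byte values swaps min and max.
    static byte[] swapTrailingInts(byte[] data) {
        byte[] forged = Arrays.copyOf(data, data.length);
        int first = data.length - 8;
        int second = data.length - 4;
        for (int i = 0; i < 4; i++) {
            forged[first + i] = data[second + i];
            forged[second + i] = data[first + i];
        }
        return forged;
    }

    static Range roundTrip(Range r) {
        try {
            return (Range) deserialize(serialize(r));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the rejection message, or null if the forged stream was accepted.
    static String rejectForged(int min, int max) {
        try {
            deserialize(swapTrailingInts(serialize(new Range(min, max))));
            return null;
        } catch (InvalidObjectException e) {
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range copy = roundTrip(new Range(1, 5));
        System.out.println("round trip: " + copy.min() + ".." + copy.max());
        System.out.println("forged stream rejected: " + rejectForged(1, 5));
        // prints "round trip: 1..5" then "forged stream rejected: min 5 > max 1"
    }
}
```

Without the check in `readObject`, the forged stream would quietly produce a `Range` with `min` greater than `max`.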

Poole showed the serialization and deserialization problems that arise when a field is switched from a primitive boolean to a reference type Boolean. He also talked about serialVersionUID. Poole referenced Bloch and introduced the "Serialization Proxy Pattern," which he described as follows: "The Serialization Proxy Pattern provides a stand-in object that gets serialized instead of the main object." This approach separates business data objects from the objects responsible for serialization and deserialization. The best way to avoid these issues is to stay out of the serialization business altogether ("opt out"). When that's not possible, the Serialization Proxy Pattern at least decouples serialization issues from business data classes. Poole said another benefit of the Serialization Proxy Pattern relates to handling versioning.
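A minimal sketch of the Serialization Proxy Pattern in the shape Bloch describes in Effective Java, assuming a hypothetical `Money` business class. `writeReplace` substitutes a private nested `Proxy` when `Money` is serialized; the proxy's `readResolve` rebuilds a `Money` through the normal constructor, so its validation runs again; and `Money.readObject` rejects any stream that tries to deserialize a `Money` directly, bypassing the proxy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationProxyDemo {

    // The business class: its only serialization code is the proxy hooks below.
    static final class Money implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String currency;
        private final long cents;

        Money(String currency, long cents) {
            if (currency == null || currency.length() != 3) {
                throw new IllegalArgumentException("bad currency code: " + currency);
            }
            this.currency = currency;
            this.cents = cents;
        }

        String currency() { return currency; }
        long cents() { return cents; }

        // Serialization writes a Proxy in place of this Money.
        private Object writeReplace() {
            return new Proxy(this);
        }

        // A stream containing a Money directly did not come from writeReplace: refuse it.
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("Proxy required");
        }

        // The stand-in that actually gets serialized; it holds only the logical state.
        private static final class Proxy implements Serializable {
            private static final long serialVersionUID = 1L;

            private final String currency;
            private final long cents;

            Proxy(Money m) {
                this.currency = m.currency;
                this.cents = m.cents;
            }

            // On deserialization the Proxy is replaced by a Money built through the
            // ordinary constructor, so the constructor's checks run again.
            private Object readResolve() {
                return new Money(currency, cents);
            }
        }
    }

    // Hypothetical helper: serialize to memory and deserialize again.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Money copy = roundTrip(new Money("USD", 1999));
        System.out.println(copy.getClass().getSimpleName() + " " + copy.currency() + " " + copy.cents());
        // prints "Money USD 1999"
    }
}
```

Because only `Proxy` appears in the stream, `Money`'s internal representation can change without altering the serial form, as long as `Proxy` stays the same.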

An important observation that Poole made is that "'implements Serializable' and being serializable are not always the same thing." He recommends using tools and aids, including FindBugs, the appropriate Javadoc tags (@serial, @serialField, and @serialData), and an explicitly specified serialVersionUID. Poole also recommended keeping an object's serialized data consistent by synchronizing writeObject() and by using 'transient' to eliminate duplication. He also recommended defending against synchronization deadlocks. Poole stated that it is helpful to serialize as little as possible, and there are techniques for doing this. One is to use "short field names" (an advantage of separating serialization from business data with the Serialization Proxy Pattern!); another is to avoid serializing duplicate or derived data.
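A minimal sketch of two of these recommendations, assuming a hypothetical `Tally` class that keeps a list of values and a running total. The total is derived data, so it is `transient` and recomputed in `readObject` rather than trusted from the stream. `writeObject` is `synchronized` on the same lock as `add()`, so a concurrent update cannot change the list while it is being written. One caveat worth keeping in mind: a synchronized `writeObject` holds this object's lock while the rest of its graph is written, so if other objects in that graph also lock during serialization, the lock ordering needs the same care as any other nested locking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ConsistentSnapshotDemo {

    static final class Tally implements Serializable {
        private static final long serialVersionUID = 1L;

        private final List<Integer> values = new ArrayList<>();
        // Derived from values, so it is transient: writing it would only duplicate data.
        private transient long total;

        static Tally of(int... vs) {
            Tally t = new Tally();
            for (int v : vs) {
                t.add(v);
            }
            return t;
        }

        synchronized void add(int v) {
            values.add(v);
            total += v;
        }

        synchronized long total() { return total; }
        synchronized int count() { return values.size(); }

        // Holds the same lock as add(), so the stream records one consistent state.
        private synchronized void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        // Rebuilds the derived field from the serialized list instead of trusting it.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (values == null) {
                throw new InvalidObjectException("missing values");
            }
            long sum = 0;
            for (Integer v : values) {
                if (v == null) {
                    throw new InvalidObjectException("null value");
                }
                sum += v;
            }
            total = sum;
        }
    }

    // Hypothetical helper: serialize to memory and deserialize again.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tally copy = roundTrip(Tally.of(2, 3, 5));
        System.out.println(copy.count() + " values, total " + copy.total());
        // prints "3 values, total 10"
    }
}
```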

One of Poole's concluding slides stated that "having a good understanding of the art of Serialization is a necessity - even if you never intend to use it." He warned that serialization will continue to be of "importance to Java technology directions."

3 comments:

Sridhar said...

I was one of the attendees at this year's (2011) JavaOne conference, and I attended this session. You provided a complete recap of Steve's presentation. Very much appreciated. Thanks

@DustinMarx said...

Sridhar,

Thanks for taking the time to share the nice words. It was difficult typing, thinking, and listening at the same time, so I'm glad to hear that someone else who was there thought I hit the main points.

Dustin

dharmendra said...

Great explanation. Another great article I recommend is:

this