Monday, December 13, 2010

Java toString() Considerations

Even beginning Java developers are aware of the utility of the Object.toString() method, which is available to all instances of Java classes and can be overridden to provide useful details about any particular instance. Unfortunately, even seasoned Java developers occasionally fail to take full advantage of this powerful Java feature for a variety of reasons. In this blog post, I look at the humble Java toString() and describe easy steps one can take to improve the usefulness of toString().


Explicitly Implement (Override) toString()

Perhaps the most important consideration in getting maximum value from toString() is to actually provide an implementation of it. Although the root of all Java class hierarchies, Object, does provide a toString() implementation that is available to all Java classes, this method's default behavior is almost never useful. The Javadoc for Object.toString() explains what is provided by default when a class does not supply a custom version:
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:
getClass().getName() + '@' + Integer.toHexString(hashCode())
It is difficult to come up with a situation in which the name of the class and the hexadecimal representation of the object's hash code separated by the @ sign is useful. In almost all cases, it is significantly more useful to provide a custom, explicit toString() implementation in a class to override this default version.
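
To make the difference concrete, here is a minimal sketch contrasting the inherited default with an explicit override. PlainPoint and DescribedPoint are hypothetical classes invented for this illustration.

```java
// Hypothetical classes for illustration only.
class DefaultToStringDemo {
    public static void main(String[] args) {
        // Prints the class name, '@', and a hex hash code that varies between runs.
        System.out.println(new PlainPoint(3, 4));
        // Prints DescribedPoint[x=3, y=4]
        System.out.println(new DescribedPoint(3, 4));
    }
}

final class PlainPoint {
    private final int x;
    private final int y;

    PlainPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
    // No toString() override, so Object.toString() is inherited.
}

final class DescribedPoint {
    private final int x;
    private final int y;

    DescribedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString() {
        return "DescribedPoint[x=" + x + ", y=" + y + "]";
    }
}
```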

The Javadoc for Object.toString() also tells us what a toString() implementation generally should entail and also makes the same recommendation I'm making here: override toString():
In general, the toString method returns a string that "textually represents" this object. The result should be a concise but informative representation that is easy for a person to read. It is recommended that all subclasses override this method.
Whenever I write a new class, there are several methods I consider adding as part of the act of new class creation. These include hashCode() and equals(Object) if appropriate. However, in my experience and in my opinion, implementing explicit toString() is always appropriate.

If the Javadoc "recommendation" that "all subclasses override this method" is not enough to convince a Java developer of the importance and value of an explicit toString() method (in which case I don't presume that my recommendation will be either), then I recommend reviewing Josh Bloch's Effective Java item "Always Override toString" for additional background. It is my opinion that all Java developers should own a copy of Effective Java, but fortunately the chapter containing this item is available for those who don't: Methods Common to All Objects.


Maintain/Update toString()

It is frustrating to explicitly or implicitly call an object's toString() in a log statement or other diagnostic tool and get back the default class name and hexadecimal hash code rather than something more useful and readable. It is almost as frustrating to have an incomplete toString() implementation that omits significant pieces of the object's current characteristics and state. I try to be disciplined about reviewing the toString() implementation, along with the equals(Object) and hashCode() implementations, of any class that is new to me and whenever I add to or change the attributes of a class.


Just the Facts (But All/Most of Them!)

In the chapter of Effective Java previously mentioned, Bloch writes, "When practical, the toString method should return all of the interesting information contained in the object." It can be painful and tedious to add all attributes of an attribute-heavy class to its toString implementation, but the value to those trying to debug and diagnose issues related to that class will be worth the effort. I typically strive to have all significant non-null attributes of my instance in the generated String representation (and sometimes include the fact that some attributes are null). I also typically add minimal identifying text for the attributes. It's in many ways more of an art than a science, but I try to include enough text to differentiate attributes without bludgeoning future developers with too much detail. The most important thing to me is to get the attributes' values and some type of identifying key in place.
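
One way this might look in practice is sketched below: every attribute gets a short identifying key, and an attribute that is null is simply left out. The Customer class and its fields are hypothetical, and omitting nulls is only one of the choices described above.

```java
// Hypothetical class for illustration only.
class LabeledAttributesDemo {
    public static void main(String[] args) {
        // Prints Customer[id=42, name=Ada, email=ada@example.com]
        System.out.println(new Customer(42L, "Ada", "ada@example.com"));
        // Prints Customer[id=43, name=Grace]
        System.out.println(new Customer(43L, "Grace", null));
    }
}

final class Customer {
    private final long id;
    private final String name;
    private final String email; // optional, may be null

    Customer(long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("Customer[");
        sb.append("id=").append(id);
        sb.append(", name=").append(name);
        // Only report the optional attribute when it is actually set.
        if (email != null) {
            sb.append(", email=").append(email);
        }
        return sb.append(']').toString();
    }
}
```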


Know Thy Audience

One of the most common mistakes I've seen non-beginner Java developers make with regard to toString() is forgetting what and whom toString() is typically intended for. In general, toString() is a diagnostic and debugging tool that makes it easy to log details about a particular instance at a particular time for later analysis. It is typically a mistake to have user interfaces display String representations generated by toString() or to make logic decisions based on a toString() representation (in fact, making logic decisions on any String is fragile!). I have seen well-meaning developers have toString() return XML for use in some other XML-friendly part of the code. Another significant mistake is to force clients to parse the String returned from toString() in order to programmatically access data members. It is probably better to provide a public getter/accessor method than to rely on the toString() output never changing. All of these are mistakes because they forget the intention of a toString() implementation. This is especially insidious if the developer removes important characteristics from the toString() method (see the previous item) to make it look better in a user interface.

I like toString() implementations to have all the pertinent details and to provide minimal formatting to make these details more palatable. This formatting might include judiciously selected new line characters [System.getProperty("line.separator")] and tabs, colons, semicolons, etc. I don't invest the same amount of time as I would in a result presented to an end user of the software, but I do try to make the formatting nice enough to be readable. I try to implement toString() methods that are not overly complicated or expensive to maintain, but that provide some very simple formatting. I try to treat future maintainers of my code as I would like to be treated by developers whose code I will one day maintain.
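
The kind of light formatting described here could be sketched as follows. ServerConfig and its attributes are hypothetical, and the one-attribute-per-line layout is just one reasonable choice.

```java
// Hypothetical class for illustration only.
class FormattingDemo {
    public static void main(String[] args) {
        System.out.println(new ServerConfig("db01.example.com", 5432, true));
    }
}

final class ServerConfig {
    // Platform-specific line separator, looked up once.
    private static final String NEW_LINE = System.getProperty("line.separator");

    private final String host;
    private final int port;
    private final boolean secure;

    ServerConfig(String host, int port, boolean secure) {
        this.host = host;
        this.port = port;
        this.secure = secure;
    }

    @Override
    public String toString() {
        // One attribute per line, indented with a tab and labeled with a colon.
        return "ServerConfig:" + NEW_LINE
             + "\thost: " + host + NEW_LINE
             + "\tport: " + port + NEW_LINE
             + "\tsecure: " + secure;
    }
}
```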

In his item on toString() implementation, Bloch states that a developer should decide whether or not toString() returns a specific format. If a specific format is intended, it should be documented in the Javadoc comments, and Bloch further recommends providing a matching static factory or constructor that can rebuild an instance from a String generated by toString(). I agree with all of this, but I believe this is more trouble than most developers are willing to go to. Bloch also points out that any change to this format in future releases will cause pain and angst for people depending on it (which is why I don't think it's a good idea to have logic depend on toString() output). With significant discipline to write and maintain appropriate documentation, having a predefined format for a toString() might be plausible. However, it seems like trouble to me, and it seems better to simply create a new and separate method for such uses and leave toString() unencumbered.
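
As a rough sketch of the "separate method" approach, the hypothetical Temperature class below keeps its documented, parseable format in a dedicated method with a matching static factory, so toString() remains free to change. The names toWireFormat and fromWireFormat are my own inventions for this example.

```java
// Hypothetical class and method names for illustration only.
class SeparateFormatDemo {
    public static void main(String[] args) {
        Temperature original = new Temperature(21.5, 'C');
        String wire = original.toWireFormat();
        Temperature restored = Temperature.fromWireFormat(wire);
        System.out.println(wire);     // documented format
        System.out.println(restored); // diagnostic format
    }
}

final class Temperature {
    private final double degrees;
    private final char unit;

    Temperature(double degrees, char unit) {
        this.degrees = degrees;
        this.unit = unit;
    }

    /** Documented format: degrees, a single space, then a one-character unit. */
    String toWireFormat() {
        return degrees + " " + unit;
    }

    /** Static factory matching toWireFormat(). */
    static Temperature fromWireFormat(String text) {
        String[] parts = text.trim().split(" ");
        if (parts.length != 2 || parts[1].length() != 1) {
            throw new IllegalArgumentException("Unexpected format: " + text);
        }
        return new Temperature(Double.parseDouble(parts[0]), parts[1].charAt(0));
    }

    /** Diagnostic only; no format is promised, so it can change freely. */
    @Override
    public String toString() {
        return "Temperature[degrees=" + degrees + ", unit=" + unit + "]";
    }
}
```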


No Side Effects Tolerated

As important as the toString() implementation is, it is generally unacceptable (and certainly considered bad form) to have the explicit or implicit calling of toString() impact logic or lead to exceptions or logic problems. The author of a toString() method should be careful to ensure that references are checked for null before accessing them to avoid a NullPointerException. Many of the tactics I described in the post Effective Java NullPointerException Handling can be used in toString() implementation. For example, String.valueOf(Object) provides an easy mechanism for null safety on attributes of questionable origin.
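
A small sketch of the String.valueOf(Object) tactic, using a hypothetical Order class whose shipDate stays null until the order ships:

```java
import java.time.LocalDate;

// Hypothetical class for illustration only.
class NullSafeToStringDemo {
    public static void main(String[] args) {
        // Prints Order[id=A-1, shipDate=null]
        System.out.println(new Order("A-1", null));
        // Prints Order[id=A-2, shipDate=2010-12-13]
        System.out.println(new Order("A-2", LocalDate.of(2010, 12, 13)));
    }
}

final class Order {
    private final String id;
    private final LocalDate shipDate; // null until the order ships

    Order(String id, LocalDate shipDate) {
        this.id = id;
        this.shipDate = shipDate;
    }

    @Override
    public String toString() {
        // shipDate.toString() would throw NullPointerException for an unshipped
        // order; String.valueOf(Object) returns the text "null" instead.
        String shipped = String.valueOf(shipDate);
        return "Order[id=" + id + ", shipDate=" + shipped + "]";
    }
}
```

Plain string concatenation is also null-safe; the explicit call matters most where the code would otherwise invoke a method on the reference.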

It is similarly important for the toString() developer to check the sizes of arrays and other collections before accessing elements by index, so that no access falls outside the bounds of the collection. In particular, it is all too easy to run into a StringIndexOutOfBoundsException when trying to manipulate String values with String.substring.
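
For instance, a toString() that abbreviates a long text attribute needs a length check before calling substring. The Note class and its ten-character preview limit below are assumptions made for this sketch.

```java
// Hypothetical class for illustration only.
class BoundsSafeToStringDemo {
    public static void main(String[] args) {
        System.out.println(new Note("short"));            // Note[text=short]
        System.out.println(new Note("abcdefghijklmnop")); // Note[text=abcdefghij...]
        System.out.println(new Note(null));               // Note[text=null]
    }
}

final class Note {
    private static final int MAX_PREVIEW = 10; // arbitrary limit for this sketch

    private final String text;

    Note(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        String safe = String.valueOf(text); // guards against a null attribute
        // An unconditional safe.substring(0, MAX_PREVIEW) would throw
        // StringIndexOutOfBoundsException for text shorter than the limit.
        String preview = safe.length() > MAX_PREVIEW
                ? safe.substring(0, MAX_PREVIEW) + "..."
                : safe;
        return "Note[text=" + preview + "]";
    }
}
```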

Because an object's toString() implementation can easily be invoked without the developer consciously realizing it, this advice to make sure it doesn't throw exceptions or perform logic (especially state-changing logic) is especially important. The last thing anyone wants is to have the act of logging an instance's current state lead to an exception or a change in state and behavior. A toString() implementation should effectively be a read-only operation in which object state is read to generate a String for return. If any attributes are changed in the process, bad things are likely to happen at unpredictable times.

It is my position that a toString() implementation should only include state in the generated String that is accessible in the same process space at the time of its generation. To me, it's not defensible to have a toString() implementation access remote services to build up an instance's String. Perhaps a little less obvious is that an instance should not populate data attributes because toString() was called. The toString() implementation should only report on how things are in the current instance and not on how they might be or will be in the future if certain different scenarios occur or if things are loaded. To be effective in debugging and diagnostics, the toString() needs to show how conditions are and not how they could be.
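
The point about not populating attributes can be sketched with a hypothetical lazily loaded LazyReport class: toString() reads the field directly and reports "not loaded" rather than calling the getter that would trigger the load.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class for illustration only.
class ReadOnlyToStringDemo {
    public static void main(String[] args) {
        LazyReport report = new LazyReport();
        System.out.println(report); // LazyReport[rows=<not loaded>]
        report.getRows();           // explicit access triggers the load
        System.out.println(report); // LazyReport[rows=2 row(s)]
    }
}

final class LazyReport {
    private List<String> rows; // null until first requested

    List<String> getRows() {
        if (rows == null) {
            rows = loadRows();
        }
        return rows;
    }

    // Stand-in for an expensive or remote load.
    private List<String> loadRows() {
        return Arrays.asList("first", "second");
    }

    @Override
    public String toString() {
        // Read the field directly. Calling getRows() here would load the data
        // and change the instance's state as a side effect of logging it.
        String rowState = (rows == null) ? "<not loaded>" : rows.size() + " row(s)";
        return "LazyReport[rows=" + rowState + "]";
    }
}
```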


Simple Formatting Is Sincerely Appreciated

As described above, judicious use of line separators and tabs can be useful in making lengthy and complex instances more palatable when generated in String format. There are other "tricks" that can make things nicer. Not only does String.valueOf(Object) provide some null protection, but it also presents null as the String "null" (which is often the preferred representation of null in a toString()-generated String). Arrays.toString(Object[]) is useful for easily representing arrays as Strings (see my post Stringifying Java Arrays for additional details).
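
Both helpers are shown together in the brief sketch below, which uses a hypothetical Playlist class. Arrays.toString also returns "null" for a null array, so no extra check is needed.

```java
import java.util.Arrays;

// Hypothetical class for illustration only.
class ArrayToStringDemo {
    public static void main(String[] args) {
        // Prints Playlist[name=Mix, tracks=[Intro, Outro]]
        System.out.println(new Playlist("Mix", new String[] {"Intro", "Outro"}));
        // Prints Playlist[name=null, tracks=null]
        System.out.println(new Playlist(null, null));
    }
}

final class Playlist {
    private final String name;
    private final String[] tracks;

    Playlist(String name, String[] tracks) {
        this.name = name;
        this.tracks = tracks;
    }

    @Override
    public String toString() {
        // Concatenating the array itself would print its default
        // Object.toString() form; Arrays.toString lists the elements.
        return "Playlist[name=" + String.valueOf(name)
             + ", tracks=" + Arrays.toString(tracks) + "]";
    }
}
```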


Include Class Name in toString Representation

As described above, the default implementation of toString() provides the class name as part of the instance's representation. When we explicitly override this, we potentially lose this class name. This is not a big deal typically if logging an instance's string because the logging framework will include class name. However, I prefer to be on the safe side and always have the class name available. I don't care about keeping the hexadecimal hash code representation from the default toString(), but class name can be useful. A good example of this is Throwable.toString(). I prefer to use that method rather than getMessage or getLocalizedMessage because the former (toString()) does include the Throwable's class name while the latter two methods don't.
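
Here is a short sketch of both points: the difference between Throwable.toString() and getMessage(), and one way to keep the class name in a custom toString(). The Sensor class is hypothetical, and using getClass().getSimpleName() is my assumption about a convenient approach rather than the only one.

```java
// Hypothetical class for illustration only.
class ClassNameDemo {
    public static void main(String[] args) {
        Throwable problem = new IllegalStateException("cache not ready");
        // Prints cache not ready
        System.out.println(problem.getMessage());
        // Prints java.lang.IllegalStateException: cache not ready
        System.out.println(problem);
        // Prints Sensor[id=s-1]
        System.out.println(new Sensor("s-1"));
    }
}

class Sensor {
    private final String id;

    Sensor(String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        // getClass() is resolved at runtime, so a subclass reports its own name.
        return getClass().getSimpleName() + "[id=" + id + "]";
    }
}
```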


toString() Alternatives

We don't currently have this (at least not as a standard approach), but there has been talk of an Objects class in Java that would go a long way toward safely and usefully preparing String representations of various objects, data structures, and collections. I have not heard of any recent progress in JDK7 on this class. A standard class in the JDK that provided String representation of objects even when the objects' class definitions did not provide an explicit toString() would be helpful.

The Apache Commons ToStringBuilder may be the most popular solution for building safe toString() implementations with some basic formatting controls. I have blogged on ToStringBuilder previously and there are numerous other online resources regarding use of ToStringBuilder.

Glen McCluskey's Java Technology Tech Tip "Writing toString Methods" provides additional details about how to write a good toString() method. In one of the reader comments, Giovanni Pelosi states a preference for delegating production of a string representation of an instance within inheritance hierarchies from the toString() to a delegate class built for that purpose.


Conclusion

I think most Java developers acknowledge the value of good toString() implementations. Unfortunately, these implementations are not always as good or useful as they could be. In this post I have attempted to outline some considerations for improving toString() implementations. Although a toString() method won't (or at least shouldn't) impact logic the way an equals(Object) or hashCode() method can, it can improve debugging and diagnostic efficiency. Less time spent figuring out what the object's state is means more time fixing the problem, moving on to more interesting challenges, and satisfying more client needs.

2 comments:

@DustinMarx said...

toString or not toString is a recent post that talks about the virtues of overriding toString() on classes.

Dustin

@DustinMarx said...

An interesting java subreddit discussion provides another example of the dangers of having toString() implementations change state or generate side effects.