Monday, February 1, 2010

Basic Java hashCode and equals Demonstrations

I often use this blog to revisit hard-earned lessons in the basics of Java. This post is one such example and illustrates the dangerous power behind the equals(Object) and hashCode() methods. I won't cover every nuance of these two highly significant methods, which every Java object has whether explicitly declared or implicitly inherited from a parent (possibly directly from Object itself), but I will cover some of the common issues that arise when they are not implemented or are implemented incorrectly. These demonstrations also show why careful code reviews, thorough unit testing, and/or tool-based analysis are important for verifying the correctness of these methods' implementations.

Because all Java objects ultimately inherit implementations of equals(Object) and hashCode(), neither the Java compiler nor the Java runtime launcher reports any problem when these "default implementations" are invoked. Unfortunately, when these methods are needed, the default implementations (like their cousin, the toString method) are rarely what is desired. The Javadoc-based API documentation for the Object class describes the "contract" expected of any implementation of the equals(Object) and hashCode() methods and also describes the likely default implementation of each if not overridden by child classes.

For the examples in this post, I'll be using the HashAndEquals class, whose code listing is shown next, to exercise instances of several versions of a Person class with differing levels of support for the hashCode and equals methods.

HashAndEquals.java
package dustin.examples;

import java.util.HashSet;
import java.util.Set;
import static java.lang.System.out;

public class HashAndEquals
{
   private static final String HEADER_SEPARATOR =
      "======================================================================";

   private static final int HEADER_SEPARATOR_LENGTH = HEADER_SEPARATOR.length();

   private static final String NEW_LINE = System.getProperty("line.separator");

   // person1/person3 and person2/person4 are distinct instances with identical attribute values.
   private final Person person1 = new Person("Flintstone", "Fred");
   private final Person person2 = new Person("Rubble", "Barney");
   private final Person person3 = new Person("Flintstone", "Fred");
   private final Person person4 = new Person("Rubble", "Barney");

   public void displayContents()
   {
      printHeader("THE CONTENTS OF THE OBJECTS");
      out.println("Person 1: " + person1);
      out.println("Person 2: " + person2);
      out.println("Person 3: " + person3);
      out.println("Person 4: " + person4);
   }

   public void compareEquality()
   {
      printHeader("EQUALITY COMPARISONS");
      out.println("Person1.equals(Person2): " + person1.equals(person2));
      out.println("Person1.equals(Person3): " + person1.equals(person3));
      out.println("Person2.equals(Person4): " + person2.equals(person4));
   }

   public void compareHashCodes()
   {
      printHeader("COMPARE HASH CODES");
      out.println("Person1.hashCode(): " + person1.hashCode());
      out.println("Person2.hashCode(): " + person2.hashCode());
      out.println("Person3.hashCode(): " + person3.hashCode());
      out.println("Person4.hashCode(): " + person4.hashCode());
   }

   // Set.add returns true only if the element was not already considered present.
   public Set<Person> addToHashSet()
   {
      printHeader("ADD ELEMENTS TO SET - ARE THEY ADDED OR THE SAME?");
      final Set<Person> set = new HashSet<Person>();
      out.println("Set.add(Person1): " + set.add(person1));
      out.println("Set.add(Person2): " + set.add(person2));
      out.println("Set.add(Person3): " + set.add(person3));
      out.println("Set.add(Person4): " + set.add(person4));
      return set;
   }

   // Set.remove returns true only if the element was found and removed.
   public void removeFromHashSet(final Set<Person> sourceSet)
   {
      printHeader("REMOVE ELEMENTS FROM SET - CAN THEY BE FOUND TO BE REMOVED?");
      out.println("Set.remove(Person1): " + sourceSet.remove(person1));
      out.println("Set.remove(Person2): " + sourceSet.remove(person2));
      out.println("Set.remove(Person3): " + sourceSet.remove(person3));
      out.println("Set.remove(Person4): " + sourceSet.remove(person4));
   }

   public static void printHeader(final String headerText)
   {
      out.println(NEW_LINE);
      out.println(HEADER_SEPARATOR);
      out.println("= " + headerText);
      out.println(HEADER_SEPARATOR);
   }

   public static void main(final String[] arguments)
   {
      final HashAndEquals instance = new HashAndEquals();
      instance.displayContents();
      instance.compareEquality();
      instance.compareHashCodes();
      final Set<Person> set = instance.addToHashSet();
      out.println("Set Before Removals: " + set);
      // Uncommented only for the final example, once Person has a setFirstName method.
      //instance.person1.setFirstName("Bam Bam");
      instance.removeFromHashSet(set);
      out.println("Set After Removals: " + set);
   }
}


The class above will be used as-is repeatedly with only one minor change later in the post. However, the Person class will be changed to reflect the importance of equals and hashCode and to demonstrate how easy it is to get these methods wrong and how difficult it can be to track down the problem when there is a mistake.


No Explicit equals or hashCode Methods

The first version of the Person class does not provide an explicit overridden version of either the equals method or the hashCode method. This will demonstrate the "default implementation" of each of these methods inherited from Object. Here is the source code for Person without hashCode or equals explicitly overridden.

Person.java (no explicit hashCode or equals method)
package dustin.examples;

public class Person
{
   private final String lastName;
   private final String firstName;

   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   @Override
   public String toString()
   {
      return this.firstName + " " + this.lastName;
   }
}


This first version of Person does not provide get/set methods and does not provide equals or hashCode implementations. When the main demonstration class HashAndEquals is executed with instances of this equals-less and hashCode-less Person class, the results appear as shown in the next screen snapshot.



Several observations can be made from the output shown above. First, without explicit implementation of an equals(Object) method, none of the instances of Person are considered equal, even when all attributes of the instances (the two Strings) are identical. This is because, as is explained in the documentation for Object.equals(Object), the default equals implementation is based on an exact reference match:

The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).


A second observation from this first example is that the hash code differs for each instance of Person, even when two instances share the same values for all of their attributes, because the default hashCode is derived from object identity rather than from attribute values. The HashSet's add method returns true when the object is considered "unique" and is added to the set, or false if an equal object is already present and so the object is not added. Similarly, the HashSet's remove method returns true if the provided object is found and removed, or false if the specified object is not considered part of the HashSet and so cannot be removed. Because the inherited default equals and hashCode methods treat these instances as completely different, it is no surprise that all are added to the set and all are successfully removed from the set.
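
The default behavior described above can be reproduced in isolation with a minimal sketch. The DefaultIdentityDemo and Point names are hypothetical stand-ins and are not part of the post's listings; Point deliberately overrides neither method.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo; Point stands in for any class that overrides neither method.
class DefaultIdentityDemo
{
   static final class Point
   {
      final int x;
      final int y;

      Point(final int newX, final int newY)
      {
         this.x = newX;
         this.y = newY;
      }
   }

   public static void main(final String[] arguments)
   {
      final Point first = new Point(1, 2);
      final Point second = new Point(1, 2);  // same values, different object
      final Point alias = first;             // same object

      System.out.println("first.equals(second): " + first.equals(second)); // false
      System.out.println("first.equals(alias): " + first.equals(alias));   // true

      // The inherited hashCode() is the identity hash code of the object.
      System.out.println("identity hash: "
         + (first.hashCode() == System.identityHashCode(first)));          // true

      // Both instances are accepted because neither is equal to the other.
      final Set<Point> set = new HashSet<Point>();
      set.add(first);
      set.add(second);
      System.out.println("set size: " + set.size());                       // 2
   }
}
```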


Explicit equals Method Only

The second version of the Person class includes an explicitly overridden equals method as shown in the next code listing.

Person.java (explicit equals method provided)
package dustin.examples;

public class Person
{
   private final String lastName;
   private final String firstName;

   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   @Override
   public boolean equals(Object obj)
   {
      if (obj == null)
      {
         return false;
      }
      if (this == obj)
      {
         return true;
      }
      if (this.getClass() != obj.getClass())
      {
         return false;
      }
      final Person other = (Person) obj;
      if (this.lastName == null ? other.lastName != null : !this.lastName.equals(other.lastName))
      {
         return false;
      }
      if (this.firstName == null ? other.firstName != null : !this.firstName.equals(other.firstName))
      {
         return false;
      }
      return true;
   }

   @Override
   public String toString()
   {
      return this.firstName + " " + this.lastName;
   }
}


When instances of this Person with equals(Object) explicitly defined are used, the output is as shown in the next screen snapshot.



The first observation is that the equals calls on the Person instances now return true when all attributes of the two objects are the same, rather than requiring strict reference equality. This demonstrates that the custom equals implementation on Person has done its job. The second observation is that implementing the equals method has had no effect on the ability to add the seemingly same object to the HashSet more than once and then remove each one. A HashSet consults hashCode() before it calls equals, and the hash codes are still the inherited defaults that differ from instance to instance, so in practice the equal instances are never compared with one another.
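
A small sketch of this mismatch follows. EqualsOnlyDemo and Name are hypothetical names used only for illustration, and Name assumes a non-null value. Because the outcome depends on whether the two inherited hash codes happen to differ, which they almost always do, the sketch prints that fact alongside the lookup result rather than asserting a fixed answer.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo of a class that overrides equals but not hashCode.
class EqualsOnlyDemo
{
   static final class Name
   {
      private final String value;

      Name(final String newValue)
      {
         this.value = Objects.requireNonNull(newValue);
      }

      @Override
      public boolean equals(final Object obj)
      {
         if (this == obj)
         {
            return true;
         }
         if (obj == null || getClass() != obj.getClass())
         {
            return false;
         }
         return value.equals(((Name) obj).value);
      }

      // hashCode() is deliberately NOT overridden, which breaks the contract.
   }

   public static void main(final String[] arguments)
   {
      final Name first = new Name("Fred");
      final Name second = new Name("Fred");

      final Set<Name> set = new HashSet<Name>();
      set.add(first);

      System.out.println("first.equals(second): " + first.equals(second)); // true
      System.out.println("same hash code: "
         + (first.hashCode() == second.hashCode()));
      // Matches the "same hash code" line: equals is only reached on a hash match.
      System.out.println("set.contains(second): " + set.contains(second));
   }
}
```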


Explicit equals and hashCode Methods

It is now time to add an explicit hashCode() method to the Person class. Indeed, this really should have been done when the equals method was implemented. The reason for this is stated in the documentation for the Object.equals(Object) method:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.


Here is Person with an explicitly implemented hashCode method based on the same attributes of Person as the equals method.

Person.java (explicit equals and hashCode implementations)
package dustin.examples;

public class Person
{
   private final String lastName;
   private final String firstName;

   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   @Override
   public int hashCode()
   {
      // Null-safe, to stay consistent with equals, which tolerates null names.
      return (lastName == null ? 0 : lastName.hashCode())
           + (firstName == null ? 0 : firstName.hashCode());
   }

   @Override
   public boolean equals(Object obj)
   {
      if (obj == null)
      {
         return false;
      }
      if (this == obj)
      {
         return true;
      }
      if (this.getClass() != obj.getClass())
      {
         return false;
      }
      final Person other = (Person) obj;
      if (this.lastName == null ? other.lastName != null : !this.lastName.equals(other.lastName))
      {
         return false;
      }
      if (this.firstName == null ? other.firstName != null : !this.firstName.equals(other.firstName))
      {
         return false;
      }
      return true;
   }

   @Override
   public String toString()
   {
      return this.firstName + " " + this.lastName;
   }
}


The output from running with the new Person class with hashCode and equals methods is shown next.



It is not surprising that the hash codes returned for objects with the same attributes' values are now the same, but the more interesting observation is that we can only add two of the four instances to the HashSet now. This is because the third and fourth add attempts are considered to be attempting to add an object that was already added to the set. Because there were only two added, only two can be found and removed.
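
As an aside, the same pairing of equals and hashCode can be written more compactly with the java.util.Objects helpers. This is only a sketch of an alternative, not the implementation used for the output discussed in this post: it assumes Java 7 or later, the ObjectsHelperDemo and PersonName names are hypothetical, and Objects.hash produces different hash values than the simple sum used in the listing above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the post's Person class; assumes Java 7 or later.
class ObjectsHelperDemo
{
   static final class PersonName
   {
      private final String lastName;
      private final String firstName;

      PersonName(final String newLastName, final String newFirstName)
      {
         this.lastName = newLastName;
         this.firstName = newFirstName;
      }

      @Override
      public int hashCode()
      {
         // Null-safe; combines exactly the fields that equals compares.
         return Objects.hash(lastName, firstName);
      }

      @Override
      public boolean equals(final Object obj)
      {
         if (this == obj)
         {
            return true;
         }
         if (obj == null || getClass() != obj.getClass())
         {
            return false;
         }
         final PersonName other = (PersonName) obj;
         return Objects.equals(lastName, other.lastName)
             && Objects.equals(firstName, other.firstName);
      }
   }

   public static void main(final String[] arguments)
   {
      final PersonName fred = new PersonName("Flintstone", "Fred");
      final PersonName sameFred = new PersonName("Flintstone", "Fred");

      final Set<PersonName> set = new HashSet<PersonName>();
      System.out.println("add(fred): " + set.add(fred));           // true
      System.out.println("add(sameFred): " + set.add(sameFred));   // false
      System.out.println("equal hash codes: "
         + (fred.hashCode() == sameFred.hashCode()));              // true
   }
}
```

Keeping both methods driven by the same list of fields makes it harder for them to drift apart when an attribute is added later.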


The Trouble with Mutable hashCode Attributes

For the fourth and final example in this post, I look at what happens when the hashCode implementation is based on an attribute that changes. For this example, a setFirstName method is added to Person and the final modifier is removed from its firstName attribute. In addition, the main HashAndEquals class needs to have the comment removed from the line that invokes this new set method. The new version of Person is shown next.

Person.java (mutable firstName attribute with setFirstName method)
package dustin.examples;

public class Person
{
   private final String lastName;
   private String firstName;

   public Person(final String newLastName, final String newFirstName)
   {
      this.lastName = newLastName;
      this.firstName = newFirstName;
   }

   @Override
   public int hashCode()
   {
      // Null-safe, to stay consistent with equals, which tolerates null names.
      // Depends on the mutable firstName, which is the hazard this example shows.
      return (lastName == null ? 0 : lastName.hashCode())
           + (firstName == null ? 0 : firstName.hashCode());
   }

   public void setFirstName(final String newFirstName)
   {
      this.firstName = newFirstName;
   }

   @Override
   public boolean equals(Object obj)
   {
      if (obj == null)
      {
         return false;
      }
      if (this == obj)
      {
         return true;
      }
      if (this.getClass() != obj.getClass())
      {
         return false;
      }
      final Person other = (Person) obj;
      if (this.lastName == null ? other.lastName != null : !this.lastName.equals(other.lastName))
      {
         return false;
      }
      if (this.firstName == null ? other.firstName != null : !this.firstName.equals(other.firstName))
      {
         return false;
      }
      return true;
   }

   @Override
   public String toString()
   {
      return this.firstName + " " + this.lastName;
   }
}


Output generated from running this example is shown next.



The most interesting observation in this example is that although two instances get added to the set, only one gets removed. This is because one of the attributes upon which the hash code is based, first name, changes between adding the object to the HashSet and attempting to remove the same object (albeit with a changed first name attribute) from the same HashSet. The set filed the object under its original hash code, so a lookup using the new hash code does not find it, while a lookup using an instance equal to the original finds the entry by hash code but then fails the equals comparison against the changed first name. This illustrates the importance of basing hashCode (and by extension, equals) on immutable values. More details regarding this can be found in blog posts HashSet.contains(): does your busket contain something? (original location) and Back to hashCode Mutability (original location).
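
The stranded entry can be shown in a few lines. The sketch below uses hypothetical MutableKeyDemo, Tag, and relabel names (none of them appear in the post's listings) and also shows one possible workaround when mutation cannot be avoided: remove the object from the set, change it, and add it back.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo; Tag's label is mutable and feeds both equals and hashCode.
class MutableKeyDemo
{
   static final class Tag
   {
      private String label;

      Tag(final String newLabel)
      {
         this.label = Objects.requireNonNull(newLabel);
      }

      void setLabel(final String newLabel)
      {
         this.label = Objects.requireNonNull(newLabel);
      }

      @Override
      public int hashCode()
      {
         return label.hashCode();
      }

      @Override
      public boolean equals(final Object obj)
      {
         if (this == obj)
         {
            return true;
         }
         if (obj == null || getClass() != obj.getClass())
         {
            return false;
         }
         return label.equals(((Tag) obj).label);
      }
   }

   /** Changes a label without stranding the tag: remove, mutate, then add again. */
   static void relabel(final Set<Tag> set, final Tag tag, final String newLabel)
   {
      final boolean wasMember = set.remove(tag);
      tag.setLabel(newLabel);
      if (wasMember)
      {
         set.add(tag);
      }
   }

   public static void main(final String[] arguments)
   {
      final Set<Tag> set = new HashSet<Tag>();
      final Tag tag = new Tag("Fred");
      set.add(tag);
      tag.setLabel("Bam Bam");  // hash code changes while the tag is in the set

      System.out.println("contains: " + set.contains(tag)); // false
      System.out.println("remove: " + set.remove(tag));     // false
      System.out.println("size: " + set.size());            // 1 (entry is stranded)

      final Set<Tag> safeSet = new HashSet<Tag>();
      final Tag safeTag = new Tag("Fred");
      safeSet.add(safeTag);
      relabel(safeSet, safeTag, "Bam Bam");
      System.out.println("contains after relabel: " + safeSet.contains(safeTag)); // true
   }
}
```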


Detecting Problems Related to hashCode and equals Implementations

As the examples in this post demonstrate, different behaviors occur depending on how these methods are implemented, but none of them involve an obvious error or warning. This can make it difficult to track down seemingly inconsistent or strange behavior. The best way to address this is with careful implementation of these methods, careful reviews of these important methods, and thorough testing. Another useful tactic is to avoid mindless creation of unnecessary "set" methods for all of a class's data attributes. If a "set" method is truly appropriate, that attribute should not be used in the implementation of equals or hashCode.
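
As one illustration of the kind of testing meant here, the sketch below checks a pair of objects against a few of the rules in the Object Javadoc contract. It is a rough starting point under stated assumptions rather than a complete verifier: the EqualsHashCodeContractCheck class, its violations method, and the deliberately broken EqualsEverything class are all hypothetical, both arguments are assumed to be non-null, and transitivity and consistency over time are not checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports which contract rules a pair of objects breaks.
class EqualsHashCodeContractCheck
{
   /** Returns a description of each violated rule; both arguments must be non-null. */
   static List<String> violations(final Object first, final Object second)
   {
      final List<String> problems = new ArrayList<String>();
      if (!first.equals(first) || !second.equals(second))
      {
         problems.add("equals is not reflexive");
      }
      if (first.equals(null) || second.equals(null))
      {
         problems.add("equals(null) must return false");
      }
      if (first.equals(second) != second.equals(first))
      {
         problems.add("equals is not symmetric");
      }
      if (first.equals(second) && first.hashCode() != second.hashCode())
      {
         problems.add("equal objects have different hash codes");
      }
      return problems;
   }

   /** Deliberately broken for illustration: claims to equal everything. */
   static final class EqualsEverything
   {
      @Override
      public boolean equals(final Object obj)
      {
         return true;
      }

      @Override
      public int hashCode()
      {
         return 42;
      }
   }

   public static void main(final String[] arguments)
   {
      // Two distinct String objects with the same content satisfy every rule checked.
      System.out.println(violations("Fred", new String("Fred"))); // []

      // The broken class trips several rules at once.
      System.out.println(violations(new EqualsEverything(), "Fred"));
   }
}
```

A check like this is most useful when run against pairs that are expected to be equal as well as pairs that are expected to differ.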


Conclusion

It is almost never appropriate to rely on an object's default implementation of hashCode and equals as inherited from Object. Furthermore, the equals and hashCode methods should be implemented to the contract advertised in the Javadoc documentation (only a small part of which was covered here) and should not be based on any attributes that will be changed (mutable) during the lifecycle of the instance.

3 comments:

Boris Kirzner said...

Hi Dustin

Thanks for referencing, I'm glad my four-year old thoughts are still useful for the people.

Since then I've migrated my blog to another location. Would you mind referencing the posts from the new location (HashSet.contains() and hashCode mutability)?

Thanks,
Boris

Dustin said...

Boris,

I have updated those links to your blog's newer location. Thanks for sending me the updated links.

Dustin

buddy said...

Nice article, but I believe it's important to understand the consequences of not following this contract as well, and for that it's important to understand how hashCode is applied in collection classes, e.g. How HashMap works in Java and how hashCode() of the key is used to insert and retrieve objects from a HashMap.

Javin