Consequences when using Mutable Fields in hashCode()

We start our new series with an informative HashSet puzzler. It’s about a bug that gave us quite a headache since its root cause was hard to identify. This subtle bug has without doubt crept into many code bases, so a detailed discussion should interest every Java coder. I will also discuss code inspection tools that detect this violation (sadly, only few). And by the way, what we learned about HashSet also makes a good topic in our job interviews.

The starting point was that we had implemented a class with some fields and needed quick access to its instances. This use case was pretty performance-critical, so we took a HashSet to maintain the class instances. As the instance count was high, we needed a broad distribution of possible hash-values. We used the IDE to generate code for the hashCode() and equals() method. Everything should have been fine with this generated code — at least we thought so. However, our application showed some unexpected results and it took us quite a long time to nail down the root cause.

Here’s a simplified example of code using a HashSet of PhoneNumber instances:

// Class PhoneNumber implements hashCode() and equals()
PhoneNumber obj = new PhoneNumber("mgm", "089/358680");
System.out.println("Hashcode: " +
	obj.hashCode());  //prints "1476725853"

// Add PhoneNumber object to HashSet
Set<PhoneNumber> set = new HashSet();
set.add(obj);

// Modify object after it has been inserted
obj.setNumber("089/358680-0");

// Modification causes a different hash value
System.out.println("New hashcode: " +
	obj.hashCode()); //prints "7130851"

// ... Later or in another class, code such as the following
// is operating on the Set:

// Unexpected Result!
// Output: obj is set member: FALSE
System.out.println("obj is set member: " +
	set.contains(obj));

// Even stranger unexpected Result!
// Output: obj is set member: FALSE
for (PhoneNumber p : set) {
	if (p.equals(obj)) {
		System.out.println("obj is set member: " +
			set.contains(p));
	}
}

Executing the code above surprisingly produces the following output:

obj is set member: FALSE
obj is set member: FALSE

Obviously, what we would expect is a “TRUE”, since obj has been inserted into the HashSet.

What just happened?

The unexpected result from the code above is caused by a trap in the JDK Collections framework into which many developers have fallen: If an implementation of hashCode() uses mutable fields to calculate the value, HashSet.contains() produces unexpected results, i.e. your object seems to be not a member of the set.

For an illustration, let’s look at the class PhoneNumber and its mutable field “number”:

public class PhoneNumber {

    private final String name;
    private String number;

    public PhoneNumber(String number, String name) {
        this.number = number;
        this.name = name;
    }

	// Setter makes "number" mutable!
    public void setNumber(String number) {
        this.number = number;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result +
			(number != null ? number.hashCode() : 0);
        return result;
    }

	// equals() left out here ...
}

What’s wrong with this class? Well, it’s a bad idea to use mutable fields in hashCode() when its instances are put into a HashSet (or as keys into a HashMap). In general, any hash-based collection is problematic. See also “HashSet contains problem with custom objects” (Stackoverflow) and “hashCode() pitfalls with HashSet and HashMap”.

HashSet.contains() surprisingly uses hashCode()

Part of our problem to spot our bug was that the HashSet.contains() method relies on hash values to stay immutable. Unfortunately, this is not stated explicitly in the HashSet JavaDoc, which only mentions “…returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e)). Actually, this is the same description as the JavaDoc of Set.contains().

A conscientious reader may also find the following note in the JavaDoc of the Set interface, which only mentions equals():

“Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.”

By the way, the following Sun JDK bug was reported quite some time ago: “(coll) HashSet.contains method violates Set.contains() contract”. The bug is approved (but not fixed) and the last comment was made in late 2007.

Properly coding the hashCode() method

The contract of hashCode() is explained in the JavaDoc of Object. You will also find hints on proper implementations in the very interesting book “Effective Java” from Joshua Bloch. The book covers many interesting topics and especially the items 7 and 8 shed light on good equals() and hashCode() implementation practices. Another book by the same author, “Java Puzzlers”, also contains two puzzlers on this problematic area: specifically, puzzlers 57 and 58 show how these two methods depend on each other.

There are many more discussions about the problems that can occur with hashCode(). For example, in his Google TechTalks presentation “How To Design A Good API and Why it Matters”, Joshua Bloch says that hashCode() is an implementation detail that should not have leaked into the Java API at all (at about 27:30 min).

And don’t forget the lesson learned here: using mutable fields in hashCode() is a recipe for disaster. And disaster strikes when instances of this class are put in a hash-based collection like HashSet or HashMap (as map keys).

Please note that since code usually uses only the respective collection interfaces, e.g. Set and Map, you might not even know about it (as in our case). Or you use a module or library that stores your objects in a collection internally as an implementation detail that’s hidden from you.

Don’t rely on automatic hashCode() Generation

Even with coding rules in mind, a hashCode() implementation that uses mutable fields creeps into our code base faster than you can spell “bug”. This is because developers are reluctant to write the long-winded calculations in the hashCode() methods manually and often generate them with the help of the IDE, as shown in the screenshot below. But it’s just too easy for a developer to press “Generate” without first checking the specific fields that can be included and leaving the mutable fields out. Of all IDEs I tested only NetBeans at least has all fields unchecked, which forces the developer to select them on purpose.

Modern IDEs provide automatic generation of hashCode(). Eclipse and Intellij IDEA by default include all mutable fields.

Code Inspection Tools to the Rescue?

You might wonder if classes of your code base contain an hashCode() implementation that uses mutable fields. One option (besides a manual code review) is using a code inspection tool. Unfortunately, the prominent open source tools like FindBugs, PMD, CheckStyle do not offer such a built-in inspection.

Only IntelliJ IDEA has a built-in code inspection that detects the use of mutable (actually non-final) fields in hashCode().

The only tool support I found was Intellij IDEA. This IDE provides a code inspection named “Non-final field referenced in ‘hashCode()’”. Any violation is highlighted as shown in the screenshot above.

Share

Leave a Reply

You must be logged in to post a comment.

12 Responses to “Consequences when using Mutable Fields in hashCode()”

  1. It’s a bad idea to use mutable *objects* in a HashSet. That’s the bug — it’s not a flaw in JDK’s HashSet, but a flaw in the person who thought it was ever a good idea to have a Set of mutable elements.

  2. Jake says:

    What is the implementation of your equals() code exactly?

    Does it compare on name and number? From your code, it doesn’t look like it does. Are you aware that by violating the equals()/hashCode() contract, you break basic assumptions that the Java language requires of your objects?

    Because if your hashCode return value changes, the object should no longer be equal() to itself. And that violates the *exact* Set behavior that you cited.

    HashSet is not violating the Set.contains() contract. You are violating the Object contract.

    If you’re going to rules-lawyer, make sure you’re on the right side of the rules.

  3. Eivind Eklund says:

    This is a perfectly valid implementation of HashSet, and the one I would expect.

    HashSet is relying on the contract of hashCode(); if you violate the contract of hashCode(), then you should expect random things to break. There is an implicit condition in contracts in Java programming: “Assuming the things accessed follow their contract”. Sometimes code will explicitly guard against contract violations in the callees, but you should not rely on that unless it is documented.

  4. Ulrich Schrempp says:

    Thanks for your comments.
    You’re right, these hashCode()/equals() implementations are wrong.

    The title of this post was changed, because it was a bit misleading. Our intent is to show how easily this error can happen and which consequences may occur at runtime, when using such objects.

  5. vytah says:

    The same happens when you use TreeSet and you use mutable fields in compareTo method.

    Anyone who knows how HashSet and TreeSet work shouldn’t be surprised by this.

  6. Gary Trakhman says:

    Well, if you think about it conceptually, when you use something as a key to a hash table, it’s hash bucket index will be computed from the state at the time. If there are collisions it will grow the linked-list attached to each index. On a .contains call, it will hash on the current value of your object, then check along the linked-list for equality. It may hash to a different bucket the second time and skip the list altogether, sporadically there might be a collision and things work out alright :-) . Another common case I imagine is when two different objects should be equal, but another thread mutates the object in the map. Use immutable keys!

  7. banuday says:

    Actually, the HashSet.contains() follows the Set.contains() contract. The parameter and the item contained in the set must satisfy equals() equivalence (but the contract does not necessarily specify that equals() must used for the comparison). Now, if you go to the Object.hashCode()contract, you’ll see two objects that are equals() equivalent must have the same hashCode(). Thus, HashSet.contains() can legitimately use hashCode() for the look up. The only way that HashSet.contains() could fail for objects that are equals() equivalent is if hashCode() is improperly implemented, which is not the fault of HashSet’s implementation.

  8. thSoft says:

    Your blog post is helpful and informative, but I must ask with all respects: seriously, why on Earth would anyone use mutable phone numbers?! A phone number is a value, it obviously can’t change.

  9. Sven says:

    @Eivind:
    Which of the following rules is broken by the given implementation:
    - Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
    - If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
    - It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

    I can’t find any.

  10. Mentallurg says:

    You describe pretty common bug, that many junior developers do. But your analysis has some flaws.

    1. “In general, any hash-based collection is problematic” – that is wrong.

    Let’s consider more simple case, array of Comparable objects. Suppose you have an array of Comparable objects, e.g. orderable by phone number:
    [{"444", "IBM"}, {"555", "MS"}, {"333", "MGM"}]

    You call Arrays.sort(phoneNumbers) and the array becomes sorted ascending. Suppose it looks like follows:
    [{"333", "MGM"}, {"444", "IBM"}, {"555", "MS"}]
    Then you modify the phone number of one of the objects: “555″ -> “111″:
    [{"333", "MGM"}, {"444", "IBM"}, {"111", "MS"}]
    The order is broken.

    If you suppose that the last element has the biggest number, it is wrong now. If you suppose that the first element has the smallest number, it is wrong now.

    Would you say “In general, any Comparable array is problematic”?

    I guess no. The same about hash-based collections. They are not problematic. You just use them not properly and your expectations are wrong.

    2. You describe HashSet.contains() and equals(), but you fail to see the essential contract of hashCode():
    “If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.”

    It is bad that you have not provided your implementation of equals(). Does it use both name and number? Or only name? Or only number? Or none (default from Object)?
    Suppose you use both name and number.

    - If your equals() uses number, than changing the number also changes the behaviour of equals(). Then the behaviour of the set is perfectly CORRECT. Just see the documentation:
    “The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons …”.
    After you modified the number “in a manner that affects equals comparisons”, you CANNOT expect any particular behaviour of any methods, including contains(). It means in particular that contains() can now return ANY value (true/false) for ANY object. Your expectation that contains(obj) will still return true is WRONG.

    - If you equals() does not use ‘number’ (suppose you use ‘name’ only), then you are really allowed to expect that contains(obj) returns true. But in this case you have another error. Namely, your hashCode() violates the contract. If equals() doesn’t use the number attribute and if two objects are equal, they MUST have the same value of the hashCode() -> it means that hasCode() cannot depend on the number -> it means your hashCode() is wrong, it is not allowed to use the number -> if you fix the hashCode() according to the contract then the problem with HashSet disappears too.

    - The same, but shorter:
    if your modification changes hashCode(), then it affects equals() too, (otherwise it would violate the hashCode() contract that equal objects have the same hash code).
    But if equals() changes, then according to the Set contract “The behavior of a set is not specified”.
    It means you cannot expect any particular result from contains(obj).

    Conclusion:
    Your problem is not the HashSet, but your violation of several contracts:
    - You ignore the contract between hashCode() and equals()
    - You ignore the contract that “behavior of a set is not specified” if your modification affects equals()
    - As a consequence of these your errors you erroneously suppose a design flaw in the hash-based collections
    - You don’t see that the same behaviour exists for other implementations like TreeSet

    Basically, you should not modify objects during the “life cycle” of the set. Namely, there should not be cahnges that affect equals().

    Solution:
    I don’t know what problem are you trying to solve. That’s why I don’t have ready solution for you.
    - If you have a collection of companies and they are distinguished by the name, then equals() and hashCode() should only contain name, not a number. Then modifying the number will not cause any problems with Set.
    - May be you need a unique ID (or a primary key in DB) to distinguish the objects
    - May be you need a special kind of collection. Then forget hashCode(), keep in mind equals() only, and answer following question:
    – If one object in the collection is modified and becomes equal to another one, should the collection remove one of duplicates automatically? Which one?
    – If not remove automatically, then how should then work the remove() method? Should it remove all objects that equal to it, or only one of them? Which one?

  11. Swapnil says:

    There is another code inspection tool called “coverity” which traces if mutable fields are being used in hash code calculation and gives an error for that.

  12. [...] nice explanations of using mutable fields in hashCode http://blog.mgm-tp.com/2012/03/hashset-java-puzzler/ and mutable keys in hashmaps Are mutable hashmap keys a dangerous [...]