Monday, 21 September 2015

How HashSet works internally in Java

Before going into internal working of Hashset it is important to know two points about HashSet.

  • HashSet only stores unique values i.e. no duplicates are allowed.
  • HashSet works on the concept of hashing too just like HashMap but how it differs from the HashMap is; in HashMap a Key, Value pair is added and the hash function is calculated using key. Where as in the HashSet hash function is calculated using the value itself. Note that in HashSet we have add(E e) method which takes just the element to be added as parameter.
    Also you may have guessed by now, since hash function is calculated using value that is why only unique values are stored in the HashSet. If you try to store the same element again, the calculated hash function would be same, thus the element will be overwritten.

Now coming back to internal working of HashSet, which is also an important Java Collections interview question, the most important point is HashSet internally uses HashMap to store it's objects. Within the HashSet there are many constructors one without any parameter and several more with initial capacity or load factor but every one of these constructor creates a HashMap. If you know how HashMap works internally in Java it is easy to understand how HashSet works internally.

HashSet Constructor examples

/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
    map = new HashMap<>();
}
public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}

And the map, which is used for storing values, is defined as

private transient HashMap<E,Object> map;

In the constructor, if you have noticed, there are parameters named initial capacity and load factor. For HashSet, default initial capacity is 16, that is an array (or bucket) of length 16 would be created and default load factor is 0.75. Where load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

How elements are added in HashSet

I have stated in the point 2 above that HashSet calculates the hash function using value itself and there is no (Key, Value) pair in HashSet and then comes the point that HashSet internally uses HashMap to store objects. These two statements may sound contradictory but if you know How HashMap internally works in Java it will be easy for you to understand how both of these these two contradictory statements hold true.

Actually in HashSet, from add method, put method of HashMap is called where the value which has to be added in the Set becomes Key and a constant object "PRESENT" is used as value.

That's how PRESENT is defined in HashSet implementation -

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

And that's how add method is implemented in HashSet class -

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
} 

One thing to note here is, in HashMap value may be duplicate but Key should be unique. That's how HashSet makes sure that only unique values are stored in it, since the value which is to be stored in the HashSet becomes the key while storing it in HashMap.

How Object is removed from a HashSet

When we need to remove an element from the HashSet, internally again remove method of HashSet calls remove(Object key) method of the HashMap.

That is how it is implemented in HashSet class.

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

Here note that remove(Object key) method of the HashMap returns the Value associated with the key. Whereas the remove(Object o) method of the HashSet returns Boolean value. Also we know that for every value added in HashSet that value becomes Key and the value is always an object called PRESENT. Therefore the value that is returned from the remove(Object key) method of the HashMap is always PRESENT thus the condition map.remove(o)==PRESENT.

How Object is retrieved from HashSet

In HashSet there is no get method as provided in Map or List. In HashSet iterator is there which will iterate through the values of the Set. Internally it will call the keyset of the HashMap, as values are stored as keys in the HashMap so what we'll get is the values stored in the HashSet.

That's how iterator is implemented in the HashSet

/**
* Returns an iterator over the elements in this set.  The elements
* are returned in no particular order.
*
* @return an Iterator over the elements in this set
* @see ConcurrentModificationException
*/
public Iterator<E> iterator() {
    return map.keySet().iterator();
}

That's all for this topic How HashSet internally works in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!


Related Topics

  1. How ArrayList works internally in Java
  2. How HashMap internally works in Java
  3. How Linked List class works internally in Java
  4. LinkedHashSet in Java
  5. TreeSet in Java
  6. EnumSet in Java

You may also like -

6 comments:

  1. very nicely explained..kudos !!

    ReplyDelete
  2. Nice explanation. Thanks for the post.

    ReplyDelete
  3. Here is the doubt, you said "HashSet, from add method, put method of HashMap is called where the value which has to be added in the Set becomes Key and a constant object PRESENT is used as value", but in Hashmap it works on hashing concept change it key into hashcode and save it in a bucket, we can also have same hashcode for two different keys in hashmap but in hashmap if 2 different key has same hashmap, then it use equals() method to find exact value, what in case of HashSet if we have same hashcode for 2 different values even if it try to use equals() method, it save PRESENT as value.
    what will happen in this case. Could you please clarify.

    ReplyDelete
    Replies
    1. Well there is no get() method in HashSet so your scenario will never occur, please read "How Object is retrieved from HashSet" section of the post. All you can get is a iterator which will call keyset of the HashMap so you will get set of keys in the HashMap which incidentally means elements of the HashSet.

      Delete