Race conditions are the bitter truth of concurrent applications. But why would anyone call a race condition nice? The fascination of this race condition lies in its subtlety on the outside, countered by the intricate complexities under the hood. It is probably the reason why people call it a Nice Race Condition.
Despite the disagreement on the naming, this race condition serves as an amazing case study of how seemingly innocent programs can fail in concurrent settings.
In this blog, we will dissect this race condition. We will understand how and why this race condition occurs and what are some ways to eliminate this race condition in our applications.
Let us Start…
Consider the following Java code:
public class NiceRC {
public static void main(String[] args) {
final Map<Object, Object> unsafeMap = new HashMap<>();
final List<Thread> threads = new ArrayList<>();
for (int i = 0; i < 5; ++i) {
threads.add(new Thread(() -> {
for (int j = 0; j < 100000; ++j) {
unsafeMap.put(new Object(), new Object());
}
}));
}
threads.forEach(Thread::start);
threads.forEach(thread -> {
try {
thread.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
});
System.out.println("Size of unsafeMap: " + unsafeMap.size());
}
}
I strongly suggest giving the code above a thorough read before proceeding.
In this code, we start 5 threads and each thread adds around 100k entries to a shared map called unsafeMap
.
If you were to run this code multiple times, it won’t be long before you come across an instance where the code fails to terminate. It goes in some sort of infinite loop! But why? No section of our code could cause an infinite loop.
Turns out there is a race condition at play.
But before talking about the race condition, let us take a refresher on Java’s HashMap. This will be necessary to understand what is happening under the hood!
HashMap is a hash table-based implementation for a Map. A hash function is used to map a key to a bucket/slot from which the desired value can be found by traversing a Linked List.
Following is the definition of the Entry class used to represent nodes of the Linked List inside a HashMap,
static class Entry<K,V> implements Map.Entry<K,V>
{
final K key;
V value;
Entry<K,V> next;
int hash;
}
When the number of key-value pairs kept in a HashMap exceeds a threshold, it needs to resize its internal array and re-hash all the elements. This process is called rehashing or resizing.
Here is a part of the code that performs this resizing:
// Transfer method in java.util.HashMap -
// called to resize the hashmap
for (int j = 0; j < src.length; j++) {
Entry e = src[j];
if (e != null) {
src[j] = null;
do {
Entry next = e.next;
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
Let us take an example to understand how this rehashing code works.
Consider a map with two buckets and three elements.
At this point, a resizing operation takes place to increase the number of buckets to 4.
All three map elements will be rehashed. Keys A and B will be hashed to Bucket 3, while key C will be hashed to Bucket 1.
After one iteration of the loop, the following will be the state of this map.
Node A is plucked from the original bucket and is added to the front of the new bucket.
After yet another iteration of this loop, we will have node B added to the head of Bucket 3.
With one more iteration, the rehashing process is completed. This is the final state of the map.
With all that knowledge of HashMap’s internals, we are all set to uncover the race condition that plagued our NiceRC
class.
Consider again the following map with two buckets and three elements.
But this time, two threads attempt to perform the resizing/rehashing operation concurrently. Let us call these threads ‘Thread 1’ and ‘Thread 2’.
Now we will see how interleaving of two resizing operations could lead to an infinite loop.
Consider the first thread entering the resizing loop first. It initializes its e
and next
variables. Then for some reason, thread 1 stops. It gets kicked out of the CPU for any reason possible.
(For clarity variables for the thread are suffixed with thread number)
// Transfer method in java.util.HashMap -
// called to resize the hashmap
for (int j = 0; j < src.length; j++) {
Entry e = src[j];
if (e != null) {
src[j] = null;
do {
Entry next = e.next;
// THREAD 1 STOPS HERE
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i];
newTable[i] = e;
e = next;
} while (e != null);
}
}
This is the state of thread 1 local variables at the time of eviction.
Assume that while thread 1 is evicted out of CPU, thread 2 does not and completes the resizing process. This is now the new state of our map.
Note that the resizing process does not clone Entry objects, so e1
and next1
still refer to valid objects in the new state. However, the next relation has now changed, in a wrong way.
At this point, thread 1 is scheduled again and it starts its execution from where it left.
It adds node A to the front of the new bucket.
In the next iteration, variables e1
and next1
are updated to point to node B and node A respectively. With this, we add node B to the front of the new bucket.
Next, we update the e1
variable to pointer to node A. Since node A has no next node, next1
now pointer to null.
Our algorithm (again) tries to add node A to the front of the new bucket.
With that the structure of our map is successfully corrupted!
Any future operation that tries to add or get element from this corrupted bucket would end up in an infinite loop.
This is exactly what happened with our NiceRC
class.
Utilizing non-thread-safe classes in a concurrent environment poses inherent risks. Even if your system can handle race conditions, there is always the possibility of unexpected interleaving of operations that could lead to system failures. Therefore, it is strongly recommended to employ thread-safe implementations for data structures when working in a multi-threaded environment.
Also published here.