
Why Would Anyone Call a Race Condition Nice?

by Rishabh Agarwal, February 26th, 2024

Too Long; Didn't Read

Race conditions are the bitter truth of concurrent applications. But why would anyone call a race condition nice? The fascination of this particular race condition lies in its innocent appearance on the outside and the intricate complexity under the hood. That is probably why people call it the Nice Race Condition.




Whatever you think of the name, this race condition serves as an excellent case study of how a seemingly innocent program can fail in a concurrent setting.


In this blog, we will dissect this race condition: how and why it occurs, and some ways to eliminate it from our applications.


Let us Start…


Consider the following Java code:

public class NiceRC {

    public static void main(String[] args) {
        final Map<Object, Object> unsafeMap = new HashMap<>();
        final List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 5; ++i) {
            threads.add(new Thread(() -> {
                for (int j = 0; j < 100000; ++j) {
                    unsafeMap.put(new Object(), new Object());
                }
            }));
        }
        threads.forEach(Thread::start);
        threads.forEach(thread -> {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });
        System.out.println("Size of unsafeMap: " + unsafeMap.size());
    }

}


I strongly suggest giving the code above a thorough read before proceeding.


In this code, we start 5 threads, each of which adds 100,000 entries to a shared HashMap called unsafeMap.


If you run this code enough times, it won’t be long before you hit an instance where it fails to terminate. It gets stuck in what looks like an infinite loop! But why? No section of our code appears capable of looping forever.


Turns out there is a race condition at play.


But before talking about the race condition, let us take a refresher on Java’s HashMap. This will be necessary to understand what is happening under the hood!


HashMap’s Internals

HashMap is a hash table-based implementation of the Map interface. A hash function maps each key to a bucket/slot, from which the desired value can be found by traversing a linked list.


Internal structure of HashMap. Keys are divided into buckets. Each bucket points to a Linked List.



Following is the definition of the Entry class used to represent the nodes of these linked lists in pre-Java 8 versions of HashMap:


static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
}


When the number of key-value pairs in a HashMap exceeds a threshold (the capacity multiplied by the load factor), it resizes its internal array and re-hashes all the elements. This process is called rehashing or resizing.
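As a quick back-of-the-envelope illustration (a sketch using HashMap’s documented defaults of capacity 16 and load factor 0.75, not code from the JDK):

```java
public class ResizeMath {
    public static void main(String[] args) {
        // HashMap's documented defaults: initial capacity 16, load factor 0.75.
        int capacity = 16;
        float loadFactor = 0.75f;

        // Resizing is triggered once the entry count crosses capacity * loadFactor;
        // the bucket array doubles in size and every entry is rehashed into it.
        int threshold = (int) (capacity * loadFactor);
        System.out.println("threshold = " + threshold);        // threshold = 12
        System.out.println("new capacity = " + (capacity * 2)); // new capacity = 32
    }
}
```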


Here is the part of the pre-Java 8 transfer method that performs this resizing:

// Part of the transfer method in java.util.HashMap (pre-Java 8),
// called to move every entry from the old table (src) to newTable.
for (int j = 0; j < src.length; j++) {
    Entry e = src[j];
    if (e != null) {
        src[j] = null;
        do {
            Entry next = e.next;
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];   // head insertion into the new bucket
            newTable[i] = e;
            e = next;
        } while (e != null);
    }
}


Let us take an example to understand how this rehashing code works.


Consider a map with two buckets and three elements.


HashMap with two buckets and three elements. All three elements fall under Bucket 1.


At this point, a resizing operation takes place to increase the number of buckets to 4.


All three map elements will be rehashed. Keys A and B will be hashed to Bucket 3, while key C will be hashed to Bucket 1.


After one iteration of the loop, the following will be the state of this map.


Node A is removed from the original bucket and is added to the head of the new bucket




After yet another iteration of this loop, we will have node B added to the head of Bucket 3.


Node B is removed from the original bucket and is added to the head of the new bucket


With one more iteration, the rehashing process is completed. This is the final state of the map.


Node C is added to Bucket 1. This completes the rehashing process.
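The single-threaded walkthrough above can be reproduced with a small standalone sketch. The Entry class and transfer method here are simplified stand-ins for the real java.util.HashMap internals, and the hash values are chosen so the nodes land in the same buckets as in the figures:

```java
// Simplified stand-in for java.util.HashMap.Entry (not the real class).
class Entry {
    final String key;
    final int hash;
    Entry next;
    Entry(String key, int hash) { this.key = key; this.hash = hash; }
}

public class TransferDemo {
    // Same bucket computation HashMap uses: hash & (capacity - 1)
    static int indexFor(int hash, int capacity) { return hash & (capacity - 1); }

    static Entry[] transfer(Entry[] src, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (int j = 0; j < src.length; j++) {
            Entry e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];   // head insertion into the new bucket
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
        return newTable;
    }

    public static void main(String[] args) {
        // Hashes chosen so A and B land in Bucket 3 and C in Bucket 1
        // after growing from 2 to 4 buckets, mirroring the walkthrough.
        Entry a = new Entry("A", 3), b = new Entry("B", 7), c = new Entry("C", 5);
        a.next = b;
        b.next = c;                  // old Bucket 1: A -> B -> C
        Entry[] src = new Entry[2];
        src[1] = a;

        Entry[] resized = transfer(src, 4);
        // Head insertion reverses the relative order within each new bucket.
        System.out.println(resized[3].key + " -> " + resized[3].next.key); // B -> A
        System.out.println(resized[1].key);                                // C
    }
}
```

Note how head insertion reverses the relative order of A and B, exactly as in the figures; this reversal is what makes the concurrent interleaving below so dangerous.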


To the Race Condition…

With all that knowledge of HashMap’s internals, we are all set to uncover the race condition that plagued our NiceRC class.


Consider again the following map with two buckets and three elements.



But this time, two threads attempt to perform the resizing/rehashing operation concurrently. Let us call these threads ‘Thread 1’ and ‘Thread 2’.


Now we will see how interleaving of two resizing operations could lead to an infinite loop.

Consider the first thread entering the resizing loop. It initializes its e and next variables. Then, for some reason, thread 1 stops: it gets preempted and scheduled off the CPU.


(For clarity, the variables for each thread are suffixed with the thread number.)


// Transfer method in java.util.HashMap -
// called to resize the hashmap
for (int j = 0; j < src.length; j++) {
    Entry e = src[j];
    if (e != null) {
        src[j] = null;
        do {
            Entry next = e.next;
            // THREAD 1 STOPS HERE
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        } while (e != null);
    }
}


This is the state of thread 1 local variables at the time of eviction.


Thread 1 initialized its e1 and next1 variables and got evicted


Assume that while thread 1 is off the CPU, thread 2 runs uninterrupted and completes the entire resizing process. This is now the new state of our map.


HashMap is rehashed by another thread


Note that the resizing process does not clone Entry objects, so e1 and next1 still refer to valid objects in the new state. However, the next links between those objects have changed in a way thread 1 does not expect.


At this point, thread 1 is scheduled again and it starts its execution from where it left.

It adds node A to the front of the new bucket.



Note that while node A is rehashed, the next pointer from B to A is not removed!


In the next iteration, variables e1 and next1 are updated to point to node B and node A respectively. With this, we add node B to the front of the new bucket.



B is added to the front of the new list


Next, the e1 variable is updated to point to node A. Since node A now has no next node, next1 points to null.


Our algorithm (again) tries to add node A to the front of the new bucket.


Adding node A to the front of new bucket creates a loop!


With that, the structure of our map is thoroughly corrupted: the new bucket now contains a cycle between nodes A and B.


Any future operation that tries to add or retrieve an element from this corrupted bucket will traverse the cycle forever: an infinite loop.
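We can see why with a tiny standalone sketch that recreates the corrupted bucket (Node is a stand-in for HashMap’s Entry; the safety cap exists only so the demo terminates):

```java
// Stand-in for HashMap's Entry node, reduced to what the demo needs.
class Node {
    final String key;
    Node next;
    Node(String key) { this.key = key; }
}

public class CycleDemo {
    public static void main(String[] args) {
        // Recreate the corrupted bucket from the walkthrough:
        // head is A, A.next points to B, and B.next points back to A.
        Node a = new Node("A"), b = new Node("B");
        a.next = b;
        b.next = a;

        // HashMap's lookup walks a bucket with `for (e = head; e != null; e = e.next)`.
        // On a cyclic list the condition is never false; a safety cap stops it here.
        int steps = 0;
        for (Node e = a; e != null && steps < 1000; e = e.next) {
            steps++;
        }
        System.out.println("Traversal never hit null; gave up after " + steps + " steps");
    }
}
```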


This is exactly what happened with our NiceRC class.


Using non-thread-safe classes in a concurrent environment is inherently risky. Even if your system appears to work, there is always the possibility of an unexpected interleaving of operations that leads to failure. Therefore, it is strongly recommended to use thread-safe implementations of data structures when working in a multi-threaded environment.
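For NiceRC, one straightforward fix is to swap the HashMap for a ConcurrentHashMap (a sketch; wrapping the map with Collections.synchronizedMap would also work, at the cost of coarser locking):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeRC {

    public static void main(String[] args) {
        // ConcurrentHashMap is safe for concurrent puts: resizing is
        // coordinated internally, so no bucket can ever form a cycle.
        final Map<Object, Object> safeMap = new ConcurrentHashMap<>();
        final List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 5; ++i) {
            threads.add(new Thread(() -> {
                for (int j = 0; j < 100000; ++j) {
                    safeMap.put(new Object(), new Object());
                }
            }));
        }
        threads.forEach(Thread::start);
        threads.forEach(thread -> {
            try {
                thread.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });
        // Every key is a distinct Object, so all 500000 entries survive.
        System.out.println("Size of safeMap: " + safeMap.size()); // 500000
    }

}
```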

