Understanding Concurrent Collections in C#

by Anatoly Pashmorga, December 15th, 2021

Too Long; Didn't Read

.NET's `System.Collections.Concurrent` namespace provides thread-safe collections for multithreaded code. `BlockingCollection`, for example, supports concurrent addition and removal of items from multiple threads with its `Add` and `Take` methods. Your best choice is to steer away from concurrency as much as possible, but when that is not possible, concurrent collections can be handy, even though they are by no means a magic wand.

You may need concurrent collections more often than it seems. For example, in server-side web development you are always in a multithreaded context, because requests are handled concurrently on different threads; if your app has a singleton service, all code in that service must be thread-safe. In UI development (WPF, Xamarin, and so on), there is always a main thread plus background tasks, and if a collection can be modified both by the user through the UI and by a background service, you have to make sure that code is thread-safe too.

Why are standard collections not thread-safe?

Let's start with a simple example.

if (!dictionary.ContainsKey(key))
{
	dictionary.Add(key, value);
}

Now let's look at what may happen with two threads. When this code runs on multiple threads, the if condition can evaluate to true in both threads before either of them adds the key. Only one thread will then succeed in modifying the dictionary, and the other will get an ArgumentException (An element with the same key already exists in the dictionary).
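For comparison, here is a minimal sketch of the same check-then-add logic written against ConcurrentDictionary from the System.Collections.Concurrent namespace covered in the next section. TryAdd performs the check and the insert as one thread-safe call. The key name, the worker count, and the use of top-level statements (C# 9 or later) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var dictionary = new ConcurrentDictionary<string, int>();
int successfulAdds = 0;

// Eight parallel workers all try to register the same key.
Parallel.For(0, 8, worker =>
{
    // TryAdd checks for the key and inserts in one thread-safe step.
    // It returns false instead of throwing when the key already exists.
    if (dictionary.TryAdd("answer", worker))
    {
        Interlocked.Increment(ref successfulAdds);
    }
});

Console.WriteLine($"Successful adds: {successfulAdds}"); // Successful adds: 1
Console.WriteLine($"Stored value: {dictionary["answer"]}"); // whichever worker won
```

Exactly one worker wins, and the others simply see false, so there is no exception to handle.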

To work with collections in a multi-thread environment in .NET, we have a System.Collections.Concurrent namespace. Let's take a very brief overview of it.

What do we have in `System.Collections.Concurrent` Namespace?

ConcurrentDictionary - a general-purpose thread-safe dictionary that can be accessed by multiple threads concurrently
ConcurrentStack - a thread-safe last-in, first-out (LIFO) collection
ConcurrentQueue - a thread-safe first-in, first-out (FIFO) collection
ConcurrentBag - a thread-safe, unordered collection of objects. This type maintains a separate collection for each thread for adding and getting elements to be more performant when producer and consumer reside in the same thread.
BlockingCollection - provides concurrent addition and removal of items from multiple threads with the Add and Take methods, which block and have overloads that accept a CancellationToken, plus TryAdd and TryTake variants that return immediately or after a timeout. It also has bounding and blocking capabilities, which means that you can set the maximum capacity of the collection, and producers will be blocked once that capacity is reached, avoiding excessive memory consumption.

BlockingCollection is a wrapper for ConcurrentStack, ConcurrentQueue, or ConcurrentBag. By default, it uses ConcurrentQueue under the hood, but you can provide a more suitable collection for your use case during initialization.
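The bounding and blocking behaviour can be sketched with one producer and one consumer. This is an illustrative example, not a recommended pipeline design: the capacity of 2, the item values, and the explicitly passed ConcurrentQueue backing store are all assumptions chosen to keep the output predictable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Bounded to 2 items and backed by an explicitly supplied ConcurrentQueue.
using var buffer = new BlockingCollection<int>(new ConcurrentQueue<int>(), boundedCapacity: 2);

var producer = Task.Run(() =>
{
    for (int i = 1; i <= 5; i++)
    {
        buffer.Add(i); // blocks while 2 items are already waiting
    }
    buffer.CompleteAdding(); // signals that no more items will arrive
});

var received = new List<int>();
// GetConsumingEnumerable removes items as it yields them and finishes
// once CompleteAdding has been called and the collection is drained.
foreach (int item in buffer.GetConsumingEnumerable())
{
    received.Add(item);
}
producer.Wait();

Console.WriteLine(string.Join(", ", received)); // 1, 2, 3, 4, 5
```

With a single producer, a single consumer, and a queue as the backing store, the items arrive in FIFO order. Passing a ConcurrentStack or ConcurrentBag instead would change the ordering.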

ConcurrentStack, ConcurrentQueue, and ConcurrentBag all implement the IProducerConsumerCollection interface, and BlockingCollection can wrap any implementation of it. So try to code against the interface where you can, and you will be able to switch between different types of collections easily.
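As a small sketch of what coding against the interface buys you, the hypothetical helper below (AddThenDrain is not part of .NET) only knows about IProducerConsumerCollection, so the caller decides which concrete collection, and therefore which ordering, it gets.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: depends only on the interface, not on a concrete type.
static List<int> AddThenDrain(IProducerConsumerCollection<int> collection)
{
    for (int i = 1; i <= 3; i++)
    {
        collection.TryAdd(i);
    }

    var drained = new List<int>();
    while (collection.TryTake(out int item))
    {
        drained.Add(item);
    }
    return drained;
}

var fromQueue = AddThenDrain(new ConcurrentQueue<int>());
var fromStack = AddThenDrain(new ConcurrentStack<int>());

Console.WriteLine(string.Join(", ", fromQueue)); // 1, 2, 3
Console.WriteLine(string.Join(", ", fromStack)); // 3, 2, 1
```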

There are also Partitioner, OrderablePartitioner, EnumerablePartitionerOptions, which are used by Parallel.ForEach for collection segmentation.
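To give a feel for how a partitioner is used, here is a minimal sketch that sums the numbers 0 to 999 by handing Parallel.ForEach ranges of 100 items each. The range bounds and chunk size are arbitrary values chosen for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

long total = 0;

// Splits [0, 1000) into ranges of 100 elements. Each range is a
// Tuple<int, int> of (inclusive start, exclusive end).
var ranges = Partitioner.Create(0, 1000, 100);

Parallel.ForEach(ranges, range =>
{
    // Sum the range locally, then publish the result once,
    // so threads contend on the shared total only once per range.
    long local = 0;
    for (int i = range.Item1; i < range.Item2; i++)
    {
        local += i;
    }
    Interlocked.Add(ref total, local);
});

Console.WriteLine(total); // 499500
```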

Now let's dive a little deeper and look into the main benefit that concurrent collections offer.

Inner state integrity

Let's have a look at another example: the Enqueue method of the standard generic queue implementation in .NET.

// Adds item to the tail of the queue.
public void Enqueue(T item)
{
    if (_size == _array.Length)
    {
        Grow(_size + 1);
    }

    _array[_tail] = item;
    MoveNext(ref _tail);
    _size++;
    _version++;
}

Queue<T> uses an array to store elements and resizes this array when necessary. It also uses the _head and _tail fields as the indexes from which to dequeue and enqueue elements, respectively. From the code, we see that Enqueue consists of multiple steps: it checks the array's length and resizes it if necessary, then stores the item in the array and updates the _tail and _size fields. In other words, it's not an atomic operation.

For example, Thread 1 assigns a value to _array[_tail], and before it advances _tail, Thread 2 assigns another value at the same index. The first item is silently overwritten, and we end up with an inconsistent state of our collection.

Unlike the standard collections, concurrent collections guarantee the integrity of their inner state in a multithreaded environment. But this comes at a price.
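A quick sketch of that guarantee: many threads enqueue into a ConcurrentQueue at once, and every item is still there afterwards. The item count is arbitrary. The same loop against a plain Queue<T> is deliberately not shown, because its outcome is nondeterministic: it may lose items or throw.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();

// Enqueue 10,000 distinct values from multiple threads at once.
Parallel.For(0, 10_000, i => queue.Enqueue(i));

// Copy into a set to confirm that no value was lost or overwritten.
var distinct = new HashSet<int>(queue);

Console.WriteLine(queue.Count);    // 10000
Console.WriteLine(distinct.Count); // 10000
```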

Concurrent collections are less performant than standard collections in a single-threaded environment. The worst performance comes when accessing the aggregate state of a concurrent collection. Aggregate state is a value that requires exclusive access to all of the collection's elements (for example, the Count or IsEmpty properties). Concurrent collections use different techniques to optimize locking (granular locks, managing separate collections for different threads), but querying aggregate state can require locking the whole collection, potentially blocking multiple threads; ConcurrentDictionary.Count, for instance, acquires all of the dictionary's internal locks. So avoid querying aggregate state too often.
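One way to follow this advice, sketched below under the assumption of a simple work queue, is to let the Try* methods report emptiness rather than polling Count or IsEmpty before each removal. The four workers and the 1 to 100 payload are illustrative choices.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>(Enumerable.Range(1, 100));
int sum = 0;

var workers = new Task[4];
for (int w = 0; w < workers.Length; w++)
{
    workers[w] = Task.Run(() =>
    {
        // No Count or IsEmpty check: TryDequeue itself reports
        // whether there was an item to take.
        while (queue.TryDequeue(out int item))
        {
            Interlocked.Add(ref sum, item);
        }
    });
}
Task.WaitAll(workers);

Console.WriteLine(sum); // 5050
```

This also avoids a race: a separate emptiness check followed by a dequeue could be invalidated by another thread in between.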

Race conditions

In both examples, we've already seen that the result of an operation can depend on the order in which threads do their work. Issues of this kind are called race conditions, and concurrent collections have a specific API to minimize them. Let's have a look at this single-threaded example:

if (dictionary.ContainsKey(key))
{
    dictionary[key] += 1;
}
else
{
    dictionary.Add(key, 1);
}

You should already understand that this code can fail in different places when run in a multithreaded environment. To deal with such cases, the concurrent dictionary has the AddOrUpdate method, which can be used like this:

var newValue = dictionary
    .AddOrUpdate(key, 1, (itemKey, itemValue) => itemValue + 1);

Here we have a delegate as the third parameter of the AddOrUpdate method. One could expect that AddOrUpdate is an atomic operation and that we won't have any issues here. The write itself is atomic, but the delegate runs outside the dictionary's internal locks: AddOrUpdate reads the current value, computes the new one, and then attempts a compare-and-swap style update (much like TryUpdate). If that update fails (for example, when the value has already been changed by another thread), the delegate is executed again with the new itemValue. So we should remember that the delegate can be executed multiple times, and that it shouldn't contain any side effects or logic that depends on the number of executions.
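The retry behaviour can be observed with a small experiment. The sketch below deliberately puts a side effect (a call counter) inside the update delegate, purely to expose how often it runs; this is exactly what production code should avoid. The key name and the iteration count are assumptions for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var counts = new ConcurrentDictionary<string, int>();
int delegateCalls = 0;

Parallel.For(0, 1000, i =>
{
    counts.AddOrUpdate("hits", 1, (key, current) =>
    {
        // Side effect for demonstration only: counts delegate invocations,
        // including the ones whose result was discarded and retried.
        Interlocked.Increment(ref delegateCalls);
        return current + 1;
    });
});

Console.WriteLine(counts["hits"]);  // 1000
Console.WriteLine(delegateCalls);   // 999 or more, depending on contention
```

The stored value is always correct because a failed update is retried, but the delegate count can exceed the 999 updates that actually succeeded.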

Wrapping up, we should say that your best choice is steering away from concurrency as much as possible, but when it is not possible, concurrent collections can be handy, even though by no means are they a magic wand.