You may need them more often than it seems. For example, in server-side web development you are always in a multithreaded context, because every request runs on a separate thread; if you have a singleton service in your app, you have to be sure that all code in that service is thread-safe. In UI development (WPF, Xamarin, whatever), there is always a main thread plus background tasks, and if a collection can be modified both by the user from the UI and by a background service, you have to be sure that your code is thread-safe.
Let's start with a simple example.
if (!dictionary.ContainsKey(key))
{
    dictionary.Add(key, value);
}
Let's have a look at what may happen in a two-thread scenario: when this code runs on multiple threads, there is a chance that the if check passes in both threads, but only one thread manages to modify the dictionary, and the other gets an ArgumentException ("An element with the same key already exists in the dictionary").
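This check-then-act race disappears when the check and the insertion happen as one atomic operation. A minimal sketch (the names `dictionary`, `key`, and `value` are placeholders for illustration), using `ConcurrentDictionary.TryAdd`:

```csharp
using System.Collections.Concurrent;

// TryAdd performs the "check if the key exists, then add" sequence
// as a single atomic operation, so the race above cannot happen.
var dictionary = new ConcurrentDictionary<string, int>();
string key = "answer";
int value = 42;

// Returns true only for the thread that actually inserted the pair;
// every other thread gets false instead of an ArgumentException.
bool added = dictionary.TryAdd(key, value);
bool addedAgain = dictionary.TryAdd(key, 100); // false: key already present
```

Instead of an exception on the losing thread, you get a boolean you can act on.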
To work with collections in a multithreaded environment in .NET, we have the System.Collections.Concurrent namespace. Let's take a very brief overview of it.
System.Collections.Concurrent Namespace

ConcurrentDictionary - a general-purpose thread-safe dictionary that can be accessed by multiple threads concurrently.

ConcurrentStack - a thread-safe last-in, first-out (LIFO) collection.

ConcurrentQueue - a thread-safe first-in, first-out (FIFO) collection.

ConcurrentBag - a thread-safe, unordered collection of objects. This type maintains a separate collection for each thread, which makes adding and getting elements faster when the producer and the consumer reside on the same thread.

BlockingCollection - provides concurrent addition and removal of items from multiple threads with the Add and Take methods (plus the cancellable overloads TryAdd and TryTake). It also has bounding and blocking capabilities: you can set a maximum capacity for the collection, and producers will block when that maximum is reached, which avoids excessive memory consumption. BlockingCollection is a wrapper around ConcurrentStack, ConcurrentQueue, or ConcurrentBag. By default it uses ConcurrentQueue under the hood, but you can provide a more suitable collection for your use case during initialization.
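The bounding and blocking behaviour described above can be sketched with a minimal producer/consumer pair (the capacity of 2 and the item count are arbitrary values for the example):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Capacity 2 means Add blocks once two items are waiting, throttling a fast
// producer. Passing a ConcurrentQueue explicitly makes the FIFO backing
// store visible; it is also the default when you pass nothing.
var buffer = new BlockingCollection<int>(new ConcurrentQueue<int>(), boundedCapacity: 2);

var producer = Task.Run(() =>
{
    for (int i = 0; i < 5; i++)
    {
        buffer.Add(i); // blocks while the buffer already holds 2 items
    }
    buffer.CompleteAdding(); // signals consumers that no more items will come
});

var consumed = new List<int>();
// GetConsumingEnumerable blocks waiting for items and completes after CompleteAdding.
foreach (var item in buffer.GetConsumingEnumerable())
{
    consumed.Add(item);
}
producer.Wait();
```

CompleteAdding is what lets the consumer loop terminate cleanly; without it, GetConsumingEnumerable would wait forever for more items.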
The ConcurrentStack, ConcurrentQueue, and ConcurrentBag collections implement the IProducerConsumerCollection interface (and BlockingCollection can wrap any implementation of it), so try to program against this interface, and you will be able to switch between the different collection types easily.
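A short sketch of what programming against that interface buys you: swapping the concrete collection is a one-line change, and the rest of the code stays identical.

```csharp
using System.Collections.Concurrent;

// Everything below works unchanged whichever implementation you pick.
IProducerConsumerCollection<int> items = new ConcurrentQueue<int>();
// IProducerConsumerCollection<int> items = new ConcurrentStack<int>(); // FIFO -> LIFO
// IProducerConsumerCollection<int> items = new ConcurrentBag<int>();   // unordered

items.TryAdd(1);
items.TryAdd(2);

// For the queue this takes the oldest item; for the stack it would be the newest.
items.TryTake(out int first);
```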
There are also Partitioner, OrderablePartitioner, and EnumerablePartitionerOptions, which are used by Parallel.ForEach for collection segmentation.
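To make the segmentation idea concrete, here is a minimal sketch: Partitioner.Create splits a numeric range into chunks, and Parallel.ForEach hands each worker a whole segment instead of one element at a time (the range 0..1000 is an arbitrary example).

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

long total = 0;
// Produces Tuple<int, int> segments such as (0, 125), (125, 250), ...
var rangePartitioner = Partitioner.Create(0, 1000);

Parallel.ForEach(rangePartitioner, range =>
{
    long subtotal = 0;
    // Each worker sums its own segment without touching shared state.
    for (int i = range.Item1; i < range.Item2; i++)
    {
        subtotal += i;
    }
    Interlocked.Add(ref total, subtotal); // combine per-segment results safely
});
```

Chunking like this reduces synchronization overhead compared to dispatching elements one by one.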
Now let's dive a little deeper and look into the main benefit that concurrent collections offer.
Let's have a look at another example: the Enqueue method of the standard generic queue implementation in .NET.
// Adds item to the tail of the queue.
public void Enqueue(T item)
{
    if (_size == _array.Length)
    {
        Grow(_size + 1);
    }

    _array[_tail] = item;
    MoveNext(ref _tail);
    _size++;
    _version++;
}
Queue<T> uses an array to store elements and resizes this array when necessary. It also keeps _head and _tail fields as the indexes at which to dequeue and enqueue elements, respectively. From the code we can see that Enqueue consists of multiple steps: we check the array's length and grow it if necessary, then store the item in the array and update the _tail and _size fields. In other words, Enqueue is not an atomic operation.
For example, Thread 1 assigns a value to _array[_tail], and while it is still updating the _tail field, Thread 2 assigns another value to the same _tail index, and we end up with an inconsistent state of our collection.
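With ConcurrentQueue, by contrast, many threads can enqueue simultaneously and no item is lost. A minimal sketch (10,000 items is an arbitrary count for the demonstration):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Many threads enqueueing at once. With Queue<T> this could corrupt the
// internal array and indexes; ConcurrentQueue.Enqueue is safe to call
// concurrently, and every item survives.
var queue = new ConcurrentQueue<int>();

Parallel.For(0, 10_000, i => queue.Enqueue(i));
```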
Unlike the standard ones, concurrent collections guarantee the integrity of the collection in a multithreaded environment. But this comes at a price.
Concurrent collections are less performant than standard collections in a single-threaded environment. And the worst performance hit comes from accessing the aggregate state of a concurrent collection. The aggregate state is a value that requires exclusive access to all collection elements (for example, the .Count or .IsEmpty properties). Concurrent collections use different techniques to optimize locking (granular locks, managing separate collections for different threads), but to query aggregate state you have to lock the whole collection, potentially blocking multiple threads. So avoid querying aggregate state too often.
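One common way to avoid polling aggregate state is to let the operation itself report whether it succeeded. A sketch (the three seed values are arbitrary): instead of checking Count before every dequeue, rely on TryDequeue returning false when the queue is empty.

```csharp
using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });

int sum = 0;
// Drain the queue without ever querying its aggregate state:
// TryDequeue returns false once the queue is empty.
while (queue.TryDequeue(out int item))
{
    sum += item;
}
```

This pattern is also race-free: a Count check followed by a dequeue could be invalidated by another thread in between, while TryDequeue decides and acts atomically.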
In both examples we've seen that the result of an operation depends on the order in which threads do their work. Such issues are called race conditions, and concurrent collections have specific APIs to minimize them. Let's have a look at this single-threaded example:
if (dictionary.ContainsKey(key))
{
    dictionary[key] += 1;
}
else
{
    dictionary.Add(key, 1);
}
By now you should see that this code can fail in several places when running in a multithreaded environment. To deal with such cases, the concurrent dictionary has the AddOrUpdate method, which can be used like this:
var newValue = dictionary
    .AddOrUpdate(key, 1, (itemKey, itemValue) => itemValue + 1);
Here we pass a delegate as the third parameter of the AddOrUpdate method. One could expect AddOrUpdate to be an atomic operation that causes no issues. The final add or update is indeed atomic, but the method uses TryUpdate under the hood, and if TryUpdate can't replace the current value (for example, when the value has already been updated from another thread), the delegate is executed again with the new itemValue. So remember that the delegate can be executed multiple times and therefore shouldn't contain side effects or logic that depends on the number of executions.
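The re-execution is observable: a sketch where many threads increment the same key while a side-effect counter records delegate invocations. Under contention, the counter can exceed the number of successful updates (the key name and iteration count are arbitrary):

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var dictionary = new ConcurrentDictionary<string, int>();
int delegateCalls = 0;

Parallel.For(0, 1000, _ =>
{
    dictionary.AddOrUpdate("counter", 1, (key, current) =>
    {
        // Side effect purely to observe re-runs; real update delegates
        // should stay pure.
        Interlocked.Increment(ref delegateCalls);
        return current + 1;
    });
});

// dictionary["counter"] always ends up at exactly 1000, because each call
// contributes one successful add or update; delegateCalls may be larger
// than 999 when contention forced some delegates to run again.
```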
Wrapping up: your best choice is to steer away from concurrency as much as possible, but when that is not an option, concurrent collections can be handy, even though they are by no means a magic wand.