Choosing the Best Dictionary in C++. Part 2: Unordered Containers

by Denis T, December 9th, 2024

Too Long; Didn't Read

When it comes to picking a hash map, modern C++ has a lot to offer. Sometimes, it is hard to select the most efficient hash map data structure even for a professional engineer. Let's see what the C++23 standard library and the most prominent third-party libraries provide, and how to pick the most suitable hash map.

This is the second part in the series about choosing the most suitable associative container (dictionary) in C++23. In the first part, we covered ordered containers, and in this part, we'll discuss unordered ones in detail.

Unordered containers

These containers do not provide any defined order for their keys. Moreover, key order can change with each modification, so the program should never rely on it. Such containers are always implemented as hash maps.


Generally, when adding, removing, or searching for a key, a hash map first computes an integral hash value from that key using the hash function. The hash (or rather a part of it) is then used as an index into a contiguous pre-allocated array. Each entry in this array is called a bucket. Some entries in that array will be empty, some will contain a single element, and some might map to more than one element due to hash collisions, which happen when different keys produce hash values that point to the same array index. Hash maps utilize various strategies to deal with hash collisions (see this Wikipedia article). The number of elements in the hash map divided by the total size of the array is called the load factor. The higher the load factor, the more likely hash collisions become with each newly inserted element.
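To make this concrete, here is a minimal sketch of how a key could be mapped to a bucket index. The 8-bucket array and the plain modulo mapping are illustrative assumptions; real implementations may split the hash differently.

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

int main() {
    // A hypothetical bucket array size, just for illustration.
    constexpr std::size_t bucket_count = 8;

    const std::string key = "example key";

    // Compute the integral hash value using the standard hash function...
    const std::size_t hash = std::hash<std::string>{}(key);

    // ...and map it to a bucket index, e.g. by taking the remainder.
    const std::size_t bucket_index = hash % bucket_count;

    std::cout << "hash = " << hash
              << ", bucket index = " << bucket_index << "\n";
}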


As opposed to ordered containers, hash maps do not support range queries, rank/select operations, iteration over keys in order, or searching for the key smaller/larger than a given key. In return, hash maps reach the best achievable performance: the time complexity of search/insertion/removal operations is amortized O(1). I say "amortized" here because when the number of elements becomes too large, a hash map needs to rehash its contents to reduce the load factor (effectively growing the bucket array). The time complexity of rehashing is O(n). However, it is expected to happen rarely, so on average, each operation is still O(1). Another reason for potentially reduced performance is a hash function with poor distribution, which increases the frequency of collisions.

Standard hash maps

Starting with C++11, the standard library provides the following hash maps: std::unordered_map (link), std::unordered_set (link), std::unordered_multimap (link), and std::unordered_multiset (link). Maps associate a key with a value, while sets store only the key and are useful for checking whether a key is present in the container, not for retrieving an associated value. Multi containers allow storing multiple duplicate keys.


All standard unordered containers are node-based and use separate chaining to deal with hash collisions, which means they store each key or key-value pair in a separate linked-list node. Memory for each new node is allocated individually, which is not particularly efficient. This also makes the data structure not CPU cache-friendly, because each key access requires an extra indirection. Here is what the internal structure of unordered_map may look like:

On the left, there is an array of buckets, which is pre-allocated to some fixed size (8 in our example). Initially, all buckets are empty. When we start adding elements to the hash map, some buckets become occupied. In the example above, there is an element with key 10 (which got into bucket 1), and two elements with keys 50 and 256, both of which ended up in bucket 3 because their hash values collided. There is also an element with key 3, which landed in bucket 6. Each occupied bucket points to a linked list, which ideally contains no more than one element. Buckets 0, 2, 4, 5, and 7 are empty and point to nullptr.


Let's assume we need to find the value for key 256.

  • First, the map computes the key hash and takes the remainder of dividing the hash value by the total number of buckets (8 in our case). In our example, the remainder is 3.
  • Then, the map starts reading elements of the linked list pointed to by bucket 3.
  • The first element has the key 50, which is not the 256 we are looking for, so the map continues to iterate. The next element has the key 256, which is exactly the one we need. The corresponding value is xyz.
  • If the key weren't in the dictionary, the map would either hit an empty bucket or iterate over the linked list until the end. In both cases, the map would return an end iterator, indicating that the key does not exist.
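The standard bucket interface lets you observe this process directly. Below is a minimal sketch; the keys and values mirror the example above, but the actual bucket indexes depend on the implementation's hash function and bucket count.

#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> map{
        {10, "abc"}, {50, "def"}, {256, "xyz"}, {3, "qwe"}
    };

    // Find out which bucket the key 256 falls into.
    const std::size_t bucket = map.bucket(256);
    std::cout << "key 256 is in bucket " << bucket
              << " of " << map.bucket_count() << "\n";

    // Walk the chain of that single bucket, just like a lookup does.
    for (auto it = map.begin(bucket); it != map.end(bucket); ++it) {
        std::cout << it->first << " -> " << it->second << "\n";
    }
}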


You may notice that the last element of each list points to the first element of the next list. This is done in some implementations to improve the iteration efficiency. Instead of checking bucket by bucket when iterating over all hash map elements, we can use those connections to jump from one non-empty linked list to another much faster.


If we continue adding more elements to the hash map above, at some point the load factor will become too high (for example, 80%), and the hash map will decide to rehash. It will grow the pre-allocated array (e.g. from 8 to 16 buckets), recompute hashes for all existing elements, and place the elements into the new buckets.
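If you know the expected number of elements in advance, you can tune this behavior through the standard interface. A short sketch (the 0.8 threshold and the element count are arbitrary assumptions):

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<int, std::string> map;

    // Rehash once the load factor exceeds 0.8 (the default is 1.0).
    map.max_load_factor(0.8f);

    // Reserving space for the expected number of elements up front
    // avoids repeated rehashing while the map grows.
    map.reserve(1000);

    for (int i = 0; i < 1000; ++i) {
        map.emplace(i, "value");
    }

    std::cout << "buckets: " << map.bucket_count()
              << ", load factor: " << map.load_factor() << "\n";
}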


Standard containers provide reference and iterator stability guarantees, but they are weaker than those offered by ordered containers. References to the existing elements are never invalidated. Iterators to the existing elements can be invalidated only when the addition of a new element causes a rehash, or when rehashing is triggered manually:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> map{
        {"test", 123},
        {"xyz", 456},
        {"abc", 10}
    };

    auto& value = map["abc"];
    auto valueIt = map.find("abc");

    // Might cause a rehash if the load factor
    // is too high
    map.emplace("zzz", 1000);

    // Triggers the rehash manually
    map.rehash(1000);

    // References remain valid in any case:
    std::cout << value << "\n";

    // Iterators are invalidated: the line below
    // is undefined behavior and might crash
    std::cout << valueIt->second << "\n";
}


Since C++17, unordered containers allow node manipulation: it is possible to take a node from one map and move it to another map without copying the key and the value:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> map1{
        {"test", 123},
        {"xyz", 456},
        {"abc", 10}
    };
    
    // Take the node from map1:
    auto node = map1.extract("xyz");
    // We can even change its key
    // (we can also change the value
    // if needed):
    node.key() = "xyzxyz";

    std::unordered_map<std::string, int> map2;
    // Insert the node into another map:
    map2.insert(std::move(node));

    // Prints "xyzxyz: 456":
    for (const auto& [k, v] : map2) {
        std::cout << k << ": " << v << "\n";
    }
}


In the program above, the node is detached from map1, its key is changed in place, and the node is then linked into map2 without copying or moving the key or the value.


Another strategy for dealing with hash collisions is called open addressing. In open-addressing hash maps, each bucket stores at most one element. If the bucket is already occupied, an element with the same hash value goes to some other free bucket. Such hash maps try to group elements in contiguous memory blocks to improve efficiency and make the data structure more CPU cache-friendly. The C++ standard library provides no open-addressing hash maps, but there are plenty of third-party alternatives.

Third-party hash maps

Boost.Unordered is an awesome library that provides a wide range of hash map implementations.

There are boost::unordered_set, boost::unordered_map, boost::unordered_multiset, and boost::unordered_multimap, which are analogs of the std containers, and everything written above applies to them. These containers use a somewhat more complex internal structure with "bucket groups" to improve iteration efficiency.

The library also provides boost::unordered_node_set, boost::unordered_node_map, boost::unordered_flat_set, and boost::unordered_flat_map, which are open-addressing containers. Their internal structure differs from that of the separate-chaining variants:

You can read more about the internal structure in this blog post.


Node-based containers (boost::unordered_node_set, boost::unordered_node_map) still use nodes, which are pointed to by the buckets. These containers provide the same iterator and reference stability guarantees as the std containers and also provide the same API for node manipulation (i.e. the extract method).


In flat containers (boost::unordered_flat_set, boost::unordered_flat_map), keys and values are stored directly in the bucket array. Flat containers are the most efficient, but provide the weakest guarantees: iterators and references to all elements are invalidated when rehashing happens. The node manipulation API is not provided at all. Flat containers might also use more memory than node-based ones, especially if sizeof of the key or the value is large.
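For typical use, the flat containers are close to drop-in replacements for the std ones. A minimal sketch, assuming Boost.Unordered is available; only the stability guarantees and the node API differ:

#include <boost/unordered/unordered_flat_map.hpp>
#include <iostream>
#include <string>

int main() {
    // Largely the same interface as std::unordered_map,
    // minus node manipulation and the strong stability guarantees.
    boost::unordered_flat_map<std::string, int> map{
        {"test", 123},
        {"xyz", 456}
    };

    map.emplace("abc", 10);

    if (auto it = map.find("xyz"); it != map.end()) {
        std::cout << it->first << ": " << it->second << "\n";
    }
}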


Another third-party library that implements open-addressing containers is Folly F14 (provided by Meta). There are a few dictionary variants that are very close to the Boost library containers described above:

  • folly::F14NodeSet is the same as boost::unordered_node_set, folly::F14NodeMap is the same as boost::unordered_node_map.
  • folly::F14ValueSet is the same as boost::unordered_flat_set, and folly::F14ValueMap is the same as boost::unordered_flat_map.
  • folly::F14VectorMap / folly::F14VectorSet keep keys/values packed in a contiguous array, and the buckets contain indexes into that array.
  • folly::F14FastMap / folly::F14FastSet is an umbrella class. It chooses the most efficient implementation (either F14Value or F14Vector) based on the parameters you specify (such as key and value types).


An interesting extra efficiency feature of F14 hash maps is prehashing. For example, if you need to search for the same key multiple times and hashing it is expensive (e.g. the key is a long string), you can prehash it once and then pass the resulting F14HashToken to subsequent hash map calls to avoid re-hashing the same key. The starting point is the prehash method (link).
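A rough sketch of how this can look, based on my reading of the F14 interface; check the linked documentation for the exact signatures:

#include <folly/container/F14Map.h>
#include <iostream>
#include <string>

int main() {
    folly::F14FastMap<std::string, int> map{
        {"some long expensive-to-hash key", 42}
    };

    const std::string key = "some long expensive-to-hash key";

    // Hash the key once...
    const auto token = map.prehash(key);

    // ...and reuse the token in subsequent look-ups
    // so the key is not hashed again each time.
    for (int i = 0; i < 3; ++i) {
        if (auto it = map.find(token, key); it != map.end()) {
            std::cout << it->second << "\n";
        }
    }
}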


You can read more about the internal structure of F14 hash containers in this FB blog post.


Abseil Swiss Tables library (provided by Google) also implements open addressing node-based and flat hash containers: absl::flat_hash_map, absl::flat_hash_set, absl::node_hash_map, absl::node_hash_set. They are similar to the Boost and F14 ones. You can read more about them here and here.


A lesser-known ankerl library (github) claims to be very efficient in most scenarios. The author of this library provides extensive benchmark results for many hash maps in different use cases (blog post). You can follow these results, but take them with a grain of salt. Always test the hash map you pick in the production environment. Benchmarks are always synthetic and do not cover cases when the CPU and memory do other work besides hash map operations. Benchmarks also do not cover the cases when various parts of the hash map memory are not in the CPU cache, which will frequently happen in real programs.

Hash function quality

Hash function quality is important to keep the time complexity of look-up operations at O(1). Flat hash maps are particularly sensitive to hash function quality. An ideal hash function can be defined like this: if a single bit in the key changes, the corresponding hash value changes half of its bits, essentially at random. Such a hash function is called avalanching.

Avalanching hash function behavior

Unfortunately, some C++ standard library implementations of integer and pointer hash functions are non-avalanching. For example, libstdc++ simply returns the pointer or integer value directly without any additional transformations: github.


Advanced hash tables, such as the boost and F14 implementations, deal with this issue by introducing the hash_is_avalanching trait. If the hash function does not declare itself as avalanching, the hash table performs an additional mixing step to improve the hash quality, which comes at an extra cost. If you implement a custom hash function and you know that it is of decent quality, you can mark it as avalanching as shown in this example. Boost uses the is_avalanching property name, and the F14 property is named folly_is_avalanching. Ideally, you should add both.
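For illustration, here is a hedged sketch of how a custom hasher could declare both properties. The UserId type is hypothetical, and the claim that its hash is well-mixed is an assumption for the example; the exact trait detection may vary between library versions.

#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

// A custom key type with a user-provided hash function.
struct UserId {
    std::string value;
};

struct UserIdHash {
    // Tell Boost.Unordered that this hash does not need extra mixing.
    using is_avalanching = std::true_type;
    // The same hint for folly F14 containers.
    using folly_is_avalanching = std::true_type;

    std::size_t operator()(const UserId& id) const noexcept {
        // Assume a well-mixed hash here; std::hash is only a placeholder
        // and is not necessarily avalanching on every standard library.
        return std::hash<std::string>{}(id.value);
    }
};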


If you use key types supported out of the box (e.g. strings, integers, pointers) and the default hash functions provided by boost or F14, they are already marked correctly, and you won't need to think about this unless you implement a custom key type, which will need a custom hash function.

Thread safety

None of the above containers is thread-safe in general, so you will have to implement external synchronization (for example, using mutexes) if one thread may modify the hash map while another thread accesses it.


If all threads only read the map, no synchronization is needed. Make sure that you only call methods marked const: all non-const methods may modify the container. Keep in mind that operator[] is not const and will modify the container. A common pitfall in multithreaded code:

std::unordered_map<std::string, bool> map;

// ...

if (map["key"]) {
  // ...
}


In the code above, the goal is to check whether the value corresponding to the key is true. However, map["key"] will add the key to the map if it is not there yet, with the newly added value set to the default (false in the example above). This means such code is not thread-safe, and it is also suboptimal. A more efficient and thread-safe way to check the same condition is the following:

if (auto it = map.find("key"); it != map.end() && it->second == true) {
  // ...
}


  • Here, we first get the iterator to the element identified by the "key".
  • We then check if the element exists in the map (it != map.end()).
  • If it does, we finally check its value (it->second == true).

In this example, find and end do not modify the map, and the search becomes thread-safe. If the goal was only to check if the key exists in the map, you could simply call map.contains("key").

Thread-safe unordered maps

There are a few implementations of thread-safe hash maps.

  • One option is boost::concurrent_flat_set and boost::concurrent_flat_map from the Boost.Unordered library. Boost containers provide a visitation-based API, which is vastly different from the API provided by the C++ standard library (see the sketch after this list).
  • Another option is folly::ConcurrentHashMap (github). Folly tries to keep its API as close as possible to the C++ standard unordered containers.
  • libcds is a large lock-free container library that provides several implementations of lock-free hash maps (e.g. MichaelHashMap, SkipListMap, SkipListSet, FeldmanHashMap, FeldmanHashSet). This library has not been updated in a while and does not provide detailed documentation, but I am still sharing it because the containers it offers are unique. If your use case implies high contention on a hash map, try these lock-free dictionaries offered by libcds.
  • If efficiency is not a concern, you can synchronize accesses to non-thread-safe maps using mutexes. Alternatively, you can completely avoid concurrent accesses, which is often more efficient than using thread-safe containers or adding synchronization.
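As mentioned in the first bullet above, Boost's concurrent containers replace iterators with visitation. A minimal sketch of what that looks like, assuming a recent Boost.Unordered; consult the library documentation for the exact semantics:

#include <boost/unordered/concurrent_flat_map.hpp>
#include <iostream>
#include <string>

int main() {
    boost::concurrent_flat_map<std::string, int> map;
    map.emplace("key", 1);

    // There are no iterators; elements are accessed through visitation.
    // visit() locks the element internally while the lambda runs.
    map.visit("key", [](auto& entry) {
        ++entry.second;
    });

    // cvisit() is the read-only counterpart.
    map.cvisit("key", [](const auto& entry) {
        std::cout << entry.first << ": " << entry.second << "\n";
    });
}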

Which one do I use?

Let’s look at the summarized points that explain how to pick the most suitable container.

  • If you need to associate a key with a value, pick a map; otherwise, use a set.
  • If it's necessary to keep duplicate keys in the map, use multi containers.
  • If you need the strictest possible iterator and reference stability guarantees, use std, boost, folly, or abseil node-based containers (such as std::unordered_map, boost::unordered_map, boost::unordered_node_map, or folly::F14NodeMap). boost::unordered_node_... and folly will likely be the most efficient.
  • If iterator and reference stability is not important (which means you do not store iterators, references, or pointers to map/set elements externally, or do not expect them to remain valid after the map is modified), then pick a flat container (such as boost::unordered_flat_map or folly::F14FastMap).
  • Flat containers use more memory than node-based ones, especially when sizeof of the key and/or the value is large. If memory usage is a concern, use node-based containers instead.
  • The F14 containers provide prehashing functionality for extra efficiency. If you search/add/remove identical keys multiple times, F14 will make it possible to save on the hashing cost by prehashing the keys.
  • All the above points apply to single-threaded container use (or multi-threaded read access without concurrent modifications). If multi-threaded modifications are required, select one of the thread-safe options listed above. They can show varying performance in the real production code, so consider testing them in your application and comparing the results.