paint-brush
Composite Keys: A Guide on How to Handle Themby@kevinmasur
3,402 reads
3,402 reads

Composite Keys: A Guide on How to Handle Them

by Kevin MasurJanuary 14th, 2024
Read on Terminal Reader
Read this story w/o Javascript

Too Long; Didn't Read

Most of the time, just keep it simple. Combine your composite keys into a string key for storage in a map or cache if that is the easiest option and performance is not a major concern. In scenarios where performance is critical, make sure to do your own testing. But using nested maps will be the most performant in most cases. It will also likely have the smallest storage requirements. And composite keys are still a performant alternative when nesting mappings become impractical.
featured image - Composite Keys: A Guide on How to Handle Them
Kevin Masur HackerNoon profile picture

Composite keys are when a combination of data is required to define the “key” for your map or cache lookup. An example of this might be where you need to cache values based on a customer name as well as a user’s role. In a case like this, your cache would need to be able to store unique values based on each of these two (or more) criteria.


There are a few different ways composite keys can be handled in code.

Combine the Criteria Into a String

The first answer most jump to is combining the criteria into a string to use as the key. It’s simple and doesn’t take a lot of effort:


private String getMapKey(Long userId, String userLocale) {
    return userId + "." userLocale;
}


This is a pretty basic way to handle the problem. Using a string key can make debugging and investigations easier, as the cache key is in a human-readable format. But there are a few problems to be aware of with this approach:


  1. It requires a new string to be created on every interaction with the map. Though this string allocation is generally small, if the map is accessed frequently, it can lead to a large number of allocations that take time and need to be garbage collected. The size of the string allocation can also be larger depending on how large the components of your key are or how many you have.


  2. You need to ensure that the composite key you create can’t get spoofed into another key value:

public String getMapKey(Integer groupId, Integer accessType) {
    return groupId.toString() + accessType.toString();
}


In the above, if you had groupId = 1 and accessType = 23, that would be the same cache key as groupId = 12 and accessType = 3. By adding a separator character between the strings, you can prevent this kind of overlap. But be careful about optional parts of a key:


public String getMapKey(String userProvidedString, String extensionName) {
    return userProvidedString + (extensionName == null ? "" : ("." + extensionName));
}


In the above example, extensionName is an optional part of the key. If the extensionName is optional, userProvidedString could include a separator and valid extensionName and get access to cache data it shouldn’t have had access to.


When using strings, you’ll want to think about how you are combining your data to avoid any collisions in the keys. Especially around any user-generated input for the key.

Use Nested Maps/Caches

Another option is to not combine the keys at all, and instead, nest your data structures (Maps of Maps of Maps):


Map<Integer, Map<String, String>> groupAndLocaleMap = new HashMap<>();
groupAndLocaleMap.computeIfAbsent(userId, k -> new HashMap()).put(userLocale, mapValue);


This has the advantage of not needing to allocate any new memory when interacting with the maps as the passed-in values for the keys are already allocated. And while you will need to do multiple lookups to get to the final value, the maps will be smaller.


But the downside to this approach is it gets more complicated the deeper down the nesting goes. Even with only two levels, the map initialization can look confusing. When you start dealing with 3 or more pieces of data, this can lead to making your code very verbose. On top of that, each level requires null checking to avoid null pointers.


Some “key parts” also might not work well as a map key. Arrays or collections don’t have default equals methods that compare their contents. So, you would either need to implement them or use another alternative.


Using nested maps could also become less space efficient depending on how unique each level of your keys is.

Create a Composite Key Object

The last option is instead of combining the key values into a string, make a custom object for the key instead:

private class MapKey {  
    private final int userId;  
    private final String userLocale;  

    public MapKey(int userId, String userLocale) {  
        this.userId = userId;  
        this.userLocale = userLocale;  
    }  

    @Override  
    public boolean equals(Object o) {  
        if (this == o) return true;  
        if (o == null || getClass() != o.getClass()) return false;  
        MapKey mapKey = (MapKey) o;  
        return userId == mapKey.userId && Objects.equals(userLocale, mapKey.userLocale);  
    }  

    @Override  
    public int hashCode() {  
        return Objects.hash(userId, userLocale);  
    }  
}


While every interaction still requires a new memory allocation for a new object. The object key allocation is significantly smaller than the one needed for a composite string. The reason for this is that the parts that make up the key don’t need to get reallocated as strings. Instead, only the wrapping object key requires new memory.


A composite key object can also allow for customizations in the key equality and hashcode implementations. Such as ignoring capitalization in a string, or using an array or collection as part of a key.


The downside here though is that, again, it requires a lot more code than a composite string does. And it requires ensuring you have valid equals and hashcode contracts in the key class for your map.


So which should I choose?


Generally speaking, I would suggest using a composite string key. It’s simple and easy to understand, requires the least code, and is easiest to debug later. While it is likely the slowest performing, writing simple, readable code is generally more important than the benefits you would get using one of the other two options. Remember:


“Premature optimization is the root of all evil” Donald Knuth


If you don’t have evidence or a reason to believe that your map/cache lookup is going to be a performance bottleneck, go with readability.


But if you ARE in a scenario where the throughput to your map or cache is very high, then it might be good to switch to one of the other two options. Let’s see how all 3 compare to each other performance-wise, as well as with their memory allocation size.


To test the 3 scenarios above, I wrote code that would replicate the same implementation of all 3 scenarios for a composite key. The key itself consists of an integer value, a string value, and a long value. All three implementations used the same test data on each run to build the keys.


All runs were executed with 1 million records in the map (Java’s hashmap was used). 3 runs were done building the key with different combinations of key sizes:


  • 100 ints, 100 strings, 100 longs — 1 million unique keys

  • 1 int, 1 string, 1,000,000 longs— 1 million unique keys

  • 1,000,000 ints, 1 string, 1 long — 1 million unique keys


First, let’s look at how much space each map takes up in the heap. This is important because it affects how much memory is needed to run your application.


Retained size of the map(s) in MB (captured by heap dump after map creation)


There’s one interesting obvious note to make here: in the last scenario (1,000,000 ints), the nested maps size is significantly larger than the others. This is because, in this scenario, the nested maps create 1 first-level map with 1 million entries. Then, for the second and third levels, it creates 1 million maps with only one entry.


All those nested maps store extra overhead and are mostly empty. This is obviously an edge case, but I wanted to show it to make a point. When using the nest maps implementation, the uniqueness (and the ordering of that uniqueness) matters a lot.


If you reverse the order to 1, 1, 1 million, you actually get the lowest storage requirement.


In the other two scenarios, the nested mapping is the most efficient, with the custom key object coming in second, and the string keys coming in last.


Next, let’s look at the time it takes to create each of these maps from scratch:


Metrics were grabbed using Intellij profiler and looking at CPU timings of the map(s) creation method

Metrics were grabbed using Intellij profiler and looking at memory allocations of the map(s) creation method


Again, we see the nested maps performing the worst in the 1 million-1–1 scenario for memory allocation, but even then, it outperforms the others in CPU time. In the above, we can also see how the String key performs the worst in all cases while using a custom key object is slightly slower and requires more memory allocation than the nested keys.


Lastly, let’s look at the highest throughput scenario and how effective it is to read. We ran 1 million read operations (1 for each key created); we did not include any nonexistent keys.


Metrics grabbed using Intellij profiler and looking at CPU timings of the map(s) lookup method (1 million reads)

Metrics grabbed using Intellij profiler and looking at memory allocations of the map(s) lookup method (1 million reads)


This is where we really see how slow the string-based key lookup is. It is by far the slowest and allocates the most memory of any of the 3 options by far. The custom key object performs “close” to the nested maps implementation but is still consistently slower by a small margin.


However, in lookup memory allocations, notice how well the nested maps shine. No, that’s not a glitch in the graph; looking up a value in the nested maps requires no extra memory allocations to do the lookup. How is that possible?


Well, when combining the composite objects into a string key, you need to allocate memory for a new string object every time:


private String lookup(int key1, String key2, long key3) {
    return map.get(key1 + "." + key2 + "." + key3);
}


When using a composite key, you still need to allocate memory for a new key object. But because the members of that object are already created and referenced, it still allocates much less than a new string:


private String lookup(int key1, String key2, long key3) {
    return map.get(new MapKey(key1, key2, key3));
}


But the nested maps implementation requires no new memory allocation on lookup. You are reusing the given parts as keys to each of the nested maps:


private String lookup(int key1, String key2, long key3) {
    return map.get(key1).get(key2).get(key3);
}


So, based on the above, which is the most performant?


It’s easy to see that the nested maps come out on top in almost all scenarios. If you're looking for raw performance in most use cases, it’s likely the best option. Though, you should perform your own testing to confirm your use cases.


The key object makes for a very good general-purpose option when nested maps become impractical or impossible to use for your implementation. And the composite string key, though easiest to implement, is almost always going to be the slowest.


The last point to consider when looking to implement composite keys is that you can combine the above. For example, you could use nested maps for the first level or two, and then use a composite key object for simplifying the deeper levels.


This could still keep your data partitioned for fast lookups while still optimizing storage and lookup performance. And keep your code readable as well.

TLDR;

Most of the time, just keep it simple. Combine your composite keys into a string key for storage in a map or cache if that is the easiest option and performance is not a major concern.


In scenarios where performance is critical, make sure to do your own testing. But using nested maps will be the most performant in most cases. It will also likely have the smallest storage requirements. And composite keys are still a performant alternative when nesting mappings become impractical.


Also published here