Micro Optimization: Don't Get Lost In The Rabbit Hole

I had to settle a performance discussion within my team. Because of a simple PR, I started a 2 weeks journey in the dark twists and turns of javascript. To save you a lot of pain and frustrating questions, I sum up my research in this really long post. I tried my best to show you the train of thoughts but if you don’t care about the details, you can jump to the end for the TL;DR section. # Call to Adventure Everything started with a simple Pull Request on Github. It’s a javascript file, I let you read: To sum-up an engineer of my team wrote a one-liner to check if the value was valid, using the following code: [ , ].includes(value); 'valid_value1' 'valid_value2' And another one suggested a ‘performance improvement’ using instead: VALID_VALUES = ([ , ]); VALID_VALUES.has(value); const new Set 'valid_value1' 'valid_value2' The argument is about complexity saying we are moving from O(N) to O(1). The author of the PR started to argue that it was not an improvement as the time to create the Set from an array is O(N). Disclaimer, after 10 years of development, I built a strong opinion about this This is called micro-optimisations, and they don’t make any difference in web development (not true for video games). : in term of Performance, it DOES NOT MATTER! So I used my manager authority to stop the discussion. # A quest for the truth I could have just forget about this conversation and go back to my life. But I was afraid it would happen again. Indeed, I noticed in my career, that tech team don’t like managers to take decisions for them. Developers want to understand the reason. So I had to settle that once for all, and I started to dive deep… As every programmer, I started googling to find a benchmark. JSPerf is a great resource. It lists hundreds of benchmarks just to compare array and set. The result was not the one I expected. In experiment (A) is faster than but not in experiment (B). The two most popular tests have opposite conclusions. array.includes() set.has() # Benchmark: allies or enemies? The reason they have different conclusions is that the two benchmarks don’t actually test the same thing. Can you spot the problem? EXPERIMENT (A): keys = [ , , , , ] set = (keys); randomKey = keys[ .floor( .random() * keys.length)]; keys.includes(randomKey) set.has(randomKey) // SETUP const 'build' 'connected' 'firmware' 'model' 'status' const new Set // RUN const Math Math // --> Faster EXPERIMENT (B): keys = [...Array( )] .map( .fromCharCode(index + )); set = (keys); randomKey = .fromCharCode(~~( .random() * keys.length * ) + ); keys.includes(randomKey) set.has(randomKey) // SETUP var 512 ( ) => _, index String 32 var new Set // RUN const String Math 2 32 // --> Faster They are 2 main differences: Experiment (A) only has 5 items in the array, as (B) has 512. Experiment (A) only test for (i.e. when the value is in the array). As (B) test for both and . hit hit miss A is obviously slower for an array because that’s the worst-case scenario. You have to loop over all the items to know the element is not present! You are trying to compare the best case complexity to the average complexity. That was my first hard lesson. miss Do not trust benchmarks! # A benchmark to rule them all? As we saw the size of the array will make the performance vary. I wanted to understand what was the threshold. When does a start becoming more efficient than an ? set array As most of the developers, I thought I would be able to resolve this mystery by writing my own code. I focused on and so I am voluntarily excluding the construction of the Set from the benchmark. .has() .includes() SIZE = ; GLOBAL_ARRAY = []; ( i = ; i < SIZE; i++) { GLOBAL_ARRAY.push( + i); } GLOBAL_SET = (GLOBAL_ARRAY); LAST_KEY = + (SIZE - ); suiteHasVsIncludes = Benchmark.Suite; // SETUP - not included in the perf measure. var 1000 var for var 0 'key_' var new Set var 'key_' 1 var new GLOBAL_ARRAY.includes( ); GLOBAL_SET.has( ); GLOBAL_ARRAY.includes(LAST_KEY); GLOBAL_SET.has(LAST_KEY); GLOBAL_ARRAY.includes( ); GLOBAL_SET.has( ); // BENCHMARK on MISS 'key_unknown' 'key_unknown' // BENCHMARK on HIT - WORSE CASE SCENARIO // BENCHMARK on HIT - BEST CASE SCENARIO 'key_0' 'key_0' As expected the time for a lookup using a doesn’t vary much with the Size, as it should have a complexity of O(1). The cost of a miss in the array is linear until 5 000 (hard to see because of the log scale in X-axis). So far it’s coherent with the complexity, as we have to loop over all the items O(N). However, there is a massive drop after. Set To conclude, if you only compare and until you reach a SIZE of 5 000, arrays perform x8 times faster, after a 100 000 is x10K times faster. set.has() array.includes() set.has() . I am not better than everyone else. For instance, I am using the same key, what if there is some kind of cache mechanism under the hood? I also realised that performance tests are not reliable when you run them on your local computer. Once again, do not trust my own benchmark As you are not in a controlled environment, many processes are competing for the CPU. Two consecutive runs can have different results (especially if you are listening to Spotify in the background). I tried running the same code 10 times on my local machine and I had up to x3 factor difference between tries. Try 1: array.includes() x 27,033 ops/sec ±41.13% (82 runs sampled) Try 2: array.includes() x 9,286 ops/sec ±15.05% (83 runs sampled) The library I used actually tells you to be careful. There was a ±41.13% difference within the 82 runs. It’s pretty fishy and you should probably discard this run. Benchmark.js # A remaining Snag I was first happy with this result. It was coherent with my understanding of the complexity. Search in an is O(N) while uses an O(1) lookup in a Hashtable. array set There was one thing that was still bothering me. With SIZE<5000, is performing 8 times better than array.includes() set.has() I couldn’t live with this incoherence, how could I explain that smaller Array actually performs much better than Set. # The cave: a step further in the dark I started to poke around on the internet, and I found this article: . Optimizing hash tables: hiding the hash code "Set, Map and WeakSet and WeakMap all use hash tables under the hood. A is used to map a given key to a location in the hash table. A is the result of running this hash function over a given key. hash function hash code , independent of the object value. Therefore, we can’t recompute it, meaning we must store it." In V8, the hash code is just a random number When I read the last line, I couldn’t believe it. In javascript, a hashcode is not the result of a hash function, but a random number?! Why is it called a Hash table? I was completely lost 🤔. Then I figured it out. Let’s take an example of two players: player1 = { : , : }; player2 = { : , : } set = (); set.add(player1); set.add(player2); player2.score = ; set.has(player2) var name "Alice" score 87 var name "Bob" score 56 var new Set 66 // -> true In Javascript, , for instance, the score can change. So we can’t use the object content to generate a unique hash. If Bob improves his score the hash will be different. Moreover, the memory position can not be used because the garbage collector moves objects around. That’s why it generates a random number that is stored with the object. Objects are mutable (1) 🔬 The implementation for Both and are using a random number as explained in the following article What about other language? Java OpenJDK 7 OpenJDK 6 How does the default hashCode() work? (2) In , you simply can’t hash (some) mutable objects: Python mylist = [] d = {} d[mylist] = Traceback (most recent call last): File , line , TypeError: unhashable type: 1 " " 1 in 'list' That was the first revelation, hash function in theory and in practice are actually two different things. That could explain performance difference. But wait a minute, the code of the benchmark is: GLOBAL_SET.has( ); 'key_0' the key is not a mutable object, it’s a , an primitive data type in Javascript! string immutable (3) 🔬 vs Do not confuse the primitive and the standard built-in Object. that is a wrapper around the primitive type, and as an Object String are mutable. string String string String (4) name = ( ); var new String 'Alice' // returns a mutable object. Why can’t we use a hash function for immutable keys like ? I had to know. I was back to the beginning, so I made a final move. string The implementation of Set in node.js actually relies on Google V8 engine. As it’s open-source, I looked at the code… # The crossing: truth is in the code From here we will dive deep into optimised C++ code. I know it’s not easy to read. I did my best to only pinpoint essential parts of the implementation, but feel free to trust me and skip the code samples. First I grep for in v8 codebase, I ended up with +3000 results, so I realised that was a bad idea 😅. I looked for to narrow it down, as they both rely on the same hashmap implementation. I found my entry point: Set WeakSet { public: explicit BaseCollectionsAssembler(compiler::CodeAssemblerState* state) : CodeStubAssembler(state) {} virtual ~BaseCollectionsAssembler() = ; protected: enum Variant { kMap, kSet, kWeakMap, kWeakSet }; AddConstructorEntry(Variant variant, TNode context, TNode collection, TNode add_function, TNode key_value, Label* if_may_have_side_effects = nullptr, Label* if_exception = nullptr, TVariable * var_exception = nullptr); : class BaseCollectionsAssembler public CodeStubAssembler default // Adds an entry to a collection. For Maps, properly handles extracting the // key and value from the entry (see LoadKeyValue()). void Object Object Object Object As you can see most of the code is actually shared between , , and . Map Set WeakMap WeakSet We already knew that by reading the v8 blog from the last section. Allocate Memory TNode CollectionsBuiltinsAssembler::AllocateTable( Variant variant, TNode at_least_space_for) { (variant == kMap || variant == kWeakMap) { AllocateOrderedHashTable (); } { AllocateOrderedHashTable (); } } if return else return The memory is allocated via method and it calls . I am going to spare you a couple of jumps. I ended up looking at the constructor of the class AllocateTable AllocateOrderedHashTable OrderedHashTable EnsureGrowable(Isolate* isolate, Handle table); // OrderedHashTable is a HashTable with Object keys that preserves // insertion order. There are Map and Set interfaces (OrderedHashMap // and OrderedHashTable, below). It is meant to be used by JSMap/JSSet. template , > : class Derived int entrysize class OrderedHashTable public public // Returns an OrderedHashTable (possibly |table|) with enough space // to add at least one new element. static To optimise memory, has two different implementations, depending on the size required. OrderedHashTable is similar to the , except for the memory layout uses byte instead of Smi (SMall Integer) as the hash key. It starts with 4 buckets, then it doubles capacity until it reaches 256. SmallOrderedHashTable OrderedHashTable Beyond that limit, the code rehashes every item into a new . This doesn’t explain by itself the fact that Set is less performant for small size than an array. OrderedHashTable are designed to provide O(1) access time to N items. To do so they allocate M memory slots, called buckets. To avoid collision they pick M >> N. Collision are stored as a link list in a bucket. are not designed to list all N elements. For that you would need to loop over all the M buckets even if they are empty. Javascript specifies that all collections have a method, that allows you to iterate efficiently over the keys. An maintains another list of keys inserted in order. (it’s a tradeoff between speed and memory). 🔬 What’s the difference between HashTable and OrderedHashTable? Hash table Hash table .keys() OrderedHashTable set = ([ , ]); set.keys() var new Set 'key1' 'key2' // -> we want to iterate over the keys Find a Key We know how a Set is stored in memory. The next step is to see how we find a key. When you call the method you end up calling an , that will call set.has() OrderedHashTableMethod::hasKey() OrderedHashTableMethod::FindEntry() GetHash(); (hash->IsUndefined(isolate)) kNotFound; entry = HashToFirstEntry(Smi::ToInt(hash)); (entry != kNotFound) { Object candidate_key = KeyAt(entry); (candidate_key->SameValueZero(key)) entry; entry = GetNextEntry(entry); } kNotFound; } template > : class Derived int SmallOrderedHashTable if return int // Walk the chain in the bucket to find the key. while if return return We are almost there! We only need to understand how works. If you don’t want to read the code let me sum up for you. Depending on the type of the Isolate (Object, String, Array), the hash function will differ. key->GetHash() For Object the hash is a random number (we already discovered that in the previous part), for the hash code is generated by iteration on every character. string Isolate::GenerateIdentityHash( mask) { hash; attempts = ; { hash = random_number_generator()->NextInt() & mask; } (hash == && attempts++ HashString(String , start, length, seed) { DisallowHeapAllocation no_gc; (length > String::kMaxHashCalcLength) { StringHasher::GetTrivialHash(length); } :: buffer; Char* chars; ( .IsConsString()) { DCHECK_EQ( , start); DCHECK(! .IsFlat()); buffer.reset( Char[length]); String::WriteToFlat( , buffer.get(), , length); chars = buffer.get(); } { chars = .GetChars (no_gc) + start; } StringHasher::HashSequentialString (chars, length, seed); } // String Hash returns hash based on iteration for each character. template typename uint32_t string size_t int uint64_t if return std unique_ptr const if string 0 string new string 0 else string return If you are familiar with C++, you could wonder why they don’t use the implementation of a HashTable. From my understanding is that because a HashTable in the standard lib is typed. You can only insert object of the same type. In javascript they need an extra wrapper to be able to store different types in the same set. Moreover in the Set are actually implemented as a binary search tree, not as a HashTable, so lookup is O(Nlog(N)). 🔬 C++ Standard Library std::lib std:lib # The Truth about Array Now we understand how is working in Javascript, to be able to understand why performs better for the smaller sizes we need to understand how it’s implemented. Set array Here I am going to spare you the C++ code. Instead, I will refer to this post from the V8 blog. “JavaScript objects can have arbitrary properties associated with them. However JS engine is able to optimize properties whose names are purely numeric, most specifically .” array indices In V8, properties with integer names — array indices — are handled differently. If the external behaviour of numerical property is the same as non-numeric properties, V8 chooses to store them separately for optimization purposes. Internally, V8 calls numeric property: . Objects have named- that map to values, whereas arrays have indices that map to . elements properties elements If we dive even deeper, the are 6 types of elements: On one hand, are used when the array is full and has no hole. In this case, the underlying memory layout is optimised in C++. PACKED_ELEMENT , on the other hand, are used for sparse arrays, every time you create a hole in an array the memory layout change from to and you can never go back to HOLEY_ELEMENTS PACKED HOLEY PACKED The problem with is that v8 cannot guarantee that they will have a value to return, and because prototype can be overridden, v8 has to go up the chain of the prototype to check if there is value somewhere. HOLEY_ELEMENTS In we know the value exists so we don’t inspect the full prototype allowing ! PACKED_ELEMENTS fast access 🚀 As a reminder, in my benchmark I set up my array using the following code: SIZE = ; GLOBAL_ARRAY = [] ( i = ; i < SIZE; i++) { GLOBAL_ARRAY.push( + i); } var 1000 var for var 0 'key_' By doing that I am creating a type. So because v8 knows there is no hole in my array, it will optimise the underlying memory and use . If I had initialised my array differently, for instance, then I would make sure the creation is faster as v8 will allocate the right amount of memory upfront. PACKED_ELEMENTS fast access var = GLOBAL_ARRAY = new Array( SIZE ) But this would create holes, and I would end up with so the lookup would be slower as I can’t use anymore. HOLEY_ELEMENTS fast access Here we are, the final conclusion 💪. Set uses spare memory access and properties, whereas array has indices that map to elements compactly stored in memory allowing extra . HOLLEY_ELEMENTS fast access I was finally satisfied. Until you reach 5 000 elements the complexity overhead to go through the entire array is negligible compared to the memory access improvements made by V8 to store arrays. How far do you go? I had a cold realisation: this journey took me 2 weeks of deep research. I could even go further. It reminded me of things I learned 10 years ago when I started my career in C++. You also have micro optimisation for compiled language with no Garbage collector. 🔬 This one is a classic one in interview, it says that actually creates a copy of i then increment the variable then replace i with the new value. But is directly incrementing the value, so it’s faster. i++ vs ++i C++ i++ ++i You can find much more examples like this one in I also rediscover the blog by , where he explains how he made the video game fit in 2MB RAM with a lot of optimisation tricks. Effective C++ by Scott Meyers. Andy Gavin the creator of Crash Bandicoot This is a never-ending journey. So I decided to stop here and write this article. I probably already went too far. The hard question then is: “Was this quest worth it?” 🚀 TL;DR & Conclusion This article is the end of a 2 weeks journey. That’s a long time to settle a performance argument. So here are my main conclusions: On server-side, the bottleneck will always be network calls to third parties, I/O access and database queries. On the frontend side, the bottleneck will be the modification of your DOM. Even if React has a virtual DOM, a bad implementation of a render method can ruin your performance. Micro Optimisations are irrelevant for Web Development. . Most of the time, micro optimisation will make your code hardly understandable, and challenging to test. In the long term, you will save more time by having a clear code. Put that into perspective, if a dev loses an hour to understand a micro-optimised code, that saves you 0.1ms per run, you need 30 million runs to have a positive ROI. Focus on what really matters, code should be easy to test and easy to read 🏎 If you want to talk about anyway: performance You probably should never have to handle 100 000 items in your javascript web application. Instead, try to use pagination, or map-reduce to avoid large data handling. Do not trust benchmarks! Depending on the implementation they may give you different results. If you have a specific issue write your own benchmark to make sure you test with your data. Make your code as close as possible as the problem you have.If you ever decide to trust a benchmark, run it on a controlled environment, like a dedicated instance with nothing else running on. 🧲 If you want to talk about anyway: complexity There is a massive difference between the theoretical complexity and the real implementation of a data structure like Set. You may have preconceived ideas, based on simplified descriptions used in books. Because objects are mutable and the garbage collector moves objects around, v8 uses random number stored on the object instead of hashing the content. Time is not the only factor you want to optimise, there are tradeoffs between space and memory access (disk vs RAM). V8 will optimise packed array where there is no hole. It’s always interesting to go back to the source code. It reminds us than javascript is not a magic language and the flexibility comes with a price on performance. That’s why video games mostly use C++.