There is an interactive version on wordsandbuttons.online.

Reading disassembly is more like reading tracks than reading a book. You have to know the language to read a book, but reading tracks, although it gets better with skill and experience, mostly requires attentiveness and logical thinking.

Most of the time we browse disassembly to answer one simple question: does the compiler do what we expect it to? In 3 simple exercises, I’ll show you that you too can answer such questions even if you have never seen disassembly before. I’ll use C++ as a source language, but what I’m trying to show is more or less universal, so it doesn’t matter if you write in C or Java, C# or Rust: if you compile to some sort of machine code, you can benefit from understanding your compiler.

1. Compile time computation

Any decent compiler tries to make your code do as little work as possible. Sometimes it can even conduct the whole computation at compile time, so your machine code will simply contain the answer.

This source code defines the number of bits in a byte, provides a template function that accepts the type T and returns the size of T in bits, then calls it from main with T = int.

    static int BITS_IN_BYTE = 8;
    template<typename T>
    size_t bits_in(){return sizeof(T) * BITS_IN_BYTE;}
    int main(){return bits_in<int>();}

Since the compiler knows the size of int, it can compute bits_in<int>() at compile time. But since it isn’t guaranteed by the standard, it might not.

Now look at two possible disassemblies for this source code and decide which variant does the compile time computation and which doesn’t.

Variant A

    01021002  in   al,dx
    01021003  mov  eax,dword ptr ds:[01023000h]
    01021008  shl  eax,2
    0102100B  pop  ebp
    0102100C  ret

Variant B

    003C1000  push 20h
    003C1002  pop  eax

Image: by Karl Friedrich Herhold (own work), CC BY 3.0 (http://creativecommons.org/licenses/by/3.0), via Wikimedia Commons.

Well, that’s a no-brainer. Of course variant B does.
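As a side note that goes beyond the original exercise: when the compile time computation has to be guaranteed rather than hoped for, C++11 lets you ask for it. The sketch below is a variation on the example, not the code the listings were produced from. It assumes a C++11 compiler, marks the function constexpr, and swaps the hand-written BITS_IN_BYTE for the standard CHAR_BIT macro; the name int_bits is made up for illustration.

```cpp
#include <climits>
#include <cstddef>

// Same idea as bits_in from the article, but marked constexpr so the
// result can be used where the language requires a compile time constant.
// CHAR_BIT (from <climits>) stands in for the article's BITS_IN_BYTE.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// If bits_in<int>() could not be evaluated at compile time,
// this declaration would be a compilation error.
static_assert(bits_in<int>() == sizeof(int) * CHAR_BIT,
              "bits_in<int>() must be a constant expression");

// A constexpr variable must also be initialized at compile time.
// int_bits is a hypothetical name, not part of the article.
constexpr std::size_t int_bits = bits_in<int>();
```

Even so, constexpr only forces compile time evaluation where the language demands a constant, as in the static_assert. An ordinary runtime call may still be compiled either way, so the disassembly remains the final word.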
On a 32-bit platform, the size of int is 4 bytes, which is 32 bits, which is 20h in hexadecimal, and 20h is exactly what variant B pushes. You might not know the convention by which a function returns a size_t in eax, but you can see that the first variant is long enough to contain an actual multiplication, while the second one has only two lines: something with the computed answer and the other one.

2. Function inlining

Calling a function implies some overhead: preparing the input data in a particular order, shifting the execution to another piece of memory, preparing the output data, and then shifting back.

If you only use the function once, you don’t have to actually call it. It just makes sense to inline the function body at the place it is called from and skip all the formalities. Compilers can do this for you.

This code:

    inline int twice(int x){return x + x;}
    int main(){return twice(2);}

may virtually become like this:

    // not really source code, just explaining the idea
    int main(){return 2 + 2;} // twice gets inlined here

But the standard does not guarantee that all the functions marked inline will get inlined. Now look at the two disassembly variants below and choose the one in which the function twice gets inlined after all.

Variant A

    00E71002  in   al,dx
    00E71003  mov  eax,2
    00E71008  add  eax,2
    00E7100B  pop  ebp
    00E7100C  ret

Variant B

    00261002  in   al,dx
    00261003  mov  eax,dword ptr [x]
    00261006  add  eax,dword ptr [x]
    00261009  pop  ebp
    0026100A  ret
    ...
    008F1010  push ebp
    008F1011  mov  ebp,esp
    008F1013  push 2
    008F1015  call twice (08F1000h)
    008F101A  add  esp,4
    008F101D  pop  ebp
    008F101E  ret

Image: by Lensim at English Wikipedia (use “Michael Lensi” for attribution; transferred from en.wikipedia to Commons), CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html), via Wikimedia Commons.

Not really a mystery either. It’s variant A.
You might not know that the instruction to call a function is actually called “call”, but since the disassembly of variant A contains no mention of twice, it must be inlined.

3. Loop unrolling

Just like calling functions, running loops implies some overhead. You have to increment the counter, then compare it against some number, then jump back to the beginning of the loop. Compilers know that in some contexts it is more effective to unroll the loop, that is, to do something several times in a row instead of messing with the counter comparison and jumping here and there.

So given these two similar variants of source code with their respective disassembly, please choose the one that actually has an unrolled loop.

Variant A

    int main(int argc, char**)
    {
        int result = 1;
        for(short int i = 0; i < 4; ++i)
            result *= argc;
        return result;
    }

And the respective disassembly:

    00EB1002  in   al,dx
    00EB1003  mov  edx,dword ptr [argc]
    00EB1006  mov  eax,1
    00EB100B  mov  ecx,4
    00EB1010  imul eax,edx
    00EB1013  dec  ecx
    00EB1014  jne  main+10h (0EB1010h)
    00EB1016  pop  ebp
    00EB1017  ret

Variant B

    int main(int argc, char**)
    {
        int result = 1;
        for(size_t i = 0; i < 4; ++i)
            result *= argc;
        return result;
    }

With this:

    00BF1002  in   al,dx
    00BF1003  mov  ecx,dword ptr [argc]
    00BF1006  mov  eax,ecx
    00BF1008  imul eax,ecx
    00BF100B  imul eax,ecx
    00BF100E  imul eax,ecx
    00BF1011  pop  ebp
    00BF1012  ret

Image: by NASA / Buzz Aldrin (NASA original upload; ALSJ AS11-40-5877), public domain, via Wikimedia Commons.

And it’s variant B.

Once again, you might not know that j<something-something> is the family of jump instructions and that cmp stands for “compare”, but variant B clearly has a repeating pattern, while variant A has some address manipulation instead.

Conclusion

You could argue that these examples were made up deliberately to be obvious. That’s only half true. I did refine them to be more demonstrative, but conceptually they are all taken from my own practice.
Using static dispatch instead of dynamic made our image processing pipeline up to 5 times faster. Repairing broken inlining helped to prevent a 50% loss of performance in an edge-to-edge distance function. And changing the counter type to enable loop unrolling is my favorite optimization ever. It only won us about 10% on matrix transformation for software rendering, but all it cost was changing short int to size_t in one place.

Even with somewhat simplified examples, my point remains valid. You can read disassembly to some degree without learning assembler, and you sure can benefit from reading it.

Of course, without proper skill and knowledge, you might not always succeed, but you would definitely not succeed without trying.
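If you want to try the three exercises on your own compiler, the sources can be gathered into one file. This is only a sketch under a few assumptions. The article does not say which compiler or flags produced its listings, so your output will differ in detail. The exercise_* wrapper names are made up here so that each exercise is easy to find by name, and they take the place of the three separate main functions.

```cpp
#include <cstddef>

// Exercise 1: compile time computation.
static int BITS_IN_BYTE = 8;

template <typename T>
std::size_t bits_in() { return sizeof(T) * BITS_IN_BYTE; }

// Hypothetical wrapper; the article returns this value from main.
std::size_t exercise_bits() { return bits_in<int>(); }

// Exercise 2: function inlining.
inline int twice(int x) { return x + x; }

// Hypothetical wrapper; look for a call to twice inside it.
int exercise_twice() { return twice(2); }

// Exercise 3: loop unrolling, once per counter type.
// The parameter plays the role of argc: a value unknown at compile time.
int exercise_loop_short(int base) {
    int result = 1;
    for (short int i = 0; i < 4; ++i)
        result *= base;
    return result;
}

int exercise_loop_size_t(int base) {
    int result = 1;
    for (std::size_t i = 0; i < 4; ++i)
        result *= base;
    return result;
}
```

With GCC or Clang, a command along the lines of g++ -O2 -S exercises.cpp (the file name is an assumption) writes the generated assembly to exercises.s. Find the labels containing exercise_ and ask the same three questions: is there a multiplication or just a constant, is there a call, is there a jump back?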