A Simplified Explanation of the “Meltdown” CPU Vulnerability

I just read the white paper on the “Meltdown” CPU security bug because I was curious about what exactly was going on here. The white paper explains it in detail, but it’s fairly long and academic, so I thought a simpler overview was in order.

The Meltdown attack is a cunning way of bypassing the security checks of many modern CPUs. It allows any process to read kernel mode memory on unpatched operating systems.

This article is about how it actually works — you can read more about the implications of these vulnerabilities at the Meltdown site and elsewhere.

Firstly, calling this a “bug” (as many are) is probably overstating it a little: the CPU functions as advertised. The problem is that someone figured out how to read the side effects caused by operations that should otherwise have been prevented.

What follows is a little technical but I’ll try to explain it as simply as possible. Certainly I’ve glossed over the details — the point here is just to convey conceptually how it works.

Before getting to the actual cause of the problem, though, there are a few prerequisites you’ll need to understand.

Prerequisite #1 — CPU Memory Cache

The CPU cache is a block of memory on the CPU that stores recently accessed pages of data. Because these pages are cached directly on the CPU, access to them is much faster than going all the way back to main memory.

When the CPU reads from main memory it caches a copy of that page in the CPU cache. The next time it needs to read from main memory, if the page is cached, it can load the value from the much faster on-CPU cache.

For this discussion each page is 4096 bytes and the cache is big enough to store many of these pages (depending on the CPU). Strictly speaking, real caches work in smaller units called cache lines (typically 64 bytes), and 4096 bytes is the size of a memory page, but the simplification doesn’t change how the attack works.
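
To make the hit/miss idea concrete, here is a toy model of a cache in Python. It isn’t how real hardware works: the `ToyCache` class and the `FAST` and `SLOW` latencies are invented for this sketch, and the only point is that the second read of a page is quicker than the first.

```python
# Toy model of a CPU cache: not real hardware, just the hit/miss idea.
PAGE_SIZE = 4096          # granularity used in this article
FAST, SLOW = 1, 100       # made-up latencies in arbitrary "cycles"

class ToyCache:
    def __init__(self):
        self.cached_pages = set()        # page numbers currently held in the cache

    def read(self, address):
        """Return the simulated latency of reading one byte at `address`."""
        page = address // PAGE_SIZE
        if page in self.cached_pages:
            return FAST                  # hit: served from the on-CPU cache
        self.cached_pages.add(page)      # miss: fetch from main memory, keep a copy
        return SLOW

cache = ToyCache()
first = cache.read(5 * PAGE_SIZE + 10)   # first touch of page 5: a miss
second = cache.read(5 * PAGE_SIZE + 99)  # same page again: a hit
print(first, second)
```

The difference between those two numbers is the only thing the attack described later needs to be able to measure.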

Prerequisite #2 — Out of Order Execution

Each CPU core has multiple execution units, each capable of executing different types of operations.

eg: an Intel i7 has between 17 and 20 execution units: 2x integer arithmetic, 1x divide, 1x load memory, 1x store memory etc…

In order to keep these units as busy as possible the CPU will look ahead at upcoming instructions and start executing them on different execution units while waiting for the current instruction to complete.

(That’s a grossly over-simplified view but sufficient for this discussion. In practice, instructions are broken down into micro-ops which are dispatched to execution units).

Prerequisite #3 — Privileged Mode

At any point in time the CPU is operating in either privileged mode or non-privileged mode.

Kernel code runs in privileged mode and is allowed access to all mapped memory.
User code runs in non-privileged mode and will fail with an exception if it attempts to read or write memory marked as privileged.

Basically the operating system runs in privileged mode and installed programs run in non-privileged mode.
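
As a rough illustration of that rule, here is a toy privilege check in Python. The `KERNEL_BASE` boundary, the `AccessViolation` exception and the `load` function are all made up for this sketch. A real CPU enforces the rule in hardware using permission bits in the page tables, not an address comparison like this one.

```python
KERNEL_BASE = 0x8000   # hypothetical: addresses at or above this are privileged

class AccessViolation(Exception):
    """Stands in for the exception the CPU raises on an illegal access."""

def load(memory, address, privileged_mode):
    """Read one byte, applying the privilege rule described above."""
    if address >= KERNEL_BASE and not privileged_mode:
        raise AccessViolation(f"user-mode read of {address:#x}")
    return memory[address]

memory = {0x1000: 7, 0x9000: 21}   # one user byte, one kernel byte

print(load(memory, 0x1000, privileged_mode=False))  # user code, user memory: allowed
print(load(memory, 0x9000, privileged_mode=True))   # kernel code, kernel memory: allowed
try:
    load(memory, 0x9000, privileged_mode=False)     # user code, kernel memory
except AccessViolation as exc:
    print("blocked:", exc)
```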

The Problem

If you think about the above it appears that user mode (non-privileged) code shouldn’t be able to read kernel mode memory because the CPU will throw an exception.

The problem is that:

Privileged mode checks aren’t performed until the instruction is completed (ie: not during the out-of-order execution).
The out-of-order execution of upcoming instructions causes side effects that can be observed.

Here’s How it Works

Firstly, the attacker allocates a block of memory consisting of 256 pages (ie: 256 * 4096 bytes, one page for each possible value of a byte). Each page in this block of memory won’t be cached at this point because it has never been accessed.
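
The layout arithmetic for that block can be sketched in a few lines of Python. The names `probe` and `slot_offset` are my own, and this only shows the sizes and offsets involved: a Python `bytearray` tells you nothing about real page alignment or cache state.

```python
PAGE_SIZE = 4096      # stride between pages, as in the article
NUM_VALUES = 256      # a byte can take 256 different values

probe = bytearray(NUM_VALUES * PAGE_SIZE)   # 1,048,576 bytes, ie: 1 MiB

def slot_offset(byte_value):
    """Offset of the first byte of the page that represents `byte_value`."""
    return byte_value * PAGE_SIZE

print(len(probe))          # total size in bytes
print(slot_offset(0))      # page 0 starts at the beginning of the block
print(slot_offset(21))     # page 21 starts 21 pages in
print(slot_offset(255))    # start of the last page
```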

Next, a sequence of code similar to the following is executed. This is called the “sender”:

1. Perform some instruction that will throw an exception (it doesn’t really matter what).
2. Read a byte from privileged memory. Let’s call this the “secret”.
3. Multiply the secret by 4096 (the cache page size).
4. Use that multiplied value as an index into the allocated block of memory and read one byte (ie: read one byte from page N, where N is the secret).

Assuming the CPU starts out-of-order execution of instructions 2–4 before instruction 1 completes, instruction 4 will cause one page of the allocated block of memory to be cached on the CPU. Which page gets cached depends directly on the byte read from kernel mode memory. eg: if the secret is 21 then page 21 of the allocated memory block (counting from zero) will now be cached on the CPU.
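
Here is a toy simulation of the sender in Python, just to show the order of events. The `ToyCPU` class, the kernel address and the use of `RuntimeError` for the exception are all invented for this sketch. Real out-of-order execution can’t be expressed in Python, so the “early” steps are simply written before the exception is raised.

```python
PAGE_SIZE = 4096

class ToyCPU:
    """Toy model: kernel memory plus a record of which probe pages are cached."""
    def __init__(self, kernel_memory):
        self.kernel_memory = kernel_memory   # address -> byte value
        self.cached_pages = set()

    def touch_probe(self, offset):
        # Reading any byte of a page pulls that page into the cache.
        self.cached_pages.add(offset // PAGE_SIZE)

def sender(cpu, kernel_address):
    # Steps 2-4 run "out of order" here, ie: before step 1 takes effect.
    secret = cpu.kernel_memory[kernel_address]    # step 2: read the secret byte
    offset = secret * PAGE_SIZE                   # step 3: scale by the page size
    cpu.touch_probe(offset)                       # step 4: read one byte at that index
    # Step 1 finally completes: the exception fires and `secret` and
    # `offset` are thrown away, but the cache entry stays behind.
    raise RuntimeError("exception from step 1")

cpu = ToyCPU({0xFFFF0000: 21})
try:
    sender(cpu, 0xFFFF0000)
except RuntimeError:
    pass   # the program never gets to see `secret` directly

print(cpu.cached_pages)   # the cache state now depends on the secret
```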

Finally, the “receiver” observes the side effects of this out-of-order execution to determine the secret byte that was read.

1. Catch the exception thrown by instruction 1 above.
2. Loop through every page in the allocated block of memory and time how long it takes to read one byte from each page.
3. If the byte loads quickly then the page must have been cached, and its page number gives away the secret.

Continuing the example from above, pages 0 through 20 and 22 through 255 of the allocated memory block would be slow to read, but page 21 would be considerably faster, so the secret value must be 21.
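
The receiver’s loop can be sketched the same way. This is again a toy: `timed_read` returns made-up latencies instead of measuring anything, and the starting cache state `{21}` is simply assumed to be what the sender left behind. A real receiver would time each read with a high-resolution timer and compare the result against a threshold.

```python
NUM_PAGES = 256
FAST, SLOW = 1, 100        # made-up latencies in arbitrary units

def timed_read(cached_pages, page):
    """Simulated time to read one byte from `page` of the allocated block."""
    return FAST if page in cached_pages else SLOW

def receiver(cached_pages):
    # Time one read per page, then pick the page that was fastest.
    timings = [timed_read(cached_pages, page) for page in range(NUM_PAGES)]
    fastest = min(range(NUM_PAGES), key=timings.__getitem__)
    return fastest, timings

cached_pages = {21}        # assumed state left behind by the sender
secret, timings = receiver(cached_pages)
print(secret)
print(timings[20], timings[21], timings[22])
```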

Now just repeat everything above for each secret byte you want to read. The white paper says they were able to read at up to about 500 KB/second.
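
Putting the pieces together, the repeat-per-byte loop might look like this toy simulation. The `leak_byte` function, the kernel addresses and the “secret” text are all invented, and nothing here touches real kernel memory: a Python `set` plays the part of the cache and a `PermissionError` plays the part of the CPU exception.

```python
def leak_byte(kernel_memory, address):
    cached_pages = set()                 # the probe block starts out uncached
    # Sender: the early read leaves exactly one probe page cached.
    try:
        cached_pages.add(kernel_memory[address])
        raise PermissionError("privilege check fails at completion")
    except PermissionError:
        pass                             # the receiver catches the exception
    # Receiver: find the one page that would now read quickly.
    for page in range(256):
        if page in cached_pages:
            return page

secret_text = b"hunter2"                 # made-up kernel data
base = 0xFFFF0000                        # made-up kernel address
kernel_memory = {base + i: b for i, b in enumerate(secret_text)}

# One full sender/receiver round per byte.
leaked = bytes(leak_byte(kernel_memory, base + i) for i in range(len(secret_text)))
print(leaked)
```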