Timing is Everything

If you’ve read today’s tech news, you’ve heard about the new Meltdown and Spectre attacks. Unlike most attacks, these target hardware flaws rather than software flaws. While computer security researchers have apparently known about them for months, they were kept secret (“embargoed”) to give vendors a head start on fixing the problems. But rumors started circulating over the holiday weekend, and now the cat is out of the bag.

Background

A nanosecond is an incredibly short amount of time. Light and electrical signals travel about one foot in a nanosecond. But for a multi-gigahertz processor, a nanosecond is enough time to execute 3 or 4 instructions on each of several cores at once. Accessing memory takes on the order of 20 nanoseconds, so if the processor had to wait that long every time, it would be sitting idle most of the time.
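To put rough numbers on that, here is a back-of-the-envelope sketch. The 3 GHz clock, one instruction per cycle, and one memory access per ten instructions are illustrative assumptions, not measurements of any particular chip; the 20-nanosecond latency is the figure from the paragraph above.

```python
def instructions_lost(clock_ghz: float, instructions_per_cycle: float,
                      stall_ns: float) -> float:
    """Instructions one core could have executed while stalled on memory."""
    cycles_per_ns = clock_ghz  # 1 GHz is one cycle per nanosecond
    return cycles_per_ns * instructions_per_cycle * stall_ns


def idle_fraction(instructions_between_accesses: float, clock_ghz: float,
                  instructions_per_cycle: float, stall_ns: float) -> float:
    """Share of time spent waiting if every memory access stalls the core."""
    work_ns = instructions_between_accesses / (clock_ghz * instructions_per_cycle)
    return stall_ns / (work_ns + stall_ns)


# Assumed figures: 3 GHz, 1 instruction per cycle, a memory access every 10 instructions.
lost = instructions_lost(clock_ghz=3.0, instructions_per_cycle=1.0, stall_ns=20.0)
idle = idle_fraction(10, clock_ghz=3.0, instructions_per_cycle=1.0, stall_ns=20.0)
print(f"{lost:.0f} instructions lost per stall, idle {idle:.0%} of the time")
```

Under those assumptions each stall costs about 60 instructions, and a core that always waited would be idle roughly 86% of the time.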

Processor designers have employed two tricks to avoid this problem. The first is caching: frequently used areas of memory are kept in a special super-fast memory which is part of the processor and connected by a super-fast interface. The second is called speculative execution: if a series of instructions depends on a value from memory, the processor doesn’t wait. It guesses which way the decision will go and runs ahead, and when the value returns from memory, it discards the work if the guess was wrong. It’s a little counterintuitive, since it means the processor regularly does work that gets thrown away. But because the wait for a memory access is so enormous compared to the speed of the processor, it pays off in practice.
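As a rough illustration of why caching pays off, here is a toy model of a cache. It is a sketch, not a description of real hardware: the `ToyCache` class, its least-recently-used eviction, and the 1 ns hit cost are all assumptions made for the example. Only the 20 ns miss cost comes from the text above.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical cache model: holds a few addresses, evicts the least recently used."""

    def __init__(self, capacity: int, hit_ns: int = 1, miss_ns: int = 20):
        self.capacity = capacity
        self.hit_ns = hit_ns
        self.miss_ns = miss_ns
        self.lines = OrderedDict()

    def read(self, address: int) -> int:
        """Return the simulated latency of reading `address`, in nanoseconds."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return self.hit_ns
        self.lines[address] = True           # a miss pulls the address into the cache
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used address
        return self.miss_ns


cache = ToyCache(capacity=2)
latencies = [cache.read(address) for address in (1, 1, 2, 3, 1)]
print(latencies)
```

The second read of address 1 is fast because the first read cached it. The last read is slow again because reading addresses 2 and 3 pushed it out. That difference in latency is exactly what an attacker can later measure.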

The problem at the core of both Spectre and Meltdown is that even when a speculatively executed set of instructions turns out not to be needed, running them still affects the cache. This seems to be part of the hardware design of many processors, going back to the invention of speculative execution. If you can make the processor speculatively execute some code that reads an area of memory it shouldn’t be able to — perhaps it’s a web page trying to read your system password — it’s possible that the code will fail, but it will still leak data into the cache.

It’s not enough to get secret data into the cache. You have to somehow get it out of the cache, too. Modern processors provide precise timing instructions that let user programs figure out what data is in the cache, and what data is in memory. This allows an attacker to construct a “side channel,” using the cache to communicate between speculatively-executed code and regular code. Without nanosecond-precision timing, these attacks would be impossible.

Meltdown

Of the two issues, Meltdown is both easier to understand, and easier to protect against.

Meltdown is specifically a way for a user program to read operating system kernel memory. Normally, user programs have no way to access this memory.

The attack code looks a little bit like this (extremely simplified):

    variable x = false

    /* Do some operations to make sure x is not in the cache */

    if (x) {
        // This will fail, but it will take dozens of nanoseconds to
        // fail if x is not in the cache
        secret_data = read_some_kernel_memory()

        // Now, send the secret data to the outside world by using
        // the cache as a side channel.
        // We have ~20 nanoseconds to do something here
        send_secret_data(secret_data)

        // The program will fail after ~20 nanoseconds and any changes
        // to main memory will be discarded
    }

The most complicated part of the attack is exfiltrating the data from the speculative branch back to the outside world. Creating these “side channels” by precisely timing accesses to the cache has been a topic of research for years, and there are several good established methods.
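One of those established methods is usually called Flush+Reload. The sketch below simulates the idea rather than performing it: `ToyCache`, the 4096-byte slot spacing, the latencies, and `receive_secret_data` are all hypothetical stand-ins, and `send_secret_data` mirrors the name used in the pseudocode above. Nothing here touches a real cache.

```python
SLOT_SPACING = 4096        # assumed spacing so each byte value gets its own cache line
HIT_NS, MISS_NS = 1, 20    # assumed latencies for cached and uncached reads


class ToyCache:
    """Hypothetical cache: a set of addresses that are currently 'warm'."""

    def __init__(self):
        self.lines = set()

    def flush(self) -> None:
        self.lines.clear()

    def access(self, address: int) -> int:
        """Read an address and return the simulated latency."""
        latency = HIT_NS if address in self.lines else MISS_NS
        self.lines.add(address)
        return latency


def send_secret_data(cache: ToyCache, secret_byte: int) -> None:
    """Speculative side: touch the one probe slot selected by the secret byte."""
    cache.access(secret_byte * SLOT_SPACING)


def receive_secret_data(cache: ToyCache) -> int:
    """Attacker side: time all 256 slots; the single fast one names the byte."""
    timings = [cache.access(value * SLOT_SPACING) for value in range(256)]
    return min(range(256), key=timings.__getitem__)


cache = ToyCache()
cache.flush()                      # start with every probe slot cold
send_secret_data(cache, 0x2A)      # happens inside the doomed speculative branch
recovered = receive_secret_data(cache)
print(hex(recovered))
```

The sender never writes the secret anywhere the receiver can read it directly. The only thing that crosses over is which slot is fast, which is why the discarded speculative work still leaks.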

Fortunately, Meltdown is relatively straightforward to prevent. By changing the way user programs interact with the kernel, it is possible to completely unmap kernel memory from user programs. There is a modest performance penalty. Linux, Windows, and macOS have apparently acted on the secret vulnerability disclosure and fixed the problem in their latest versions.
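To see why unmapping helps, consider this toy model of a page table. Everything in it is hypothetical (the `Mapping` type, the two read functions, the page numbers); it only illustrates the difference between a kernel page that is mapped but forbidden and one that is not mapped at all.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Mapping:
    physical_page: int
    user_accessible: bool


PHYSICAL_MEMORY = {0: "user data", 1: "kernel secret"}


def architectural_read(page_table: Dict[int, Mapping], virtual_page: int) -> str:
    """What the program is officially allowed to see."""
    mapping = page_table.get(virtual_page)
    if mapping is None or not mapping.user_accessible:
        raise PermissionError(f"page {virtual_page} is not readable from user mode")
    return PHYSICAL_MEMORY[mapping.physical_page]


def speculative_read(page_table: Dict[int, Mapping], virtual_page: int) -> Optional[str]:
    """Model of the flaw: the data is fetched before the permission check lands."""
    mapping = page_table.get(virtual_page)
    if mapping is None:
        return None  # nothing is mapped here, so there is nothing to fetch
    return PHYSICAL_MEMORY[mapping.physical_page]


# Before the fix: kernel pages are mapped into every process, just marked off-limits.
shared_table = {0: Mapping(0, user_accessible=True), 1: Mapping(1, user_accessible=False)}
# After the fix: the user-mode page table has no entry for kernel pages at all.
unmapped_table = {0: Mapping(0, user_accessible=True)}

leaked_before = speculative_read(shared_table, 1)
leaked_after = speculative_read(unmapped_table, 1)
print(leaked_before, leaked_after)
```

In both cases the official read is refused. The difference is that with the kernel page unmapped, even the speculative read has nothing to reach.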

Spectre

Spectre uses the same tools as Meltdown, but it is more general and much harder to protect against.

In Spectre, instead of simply reading an area of memory that it shouldn’t, the attack forces a “victim” program to leak secrets by passing it bad user data. How can it do this? Imagine a program executes a function on some user-provided data. First, the victim program checks that the data is valid; then it does some processing with the data. But remember that processing the data can actually start many nanoseconds before the validity check has completed.

The processor speculatively executes based on the bad user data before it realizes it’s bad. By very carefully constructing the user data and setting up the cache in advance, it is possible to force the victim program to leak data, much as in Meltdown.
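Here is a toy simulation of that sequence, assuming the bad user data is an out-of-bounds array index. The `ToyCpu` class, its one-bit branch predictor, and the memory layout are all invented for illustration; a real attack has to recover the touched slot through cache timing instead of reading a Python set.

```python
class ToyCpu:
    """Hypothetical CPU model: one predictor bit and a set standing in for the cache."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.cached_probe_slots = set()
        self.predict_in_bounds = False

    def victim(self, array_len: int, index: int):
        """Bounds-checked read of memory[index], with speculation modeled explicitly."""
        in_bounds = index < array_len
        if self.predict_in_bounds or in_bounds:
            # Runs before the real check resolves; the cache footprint is never undone.
            value = self.memory[index]
            self.cached_probe_slots.add(value)
        self.predict_in_bounds = in_bounds  # the predictor learns from the real outcome
        return self.memory[index] if in_bounds else None


def leak_byte(cpu: ToyCpu, array_len: int, target_index: int) -> int:
    cpu.victim(array_len, 0)                 # train the predictor with a valid index
    cpu.cached_probe_slots.clear()           # "flush" the probe slots
    assert cpu.victim(array_len, target_index) is None  # officially, nothing comes back
    (leaked,) = cpu.cached_probe_slots       # the one warm slot names the byte
    return leaked


PUBLIC = bytes([1, 2, 3, 4])
SECRET = b"KEY"
cpu = ToyCpu(PUBLIC + SECRET)                # the secret sits just past the public array
stolen = bytes(leak_byte(cpu, len(PUBLIC), len(PUBLIC) + i) for i in range(len(SECRET)))
print(stolen)
```

The victim's check is correct and always rejects the bad index. The leak comes entirely from the work done before the check finished.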

In their demonstration, the Google Spectre researchers attacked the Linux kernel, using a Linux kernel feature called eBPF that allows uploading a small “script” into the kernel to execute. This script is carefully checked to make sure it won’t do anything bad. But with a very careful setup, those checks can actually happen after the script has already executed through speculative execution.

Even though the initial demonstration relied on the eBPF feature of the Linux kernel, it may work for any program or operating system that accepts user data and can speculatively execute before that check is complete. It’s hard to think of code that _isn’t_ vulnerable. For Spectre, there’s no easy fix.

Do the attacks impact you?

Both attacks only apply if you run somebody else’s untrusted code on your computer.

For most of us, the only way we do this is by browsing the Web and going to pages that have JavaScript. Major browsers are already limiting JavaScript’s ability to perform precise timing, which should make both attacks much harder to mount from a web page.
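A small sketch shows why a coarser clock hurts the attacker. The 1 ns and 20 ns latencies and the 5,000 ns timer resolution are assumed values chosen for illustration, not the settings of any particular browser, and `coarse_clock` and `measured_latency` are hypothetical helpers.

```python
def coarse_clock(true_time_ns: int, resolution_ns: int) -> int:
    """A clock that only reports time rounded down to its resolution."""
    return (true_time_ns // resolution_ns) * resolution_ns


def measured_latency(start_ns: int, latency_ns: int, resolution_ns: int) -> int:
    """What a program sees when it times an operation with the coarse clock."""
    before = coarse_clock(start_ns, resolution_ns)
    after = coarse_clock(start_ns + latency_ns, resolution_ns)
    return after - before


HIT_NS, MISS_NS = 1, 20   # assumed cache hit and miss latencies

precise = (measured_latency(1000, HIT_NS, 1), measured_latency(1000, MISS_NS, 1))
blurred = (measured_latency(1000, HIT_NS, 5000), measured_latency(1000, MISS_NS, 5000))
print(precise, blurred)
```

With a 1 ns clock the hit and the miss are easy to tell apart. With the coarse clock both read as zero. A measurement that happens to straddle a tick of the coarse clock can still differ, which is one reason this is a mitigation rather than a cure.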

If you’re a cloud provider, and you’re in the business of running other people’s untrusted code on your computers for money, then you should be extremely worried. One customer could read another customer’s secret data! There’s evidence that Amazon and Microsoft are already rebooting thousands of VMs to apply preliminary fixes.

Why did the attacks take so long to find?

In all probability these attacks have been possible since speculative execution became popular in the mid-1990s. So why did they take so long to discover?

Many people have theorized that processors can leak secret information. I even helped with a project starting in 2008. But at the time, no one outside Intel knew the exact details of how their caches and speculative execution algorithms worked.

Over the last few years, researchers have deduced enough about the internal designs of these processors to make these attacks a real threat. Google’s blog post goes into a great amount of detail about how they reverse-engineered these processor details.

Now that we know these details, these attacks and others that were theoretical before are possible in practice. In security research, as in the attacks themselves, timing is everything.

How we’ll fix it

In the short term, operating system vendors will use the unmapping fix to prevent Meltdown. Programs can specifically disable speculative execution in critical places to prevent attacks. Browsers can block the high-precision timers needed to use the cache as a side channel. We can turn off the eBPF feature in Linux. But these are not complete measures, and new variations of these attacks will work around them.

In the long term, there is only one fix: processors must not allow speculatively executed code to affect the cache. But that requires redesigning and replacing hardware, which will take years.

References

Google Project Zero

Spectre Paper

Meltdown Paper