How Meltdown and Spectre are Breaching Device Security
At CriticalBlue, we’ve been discussing the newly disclosed Meltdown and Spectre flaws and their impact on device security. Our CTO, Richard Taylor, wrote a blog post giving a nice overview of the exploits. It’s a clear explanation and goes a little deeper than most, so I have included his thoughts below to help get the word out.
There is much to discuss in the wake of last week’s security news, which was dominated by the Meltdown and Spectre CPU bug announcements. 2018 has certainly got off to an interesting start. In part one of this two-part blog I will look at these bugs from a high level. In part two I will shine the spotlight on the implications for mobile security, and for Android in particular.
Although patches for these bugs have been in the works for some time, the announcements were rushed out and it has shown at times. However, The Register and Ars Technica, amongst others, have provided some excellent coverage, which I won’t repeat here.
Cache side channel attacks are not new, but until now many of them have been somewhat contrived. For the attack to work, there needs to be a strong signal coming from the victim application you are trying to eavesdrop upon, and locating this has been the biggest challenge for the attacker. In this context a strong signal means a cache line access pattern that itself reveals some secret data. In other words, the data has to influence the memory access pattern in a particularly convenient way: the key piece of data you are trying to steal needs to be used to calculate addresses that span lots of cache lines, giving you good side channel information to spy on later. Furthermore, that cache layout needs to remain intact long enough for your eavesdropping code to observe it. Often the layout is quickly degraded by other operations performed by the code under attack. These conditions don’t occur very often in reality, so although there are some nice demos of side channel communication, the chances of finding a good signal generator in the code you want to attack are actually rather slim.
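To make the idea of a “signal” concrete, here is a minimal Python sketch of a Flush+Reload-style channel, one classic technique of this kind. It assumes a toy cache with no eviction, made-up hit and miss latencies, and a hypothetical 256-entry probe array with one page per entry; `ToyCache`, `victim` and `flush_reload` are illustrative names, not a real API. The victim’s secret byte chooses which probe line it touches, and the attacker recovers that byte by spotting the one fast reload.

```python
# Toy model of a Flush+Reload-style cache side channel. All names and
# numbers are illustrative assumptions; nothing here measures real timing.
LINE_SIZE = 64          # bytes per cache line (typical value, assumed)
HIT_CYCLES = 40         # hypothetical latency of a cache hit
MISS_CYCLES = 200       # hypothetical latency of a cache miss
PROBE_BASE = 0x100000   # hypothetical base of a 256-entry probe array
STRIDE = 4096           # one page per entry, so no two entries share a line


class ToyCache:
    """Tracks which cache lines are present; no capacity or eviction."""

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        """Touch addr and return the simulated latency of doing so."""
        line = addr // LINE_SIZE
        latency = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return latency

    def flush(self, addr):
        """Evict the line holding addr (similar in spirit to x86 clflush)."""
        self.lines.discard(addr // LINE_SIZE)


def victim(cache, secret_byte):
    # The "strong signal": the secret selects which cache line gets touched.
    cache.access(PROBE_BASE + secret_byte * STRIDE)


def flush_reload(cache, run_victim):
    """Flush every probe line, let the victim run, then time each reload."""
    for value in range(256):
        cache.flush(PROBE_BASE + value * STRIDE)
    run_victim()
    timings = [cache.access(PROBE_BASE + v * STRIDE) for v in range(256)]
    # The single fast (cached) entry reveals the secret-dependent index.
    return min(range(256), key=timings.__getitem__)


cache = ToyCache()
print(hex(flush_reload(cache, lambda: victim(cache, 0x2A))))  # 0x2a
```

In real code the victim rarely offers such a clean one-line-per-value pattern, which is exactly the scarcity the paragraph describes.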
Meltdown and Spectre change the game fundamentally. You get to create your own signal generating code that you can then eavesdrop upon yourself. Very convenient. This is much more generic and much more powerful. The key insight here is to use speculative execution to do this for you. To maintain decent performance, CPUs execute speculatively much of the time. In particular, they predict which way branches are likely to go and, as it turns out, they also effectively predict that, sure, you are definitely allowed to access the memory location you just read from.
First let us deal with Meltdown, which effectively only impacts Intel CPUs. Formally this is Rogue Data Cache Load (CVE-2017-5754), a rather clever abuse of the out-of-order, speculative nature of CPUs. You start by loading from a memory location that you are not meant to have permission to read. When you do this your code is going to get an exception and stop, and it certainly won’t tell you directly what was held in that memory location. But, as it turns out, the exception is not raised right away, because the CPU deals with it as a separate sub-operation from the load itself. So if you construct your code cleverly, you can take the loaded data, transform it into an address and then load from that address. When the exception finally catches up with you, all the register state derived from the original load is rolled back, washed away as if it never happened. However, the fact that you read from that data-dependent address is still left in the cache state. It isn’t washed away. This vestige remains, and with careful measurement of the cache state from another thread in your code you can infer what the data was.
This is very bad news. Now you can effectively read any memory you want to, albeit somewhat statistically and very indirectly. The really terrible news is that, by default, Linux and Windows map most or all of the machine’s physical memory into the kernel portion of every process’s address space, guarded only by a permission bit that marks it as kernel-only. So all the secret data, passwords and other useful nuggets are potentially within reach of your attack code. In theory, if you can get your code running on the target machine, you can read anything you want. It is a pretty devastating oversight in the security isolation of CPUs, and truly remarkable that it hasn’t been obvious before now, given that speculative execution has been around for many years. It is one of those insights that results in a collective sigh of “of course, why did nobody think of that before?” once you realise its implications.
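The sequence described above can be sketched as a toy simulation. This is a hedged model, not exploit code: `ToyCPU`, `PermissionFault`, the probe array and the kernel address are all hypothetical, and the transient window is represented simply by running the dependent load before the permission check takes effect. It shows the key asymmetry: the fault rolls back the register holding the secret, but the cache line selected by that secret stays filled.

```python
# Toy model of the Meltdown (Rogue Data Cache Load) sequence. The CPU
# class, addresses and latencies are hypothetical; real attacks need
# careful timing and fault handling that this sketch only imitates.
LINE_SIZE, HIT, MISS = 64, 40, 200    # assumed line size and toy latencies
PROBE_BASE, STRIDE = 0x100000, 4096   # hypothetical 256-entry probe array


class PermissionFault(Exception):
    """Stands in for the fault raised by a user-mode read of kernel memory."""


class ToyCPU:
    def __init__(self, memory, kernel_addrs):
        self.memory = memory              # address -> byte value
        self.kernel_addrs = kernel_addrs  # addresses user code may not read
        self.cached = set()               # cached line numbers
        self.regs = {}                    # architectural register state

    def timed_access(self, addr):
        line = addr // LINE_SIZE
        latency = HIT if line in self.cached else MISS
        self.cached.add(line)
        return latency

    def flush(self, addr):
        self.cached.discard(addr // LINE_SIZE)

    def meltdown_sequence(self, addr):
        """r1 = load [addr]; load [PROBE_BASE + r1 * STRIDE]"""
        snapshot = dict(self.regs)
        # Transient window: the value is forwarded to the dependent load
        # before the permission check on the first load takes effect.
        value = self.memory[addr]
        self.regs["r1"] = value
        self.timed_access(PROBE_BASE + value * STRIDE)
        # Retirement: the fault is raised and registers are rolled back,
        # but nothing undoes the cache line filled above.
        if addr in self.kernel_addrs:
            self.regs = snapshot
            raise PermissionFault(hex(addr))


def read_kernel_byte(cpu, addr):
    for v in range(256):
        cpu.flush(PROBE_BASE + v * STRIDE)
    try:
        cpu.meltdown_sequence(addr)
    except PermissionFault:
        pass  # real attacks catch or suppress the fault in various ways
    timings = [cpu.timed_access(PROBE_BASE + v * STRIDE) for v in range(256)]
    return min(range(256), key=timings.__getitem__)


SECRET_ADDR = 0xFFFF888000001000      # hypothetical kernel address
cpu = ToyCPU({SECRET_ADDR: ord("p")}, {SECRET_ADDR})
print(chr(read_kernel_byte(cpu, SECRET_ADDR)))  # p
print("r1" in cpu.regs)                          # False: registers rolled back
```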
Patches to fix Meltdown have been in the works for a while and are ready to roll out soon. Basically, the fix is to ensure that the only pages MMU-mapped for a process while it runs in user mode are ones it should be able to read anyway, or that are otherwise not very interesting. On Linux this is termed Kernel Page Table Isolation (KPTI), and a similar fix for Windows is arriving soon. This has a performance impact though. There were good reasons for keeping the rest of the kernel pages quickly accessible, and every entry into the kernel now requires a page table switch. Even though systems will see a drop in performance, we will be safe from Meltdown. So although much of the initial hysteria has centred on the impact of Meltdown, especially in multi-tenant cloud environments such as AWS, it seems that the patch cycle should catch up soon, and Meltdown will then likely drop out of the headlines. Although the main Meltdown bug does not impact most ARM CPUs (ARM lists only the Cortex-A75 as affected), ARM has defined a related variant 3a in which the contents of certain system registers can be leaked to ordinary processes, but overall this seems less serious than the Intel variety. A proof of concept (PoC) is already running on ARM hardware.
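A minimal sketch of why KPTI helps, assuming a toy page table whose entries are (physical frame, user-accessible) pairs. The page numbers, frames and values are invented for illustration, and `ENTRY_PAGE` is a hypothetical stand-in for the small amount of kernel entry code that has to stay mapped. Without KPTI the kernel page is present but marked supervisor-only, which a Meltdown-affected core checks too late; with KPTI there is simply no translation for user code to speculate through.

```python
# Toy model of why Kernel Page Table Isolation (KPTI) defeats Meltdown.
# Page numbers, frames and values are hypothetical. Each page-table entry
# is (physical_frame, user_accessible).
PAGE_SIZE = 4096

USER_PAGE, KERNEL_PAGE, ENTRY_PAGE = 0x400, 0xFFFF0, 0xFFFF1
phys_mem = {5 * PAGE_SIZE: 0x11, 9 * PAGE_SIZE: 0x42, 12 * PAGE_SIZE: 0x90}
full_table = {
    USER_PAGE: (5, True),     # ordinary user data
    KERNEL_PAGE: (9, False),  # kernel data holding a secret
    ENTRY_PAGE: (12, False),  # minimal kernel entry/exit code
}


def user_mode_table(kpti):
    """Page table in force while user code runs."""
    if not kpti:
        return dict(full_table)  # kernel pages mapped, marked supervisor-only
    # With KPTI only user pages and the small kernel entry area stay mapped.
    return {p: e for p, e in full_table.items() if e[1] or p == ENTRY_PAGE}


def transient_value(table, vaddr):
    """Value a Meltdown-affected core could forward speculatively, or None."""
    entry = table.get(vaddr // PAGE_SIZE)
    if entry is None:
        return None              # no translation, so no data to forward
    frame, _user_ok = entry      # the permission bit is checked too late
    return phys_mem[frame * PAGE_SIZE + vaddr % PAGE_SIZE]


secret_vaddr = KERNEL_PAGE * PAGE_SIZE
print(transient_value(user_mode_table(kpti=False), secret_vaddr))  # 66
print(transient_value(user_mode_table(kpti=True), secret_vaddr))   # None
```

The cost the paragraph mentions shows up in real systems as the switch between the two tables on every kernel entry and exit, which this model does not attempt to capture.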
I suspect that Spectre will be the bug that keeps on giving. As the researchers who found it have pointed out, it will certainly haunt us for years to come. Its implications are deeper and it is more difficult to mitigate. Unlike Meltdown, Spectre also impacts ARM and AMD CPUs, not just Intel ones. This means that phones and tablets are in the frame too for Spectre-related data leaks. Spectre comes in two variants: Bounds Check Bypass (CVE-2017-5753) and Branch Target Injection (CVE-2017-5715).
Now we come to the second variant of Spectre, Branch Target Injection (CVE-2017-5715). This one seems the most bizarre of them all. Again it relates to branch prediction, but this time to indirect branches. Indirect branches happen reasonably frequently, especially in object-oriented languages that allow polymorphic method overrides, so it is quite important to make them fast. Without prediction they would be rather slow, because the CPU can’t even start filling its pipeline until it knows which instruction to go to next, and modern CPUs have rather long pipelines to fill. As it turns out, where a particular indirect branch went last time is an excellent predictor of where it will go next time. So, in the absence of anything better to do, a typical CPU will do exactly that and speculatively execute at the last target. To support this, the CPU contains a table called the Branch Target Buffer (BTB) that maps an instruction address to where that instruction will likely go if it is an indirect branch. The entry is updated once the real target has been resolved, ready to make a prediction next time. Of course this table can’t be infinitely big, so there are some collisions between entries, but those for the most frequently executed branches will be retained. It acts just like a cache of recent indirect branches, and in some architectures it is called a Branch Target Address Cache (BTAC). The BTB content is retained when your code stops executing and control passes to the kernel and then perhaps on to some other process running on the same core. It makes sense to keep the entries, because some of the valid ones might still be there by the time it is your code’s turn to execute again. There is no point in throwing away potentially useful information that speeds up execution, and what is the harm?
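The BTB behaviour described above can be modelled as a small direct-mapped table. This is a sketch under stated assumptions: real BTB organisation is largely undocumented and differs between cores, so the tagless design, the 10 index bits and every address below are hypothetical. It shows the “last target predicts the next target” policy and how two branches whose low address bits match share, and collide on, a single entry.

```python
# Toy Branch Target Buffer: direct-mapped and tagless, indexed by the low
# bits of the branch address. Real BTB organisation is largely undocumented
# and varies between cores; INDEX_BITS and all addresses are assumptions.
INDEX_BITS = 10


class ToyBTB:
    def __init__(self, index_bits=INDEX_BITS):
        self.mask = (1 << index_bits) - 1
        self.entries = {}  # index -> last resolved target

    def predict(self, branch_addr):
        """Predicted target, or None if there is no entry yet."""
        return self.entries.get(branch_addr & self.mask)

    def update(self, branch_addr, actual_target):
        """Record the real target once it resolves, ready for next time."""
        self.entries[branch_addr & self.mask] = actual_target


def run_indirect_branch(btb, branch_addr, targets):
    """Execute one indirect branch repeatedly; count correct predictions."""
    hits = 0
    for target in targets:
        if btb.predict(branch_addr) == target:
            hits += 1  # speculation went down the right path
        btb.update(branch_addr, target)
    return hits


btb = ToyBTB()
# A polymorphic call site that dispatches to one override, then another.
print(run_indirect_branch(btb, 0x401234, [0xA000] * 4 + [0xB000] * 4))  # 6
# A different branch whose low 10 bits match reuses (collides with) the entry.
print(hex(btb.predict(0x401234 + (1 << INDEX_BITS))))  # 0xb000
```

Note that nothing here clears `entries` when a different process would be scheduled, mirroring the retention across context switches that the paragraph describes.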
If the predicted target address was wrong then the CPU didn’t have anything better to do anyway; the incorrect execution can simply be rolled back as if it never happened, and the correct destination branched to. Well, I’m sure you can see where this is going. As we already know, speculative execution is not without side effects, and these can be exploited.
The really clever trick with Spectre Branch Target Injection is that the attacker gets to choose what code is speculatively executed. If the attacker knows the address of a particular indirect branch, they can train the BTB to point wherever they want. The target needs to be code that already exists in the process being attacked, but let’s presume the attacker knows what and where that code is. The code doesn’t need to execute cleanly; it just needs to execute long enough to leak something useful into the cache state for later examination. This code runs speculatively and will turn out to be the wrong prediction anyway, so it will be rolled back. So if the attacker wants to know the value of a particular register at an indirect branch call site, they need to find some code that uses that register in an address calculation and memory load that will leave a distinctive enough trace in the cache state. There is an extremely strong parallel here with ROP (return-oriented programming) gadgets. These are short sequences of existing code that an attacker can cobble together to do something useful if they can cause a buffer overflow in a process that lets them control a target address, but can’t inject new code. Buffer overflows and ROP gadget chains are the scourge of application security. What we have in the Spectre case are speculation gadgets: we can choose just the right sequence in the existing code to give us the leakage signature we want. Unlike ROP gadgets, though, we don’t need an existing vulnerability such as a buffer overflow to exploit them. We just need to have run earlier and set our trap in the BTB, forcing another process’s code to speculate incorrectly and leave us some useful state in the cache to examine later.
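Putting the pieces together, a toy simulation of Branch Target Injection might look like the following. It assumes a BTB and cache shared between attacker and victim (as on one core across a context switch), a BTB indexed only by the low 10 address bits, and a hypothetical `gadget` already present in the victim at a known address; none of these names come from a real system. The attacker trains an aliasing branch to point at the gadget, the victim’s mispredicted indirect call runs the gadget transiently on its own register `r2`, and a Flush+Reload pass recovers the byte.

```python
# Toy model of Spectre Branch Target Injection. The shared BTB and cache
# below stand in for per-core state that survives a context switch. The
# gadget, addresses and BTB indexing scheme are hypothetical assumptions.
LINE_SIZE, HIT, MISS = 64, 40, 200    # assumed line size and toy latencies
PROBE_BASE, STRIDE = 0x100000, 4096   # hypothetical 256-entry probe array
INDEX_MASK = 0x3FF                    # assumed: BTB indexed by low 10 bits

btb = {}        # BTB index -> predicted target (shared by all processes)
cached = set()  # cached line numbers (also shared)


def timed_access(addr):
    line = addr // LINE_SIZE
    latency = HIT if line in cached else MISS
    cached.add(line)
    return latency


def gadget(regs):
    # Existing victim code that happens to turn a register into an address.
    timed_access(PROBE_BASE + (regs["r2"] & 0xFF) * STRIDE)


def real_target(regs):
    pass  # where the victim's indirect call genuinely goes


CODE = {0x7000: gadget, 0x8000: real_target}  # hypothetical code addresses


def indirect_call(branch_addr, target_addr, regs):
    predicted = btb.get(branch_addr & INDEX_MASK)
    if predicted is not None and predicted != target_addr:
        # Mispredicted: the predicted code runs transiently on a copy of the
        # registers; the copy is discarded, but cache fills remain.
        CODE[predicted](dict(regs))
    btb[branch_addr & INDEX_MASK] = target_addr  # resolved target overwrites training
    CODE[target_addr](regs)


VICTIM_BRANCH = 0x55550A10                    # victim's indirect call site
ATTACKER_BRANCH = VICTIM_BRANCH + 7 * 0x400   # different address, same index


def attack(victim_secret):
    indirect_call(ATTACKER_BRANCH, 0x7000, {"r2": 0})  # 1. train BTB -> gadget
    for v in range(256):                               # 2. flush probe array
        cached.discard((PROBE_BASE + v * STRIDE) // LINE_SIZE)
    indirect_call(VICTIM_BRANCH, 0x8000, {"r2": victim_secret})  # 3. victim
    timings = [timed_access(PROBE_BASE + v * STRIDE) for v in range(256)]
    return min(range(256), key=timings.__getitem__)    # 4. reload and time


print(hex(attack(0x5A)))  # 0x5a
```

The victim’s own call resolves to its real target and overwrites the poisoned entry, so in this model the trap has to be reset before every leak.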
Of course, in practice this is hard to set up, and it seems to be much easier to leak small blobs of information (such as individual bytes or words) than whole blocks of data, since our poisonous BTB training is soon undone and we have limited capacity for transmitting data through the cache state. To me, the most obvious use for such an attack seems to be software key logging. Potentially every register that is live at an indirect branch site is susceptible to leakage. We just need to find one that, for instance, holds the value of the last key press on a system. This may of course be part of a password or confidential message. Not good.
Note that to exploit this Spectre variant, it is necessary to know the exact code layout of the process you are trying to attack, so that you can set your BTB trap correctly for the indirect branch being subverted. Since Address Space Layout Randomization (ASLR) is widely used nowadays as a defence against ROP gadget attacks, this is not necessarily straightforward. There are various techniques for inferring the code addresses, but they add another layer of complication to the already byzantine causal chain required to make this attack actually work.
Fixes are also in the works for this flaw. The attack relies on BTB training persisting across a context switch. Intel have a microcode update coming that will prevent entries from carrying over across that boundary, effectively rendering this bug unexploitable. ARM have similar patches available that can disable or invalidate the BTB on context switches. At this stage it is unclear what the performance impact will be, but there will definitely be some, since BTB content is retained for good performance reasons. Compiler patches are also coming in the form of the new “retpoline” feature. This allows an application to be recompiled so that it is no longer susceptible, and the kernel can be recompiled with this feature on too. This is particularly useful where an application might be run in an unpatched environment. Retpolines seem rather ugly but fix the issue. They cause the compiler to replace indirect branches and calls with a more complicated sequence that abuses the return instruction to do the same thing. This is slower, but not susceptible to Spectre, because returns don’t use the BTB for prediction (they are predicted from a separate return stack buffer). It is unclear when, or if, similar retpoline fixes will come for other compiled and JITted languages such as Java, C#, Python or Go.
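To illustrate why a retpoline resists BTB poisoning, here is a toy comparison of where the CPU speculates versus where execution really goes. The docstring reproduces the widely published shape of a retpoline for `jmp *%rax`; the Python model of the return stack buffer (RSB) and stack is a simplification, and labels such as `set_up_target` are illustrative rather than the names any particular compiler emits.

```python
# Toy comparison of how a plain indirect jump and a retpoline are predicted.
# Targets are plain labels; the x86 sequence in the docstring follows the
# commonly published retpoline shape for "jmp *%rax".
CAPTURE_LOOP = "capture_loop"


def plain_indirect_jump(btb, branch_addr, target):
    """Returns (speculative target, architectural target)."""
    speculative = btb.get(branch_addr)  # attacker-trainable prediction
    btb[branch_addr] = target
    return speculative, target


def retpoline_jump(target, rsb, stack):
    """Model of:
        call set_up_target      ; pushes capture-loop address (stack + RSB)
    capture_loop:
        pause; lfence; jmp capture_loop
    set_up_target:
        mov %rax, (%rsp)        ; overwrite return address on the stack only
        ret                     ; predicted from the RSB, not the BTB
    Returns (speculative target, architectural target).
    """
    rsb.append(CAPTURE_LOOP)     # call records its return site in the RSB
    stack.append(CAPTURE_LOOP)   # ...and pushes it onto the real stack
    stack[-1] = target           # mov overwrites only the in-memory copy
    speculative = rsb.pop()      # ret predicted from the RSB: capture loop
    architectural = stack.pop()  # ret actually jumps to the real target
    return speculative, architectural


poisoned_btb = {0x1000: "gadget"}   # entry trained by an attacker
print(plain_indirect_jump(poisoned_btb, 0x1000, "real_target"))
# ('gadget', 'real_target')
print(retpoline_jump("real_target", rsb=[], stack=[]))
# ('capture_loop', 'real_target')
```

In the retpoline case the speculative path spins harmlessly in the capture loop until the `ret` resolves. The model ignores the case, present on some Intel cores, where an empty return stack buffer falls back to the BTB for prediction.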
In part two I will discuss what this all means for mobile security. These bugs are definitely not restricted to Intel devices.
Thanks for reading! For more information on mobile API security, check out www.approov.io.