LLVM patch to fix half of Spectre attack

LLVM patch to fix half of Spectre attack (reviews.llvm.org)

433 points by Kristine1975 8 years ago | 242 comments

tptacek 8 years ago |

Page was down when I tried to read it, but it's archived here: http://archive.is/s831k.

It's hard to get your head around how big a deal this is. This vulnerability is so bad they killed x86 indirect jump instructions. It's so bad compilers --- all of them --- have to know about this bug, and use an incantation that hacks ret like an exploit developer would. It's so bad that to restore the original performance of a predictable indirect jump you might have to change the way you write high-level language code.

It's glorious.

jasode 8 years ago | |

>It's hard to get your head around how big a deal this is.

It truly is difficult to predict all the ripple effects from this. I can't think of a single computer bug in the last 30 years that's similar in reach to this Intel Meltdown.

[EDITED following text to replace "Intel bug" with "Spectre bug" based on ars and jcranmer clarification. The Intel Meltdown can be fixed with operating system update patches for kpti instead of a complete recompile.]

Journalists like to overuse the bombastic metaphor "shaken the very foundations" but this Spectre bug actually seems very fitting of it. Off the top of my head:

- browsers like Chrome & Firefox have to compile with new defensive compilation flags because they run untrusted JavaScript

- cloud providers have to recompile and patch their code to protect themselves from hostile customer vms

- operating systems like Linux/Windows/MacOS have to recompile and patch code to protect users from malware

Imagine the economics of all these mitigations. Also imagine that each of the cloud vendors AWS/Google/Azure/Rackspace had very detailed Excel spreadsheets extrapolating cpu usage for the next few years to plan for millions of $$$ of capital expenditures. Because of the severe performance implications of the bugfix (5% to 50% slowdown?), the cpu utilization assumptions in those spreadsheets are now wrong. They will have to spend more than they thought they would to meet their workload throughput goals.

There are dozens of other scenarios that we can't immediately think of.

ars 8 years ago | | |

> to this Intel Meltdown.

Wrong bug. Intel meltdown is bad, but not anywhere near as bad as Spectre which affects everything! No AMD immunity here.

ghaff 8 years ago | | |

This document has performance impact estimates from Red Hat Performance Engineering: https://access.redhat.com/node/3307751

strongholdmedia 8 years ago | | |

> - browsers like Chrome & Firefox have to compile with new defensive compilation flags because they run untrusted JavaScript

Not meaning to be rude, but this in itself summarises (and the issue will perhaps shed more light on) how stupid an idea it is to let everybody run untrusted code from other people, let alone third-party stuff like "privacy-intrusion-as-a-service" startups et aliae.

late2part 8 years ago | | |

That won’t really be a problem for the cloud providers. They simply charge more because the customers will use more compute.

ikeyany 8 years ago | | |

Is this a 5% to 50% performance hit on all workloads or specific workloads?

voidmain 8 years ago | |

And I fear there's little reason to think that the "three variants" from project zero's announcement are the full scope of the problem. They were just the variants that the few people in on this found time to develop exploits for. There can now be security bugs in things your program doesn't do; it seems like there is room for nearly unlimited creativity in finding them.

From the spectre paper:

"A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further."

jcranmer 8 years ago | | |

I think the main takeaway should be "speculative execution creates exploitable side-channels, and you should assume your hardware is exploitable until proven otherwise." AMD and ARM are probably still exploitable with unknown exploits, possibly even at Meltdown-levels of exploitability, but people haven't taken the time to reverse-engineer the microarchitecture enough to find the exploits.

If I were developing processors, I'd be having emergency meetings on trying to craft exploits to figure out where our processors' weaknesses are. While being happy that Intel is getting all the bad PR for this and I'm not.

geertj 8 years ago | | |

> And I fear there's little reason to think that the "three variants" from project zero's announcement are the full scope of the problem.

Agreed. This is an entirely new class of vulnerabilities, and we're just at the beginning.

pdpi 8 years ago | | |

ARM’s white paper details a variant 3a that affects some of their cores that are unaffected by var3 (and vice versa)

rayiner 8 years ago | |

Is glorious the right word for it? We’re going back to the stone ages where processors couldn’t predict the targets of indirect jumps. More generally, this seems to me like an attempt to patch out of what is really a class of attacks leveraging fundamental assumptions about high-performance CPU design. Before, OOO just had to preserve correctness and (some of) the order of exceptions and memory operations. Now, it has to preserve (some of) the timing of in-order execution too? Where does this path end?

simias 8 years ago | | |

Legitimate question: on any non-shared non-virtualized system is there any reason to enable these workarounds besides running sandboxed applications such as javascript in a web browser (or flash/java applets/Active X, but those are not really super popular nowadays)?

For any other non-sandboxed application you pretty much have to trust the code anyway. Privilege escalation is always a bad thing of course, but for single user desktop machines getting user shell access as an attacker means that you can do pretty much anything you want.

As far as I can see the only surface of attack for my current machine would be a website running untrusted JS. For all other applications running on my machine, if one of them is actually hostile then I'm already screwed.

Frankly I'm more annoyed at the ridiculous over-engineering of the Web than at CPU vendors. Because in 2017 you need to enable a Turing-complete language interpreter in your browser in order to display text and pictures on many (most?) websites.

Gopher should've won.

catnaroek 8 years ago | | |

> Where does this path end?

It ends with the performance advantages of OOO execution being effectively negated by the workarounds to address the security issues it causes.

The following parable is edifying: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...

tcoppi 8 years ago | | |

Seems like the ultimate end-game here is to have mini-vms for every process using CPU-level ring protection. If you can't speculate across privilege levels, only inside them, it isn't a security problem anymore.

tptacek 8 years ago | | |

We have different utility functions, you and I.

bouk 8 years ago | | |

tptacek exploits computers for a living, so it's glorious for him :)

wglb 8 years ago | | |

I like to think of a lot of vulnerability discovery and research as solving a puzzle. That this puzzle has so many far-reaching implications makes it totally compelling to me. tqbf says "glorious", and I couldn't disagree.

[Edit] Or, how far down does the rabbit hole go?

Additionally, it is quite fascinating to me to compare the complexity of modern CPUs with, say, a compiler.

wyager 8 years ago | | |

> leveraging fundamental assumptions about high-performance CPU design.

I believe the generalized fix is to restore the entire CPU state after a mispredict. You’d either need to add an extra copy of the entire processor state (tens of megabits) for every simultaneous predict you support ($$$) or keep track of how to revert all changes and revert them one at a time ($, slow).

js2 8 years ago | |

CPUs have been vulnerable to this attack since 1995. How did it collectively take us 22 years to figure this out? I know it's a highly esoteric complex attack, but there's no shortage of clever hackers in the world.

tzahola 8 years ago | | |

- we didn't have browsers compiling JavaScript into machine code

- we didn't have hyperconverged cloud infrastructures running arbitrary entities' code next to each other

jcranmer 8 years ago | | |

It's sort of been well-known that speculative execution opens up the possibility of side-channel attacks for quite some time. Hell, it's long-known that SMT (e.g., HyperThreading) can leak keys in a not-really-fixable way.

What's new and surprising is the power of these side-channel attacks--you can use these, reliably, to exfiltrate arbitrary memory, including across privilege modes in some cases (apparently, some ARM cores are affected by the latter vulnerability, in addition to Intel).

pixl97 8 years ago | | |

Honestly we knew about this in the 70s. Mainframe/time share systems had lots of protections against attacks like this. The problem is that mainstream computing went cheap/single-user and then attempted to build a multi-user/untrusted code execution environment on top of that. Now it's come back to bite us in the ass.

Danihan 8 years ago | | |

>there's no shortage of clever hackers in the world.

Are you sure?

jmull 8 years ago | |

Well, these are workarounds because fixing the problem at the source is hard.

The right fix is to prevent speculatively executed code from leaking information.

Here that perhaps means associating cache lines with a speculative branch somehow so that they aren't accessible until/unless the speculative branch becomes the real branch. (I have no idea exactly how that would be done or what the performance cost might be... I'd really need to know the details of how speculative execution is implemented in a particular CPU to even be able to guess.)

jncraton 8 years ago | |

Agreed. I haven't had this much fun thinking through the implications of a new exploit technique in a long time. It is truly beautiful.

eric_b 8 years ago | |

Prediction: This will be just like any vulnerability disclosure. The infosec people and media will scream hysterically about how game changingly bad it is. The OS vendors will patch, and business will go on as usual.

leeoniya 8 years ago | |

i know this came out as a leak, but makes one wonder how "responsible" even a Jan 9 official announcement would have been. the scope is absolutely terrifying. this bug will be exploitable for a very long time.

jopsen 8 years ago | | |

They had like 6 months or so... how is more time going to make things less painful?

Piskvorrr 8 years ago | | |

Jan 9, 2019? 2050? How much longer is long _enough_?

dzdt 8 years ago |

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%.

Ouch! This is independent of other performance hurts, like from the kernel syscall overhead that was the hot topic yesterday. This is pretty crazy.

AaronFriel 8 years ago |

This is brutal for all interpreted/JITed languages and all statically compiled languages with dynamic dispatch. I can hardly imagine worse news for performance oriented engineers. And what's worse is that dynamic libraries will probably need to be rebuilt with these mitigations in mind, so nearly everyone will pay the cost even if they don't need it.

I feel bad for all of the engineers currently working on performance sensitive applications in these languages. There's a whole lot of Java, .NET, and JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy (i.e.: vtable using) C++ will get slower. Rust trait objects get slower. Haskell type classes that don't optimize out get slower.

What a mess.

[1] These mitigations will need to be implemented for interpreters, and JITs will want to switch to emitting "retpoline" code for dynamic dispatch. There's no world in which I don't expect the JVM, V8, and others to switch to these by default soon.
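
To make the cost concrete, here is a small C sketch of the two dispatch styles being discussed. The table-driven version compiles to an indirect call, which is the kind of instruction a retpoline-enabled build reroutes through a thunk; the second version spells out the same choice as conditional branches and direct calls. All names (`op_fn`, `op_table`, `dispatch_indirect`, `dispatch_direct`) are made up for illustration, and whether a given compiler keeps the two forms distinct depends on its optimization settings.

```c
#include <assert.h>

/* Hypothetical operation type: every handler takes and returns a long. */
typedef long (*op_fn)(long);

static long op_inc(long x)    { return x + 1; }
static long op_double(long x) { return x * 2; }
static long op_negate(long x) { return -x; }

enum { OP_INC, OP_DOUBLE, OP_NEGATE, OP_COUNT };

static const op_fn op_table[OP_COUNT] = { op_inc, op_double, op_negate };

/* Dynamic dispatch: one indirect call through a function-pointer table.
 * This is the shape that vtables and interpreter loops reduce to. */
static long dispatch_indirect(unsigned op, long x)
{
    assert(op < OP_COUNT);
    return op_table[op](x);
}

/* Same behaviour expressed as compare-and-branch plus direct calls,
 * so no indirect branch is needed for the dispatch itself. */
static long dispatch_direct(unsigned op, long x)
{
    assert(op < OP_COUNT);
    if (op == OP_INC)    return op_inc(x);
    if (op == OP_DOUBLE) return op_double(x);
    return op_negate(x);
}
```

Both functions return the same results for every valid `op`; the difference is only in which branch instructions the compiler is likely to emit, and the direct form stops scaling once the set of targets is large or open-ended.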

rntz 8 years ago |

This mitigates spectre variant #2, branch target injection. We also have a mitigation for meltdown, namely KPTI. Is there a known mitigation for spectre variant #1, bounds check bypass?

Maybe I'm being naive, but would a simple modulo instruction work? Consider the example code from https://googleprojectzero.blogspot.com/2018/01/reading-privi...:

    unsigned long untrusted_offset_from_caller = ...;
    if (untrusted_offset_from_caller < arr1->length) {
     unsigned char value = arr1->data[untrusted_offset_from_caller];
     ...
    }

If instead we did:

    unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];

Would this produce a data dependency that prevents speculative execution from reading an out-of-bounds memory address? (Ignore for the moment that a sufficiently smart compiler might "optimize" out the modulo here.)
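
For comparison, a minimal sketch of a branchless alternative to the modulo idea: keep the bounds check, but also clamp the index with a mask computed arithmetically from the same comparison. This is an illustration under stated assumptions, not a vetted mitigation: the helper names `index_mask` and `load_clamped` are hypothetical, `length` must be between 1 and `LONG_MAX`, right-shifting a negative `long` is assumed to be an arithmetic shift (implementation-defined in C, true on GCC and Clang), and a compiler is in principle free to turn the arithmetic back into a branch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns all-ones when index < size, zero otherwise,
 * using arithmetic only (no conditional branch to mispredict).
 * Assumes 0 < size <= LONG_MAX and an arithmetic right shift on long. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    assert(size > 0 && size <= (unsigned long)LONG_MAX);
    /* If index >= size, either index itself has its top bit set or
     * (size - 1 - index) wraps around and has its top bit set. */
    unsigned long top = index | (size - 1UL - index);
    return (unsigned long)(~(long)top >> (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-checked load in the style of the Project Zero example. Inside the
 * branch the offset is ANDed with the mask, so a mis-speculated path with an
 * out-of-range offset would read element 0 rather than arbitrary memory. */
static unsigned char load_clamped(const unsigned char *data,
                                  unsigned long length,
                                  unsigned long untrusted_offset)
{
    if (untrusted_offset < length) {
        untrusted_offset &= index_mask(untrusted_offset, length);
        return data[untrusted_offset];
    }
    return 0;
}
```

Compared with `%`, the mask needs no division, and an out-of-range offset is forced to element 0 instead of wrapping to some other in-bounds element.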

jzl 8 years ago |

A new thing that's going to become a standard part of systems engineering: deciding whether any given system needs to run with or without these kinds of protections. Do you want the speed of speculative execution or do you want Meltdown/Spectre protection? In some cases lack of protection is fine. But figuring out the answer for any given system is often going to take expert-level security knowledge. Security is all about multiple layers of protection, and even a non-public facing machine might benefit from these layers depending on the context.

s4vi0r 8 years ago | |

Spectre relies on tricking the CPU into branch predicting its way into accessing protected memory, no? Is it not possible that we can keep most of the performance benefits of speculative execution by somehow having a built in "Hey, never ever speculate that I'll want to access this region of memory" sort of thing?

lorenzq 8 years ago | | |

I read an ars technica article that this would be a possible solution but isn’t right now because the hardware to check access rights isn’t fast enough yet

senatorobama 8 years ago | | |

Uh, isn't this what AMD does?

crb002 8 years ago | |

CPUs should have a single instruction that wipes branch prediction caches. I would have it off by default, and add to the C/C++ spec this as a standard library macro or pragma. Easy peasy.

You only need to wipe between syscalls that have side effects. Number crunching AVX heavy subroutines should never have to deal with safety once entered.

ece 8 years ago | | |

This is what KPTI does, wipe caches, and if you did this often in user code, performance degradation would be all over the place. Also, heavy AVX routines that use encryption keys... would be great to attack.

ece 8 years ago | |

More likely, this is a shift back to in-order processors, if the solutions aren't workable. If you're in an embedded scenario, sure you can make more trade-offs and have more control, but it's not going to look great when it happens to get hacked.

leni536 8 years ago |

It has an interesting performance impact on calls to dynamic libraries. One alternative approach would be to avoid the indirect calls through not using '-fPIC --shared' when building shared libraries but '-mcmodel=large --shared'. This causes the relocations to happen at the direct calls and not through a GOT.

The obvious drawback is that it effectively disables sharing code in memory, though it would still allow sharing code on disk. So it would be a middle ground between the current state of dynamic and static linking.

https://www.technovelty.org/c/position-independent-code-and-...

ealexhudson 8 years ago |

This patch apparently implements this mitigation: https://support.google.com/faqs/answer/7625886

JdeBP 8 years ago | |

And once one knows the technical background, one is better positioned to consider the response of Linus Torvalds to the idea that the entire Linux kernel be recompiled for all x86 CPUs with a compiler that implements this.

* https://lkml.org/lkml/2018/1/3/797 (https://news.ycombinator.com/item?id=16066968)

tptacek 8 years ago | | |

This would be more interesting if the attack that the compiler mitigations were designed for wasn't cross-vendor, cross-architecture.

coolspot 8 years ago | | |

0 usages of word "fuck"

1 usage of word "shit"

Not bad for Linus.

kough 8 years ago | |

This is a really good writeup, thanks. I'm curious -- how often are google support faq articles deeply technical like this?

badrequest 8 years ago |

I, for one, am eternally grateful for the incredibly bright people who take the time to patch this sort of stuff.

ben_jones 8 years ago | |

And the people who invented computers, programming languages, the internet, and all the learning resources, that allow me to get a paycheck writing extremely high level application code that feels like a coloring book in comparison. Truly the shoulders of giants.

jacksmith21006 8 years ago | |

Also to Google for finding and documenting it so well. Google security team really should be given an award.

vfaronov 8 years ago |

I have a hunch that the era of side-channel attacks is only now dawning, and that we should expect many more painful exploits and cumbersome mitigations in the coming years.

What do people more knowledgeable in the field think about this?

xigency 8 years ago | |

What about users who only execute trusted code?

All of these attacks assume you are running something you don't trust on your CPU, whether it is another user's program, a non-root executable, or a JavaScript program from a website.

When do we stop hacking processors, kernels, and compilers and revisit our assumptions of what we can and can't do securely.

phkahler 8 years ago |

RISC-V impact? With all the reports of these attacks, I have not seen mention of risc-v. Since they are in the process of finalizing a lot of specs including memory model and privileged instructions, I wonder if there will be last minute changes to mitigate these vulnerabilities.

coldcode 8 years ago |

I remember doing tricks like this in 6502 assembly and in other early processors. Amazing that to stop these attacks you have to come up with clever tricks again. Back in the 80's I would have never imagined this type of attack being something to worry about.

FLUX-YOU 8 years ago | |

>early processors

Early processors had speculative execution? I thought this had been added to Intel/AMD/ARM about 20 years ago?

dzdt 8 years ago | | |

I guess he means the retpoline. On the 6502 there is no indirect jump instruction, so you need such tricks just to achieve an indirect jump at all.

DiThi 8 years ago | | |

I think it means they're tricks for better performance when you _don't_ have speculative execution.

gpderetta 8 years ago | | |

Speculative execution is as old as branch prediction, which is very, very old.

peapicker 8 years ago |

This brings to mind Ken Thompson's "Reflections on Trusting Trust"[1] -- after all, all I have to do to write code with the exploit is be able to remove the patch and rebuild the compiler and build some executables.

Trusting in a compiler you hope was used to build all the executables on your system isn't trustworthy enough to be the final solution.

[1] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

pwg 8 years ago | |

Every modern compiler usually has extensions that allow for bits of assembly to be inserted alongside the usual C or C++ code.

Unless the compiler is also patched to either disallow inserted assembly, or to modify the inserted assembly (this being both hard and dangerous), someone who wants to exploit the bug will just add their own inserted assembly code that exploits the bug, and a patched compiler won't help one bit in that case.

cws125 8 years ago |

Just as a FYI, according to:

* https://lkml.org/lkml/2018/1/4/432
* http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a...

It appears that Skylake and later can actually predict retpolines? Some hardware features called IBRS, IBPB, STIBP (not a lot of details on this are out there) are supposedly coming in a microcode update.

jgowdy 8 years ago |

The problem I see with this concept is ROP mitigations like Intel’s control flow enforcement don’t seem compatible with intentionally using tweaked addresses with ret. The address they inject won’t match the shadow stack and the program will be terminated.

DannyBee 8 years ago | |

This is true, and so far, nobody has a better idea. (I.e., I would expect that unless someone comes up with one, hardware CFE in its current form dies and won't happen for Intel until the processors are changed in a way that makes the mitigation unnecessary.)

teilo 8 years ago |

Isn't it the case that the Itanium architecture would not be vulnerable to Spectre because it moves the onus of branch prediction from the CPU to the compiler?

als0 8 years ago | |

Assuming the compiler knows what it's doing :)

teilo 8 years ago | | |

That was always the problem with the Itanium compilers. They were crap because they couldn't benefit from the years of tuning traditional architectures enjoyed.

nathell 8 years ago |

I can't help thinking of how the early-ITS approach to security (not only was there none, but looking at other users' work was a deliberate feature) was embraced by its users. I'm way too young to remember, but it rings a bell somewhere down my heart.

There's a lot of prominence being given to all kinds of damage malicious users might inflict, and ways to prevent or mitigate, but little to the malice itself. Whence does it arise? What emotions drive those users? What unmet needs?

Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I intend not to run them, to the extent possible. I intend to keep aside a VM with patches for critical stuff, like banking or others' data entrusted to me. But I don't want my machine to be slowed down just because someone, sometime, might invest effort in targeting these attacks at it. Given how transparent I want to be with my life, that's a risk I'm willing to take.

fwip 8 years ago | |

Most attacks aren't targeted at specific people. Hackers don't want to read your emails, they want your credit-card information, digital account passwords, or to compromise your computer to use in their botnet.

Sure, you might not have anything you want to hide in your life, but the drive-by javascript doesn't care about your secrets - it'll hack you anyway. Best-case scenario, you lose access to a bunch of accounts you used to use and need to create new identities from scratch. Worst-case, they clean you out financially, steal your identity, etc.

fooker 8 years ago |

retpoline seems to be a novel concept. Can anyone ELI5?

Also, any insight about performance impact here?
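
A rough sketch of the idea, assuming x86-64 Linux with GCC or Clang (on any other target the code below falls back to a plain indirect call). Instead of executing `jmp *%r11`, the thunk does a `call` that pushes a return address, overwrites that address on the stack with the real target, and then executes `ret`. The return predictor speculates into the `pause`/`lfence` loop, which does nothing useful, while the architectural `ret` lands on the real target. The symbol names `demo_retpoline_r11`, `call_via_retpoline` and `call_checked` are invented for this example, and the instruction sequence is modeled on the published retpoline description rather than copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*target_fn)(long);

#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)

/* Implemented in assembly below. System V AMD64 ABI: rdi = fn, rsi = arg. */
long call_via_retpoline(target_fn fn, long arg);

__asm__(
    ".pushsection .text\n"
    "demo_retpoline_r11:\n"
    "    call 2f\n"               /* push address of label 1, jump to label 2 */
    "1:\n"
    "    pause\n"                 /* speculation trap: a predicted return */
    "    lfence\n"                /* to label 1 just spins here harmlessly */
    "    jmp 1b\n"
    "2:\n"
    "    mov %r11, (%rsp)\n"      /* replace the pushed return address */
    "    ret\n"                   /* architecturally jumps to the target */
    ".globl call_via_retpoline\n"
    "call_via_retpoline:\n"
    "    mov %rdi, %r11\n"        /* target address goes in r11 */
    "    mov %rsi, %rdi\n"        /* arg becomes the target's first argument */
    "    jmp demo_retpoline_r11\n"/* tail-jump; target returns to our caller */
    ".popsection\n"
);

#else

/* Fallback for other targets: an ordinary indirect call, no retpoline. */
long call_via_retpoline(target_fn fn, long arg)
{
    return fn(arg);
}

#endif

/* Example target and a small wrapper that rejects null pointers. */
static long add_one(long x) { return x + 1; }

static long call_checked(target_fn fn, long arg)
{
    assert(fn != NULL);
    return call_via_retpoline(fn, arg);
}
```

Calling `call_checked(add_one, 41)` behaves exactly like `add_one(41)`; the only difference is that the transfer of control happens through a `ret` whose predicted destination is the trap loop rather than through an indirect `jmp` or `call`.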

contrarian_ 8 years ago |

Note for a true fix to the BTB poisoning attack you would additionally have to disable SMT/HT.

See here: https://news.ycombinator.com/item?id=16070304

Pelam 8 years ago |

Maybe some future architecture will allow software to tell CPU which regions it considers to be secret from the point of view of each other region.

Something like that could allow the CPU to speculate aggressively while preventing information leak exploits.

pwg 8 years ago | |

The CPU hardware already has that feature. It is the VM paging system and the permissions assigned thereto.

The bug here is that the CPU is not aborting the speculation when fetches occur to addresses marked as "access denied". Instead the fetch happens and a line of normally inaccessible memory is put into cache by code that should not be able to get it read into the cache normally.

One hardware fix would be to plug that hole. Speculative reads get blocked when they encounter permission denied errors from the paging system and do not change the cache state. That blocks the Meltdown attack, but not the Spectre attack.

Pelam 8 years ago | | |

I thought about that too... AFAIK currently paging system is not generally accessible to userland programs like browsers. They would need some way to setup different contexts for untrusted javascript code and the internal services that the javascript can call.

Also maybe the context switching would need to be made faster, because you would need to do that whenever eg javascript calls browser interfaces.

jacobolus 8 years ago | |

https://millcomputing.com/docs/ e.g. the most recent talk https://millcomputing.com/docs/threading/

Pelam 8 years ago | | |

Something like the portal calls and "turfs" described in there could help.

userbinator 8 years ago |

This is horrible, really really horrible. And I'm not talking about the bug itself, but the mitigation --- which is basically "stop using indirect jump and call instructions and recompile all your software". The latter is beyond unrealistic.

It also sets a very bad precedent: I understand people want to mitigate/fix as much as possible, but this is basically giving an implicit message to the hardware designers: "it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software."

hn_throwaway_99 8 years ago | |

> it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software.

What are any other options? It's hardware, that cannot be patched. Of course they will change chip designs going forward, but what else do you suggest folks do with the billions of chips that exhibit this problem?

ychen306 8 years ago | |

Go ahead, smash your computer, wait a few months, and buy a new one.

sempron64 8 years ago |

It's noted in the patch that one would have to recompile linked libraries, which seems impractical, unless a distro decides to build everything with this flag.

imtringued 8 years ago | |

And since this patch is opt in it isn't enough to secure cloud providers.

jacquesm 8 years ago | |

Not just linked binaries, also the whole underlying OS, and, critically, the compiler itself. Otherwise you could replace the 'proofed' construct with one that is not proofed against the bug.

JDevlieghere 8 years ago | | |

Why would you need to recompile the compiler? Both variants only provide read access.

strongholdmedia 8 years ago |

As Alex Ionescu has put it:

> We built multi-tenant cloud computing on top of processors and chipsets that were designed and hyper-optimized for

> single-tenant use. We crossed our fingers that it would be OK and it would all turn out great and we would all profit.

> In 2018, reality has come back to bite us.

This is the root of all the problems.

crb002 8 years ago |

This was the fix I was going to suggest. Especially with AVX leakage.

Right now many function calls don't safely wipe registers and the new side channel caches found in Spectre. There really needs to be two kinds of function calls. Maybe a C PRAGMA?

The compiler has parent function call wiping as a flag; the code has pragmas that override the flag.

okneil 8 years ago |

The site is down for me. HN hug of death?

arboroia 8 years ago | |

Google text cache: https://reviews.llvm.org/D41723

Wayback Machine: https://web.archive.org/web/20180104131631/https://reviews.l...

XnoiVeX 8 years ago | |

Yes. Give it about 5 minutes. It will load without images.

hultner 8 years ago | |

It was a bit slow but eventually loaded for me.

mayoralito 8 years ago | | |

Yeah, same thing happened to me... slow as hell, but I guess that's to be expected given the severity of the issue. Everyone wants to see this at the same time.

lousken 8 years ago |

what about performance impact after new CPU architecture arrives? how is that going to work?

eptcyka 8 years ago |

Mill can't come soon enough.

mike_hearn 8 years ago | |

What makes you think the Mill would be immune to these issues?

eptcyka 8 years ago | | |

Mill has no speculative execution.

marcosdumay 8 years ago | | |

A simple model of access permissions that fit before L1 cache and can return a fault before loading anything.

silimike 8 years ago |

If this were 15 years ago, I'd say the site was SlashDotted.

andrewmcwatters 8 years ago |

In other news, Intel has found that by not using a computer at all, though performance overheads increase 100%, this counter-measure does secure any previously available attack vectors.