Reading privileged memory with a side-channel (googleprojectzero.blogspot.com)
"Compute Engine customers must update their virtual machine operating systems and applications so that their virtual machines are protected from intra-guest attacks and inter-guest attacks that exploit application-level vulnerabilities"
https://newsroom.intel.com/news/intel-responds-to-security-r...
Intel: "Recent reports that these exploits are caused by a “bug” or a “flaw” [..] are incorrect."
So much for "authoritative guidance", fuck these guys.
> Arm recognises that the speculation functionality of many modern high-performance processors, despite working as intended, can be used in conjunction with the timing of cache operations to leak some information as described in this blog.
I personally don't agree, but I guess they're trying to avoid needing to issue a recall for over ten years worth of CPUs?
Unlike Intel, ARM and AMD are implicated only where the attacker can inject code or data (specifically data that is manipulated by pre-existing vulnerable code) into the target address space. The particular kernel exploits require injection of a JIT-compiled eBPF program, as they said they were unable to locate any suitable gadgets in existing compiled kernel code. I wouldn't rule out gadgets being found in the future, but much like cryptographic software timing attacks, the proper fix is to refactor sensitive software logic to be data independent. There's no way to implement an out-of-order, superscalar architecture and protect against this stuff, simply because of the nature of memory hierarchies. All you can do is 1) ensure that privilege boundaries are obeyed (like AMD and ARM do, but Intel notably doesn't), and 2) provide guaranteed, constant-time instructions that programmers and compilers can reliably and conveniently leverage. Unfortunately, all the hardware vendors have sucked at providing #2 (much timing-resilient cryptographic software relies on implicit, historical timing behavior, not architecturally guaranteed behavior), but it nonetheless still requires cooperation by software programmers, making it a shared burden.
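To make the "data independent" point concrete, here is a minimal sketch of a comparison whose control flow and memory access pattern do not depend on the secret bytes. The name `ct_equal` is made up for illustration. Note that C itself gives no timing guarantee, which is exactly the complaint about #2: this relies on the compiler and CPU not reintroducing a data-dependent shortcut.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares two buffers without an early exit,
 * so the loop always touches all n bytes regardless of where
 * (or whether) they differ. Returns 1 if equal, 0 otherwise. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]); /* accumulate differences, no branch on data */
    }
    /* Map diff == 0 to 1 and anything else to 0 without branching on diff:
     * (0 - 1) wraps to 0xFFFFFFFF, whose bit 8 is set; 1..255 minus 1 never sets bit 8. */
    return (int)(1u & (((uint32_t)diff - 1u) >> 8));
}
```

A naive `memcmp`-style loop that returns at the first mismatch would leak, through its running time, how many leading bytes matched.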
Also, FWIW, basically everybody outside the Linux echo chamber has known that eBPF JIT and especially unprivileged eBPF JIT was a disaster waiting to happen. This is only the latest exploit it's been at the center of, and the 2nd in as many months. The amount of attention and effort that has gone into securing eBPF is remarkable, but at the end of the day even if you could muster all the best programmers for as much time as you wanted it's still an exceptionally risky endeavor. Everything we know about the evolution of exploits screams that unprivileged eBPF JIT is an unrelenting nightmare. But it's convenient, flexible, and performant, and at the end of the day that's all people really care about, including most Linux kernel engineers. The nature of the Linux ecosystem is that even if Linus vetoed unprivileged eBPF JIT (optional or not), vendors would have likely shipped it anyhow. It's an indictment of the software industry. Blaming hardware vendors (except for the Intel issue) is just an excuse that perpetuates the abysmal state of software security.
They are adding a new instruction to control speculation...
I implemented it myself simply based on the clues in the press release from AMD explaining why they weren't vulnerable. I don't even have a computer security background.
Note that even speculating across multiple loads could lead to observable side-effects by measuring memory bandwidth to differentiate between loads of accessible and silent page fault addresses. [1]
An interesting question is whether the CPU would also speculate on loads from mapped PCI device regions, as that could also be detectable in many different ways.
[1] https://eprint.iacr.org/2016/613.pdf
> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)
Could have been said more concisely. Sadly, this seems to be the norm with academic texts.
Did they say that?
I don't see anything saying they were unable to, just that they didn't bother to because it would take effort.
>But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets.
> To be able to actually use this behavior for an attack, an attacker needs to be able to cause the execution of such a vulnerable code pattern in the targeted context with an out-of-bounds index. For this, the vulnerable code pattern must either be present in existing code, or there must be an interpreter or JIT engine that can be used to generate the vulnerable code pattern. So far, we have not actually identified any existing, exploitable instances of the vulnerable code pattern; the PoC for leaking kernel memory using variant 1 uses the eBPF interpreter or the eBPF JIT engine, which are built into the kernel and accessible to normal users.
For Variant 1, the "vulnerable code pattern" they're looking for has to be of a very specific type; it's not a run-of-the-mill gadget. It has to load from an array with a user-controlled offset, then mask out a small number of bits from the result and use that as an offset to load from another array, where we can then time our accesses to that second array.
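For readers trying to picture that pattern, a rough sketch in C follows. The struct layout and the names `victim_read`, `arr1` and `arr2` are illustrative assumptions, not code from the kernel or from the Project Zero PoC. Architecturally the bounds check always holds; the concern is only what a mispredicted branch might load before the check resolves.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only; the size is arbitrary. */
struct array {
    size_t length;
    uint8_t data[0x400];
};

/* Shape of the "vulnerable code pattern" for variant 1:
 *   1. bounds check on an attacker-controlled index,
 *   2. load from the first array with that index,
 *   3. keep one bit of the loaded value and use it to index a second array.
 * If the CPU mis-speculates past step 1 with an out-of-bounds index, step 3
 * may pull one of two cache lines of arr2 into the cache, depending on a bit
 * of memory the caller was never supposed to read. */
uint8_t victim_read(const struct array *arr1, const struct array *arr2,
                    size_t untrusted_index)
{
    if (untrusted_index < arr1->length) {
        uint8_t value = arr1->data[untrusted_index];
        size_t index2 = ((size_t)(value & 1) * 0x100) + 0x200;
        if (index2 < arr2->length) {
            return arr2->data[index2];
        }
    }
    return 0;
}
```

Nothing here is a bug in the usual sense: for every index the function returns what the source says it should. The leak is entirely in which cache line of `arr2` ends up warm afterwards.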
However, they also go on to say:
> A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further.
Which seems much less reassuring.
This way you can detect things based on speculative execution. I don't know how they go from that to reading memory though.
That was the second bit of the example source code:
> unsigned long index2 = ((value&1)*0x100)+0x200;
This creates one of two different addresses, depending upon the value of bit zero of the memory location being attacked. The two different addresses are farther apart than the size of a cache line.
> unsigned char value2 = arr2->data[index2];
This actually does the read from one of the two different addresses (which results in the value located at one of them becoming resident in cache). Note that the value returned here is a "don't care" item.
Then, after everything unwinds from the speculation, the follow-on code on the real path reads from both of the two possible addresses that could have been put into "index2". The read that returns data faster must have been in cache. Knowing which one was in cache, you now know the value of bit zero of the target address location.
Repeat the same block of code for bits 1-7 and you'll have read a whole byte. Continue and you can read as much as you like. You just gather data very slowly (the article mentioned about 2000 bytes per second).
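The bookkeeping around that loop can be sketched without any actual timing. The helper names below are invented for illustration, and `hot_index` stands in for "whichever of the two addresses came back faster", which real code would have to determine with a cache-timing measurement.

```c
#include <stdint.h>

/* Index touched speculatively for a given bit of the secret byte:
 * 0x200 if the bit is 0, 0x300 if it is 1 (more than a cache line apart). */
unsigned long index2_for_bit(uint8_t value, unsigned bit)
{
    return (((unsigned long)(value >> bit) & 1UL) * 0x100UL) + 0x200UL;
}

/* Recover one bit from which of the two candidate indices was cached. */
unsigned bit_from_hot_index(unsigned long hot_index)
{
    return hot_index == 0x300UL ? 1u : 0u;
}

/* Repeat for bits 0-7 to rebuild the byte. The "measurement" is simulated
 * here by calling index2_for_bit directly on the secret; an attacker would
 * only ever see hot_index, never the secret itself. */
uint8_t leak_byte_simulated(uint8_t secret)
{
    uint8_t out = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        unsigned long hot_index = index2_for_bit(secret, bit);
        out |= (uint8_t)(bit_from_hot_index(hot_index) << bit);
    }
    return out;
}
```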
You can then use cache timing to see which address was read in the second load, which means you can see part of the value that was read in the first load. Rinse, repeat.
The variants mostly amount to differences in how you arrange the first part (speculated execution loading from an address you get to provide).
https://twitter.com/pwnallthethings/status/94869396135866777...
[1] https://meltdownattack.com/meltdown.pdf (page 13, figure 6)
Can they disable speculative exec completely for sensitive boxes or is this too baked in?
Meanwhile, don't ever run untrusted code in the same process as any kind of secret. Better yet, don't ever run untrusted code.
Obviously we want 100% of the data in the kernel not to be writeable, but if only a small amount shouldn’t be accessible at all then maybe the long term solution is to handle that data in a special way. Something that makes using it slower but doesn’t make every other syscall suffer as much as a consequence.
Or maybe the solution is to prioritize moving more and more code into userspace.
I don't think that's an option either.
1. How to best protect my local personal data from being subject to this?
2. Whether I should seriously consider pulling all my cryptocurrency off of any exchanges?
1:
- Install security updates for your OS.
- If they're not ready yet: disable JavaScript in your browser by default and enable it only for resources you trust; otherwise just skip the page.
- Execute third-party code with extra caution. Don't run anything suspicious at all, not even inside a VM.
2: as long as it's stored in a wallet on your own hardware which you fully control, it should be safe enough
Since this affects legacy systems that may not be able to be upgraded it seems like this issue will be around for a very long time.
It also only affects "legacy systems" which routinely run untrusted code. If it's something like e.g. a server in a bank, chances are everything running on it has already been accounted for. This isn't like e.g. Heartbleed where you could just connect to any open server and read its memory --- you have to somehow get your code to run on it first.
> Cloud providers which use Intel CPUs and Xen PV as virtualization without having patches applied. Furthermore, cloud providers without real hardware virtualization, relying on containers that share one kernel, such as Docker, LXC, or OpenVZ are affected.
I take it to imply that hypervisors that use hardware virtualization are not affected. However, the PoC that reads host memory from a KVM guest seems to contradict this.
Is it because on Xen HVM, KVM, and similar hypervisors, only kernel pages are mapped in the address space of the VM thread (so a malicious VM cannot read memory of other VMs), but on these other hypervisors, pages from other containers are mapped? Yet the Xen security advisory [2] says:
> Xen guests may be able to infer the contents of arbitrary host memory, including memory assigned to other guests.
Relatedly, what sensitive information other than passwords could appear in the kernel memory? I'd expect that at the very least buffers containing sensitive data pertaining to other VMs may be leaked.
[1] https://meltdownattack.com/
[2] https://xenbits.xen.org/xsa/advisory-254.html
> On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer.
The first section states that even with the branch prediction you still need to be in the same memory context to be able to read other process's memory through this. But eBPF lets you run JIT'd code in the kernel context.
I guess this JITing is also the issue with the web browsers, where you end up getting access to the entire browser process memory.
But ultimately the dangerous code is still code that got a "privilege upgrade": the packet filter code for eBPF, and the JIT'd JS in the browser exploit?
So if our software _never_ brought user's code into the kernel space, then we would be a bit safer here? For example if eBPF worked in... kernel space, but a different kernel space from the main stuff? And Site Isolation in Chrome?
It's also possible to use existing code in the target context as the speculative execution path if it has the right form (and this is what P0's Variant 2 POC does, in that case by poisoning the branch predictor in order to make it speculatively execute a gadget that has the right form).
But I just remembered that years ago the FreeBSD developers discovered a vulnerability in Intel's Hyperthreading that could allow a malicious process to read other processes' memory.[1]
To the degree that I understand what is going on here, that sounds very similar to the way the current vulnerabilities work.
For a while, back then, I was naive enough to think this would be the end of SMT on Intel CPUs, but I was very wrong about that.
So I am wondering - is this just a funny coincidence, or could people have seen this coming back then?
[1] http://www.daemonology.net/hyperthreading-considered-harmful...
https://aws.amazon.com/security/security-bulletins/AWS-2018-...
If these exploits rely on taking precise timing measurements (on the order of nanoseconds), could we eliminate or restrict this functionality in user space?
The Spectre exploit uses the RDTSC instruction, and this can apparently be restricted to privilege level 0 by setting the TSD flag in CR4.
I know it would kind of suck, but it might be better than nothing.
I would think that most typical user applications wouldn't require that accurate a time measurement. If they do, then maybe they can be whitelisted?
In fact, the practical JavaScript attacks use this method (using SharedArrayBuffer) and the browsers are disabling this (new, little used) feature as a mitigation. But I'm afraid hell will freeze over before mainstream operating systems deny userspace access to clocks, threads, and memory mapped files, which is a lower bound on what it would take to make the attack much harder.
> Since this new class of attacks involves measuring precise time intervals, as a partial, short-term, mitigation we are disabling or reducing the precision of several time sources in Firefox.
[1]: https://blog.mozilla.org/security/2018/01/03/mitigations-lan...
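Reducing precision in that sense amounts to rounding timestamps down to a coarser grid before handing them to script. A minimal sketch follows; the function name and the 20 microsecond grid mentioned afterwards are assumptions for illustration, not Firefox's actual code.

```c
#include <stdint.h>

/* Round a nanosecond timestamp down to a multiple of resolution_ns.
 * A resolution of 0 leaves the timestamp unchanged. */
uint64_t coarsen_ns(uint64_t t_ns, uint64_t resolution_ns)
{
    if (resolution_ns == 0) {
        return t_ns;
    }
    return t_ns - (t_ns % resolution_ns);
}
```

With a grid of, say, 20000 ns, two reads taken 100 ns apart will usually return the same value, so a single cache hit can't be told apart from a miss directly. As noted above, an attacker with threads and shared memory can build their own counter, so this only raises the bar.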
Is it that the current privilege level could be different between what it is now, and what it will be when the speculative instruction retires? If so then that seems a thin justification. CPL should not change often so it doesn't seem worth it to allow speculative execution for instructions where a higher CPL is required.
https://access.redhat.com/security/vulnerabilities/speculati...
It would seem to me that all the really smart people who designed super-scalar processors and all the nifty tricks that CPUs do today - would have thought that these attacks would be in the realm of possibility. If that's the case - who's to say these attacks haven't been used in the wild by sophisticated players for years now?
Seems like the perfect attack. Undetectable. No log traces.
Edit: has been settled, it's https://en.wikipedia.org/wiki/Meltdown_(security_bug) .
I'm asking because eBPF seems to execute within the kernel, and KPTI seemed to be about unmapping kernel page table when userspace processes execute.
Are there any mitigations to the eBPF attack vector?
I use eBPF all the time, but I never use it as non-root, so I haven't needed unprivileged bpf anyway.
update: that eBPF vector was already fixed, and another safety measure is already being considered https://lkml.org/lkml/2018/1/3/895
VLIW is not a panacea, engineering being all about tradeoffs after all. But it was intended to not have the complex instruction dispatching logic, with things like speculative execution and branch prediction, in the processor. Instead, using a process called if-conversion the compiler combines the two possible results of a conditional branch into a single instruction stream where predicates control which instruction syllables are executed.
* http://web.eecs.umich.edu/~mahlke/papers/1996/schlansker_hpl...
* https://www.isi.edu/~youngcho/cse560m/vliw.pdf
* https://www.cse.umich.edu/awards/pdfs/p45-mahlke.pdf
* http://web.eecs.umich.edu/~mahlke/papers/1996/schlansker_hpl...
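As a rough illustration of if-conversion in C terms (real predication happens at the instruction level, so this is only an analogy, and the function names are made up): both sides of the branch are computed and a predicate selects the result, so there is no branch for the hardware to predict.

```c
#include <stdint.h>

/* Branching form: the CPU has to predict which side runs. */
int32_t pick_branchy(int cond, int32_t a, int32_t b)
{
    if (cond) {
        return a + 1;
    }
    return b - 1;
}

/* If-converted form: both sides are evaluated, then a predicate mask
 * selects one result. No control flow depends on cond. */
int32_t pick_predicated(int cond, int32_t a, int32_t b)
{
    uint32_t then_val = (uint32_t)a + 1u;
    uint32_t else_val = (uint32_t)b - 1u;
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0); /* all ones if cond, else 0 */
    return (int32_t)((then_val & mask) | (else_val & ~mask));
}
```

An optimizing compiler is free to turn either form into the other, so this shows the idea rather than guaranteeing what machine code comes out.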
Observe, in considering this alternative history, that the Itanium had 64 predicate registers. People have, in the past few days in various discussions of this subject, criticized Intel for holding on to a processor design for decades and prioritizing backwards compatibility over cleaner architecture. They have forgotten that Intel actually produced a cleaner architecture, back in the 1990s.
Consider the following simple C code: "if (arr[idx]) { ... }". Without speculation, the core must stall until the condition has been read from memory, which can be hundreds of cycles if it's not in the cache. With speculation, these wasted cycles are instead used to do some of the work from most probable side of the branch, so when the condition finally arrives from memory, there's less work left to do.
The pipeline depth only affects what happens when the speculation predicted the wrong way: since the correct way is not on the pipeline, it has to fill the pipeline from scratch.
Also modern OoO CISCs and RISCs have very similar pipeline depths for the same performance/power budget.
Side effects could obviously have been mitigated better, but hindsight is 20/20.
I was thinking of something similar but with a branching operation, but that would get screwed by branch prediction.
Trading between cryptoassets as GP suggests he has done already makes them subject to taxation. The fiat step isn’t needed.
Also, real money transactions are much more likely than blockchain transactions to be reversible if fraudulent.
https://millcomputing.com/docs/prediction/
It has to. The problem is the speed of light here, not a simple slipup by a CPU designer.
Of course, that means now monolithic kernels run just as slow as microkernels.
* http://blog.darknedgy.net/technology/2016/01/01/0/ (https://news.ycombinator.com/item?id=10824382)
I've always been a proponent of microkernels, and this is another situation that might help with this.
(Personally, I've been affected by the failures of monolithic kernels way too often. When a simple OpenGL or WebGL program manages to hang your GPU driver, parts of the kernel, and all DMA operations in the kernel, and your system becomes unusable, then reasonable isolation would be preferable)
But a microkernel is going to have multiple processes talking to each other, so there will still be more overhead whenever a message requires coordination between more than two processes.
Consider:
• Network packet buffers? Yes.
• Graphics driver command buffers? Yes.
• Disk caches? Yes.
• Kernel pointers of any kind? Yes if you care about KASLR.
It's actually kind of hard to think of data in the kernel that shouldn't be read protected.
Oh, and x86 has variable-length instructions - the same byte stream can decode as different instructions depending on where you start - so I doubt it's possible at all on x86 without a massive performance hit (you'd have to keep track of every jump instruction in the entire address space...)
The best approach is to evict all user space pages from cache when an invalid page access happens if the page fault was caused by the software trying to read/write kernel space pages.
Massive performance hit but only to misbehaved software. Normal software will not have the performance hit of the current solution.
Kernel could even switch to unmapped kernel pages solution if too many read/write attempts.
See this excerpt from spectreattack.com:
>Which systems are affected by Meltdown?
>Desktop, Laptop, and Cloud computers may be affected by Meltdown. More technically, every Intel processor which implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium and Intel Atom before 2013). We successfully tested Meltdown on Intel processor generations released as early as 2011. Currently, we have only verified Meltdown on Intel processors. At the moment, it is unclear whether ARM and AMD processors are also affected by Meltdown.
>Which systems are affected by Spectre?
>Almost every system is affected by Spectre: Desktops, Laptops, Cloud Servers, as well as Smartphones. More specifically, all modern processors capable of keeping many instructions in flight are potentially vulnerable. In particular, we have verified Spectre on Intel, AMD, and ARM processors.
> These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running them.
https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...
Treating them as a group ignores the very real differences in effect.
https://googleprojectzero.blogspot.com/2018/01/reading-privi...
> Thus, the isolation of containers sharing a kernel can be fully broken using Meltdown.
Any reason for the panic now? Any known malware using it?
> We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation.
https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...
Websites like the Guardian report that this is now the case but I don't understand how that's possible.
I hereby nominate 2018's song to be Billy Joel's We Didn't Start the Fire.
Yes, this explains why it was lifted.
At least "spectre" and "meltdown" will be memorable even for non-technical people (who should probably be aware of the issue even if they don't understand the technical details). "Bounds check bypass" and "branch target injection" probably sound like random words strung together to most people.
So yeah, making it nice and pretty seems appropriate, just like a CV.
(unless I undergo the tedious process of becoming a noscript user or something similar).
Holy shit.
I kinda get how it works now. They force a speculative execution to do something with a protected memory address, and then measure the latency to guess the content. They did not find a way to continue execution after a page fault, as the rumors suggested.
The fact that a speculatively executed branch can access protected memory, but cannot commit its own computation results to memory, has been known on IA-32 since Pentium III times.
It was dismissed as a "theoretical only" vulnerability without possible practical application. Intel kept saying that for 20 years, but here it is, voila.
The ice broke in 2016 when Dmitry Ponomarev wrote about the first practical exploit scenario for this well-known IA-32 branch prediction artifact. Since then, I believe, quite a few people were trying each and every possible instruction combination for use in a timing attack, until somebody finally got one that works, which was shown behind closed doors.
Edit: Google finally added a reference to Ponomarev's paper. Here is his page with some other research on the topic: http://www.cs.binghamton.edu/~dima/
https://googleprojectzero.blogspot.com/2018/01/reading-privi...
It is perhaps beneficial to be using an easily portable OS that can be run on older computers, and a variety of architectures.
Sometimes older computers are resilient against some of today's attacks to the extent those attacks make assumptions about the hardware and software in use. (Same is true for software.)
When optimization reaches a point where it exposes one to attacks like the ones being discussed here, then maybe the question arises whether the optimization is actually a "design defect".
What is the solution?
IMO, having choice is at least part of any solution.
If every user is effectively "forced" to use the same hardware and the same software, perhaps from a single source or small number of sources, then that is beneficial for those sources but, IMO, counter to a real solution for users. Lack of viable alternatives is not beneficial to users.
"x86 virtualization is about basically placing another nearly full kernel, full of new bugs, on top of a nasty x86 architecture which barely has correct page protection. Then running your operating system on the other side of this brand new pile of shit."
Doesn't Google say that they are protected...?
However, if the OS inside the VM is unpatched, then code inside the VM can attack other code inside the VM. If for example you install some malware on your VM, it could use this attack.
(I am not a security expert, this is just my understanding and not an official Google statement)
"Compute Engine customers should work with their operating system provider(s) to download and install the necessary patches."
For home computer, standard office use, there is no impact at this point, right?
> So far, there are three known variants of the issue:
> Variant 1: bounds check bypass (CVE-2017-5753)
> Variant 2: branch target injection (CVE-2017-5715)
> Variant 3: rogue data cache load (CVE-2017-5754)
> During the course of our research, we developed the following proofs of concept (PoCs):
> A PoC that demonstrates the basic principles behind variant 1 in userspace on the tested Intel Haswell Xeon CPU, the AMD FX CPU, the AMD PRO CPU and an ARM Cortex A57 [2]. This PoC only tests for the ability to read data inside mis-speculated execution within the same process, without crossing any privilege boundaries.
> A PoC for variant 1 that, when running with normal user privileges under a modern Linux kernel with a distro-standard config, can perform arbitrary reads in a 4GiB range [3] in kernel virtual memory on the Intel Haswell Xeon CPU. If the kernel's BPF JIT is enabled (non-default configuration), it also works on the AMD PRO CPU. On the Intel Haswell Xeon CPU, kernel virtual memory can be read at a rate of around 2000 bytes per second after around 4 seconds of startup time. [4]
> A PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific (now outdated) version of Debian's distro kernel [5] running on the host, can read host kernel memory at a rate of around 1500 bytes/second, with room for optimization. Before the attack can be performed, some initialization has to be performed that takes roughly between 10 and 30 minutes for a machine with 64GiB of RAM; the needed time should scale roughly linearly with the amount of host RAM. (If 2MB hugepages are available to the guest, the initialization should be much faster, but that hasn't been tested.)
> A PoC for variant 3 that, when running with normal user privileges, can read kernel memory on the Intel Haswell Xeon CPU under some precondition. We believe that this precondition is that the targeted kernel memory is present in the L1D cache.
If I'm reading this right, then the only POC that works against ARM is the first one, which lets you read data within the same process. Not too impressive. (Yes, I know that I'm reading into this that they tried to run all the POCs against all the processors. But the "Tested Processors" section lower down leads me to believe that they did in fact do so.)
The third and fourth POC seem to be Intel-specific.
Going to AMD would be incredibly expensive as you'd be replacing nearly everything, but if Intel gets new chips out in a reasonable amount of time, they might actually make a killing on this.
"In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it." - from the spectre paper
So it seems like they tried it on AMD and ARM, but they tried much harder on Intel. That's less reassuring than my initial reading.
> 6.4 Limitations on ARM and AMD
> We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition towards against the data leakage. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data. However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed.
Seems like the possibility exists that AMD/ARM could be affected, based on the behavior they saw, but they were not able to successfully verify.
But it might also be reasonable in a different architecture to feed 0 straight to the micro-op when the memory access is invalid, and set a flag to raise the exception as well. In this situation you don't have the problem, because you are just shuffling around invalid data.
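A toy software model of that idea, just to pin down what is being proposed. This is not how any real core is implemented, and the function name and signature are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a load: if the access is not permitted, forward 0 to the
 * dependent operations and record that a fault must be raised later.
 * Anything computed from the result then depends on 0, not on the secret. */
uint8_t model_load(const uint8_t *mem, uint32_t addr, bool permitted, bool *fault)
{
    if (!permitted) {
        *fault = true; /* exception is still raised when the load retires */
        return 0;      /* dependents only ever see a constant */
    }
    return mem[addr];
}
```

Under this model a dependent second load would always touch the same cache line for a forbidden address, so the cache state would carry no information about the protected byte.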
Meltdown on the other hand can do everything. But only on Intel.
> In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it.
(granted I think site isolation, if enabled, mitigates crossing domain boundaries)
It goes on to show a sample JS impl that JITs into the expected insns using V8.
Edit - mixing it up with this other article (https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...)
Imagine that you want to know whether someone has checked out a particular library book. The library refuses to give you access to their records and does not keep a slip inside the front cover. You can only see the record of which books you have checked out.
What you do is follow the person of interest into the library whenever they return a book. You then ask the librarian for each book you suspect the person had checked out. If the librarian looks down and says "You are in luck, I have a copy right here!" then you know the person had checked out that book. If the librarian has to go look in the stacks and comes back 5 minutes later with the book, you know that the person didn't check out that book (this time).
The way to make the library secure against this kind of attack is to require that all books be reshelved before they can be lent out again, unless the current borrower is requesting an extension.
There are many other ways to use the behavior of the librarian and the time it takes to retrieve a book to figure out which books a person is reading.
edit: A closer variant. Call the library pretending to be the person and ask for a book to be put on hold. Then watch how long it takes them in the library. If they got that book they will be in and out in a minute (and perhaps a bit confused), if they didn't take that book it will take 5 minutes.
A library has two rooms, one for general books and one for restricted books. The restricted books are not allowed out of the library, and no notes or recordings are allowed to be taken out of the restricted room.
An attacker wants to sneak information out of the restricted room. To do this they pick up a pile of non-restricted books and go into the restricted room. Depending on what they read in there they rearrange the pile of non-restricted books into a particular order. A guard comes along and sees them, they are thrown out of the restricted room and their pile of non-restricted books is put on the issue desk ready to be put back into circulation.
Their conspirator looks at the order of the books on the issue desk and decodes a piece of information about the book in the restricted room. They repeat this process about 500000 times a second until they have transcribed the secret book.
The librarian has a list of books you're not allowed to take out. You request one of those books (book X), but it takes a while for the search to run to see whether you're allowed to or not. While you're waiting, you say "actually, I'm not really interested in taking out book X, but if the content of that book is 'a', I'd like to take out book Y. If the content of that book is 'b', I'd like to take out book Y+1, and so on".
The librarian is still waiting for the search to complete to see if you can take out book X, but doesn't have anything better to do, so looks inside it, sees that the letter is 'b', and goes and gets book Y+1 so she can hand it over to you.
Now, the original check to see if you can take the first book out completes, and the librarian says "I'm sorry, I can't let you have book X, and I can't give you the book I fetched that you are allowed to take out, otherwise you'd know the content of the forbidden book."
Now, you request book 'Y', which you are allowed. The librarian goes away for a few minutes, and returns with book 'Y', and hands it over to you. You request book 'Y+1', and she hands it over immediately. You request book 'Y+2', and she goes away for a few minutes again, and hands it over.
You now know that Y+1 was (probably) the book she fetched when you made the forbidden request, and therefore that the letter inside the forbidden book was 'b'.
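The librarian story above maps onto a cache-timing probe. A toy Python sketch of just the inference step, using a simulated cache rather than real hardware; the `ToyCache` class, the latency numbers, and the book numbering are all made up for illustration:

```python
FAST, SLOW = 1, 300  # made-up latencies: "hands it over immediately" vs "goes away"


class ToyCache:
    """Remembers which books (memory lines) were fetched recently."""

    def __init__(self):
        self.cached = set()

    def flush(self):
        self.cached.clear()

    def access(self, slot):
        """Return a simulated latency and leave the slot cached afterwards."""
        latency = FAST if slot in self.cached else SLOW
        self.cached.add(slot)
        return latency


def forbidden_request(cache, secret_letter, base):
    """The librarian fetches book base+offset 'speculatively', then refuses."""
    cache.access(base + (ord(secret_letter) - ord("a")))
    return None  # nothing is handed over, but the cache state has changed


def recover_letter(cache, base, alphabet_size=26):
    """Request every candidate book once and report which one came back fast."""
    timings = {i: cache.access(base + i) for i in range(alphabet_size)}
    fastest = min(timings, key=timings.get)
    return chr(ord("a") + fastest)


cache = ToyCache()
cache.flush()
forbidden_request(cache, "b", base=1000)
print(recover_letter(cache, base=1000))  # prints "b"
```

A real attack has to deal with noisy timings and prefetchers, which this sketch ignores entirely.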
They check out the book called "how to go to facebook.com". Then they check out "how to type a password". Then they check out "Typing '1234' for Dummies".
I bet you'll never figure out how to get into their facebook account.
[^note]: https://www.facebook.com/notes/petrus-theron/spectre-how-do-...
https://meltdownattack.com/meltdown.pdf
https://spectreattack.com/spectre.pdf
From the spectre paper:
>As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs (cf. Listing 2).
Scary stuff.
- http://www.zdnet.com/article/security-flaws-affect-every-int...
* Edit:
From https://meltdownattack.com/
Which systems are affected by Meltdown?
"Desktop, Laptop, and Cloud computers may be affected by Meltdown. More technically, every Intel processor which implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium and Intel Atom before 2013). We successfully tested Meltdown on Intel processor generations released as early as 2011. Currently, we have only verified Meltdown on Intel processors. At the moment, it is unclear whether ARM and AMD processors are also affected by Meltdown.
Which systems are affected by Spectre?
Almost every system is affected by Spectre: Desktops, Laptops, Cloud Servers, as well as Smartphones. More specifically, all modern processors capable of keeping many instructions in flight are potentially vulnerable. In particular, we have verified Spectre on Intel, AMD, and ARM processors."
Interestingly, it also put the LKML developers into an ethical grey zone, as they had to deceive the public that the patch was actually fixing something else (they did a good and right thing there IMHO).
Despite all the slight problems along the way, kudos to any of the White Hats dealing with this mess over the last months and handling it super gracefully!
> Google Chrome
> Some user or customer action needed. More information here (https://support.google.com/faqs/answer/7622138#chrome).
And the "here" link says:
>Google Chrome Browser
>Current stable versions of Chrome include an optional feature called Site Isolation which can be enabled to provide mitigation by isolating websites into separate address spaces. Learn more about Site Isolation and how to take action to enable it.
>Chrome 64, due to be released on January 23, will contain mitigations to protect against exploitation.
>Additional mitigations are planned for future versions of Chrome. Learn more about Chrome's response.
>Desktop (all platforms), Chrome 63:
> Full Site Isolation can be turned on by enabling a flag found at chrome://flags/#enable-site-per-process.
> Enterprise policies are available to turn on Site Isolation for all sites, or just those in a specified list. Learn more about Site Isolation by policy.
Does that mean if I don't enable this feature using chrome://flags and tell my grandma to do this complicated procedure I (or she) will be susceptible to getting our passwords stolen?
Subject: Avoid speculative indirect calls in kernel
Any speculative indirect calls in the kernel can be tricked to execute any kernel code, which may allow side channel attacks that can leak arbitrary kernel data.
So we want to avoid speculative indirect calls in the kernel.
There's a special code sequence called a retpoline that can do indirect calls without speculation. We use a new compiler option -mindirect-branch=thunk-extern (gcc patch will be released separately) to recompile the kernel with this new sequence.
We also patch all the assembler code in the kernel to use the new sequence.
Also, see Linus' response here: https://lkml.org/lkml/2018/1/3/797
Does anyone have any color/details on how this came to be? A major fundamental flaw exists that affects all chips for ~10 years, and multiple independent groups discovered them roughly around the same time this past summer?
My hunch is that someone published some sort of speculative paper / gave a talk ("this flaw could exist in theory") and then everyone was off to the races.
But would be curious if anyone knows the real version?
Failed attempt in July which is being attributed as earliest work via https://twitter.com/lavados/status/948700783259811847
This part is interesting considering the performance concerns:
"The majority of Azure customers should not see a noticeable performance impact with this update. We’ve worked to optimize the CPU and disk I/O path and are not seeing noticeable performance impact after the fix has been applied. A small set of customers may experience some networking performance impact. This can be addressed by turning on Azure Accelerated Networking (Windows, Linux), which is a free capability available to all Azure customers."
The question is, how is the speculative write going to the cache in the first place? Only retired instructions should be able to modify cache lines AFAIK. What am I missing?
Edit: Figured it out. The speculatively accessed memory value is used to compute the address of a load from a memory location which the attacker has access to. Once the mis-speculation is detected, the attacker will time accesses to the memory which was speculatively loaded and figure out what the secret key is. Brilliant!
Curious. All other reports I've read state that AMD CPUs are not vulnerable.
As for the difference between AMD and intel. (From other posts here, not this one.) The speculative execution can access arbitrary memory locations on intel processors while this is not possible on AMD. This means that on intel processors you can probe any memory location with only limited privileges.
As for the affected AMD and ARM processors I'm none the wiser. How are they affected? Which models are affected? Does it allow some kind of privilege escalation? The next days will surely stay interesting.
Information site with some more information, and links to papers on the two vulnerabilities, called "Meltdown" and "Spectre" (with logos, of course).
(https://meltdownattack.com/ goes to the same site)
I wonder how the timing relates to New Year and many companies having holidays in CW1.
[1] https://lists.freebsd.org/pipermail/freebsd-security/2018-Ja...
> Both the Meltdown and Spectre logo are free to use, rights waived via CC0. Logos are designed by Natascha Eibl.
> I am careful in how I use the Internet.
> I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I usually fetch web pages from other sites by sending mail to a program (see https://git.savannah.gnu.org/git/womb/hacks.git) that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (using konqueror, which won't fetch from other sites in such a situation).
I'm wondering, was this vulnerability theorized first and later found out to be an actual vulnerability? Or was this something that nobody had any clue about?
I'm only saying this, because from a security perspective, I imagine somewhere at some point very early on someone had to have pointed out the potential for something like speculative execution to eventually cause security problems.
I just don't understand how chip designers assumed speculative execution wouldn't eventually cause security problems. Is it because chip designers were prioritizing performance above security?
And I certainly expect to see more things like this (but at least hopefully with lower bandwidth).
"The threat and the response to the three variants differ by microprocessor company, and AMD is not susceptible to all three variants. Due to differences in AMD's architecture, we believe there is a near zero risk to AMD processors at this time."
So either AMD is lying or Google's blog post is wrong. Granted AMD's statement is a bit muddled, not sure if they mean they aren't susceptible to all THREE variants (as in only 1/3) or they aren't susceptible to ALL three variants (as in none of them.)
Because this looks real bad:
> Reading host memory from a KVM guest
Yeah, it's pretty bad.
I think this might even be fixed by microcode patches on Intel, at least os specific, looking at the first address bit.
Google Project Zero and academic researchers found it independently, following some talk about the concept a while back.
The 3 letters agencies have people of the same calibre working full time on that. They could find it too.
"Spectre" is very bad news and affects all modern CPUs. Mitigation is to insert mfence instructions throughout jit generated sandboxed code making it very slow, ugh. Otherwise assume that the entire process with jit generated code is open to reading by that code.
Any system which keeps data from multiple customers (or whatever) in the same process is going to be highly vulnerable.
Here's the synchronized announcement from Chrome/Chromium: https://sites.google.com/a/chromium.org/dev/Home/chromium-se...
"Chrome's JavaScript engine, V8, will include mitigations starting with Chrome 64, which will be released on or around January 23rd 2018. Future Chrome releases will include additional mitigations and hardening measures which will further reduce the impact of this class of attack. The mitigations may incur a performance penalty."
Chrome 64 will be hitting stable this month, which means that it ought to be possible to benchmark the performance penalty via testing in Chrome beta. Anybody tried yet?
"However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed."
It seems like keeping untrusted code in a separate address space would be a suitable workaround? A lot of comments here seem to be implying that meltdown-style reading of separate address spaces is possible via Spectre, and my read is that it wouldn't.
It's not yet clear whether it affects all modern CPUs, notably I have yet to see any mention of modern POWER/MIPS/SPARC-based designs. If someone has pointers, those particular cases would probably be quite interesting.
Works on processors going back as far as 2007 (the oldest I have access to now is an Athlon 64 X2 6000+), but the example code relies on an instruction that the Atom D510 does not support.
Because Spectre seems to be an intrinsic problem with out-of-order execution, which is almost as old as the FDIV bug in intel processors, I would be very surprised if the Atom D510 did not turn out to be susceptible using other methods as outlined in the paper.
EDIT: I originally suspected this instruction was CLFLUSH and erroneously claimed the D510 doesn't support sse2. It does support sse2, so it must be that it does not support the RDTSCP instruction used for timing.
EDIT: This gets very interesting. I made some modifications to use a CPUID followed by RDTSC, which now runs without illegal instructions and works everywhere the previous version worked. Except on the D510, this runs but I cannot get the leak to happen despite exploring values of CACHE_HIT_THRESHOLD. Could the Atom D510 really be immune from Spectre?
With "gcc (GCC) 7.2.1 20171128", remove the parentheses from the CACHE_HIT_THRESHOLD macro[1] to compile correctly.
[1]: https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9...
(Set cache_hit_threshold to the default value of 80, my cpu is an Intel i7-6700k.)
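To make the role of the threshold concrete, here is a rough Python sketch of how a hit threshold can turn noisy timing measurements into a guess by voting across repeated trials. It is not the code from the gist; the function names and the sample timings are hypothetical, and only the value 80 comes from the comment above.

```python
from collections import Counter

CACHE_HIT_THRESHOLD = 80  # cycles; the default mentioned above, CPU-dependent


def score_trials(trials, threshold=CACHE_HIT_THRESHOLD):
    """Count, per candidate byte value, how many trials looked like a cache hit."""
    scores = Counter()
    for timings in trials:  # timings: {byte_value: measured_latency_cycles}
        for value, latency in timings.items():
            if latency <= threshold:
                scores[value] += 1
    return scores


def best_guess(trials, threshold=CACHE_HIT_THRESHOLD):
    """Return the byte value with the most hits, or None if nothing ever hit."""
    scores = score_trials(trials, threshold)
    if not scores:
        return None
    value, _count = scores.most_common(1)[0]
    return value


# Hypothetical measurements for three candidate values over three trials.
trials = [
    {65: 40, 66: 210, 67: 190},
    {65: 55, 66: 70, 67: 220},   # 66 is a spurious hit in this trial
    {65: 35, 66: 230, 67: 200},
]
print(best_guess(trials))                # prints 65
print(best_guess(trials, threshold=20))  # prints None
```

If the threshold is set below the machine's real cache-hit latency, no candidate ever scores, which would look the same as the leak "not happening".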
Looking back, the Mac patches were to address KPTI (Meltdown) which is separate to Spectre.
Looks like any session token/state could be exfiltrated from your Gmail tab to a malicious JS app running in-process, for example.
Am I overreacting here?
Still skimming the paper, but the JS attack appears to be processor-intensive (please chime in if you interpret it differently!). Any widespread, indiscriminate use of such an attack in the wild seems like it would eventually be detected as surely as client-side cryptocurrency mining was discovered. If you aren't a valuable target, if you don't visit sites that are shady enough to discreetly mine bitcoin in your browser, and if you use an adblocker to defang rogue advertisers, then you probably shouldn't lose too much sleep over this (which is not intended to diminish how awesome (in the biblical sense) this attack is).
That said, if there were ever a time to consider installing NoScript, now's it: https://addons.mozilla.org/en-US/firefox/addon/noscript/
https://groups.google.com/a/chromium.org/forum/#!topic/blink...
https://groups.google.com/forum/#!topic/mozilla.dev.platform...
Chrome and Firefox's "intent to ship" posts both contain claims to the effect that there probably aren't any really serious timing channel attacks, which... seems to have been disproved. Why isn't SharedArrayBuffer already being disabled as a stopgap? I think users can turn it off in firefox, how about Chrome?
I don't know for which processors Intel and AMD plan to release microcode updates.
Edit: Also, PoCs for unpatched Windows by pwnallthethings: https://github.com/turbo/KPTI-PoC-Collection
I am not sure what "the process in which it runs" means here ... do they mean private memory from within chrome ? Or within the child process spawned from chrome, or within the spawned JS sandbox or ... what ?
Practically speaking, I worry about a browser pageview that can read memory from my terminal process. Or from my 'screen' or 'sshd' process.
I think that is not a risk here, yes ?
> AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
And Axios [2] that Zdnet quotes gave a comment from AMD:
> "To be clear, the security research team identified three variants targeting speculative execution. The threat and the response to the three variants differ by microprocessor company, and AMD is not susceptible to all three variants. Due to differences in AMD's architecture, we believe there is a near zero risk to AMD processors at this time. We expect the security research to be published later today and will provide further updates at that time."
And a comment from ARM:
> Please note that our Cortex-M processors, which are pervasive in low-power, connected IoT devices, are not impacted.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/...
[2] https://www.axios.com/how-the-giants-of-tech-are-dealing-wit...
1. Have out of order execution
2. Have aggressive speculative memory load / caching behavior
3. Be able to speculatively cache memory not owned by the current process (either kernel or otherwise)
4. Have deterministic ways of triggering a speculative load / read to the same memory location
2 is probably the saving grace in ARM / low power land, given they don't have the power budget to trade speculative loads for performance (in the event they're even out of order in the first place).
Caveat: I'm drinking pretty strong Belgian beer while reading through these papers.
"AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."
If you run a multitenant workload on a linux system (say you're a PaaS or even just hosting a bunch of WordPress side by side) you should update your kernel as soon as is reasonable. While VM to VM attacks are patched, I'm sure lots of folks are running untrusted code side by side and need to self patch. This is why our docs point this out for say GKE: we can't be sure you're running single tenant, so we're not promising you there's no work to do. Update your OSes people!
For example, if a processor prevents speculative execution of instructions in user processes from accessing kernel memory, the attack will still work.
and
Kernel mode testing has not been performed, but the combination of address truncation/hashing in the history matching and trainability via jumps to illegal destinations suggest that attacks against kernel mode may be possible. The effect on other kinds of jumps, such as interrupts and interrupt returns, is also unknown
There doesn't seem to be any reason to believe VM to VM attacks are either patched or patchable.
My question to you, which I realise you may be unable to answer - how much does truly dedicated hardware on GCE cost? No co-tenants at all except maybe Google controlled code. Do you even offer it at all? I wasn't able to find much discussion based on a 10 second search.
Like the AWS reboots, people will notice. So in the interest of the embargo, both Azure and AWS waited to update as late as they felt was safe. Since we do live migrations and host kernel updates all the time, nobody noticed us :).
(Edit: there are 9 posts total, go to her user page to see them all)
Seems there are two issues. One, called Meltdown, only affects Intel and is REALLY bad, but the kernel page table changes everyone is making fix it.
The other, dubbed Spectre, is apparently common to the way all processors handle speculative execution and is unfixable without new hardware.
I’d like to know more about that but I haven’t seen anything yet.
Whoever discovered this stuff on Google’s team deserves some sort of computer security Nobel prize.
You can see all the tweets here (courtesy of @svenluijten): https://twitter.com/i/moments/948681915485351938.
Speculative execution is fundamental to getting decent performance out of a CPU. Without it you should probably divide your performance expectations by 5 at least.
Rolling back all state rather than just user visible state in the CPU is nigh on impossible. When you evict something from the cache, you delete it. Undeleting is hard. There are also a lot of other non-user-visible bits of state in a CPU.
While they're not as big of a deal AFAIK, we do have the Pwnie Awards: https://pwnies.com/
https://googleprojectzero.blogspot.com/2018/01/reading-privi...
What!
But the sentence you quote adds AMD back into play. Maybe some of its ARM processors? e.g. AMD Opteron A1100?
I'd be curious how those two statements should be reconciled.
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (called "Intel Haswell Xeon CPU" in the rest of this document)
AMD FX(tm)-8320 Eight-Core Processor (called "AMD FX CPU" in the rest of this document)
AMD PRO A8-9600 R7, 10 COMPUTE CORES 4C+6G (called "AMD PRO CPU" in the rest of this document)
An ARM Cortex A57 core of a Google Nexus 5x phone [6] (called "ARM Cortex A57" in the rest of this document)
https://googleprojectzero.blogspot.com/2018/01/reading-privi...
I think the key to the statement is in any case that you need to differentiate between what is possible on the processor architecture level when you have full software control, and what is possible on an operating system level, where 3rd party applications are further restricted in various arbitrary ways such as only allowed to use Java, limited access to high resolution timing primitives, etc. that can make practical exploitation impossible, even if the flaw is present.
It's difficult to reason about because it's hard to tell if you can manipulate a JIT runtime into generating the code you need for the exploit to work - and as the JavaScript implementations show, the answer is often "yes".
[0] Their secondary goals are to protect Google products and services, and to provide excellent PR in line with what we're discussing right here.
Which is, at the same time, highly rational: to secure their entire market.
It's nice to have big corp's incentives aligned with the public good. Too bad it happens so rarely.
Suppose your company also has a team that inspects public bridges to make sure they don't collapse.
Is it really altruistic, or given your market share is it a cost of business?
I predict tonight's disclosures will lead to an uptick in interest in running websites on dedicated hardware, like we did back at the turn of the century.
Spectre occurs whether you are in the cloud or not, while cloud companies can advertise that they help customers proactively mitigate such risks.
I keep wondering if they got some “””AI””” fuzzer that helps them a ton? Plus tons of compute power to spend (remember the SHA-1 collision they found “just because”?)
The "flag early" camp's argument was that protected pages should not be allowed to be fetched to begin with by any insecure execution flow, and that we need to pagefault before speculative execution
The "flag after" camp was all for post-factum pagefaulting when the branch has finished execution, so you do not need to pagefault for every branch, and only do it for the branch that has "won"
Chip design magazines from the nineties have all that well covered.
https://news.ycombinator.com/item?id=16066165
With this particular computer scientist, who talked about this problem before, referenced in Google's paper:
https://news.ycombinator.com/item?id=16062223
It's a similar situation to other timing attacks, which have been around practically as long as caches.
!(susceptible_v1 && susceptible_v2 && susceptible_v3)
They are not saying that: !susceptible_v1 && !susceptible_v2 && !susceptible_v3
(the latter would be rendered in English as: "AMD is not susceptible to any of the three variants")
There is a nice table on AMD's website though:
We have empirically verified the vulnerability of several Intel processors to Spectre attacks, including Ivy Bridge, Haswell and Skylake based processors. We have also verified the attack’s applicability to AMD Ryzen CPUs. Finally, we have also successfully mounted Spectre attacks on several Samsung and Qualcomm processors (which use an ARM architecture) found in popular mobile phones.
So in other words, the researchers haven't tried it on AMD processors, but they think the attack would work. AMD, on the other hand, is saying the attack won't work.
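As an aside on the two boolean readings of AMD's statement (this is not AMD's table), a small Python check shows how far apart they are. The function names are made up for illustration:

```python
from itertools import product


def not_all_three(v1, v2, v3):
    """Reading A: 'not susceptible to all three', i.e. !(v1 && v2 && v3)."""
    return not (v1 and v2 and v3)


def none_of_three(v1, v2, v3):
    """Reading B: 'not susceptible to any', i.e. !v1 && !v2 && !v3."""
    return (not v1) and (not v2) and (not v3)


# Enumerate every combination of susceptible/not susceptible per variant.
differing = [
    combo
    for combo in product([False, True], repeat=3)
    if not_all_three(*combo) != none_of_three(*combo)
]
print(len(differing))  # prints 6
```

The readings agree only when the CPU is susceptible to all three variants or to none; in the six remaining cases reading A is true while reading B is false.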
Frankly, I believe in PoC||GTFO, so AMD is safe in my book for now.
That's the bug. The guard only checks to see whether you're supposed to have access after you walk in and start (speculatively) rearranging books. One way to fix this bug would be to have the guard check your access at the door.
> We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming.
Is it super important in this case? shrug.
But imagine for the sake of argument there was some undocumented cpu behaviour "if instruction x,y,z are executed in that order with these constants then catch fire", then having anyone pre-empt the agreed update time could be bad.
I guess Intel only focused on Windows NT for several months, and there it's not so easy as on Linux.
After that it's all about PR and getting people prepared for the magnitude and impact early.
Also to let people know that patches that were already available can be used (restarting GCP/AWS instances, SPI on chrome).
> This initial report did not contain any information about variant 3. We had discussed whether direct reads from kernel memory could work, but thought that it was unlikely. We later tested and reported variant 3 prior to the publication of Anders Fogh's work at https://cyber.wtf/2017/07/28/negative-result-reading-kernel-....
Let's say you want to know if your boss is away on vacation next week, so you call their admin and say "you need to double-check my contact info if the boss is going to be out next week". They load up the boss' calendar to check and, depending on whether the boss is in next week, then load up your info. Only once that is done do they take the time to remember the boss didn't want you to know whether they are in or out. So you hear back, "sorry, can't tell you that", but you follow up with "OK, well can you still double check that my phone number is..."
If they respond quickly with a yes, then your file is still on their screen and the boss is in fact out next week. If there is a short pause while they look it up, then the opposite.
Other side-channel attacks can come in the form of analysing network data, power-consumption (CPUs use more power when they are "busier")... even noise (listen for when the fans start spinning up).
says:
> Want to know what's really going on with the Intel security flaw everyone is talking about? Checkout https://meltdownattack.com to get all the details. This is my boyfriend's and his research team's latest work. An huge security breach which affects nearly all your computers! Stealing all your secrets never was that easy!
I have been most focused on people being concerned that a neighboring VM could suddenly be an attacker. You're right that the same kind of thing that affects your JavaScript engine as a user affects say Apache or anything that allows requests from external sources. However, that situation already has a much larger attack surface and people in that space should be updating themselves whenever there's any CVE like this.
My concern was that the Azure announcement made it sound like they've done the work, so nothing is required. That's not strictly true, even though providers have mitigated one set of attacks at the host kernel layer, so I wanted to correct that.
Sure, Google could just patch themselves, but the information to recreate the issue would surely be leaked by a xoogler, since it only takes a single sentence describing the vuln for a competent sec team to recreate it
That said, the main new ingredient of Spectre seems to be the idea that userspace can poison the branch target buffer to cause speculative execution of arbitrary code in kernel space. That part of the attack should be fairly easy to mitigate with new hardware, by XORing (or hashing) the index into the BTB with a configurable value that depends on the privilege level. So each process has its own "nonce", and they're all different from the kernel's.
Then BTB poisoning won't work unless the attacker knows its own and the other context's nonce. Even if further attacks are found that leak this nonce, they could be mitigated by changing the nonce at regular intervals.
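A toy Python model of the per-context nonce idea described above. This is a sketch of the proposal, not of any shipping hardware; the table size, the class name, and the addresses are hypothetical, and real predictors have tags and history bits that are not modelled here.

```python
BTB_BITS = 8                  # hypothetical: a 256-entry branch target buffer
MASK = (1 << BTB_BITS) - 1


class ToyBTB:
    """Branch target buffer whose index is XORed with a per-context nonce."""

    def __init__(self):
        self.entries = {}

    def _index(self, branch_addr, nonce):
        return (branch_addr ^ nonce) & MASK

    def train(self, branch_addr, target, nonce):
        self.entries[self._index(branch_addr, nonce)] = target

    def predict(self, branch_addr, nonce):
        return self.entries.get(self._index(branch_addr, nonce))


# Without nonces (both contexts use 0), user-space training leaks into the kernel.
shared = ToyBTB()
shared.train(0x4010, target=0xBAD, nonce=0)
print(shared.predict(0x4010, nonce=0))        # prints 2989 (0xBAD)

# With distinct nonces the same training lands in a different slot.
isolated = ToyBTB()
isolated.train(0x4010, target=0xBAD, nonce=0x5A)   # attacker's context
print(isolated.predict(0x4010, nonce=0xC3))   # prints None
```

In this model the attacker can still collide with a kernel branch at address B by training at `B ^ attacker_nonce ^ kernel_nonce`, which is why the XOR of the two nonces has to stay secret, and why a table this small could simply be filled by brute force.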
I guess part of the question you're raising is: are there so many different caches, translation buffers, etc. in a modern CPU that keeping 'uncommitted buffers' for the state of all of them would be just as complex as throwing a whole other core in there?
[1] https://eprint.iacr.org/2016/613.pdf
> Both hardware thread systems (SMT and TMT) expose contention within the execution core. In SMT, the threads effectively compete in real time for access to functional units, the L1 cache, and speculation resources (such as the BTB). This is similar to the real-time sharing that occurs between separate cores, but includes all levels of the architecture. [...] SMT has been exploited in known attacks (Sections 4.2.1 and 4.3.1)
The scary thing is that you can't fix this in software.
From the Spectre note (which does affect AMD):
In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it.
How quickly are we going to see attacks targeting BTC/ETH wallets, apps etc. on clients and cloud hosted exchanges?
This recent article contains a bit more detail on Site Isolation: https://arstechnica.com/gadgets/2017/12/chrome-63-offers-eve...
> Chrome's default model is, approximately, to use one process per tab. This more or less ensures that unrelated sites are kept in separate processes, but there are nuances to this set-up. Pages share a process if they are related through, for example, one opening another with JavaScript or iframes embedding (wherein one page is included as content within another page). Over the course of a single browsing session, one tab may be used to visit multiple different domains; they'll all potentially be opened within a single process. On top of this, if there are already too many Chrome processes running, Chrome will start opening new pages within existing processes, resulting in even unrelated pages sharing a process.
Which suggests there are a number of cases where multiple tabs could share a process.
[1] Two pages are considered cross-site if they cannot use document.domain to become same origin. In practice, this means that the effective TLD + 1 component match.
Curious to know whether Firefox has anything similar in the pipe, since it uses a fixed number of content processes rather than a variable number of processes.
Maybe one way would be to use a smaller, separate cache for speculative execution and then copy that value to the regular cache once speculation is confirmed? This would add a one cycle latency for cache-to-cache transfer but there might be better ways.
This problem is already solved with speculative writes to main memory - a speculative store buffer keeps a sequence of memory operations which need to be done when the operation retires. These buffers are very power hungry, because every future speculative read must check every entry in the speculative store buffer to see if it is re-reading a previously written address. That many to many mapping leads to an exponential amount of checking logic.
The same could be done for cache reads/writes, but I have a feeling it would quickly get very complex, large, and power hungry.
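For readers who have not met a speculative store buffer, here is a minimal Python model of the behaviour described above: stores are held aside until retirement, and every load has to search the pending stores first. The class and method names are made up, and real hardware does the search with parallel comparators rather than a loop.

```python
class ToyStoreBuffer:
    """Holds speculative stores until they retire or are squashed."""

    def __init__(self, memory):
        self.memory = memory   # committed state: {address: value}
        self.pending = []      # speculative stores, oldest first

    def store(self, address, value):
        self.pending.append((address, value))

    def load(self, address):
        # Every load checks all pending stores, newest first (store forwarding).
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory.get(address, 0)

    def retire(self):
        # Speculation confirmed: make the stores architecturally visible.
        for address, value in self.pending:
            self.memory[address] = value
        self.pending.clear()

    def squash(self):
        # Mis-speculation: drop the stores, committed memory is untouched.
        self.pending.clear()


memory = {0x10: 1}
buffer = ToyStoreBuffer(memory)
buffer.store(0x10, 99)
print(buffer.load(0x10), memory[0x10])  # prints "99 1"
buffer.squash()
print(buffer.load(0x10))                # prints 1
```

The cost mentioned above shows up in `load`: each load compares against every pending entry, so the comparison work grows with both the number of loads in flight and the buffer depth.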
More broadly, potential countermeasures limited to the memory cache are likely to be insufficient, since there are other ways that speculative execution can leak information. For example, timing effects from memory bus contention, DRAM row address selection status, availability of virtual registers, ALU activity, and the state of the branch predictor itself need to be considered.
... also ...
Of course, speculative execution will also affect conventional side channels, such as power and EM
Historically I think it's been assumed that you can't extract much useful information from a modern speculating CPU via EM radiation, but these attacks constantly seem to be surprising people. Re-programming a wifi chip to monitor interference generated by the CPU to spy on speculation? It would have sounded like a pie in the sky fantasy ... yesterday.
DRAM and the memory bus are also affected by DMA operations running independently of the CPU.
Power consumption? There is no hardware available to measure that, let alone at the time resolution required. If you have to first attach a GHz bandwidth oscilloscope to the computer you might as well just reboot it or dump its RAM contents or whatever.
Forget about reprogramming a Wi-Fi chip. They operate on narrow channels in the 2.4GHz range and have fixed hardware for modulation. You would at least have to force the CPU to switch to the right frequency and then be lucky enough that it radiates a signal that demodulated to something sensible within the Wi-Fi hardware. This is physically impossible on current hardware.
Also, on a different note, we cannot sacrifice performance willy-nilly for the sake of a bit of potential security gain. A 30% performance loss on servers means that the counter move is to consume 30% more power to maintain current levels of operation in a data center. This energy needs to be generated, which means that someone is burning oil or gas for it, with all the consequences. In essence, the current patches will result in thousands or millions of extra tons of CO2 in the atmosphere. More efficient replacement hardware will eventually be produced, with extra environmental impact. We need to find ways to avoid that. Soon.
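As a rough back-of-the-envelope check on that figure (a sketch, assuming power scales linearly with the number of servers and that "30% performance loss" means each server does 0.7x the work): keeping total throughput constant needs 1/0.7 of the original capacity, which is closer to 43% extra than 30%. The function name below is made up for illustration.

```python
def extra_capacity_needed(performance_loss):
    """Fraction of additional servers needed to keep total throughput constant."""
    if not 0 <= performance_loss < 1:
        raise ValueError("performance_loss must be in [0, 1)")
    # Each server now delivers (1 - loss) of its old throughput.
    return 1.0 / (1.0 - performance_loss) - 1.0


extra = extra_capacity_needed(0.30)   # roughly 0.43, i.e. about 43% more servers
```

Whether real workloads actually see anything like a 30% slowdown is a separate question; this only shows how the number compounds if they do.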
Look at their Listing 2: Instructions 5 - 7 will be waiting for the privileged data from line 4 (they are not speculatively executed since they have a data dependency on line 4).
So why is Intel releasing the privileged data to the waiting instructions? An answer could be that violation checking is delayed until retire, but other implementations are possible.
Anyway, so it could be that AMD and ARM are vulnerable, but it's possible that they are not.
1. Intel only (so far): prefetching privileged memory.
2. More or less everyone: speculatively executing code that has variable execution time.
The point of VIPT caches is exactly to use data before all the checks are completed.
It's easy to judge the sanity of things ex post, but maybe it's not that easy if it took 20 years to find the issue.
It's certainly not easy, but it's doable.
We're all talking about how Spectre is this magic "get access to any memory from any process". But it looks to me like it's a new class of attack, that still requires specific entry points for the software you're trying to attack.
I'd like to be proven wrong on this, but it _feels_ like this is more of a software thing like other timing bugs. In theory you can write software that isn't vulnerable
EDIT: my reading of the "JS Spectre implementation" is "JIT code runs in the process of the browser + you can write JS code to read the process's own memory". I can imagine messiness with extensions (1Password in particular).
Before you trigger the victim process, you perform some steps in your own, hostile, process that teaches the branch predictor where a particular branching operation will likely go. Then you trigger the victim process in the way you know will cause a very similar branching operation.
Even though it's operating within an entirely different process, the branch predictor uses what it learnt in the hostile process to predict the branch result in the victim process. It jumps to the address the hostile process taught it, and starts to speculatively execute code there. Eventually, it figures out it guessed wrong, but by then it's too late, and the information has leaked via a side-channel in a way that the hostile process can detect.
So, essentially, you're using the branch predictor's cache to send the memory address. And you're not sending it to the victim process, you're sending it directly to the CPU. The victim process will never even know it's been attacked, because the branch predictor hides the consequences of its incorrect guess from being detected by conventional means.
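The training step described above can be sketched with a toy model in Python. Everything here is an illustrative assumption: the `BranchTargetBuffer` class, the choice of indexing by the low 8 address bits, and the example addresses are all made up, and real predictors are far more complicated. The only property being modelled is that the predictor is shared between processes and does not tag entries with the full address or the owning process.

```python
class BranchTargetBuffer:
    """Toy BTB shared by every process, indexed by low address bits only."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.entries = {}

    def predict(self, branch_address):
        return self.entries.get(branch_address & self.mask)

    def update(self, branch_address, actual_target):
        self.entries[branch_address & self.mask] = actual_target


btb = BranchTargetBuffer()

# Hostile process: run an indirect branch whose address aliases the victim's
# branch, always jumping to the address of the gadget it wants speculated.
ATTACKER_BRANCH = 0x70001234
GADGET_ADDRESS = 0x40009000
for _ in range(10):
    btb.update(ATTACKER_BRANCH, GADGET_ADDRESS)

# Victim process: its branch shares the low bits, so it inherits the prediction.
VICTIM_BRANCH = 0x40005634
REAL_TARGET = 0x40001000
speculated_target = btb.predict(VICTIM_BRANCH)
mispredicted = speculated_target != REAL_TARGET

# The CPU eventually notices and retrains, but the gadget already ran speculatively.
btb.update(VICTIM_BRANCH, REAL_TARGET)
```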
What would that entail? Disabling speculation completely? Disabling memory accesses during speculation?
The "out of bounds" Spectre variant is still feasible.
Also: What about hyperthreads? It seems to be many people's assumption that the BTB is shared within a physical core.
I'm being a facetious ass. But you know I'm not wrong, either.
Long live progressive enhancement and graceful degradation.
At least until we get a JS interpreter with proper permission controls and sandbox limits. Something closer to how Lua is embedded sounds nice.
The Javascript case is the main one that makes it remotely exploitable.
> In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate browser sandboxing, by mounting them via portable JavaScript code. We wrote a JavaScript program that successfully reads data from the address space of the browser process running it.
The whitepaper doesn't contain example JS code however
Err… fourth paragraph:
> Before the issues described here were publicly disclosed, Daniel Gruss, Moritz Lipp, Yuval Yarom, Paul Kocher, Daniel Genkin, Michael Schwarz, Mike Hamburg, Stefan Mangard, Thomas Prescher and Werner Haas also reported them; their [writeups/blogposts/paper drafts] are at: Spectre (variants 1 and 2) Meltdown (variant 3)
Additional exploits for other architectures are also known to exist. These include IBM System Z, POWER8 (Big Endian and Little Endian), and POWER9 (Little Endian).
https://blog.mozilla.org/security/2018/01/03/mitigations-lan...
performance.now() accuracy is also being reduced.
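A minimal sketch of what reducing timer accuracy means, written in Python rather than JS. The function name and the 20 microsecond resolution are example values chosen for illustration, not a statement of what any particular browser ships.

```python
def coarsen_timestamp(timestamp_us, resolution_us):
    """Round a microsecond timestamp down to a multiple of resolution_us."""
    return (timestamp_us // resolution_us) * resolution_us


# Three events a few microseconds apart become indistinguishable.
precise = [1_000_003, 1_000_011, 1_000_019]
coarse = [coarsen_timestamp(t, 20) for t in precise]
```

A cache hit and a cache miss differ by far less than the coarsened step, so a single timer read no longer separates them. It does not stop an attacker who can build a finer clock some other way.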
This month's stable Chrome release will be outright disabling SharedArrayBuffer until additional mitigations are enacted.
It isn't exactly polyfillable.
Spectre doesn't really care if it is cloud or bare metal. They are equally vulnerable unless disconnected from internet.
This affects browsers with Javascript enabled because your Javascript engine runs foreign code on the CPU. The bad guy puts nasty code in a page, you visit the page, the code executes on your machine--boom.
And it affects public cloud web servers because multiple cloud servers (virtual machines) run on one CPU. So some attacker might be able to jump out of their VM and read your VM's memory.
BUT, on a dedicated hardware web server, there shouldn't be any foreign code running--no foreign VMs, and no browser.
Any user who has access to a system (developers or support or sysadmin) has the ability to read arbitrary memory. The vulnerability can probably be leveraged to privilege escalation or bypass the isolation.
So cloud instances and bare metal ones are equally vulnerable under Spectre, as long as an attacker can transfer malicious code to your machine and run it. Can't really see how bare metal servers mitigate this problem.
Add in that Spectre specifically is a js bug so in order to be vulnerable your server would need to execute untrusted JavaScript and I think we can assume the threat surface of this specific bug is smaller outside the cloud...
https://gist.github.com/ErikAugust/724d4a969fb2c6ae1bbd7b2a9...
It's a technique, not a flaw in js, from my understanding
1) https://spectreattack.com/: "Spectre tricks other applications into accessing arbitrary locations in their memory. " Spectre does not let you execute code in another guest
2) Spectre is not javascript specific. I am not sure why you think it is, beyond the fact a PoC was written in js
For code running in user mode, you flush the branch predictor on each context switch---again, new microcode + patched OS.
Hyperthreads are tricky. Those are not yet fixed by microcode AIUI, and in the future you may want a usermode program to say "I don't want indirect branch prediction because I am afraid of what the other hyperthread might do to me". That would require some new system call (like a new prctl on Linux) or something like that.
The AMD engineer could be right if talking about Ryzen, and/or he isn't mentioning user-user and user-kernel boundaries.
AMD isn't affected in nearly the same way as Intel/ARM are: https://twitter.com/ryanshrout/status/948683677244018689
That's why CPU caches are usually "2 way associative" or "4 way associative".
That means the data you're looking for might be in one of 2 (or 4) places. Fully associative means the data you're looking for might be in any memory slot, and you're gonna have to check them all. Checking them all in parallel is possible, so it isn't a speed issue, but it is a massive power issue. Average power use is the main limiting factor in CPUs today.
In general in a CPU, transistors which stay in the same state don't use much power. Transistors changing state use power. In a fully associative memory, the transistors doing the comparing change state with every comparison. Whereas with a regular memory only the transistors for the individual bit of the memory being read or written change state and use power.
(the above is a simplification, but contains the key elements).
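A small Python sketch of the lookup arithmetic, under assumed parameters (512 lines of 64 bytes, i.e. a 32 KiB cache; the `locate` helper is made up for illustration). The address picks one set, and only the tags in that set are compared, so the number of comparators firing per lookup equals the number of ways.

```python
def locate(address, num_lines, ways, line_size=64):
    """Return (set_index, tag) for an address in an N-way set-associative cache."""
    num_sets = num_lines // ways
    line_number = address // line_size
    return line_number % num_sets, line_number // num_sets


ADDRESS = 0x12345
NUM_LINES = 512

two_way = locate(ADDRESS, NUM_LINES, ways=2)              # 256 sets, 2 tag compares
four_way = locate(ADDRESS, NUM_LINES, ways=4)             # 128 sets, 4 tag compares
fully_associative = locate(ADDRESS, NUM_LINES, ways=512)  # 1 set, 512 tag compares
```

In the fully associative case the set index carries no information at all, and every stored tag has to be compared on every access.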
The density of load instructions in code is not particularly high on average. Also, all loads are subject to the same latencies, so the chance that a speculative read completes before the blocking one is also low (it must be cached in a higher level cache, I think).
Taken together, I would be surprised if more than about 10 speculative reads can successfully complete at all in that time frame, even though it is hundreds of cycles. So that would be around 1000 AND gates and 1000 memory cells. Doesn't sound too big to me.
Does this mean that all websites work? Of course not. But this allows the user to choose which sites to allow to run js. I'm not going to pretend that this is an easy task for non-technical users, but we should be promoting these kinds of habits, not scoffing at them. We should educate as many users as possible that they can still (for now) control much of the web-based code executing on their machines.
Then there was that time when I read on HN about Forbes loading 35 MB worth of crap (lots of JS too) when you first access it, sure enough it's completely broken with noscript too if you don't allow it.
I get that the victim process' branch prediction can be messed with. But if my victim process is:
    password = "password"
    secret = "magic BTC wallet secret key"
    while True:
        password_attempt = input()
        if constant_time_compare(password, password_attempt):
            print(secret)
And my input is something like:
    result = ""
    while sys.stdin.peek() not in ['\n', EOF]:
        result += sys.stdin.get()
Then at no point is the victim program really exposing any pointer logic, so not even the victim process will be accessing the `secret` during execution, let alone the hostile process.
The examples given all include arrays provided by the hostile program, and some indexing into the arrays. I definitely see this being an issue in syscalls, but if that's the scope of this, I wouldn't call Spectre a "hardware bug" any more than other timing attacks would be hardware bugs.
Previously side channel attacks like this have been seen by the security community as unreliable things which only work in very specific cases and have to be averaged over millions of runs.
This attack shows a side channel which is general purpose, reliable, and fast.
There is no fundamental reason why speculative instructions should be allowed to mutate the cache.
OTOH the contention-based side channel attack on speculation has been public knowledge for over a decade. [1]
[1] Z. Wang and R. B. Lee, "Covert and Side Channels Due to Processor Architecture," 2006 22nd Annual Computer Security Applications Conference (ACSAC'06), Miami Beach, FL, 2006, pp. 473-482. doi: 10.1109/ACSAC.2006.20
There is: hundreds of instructions can be in flight speculatively at the same time, especially if you take hyperthreading into account. Good luck rolling them all back.
The question is not whether the cache should be mutated during speculative execution. It's what kinds of speculative execution are allowed, and in some cases it's not even clear if fences should be placed by the programmer (whack-a-mole style), the compiler (not sure how) or the processor (probably not). It's non-obvious enough that how to solve it is to some extent a research problem.
I'd say difficulty level is easy.
So how easy is it to turn that PoC into something I should worry about? Seems like browsers are the most affected by this scenario, but that also means harden the browser (separate process per page) and it might be difficult to exploit.
Sadly, I feel the only part that won't happen will be the programming language part, but who knows.
it's obviously not a free + zero time activity, but I'm going to assume someone making an ad to scan memory isn't super concerned about end user cpu usage or battery life..
[1] https://support.google.com/faqs/answer/7622138#chrome [2] http://www.chromium.org/Home/chromium-security/site-isolatio...
Edit: Apparently you can already do something like this. Seems to be an option for Chrome starting with 63. (Which was an October release I believe?)
http://www.chromium.org/Home/chromium-security/site-isolatio...
The reason for /further/ truncating performance.now() is that the relative cost in this attack means that you don't need as much precision as was needed for the original (page table? I think) attack.
A SAB timer just needs to increment a counter in one thread and read it in the host thread and the granularity is however long it takes to get through a for-loop.
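The counting-thread idea can be sketched in Python with an ordinary thread standing in for a Web Worker and an attribute standing in for the SharedArrayBuffer slot. This is an analogy under those assumptions, not browser code, and the `CounterClock` name is made up.

```python
import threading
import time


class CounterClock:
    """A 'clock' whose only time source is a thread spinning on a counter."""

    def __init__(self):
        self.ticks = 0
        self._running = False
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self):
        # The granularity of the clock is one trip around this loop.
        while self._running:
            self.ticks += 1

    def start(self):
        self._running = True
        self._thread.start()

    def stop(self):
        self._running = False
        self._thread.join()


clock = CounterClock()
clock.start()
before = clock.ticks
time.sleep(0.05)          # stand-in for the operation being timed
after = clock.ticks
clock.stop()
ticks_elapsed = after - before
```

No call to a real timer API is involved in the measurement itself, which is why coarsening performance.now() alone does not close this off and why SharedArrayBuffer was disabled as well.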
This sucks, and is a side-effect that I didn't even think about. I guess it's probably pretty effective, but it will make benchmarking a lot harder, since you'll probably now have to do a lot more runs.
That's much more accurate than necessary to benchmark any software code.
AMD's ASID blocks the issues for VM guests (and root users on VM guests).
For variant 2, Project Zero used the eBPF interpreter as a gadget, a fake branch destination, without having to actually create an eBPF program or use the normal userland-facing eBPF APIs at all. And they only chose it as the least "annoying" option (see quote below).
edit: I'm not sure how ASID support would mitigate either of those variants, though there may be something I'm not thinking of. (It would help with variant 3, but that's the variant AMD wasn't vulnerable to in the first place.)
quote:
> At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets.
BPF is employed by the `bpf()` syscall for socket packet filtering, as well as by `seccomp` itself for its syscall filtering. Is this threat vector not available to untrusted processes?
edit: I'm sure everyone involved acted responsibly. I'm just curious as to how far apart these independent discoveries were made.
The bug has been around forever, but it must have been discovered relatively recently since it's not fixed in hardware yet.
I've always been baffled by the concept of simultaneous discovery.
> We would like to thank Intel for their professional handling of this issue through communicating a clear timeline and connecting all involved researchers.
1) Have any of them ended up in Project Zero or working on stuff like this
2) Wonder if NSA knew about this vulnerability and now someone there in a windowless office is sighing saying to themselves "Welp, another backdoor we can't use".
There's a huge shared buffer between two threads. 256 * 4K. One thread reads a byte of kernel memory, literally any byte it wants, and it then reads one of those 4K pages from that buffer in order to cache that one memory page that corresponds to the byte it just read. Then at some point the CPU determines that the thread shouldn't be permitted to access the kernel memory location, and rolls back all of that speculative execution, but the cached memory page isn't affected by the rollback.
The other thread iterates through those 256 pages, timing how long it takes to read from each page, and the one page that Thread A accessed will have a different (shorter?) timing because it's cached already. It now understands one byte of kernel memory that it shouldn't. That's just one byte but the whole process is so fast that it's easy to just go nuts on the whole kernel address space.
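Here is that two-step dance as a toy simulation in Python. To be clear about the assumptions: there is no real speculation or timing here, the `ToyCache` class and the latency numbers (40 vs 300 "cycles") are invented, and `transient_read` just stands in for the speculatively executed load pair. It only shows why one cached page out of 256 is enough to recover a byte.

```python
PAGE_SIZE = 4096
NUM_VALUES = 256


class ToyCache:
    """Tracks which pages of the probe buffer are cached and fakes latencies."""

    def __init__(self):
        self.cached_pages = set()

    def flush(self):
        self.cached_pages.clear()

    def access(self, address):
        page = address // PAGE_SIZE
        latency = 40 if page in self.cached_pages else 300
        self.cached_pages.add(page)
        return latency


def transient_read(cache, secret_byte):
    # Stand-in for: read the forbidden byte, then touch probe[byte * 4096].
    # The architectural result is rolled back; the cache fill is not.
    cache.access(secret_byte * PAGE_SIZE)


def recover_byte(cache):
    # Time one access per page; the fastest page index is the leaked byte.
    latencies = [cache.access(value * PAGE_SIZE) for value in range(NUM_VALUES)]
    return min(range(NUM_VALUES), key=latencies.__getitem__)


cache = ToyCache()
leaked = bytearray()
for secret_byte in b"hunter2":
    cache.flush()
    transient_read(cache, secret_byte)
    leaked.append(recover_byte(cache))
```

And yes, the cached page is the one with the shorter access time.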
So what would the fixes be? Disable speculative execution? Only do it if the target memory location is within userspace, or within the same space as the executing address? Plug all of the sideband information leak mechanisms? I dunno.
Major props to the teams working on this... now time for us all to hold onto our pants as we ask for budget increases that will make shareholders demand blood.
The only sorts of companies where server costs could increase hugely due to a sudden need for hardware isolation are those where they're running tiny or incredibly bursty workloads. Big companies like Netflix that use tons of cores can just binpack their work all together on the same hardware so their jobs only share hardware with other jobs controlled by the same company. Effectively, cloud providers will start offering sub-clouds into which only your own jobs will be scheduled.
This is actually how cloud tech has worked for many years internally. I worked at Google for a long time and their cluster control system (Borg) had a concept called "allocs" which were basically scheduling sub-domains. You could schedule an alloc to reserve some resources, and then schedule jobs into the alloc which would share those resources. Allocs were often used to avoid performance-related interference from shared jobs, e.g. when a batch job kept hogging the CPU caches and slowing down latency sensitive servers. I suppose these days VMs and containers do a similar job, though I think the Borg approach was nicer and more efficient.
I guess this sort of per-firm isolation will become common and most companies' costs won't change a huge amount. The people it'll hit will be small mom-and-pop personal servers, but they're unlikely to care about side channel attacks anyway. So I wouldn't sell stock in cloud providers just yet.
> The business/economic implications are not clear, since eventually the only way to eradicate the threat posed by Spectre is to swap out hardware.
Is this fully accurate, there's no software mitigation available now?
From [0], the above may be true:
> There is also work to harden software against future exploitation of Spectre, respectively to patch software after exploitation through Spectre.
There is 'work'? No current patch? So Spectre is unpatched?
This point doesn't seem to be being highlighted but appears particularly important.
The real problem with Meltdown seems to occur when: 1) The offending instruction is NOT really executed because it is in a branch which is not actually taken. 2) The offending instruction is executed but within a transaction, which leads to an exception-free rollback (with leaked information left in cache though).
AFAIK neither is (or can be made) visible to the kernel (which could explain the very large PTI patch), but I do wonder if they are events that can be handled at the microcode level, in which case a microcode update from Intel could mitigate them.
When a load is found to be illegal, an exception flag is set so that if the instruction is retired (ie. the speculated execution is found to be the actual path taken), a page fault exception can be raised. To prevent MELTDOWN, at the same time that the flag is raised you can set the result of the load to zero.
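A toy Python model of that proposal, just to show the effect on the dependent load. The function names, the addresses, and the `user_readable` set are hypothetical; this models the forwarding decision only, not a real pipeline.

```python
PAGE_SIZE = 4096


def speculative_load(address, memory, user_readable, zero_on_fault):
    """Return (value forwarded to dependent instructions, fault flag)."""
    faulted = address not in user_readable
    if faulted and zero_on_fault:
        return 0, True            # mitigation: dependents never see real data
    return memory[address], faulted


def probe_offset(address, memory, user_readable, zero_on_fault):
    value, _fault = speculative_load(address, memory, user_readable, zero_on_fault)
    # Offset of the dependent probe-array access that leaves a cache footprint.
    return value * PAGE_SIZE


memory = {0x1000: 0x11, 0xFFFF0000: 0x5A}   # second address is "kernel" data
user_readable = {0x1000}

vulnerable = probe_offset(0xFFFF0000, memory, user_readable, zero_on_fault=False)
mitigated = probe_offset(0xFFFF0000, memory, user_readable, zero_on_fault=True)
legal = probe_offset(0x1000, memory, user_readable, zero_on_fault=True)
```

With zeroing on, the footprint is the same whatever the protected byte was, and permitted loads behave as before.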
SPECTRE is the really hard one to deal with. Part of the solution might be providing a way for software to flush the branch predictor state.
1. Badly written code where bugs are being masked by the handler. 2. Any kind of virtualization?
So, for cloud providers it looks like a 30% performance hit, but for the rest of us I would rather have a patch that stops applications handling the SEGV trap.
> or within the same space as the executing address
That's probably a good place to start from. I'm guessing there still would be issues here with JITed code coming from an untrusted source.
But no solution leaps to mind for the problem of preventing speculative code from leaking things via cache, short of entirely preventing speculating code from being able to load things into the cache. If nobody can come up with a solution for that, that's going to cost us something to close that side channel. Not sure what though, without a really thorough profiling run.
And I'd put my metaphorical $5 down on someone finding another side channel from the speculative code; interactions with changing processor flags in a speculative execution, or interaction with some forgotten feature [2] where the speculation ends up incorrectly undone or something.
Yuck.
[1]: https://blog.booking.com/hardening-perls-hash-function.html
[2]: https://www.youtube.com/watch?v=lR0nh-TdpVg - The Memory Sinkhole - Unleashing An X86 Design Flaw Allowing Universal Privilege Escalation (Dec 29, 2015)
It's going to be really hard to give up real world gains from branch prediction. Branch prediction can make a lot of real world (read "not the finest code in the world") run at reasonable speeds. Another common pattern to give up would be eliding (branch predicting away) nil reference checks.
> short of entirely preventing speculating code from being able to load things into the cache
Some new server processors allow us to partition cache (to prevent noisy neighbors) [1,2]. I don't have experience working with this technology, but everything I read makes me believe this mechanism can work on a per-process basis.
If that kind of complexity is already possible in CPU cache hierarchy I wonder if it's possible to implement per process cache encryption. New processors (EPYC) can already use different encryption keys for each VM, so it might be a matter of time till this is extended further.
It's possible to key the cache in the kernel on CPL so at least there should be no user / kernel space scooping of cache lines.
It's possible we can never fully prevent all attacks in same address space. So certain types of applications (JIT and sandboxes) might forever be a cat and mouse game since we're unlikely to give up on branch prediction.
I was just thinking of the part they were talking about where it was too predictable, not the rest of the issues. Instead of a single hard-coded algorithm we could switch to something that has a random key element, like XOR'ing a rotating key instead of a hard-coded one, similar to some of the super-basic hashing some predictors already do. Prefetching I just don't know what to do with. I mentally started down the path of considering what it would take for the CPU to pretend the page was never cached in the first place on a misprediction, but yeow, that got complicated fast, between cache coherency issues between processors and all of the other crap going on there, plus the fact that there's just no time when we're talking about CPU and L1 interactions.
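A quick Python sketch of the keyed-index idea, with one caveat that falls straight out of the XOR arithmetic: a single key shared by everyone changes where entries land but not which pairs of addresses collide, so the key would have to differ per process (or per privilege level) to get in an attacker's way. The function name, key values and the 8-bit index are assumptions for illustration.

```python
def predictor_index(branch_address, key, index_bits=8):
    """Index into a predictor table after XOR-ing the address with a key."""
    mask = (1 << index_bits) - 1
    return (branch_address ^ key) & mask


ATTACKER_BRANCH = 0x70001234
VICTIM_BRANCH = 0x40005634      # same low 8 bits as the attacker's branch

# One global key: the two branches still alias each other.
SHARED_KEY = 0x2A5
shared_collides = (predictor_index(ATTACKER_BRANCH, SHARED_KEY)
                   == predictor_index(VICTIM_BRANCH, SHARED_KEY))

# Per-process keys: aliasing now depends on key bits the attacker cannot see.
ATTACKER_KEY = 0x2A5
VICTIM_KEY = 0x13C
per_process_collides = (predictor_index(ATTACKER_BRANCH, ATTACKER_KEY)
                        == predictor_index(VICTIM_BRANCH, VICTIM_KEY))
```

An attacker could still search for a colliding address by trial and error, so rotating the keys over time would matter too.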
Timing attacks really blow. Despite the "boil the ocean" nature of what I'm about to say, I find myself wondering if we aren't better served by developing Rust and other stronger things to the point that even if the system is still vulnerable to timing attacks it's so strong everywhere else that it's a manageable problem. Maybe tack on some heuristics to try to deal with obvious hack attempts and at least raise the bar a bit. More process isolation (as in other links mtanski gives you can at least confine this to a process). (What if Erlang processes could truly be OS processes as far as the CPU was concerned?) I'm not saying that is anything even remotely resembling easy... I'm saying that it might still be easier than trying to prevent timing attacks like this. That's a statement about the difficulty of fixing timing attacks in general, not the ease of "hey, everybody, what if you just wrote code better?", a "solution" I often make fun of myself.
On the positive side, the flaw is very difficult to exploit in a practical setting.
Is it?
"As a proof-of-concept, JavaScript code was written that, when run in the Google Chrome browser, allows JavaScript to read private memory from the process in which it runs"
Moreover the option 2) already exists for large customers and security sensitive applications (e.g. CIA dedicated cloud built by Amazon).
The flag has a fixed fee in the thousands of dollars and each instance is 10% more expensive.
Thanks for the handy link.
I am by no means a computer security guru - I just did a CPU architecture course at uni and figured I'd cowboy up an implementation. It worked nearly first time, and can read both kernel and userspace pages from userspace by fooling the branch predictor into going down the wrong path, and relying on the permission checks to be slower than the data reads from a virtually addressed cache. It can only access stuff already cached though, so you can't do a full memory dump with it.
edit: well it didn't take a nation state after all: https://twitter.com/brainsmoke/status/948561799875502080 - given that, you can be sure that everybody who counts is frantically launching these on your clouds gathering whatever they can.
https://www.fool.com/investing/2017/12/19/intels-ceo-just-so...
https://elixir.free-electrons.com/linux/v4.15-rc6/source/ker...
Obviously it's not just a js bug, there are other PoCs in other languages.
I never said anything about executing code in another guest, not sure where you got that from.
You have no reason to care about a neighbor VM executing Spectre on the same physical server, since they're only hurting themselves, not you.
>Add in that Spectre specifically is a js bug so in order to be vulnerable your server would need to execute untrusted JavaScript and I think we can assume the threat surface of this specific bug is smaller outside the cloud...
"Spectre specifically is a js bug"
So... you may be able to go native to avoid having neighbours, but this does not prevent other processes exploiting your process. To do that you need to prevent any downloadable code from running.
This of course is possible and is the entire reason why the ARM document explains that most embedded systems are not affected due to the fact that they will not download and execute code (of any form).
Edit: yes, I missed the details in section 4.1 when I skimmed through. I’m not familiar with the Kocher paper, but I assume the training looks like this?
    for (int i = 0; i < n; i++) if (i == n-1) do_probe();
See: https://blog.mozilla.org/security/2018/01/03/mitigations-lan...
That trend might reverse if vulnerabilities like these continue to surface.
I’m not saying that shouldn’t be done, but business-wise it's probably usually best to instead add design changes for the latest smartphone screen.
The web isn’t a hypertext graph anymore, it’s a large JavaScript program with a thin html front now.
Most sites did nothing like that, but they did use Javascript and would break in various ways without it. At that time, there were a lot of people admonishing web developers to test their applications with Javascript disabled. Sort of like now.
ETA: I had to look it up - XHR was first available in IE 5 as an ActiveX control. The internet at large couldn't really expect it to be available but I believe that is where we first used it.
Initial release: March 18, 1999; 18 years ago
A lot of sites rely on JS to function even at a basic level these days and I think the parent was saying it's unlikely that that's going to change.
As the other comments point out, though, the biggest problem is that this is economically irrational for most site owners. The figures on JS-disabled usage I had when I was still at Google (3+ years ago now) were at the lower end of TikiTDO's range. It generally doesn't make economic sense to spend developer time on an experience used by 0.1% of users, particularly if this requires compromises for the 99.9% of users who do have JS enabled.
If you have a web app there’s no point, but if you’re displaying text and images and your site doesn’t work without JS, you’ve over-egged a solved problem.
(... While increasing perceived latency, especially for mobile users.)
(it's the more advanced version of uBlock, from the same dev)
Though of course an M7 isn't running VMs, and probably isn't running any kind of attacker-controlled code (scripting included - it's there, but rare), so many of the vectors aren't present.
See the section "Reading host memory from a KVM guest / Locating the host kernel". It's terribly clever.
KASLR is not enabled everywhere, and where it is, there are other attacks to defeat it, which are mentioned in the paper.
I totally understand how the breaking out of the javascript sandbox attack works and the fact that PTI won't help with that. With Linux's clone(), you could clone without CLONE_VM and use CLONE_NEWUSER|CLONE_SYSVSEM and then unmap everything except the Javascript interpreter / JIT and leave a shared memory map and communicate only via the shared memory map and SYSV semaphores for synchronisation. Obviously this wouldn't be available on other platforms.
For variant 1, you would need to find some legitimate kernel code, accessible by syscall, that looks at least somewhat similar to the example in the Project Zero blog post:
    if (untrusted_offset_from_caller < arr1->length) {
        unsigned char value = arr1->data[untrusted_offset_from_caller];
        unsigned long index2 = ((value&1)*0x100)+0x200;
        if (index2 < arr2->length) {
            unsigned char value2 = arr2->data[index2];
        }
    }
In practice, you may not be able to find something nice like "((value&1)*0x100)+0x200", but even if it simply used 'value' as an index, you would be able to at least narrow it down to a range. Other code patterns may work too (and potentially be more powerful?), e.g. conditional branches based on 'value'.
For variant 2, see caf's answer to you in another thread.
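On the variant 1 pattern quoted above, a short Python sketch of what each form of the second index gives away. It assumes 64-byte cache lines and that arr2->data starts on a line boundary; the helper names are made up. With the "nice" expression the touched line tells you bit 0 of the byte exactly; with 'value' used directly you only learn which 64-value range it falls in.

```python
CACHE_LINE = 64


def gadget_index2(value):
    """Second-array index from the quoted pattern: depends only on bit 0."""
    return ((value & 1) * 0x100) + 0x200


def line_touched(index):
    """Cache line of arr2->data that the dependent load pulls in."""
    return index // CACHE_LINE


secret = 0xB5   # example out-of-bounds byte, low bit set

# Nice gadget: line 8 means bit 0 was 0, line 12 means bit 0 was 1.
nice_line = line_touched(gadget_index2(secret))
recovered_bit = {8: 0, 12: 1}[nice_line]

# Plain 'value' as index: the line number only bounds the byte to a range.
plain_line = line_touched(secret)
possible_values = range(plain_line * CACHE_LINE, (plain_line + 1) * CACHE_LINE)
```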
If I'm understanding the post correctly it says that JIT's not required for Intel CPUs, but is required for AMD.
The attack allows reading all of memory. Isn't there a way to scan for passwords or ssh keys and turn that into a privilege escalation?
What I was wondering was for local OS user mode to local OS root / kernel mode access; i.e. user to kernel privilege escalation.
What would block it is flushing the branch predictor state when switching privilege levels and/or address spaces.