How do you protect against someone controlling the hypervisor? Read the paper. But the high level is to encrypt memory using keys that cannot leave the processor and are only available to a specific VM ASID (Address Space Identifier), assisted by a secure firmware similar to the Secure Enclave. Attestation uses an on-chip certificate signed by an AMD master key during fabrication.
There were some discussions on this on the linux-kernel mailing list [2]. As I understand it, the current generation of SEV is still somewhat leaky, but there's no fundamental reason why those leaks cannot be closed.
[1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/... [2] http://www.mail-archive.com/linux-doc@vger.kernel.org/msg025...
With this, the party in control of the system is also in control of that, so every time a new vulnerability is found they can exploit it before patching it to retroactively get access to your data. Or never patch it at all and use the vulnerability itself to forge attestations that the vulnerability is patched.
Depending on how determined you imagine your bad actor host, you can probably never get around things like "zero day is discovered, host disconnects guest from internet preventing you from patching zero day, exploits guest".
Or are you talking about vulnerabilities in SEV itself?
Here's one application for the red team: AV-resistant malware, rootkits and next generation APTs
Homomorphic encryption would allow for "true security" where the party doing the computation doesn't ever have the encryption keys necessary to see what data they're operating on. This is something more akin to a TPM. The key that can read all of the data is in the possession of the party doing the computation, but it's stored in the CPU and the CPU will not give that key to anyone. Theoretically the key could be read off of the CPU but in practice this would require either a flaw, sidechannel, or a lot of time with an electron microscope.
For practical purposes, I believe that all implementations of secure cloud computing are going to be like this where the key is just secured physically. It's possible with homomorphic encryption to have someone securely do computations on data that they can't see all in software, but I just don't see any major breakthroughs happening that would make this fast enough to be practical.
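To make "doing computations on data they can't see" concrete, here is a toy sketch. It uses textbook (unpadded) RSA, which is homomorphic for multiplication only; this is an assumption chosen for brevity and is not a fully homomorphic scheme or anything SEV does. The tiny primes and the helper names `encrypt` and `decrypt` are illustrative and not secure.

```python
# Toy textbook RSA with tiny primes: illustration only, NOT secure.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

def encrypt(m):
    """Encrypt with the public key (no padding, which is what makes it homomorphic)."""
    return pow(m, e, n)

def decrypt(c):
    """Decrypt with the private key, which only the data owner holds."""
    return pow(c, d, n)

a, b = 7, 6
# The untrusted party multiplies ciphertexts without ever holding d.
c_product = (encrypt(a) * encrypt(b)) % n
result = decrypt(c_product)
print(result)  # 42
```

The party doing the multiplication only ever sees `encrypt(a)` and `encrypt(b)`. Schemes that support arbitrary computation this way exist but are far slower, which is the practicality concern raised above.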
> Attestation uses an on-chip certificate signed by an AMD master key during fabrication.
This is absolutely fantastic for security in the cloud, but it is important to note that this will not protect against nation state level actors.
Rest assured that the USG will obtain the AMD master signing key with or without AMD's permission. Other nation states may do likewise. The rest will have to wait for a leak, and if that key is leaked this feature will become almost nonexistent.
Even a less than perfect protection from the hypervisor may still have some value.
I'd be more worried about the performance overhead, personally - I can't imagine using this if the impact is significant, and it seems like it almost has to be.
However, they seem to have a strong technical pipeline and they have historically punched above their weight-class. Does it look like they are going to make it?
I guess a big benefit of the separate caches is that if only half the cores are in use, you can power half of it down, saving power and TDP.
It also has a bandwidth problem. If 64 threads are vying for access, you either build it with few access ports and it gets choked, or you build it with many access ports which is costly in area, power, & speed.
Two separate peer caches automatically have twice the bandwidth of one similar double-size cache, for the price of NUMA & cache coherency challenges.
There is no one right answer here. Bandwidth is far more important and coherency much easier in a small L1; as you go down the hierarchy, bandwidth needs shrink and coherency is more expensive.
The architecture itself sounds pretty much like what everyone was expecting: a traditional fat and wide core. Their power management and foundry process will probably make the difference as to whether final performance is impressive or not, and may also be the cause of the delay.
Anyway, the first benchmark is promising, and I hope Zen can also keep up with Broadwell performance in other benchmarks/workloads, as well as in power efficiency.
It's a good mix between high-level and highly-detailed.
It does have a brief introduction on some of the basic execution fundamentals but then it jumps right in, so you will probably need some external introduction if you are not generally familiar with the topic.
If it has it, I would avoid it like the plague, and get an FX-8370 or 8350 to replace my now aging FX-4100. The last thing that I want on my computer is a hidden uncontrollable CPU doing things that could affect my privacy.
If you can tolerate using a different CPU architecture, Raptor Engineering's Talos Secure Workstation looks very intriguing. https://www.raptorengineering.com/TALOS/prerelease.php
And those have a CPU-wide ring 0 escalation bug: https://www.blackhat.com/docs/us-15/materials/us-15-Domas-Th...
i7 just says you're going to get the top of the performance (and price) list for a desktop/mobile processor.
Just like Intel had the P4 hole that it had to drag its way out of, so now AMD has had Bulldozer. Notice how Intel has been quite conservative with each individual tick/tock, trying to keep their pipeline full. Doing crazy changes risks causing a pipeline stall which could last years. Each new architecture is risky, and AMD screwed up with Bulldozer. From early signs it looks like Zen is a winner, hopefully AMD can stick with it for a while.
Mankind Divided's recommended specs are FX-8350 or i7 3770. The price difference between the two in their heyday was $100 in AMD's favor.
It took about the same amount of time for Intel to release the Core and Core 2 architectures after realising they had made a huge mistake with the Pentium 4.
I've heard some people theorising that Intel might have worked their current architecture into a corner and they might have problems innovating out of it. I guess we will see when information about Zen's performance shows up.
Edit: As the day goes on it seems like they're trading back over $7, so obviously the market disagrees with me :) They were trading around $10 before losing profitability, and in the $20-$40 range during their mid-2000s heyday, so maybe the market is expecting performance closer to that. I think that is optimistic still, but again who knows.
Obliquely, also remember that while Nvidia is winning on the "pure" GPU front with the awesome 1080 etc, the major problem for Nvidia is that its tech needs a host processor, and its ARM attempts are going nowhere (Nintendo NX notwithstanding). AMD does not face this problem. It's becoming clearer that pure-parallel is not always optimal. Hybrid GPU/CPU architectures have a lot of upside as we are seeing with the Xeon Phi use cases, which smackdown on Nvidia bigtime as soon as you mix even the slightest bit of dependency in your algorithms.
I am very bullish on AMD. I believe its stock has double potential, because the price is so catastrophically pessimistic already. And without even talking market valuations, I think we have had enough of monopoly-style price gouging on Xeon and Tesla.
Whatever is happening in the company now is what you will be seeing in 2021 or 2023. Whether they will make it depends on how well the managing team handles that long lead time - for their leaders to give the engineers and product people as much time as possible to keep the company alive until each product comes into being.
No they're not, their Q2 financials had them in the black.
Probably, but I still think they would do far better with an owner like Qualcomm (granted Qualcomm would still have enough money in the bank to actually do something interesting with AMD after the acquisition).
1. https://www.amazon.com/Computer-Architecture-Fifth-Quantitat...
2. https://www.amazon.com/Inside-Machine-Introduction-Microproc...
Albeit, slightly older and very technical.
Charles Petzold's "Code: The Hidden Language of Computer Hardware and Software" is also well regarded, but is aimed more at non-technical people.
My main system is still running an i7-2600 from over 5 years ago. That GTX 680 I have in there is still plenty fast. The upgrade question is: how pretty do I want Star Citizen to be?
It is still completely fine for everything I ask of it, even against the much newer machine at work. With the upgrade to an SSD a while back it basically felt like a new machine.
Not necessarily. Bandwidth to main memory is already typically several times less than to L1 or L2 caches. If processor caches are not encrypted, then it seems conceivable that you could have some dedicated encryption/decryption silicon and it probably wouldn't even have to be as fast as it would need to be for general purpose use (like the Intel AES instructions).
Even if it does cost bandwidth or latency to main memory, if it's by a small enough amount it could still be a worthwhile tradeoff for some applications.
Definitely agree. I know we like our security systems impervious to anyone and everyone, be they script kiddie or the entire NSA. However if you hand your machine over to someone else to run, (in either VM guest or box form), I think you need to acknowledge that you are incontrovertibly accepting slight vulnerability to the colo/VM host.
In theory you can't actually do it at all. The key is inside the chip and the attacker has physical control over the chip, so an attacker with enough resources is going to be able to extract the key. You have no hope against a state-level attacker or even many university research departments. The assumption seems to be that the attacker won't be that sophisticated.
The problem is there are also likely to be attacks which won't require significant resources once published. Researchers are always coming up with new ways to extract keys from "tamper proof hardware" using timing or power consumption or whatever else. Some future version of the hardware will protect against that specific attack but that's too late for all the secrets you trusted to the current version.
I've always wondered why Intel does not buy AMD... is it just because of an expected anti-trust lawsuit? With Nvidia, ARM, IBM, it does not seem that a monopoly in x86 would dominate ALL architectures, but perhaps it would not be viewed that way.
Disclaimer: I don't know sh*t about hardware design as you can probably guess from my posting. ;o)
Same situation with Zen, if they're going to include even a Polaris it would be highly memory constrained if it had to hit system RAM all the time, so another fat chunk of memory on-die will be necessary to not starve it and keep latency down (as it stands the RX 480 can pump 256GB/s).
Could the CPU also profit from the HBM memory? The bandwidth is much better than with DDR4 main memory (even if it is 2 or 4 channels), and I would guess the latency as well because it would be on the same die?
CPUs are designed in a pipelined manner -- once the high-level team finishes up their work and passes it on to the low-level guys, they immediately start working on the next version, with the first version still years away from release. The total cycle from drawing board to store shelves is >5 years, and the skill sets required of high-level designers are very different than those required to turn it into silicon, so this just allows them all to have something to work on.
The long lag time is why chip companies can have disasters like P4 or Bulldozer, and they last so long. When some of the basic assumptions are wrong, the designers won't actually find out until after the design hits silicon, at which point the next two versions are already pretty much locked.
UPDATE - Oh mutagen's point about SSD is absolutely spot on... the faster the storage the better _first_... then worry about the rest of that stuff I mentioned.
It depends what you mean by "userland". The purpose of SEV is to allow a guest VM (using hardware virtualization) to run without trusting the host, including remote attestation. Traditionally hardware virtualization is used to run a full operating system which was installed at the impetus of the user, but there is no rule that it can only be used for that. If this feature is enabled on desktop parts, it's equally possible for black box DRM software running, say, on a non-virtualized Windows system, to include a small unikernel and automatically set it up to run in SEV mode. The whitepaper proposes that people running VMs in the cloud use remote attestation to upload disk encryption keys such that the VM can only decrypt the disk if it hasn't been tampered with, but the 'cloudiness' could just as well go the other way: cloud DRM servers sending decryption keys, for both video and perhaps the code itself, to enclaves on desktop PCs.
Using SEV alone for DRM would have a significant limitation compared to using the PSP: since all interaction with the outside world is still through the host, it would be hard to prevent the host from grabbing the raw decrypted video data as it leaves. But this still prevents recovering the original bitstream; allows 'perfect' obfuscation of many facets of how exactly the code works; and could probably be used in combination with the PSP in some manner. And in some DRM applications, the ability to grab the output may not matter. Imagine a video game where the bulk of the game was inside an enclave, preventing piracy but also all reverse engineering and modding.
Of course, a video service or video game that only runs on AMD CPUs won't get very popular... but conveniently, Intel is coming out with their own feature, SGX, that provides similar capabilities, though with a different design (it's designed more directly for the DRM use case). One might imagine that eventually most systems will have CPUs that support one or the other.
Wasn't this path already paved earlier by Microsoft, when Hollywood wanted to guarantee no unencrypted HD video leaves the PC? It might have weaknesses but the principle is already established.
A secure crypto path from black box VMs to smart TVs also leaves the door open for all kinds of nasty scenarios involving TV pwnage. You also will have no way of decrypting the data that the VM exfiltrates from your PC.
That is a big fucking disadvantage.
If I buy software (buying a license also counts, if you’re in the EU, or renting a license), I want to be able to use it like I’d use a table I buy: I can saw one leg off, repaint it, turn it into a chair. I want to be able to mod the game, skin it, theme it, do a total conversion.
This is preventing me from using my rights.
If you're talking about the PSP or ME then I agree, they are dangerous and the inability to gain any insight into what they do means they should be considered hostile entities (especially if they may have access to the internal CPU memory where the encryption keys are stored).
If an event is set to happen in an hour, you can be fairly certain it will happen, so it should be strongly 'priced in'. If it's happening months or years out, a lot could happen in the meantime, thus it's not 'fully' priced in. It's not a discrete event, but rather some sort of curve (perhaps sigmoid?) that depends on perceived risk.
IBM does something similar (though not for graphics) in recent POWER CPUs with the Centaur memory controller(s). They are off-chip memory controllers with a bunch of eDRAM to act as an L4 cache (though the difference here is each system has multiple Centaur controllers to handle different DIMM slots). They're able to burst to ~96GB/sec to system memory using this; having a good amount of on-package HBM would probably yield similar gains :)
That's basically the point of Zen, Bulldozer was an architectural dead-end that wasn't going anywhere.
Besides, it's not like Intel have massively innovated since Sandybridge. Ivy, Haswell, Broadwell and Skylake are little more than successive perfections of the Sandybridge architecture.
It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports (4 ALU, 2 AGU, 2 FP ADD, 2 FP MUL). Sandybridge had 6, Haswell and later have 8 execution ports. Bulldozer had 4 integer execution ports plus 2 float ports, which are shared between each pair of cores.
The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.
The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.
What is maybe more telling here is the 16-byte load/stores, Haswell is doing 32-byte at the same rate. It points to Zen abandoning FP bandwidth in both client and server. Perhaps they want to rely on GPGPU with the on-chip GPU to do compute workloads?
> The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.
Depends what they mean by Scheduler. If it means reservation stations for micro-ops, then that's already the case in other micro-architectures. If Scheduler means assigning micro-ops per port, then there can logically only be a single one.
The FP contention between the cores in a Bulldozer module makes all recent AMD chips perform objectively worse in most benchmarks than their peers from Intel.
Intel's architecture isn't a priori a goal to achieve. Intel's performance in real-world workloads is a good goal.
There are some heavily-threaded, integer-heavy workloads that Bulldozer and related parts are still incredibly competitive at, even compared to current-gen Intel parts. For the right workload, a Bulldozer-family processor can be a real screamer and they are priced incredibly aggressively. We should recognize, though, that the architecture is high performance only for these specific workloads.
Perhaps AMD should have pursued more innovative architectures. I am not saying that Intel's is perfect. But it is important to note that for current general purpose computing workloads, Intel's architecture is superior to Bulldozer.
I see the evidence of that happening with AMD GPUs versus Nvidia, but versus Intel processors?
Can you explain?
Now that newer software is using more threads, the old FX gets a big performance boost while the old i3 is only the same speed it ever was.
4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.
The text is speculation by the journalist. It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512-bit AVX instructions that Intel don't even support yet.
Even if AMD are splitting the 256-bit FPUs in half, that is still a huge win over average code, because 128-bit SSE instructions are much more common than AVX instructions, and AMD can execute up to four of them per cycle.
Even Intel disable the upper half of their FPU most of the time to save power; AVX instructions get split into two 128-bit micro-ops until a threshold is encountered and the upper half powers up.
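A back-of-the-envelope sketch of the lane arithmetic behind this argument. The issue rates are the peak figures claimed in this thread (four 128-bit ops per cycle, or the four ports fused into two or one 256-bit op), not measured numbers. The helper name `lanes_per_cycle` is made up for illustration, and the sketch ignores that the ports are split between ADD and MUL.

```python
def lanes_per_cycle(ops_per_cycle, op_width_bits, element_bits=32):
    """Peak number of elements processed per cycle for a given SIMD width."""
    return ops_per_cycle * (op_width_bits // element_bits)

# Four independent 128-bit SSE ops per cycle (single-precision floats).
four_sse = lanes_per_cycle(4, 128)
# Ports fused pairwise: two 256-bit AVX ops per cycle.
two_avx = lanes_per_cycle(2, 256)
# All four ports fused: one 256-bit AVX op per cycle.
one_avx = lanes_per_cycle(1, 256)

print(four_sse, two_avx, one_avx)  # 16 16 8
```

Under these assumptions, 128-bit code loses nothing from split FPUs, while 256-bit code only falls behind if all four ports really have to be fused for one AVX op.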
> If Scheduler means assigning micro-ops per port, then there can logically only be a single one.
I assume that means one Re-order buffer per port. Bulldozer already had two Re-order buffers, one for float instructions and one for integer instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.
I'm guessing that tracking dependencies across 7 schedulers is not much harder than tracking dependencies across 2.
http://media.bestofmicro.com/Q/0/378072/original/AS-SSD_Sequ...
Seriously, invest in NVMe if you are a video producer (I can't imagine processing my 4K movies on a SATA SSD or HDD; even 24fps playback on a SATA SSD can't happen in RAW format as it needs >1MB/s) or you do some heavy I/O server stuff. If you don't do any of the above, invest your $ in capacity instead, i.e. given 512GB NVMe vs 1TB M.2 SATA I'd go with the 1TB one.
As if that would work today – look at No Man’s Sky, [Alt]-[Tab] already doesn’t work.
Wait 2 years, and we’ll see exactly this. Already today DRM is often implemented as kernel modules, and the OS – especially on Windows – prevents debugging for normal users.
Actually, much more than that. 3840 * 2160 * 3 bytes * 24 Hz = 569.53125 mebibytes / second
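The figure above can be reproduced in a few lines. This assumes uncompressed 8-bit RGB (3 bytes per pixel), as in the calculation; the function name is just for illustration.

```python
def raw_video_rate_mib_per_s(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video in mebibytes per second."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second / 2**20

rate = raw_video_rate_mib_per_s(3840, 2160, 3, 24)
print(rate)  # 569.53125
```

That is already about the ceiling of a SATA III link (6 Gbit/s line rate), before any higher bit depth or frame rate.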