CPU Bugs (2018)(danluu.com) |
(The code below is likely to have bugs of its own - I wrote it from memory as an illustration of the CPU bug - and thanks to 'tlb' for catching an error in my first draft. I also left out the question of what data segment the various MOV instructions use for their memory references, as it isn't relevant to this CPU bug.)
If you needed to work in a different stack from the one you were currently running on, you might do something like this:
mov saveSP, sp
mov sp, mySP
...
mov sp, saveSP
This saves the original SP (Stack Pointer) register, loads it with your private value, and then restores SP when you are done.
Suppose you wanted to switch not only to your own stack pointer but also your own stack segment. With 16-bit registers you could only address 64KB at a time, and you would need to change a segment register to access memory outside that range.
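To make the segment arithmetic concrete, here is a rough sketch of how a real-mode 8086/8088 turns a segment:offset pair into a 20-bit physical address. The function name and the SS/SP values are made up for illustration; they are not from the example above.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086/8088 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF


# Hypothetical register values, chosen only for illustration.
old_ss, old_sp = 0x2000, 0x0100   # the stack you were running on
my_ss, my_sp = 0x5000, 0xFFFE     # your private stack in another segment

print(hex(physical_address(old_ss, old_sp)))  # 0x20100
print(hex(physical_address(my_ss, my_sp)))    # 0x5fffe
```

The offset alone only spans 64KB, so reaching the second address from the first means changing SS as well as SP.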
So you would save, change, and restore both the SS (Stack Segment) and SP registers:
mov saveSS, ss
mov saveSP, sp
mov ss, mySS
mov sp, mySP
...
mov ss, saveSS
mov sp, saveSP
Now imagine that an interrupt triggered in between one of the changes to SS and the matching change to SP. The interrupt code would now be running on the new stack segment but the old stack pointer, corrupting memory and crashing.
Not to worry! Intel had your back. The documentation promised that after a MOV SS or POP SS, interrupts would automatically be disabled until the next instruction (the matching MOV SP or POP SP) completed.
But they kinda forgot to implement that feature. So if you followed the docs, you would have these very rare and intermittent crash bugs.
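Here is a small simulation of that failure, as a sketch only: it models an interrupt landing between the two MOVs and reports where the CPU would push its three-word frame (FLAGS, CS, IP). The function names and register values are hypothetical, picked just to show the mismatch.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086/8088 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF


def interrupt_frame_addresses(ss: int, sp: int) -> list[int]:
    """Addresses of the three words (FLAGS, CS, IP) an interrupt pushes at SS:SP."""
    addrs = []
    for _ in range(3):
        sp = (sp - 2) & 0xFFFF          # a push decrements SP by 2, then writes
        addrs.append(physical_address(ss, sp))
    return addrs


def switch_stack(ss, sp, new_ss, new_sp, interrupt_between):
    """Model 'mov ss, mySS / mov sp, mySP'; return where a mid-switch interrupt writes."""
    ss = new_ss                          # mov ss, mySS
    frame = interrupt_frame_addresses(ss, sp) if interrupt_between else []
    sp = new_sp                          # mov sp, mySP
    return frame


# Hypothetical values: old stack at 2000:0100, private stack at 5000:FFFE.
hit = switch_stack(0x2000, 0x0100, 0x5000, 0xFFFE, interrupt_between=True)
print([hex(a) for a in hit])  # ['0x500fe', '0x500fc', '0x500fa']
```

With these values the interrupt writes near offset 0x0100 of the new segment, nowhere near the private stack at the top of it, so it tramples whatever happens to live there.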
Word got around fairly soon, and the fix was simple enough: disable interrupts yourself around the paired instructions:
mov saveSS, ss
mov saveSP, sp
cli
mov ss, mySS
mov sp, mySP
sti
...
cli
mov ss, saveSS
mov sp, saveSP
sti
This still left you unprotected against NMI (Non-Maskable Interrupt), but by the time most of us built NMI switches for our IBM PCs, we'd also upgraded to newer CPUs with this bug fixed. It was only the earliest 8088s (and perhaps 8086s) that had the bug.
push sp
mov sp, myPrivateSP
...
pop sp
work? Isn't it popping from the private stack, while it was pushed on the regular stack?
Updated now, hopefully this will be a more plausible example. Let me know if you spot something else! :-)
A most prescient remark in 2014.
Here's where they are more recently:
https://www.zdnet.com/article/intel-fixed-236-bugs-in-2019-a...
https://www.techradar.com/news/latest-intel-cpus-have-imposs...
How does that work for Apple's M1?
This ignores the fact that there can be security exploits.
That testing is a cost is a given. But it's a known cost compared to what a huge batch of faulty CPUs can cost. Or how about a ruined reputation? How do you even know what that could cost you?
I suppose Intel already uses a lot of automated testing, but given all the bugs since the change, it seems that is not enough.
Those have microcode that is more extensive than traditional CPUs though.
The various spec-ex workarounds actually matter more on things like cloud servers than they do on dedicated/controlled hardware.
It's a long list for some CPUs, e.g. Sandy Bridge was released in 2011 and got its most recent microcode update in 2020.