OpenBSD disables Intel's hyperthreading due to security concerns (mail-archive.com)
This suggests a long-term compromise solution where threads within a process can use hyperthreading to share a core, but threads in different processes can't. Given that hyperthreads share L1 cache, this might also be better for performance.
Intuitively this may sound logical, but in practice it's often not the case. For many workloads, putting two threads of the same program on a core ends up being worse than co-locating threads from different programs. The reason is that two threads of the same program will often end up executing similar instruction streams and therefore compete for the same resources; a really good example is when both are using vector instructions, since these registers are shared between the two hyperthreads.
SMT/hyperthreading is complicated. If you have a workload dominated by non-local DRAM fetches, it's a huge win because when the CPU pipeline is stalled on one thread it can still issue instructions from the other.
If you have a workload dominated by L1 cache bandwidth, the opposite is true because the threads compete for the same resource.
On balance, on typical workloads, it's a win. But there are real-world problems for which turning it off is a legitimate performance choice.
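To make the stall-vs-bandwidth argument concrete, here is a toy cycle-count model (an illustrative sketch under simplifying assumptions: one issue slot per cycle, fixed-length DRAM stalls, and no cache contention at all). It captures only the latency-hiding half of the argument; because it doesn't model L1 sharing, it can show SMT gaining nothing but not the outright slowdown described above.

```python
def core_utilization(num_threads: int, compute: int, stall: int,
                     cycles: int = 10_000) -> float:
    """Fraction of cycles in which one core issues an instruction.

    Toy model (an assumption, not real hardware): every thread issues
    `compute` instructions back to back, then waits `stall` cycles for a
    DRAM fetch. The core issues at most one instruction per cycle, taken
    from the first thread that is not stalled.
    """
    burst = [compute] * num_threads  # instructions left before the next fetch
    wait = [0] * num_threads         # cycles left until the fetch returns
    busy = 0
    for _ in range(cycles):
        issued = False
        for t in range(num_threads):
            if wait[t] > 0:
                wait[t] -= 1         # stalled threads wait in parallel
            elif not issued:
                issued = True
                burst[t] -= 1
                if burst[t] == 0:    # burst done: start the next memory fetch
                    burst[t] = compute
                    wait[t] = stall
        busy += issued
    return busy / cycles


# Memory-bound: 10 cycles of work per 90-cycle DRAM stall.
print(core_utilization(1, 10, 90), core_utilization(2, 10, 90))  # 0.1 0.2
# Compute-bound: no stalls, so a second thread has no gaps to fill.
print(core_utilization(1, 10, 0), core_utilization(2, 10, 0))    # 1.0 1.0
```

In the memory-bound case the second hyperthread doubles the work done per cycle; in the compute-bound case it adds nothing.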
Why is that bad?
Browsers are particularly problematic, and it would be nice to alert the scheduler that a particular process is untrusted and extra care should be taken to sanitize caches before and after its time slice.
If a userspace thread writes something into a buffer and then makes a syscall that initiates asynchronous work in the kernel, wouldn't it be better for the kernel thread to run on the same core instead of shuffling the data into another cache?
I'm not an OpenBSD user (and glad for it, if this is anything to go by), but I'm curious - is this really how they operate, or does this decision stand out?
I've also seen Erlang workloads where you could get a bit of a throughput increase by having the VM schedule more threads than you have physical cores (so starting to use HT), but latency would spike and become very unpredictable, which was a bad tradeoff for the use case.
When you have four, six, eight or more cores, there's less value in doubling that number. The gain is lower.
I don't have hard numbers to back this up; it's purely my personal experience/recollection. On my 2-socket P4 Xeon box, I disabled HT. On my current i7 6-core box, I have HT on.
"We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial."
https://cvsweb.openbsd.org/cgi-bin/cvsweb/
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/amd64...
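To illustrate what "not running different security domains on threads of the same core" means for a scheduler, here is a deliberately naive, hypothetical policy (not OpenBSD's scheduler, and ignoring priorities, fairness and migration cost). Each core's two SMT siblings are filled from one domain only; a leftover task gets a core to itself and its sibling idles, which is exactly the capacity cost that makes this "far from trivial" to do well.

```python
from collections import defaultdict
from typing import Optional

Task = tuple[str, str]  # (task name, security domain), e.g. (thread, owning process)


def assign_cores(tasks: list[Task],
                 num_cores: int) -> list[tuple[Optional[str], Optional[str]]]:
    """Fill each core's two SMT siblings from ONE security domain only.

    Toy policy (hypothetical): a leftover task from a domain with an odd
    count runs alone and its sibling stays idle. Tasks that don't fit
    this round simply wait for the next one.
    """
    by_domain: dict[str, list[str]] = defaultdict(list)
    for name, domain in tasks:
        by_domain[domain].append(name)

    # Split each domain into sibling-sized chunks of one or two tasks.
    chunks: list[list[str]] = []
    for names in by_domain.values():
        chunks.extend(names[i:i + 2] for i in range(0, len(names), 2))
    # Prefer full pairs so fewer sibling threads sit idle (the sort is stable).
    chunks.sort(key=len, reverse=True)

    cores = [(c[0], c[1] if len(c) == 2 else None) for c in chunks[:num_cores]]
    cores += [(None, None)] * (num_cores - len(cores))
    return cores


tasks = [("firefox-1", "browser"), ("sshd", "system"),
         ("firefox-2", "browser"), ("make", "user")]
print(assign_cores(tasks, 2))  # [('firefox-1', 'firefox-2'), ('sshd', None)]
```

Note that with two cores, "make" doesn't run this round even though a sibling thread is idle; a plain SMT-oblivious scheduler would have used that slot.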
I've encountered some cases where SMT made performance worse, such as with very optimized HPC libs, but in general SMT can really help. Compiling projects got a nice boost when enabling HT on Intel's recent architectures, for example (all of this on Linux though; last time I checked, OpenBSD's SMP performance was abysmal).
Many OpenBSD devs are security researchers in academia. If they hear whispers over beers that there are new Spectre attacks coming that exploit this or that, they might not be able to reproduce the exploit without putting a lot of work into it (it's research after all), but they might be able to prevent it by making a simple change, like disabling hyperthreading.
OpenBSD cares more about security than basically any other trade-off in OS design (performance, usability, ...), so it makes sense to me that they went this way. If you want a balance of security and performance, OpenBSD is not for you anyway.
For a system aiming at security, it's a completely valid choice to disable things that start to look questionable, even if it's not conclusively proven yet. Just like potential software vulnerabilities are patched even if nobody has demonstrated that they actually are exploitable yet.
OP (and environs) has names on it that I have seen before and respect as knowing what the hell they are on about.
I'm not an OpenBSD user either; I use FreeBSD whenever possible. However, from listening to OpenBSD devs via blogs, conferences, HN, etc., it seems that OpenBSD is an operating system built mainly for OpenBSD developers, and their goals support this[1]. OpenBSD being useful for non-OpenBSD developers is more of a secondary goal compared to how FreeBSD, Linux or any other OS handles it. Also, OpenBSD is much more of a research operating system than other large, successful OSes (Linux, Windows, macOS, FreeBSD, etc.). That means OpenBSD cares far more about developing features and novel security mitigations than about maintaining backwards compatibility like other operating systems do.
> So... they "strongly suspect" (but don't know and haven't shown) there may be a Spectre-class bug enabled by current HT implementations and improving their scheduler is hard, so they'll pre-emptively disable HT outright on Intel CPUs now and others in the near future?
The OpenBSD devs strongly suspected another Intel hardware bug a week or two ago, implemented a mitigation and deployed it. Turns out they were right[2].
[1]: https://www.openbsd.org/goals.html
[2]: https://www.bleepingcomputer.com/news/security/new-lazy-fp-s...
This is not the feeling I get from OpenBSD at all. They don't act like a research project. They aren't keen on implementing new features just for the sake of it, or just to try them out. A better description would be that they put correctness, security and maintainability first, and simplicity often comes as a nice side effect. Deprecating old, unused features is just a consequence of striving to decrease complexity by trimming your code base. OpenBSD is one of the few OSes where the number of lines of code is not skyrocketing to unmanageable numbers.
Honestly, I would say that this is true of many open source projects. It's one of the reasons that open source development tools are so good on Linux, but end user applications fall so far behind. It's also why documentation and usability tend to be much worse. When your system is based on volunteering, the work that gets done tends to be the stuff that interests the workers.
I don't see it that way at all. Whenever I have to work on a project where security is top concern, I always look at OpenBSD as an option. In the Linux world, the equivalent would be the Openwall GNU/*/Linux project. Not something for an average user, but to say it's used by its devs mainly is off by an order of magnitude.
In fairness, my impression from the video of Theo's presentation was that they were tipped off by someone under embargo.
OpenBSD is a research operating system and security is a core component of their research. Proactively mitigating security risks before exploits appear is one way to improve security that has worked in the past: some vulnerabilities were already fixed in OpenBSD before they became known.
Because they give reasonable deadlines for companies to fix security bugs (~90 days), they are kept out of the loop by hardware vendors like Intel, who requested 1 year to fix Meltdown.
Being in the dark, if you see some suspicious behavior, either you protect yourself from it, or you might wake up the next day and Intel will have released a new "We are sorry" post and your users would be screwed.
So this is pretty much how they operate, and if an OS is as security conscious as OpenBSD, there isn't really a different way to operate.
Note that disabling hyperthreading to mitigate CPU flaws isn't anything new either: it had to be done for AMD's Ryzen last year because of hardware bugs anyway - https://www.extremetech.com/computing/254750-amd-replaces-ry...
That's really what I was getting at with my question. There's no such thing as absolute security, it is a set of tradeoffs between usability, performance and specific security guarantees. Is there a point where the OpenBSD developers would say "okay, this is a (potential or confirmed) security bug, but the mitigation is just too costly in this case"?
In the post-Spectre world, it's not completely inconceivable to contemplate the possibility that, in order to retain the security guarantees most people thought they had, one might have to give up a substantial subset of the benefits of speculative execution in out of order processors. For some workloads that might mean up to two orders of magnitude in performance. I know roughly where the common operating systems would draw the line and I certainly know where I would, for my own usecases. I'm just curious about how OpenBSD works in this regard.
Is 90 days really a reasonable timeframe to fix something like Meltdown? I agree with your whole comment in general, but hardware/microcode issues at Intel's scale are a different beast than some buffer overflow.
...yet if you interact with Unix systems, you most likely rely on OpenSSH.
Why would relying on a feature from a vendor with known processor security issues, including undisclosed hidden application processors, be the preferable stance, for a feature that gives a marginal performance improvement and in some cases a degradation?
At best, ambivalence towards this decision would be the position to take, especially given the very recent 'oh hey, FPU registers are also a problem' "discovery", which they were entirely correct about...
Some security professionals seem to insist on having a proven exploit before they act. Doesn't that seem like poor decision-making? Their job is to provide security, not to secure proven exploits - the latter is a means to an end. If there are threats from unknown exploits, and there certainly are, then it seems that they need techniques to address unknown exploits. One of those techniques is expert analysis of potential threats.
CPU bugs seem to be a rich vein to mine at the moment.
Spectre is about a) leaving side-effects of misspeculation in shared resources, and b) bandwidth contention (between a misspeculated instruction stream and an attacker) to shared resources.
It is trivially obvious that HT exacerbates Spectre-class bugs, as the entire raison d'être to HT is to share pipeline resources. How quickly information can be leaked can be up for debate, but it's definitely non-zero.
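The "shared resources leak" point can be illustrated with a deliberately simplified, noise-free model of a prime+probe cache side channel (a teaching sketch, not an exploit; real attacks infer hits and misses from load timing, and real caches are set-associative). An attacker thread fills a shared cache, its SMT sibling makes one secret-dependent access, and the attacker sees which of its own lines were evicted.

```python
from typing import Callable, Optional


class SharedCache:
    """Toy direct-mapped cache shared by two SMT siblings.

    Simplifications (assumptions): one line per set, no noise, and hits or
    misses are reported directly instead of being inferred from timing.
    """

    def __init__(self, num_sets: int) -> None:
        self.lines: list[Optional[tuple[str, int]]] = [None] * num_sets

    def access(self, owner: str, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line (evicting its owner)."""
        s = address % len(self.lines)
        hit = self.lines[s] == (owner, address)
        self.lines[s] = (owner, address)
        return hit


def victim(cache: SharedCache, secret: int) -> None:
    # The address depends on the secret, e.g. a table lookup indexed by a key byte.
    cache.access("victim", 1000 + secret)


def prime_and_probe(cache: SharedCache, run_victim: Callable[[], None]) -> list[int]:
    n = len(cache.lines)
    for a in range(n):                 # prime: fill every set with attacker lines
        cache.access("attacker", a)
    run_victim()                       # the sibling thread runs in between
    # probe: sets where the attacker now misses were touched by the victim
    return [a for a in range(n) if not cache.access("attacker", a)]


cache = SharedCache(16)
touched = prime_and_probe(cache, lambda: victim(cache, 5))
print(touched, (touched[0] - 1000) % 16)   # [13] 5
```

The attacker never reads the victim's data; the secret leaks purely through which shared cache set changed state.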
That would describe it if they...disabled it outright.
But they made HT user configurable, just like any other performance tuning knob.
https://www.blackhat.com/us-18/briefings/schedule/#tlbleed-w...
Compilation can get a boost because while some threads are waiting for I/O, others are crunching source files. Also, the variety of computation is high enough that multiple threads don't overlap too much in functional unit usage. If you try building from an in-memory filesystem, you'll find a far less impressive speedup (if any).
That seems to indicate it is enabled, since the old option was 'browser.tabs.remote'? Or has it changed to something else now?
You'll find out in a couple of months why they did this. I hope you'll remember the comments you've written when that happens, because you might learn something about what OpenBSD means when they say they "strongly suspect" something.
On the latter point, I did take "strongly suspect" at face value rather than code for "we know for sure, just can't disclose it all yet" because I'm not familiar with the development culture of OpenBSD.
If you want performance above all else, OpenBSD is not for you.
https://lobste.rs/s/ifr52b/openbsd_disables_intel_s_hyperthr...
As has been previously noted, Meltdown, Spectre and related exploits came about because nobody ever thought it would be possible to access the cache from other processes. 1 year is probably quite reasonable for Intel to redesign the architecture (in fact, is probably going to stretch to 2 years or more), but people need security fixes now. In this case, it looks like OpenBSD are taking the right approach.
Makes sense. 90 days is more than enough.
So the idea with SMT is that most of the time, many of the execution units are unused because the thread a) isn't using them at all (e.g. an encryption process won't use the floating point units) and/or b) can't use them all because of how the program's written (for example, if I say 'load a random memory address, then add it to a register, then load another random memory address, then add it, etc.', I'm going to spend most of my time waiting for memory to be loaded).
SMT basically means that you run another program at the same time, so even if the encryption process can't use the floating point units, maybe there's another process that we can schedule that will.
However, imagine my encryption process can use 6 of the 8 arithmetic units. If I have 2 encryption processes scheduled on the same core, I have demand for 12 when there are only 8. So now I have contention for resources, and I won't see a speedup from using SMT.
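The execution-unit argument above can be written as a tiny contention model (a sketch under stated assumptions: units of one kind are interchangeable, demand is a fixed number of units per cycle, and the "alu"/"fpu" counts are hypothetical, not real Skylake numbers). It reports what share of the co-scheduled threads' demand the core can actually satisfy.

```python
def satisfied_fraction(demands: list[dict[str, int]], units: dict[str, int]) -> float:
    """Share of the co-scheduled threads' per-cycle demand the core can meet.

    Toy model (assumption): every unit of a kind is interchangeable, and each
    thread would use demands[kind] units per cycle if it had the core alone.
    """
    wanted = sum(sum(d.values()) for d in demands)
    granted = sum(min(sum(d.get(kind, 0) for d in demands), capacity)
                  for kind, capacity in units.items())
    return granted / wanted


units = {"alu": 8, "fpu": 4}      # hypothetical core
crypto = {"alu": 6}               # integer-heavy, no floating point
physics = {"alu": 1, "fpu": 3}    # mostly floating point

print(satisfied_fraction([crypto, crypto], units))   # 12 wanted, 8 granted
print(satisfied_fraction([crypto, physics], units))  # 10 wanted, 10 granted
```

In this model two encryption threads each run at about two-thirds of their solo speed (roughly 1.33x combined instead of 2x), while pairing encryption with the floating-point-heavy thread costs neither of them anything.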
Other comments mention registers and not execution units: I'm suspicious of this, since modern processors have many registers (for Skylake, 250+) which they remap between aggressively as part of pipelining. Maybe this is different for the SIMD units.
That said, I haven't looked at this stuff since university so could well be wrong on the execution unit vs register comparison.
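One way to reconcile "registers are shared" with "there are 250+ renamed registers" is a toy renamer (a simplified sketch; the class and method names are hypothetical). Each hardware thread keeps its own architectural-to-physical map, so the two threads never clobber each other's xmm0, but both allocate from one physical pool, so a thread with many instructions in flight can starve its sibling.

```python
from typing import Optional


class SharedRenamer:
    """Toy register renamer for a 2-way SMT core.

    Assumptions: each hardware thread has its own architectural->physical map,
    but both allocate from ONE pool of physical registers. release() stands in
    for instruction retirement freeing an old mapping.
    """

    def __init__(self, num_physical: int, num_threads: int = 2) -> None:
        self.free: list[int] = list(range(num_physical))
        self.maps: list[dict[str, int]] = [{} for _ in range(num_threads)]

    def rename(self, thread: int, arch_reg: str) -> Optional[int]:
        """Map a write to arch_reg onto a fresh physical register, or None (stall)."""
        if not self.free:
            return None
        phys = self.free.pop(0)
        self.maps[thread][arch_reg] = phys
        return phys

    def release(self, phys: int) -> None:
        self.free.append(phys)


r = SharedRenamer(num_physical=4)
print(r.rename(0, "xmm0"), r.rename(1, "xmm0"))  # 0 1: same name, no conflict
r.rename(0, "xmm1")
r.rename(0, "xmm2")                              # thread 0 fills the pool
print(r.rename(1, "xmm1"))                       # None: thread 1 must stall
```

So contention shows up as rename stalls on a shared pool rather than as two threads fighting over the same architectural register.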
Here's a simplified block diagram of a Skylake core: https://en.wikichip.org/wiki/intel/microarchitectures/skylak...
RISC will perform about the same and you can hyperthread one fine.
So I'm guessing GP meant there's going to be contention for those registers, and thus no speedup?
Hyperthreading can be a real mixed bag for performance, though generally good and a lot of engineering effort has gone in to making it shine. As ever it's strongly advisable that people benchmark real world conditions on a server, and it's worth giving a shot with hyperthreading turned on and off.
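For a quick on/off comparison on Linux without rebooting, one option is to pin the benchmark to one hardware thread per physical core. This is a sketch: it reads the sysfs topology files `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` and uses `os.sched_setaffinity` (both Linux-only), and the helper names are hypothetical.

```python
import glob
import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse Linux cpulist syntax such as "0,4" or "0-1,8-9"."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def one_cpu_per_core(sibling_lists: list[str]) -> set[int]:
    """Keep the lowest-numbered logical CPU from each group of SMT siblings."""
    return {min(parse_cpu_list(s)) for s in sibling_lists if s.strip()}


def pin_to_one_thread_per_core() -> set[int]:
    """Linux only: restrict this process (and children started afterwards)
    to one hardware thread per physical core, approximating "HT off"."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    cpus = one_cpu_per_core(lists)
    if not cpus:
        raise RuntimeError("no SMT topology found in sysfs")
    os.sched_setaffinity(0, cpus)  # 0 = the calling process
    return cpus
```

Calling `pin_to_one_thread_per_core()` at the start of a benchmark driver, then running the same workload without it, gives a rough comparison. It is only an approximation: the idle siblings still exist and other processes can use them, so a BIOS or OS-level SMT setting is the cleaner test.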
"Secure by default", turn knobs if you need more speed
"Fast by default", turn knobs if you need more security
Not that the knobs will be always available for each design decision, but sometimes they are there and you can turn them at your own risk. It probably would be wise to understand the consequences. Some people will prefer the peace of mind of knowing that safe defaults are in place if they don't change anything. Those will probably align with OpenBSD here. Some people believe that security is something you bolt on afterwards. Those definitely won't like OpenBSD design decisions.
They certainly set the defaults for some usecase, it just happens to be more security-biased than most. They don't ship an OS for an airgapped toaster, so it can't ever literally be "secure by default", it's just a compromise on the tradeoff scale that's more security-oriented than most. It still needs to be usable (for some set of people) and it still has to achieve some baseline level of performance to be usable - I was trying to get some clarity on the latter.
How often is that a polite way of saying "software that is inefficient"?
Also, to be fair: sometimes a DRAM fetch is just inherent in the problem. Big RAM-resident databases are all about DRAM latency because while sure, it's a lot slower than L1 cache, it's still faster than flash. I mean, memcached is a giant monument in praise of the pipeline stall, and it's hugely successful.
Indeed. It is arguably rational for Intel to take on the burden in a centralised place rather than expecting every two-bit software shop to do it.
But then the existence of this kind of security issue shows that the added complexity is not always worthwhile. We might be forced to accept that computers which actually behave well are a little bit slower than we thought. But in return they will be simpler and more amenable to software optimisation.
Trees or hashmaps, which use non-local DRAM fetches, can be more efficient than a brute-force linear search through a contiguous array given a sufficiently large number of elements.
At the same time, contiguous arrays can be significantly more efficient than linked lists, which use non-local DRAM fetches.
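Where the crossover falls can be estimated with a back-of-the-envelope cost model. This is a sketch: the per-element and per-miss costs below are placeholder numbers chosen for illustration, not measurements, and real lookups also pay hashing or comparison work that the model ignores.

```python
def scan_ns(n: int, ns_per_element: float) -> float:
    """Average cost of a linear search: touch half the contiguous array."""
    return n / 2 * ns_per_element


def crossover(ns_per_element: float, misses: int, ns_per_miss: float) -> int:
    """Largest n for which the average linear scan is no slower than a lookup
    that pays `misses` non-local DRAM fetches (hash bucket, tree or list node)."""
    return int(2 * misses * ns_per_miss / ns_per_element)


# Placeholder costs, NOT measurements: ~100 ns per DRAM miss and ~0.5 ns per
# element streamed sequentially with hardware prefetching.
print(crossover(0.5, misses=1, ns_per_miss=100))  # 400
print(crossover(0.5, misses=3, ns_per_miss=100))  # 1200
```

A linked list is the worst case of the same arithmetic: a traversal can pay one miss per node instead of one per lookup.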
One could also say "software written with strong OOP patterns" because those are almost always written to benefit the developer later, rather than the CPU and RAM at runtime.
To take an extreme example, traversing graphs is notorious. Cray and Sun, IIRC, have some fascinating processors with many, many hardware threads, because all the programs do is wait on DRAM, but luckily there are lots of searches that can be done in parallel.
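The reason such designs need so many threads per core falls out of a Little's-law style estimate (an idealized sketch: fixed work per load, fixed DRAM latency, perfect switching between threads; the cycle counts are illustrative assumptions).

```python
import math


def threads_to_hide_latency(work_cycles: int, latency_cycles: int) -> int:
    """Hardware threads needed so a core never idles, if each thread does
    `work_cycles` of work and then waits `latency_cycles` on DRAM."""
    return math.ceil((work_cycles + latency_cycles) / work_cycles)


def utilization(threads: int, work_cycles: int, latency_cycles: int) -> float:
    """Fraction of cycles kept busy with `threads` threads in the same model."""
    return min(1.0, threads * work_cycles / (work_cycles + latency_cycles))


# Pointer chasing in a graph: a few cycles of work per long DRAM fetch.
print(threads_to_hide_latency(5, 300))  # 61
print(utilization(2, 5, 300))           # two hyperthreads barely help
```

With only a handful of cycles of work per fetch, two threads per core leave it idle most of the time; you need dozens of threads to cover the latency.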
Conversely: finding a task that is L1-cache-bound but does not frequently have to stall for memory is much harder. The only ones off the top of my head are streaming tasks like software video decode.
One task that is L1 cache bound and does not frequently stall for memory (if you code it up well) is matrix multiply.
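"If you code it up well" usually means tiling (blocking) the loops so small blocks of all three matrices stay in cache while they are reused. Below is a sketch of the loop structure in Python; pure Python won't show the cache speedup, and `block=32` is an arbitrary placeholder that would be tuned per CPU in a compiled implementation.

```python
def matmul_blocked(a: list[list[float]], b: list[list[float]],
                   block: int = 32) -> list[list[float]]:
    """C = A @ B computed tile by tile.

    Each (ii, kk, jj) step works on block x block tiles of A, B and C; in a
    compiled language, choosing `block` so the three tiles fit in L1 keeps the
    inner loops hitting cache instead of DRAM.
    """
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, inner, block):
            for jj in range(0, m, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, inner)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ik * row_b[j]
    return c


print(matmul_blocked([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))
# [[19.0, 22.0], [43.0, 50.0]]
```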
Just recently I noticed that when Firefox loads multiple tabs of the same wordpress site, it starts hanging not unlike Firefox always used to hang. That's likely because it groups all of those same site pages under one process.
I've never experienced that with Chrome. This is why I hope Firefox eventually (ASAP) switches to one process per tab, too. I can handle the browser using an extra GB of RAM. I can't handle it hanging on me and frustrating me.
Instead of pushing for 30-40% lower memory use than Chrome, I say they should push for 10% lower memory use with the same sandboxed process-per-tab model.
http://dev.chromium.org/developers/design-documents/process-...
FWIW I do not have the problem you describe and I don't want Firefox wasting any more of my scarce memory, or for that matter, CPU.
Not really. Chrome uses a lot of processes for isolation. Firefox uses about four so it can take advantage of multiple cores.
Right, but I doubt that it's for exactly the reason you think it is: Chrome doesn't blindly do "one process per tab" anymore, and hasn't for a bit.
Strict site isolation: Security mode that enables site isolation for all sites. When enabled, each renderer process will contain pages from at most one site, using out-of-process iframes when needed. When enabled, this flag forces the strictest site isolation mode (SitePerProcess). When disabled, the site isolation mode will be determined by enterprise policy or field trial. – Mac, Windows, Linux, Chrome OS, Android
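The invariant in that flag description ("each renderer process will contain pages from at most one site") can be sketched as grouping pages by site. This is a toy, not Chrome's actual process model: it approximates a "site" as scheme plus the last two host labels, whereas browsers consult the Public Suffix List (this toy would wrongly lump every *.co.uk host into one site).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Approximate a "site" as scheme + the last two labels of the host.

    Simplification (assumption): no Public Suffix List lookup.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return f"{parts.scheme}://{'.'.join(host.split('.')[-2:])}"


def group_into_processes(urls: list[str]) -> dict[str, list[str]]:
    """One renderer 'process' per site: pages from different sites never share."""
    processes: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        processes[site_of(url)].append(url)
    return dict(processes)


tabs = ["https://blog.example.com/a", "https://www.example.com/b",
        "https://news.ycombinator.com/item"]
print(group_into_processes(tabs))
```

Same-site pages may still end up together, which is consistent with the Firefox behaviour described upthread; the guarantee is only that different sites never share a process.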
After the way you seized on the word "typical", I kind of expected you to take words at face value. I didn't see any text to the effect that he thinks his say so is good enough.
Also, you're the one frequently mentioning how varied workloads are, and you don't constitute the discussion.
I'm going to go drink some cocoa to wash down this hook, line, and sinker I swallowed with your flame bait.
It just so happens that people complaining about Firefox doing it wrong is a pretty common thing in Firefox threads on HN. And they usually have an example where it's really unclear if it's a problem for more than them or not. But, usually, they have a lot of advice about what the Firefox team should do. Whereas the Firefox team has telemetry data from most of their users.
Certainly not always, but often enough and more so than usual with Spectre and Meltdown.
> As an example of simplicity, I am not personally aware of any install that is so simple as theirs. Except maybe ubuntu's, but then with ubuntu you end up with a mess of interdependent packages and it will be a hell to uninstall shit you don't need.
That's an interesting point. How does it compare in terms of simplicity to the other BSDs (FreeBSD and Dragonfly) or something like Arch Linux?
I'm not familiar with FreeBSD and DragonFly, but I have used NetBSD in the past and a bit of Arch Linux. The system management is way more consistent in OpenBSD, things generally work and are more reliable. The package management system is a pleasure to work with, and when you want to remove unused packages or dependencies of previously installed packages, it's simple and consistent. It actually works. When you are configuring something, most of the time there is one single way to do it, and it's well documented. And the simplicity can't really be compared to Arch Linux. Fire up a vm and install OpenBSD to it, just for the experience. It's mostly just accepting the defaults, extremely simple.
I'm pretty sure those are meant to be, and I think are, "typical" for the general purpose CPU in use, and thus the general case.
Both mobile and desktop CPUs will be doing DOM layout, DB queries (whether to SQLite or the registry or just the filesystem), and possibly computing game logic on a regular basis.
"Typical" is highly varied, and it changes.
Edit: here's an example: Google brings on-device machine learning to mobile with TensorFlow Lite
https://thenextweb.com/artificial-intelligence/2017/11/15/go...
BTW, this is just one example algorithm that I picked because it does (on the cpu) what the person I replied to said was rare.