Never Bet Against x86 (osnews.com)
It says that x86 is highly standardised - even with different combinations of chips, peripherals and motherboards you know it will work just fine. It's not the case for ARM systems - can you even have something similar to IBM PC with ARM?
I personally know that adding support for ARM devices on Linux is a huge and manual task - e.g. look at devicetree, it's a mess. There is no standard like ACPI for ARM devices, so even powering off the computer is a problem, everything is proprietary and custom.
I don't agree with the article though, x86 is dying and my worry is that ARM devices will bring an end to such an open platform like modern PCs are.
Yes, it's called SBBR which requires UEFI and ACPI. It is more common on server hardware than on consumer-grade embedded devices. The fact that it is not ubiquitous is really holding back ARM.
Device tree does the same thing and it's open source. Even if you can only extract it in binary form from a proprietary kernel or U-Boot, you can decompile it very easily.
AFAIK, ARM does not have port-mapped I/O, so that makes it difficult to really match up with the PC. That said, an OS can require system firmware to provide certain things and you get closer to an IBM-like world. Microsoft requires UEFI for desktop Windows (maybe WP8 and WM10 as well, but I believe those were effectively limited to specific Qualcomm SoCs, whereas I feel like desktop Windows is supposed to be theoretically open to anything that hits the requirements).
ACPI for ARM is a thing that exists, but not all ARM systems will have it. Technically, not all x86 systems have it either, but for the past several generations of Intel and AMD, all the hardware you need for ACPI is embedded in the CPU, so only old hardware or really weird firmware would be missing it. Also, PC I/O is so consistent, either by specification or by consensus, that it's easy to detect hardware anyway: PCI is at a specific I/O port by specification; CPUID/MSRs let you locate on-chip memory-mapped peripherals that aren't attached via PCI, and PCI has specified ways to detect attached hardware. There are some legacy interfaces that aren't on PCI that you might want, and you need ACPI to find them properly, but you can also just poke them at their well-known addresses and see if they respond. AFAIK, you don't get that on other systems... many peripherals will be memory mapped directly, rather than attached via PCI; the PCI controller/root is not at a well-known address, etc. Every system is a little different because there's no obvious system to emulate.
Mostly ACPI is about having hardware description tables in a convenient place for the OS to find them. Certainly standardized understanding of power states and the OS-independent description of how to enter them is important too.
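To make the "PCI is at a specific I/O port" point concrete: the legacy PC configuration mechanism uses an address port at 0xCF8 and a data port at 0xCFC. Below is a minimal Python sketch of how the 32-bit address word is packed. The helper name is made up for illustration, and the actual port I/O needs kernel privileges, so only the encoding is shown.

```python
CONFIG_ADDRESS = 0xCF8  # legacy PCI configuration address port on PCs
CONFIG_DATA = 0xCFC     # legacy PCI configuration data port on PCs


def pci_config_address(bus: int, device: int, function: int, offset: int) -> int:
    """Pack the dword written to CONFIG_ADDRESS (hypothetical helper name)."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 256:
        raise ValueError("offset out of range for the legacy mechanism")
    return (
        (1 << 31)          # enable bit
        | (bus << 16)      # bits 23..16: bus number
        | (device << 11)   # bits 15..11: device number
        | (function << 8)  # bits 10..8: function number
        | (offset & 0xFC)  # bits 7..2: register offset, dword aligned
    )
```

An OS would write this value to CONFIG_ADDRESS and then read CONFIG_DATA; offset 0 returns the vendor and device IDs, which is how attached hardware gets enumerated without any platform-specific description.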
There are/were other proposals, but if you want something like UEFI and ACPI, and you have clout, you can just require it for systems you support. The market problem is Apple doesn't let their OS run on anything non-Apple, and Android has minimal standards in this area; whereas the marketplace for software for the IBM PC relied heavily on the IBM BIOS, the marketplace of software for Android relies on features of the OS; SoC makers can build a custom kernel with the hardware description hardcoded and there's no need to provide an in firmware system of hardware description. Other OSes lose out because they too need custom builds for each SoC.
Modern PCs are NOT open platform anymore. Not since signed bootloaders, UEFI, secure boot. ARM on the other hand, as long as they don't require signed bootloaders (like phones) or a closed source driver for GPU or something, are in fact open.
Any competent computer engineer can design a much better ISA than RISC-V.
The problem is that designing a CPU ISA is easy and it can be done in a few weeks at most. On the other hand, writing all the software tools that you need to be able to use an ISA, e.g. assemblers, linkers, debuggers, profilers, compilers for various programming languages etc. requires a huge amount of work, of many man-years.
The reason why everybody who uses neither x86 nor Arm tends to use RISC-V is to reuse the existing software toolchains, not because the RISC-V ISA is any good. The advantage of being able to use already existing software toolchains is so great that it ensures the use of RISC-V regardless of how bad it is in comparison with something like AArch64.
The Intel ISA, especially its earlier versions, has also been one of the ugliest ISAs, even if it seems polished when compared to RISC-V. It would be sad if after so many decades during which the Intel/AMD ISA has displaced other better ISAs, it would eventually be replaced by something even worse.
As one of the main examples of why RISC-V sucks, I think that any ISA designer who believes that omitting from the ISA the means for detecting integer overflow is a good idea deserves the death penalty, unless the ISA is clearly declared as being a toy ISA, unsuitable for practical applications.
It’s a crapshoot. That’s why whenever anyone recommends a certain cool Arm motherboard or mini PC, the first thing you have to figure out is what its software support situation is like. Does the OEM provide blessed Linux images? If so, do they offer more than an outdated Ubuntu build? Have they made any update promises?
Almost every ARM board I've got is running ancient kernel images that were out of date even when they were released and haven't got any newer since then, but that's positively great compared to the RISC-V situation where you feel like you're taking your life into your hands every time you try and update it. The last update I did, to a popular widely-used board, took close to a full day to progressively reflash different levels of boot loaders and kernel images and whatnot, repartition the MTD for each reflash, hack around hardware and boot params via the serial interface through trial-and-error, and slowly work my way up to a current already out-of-date firmware and kernel config.
I really hate to like x86 but I know that when I set up an embedded x86 device it's flash, apt-get update/upgrade, and I've got the latest stuff running and supported.
ARM is a complete jungle by comparison. Each ARM manufacturer licenses a different UART, different USB, different PCIe (or none at all), different SATA, different GPU, different audio even if it's just I2s, different I2c, different SPI, different GPIO controller, different MMC/SDHCI, etc. etc. And each one needs, of course, a different driver!
The big mistake ARM (the company) made was to design only CPUs, not complete SoCs with peripherals, or at least require standard I/O addresses. And now they're trying to patch it up with UEFI and ACPI: closed-source ring -2 blobs that will never be updated or bug-fixed by any manufacturer.
But now? I can't think of a single segment where x86 is doing well. It's out of mobile entirely, it's slowly getting squeezed out of servers as e.g. Graviton takes over, it has no presence in the AI gold rush, and in consumer desktops/laptops its position is precarious at best.
I’m quite bearish on x86.
And in terms of squeezing out of servers, this is happening way more slowly than you're implying.
I say this as a person running an NVIDIA Spark as my daily driver. We're not there yet.
The reason ARM is making inroads in the server market is we've reached the point where cooling is a significant cost factor in server farms, so lowering TDP is starting to become a relevant factor in total cost.
x86 will lose to ARM/RISC in gaming only if those chips provide a better gaming experience.
On the conclusion - x86 didn't eventually win in smartphones.
And of course having a choice of processor designs from precisely two firms is absolutely something that we should continue to be happy with (and the post ignores RISC-V).
Being long AAPL and NVDA has crushed AMD and INTC, and that's with AMD's gains which I would argue are mostly due to non-x86 chips. Even Broadcom + Qualcomm + ARM has been a better basket to hold for most of the last 5 years.
While PCs still need x86 because of the standardization the article talks about, more appliance-like computers like mobile phones and even server hardware have stolen a lot of market share and I think are the dominant way people will do their computing in the future. This comment was written on a m2 macbook that I use to ssh into a gb200 server.
- Apple took over the single-threaded crown a while ago.
- ARM also caught up in integer workloads.
- ARM Cortex is still behind in floating-point.
- Both are behind in multithreaded performance. (mostly because there are more high-end x86 systems...)
- Both are way behind in SIMD/HPC workloads. (ARM is generally stuck on 128-wide, x86 is 256-wide on Intel and 512-wide on AMD. Intel will return to 512-wide on the consumer segment too)
- ARM generally have way bigger L1 caches, mostly due to the larger pagesize, which is a significant architectural advantage.
- ARM is reaching these feats with ~4.5 GHz clocks compared to the ~5.5 GHz clocks on x86. (very rough approximation)
Overall, troubling for x86 for the future... it's an open question whether it will go the way of IBM POWER, legacy support with strict compatibility but no new workloads at all, or if it will keep adapting and evolving for the future.
I have a virtual instance of Win11 ARM running in UTM on my MBP. It's honestly been surprisingly rare that I have to figure out how to run something that requires x86. More and more Linux distros have an ARM version that I can run if I need to.
So it's really more like mid-80s or early 2000s, not late 70s.
You can't run a COM program today. Not without emulation. Recent PCs can't even run DOS EXE because they're missing the BIOS interrupts most DOS programs use.
That variable-length encoding and strongly ordered memory model will do x86 in sooner rather than later.
This is a funny thing to say when Valve hasn't actually released any ARM device yet, and the Steam Deck is still fully reliant on x86. The ARM hardware they do plan to release relies on x86 emulation, which is something that historically usually doesn't pan out.
Since they had control over the hardware, they could punt on one of the hard parts of Rosetta and bake it into Silicon.
Understanding the memory ordering requirements from binary without source and without killing performance by being overly conservative (and hell, the source itself probably has memory ordering bugs if it was only tested on x86) sounds next to impossible.
They also use emulation backing this project: https://github.com/FEX-Emu/FEX
But that’s changed somewhat - Apple has managed a larger mind and market share (while switching into ARM). The vast majority of uses are now available on the web, which is CPU agnostic, and there is a huge amount of open source software available.
The only things for which x86 still shines a little brighter are games and native Office. But Office is mostly available on the web, on Mac, and on Windows on ARM. So games. Which aren't a big enough market to sustain x86's popularity, and which are a segment (soon) under attack by Valve.
So by that measure the future of x86 seems to be less troubling today than it was 5 years ago.
On the other hand, most ARM-based CPUs, even those of Apple, have quite poor performance for things like arithmetic operations with floating-point numbers or with big integer numbers. Geekbench results do not reflect at all the performance of such applications.
This is a serious problem for those who need computers for solving problems of scientific/technical/engineering computing.
During the half century when IBM PC compatible computers have been dominant, even if the majority of users never exploited the real computational power of their CPUs, buying a standard computer would automatically provide, at a low price, a good CPU for the "power" users who need such CPUs.
Now, with the consumer-oriented ARM-based CPUs that have been primarily designed for smartphones and laptops, and not for workstations and servers, such computers remain good for the majority of the users, but they are no longer good enough for those with more demanding applications.
I hope that Intel/AMD based computers will remain available for a long time, to be able to still buy computers with good performance per dollar, when taking into account their throughput for floating-point and big integer computations.
Otherwise, if only the kinds of computers made by Apple and Qualcomm would be available, users like me would have to buy workstations and servers with a many times lower performance per dollar than achievable with the desktop CPUs of today.
This kind of evolution already happened in GPUs, where a decade ago one could buy a cheap GPU like those bought by gamers, but which nevertheless also had excellent performance for scientific FP64 computing. Then such GPUs disappeared, and the gaming GPUs of today can no longer be used for such purposes; one would have to buy a "datacenter" GPU instead, but those cost an arm and a leg.
I see x86 on top (the first valid result is 6841, which is x86), if that is the sole benchmark we're going to look at. You can further break that down into the individual tasks it performs, but I'm not going to :-)
> - ARM generally have way bigger L1 caches, mostly due to the larger pagesize, which is a significant architectural advantage.
Larger pages mean more potential for waste.
Not to bash on x86 or anything, but that's an outlier. Very overclocked with a compressor chiller or similar. Also the single-threaded and multi-threaded scores are the same; it's probably not stable at full load across all cores.
I don't think that's really representative of the architecture at scale, unless you're making the case for how overclockable (at great power/heat cost) x86 is.
X86 is de-facto standardized by vendor fiat.
ARM is in an unfortunate middle ground.
It is hard, but Microsoft came up with a hack to make it easier. MSVC (since 2019) annotates x86 binaries with metadata describing the code's actual memory ordering requirements, to inform emulators of when they need to be conservative or can safely YOLO ordering. Obviously that was intended to assist Microsoft's Prism emulator, but the open source FEX emulator figured out the encoding (which I believe is undocumented) and implemented the same trick on their end.
Emulators still have to do it the hard way when running older MSVC binaries of course, or ones compiled with Clang or GCC. Most commercial games are built with MSVC at least.
dtc -I dtb -O dts -o devicetree.dts blob.dtb
Big, biiig, biiiiig difference!
PS: You can also examine it directly as a directory tree in /sys/firmware/devicetree/*
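For anyone without dtc at hand, here is a rough Python sketch of reading just the header of a flattened devicetree blob. It assumes the ten-field header layout from the devicetree specification (big-endian 32-bit words, magic 0xd00dfeed, as in version 17 blobs); the function name is made up for illustration.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # first word of every flattened devicetree blob

# Header fields in on-disk order; each is a big-endian 32-bit integer.
_FDT_FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings",
    "off_mem_rsvmap", "version", "last_comp_version",
    "boot_cpuid_phys", "size_dt_strings", "size_dt_struct",
)


def parse_fdt_header(blob: bytes) -> dict:
    """Return the header fields of a .dtb as a dict (hypothetical helper)."""
    if len(blob) < 40:
        raise ValueError("blob too short to hold a devicetree header")
    header = dict(zip(_FDT_FIELDS, struct.unpack(">10I", blob[:40])))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("not a flattened devicetree blob")
    return header
```

On a real file this would be called as `parse_fdt_header(open("blob.dtb", "rb").read())`; the offsets it returns point at the structure and strings blocks that dtc turns back into source.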
> Any competent computer engineer can design a much better ISA than RISC-V.
Hello, my fellow bitter old man! I have to respectfully disagree, though. Firstly, RISC-V was actually designed by competent academic designers with four preceding RISC projects under their belt. The tenet of RISC philosophy is that the ISA is designed by careful measurement and simulation: the decisions are not supposed to be based on gut feeling or familiarity, but on optimizing the choices, which they arguably did.
Specifically, about detecting the overflow: the familiar, classic approach of a hardware overflow (V) flag is well known to be suboptimal, because of its effect on speculative and OoO implementations. RISC-V has enough primitives to handle explicit overflow checking, and they are consistent with performance techniques such as branch prediction and macro-op fusion, to the point of having asymptotically vanishing cost; there can be no performance penalty. Even more so, RISC-V code that does NOT care about overflow can completely ignore these checks.
> I think that any ISA designer who believes that omitting from the ISA the means for detecting integer overflow is a good idea deserves the death penalty
Given that the C standard (C99 §3.4.3/1) declares integer overflow to be UB which means the compiler can and often will do anything it damn well pleases with your code, I can understand why the RISC-V designers, under the influence of the stupidity of the C standard, could leave out overflow detection. I'm not saying it's a good idea, in fact it's complete and utter braindamage, but I can see where they got it from.
This is explicitly documented in the spec, even.
e.g.
add t2, t0, t1           # t2 = t0 + t1, wrapping on overflow
bltu t2, t0, overflow    # unsigned check: a wrapped sum is smaller than t0
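For readers who don't read RISC-V assembly, here is a small Python emulation of that check on 64-bit registers, plus the signed variant (the add/slti/slt/bne pattern, as I recall it from the spec's commentary). Function names are made up for illustration, and values are passed as raw 64-bit register patterns.

```python
MASK64 = (1 << 64) - 1


def to_signed(x: int) -> int:
    """Interpret a 64-bit register pattern as a two's-complement value."""
    return x - (1 << 64) if x & (1 << 63) else x


def add_unsigned_overflows(t0: int, t1: int) -> bool:
    """Mirror `add t2, t0, t1; bltu t2, t0, overflow`."""
    t2 = (t0 + t1) & MASK64  # add wraps modulo 2**64
    return t2 < t0           # bltu: the wrapped sum is below an operand


def add_signed_overflows(t0: int, t1: int) -> bool:
    """Mirror `add t2, t0, t1; slti t3, t1, 0; slt t4, t2, t0; bne t3, t4, overflow`."""
    t2 = (t0 + t1) & MASK64
    t3 = to_signed(t1) < 0              # slti: is the addend negative?
    t4 = to_signed(t2) < to_signed(t0)  # slt: did the sum go down?
    return t3 != t4                     # bne: the sum moved the wrong way
```

The point being argued either way is that these are ordinary compare-and-branch instructions rather than a flags register; whether that is cheap enough in practice is exactly what the thread disagrees about.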
The implication that the ISA is not designed by competent engineers does not pass basic scrutiny, either.
You've missed a huge segment:
Random in-house apps or niche vertical market apps that are closely tethered with a business workflow to the point that replacing them is a massive undertaking, where the developers at best aren't interested in improving anything and at worst no longer exist.
Embedded/hardware is the last segment still not replaced by web.
I support a lot of dental practices using Patterson Eaglesoft and they still don't officially support VMs in any form, even for the server (despite it working fine), while they have removed all support for using terminal services. Obviously the basic application works fine, but a dental practice needs to be able to take digital x-rays. Shock: the sensor drivers only exist for Windows, and back when RDP and Citrix were supported it required a special bridge running on both the client (which of course still had to be Windows) and the server.
We used some thin clients back in the day for front desk stations and hygiene rooms that didn't need any special hardware, but the main practice rooms and the pano stations always needed full Windows PCs.
The client app is built with PowerBuilder so it'd require a deep rewrite to support any other platforms.
The server side is a Sybase SQL Anywhere database and a SMB file share so it could easily be run natively on Linux but the vendor can't be bothered.
This is a company that still insists that every user needs local admin privileges, despite literally nothing going wrong when they don't have them, and that usually doesn't support a new Windows release until a few months after it becomes the default for new PCs.
---
There are other dental platforms that do have web interfaces intended mostly to enable the use of iPads and other tablets but switching platforms is far from straightforward for practices with years of data, custom integrations, etc. Even if you are willing to go through the trouble (or starting fresh) those platforms, to my knowledge, still require Windows PCs for digital x-ray support.
Additionally, be careful what you wish for: Copilot+ PCs are locked down with the Pluton security processor, which comes from Xbox and Azure Sphere.
Well, no, UEFI can have proprietary implementations, but nothing prevents just shipping edk2.
Conversely, there are so many phones with devicetree... and proprietary blobs controlling boot and running beneath ring 0. You're kinda pointing out a real problem, but confusing it with a different part of the stack.
It's not the same!
A bootloader, once it loads the kernel and executes it, is overwritten in memory. No trace of it remains while the system is running - until the next boot. UEFI / ACPI / SMM continue to execute on the CPU after it has finished booting, "under" the kernel, preempting it as they please.
Suggested reading:
* https://www.maven-silicon.com/blog/privilege-levels-in-arm-a...
* https://projectzero.google/2017/02/lifting-hyper-visor-bypas...
* https://en.wikipedia.org/wiki/ARM_architecture_family#Securi...
* https://docs.samsungknox.com/admin/fundamentals/whitepaper/s... (in which Samsung uses firmware running under the kernel to protect against kernel exploits)
https://doc.coreboot.org/distributions.html seems to say Purism, Star Labs, and System76
(Edit: actually weirdly enough Librem seems to be using a different coreboot payload instead of edk2, but the other 2 stand)
> Or even allow users to replace the preinstalled UEFI/ACPI with a free one?
So many Chromebooks: https://docs.mrchromebox.tech/docs/supported-devices.html
But in the places I frequent (backoffice, municipal, finance) it’s all gone web and rdp-through-web (which is web, in the sense that it doesn’t require windows on the client) with centralized administration with minimal (not quite self-serve but reasonably close) thin client users.
There goes freedom in the ARM world too.
Nothing more privileged than ring 0 is required for that.
"v86" is what allowed real mode to be virtualized under a 32-bit OS. This is no longer available in 64-bit mode, but the CPU still includes it (as well as newer virtualization features which could be used to do the same thing).
https://man7.org/linux/man-pages/man2/modify_ldt.2.html
Set seg_32bit=0 and you can create 16-bit code and data segments. Still works on 64 bit. What's missing is V86 mode, which emulates the real mode segmentation model.
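A rough Python sketch of the descriptor that call takes, for the curious. It only packs the 16-byte `struct user_desc` from the man page and does not make the syscall; the bit positions assume GCC's usual least-significant-bit-first bitfield layout on x86, and the helper name is made up for illustration.

```python
import struct

MODIFY_LDT_CONTENTS_DATA = 0
MODIFY_LDT_CONTENTS_CODE = 2


def pack_user_desc(entry_number: int, base_addr: int, limit: int, *,
                   seg_32bit: int, contents: int, read_exec_only: int = 0,
                   limit_in_pages: int = 0, seg_not_present: int = 0,
                   useable: int = 1) -> bytes:
    """Pack a struct user_desc (hypothetical helper; layout is an assumption)."""
    flags = (
        (seg_32bit & 1)               # 0 selects a 16-bit segment
        | ((contents & 3) << 1)       # data, stack, or code
        | ((read_exec_only & 1) << 3)
        | ((limit_in_pages & 1) << 4)
        | ((seg_not_present & 1) << 5)
        | ((useable & 1) << 6)
    )
    # Three 32-bit fields followed by the bitfield word, little-endian.
    return struct.pack("<IIII", entry_number, base_addr, limit, flags)
```

For example, `pack_user_desc(0, 0x10000, 0xFFFF, seg_32bit=0, contents=MODIFY_LDT_CONTENTS_CODE)` describes a 64 KiB 16-bit code segment in LDT slot 0; a real program would hand that buffer to `modify_ldt` with func set to 1.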
I'm pretty sure that if I make a dual-kernel 9front (9pc and 9pc64 available at boot) in a 64 bit machine and I compile emu2 for it, DOS COM binaries might be trapped enough to run simple text mode tools under the 386 port.
The designers of RISC-V included the bare minimum needed to compile C, everything else was deemed irrelevant.
It will be used when it needs to be handled. e.g. where elsewhere, an exception would actually handle it. Which is seldom the case.
More instructions doesn't mean slower, either. Superscalar machines have a hard time keeping themselves busy, and this is an easily parallelizable task.
>The designers of RISC-V included the bare minimum needed to compile C, everything else was deemed irrelevant.
Refer to "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson, for the actual methodology followed.
https://en.wikipedia.org/wiki/ACPI#Criticism
https://en.wikipedia.org/wiki/System_Management_Mode#Problem...
Trustzone usually runs code from eMMC. These chips are programmed at the factory with a secret key in the RPMB partition. It's a one-time operation - the user can't replace it. Without that key you can't update the code Trustzone executes. Only the manufacturer can update it.
Also, any ring -2 code can be used for secure boot locking the device to manufacturer approved OS, enforce DRM, lock hardware upgrades and repairs, spy, call home, install trojans by remote commands, you name it. And you can't audit what it does.
Microsoft tried in the past without a Rosetta equivalent; Apple succeeded twice with Rosetta. They did not try to switch cold turkey the way Microsoft did.
> Trustzone usually runs code from eMMC.
This might be true in so far as the largest number of systems using Trustzone may be using eMMC, but there's nothing magical about eMMC here (my phone, which absolutely uses Trustzone, has no eMMC). But when you then go on to say:
> Without that key you can't update the code Trustzone executes. Only the manufacturer can update it.
you're describing the same sort of limitation that you decried with SMM. As commonly deployed, Trustzone is strictly worse for user freedom than SMM is. This isn't an advantage for Arm.
> Also, any ring -2 code can be used for secure boot locking the device to manufacturer approved OS
No, the secure boot code that implements cryptographic validation of the OS is typically running in an entirely normal CPU mode.
> enforce DRM
This is more typical, but only on Arm - on x86 it's typically running on the GPU in a more convoluted way.
> lock hardware upgrades and repairs
Typically no, because there's no need at all to do any sort of hardware binding at that level - you can implement it more easily in normal code, why make it harder?
> spy
When you're saying "can be used", what do you mean here? Code running in any execution environment is able to spy.
> call home
Code in SMM or Trustzone? That isn't literally impossible but it would be far from trivial, and I don't think we've seen examples of it that don't also involve OS-level components.
> install trojans by remote commands
Again, without OS support, I'm calling absolute bullshit on this. You're going to have an SMM trap on every network packet to check whether it's a remote command? You're going to understand a journaling filesystem and modify it in a way that remains consistent with whatever's in cache? This would be an absolute nightmare to implement in a reliable way.
> And you can't audit what it does.
Trustzone blobs do have a nasty habit of being encrypted, but SMM is just… sitting there. You can pull it out of your firmware. It's plain x86, an extremely well understood architecture with amazing reverse engineering tools. You can absolutely audit it, and in many ways it's easier to notice backdoors in binary than it is in source.
Trustzone is mostly deployed on Devicetree-based platforms. What saves you here isn't the choice of firmware interface, it's whether the platform depends on hostile code. If you don't care about secure boot (or if you do but don't care about updating the validation keys at runtime), you can implement a functional UEFI/ACPI platform on x86 with zero SMM.
There are only two viable firmware alternatives in the world right now: ring 0 U-boot* or the ones that use ring -2: UEFI* and various bootloaders +TrustZone in Android world (read the footnotes!). Manufacturers usually focus on only one of the two: either ring -2 (locked bootloaders, UEFI +ACPI +SMM +whatever crapware they may want to add) protected by secure boot or ring 0 U-boot +a device tree +their GPL source code. The ones interested in locked-down platforms choose the ring -2 option and they are not going to make it open source, nor provide the signing keys to allow it to be replaced by FOSS alternatives.
I appreciate freedom. The Linux kernel is free (ring 0). U-boot and coreboot are free (ring -2 if they include ACPI / SMM, else still ring 0). When I run a Linux kernel, I don't want it preempted and sabotaged by a ring -2 component. If that ring -2 includes proprietary blobs, then it's a hard "no" from me. You may argue that SMM (and ACPI) brings useful features such as overheating shutdown when the kernel froze/crashed or the system is stuck at the bootloader, but let's face it: practically there's no free alternative to the manufacturer's blobs when it comes to ring -2. The FOSS community barely keeps u-boot and the device tree working. Barely! An open source UEFI + all that complexity for every single board out there is a no-go from the start. If you ported Coreboot, I'm sure you know how difficult it is.
I recently learned that ACPI can be decompiled to source code, so that's an improvement, but not by much. Unlike a device tree, which is only a hardware description, ACPI is executable code. I see that as a risk and I'm not the only one. Even Linus had something to say about it - the quote is in the Wikipedia article. Some of that code executes in ring -2. It can also install components in the OS - spyware components - you can also read about that in the Wikipedia article. U-boot has the capability of creating files on some filesystems and you can argue that a proprietary fork could maliciously install OS components by dropping something in init.d, but I've never heard of it being misused that way, and a manufacturer must publish the GPL source code, so it would be difficult to hide. A device tree can't do that at all. If you use UEFI, then every single blob published by the manufacturer must be decompiled and inspected. U-boot + ACPI is probably simpler than porting Coreboot, but it still won't happen. There are simply too many systems to support.
So, as a conclusion. I see ring -2 as a dangerous capability (even if the malware itself doesn't execute in ring -2) because there are no viable open source alternatives. For this reason I encourage you to not support or promote UEFI and ring -2.
> Trustzone is strictly worse for user freedom than SMM is. This isn't an advantage for Arm.
> Trustzone is mostly deployed on Devicetree-based platforms.
True, but ARM world still has unlocked CPUs that can boot unsigned firmware. There are none left in x86 world. (Or at least none that I know about.)
> No, the secure boot code that implements cryptographic validation of the OS is typically running in an entirely normal CPU mode.
OK, valid observation, I may have used "ring -2" to describe features that are not typically running in ring -2. I tried to avoid these technologies as much as possible and I don't have much hands-on experience about what runs where.
> you can implement a functional UEFI/ACPI platform on x86 with zero SMM.
One dev could probably implement and maintain one or maybe 5-10 systems if they are related (same CPU, mostly same hardware). How many systems are there and how many devs? Not possible, but for very very few exceptions, as long as some random dev got one of these systems for himself and does it as a pet project.
----
* When I say U-boot, I mean mainline U-boot plus a device tree, or forks with published GPL source code. I know U-boot can include ACPI and secure boot, but that's not what I mean in the context of this comment. Sure, you can set up secure boot with open source U-boot if you want to. There's nothing wrong with that.
* When I say UEFI, I mean all related technologies: ACPI, SMM, secure boot, signed firmware, etc. The whole forest.