Intel's “Cripple AMD” Function (2019) (agner.org)
RMS was right that compilers should be GPL licensed to prevent exactly this kind of thing (and worse things that haven't happened yet).
On another compiler-related note, I find it insane that GCC had not turned on vectorization at -O2 for x86-64 targets. The baseline for that architecture includes SSE2, so vectorization has always made sense there. The upcoming GCC 12 will have it enabled at -O2. I'd bet the Intel compiler always did vectorization at -O2 for its 64-bit builds.
The problem with this is that it wouldn't solve the problem in question: Intel would just have stuck with their old compiler backend instead of LLVM.
Besides, LLVM wouldn't have gotten investment to begin with if it were GPL licensed, since the entire reason for Apple's investment in LLVM is that it wasn't GPL. Ultimately, LLVM itself is a counterexample to RMS's theory that keeping compilers GPL can force organizations to do things: given deep enough pockets, a company can overcome that by developing non-GPL competitors.
I don't think that's the case; Apple/LLVM actually offered to sign over the copyright to the FSF, under the GPL; from https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg00888.html
> The patch I'm working on is GPL licensed and copyright will be assigned to the FSF under the standard Apple copyright assignment. Initially, I intend to link the LLVM libraries in from the existing LLVM distribution, mainly to simplify my work. This code is licensed under a BSD-like license [8], and LLVM itself will not initially be assigned to the FSF. If people are seriously in favor of LLVM being a long-term part of GCC, I personally believe that the LLVM community would agree to assign the copyright of LLVM itself to the FSF and we can work through these details.
The reason people worked on LLVM/clang is that GCC was (and to some degree, is) not very good in various areas, and had a difficult community that made fixing those issues hard. There's a reason a lot of these newer languages like Swift, Rust, and Zig are based on LLVM and not GCC. See e.g. https://undeadly.org/cgi?action=article&sid=20070915195203#p... for a run-down (from 2007; I'm not sure how many of these issues persist today. GCC has not stood still either, of course: error messages, for example, are much better than they were in 2007).
GPL3 changed things a bit; I'm not sure Lattner would have made the same offer with GPL3 around, but that was in 2005, when GPL3 didn't exist yet. Still, the idea that LLVM was primarily motivated by license issues doesn't seem to hold, although the license was probably seen as an additional benefit.
All in all I avoid non-GPL compilers for my code, but I'm happy that clang acted as a big (hard) foam cluebat for GCC.
In my opinion, we need a well polished GNU/GPL toolchain both to show it's possible, and provide a good benchmark to compete with. This competition is what drives us forward.
So yes, he basically said that if you want to develop a proprietary compiler, it should cost you, and not take GCC as a base to freeload. Intel basing their new compilers on LLVM clearly saved them effort.
That's a typo. Are you showing a case of AVX instructions not being generated by GCC? Where are the details? Is SSE2 from twenty years ago?
This isn't a hypothetical; quoting https://en.wikipedia.org/wiki/X86_Bit_manipulation_instructi... :
> AMD processors before Zen 3[11] that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles rather than a single cycle. As a result it is often faster to use other instructions on these processors.
There's no feature flag for "technically supported, but slow, don't use it"; you have to check the CPU model for that.
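A minimal sketch of what such a model check might look like, written in Python for readability. It assumes the raw CPUID register values have already been read (how you obtain them is platform-specific and not shown), and the function names are hypothetical. It decodes the display family from CPUID leaf 1 EAX and treats AMD parts below family 19h (Zen 3) as having the slow, microcoded PDEP/PEXT described in the quote; the BMI2 feature bit alone only tells you the instructions exist.

```python
# Sketch: decide whether PDEP/PEXT are worth using, from raw CPUID values.
# Assumed inputs: vendor string (leaf 0), leaf 1 EAX, leaf 7 subleaf 0 EBX.

BMI2_BIT = 8        # CPUID.(EAX=7,ECX=0):EBX bit 8 = BMI2 (PDEP/PEXT live here)
ZEN3_FAMILY = 0x19  # AMD family 19h = Zen 3 (and later parts in that family)


def cpu_family(leaf1_eax: int) -> int:
    """Display family: base family, plus extended family when base == 0xF."""
    base = (leaf1_eax >> 8) & 0xF
    ext = (leaf1_eax >> 20) & 0xFF
    return base + ext if base == 0xF else base


def has_bmi2(leaf7_ebx: int) -> bool:
    """The feature flag only says the instructions exist, not that they are fast."""
    return bool((leaf7_ebx >> BMI2_BIT) & 1)


def pdep_is_fast(vendor: str, leaf1_eax: int, leaf7_ebx: int) -> bool:
    """Hypothetical policy: use PDEP/PEXT if BMI2 is present, except on pre-Zen 3 AMD."""
    if not has_bmi2(leaf7_ebx):
        return False  # not available at all
    if vendor == "AuthenticAMD" and cpu_family(leaf1_eax) < ZEN3_FAMILY:
        return False  # available, but microcoded (about 18 cycles) on these parts
    return True
```

For example, EAX = 0x00870F10 decodes to family 17h (the Zen/Zen 2 family) and 0x00A20F10 to family 19h, so the same BMI2 bit leads to different decisions. Note that the vendor check here is about a known performance quirk, not about gating the feature itself.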
All that said, the right fix here would have been to release this as Open Source, and then people could contribute optimizations for many different processors. But that would have required a decision to rely on winning in hardware quality, rather than sometimes squeezing out a "win" via software even in generations where the hardware quality isn't as good as the competition.
Intel's “cripple AMD” function (2019) - https://news.ycombinator.com/item?id=24307596 - Aug 2020 (104 comments)
Intel's “Cripple AMD” Function - https://news.ycombinator.com/item?id=21709884 - Dec 2019 (10 comments)
Intel's "cripple AMD" function (2009) - https://news.ycombinator.com/item?id=7091064 - Jan 2014 (124 comments)
Intel's "cripple AMD" function - https://news.ycombinator.com/item?id=1028795 - Jan 2010 (80 comments)
https://techreport.com/news/8547/does-intels-compiler-crippl...
But AMD does have its own optimized libraries:
In the light of "Reflections on Trusting Trust" [1]:
"Intel cannot be trusted to supply your compiler at any price." That's a point of view that is a lot more than just "a reasonable one to hold." The reflection on Intel, and their lack of any reckoning after being caught sabotaging your customers, is something any customer of Intel needs to consider; included in that assessment must be the expected $$$ loss of purchasing from Intel. It's really not something anyone can responsibly ignore and fail to assess. Then go ahead and make your responsible and informed engineering and business trade-off.
[1] https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
edit: The point being we all have a bar of "well they wouldn't actually do that" in a purchasing decision. That bar for Intel is dramatically lower as a result of this incident and failure to properly address it in full with a mea culpa and consequences rather than the ongoing minimum action required by the courts and damage limitation we've seen. It is very hard to see how the probability of them secretly sabotaging your goals could have gone down here.
That is what Intel think of their reputation and what they think they can get away with in their response to you.
If they are sending Zen down the generic AVX2 codepaths by default and those are competitive with, say, openBLAS, that seems reasonable, right?
Hopefully BLIS will save us all from this kind of confusion eventually.
I don't condone Intel behavior, but let's be honest here: AMD underinvests in software and expects others to pick up the slack. That isn't acceptable.
At any rate though, based on Intel's track record I think this content is still relevant and of value to engineers who don't have domain knowledge in compilers or work downstream.
[1] https://www.extremetech.com/computing/308501-crippled-no-lon...
Looking forward to RISC-V pushing x86 into retrocomputing territory.
Sorry to go on this tangent, but is capitalism so rotten that everything eventually becomes corrupt? Here even outlets for discussion of science and technology self-censor to maximize profit. So much for freedom of speech.
Where is truth these days?
Of course there would still remain the issue of self-censoring to avoid annoying your readership, not sure how you can deal with that...
1. Put nothing shitty past them. 2. Will never ever purchase their products again.
The real problem is the endless pursuit of profit though, instead of the pursuit of ever-advancing, ever-improving technological superiority, and sadly AMD isn't any better in this area I've come to see. The moment they conclusively, provably became better than Intel, they jacked up their price, even though their processors were using the same 7nm process that, at that point, was extremely reliable and had a 93% usable chip ratio.
So it turns out as soon as one company gains superiority they immediately become shitbags focused on money instead of focused on the advancement of technology and mankind. It puts anyone with a moralistic stance on what technology should be and how it should be implemented and distributed into a real pickle.
I was hoping that AMD would be the better company here, especially given they nearly died, but it turns out they are also ready and willing to squander the goodwill of those of us who bought their chips not just when they were on their last legs, but also during their recovery period.
That's exactly why our economy is stagnating: over the past 30 years many major hard industries have become uncompetitive oligopolies or cartels, and we have people defending this state of affairs.
Some digital industries are even outright monopolies
Why wouldn't they have jacked up their price, if they were provably better?
It's not like this happened on its own, without more effort from AMD (or less from Intel)?
It was only by the end of 2020 that AMD had reached zero debt and bought back its own HQ, which had previously been put up as collateral.
https://www.phoronix.com/scan.php?page=news_item&px=Intel-5....
The issue here is that they used a slower code path even on CPUs that could run the faster one, just because they were made by a competitor.
You say "AMD should have made their own compiler", but why? What else should they have made? An OS? An office suite? Why?
I think we can simply imagine a common scenario: some employee working for Company X, developing a compiler suite, and adding necessary optimizations for Company X's processors. Meanwhile, Company Y's processors don't get as much focus (perhaps due to the employee not knowing about Company Y's CPUIDs, supported optimizations for different models, etc.). Thus, Company Y's processors don't run as quickly with this particular library.
Why does this have to be malicious intent? Surely it's not surprising to you that Company X's software executes quicker on Company X's processors: I should hope that it does! The same would hold true if Company Y were to develop a compiler; unique features of their processors (and perhaps not Company X's) should be used to their fullest extent.
There are lots of cases where Intel has either screwed up or done things that were unarguably anti-competitive. It happens at every company. I don't like Uber, but I'm not going to blame Uber today for the fuckery that Kalanick got up to.
In each case you could ask Intel HR or Intel senior management what they thought about it, and it was never Intel's fault. The answers to any questions about this sort of stuff would be full of pettifogging, passive voice, and legalese. The result was that the internal culture was an extremely low-trust environment, since you knew people were willing to be transparently intellectually dishonest to further their careers. I haven't been there since Gelsinger arrived, but I hope that changes; I wonder how much it can change in the legal environment we're in.
Gelsinger was the head designer on the 486, so he was around during the time when Intel was obsessed with keeping competition out of their ISA and probably has a case of auteur mentality, too.
[0] In case you couldn't tell, I really hate this word. The underlying concepts are, at best, necessary evils.
At best, detailed documentation is a lower priority item below "make it work" and "increase performance".
At worst, it requires exposing trade secrets.
Edit: It'd probably be more productive for everyone if we set incentives and work such that the goal we want (compilers that produce code that runs optimally on Intel, AMD, and other architectures) isn't contingent on Intel writing them for non-Intel architectures. (Said somewhat curmudgeonly, because everyone complains about things like this, but also doesn't really appreciate how insanely hard and frustratingly edge-case-ridden compiler work is.)
Compared to VTune, just about all open source profilers are either a bad joke or like programming in BASIC in a C++ age.
When they finally get to the point that their driver/libs are as easy to install as Nvidia's, it might be too late. I've argued this with AMD folks before.
The barriers to adoption need to be low. Friction needs to be low. They need to target ubiquity[3].
[1] https://developer.amd.com/amd-aocc/
[2] https://developer.nvidia.com/nvidia-hpc-sdk-downloads
[3] https://blog.scalability.org/2008/02/target-ubiquity-a-busin...
Actually, Nvidia bought the Portland Group (PGI) compilers. And Intel's Fortran compiler is (or was; its backend is now LLVM) MS's compiler via DEC, Compaq, and HP: MS Visual Fortran 4 -> DEC Visual Fortran 5 -> Compaq Visual Fortran 6 -> Intel Visual Fortran ;).
In the past, AMD just straight up had horrible software.
More recently, AMD have been investing more in open software, probably with the goal that a community indeed forms and they get "leverage" / ROI for their investment.
On the flip side, Intel invest heavily in high-quality but jealously guarded and closed source software.
With this nuance, I'm not so sure it's clear cut which one is "acceptable," and it's an interesting ethical question about Open Source and open-ness in general.
https://jolt.law.harvard.edu/digest/intel-and-the-x86-archit...
However, this will become moot as even Intel is shifting towards LLVM.
https://www.intel.com/content/www/us/en/developer/articles/t...
A. Company makes and sells hardware, and offers no software.
B. Company makes and sells uniquely featured hardware, and offers software that uses those unique features.
C. Company makes and sells hardware that adheres to an industry standard, and offers software that targets hardware adhering to that standard.
D. Company makes and sells hardware that adheres to an industry standard, then uses their position in related markets to give themselves an unfair advantage in the hardware market.
Of these, options A, B, and C are all acceptable options. AMD has traditionally chosen option A, which is a perfectly reasonable option. There's no reason that a company is obligated to participate in a complementary market. Option D is the only clearly unethical option.
AMD has optimized libraries https://developer.amd.com/amd-aocl/ and their own compilers: https://developer.amd.com/amd-aocc/
AMD’s performance improves by 1.32x – 1.37x overall… changing what looked like a narrow victory [for Intel] over the 3960X and a good showing against the 3970X into an all-out loss. https://www.extremetech.com/computing/302650-how-to-bypass-m...
In Russia's view Ukraine is a breakaway province that was getting a free ride from all the transit gas fees and must be brought to heel.
In medieval mentality women are property
What I mean is, it's really a tool to justify why your dishonest behaviour is okay.
The enhancement applies to string lengths between 1 and 128 bytes long. Support for fast-short REP MOVSB is enumerated by the CPUID feature flag: CPUID.(EAX=7H, ECX=0H):EDX.FAST_SHORT_REP_MOVSB[bit 4] = 1.
So there is indeed a CPUID feature flag for fast rep movsb.
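A minimal sketch of checking that flag, assuming the raw EDX value from CPUID leaf 7, subleaf 0 has already been obtained elsewhere. The short-copy policy function is hypothetical; it only illustrates that this decision can be keyed on an advertised feature bit rather than on who made the CPU.

```python
# Sketch: test the fast-short-REP-MOVSB flag, CPUID.(EAX=7H, ECX=0H):EDX bit 4.

FAST_SHORT_REP_MOVSB_BIT = 4  # leaf 7, subleaf 0, EDX


def has_fast_short_rep_movsb(leaf7_edx: int) -> bool:
    """True if the CPU advertises fast short REP MOVSB (FSRM)."""
    return bool((leaf7_edx >> FAST_SHORT_REP_MOVSB_BIT) & 1)


def short_copy_strategy(leaf7_edx: int, n: int) -> str:
    """Hypothetical policy: use `rep movsb` for 1..128-byte copies only when FSRM is set,
    since the quoted enhancement covers exactly that length range."""
    if 1 <= n <= 128 and has_fast_short_rep_movsb(leaf7_edx):
        return "rep_movsb"
    return "fallback"
```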
On November 12, 2009 AMD and Intel Corporation announced a comprehensive settlement agreement to end all outstanding legal disputes between the companies, including antitrust and patent cross license disputes. In addition to a payment of $1.25B that Intel made to AMD, Intel agreed to abide by an important set of ground rules that continue in effect until November 11, 2019.
Customers and Partners
With respect to customers and partners, Intel must not:*
[...]
Intentionally include design/engineering elements in its products that artificially impair the performance of any AMD microprocessor.
https://www.amd.com/en/corporate/antitrust-ruling
I like that 'in effect until November 11, 2019' part :D
It's a 13-year-old laptop, and still running strong (Linux, though). Desktops are Sandy Bridge based. The RTX 2060 and RTX 3060 are doing fine with the current drivers. I usually only update when CUDA changes.
But yeah, it's pretty simple. I can't speak to non-Linux OSes generally, though my experiences with Windows driver updates have always been fraught with danger.
My Zen 2 laptop has a built-in Renoir iGPU, and I use it alongside the NVidia dGPU that is also built in (GTX 1660 Ti). I leverage Linux Mint's packaging system there for the GPU switcher. I run the AMD GPU on the laptop panel and the NVidia on the external display. Apart from some weirdness with kernel 5.13, I've not had any problems with this setup.
But I'm glad your preferred workloads are unaffected. That counts for something I guess.
I've got some Epycs and Zen2s at home here, and I have both compilers. Haven't done testing in recent months, but they've been updating them, so maybe I should look into that again. Thanks for the nudge!
https://developer.amd.com/amd-aocl/ https://developer.apple.com/documentation/accelerate
They both contain a sparse matrix library, but exactly what operations are offered is somewhat different between the two. They both have image processing operations, but fairly different ones. Accelerate has BNNS, MKL has its own set of deep learning interfaces...
https://www.intel.com/content/www/us/en/developer/articles/t...
The other comes from SPQR, which is part of SuiteSparse:
https://people.engr.tamu.edu/davis/suitesparse.html
Part of the issue is that SPQR is dual licensed GPL/Commercial and the last time I checked a license was not cheap. Conversely, MKL has no redistribution fee, so it's been essentially the only option for this factorization if the code can't be bundled in a way compatible with the GPL.
[1] https://github.com/as-com/varint-simd/blob/f4c4af672403cc683...
[2] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instructi...
[3] https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitect...
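For anyone unfamiliar with the instructions in [2]: PEXT gathers the bits of a source selected by a mask into the low bits of the result, and PDEP does the reverse, scattering the low bits of a source into the positions set in the mask. Below is a slow but portable reference model, assuming 64-bit operands; the function names are illustrative, and a software loop like this is roughly what you fall back to when the hardware version is unavailable or microcoded.

```python
MASK64 = (1 << 64) - 1


def pext(src: int, mask: int) -> int:
    """Parallel bit extract: pack the bits of src selected by mask into the low bits."""
    src &= MASK64
    mask &= MASK64
    result, out_bit = 0, 0
    while mask:
        low = mask & -mask       # lowest set bit of the remaining mask
        if src & low:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1         # clear that mask bit
    return result


def pdep(src: int, mask: int) -> int:
    """Parallel bit deposit: scatter the low bits of src into the positions set in mask."""
    src &= MASK64
    mask &= MASK64
    result, in_bit = 0, 0
    while mask:
        low = mask & -mask
        if (src >> in_bit) & 1:
            result |= low
        in_bit += 1
        mask &= mask - 1
    return result
```

A useful identity for testing: pdep(pext(x, m), m) == x & m for any x and m.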
So what you would do for PDEP/PEXT would be
use_pdep = cpuid.supports_bmi2() && !cpuid.is_amd_before_zen3()
and not

use_pdep = cpuid.is_intel_haswell_or_later()

If Jim Keller hadn't gone back to help AMD, and if Dr. Lisa Su hadn't decided to take on that challenge, we'd likely be stuck in an era of processor Dark Ages, OR Apple and their Apple Silicon line of processors would be even more attractive than they already are.
Intel gained an unfair advantage and built their reputation by taking shortcuts with security. The FX series weren't marvels of design engineering, but they weren't nearly as behind the performance curve as customers were deceived into thinking.
AMD's processors, while not fast, did none of those things and were cheap. I guess that buys you a lot of good will when Intel's still charging 300 dollars for CPUs that wouldn't beat a 2500K at 4.6GHz until several years down the line.
And "have to replace the motherboard along with the CPU" is the exact thing we're talking about here: there was no technical reason for Intel to make the earlier boards incompatible, they did it just because they could. Not that there was ever really a reason to upgrade beyond "buy the cheapest K series, set multiplier to 46-48x, done", but even if you wanted to, you couldn't.
It was an anti-consumer practice, and said consumers never forget it (not that most tech channels don't provide active reminders of it). And those people are who everyone else asks when they're getting a new computer; they say "buy the competitor's product", and the rest is history.
AVX-512 had a rough rollout, but it seems like it is finally turning into something nice.
Intel is still acting in bad faith by allowing MKL to run in crippled mode on AMD. They should either let it use all available instructions or make it refuse to run.
It's Intel that tends to phone this in and make everyone else pick up the slack.
https://www.phoronix.com/scan.php?page=article&item=aocc32-c...
If they had fixed the issue shortly after it was publicly disclosed it might have been unintentional, but this issue has been notorious for over a decade and they still refuse to remove the unnecessary checks. They know what they're doing.
The CPUID instruction allows software to query the CPU about whether an instruction set is supported. Code emitted by Intel's compiler would only check whether the instruction set exists if the CPU is from Intel, instead of always checking.
AMD can choose to implement (or not) any instruction set that Intel specifies, and Intel can choose to implement (or not) any instruction set AMD specifies; however, it would in 100% of cases be wrong to check who made the CPU instead of checking the implemented instruction set. AMD implements MMX, SSE1-4, AVX1 and 2. Any software compatible with these must work on AMD CPUs that also implement these instructions.
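A sketch contrasting the two dispatch styles, using raw CPUID register values passed in as a dict (the dict keys and function names are hypothetical). The feature bits used are SSE2 (leaf 1 EDX bit 26), SSE4.2 (leaf 1 ECX bit 20) and AVX2 (leaf 7 EBX bit 5); the vendor ID is the 12 bytes CPUID leaf 0 returns in EBX, EDX, ECX order. This is simplified: a real AVX/AVX2 check must also confirm OS support for the YMM state (OSXSAVE plus XGETBV), which is omitted here.

```python
import struct

# (register, bit) pairs; these bits mean the same thing on any vendor's CPU.
SSE2 = ("leaf1_edx", 26)
SSE42 = ("leaf1_ecx", 20)
AVX2 = ("leaf7_ebx", 5)


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Decode the CPUID leaf 0 vendor ID (little-endian, EBX, EDX, ECX order)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def has(regs: dict, feature) -> bool:
    reg, bit = feature
    return bool((regs.get(reg, 0) >> bit) & 1)


def pick_path_by_features(regs: dict) -> str:
    """Vendor-neutral dispatch: choose the best path the CPU says it supports.
    (Simplified: skips the OS-support check a real AVX2 path needs.)"""
    if has(regs, AVX2):
        return "avx2"
    if has(regs, SSE42):
        return "sse4.2"
    if has(regs, SSE2):
        return "sse2"
    return "generic"


def pick_path_vendor_gated(vendor: str, regs: dict) -> str:
    """The pattern criticized above: feature bits are only consulted on Intel CPUs."""
    if vendor != "GenuineIntel":
        return "generic"
    return pick_path_by_features(regs)
```

With identical feature bits, the gated version sends an "AuthenticAMD" CPU down the generic path while the feature-based version picks AVX2 on both.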
If AMD ever chooses to sue Intel over this (likely as a Sherman Act violation, same as the 2005 case), a court would likely side with AMD due to the aforementioned previous case: Intel has an established history of violating the law to further its own business interests.
Off the top of my head, SSSE3 has a very flexible instruction (PSHUFB) to permute the 16 bytes of one xmm register at byte granularity, using each byte of another xmm register to control the permutation. On many chips this is extremely cheap (e.g. 1 cycle), and its flexibility suggests certain algorithms that completely tank performance on other machines, e.g. old mobile x86 chips where it runs in microcode and takes dozens or maybe even hundreds of cycles to retire. There the best solution is to use a sequence of instructions instead of that single permute instruction, often only two or three depending on what you’re up to. And you could certainly just use that replacement sequence everywhere, but if you want the best performance _everywhere_, you need to not only look for that SSSE3 bit but also somehow decide if that permute is fast so you can use it when it is.
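The instruction described above is PSHUFB. Here is a small reference model of its 128-bit form, plus a dispatch helper showing the point being made: the SSSE3 bit (CPUID leaf 1 ECX bit 9) says whether PSHUFB exists, but whether it is fast has to come from extra, model-specific knowledge, represented here by a hypothetical `pshufb_is_slow` flag that you would fill in from your own tuning data.

```python
SSSE3_BIT = 9  # CPUID leaf 1, ECX bit 9


def pshufb(src: bytes, control: bytes) -> bytes:
    """Model of 128-bit PSHUFB: output byte i is 0 if control[i] has its high bit set,
    otherwise src[control[i] & 0x0F]."""
    if len(src) != 16 or len(control) != 16:
        raise ValueError("PSHUFB operates on 16-byte vectors")
    return bytes(0 if c & 0x80 else src[c & 0x0F] for c in control)


def use_pshufb(leaf1_ecx: int, pshufb_is_slow: bool) -> bool:
    """Use the single-instruction permute only if it exists AND is known to be fast;
    otherwise fall back to a short replacement sequence (not shown)."""
    return bool((leaf1_ecx >> SSSE3_BIT) & 1) and not pshufb_is_slow
```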
Much more seriously, Intel and AMD’s instructions sometimes behave differently, within specification. The approximate reciprocal and reciprocal square root instructions are specified loosely enough that they can deliver significantly different results, to the point where an algorithm tuned on Intel to function perfectly might have some intermediate value from one of these approximate instructions end up with a slightly different value on AMD, and before you know it you end up with a number slightly less than zero where you expect zero, a NaN, square root of a negative number, etc. And this sort of slight variation can easily lead to a user-visible bug, a crash, or even an exploitable bug, like a buffer under/overflow. Even exhaustively tested code can fail if it runs on a chip that’s not what you exhaustively tested on. Again, you might just decide to not use these loosely-specified instructions (which I entirely support) but if you’re shooting for the absolute maximum performance, you’ll find yourself tuning the constants of your algorithms up or down a few ulps depending on the particular CPU manufacturer or model.
I’ve even discovered problems when using the high-level C intrinsics that correspond to these instructions across CPUs from the same manufacturer (Intel). AVX512 provided new versions of these approximations with increased precision, the instruction variants with a “14” in their mnemonic. If using intrinsics, instruction selection is up to your compiler, and you might find compiling a piece of code targeting AVX2 picks the old low precision version, while the compiler helpfully picks the new increased-precision instructions when targeting AVX-512. This leads to the same sorts of problems described in the previous paragraph.
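A toy numerical model of the failure mode in the last two paragraphs. It does not emulate any real chip: it treats an approximate reciprocal as the exact value times (1 + e), with e bounded by the documented relative errors (Intel documents |e| ≤ 1.5 × 2^-12 for RCPPS/RSQRTPS and < 2^-14 for the AVX-512 "14" variants). It shows that two hypothetical chips whose errors have opposite signs put a quantity that should be exactly zero on opposite sides of zero, that a clamp avoids taking the square root of a negative number, and that one Newton-Raphson step squares the relative error.

```python
import math

RCP_MAX_REL_ERR = 1.5 * 2**-12   # RCPPS/RSQRTPS bound per Intel's documentation
RCP14_MAX_REL_ERR = 2**-14       # VRCP14PS/VRSQRT14PS bound (AVX-512)


def approx_rcp(x: float, rel_err: float) -> float:
    """Toy model of an approximate reciprocal: exact 1/x scaled by (1 + rel_err).
    Real hardware uses lookup tables; only the error bound is modeled here."""
    return (1.0 / x) * (1.0 + rel_err)


def residual(x: float, r: float) -> float:
    """1 - x*r: exactly zero for a perfect reciprocal."""
    return 1.0 - x * r


def sqrt_residual_clamped(x: float, r: float) -> float:
    """Clamp a tiny negative residual to zero before the square root."""
    return math.sqrt(max(residual(x, r), 0.0))


def newton_rcp(x: float, r: float) -> float:
    """One Newton-Raphson step for 1/x: r' = r * (2 - x*r).
    If r = (1/x)(1 + e), then r' = (1/x)(1 - e^2), so the error is squared."""
    return r * (2.0 - x * r)
```

In Python, math.sqrt of the negative residual raises ValueError; in C the equivalent sqrtf would quietly return NaN, which is how "exhaustively tested" code ends up misbehaving on a chip it was not tested on.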
I really wish you could just read cpuid, and for the most part you’re right that it’s the best practice, but for absolutely maximum performance from this sort of code, sometimes you need more information, both for speed and safety. I know this was long-winded, and again, I entirely understand your argument and almost totally agree, but it’s not 100%, more like 100-epsilon%, where that epsilon itself is sadly manufacturer-dependent.
(I have never worked for Intel or AMD. I have been both delighted and disappointed by chips from both of them.)
> Please don't comment on whether someone read an article.
I understand some end users may not be able to re-compile the application for their machines, but I wouldn't say it's Intel's fault, but rather the distributors of that particular application. For example, if AMD users want Solidworks to run faster on their system, they should ask Dassault Systemes for AMD-optimized binaries, not the upstream compiler developers!
Anyways, for those compiling their own code, why would anyone expect an Intel compiler to produce equally optimized code for an AMD cpu? Just use gcc/clang or whatever AMD recommends.