All this as the OP glorifies AMD's engineering and grit-based culture to drive through all those tough missteps and missed opportunities.
To expand on that, I really do feel AMD has a great engineering culture, but they keep falling into the same traps. They do not invest strongly enough in software support or vendor relationships. Neither of these necessitates the more evil monopolistic practices of vendor lock-in and proprietary, non-free (as in libre) software. If they can navigate that without turning evil, they'd be a company for the ages.
And I can't close without mad respect to Dr. Lisa Su for her admirable leadership, itself bookworthy. Also, quick fact, she and Jensen are cousins!
Lest we forget, the Intel IPC advantage over comparable AMD CPUs was due to some shortcuts that exposed major vulnerabilities in Intel CPUs made from ~2011 to 2019. I’d be curious to see how a Spectre and Meltdown-patched Intel CPU fares against its AMD competitor NOW. Some of the performance hits were brutal: 20%+ in some workloads.
Nvidia was pushing AMD out of the GPU market back when GPUs were effectively only used for gaming and while GameWorks was predatory, you can’t really blame them for having the cooler-running, quieter, more energy-efficient GPUs going back to the Maxwell line (GTX 9x0). CUDA didn’t screw AMD until recently… but in 2014, people were picking Nvidia because the GPUs were considerably “better”. AMD had the best bang for buck back then, but you’d have more power consumption and heat output, and the drivers tended to be buggy. The bugs would be fixed, but it really sucked for people trying to play games on release day.
For like 8 years their drivers on Linux were a nightmare and AMD could have come in and done better.
AMD eventually did while Nvidia's drivers remained a nightmare almost until these days. But sure, AMD could have done it sooner.
Nowadays almost nobody cares about OpenCL.
https://news.ycombinator.com/item?id=40790924
That was OK for the CPU turnaround, but on the GPU front it completely shut them out of the first rounds of the AI party and maybe a trillion in market cap.
I'm really surprised AMD isn't throwing a whole bunch of money on emulating CUDA. If they could "just" make CUDA work on AMD cards, it feels like Nvidia's position would be severely weakened.
Kind of like how Valve invested heavily into Proton and now gaming on Linux is pretty much fine.
What were their shady practices and lock-in policies?
Personally, I have no issue with "ends justify the means"-style thinking as a blanket rule, often it's perfectly appropriate.
I would argue it is, in this case, where Nvidia was playing a game by the rules. If there is an issue with how they played, then government should change the rules.
The people in power in the US don't want that though.
This takeaway was a little odd to me in the context of 2008. I had been an AMD stalwart in my PCs since about 2000 (Athlon Thunderbird), but IIRC in 2008 Intel had the better processor. Better single core performance, better performance/watt, and I think AMD processors tended to have stability issues around this time. I remember I built a PC in 2009 with a Core processor for these reasons.
Obviously this is a niche market (gaming PC) perspective. But I don't think it was so clear cut.
Personally, I’ve always liked Intel for stability reasons. Running Intel chipsets and CPUs, I’ve just had fewer issues. I’m an enthusiast, so I do spend more than I should on both Intel and AMD rather frequently… but now, I’m hungry for an Ampere system. My wallet is crying.
That happened well after 2008, with the advent of Zen and chiplet-based tech and better perf/W.
Given Nvidia's track record I'd sooner imagine them just slacking off and overcharging more for lack of competition. I wish AMD would actually compete with them on GPUs (for graphics, not AI). Interestingly Intel seems to be trying to work up to that now.
Being able to take a trailing-node strategy during the Turing/Ampere years, being able to run a full node behind RDNA1/2 and use dirt-cheap Samsung crap and last-gen TSMC 16FF/14FFN while still fighting AMD to a standstill on efficiency is entirely the result of AMD slacking off.
AMD themselves have said they slacked off. Lost focus, is the quote.
Practicality beats purity 100% of the time. This echoes "Worse is better".
Not understanding the importance of GPUs in 2006, or of being first-to-market, while confusing OpenGL with OpenCL (twice), survival bias (BELIEVE IN YOUR VISION)…
> AMD (NASDAQ:AMD) today announced revenue for the first quarter of 2024 of $5.5 billion, gross margin of 47%, operating income of $36 million, net income of $123 million and diluted earnings per share of $0.07.
https://ir.amd.com/news-events/press-releases/detail/1192/am...
And Nvidia had revenue of $22.1 billion in Q4 2024, gross margin of 76%, operating income of $13.6B, net income of $12.2B, and diluted earnings per share of $4.93. https://investor.nvidia.com/news/press-release-details/2024/...
It's interesting that they see such a monopoly as something that would bring costs down. It seems more to me like competing with AMD does much more to keep Nvidia's costs down (if they can be described as "down") than combining resources would.
Is that a fair statement to make, given ~20 years have passed?
What does this mean? I thought neither have any “fab” (manufacturing) facilities.
Imagine the wealth destruction if they had merged way back then! I don't love the way mergers are regulated today but I do feel like preventing companies from growing too big through mergers is desirable.
I remember reading that on places like the Register, but they kept the second A, so DAAMIT.
I'm sure it means engineering, but I've never seen that abbreviation. He mentioned he's from India; is that where this comes from, or is it just an individual quirk?
I wonder how many companies had this problem.
I should've BET 50% of my portfolio.
So, long story short is that most engineers, especially ones as fanboyish as this, are wildly out of place in decision making and can't see the forest for the trees?
It doesn't seem that surprising.
My experience is rather that people who are passionate about engineering simply have a very different "taste" in hardware and buying decisions than other groups. So they see the forest insanely well, but they see very different paths through this forest than other people (say analysts or the general population) do.
Vaguely interesting side note, yandex found that from poor search terms very easily and google abjectly failed to. I hope google are tracking how frequently people use their engine to find yandex, while remembering bing being mostly used to find google and maybe the death of yahoo.
is there anybody here who has access to a B200 NVL72 with working external nvlink switches and wants to share non-marketing impressions?
it is wild the way AMD engineers can't stop themselves from throwing stones, even with 20 years of distance and even when their entire product strategy in 2024 now rides on gluing together these cores.
people forget that Intel saying that AMD was gluing together a bunch of cores comes after years of AMD fans whining that Intel was gluing together a bunch of cores - that was always an insult to Intel users that pentium D wasn't a real chip, that core2quad wasn't a real chip (not like quadfather, that's a real quad-core platform!). And you see that play out here, this guy is still salty that Intel was the first to glue together some chips in 2002 or whatever!
and the first time AMD did it, they rightfully took some heat for doing it... especially since Naples was a dreadful product. Rome was a completely different league, Naples really was glued-together garbage in comparison to Rome or to a monolithic chip. You can argue that (like DLSS 1.0) maybe there was a vision or approach there that people were missing, but people were correct that Naples was a dogshit product that suffered from its glued-together nature. Even consumer ryzen was a real mixed bag, vendors basically took one look at naples and decided to give AMD 2 more years to cook. People were still so wound up in it they sent death threats to GamersNexus for the “i7 in production, i5 in gaming” which frankly was already quite generous given the performance.
frankly I find it very instructive to go back and read through some of the article titles and excerpts on semiaccurate because it just is unthinkable how blindly tribal things were even 10 years ago, but this shit is how people thought 10 years ago. Pentium D is bad, because it's glued-together! Core2Quad is bad because it's glued-together! And that from the actual engineers who have the perspective and the understanding to know what they're looking at and the merits, with 20 years of retrospect and distance! If you instead look at what the discourse of this time was like...
https://www.semiaccurate.com/tag/nvidia/page/6/
"NVIDIA plays games with GM204"
"how much will a GM204 card cost you!?"
"Why mantle API will outlive DX12 [as a private playground for API development outside the need for standardization with MS or Khronos]"
"GP100 shows that NVIDIA is over four years behind AMD in advanced packaging"
"NVIDIA profits are up in a fragile way".
like why are amd people like this? inside the company and out. It’s childish. None of the other brands' engineers are out clowning on twitter (frank azor? chris hook? etc), none of the other fans are sending death threats when their brand’s product isn’t good. Like you wanna make a $10 bet over it???
I get arguments that maybe one fab is better than the other, but what about all of them combined? All of our modern chipmaking capability all at once.
Nvidia has no factories. You can ship their output on a USB flash drive. Valuation: ~3.1T.
Intel, TSMC and Samsung have all the factories. Every modern chip made on earth in this circle. Combined valuation: ~1.1T
This is simple napkin math for this arbitrary retail investor. I don't know when the music will stop but it absolutely will.
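For what it's worth, that napkin math can be sketched in a few lines. This only uses the rough valuations quoted above (in trillions of USD, the commenter's approximations rather than live market data), and the variable names are made up for illustration:

```python
# Rough valuations quoted in the comment above, in trillions of USD.
# These are the commenter's approximate figures, not live market data.
nvidia_valuation_t = 3.1
foundries_combined_t = 1.1  # Intel + TSMC + Samsung combined

# How many times larger Nvidia's valuation is, and the absolute gap.
ratio = nvidia_valuation_t / foundries_combined_t
gap_t = nvidia_valuation_t - foundries_combined_t

print(f"Nvidia is valued at about {ratio:.1f}x the three fab owners combined")
print(f"Gap: about ${gap_t:.1f}T")
```

The ratio says nothing about whether either side is priced correctly; it just makes the size of the mismatch explicit.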
It's impossible to overstate the advantages that CUDA, its documentation, toolchain, and nSight software provide to outside developers.
The closest thing I've seen to nSight Systems software is Intel's VTune. But that's just one piece in a much larger puzzle, and last I checked, VTune was only for Intel CPUs.
AFAICT, Nvidia's software seriously reduces the ramp-up time for new developers to write kernels or apps that make good use of the available hardware.
E.g., nsys-ui (like VTune) recognizes anomalous profile results, and makes solid suggestions for next steps. I don't know of other software that does this (well), although maybe I'm just uninformed.
DGX is a complete data center from Nvidia where Nvidia is the supplier of everything themselves:
- CPU+GPU from Nvidia
- Rack from Nvidia
- Interconnects + networking from Nvidia
- SW from OS to application framework from Nvidia
The only thing Nvidia really needs partners for with DGX is memory (RAM + SSD).
One reason Nvidia's margins are so high is that they provide the whole data center, while competitors have to split margins (AMD/Intel + SMCI/DELL + Broadcom/Arista + Cray/HPE).
Don't believe me? Do you believe nvda?
There are many companies working on alternatives at the moment, but it will be a while until Nvidia can be replaced.
thats gonna take a while
https://store.steampowered.com/hwsurvey/Steam-Hardware-Softw...
For example, if you try to load Twitter in Firefox incognito, it will say "Error" and then lower say "Firefox incognito is known to cause issues with X.com" as if to move the blame away from themselves, when in actuality they are specifically detecting and blocking browsers in order to deter privacy.
AMD's strategy for people wanting to learn, is basically no strategy.
It's always been the software holding them back, still is, need to invest in the ecosystem and not just the things easy to justify as a revenue driver.
This was much more obvious in servers or workstations than in consumer devices, because the kinds of applications run by non-professionals at that time were much more sensitive to the high burst speeds offered by Pentium 4 with very high clock frequencies, than to the higher sustained performance of the AMD CPUs.
In 2005, I had both a 3.2 GHz Pentium 4 (Northwood, 130 nm) and a 3.0 GHz Pentium D (Prescott, 90 nm). With any of them, the compilation from sources of a complete Linux distribution took almost 3 days of continuous work of 24 hours per day.
After I bought an Athlon X2 of only 2.2 GHz, the time for performing the same task has been reduced to much less than a day. Even for some single-threaded tasks, but which contained many operations that were inefficient on Pentium 4, like integer multiplications or certain kinds of floating-point operations, the 2.2 GHz AMD CPU was several times faster than the 3.2 GHz Pentium 4.
At work, the domination of the AMD CPUs was even greater. Each server with Opteron CPUs that we bought was faster than several big racks with Sun or Fujitsu servers that were used before. Intel did not have anything remotely competitive. At the beginning of 2006, on my laptop with an AMD Turion I could run professional EDA/CAD programs much faster than on the big Sun servers normally used for such tasks. Intel had nothing like that (i.e. the 32-bit Intel CPUs could not use enough memory to even run such programs, so the question whether they could have run such programs fast enough was irrelevant).
Of course, half a year later the competition between Intel and AMD looked completely different.
I think I read this from Warren Buffett, but basically as of the early 2000s the airlines, in their entire history, had only managed to break even. If you had bought an airline company in 1940 and held it until 2000, you would have never profited from it. The business itself would be worth significantly more, but your only exit strategy would be to sell.
I haven't looked at any of these companies' balance sheets, but it might be that semiconductor fabbing is less profitable and has less room for growth. In the short term that's all that matters. The question is if Nvidia can hold on to its current growth and margins (I don't think it can).
I think we've seen over many years at this point that there is huge value in the final product.
"...But we must now talk about the elephant in the room, and that is AMD’s software stack. While it is absolutely night and day from where it was when we tested MI210, ROCm is nowhere near where it needs to be to truly compete with CUDA..."
AMD did not really turn their attention to AI until about Oct of last year. Now that they have, it will take a bit of time to correct the course of the ship, but I know for certain that it is all hands on deck at this point. One sign of this is that we're seeing more frequent and substantial "night and day" improvements to ROCm.
The lifecycle of hardware, is years. MI300x is a substantial leap. MI325x is another one. The rest of the hardware roadmap (years out), is extremely impressive. Software is a much shorter lifecycle and can be iterated on more easily. Expect to continue to see improvements here over the coming years.
"...no, but you could expand on OpenCL or Vulkan compute if you wanted. There are other spec stakeholders, we can't give you carte-blanche control, Apple."
"Why do you insist upon mismanaging the industry's APIs? Screw you guys!" <Beginning of mid 2010s "Khronos Drought" at Apple Computers>
That was the previous quarter. $14.881B net profit on the last one
https://nvidianews.nvidia.com/news/nvidia-announces-financia...
$26.044B revenue
78.4% gross margin
$16.909B operating income
$14.881B net income
$5.98 diluted earnings per share
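To put those numbers in proportion, here is a minimal sketch that derives operating and net margin from the figures listed above. It assumes all three dollar amounts are in billions of USD; the variable names are only illustrative:

```python
# Figures as quoted in the comment above, assumed to be in billions of USD.
revenue_b = 26.044
operating_income_b = 16.909
net_income_b = 14.881

# Margin = income as a fraction of revenue.
operating_margin = operating_income_b / revenue_b
net_margin = net_income_b / revenue_b

print(f"Operating margin: {operating_margin:.1%}")
print(f"Net margin: {net_margin:.1%}")
```

In other words, on these figures well over half of every revenue dollar ends up as net income.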
It’s a bet because it’s risky: “capital is at risk”, “value of investments can go down as well as up”, etc. As opposed to a savings account, which is so much less risky that it’s not really a bet.
So for example, Coca-Cola (KO) is pretty predictable. Absent any major blunders by management, KO is going to grow roughly the size of the economy, and it's going to put out 3% a year in dividends. So the fair market price of KO is reasonably determinable, and you wait until you can buy it at or less than its fair price.
This is usually contrasted against technical traders, momentum traders, etc., who are not investing in the fundamentals of the business and assuming the price will follow good fundamentals, but rather they are betting on how the price will change.
So "investing" is seen as buying fundamentals and "betting" (or "gambling") is seen as buying on expected price changes.
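One common way to make "reasonably determinable" concrete for a steady dividend payer is the constant-growth dividend discount (Gordon growth) model. That model is my choice for illustration, not something the comment above names, and every input below is a hypothetical placeholder rather than actual KO data:

```python
def gordon_growth_price(next_dividend: float, required_return: float, growth: float) -> float:
    """Fair price of a stock whose dividend grows at a constant rate forever."""
    if required_return <= growth:
        raise ValueError("required_return must exceed growth for the model to converge")
    return next_dividend / (required_return - growth)


# Hypothetical inputs, NOT actual KO data.
next_dividend = 3.00    # expected dividend per share next year, USD
growth = 0.05           # assumed long-run nominal growth, "roughly the economy"
required_return = 0.08  # assumed return an investor demands

fair_price = gordon_growth_price(next_dividend, required_return, growth)
implied_yield = next_dividend / fair_price

print(f"Fair price: ${fair_price:.2f}")
print(f"Dividend yield at that price: {implied_yield:.1%}")
```

With these made-up inputs the fair price lands at roughly $100, which is a 3% yield. The "wait until you can buy at or below fair price" rule then amounts to buying only when the market price is at or under that number.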
Capital is always at risk in financial investments.
If there is a semantic difference, I'd say you "invest" when you have a historical expectation of future positive returns, and you "bet" when you're taking a contrarian approach or just going with a gut feeling when data isn't available or known.
Anecdotally, and personally, I've had better luck with "bets" than "investments". But they're fundamentally the same thing.
Presumably, elzbardico’s use of “bet” meant something akin to betting in a casino or lottery, where the goal is to get high from the rush of sudden, big, improbable wins.
and yet that trillion-dollar valuation built over the last decade is built with customers almost entirely running on those "nightmare" linux drivers, while AMD's linux drivers crash running the sample app on supported hardware+OS, and nobody at AMD cared until finally a tech-bro with a loud enough platform shamed them into fixing it...
... and this is something like AMD's third crack at the apple, and the first three sets of drivers (one of which is literally a Vulkan-branded spec) are just as non-functional today as rocm was a year ago.
(OpenCL, Fusion HSA/AMD APP, Vulkan Compute/SPIR-V... all still broken so badly that Octane called them out for being unable to successfully compile their renderer and for lack of vendor support, so badly that Blender pulled support after years of turbulent and poorly-performing attempts to work with AMD, etc)
Nevertheless, the integration of multiple cores into an Intel multiprocessor was very inefficient before Nehalem (i.e. the cores were competing for a shared bus, which prevented them from ever reaching their maximum aggregate throughput, unlike in the AMD multiprocessors, which had inherited the DEC Alpha structure, with separate memory links and peripheral interfaces and with an interconnection network between cores, like all CPUs use now).
However this was noticeable at that time mostly in the server CPUs and much less in the consumer CPUs, as there were few multithreaded applications.
Core 2 still lagged behind AMD's cores for various less mainstream applications, like computations with big integers.
Only 2 generations later, after Core 2 and Penryn, with Nehalem (the first SKU at the end of 2008, but the important SKUs in 2009) Intel has become able to either match or exceed AMD's cores in all applications.
On the other hand, AMD's 90 nm CMOS process has been excellent.
With its 65 nm process, Intel has recovered its technological leadership, but that was not the most important factor of success, because AMD's 65 nm process was also OK and it became available within a few months of Intel's process.
AMD has lost because they did not execute well the design process for their "Barcelona" new generation of CPUs (made also in 65 nm, like Core 2). While Intel has succeeded to deliver Core 2 even earlier than their normal cadence for new CPU generations, AMD has launched Barcelona only after several months of delays and even then it was buggy. The bugs required microcode workarounds that made Barcelona slow in comparison with Core 2, and that started the decline of AMD, after a few years of huge superiority over Intel.
AMD was struggling to release CPUs that were competitive against year old Intel Core 2 Duos which remained the status quo through their Bulldozer architecture. Things started turning around with Ryzen when a combination of architecture improvements and typical workloads taking more advantage of multicore flipped the script.
The bits about "true" multicore are also sketchy considering Bulldozer was using shared L2, fetch/decode, and floating point hardware on each module and calling a module two "cores" for marketing purposes.
https://www.anandtech.com/show/4955/the-bulldozer-review-amd...
So perhaps a bit more than a couple of years, but my impression is also that they fell behind on (single-thread) performance for a long time after that.
I've also understood that in more ancient history AMD CPUs sometimes beat contemporary Intel parts in performance, although releasing their parts later than Intel. I'm not sure that's relevant to any remotely recent developments anymore though.
The experience was incredibly simple: write C like usual but annotate a few C functions with some extra keywords and compile using a custom frontend/preprocessor/whatever-nvcc-was instead of gcc (i was on Linux - and BTW i heavily contest the notion that Nvidia drivers on Linux were "nightmare", they always worked just fine with both performance and features comparable to their Windows counterparts while ATi/AMD had buggy and broken drivers for years). Again, the experience was very simple, i even just copy/pasted a bunch of existing C code i had and it worked.
Later i tried to use OpenCL which was supposedly the open alternative. That one felt way more primitive and low level, like writing shaders without the shading bits.
In a way, as you wrote, it was kinda like DirectX: that is, CUDA was like using OpenGL 1.1 with its convenient and straightforward C API and OpenCL was like using DirectX 3 with its COM infested execute buffer nonsense.
After that i never really used CUDA (or OpenCL for that matter) but it gave me the impression that Nvidia did put way more effort on developer experience.
Like, I started using CUDA (through frameworks) over ten years ago, and basically nobody has come up with anything competitive since then.
This is a significant understatement. For quite some time Jensen has been saying repeatedly that 30% of their R&D spend is on software. With the money-printing machine that is Nvidia if that holds they're going to continue to rocket ahead of competitors in terms of delivering actual solutions.
The "What are you talking about? AMD/Intel runs torch just fine!" crowd clearly haven't seen things like Riva, DeepStream, NeMo, Triton Inference Server/NIM, etc. Meanwhile AMD (ROCm) still struggles with flash attention...
What these hardware-first (only?) companies like AMD don't seem to understand is that people buy solutions, not GPUs. It just so happens that GPUs are the best way to run these kinds of workloads, but if you don't have a holistic and exhaustive overall ecosystem you end up in single digit market share vs Nvidia at ~90%.
Since CUDA 3.0, NVidia has embraced a polyglot stack, with C, C++ and Fortran at the center, and PTX for anyone else.
Followed by changing the CUDA memory model to match that of C++11.
Khronos never cared for Fortran, and only designed SPIR, when it became obvious they were too late to the party.
So not only has CUDA first level tooling for C, C++, Fortran, with IDE integration in Visual Studio and Eclipse, graphical GPU debugger with all the goodies of a modern debugger, it also welcomes any compiler toolchain that wants to target PTX.
Java, Haskell, .NET, Julia, Python JITs, .... there are plenty to choose from, without going through the "compile to OpenCL C99" alternative.
Finally, the myriad of libraries to choose from.
CUDA is not only for AI, by the way.
And because of that, their OpenCL implementation also works better than others. So there's more tooling not just from nvidia using it, because it. just. works.
Compare this with AMD, whose latest framework is a total mess of "will it work on this GPU?", sometimes needing custom wrangling to enable, etc. etc. and it's effectively supported only on the most expensive compute-only cards.
"What are you talking about? AMD/Intel runs torch just fine!" refers indirectly to the value of having competition in markets, not jumping on the (well-funded, slick) monopoly bandwagon.
Hell, nvidia drivers might be often complained about, but for years I would take nvidia because the crappiness was manageable and close to nothing if you were in the target market (desktop workstations running X11 on only nvidia GPUs? The only issue was if you were running super latest kernel).
Now that tools like Blender and the like are increasingly picking up Vulkan support, there is no reason for the above to use Nvidia anymore.
It was when you went outside said use case that things started getting worse, and you had to wait a long time for fixes. Sometimes it was because the changes in XFree/X.Org were effectively fixated on how some other vendors did things (cough intel cough), or involved things that effectively nobody wanted to spend engineering to fix properly (like rebuilding the rendering path to be able to handle hybrid graphics properly, when hybrid graphics came into the world years after a critical set of X.Org devs decided to stop any real development on X.Org...).
Vulkan Compute also is nowhere close to feature parity with CUDA, so not sure it would be picked up instead.
Also, it's not like Wayland offered a concrete target to support; the different approaches to actually providing a device context to applications were a "now draw the rest of the owl" thing from the beginning.