>A lot of things have changed in the last quarter-century – in 1997 NVIDIA had yet to even coin the term “GPU”
[1] https://www.anandtech.com/show/21542/end-of-the-road-an-anan...
I have been praying for programmable blending via "blending shaders" (and FWIW programmable texture decoding via "texture shaders" - useful for custom texture formats/compression, texture synthesis, etc.) since I first learned about pixel shaders way back in the early 2000s.
Somehow GPUs got raytracing before programmable blending, when the former felt like some summer night dream and the latter was just about replacing yet another fixed-function block with a programmable one :-P
(still waiting for texture shaders though)
https://medium.com/pocket-gems/programmable-blending-on-ios-...
This is a great example of using it: https://vulkan.org/user/pages/09.events/vulkanised-2024/vulk...
OTOY does all their rendering with compute nowadays.
It's vaguely like comparing a full CPU emulator with something that implements the ADD and MUL instructions.
Just to clarify, Dolphin's specialized shaders simulate fixed-function blending/texturing too. What's different about ubershaders is that a single shader can handle a wide variety of fixed-function states whereas a specialized shader can only handle a single state.
Thus, whereas specialized shaders have to be generated and compiled on the fly, resulting in stutter, ubershaders can all be pre-compiled before running the game. Add to this the capability to asynchronously compile specialized shaders to replace ubershaders, and the performance loss of ubershaders becomes negligible. A rare case of having your cake and eating it too.
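To make the distinction concrete, here is a minimal CPU-side sketch in Python of the scheme described above. It is not Dolphin's code: `BlendState`, `uber_blend`, `compile_specialized`, and `Pipeline` are hypothetical names, the "state" is reduced to a single blend mode, and the background compile is simulated by an explicit `finish_pending_compiles()` call rather than a real thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BlendState:
    """Hypothetical fixed-function state, reduced to one blend mode."""
    mode: str  # "replace", "add", or "multiply"


def uber_blend(state: BlendState, src: float, dst: float) -> float:
    """Ubershader stand-in: one routine, built ahead of time, that
    branches on the fixed-function state every time it runs."""
    if state.mode == "replace":
        return src
    if state.mode == "add":
        return min(src + dst, 1.0)
    if state.mode == "multiply":
        return src * dst
    raise ValueError(f"unknown blend mode: {state.mode}")


def compile_specialized(state: BlendState) -> Callable[[float, float], float]:
    """Stand-in for the slow on-the-fly compile: returns a routine with
    the state baked in, so no per-call branching is needed."""
    variants: Dict[str, Callable[[float, float], float]] = {
        "replace": lambda src, dst: src,
        "add": lambda src, dst: min(src + dst, 1.0),
        "multiply": lambda src, dst: src * dst,
    }
    return variants[state.mode]


class Pipeline:
    """Uses the ubershader until a specialized variant is ready."""

    def __init__(self) -> None:
        self.cache: Dict[BlendState, Callable[[float, float], float]] = {}
        self.pending: List[BlendState] = []

    def blend(self, state: BlendState, src: float, dst: float) -> float:
        fast = self.cache.get(state)
        if fast is not None:
            return fast(src, dst)
        # Not compiled yet: queue the compile and fall back to the
        # ubershader instead of stalling the frame.
        if state not in self.pending:
            self.pending.append(state)
        return uber_blend(state, src, dst)

    def finish_pending_compiles(self) -> None:
        """Simulates the asynchronous compiler finishing its queue."""
        for state in self.pending:
            self.cache[state] = compile_specialized(state)
        self.pending.clear()
```

The first draw with a new state goes through `uber_blend` with no compile stall; once `finish_pending_compiles()` has run, the same state is served by the specialized routine, and both produce the same result.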
Edit: source https://www.computer.org/publications/tech-news/chasing-pixe...
Found these here: https://books.google.de/books?id=Jzo-qeUtauoC&pg=PT7&dq=%22g... (Computerworld magazine, 1976). VGI called it a graphics processing unit (GPU):
> 3400 is a direct-writing system capable of displaying 3-D graphics and alphanumerics with speeds up to 20.000....
It's not what I would call a GPU, but I think it's hard to draw lines when it comes to naming things and defining things.
If anyone else wants to try to find the real GPU
https://www.google.com/search?q=%22gpu%22+graphics+processin...
> a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second
It’s kind of arbitrary, even when you take out the processing rate. But prior to that there was still a significant amount of work expected to be done on the CPU before feeding the GPU.
That said, the term GPU did definitely exist before NVIDIA, though not meaning the same thing we use it for today.
"The TMS34010, developed by Texas Instruments and released in 1986, was the first programmable graphics processor integrated circuit. While specialized graphics hardware existed earlier, such as blitters, the TMS34010 chip is a microprocessor which includes graphics-oriented instructions, making it a combination of a CPU and what would later be called a GPU."
https://en.m.wikipedia.org/wiki/TMS34010
And they weren't alone in the history of graphics hardware.
The "old way" was to engineer a bit of silicon for each one of those things, custom-like. The problem was how much silicon to give to each feature: it almost has to be fine-tuned to each individual game.
So NVIDIA comes up with the idea to have a pool of generic compute units, each of which can do T&L, or shading, etc. Now the problem of fine-tuning to a game is solved. But now you also have a mini compute array that can do math fast, a general-purpose unit of processing (GPU-OP), which was a nod from NVIDIA to the gaming community (OP - overpowered)
https://archive.org/details/byte-magazine-1985-02/1985_02_BY...
> Two years later, Nvidia introduced the GPU. He [Curtis Priem] recalls that
> Dan Vivoli, Nvidia's marketing person, came up with the term GPU, for
> graphics processing. "I thought that was very arrogant of him because how
> dare this little company take on Intel, which had the CPU," he said.
https://archive.org/details/byte-magazine-1985-02/1985_02_BY...
NVIDIA didn’t invent the GPU. They coined the modern term “graphics processing unit”. Prior to that, various hardware existed but went by other expanded names or didn’t fully match NVIDIA’s arbitrary definition, which is what we use today.
The Emotion Engine (CPU) to GS (GPU) link was what made the PS2 so impressive for the time, but it also made it somewhat hard to code for and immensely hard to emulate. If I recall correctly, the N64 had something like 4x the memory bandwidth (shared) of the PS1, and the PS2 had roughly 6x (3GB/s) the system bandwidth of the N64. However, the PS2's GS RAM clocked in at 48GB/s, more than the external memory bandwidth of the Cell (~25GB/s), which meant that PS3 emulation of PS2 games was actually done with embedded PS2 hardware.
It was a bonkers machine. I don't think workstation GPU bandwidth crested 50GB/s for another 5-6 years. That said, it was an ultra-simple pipeline with 4MB of RAM and insane DMA requirements, which actually got crazier with the Cell in the PS3. I was at Sony (in another division) in that era. It was a wild time for hardware tinkering and low-level software.
That's kinda overselling it, honestly. When you're talking about the GIF, only the VU1's vertex pipeline was able to achieve this speed directly. PATH2/PATH3 used the commodity RDRAM's bus (unless you utilized MFIFO to mirror a small portion of that to the buffer, which was much more difficult and underutilized than otherwise since it was likely to stall the other pipelines); the exact same bus Pentium 4s would use a few months after the PS2's initial launch (3.2-6.4GB/s). It's more akin to a (very large) 4MB on-chip cache than proper RAM/VRAM.
As to the PS3 being half that, that's more a design decision of the PS3. They built the machine around a universal bus (XDR) versus using bespoke interconnects. If you look at the Xbox 360, they designed a chip hierarchy similar to the PS2 architecture, with their 10MB eDRAM (at 64GB/s) for GPU-specific operations.
As to those speeds being unique: that bandwidth was made possible via eDRAM (on-chip memory). Other bespoke designs utilized eDRAM, and the POWER4 (released around the same time) had a per-chip 1.5MB L2 cache running at over double that bandwidth (100GB/s). It also was able to communicate chip-to-chip (up to 4x4 SMP) at 40GB/s and communicate with its L3 at 44GB/s (both off-chip buses). So other hardware was definitely achieving similar and greater bandwidths, it just wasn't happening on home PCs.
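The gap between the two PS2 figures quoted above falls out of simple bus arithmetic: peak bandwidth is bus width times transfer rate. A rough sketch follows; the bus widths and clocks are assumptions, not numbers from this thread (a 2560-bit combined eDRAM interface at roughly 150 MHz is the commonly cited figure for the GS, and dual-channel 16-bit RDRAM at 800 MT/s for main memory), and `peak_bandwidth_gb_per_s` is a hypothetical helper.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes): bytes per transfer
    multiplied by transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed: GS eDRAM, 2560-bit combined read/write/texture bus at ~150 MHz.
gs_edram = peak_bandwidth_gb_per_s(2560, 150e6)

# Assumed: main memory, two 16-bit RDRAM channels at 800 MT/s.
main_rdram = peak_bandwidth_gb_per_s(2 * 16, 800e6)

print(f"GS eDRAM:   {gs_edram:.1f} GB/s")
print(f"Main RDRAM: {main_rdram:.1f} GB/s")
print(f"Ratio:      {gs_edram / main_rdram:.0f}x")
```

Under those assumptions the on-chip figure comes almost entirely from the very wide bus, not from a high clock, which is consistent with the point that this kind of bandwidth needed eDRAM rather than an external memory interface.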
Edit: if memory serves, SPE DMA list bandwidth was just north of 200GB/s. Good times.
As I recall, the partnership between Intel and Rambus was pilloried as an attempt to re-proprietarize the PC RAM interface in a similar vein to IBM’s Micro Channel bus.
Back then the appeal of consoles to me was that, beyond the convenience factor, they were very specialised hardware for one task - running games.
I remember playing FF12 (IZJS) on a laptop in 2012 and it ran very stably. Granted, that was 6 years post-release, but had the emulator issues been fully solved by then?
Re: wild time for low-level programming: I remember hearing that Crash Bandicoot had to drop down into MIPS assembly to eke out every extra bit of performance on the PS1.
I very much enjoyed this video that Ars Technica did with Andy Gavin on the development of Crash Bandicoot: https://www.youtube.com/watch?v=izxXGuVL21o
The most “famous” thing about Crash programming is probably that it’s all in Lisp, with inline assembly.
At my last job, we had ASICs that allowed for single-sample audio latency with basic mixing/accumulation functions for pulling channels of audio off of a bus. It would have been tragically expensive to reproduce that in software, and the required hardware to support a pure software version of that would have been ridiculous.
We ended up launching a new platform with a different underlying architecture that made very different choices.
But for more reading https://www.psdevwiki.com/ps2/Graphics_Synthesizer
The author has a bunch of other things in their post that they don’t expand upon either, and which are significantly more esoteric, so I think this is very much geared toward a particular audience.
A few link outs would have helped for sure.
Likewise, most game developers weren't fit to drive the PS2 without additional help from Sony.
I believe Gran Turismo 3 used the PS1 hardware for everything except rendering, which was the kind of nuts thing you could do if your games were single-platform and had a huge budget.
By the mid-90s, C compilers were good enough - and CPUs were amenable enough to C - that assembly was only really beneficial in the most extreme of cases. Sony designed the PlayStation from the jump to be a 3D machine programmable in C - they were even initially reluctant to allow any bit-bashing whatsoever, preferring that devs use SDKs to facilitate future backwards compatibility. No doubt some - even many - PS1 games dropped down into assembly to squeeze out more perf, but I doubt it’s anything like “all”.
As for Lisp, Crash used a Lisp-like language, “GOOL”, for scripting, but the bulk of the game would’ve been native code. It was with Jak & Daxter that they really started writing the game primarily in a Lisp (GOAL, GOOL’s successor).
It does use the Psy-Q SDK which contains assembly, but it's not something that the game developers have written.
Compilers were good enough by that point in time for the vast majority of games.
All games tap into assembly in some form or another, even nowadays.
(I never had a PS3; AFAIR my PSP just downclocked its CPU. I remember battery life playing classics was awesome...)
People on Slashdot got really worked up when Intel signed a deal with Rambus to make RDRAM the only supported technology on Intel x86—from which they relatively quickly backtracked.
AnandTech (sniff) of course has great contemporary background: