Nvidia Grace CPU (nvidia.com)
https://www.anandtech.com/show/7621/nvidia-reveals-first-det...
So many computing devices, such as Nvidia Jetson and Raspberry Pi, are simply not available anywhere. I wonder what's the point of bringing out new products when existing products can't be purchased? Won't the new products also simply not be available?
Apple bought out the entire capacity of TSMC's 3nm node [1]. I would not be surprised if the deal actually was for Apple to fund the construction of the fab in exchange for this level of priority.
[1] https://www.heise.de/news/Bericht-Apple-schnappt-sich-komple...
So GPUs are not high priority? Because they are out of stock pretty much everywhere too.
The shortages are at the low end: high-nm, mature nodes. This is on a leading-edge 4nm node.
All of a sudden there is a real choice of ARM CPUs on servers. (What will happen to Ampere?) The LPDDR5X used here will also be the first to come with ECC. And they can cross-sell those with Nvidia's ConnectX-7 SmartNICs.
Hopefully it will be price competitive.
Edit: Rather than downvoting, maybe explain why or what you disagree with?
Apple isn't going to give up the substantial performance benefits of on-package unified memory in order to support DIMMs. Therefore I predict that we'll see a two-tier memory architecture with the OS making automated decisions based on memory pressure, as well as new APIs to allocate memory with a preference for capacity or performance.
The chassis design is new enough that it was designed with an eventual Apple Silicon Mac Pro in mind, so I expect to see minimal change to the exterior. It might shrink and have fewer slots (particularly since most users won't need a slotted GPU), though I think that's unlikely given that its height and width were defined by 5U rack dimensions.
When Jensen talks about Transformers, I know what he’s talking about because I follow a lot of talented people.
https://www.kaggle.com/code/odins0n/jax-flax-tf-data-vision-...
Robots in disguise?
NVidia tooling is the best among all GPU vendors.
CUDA has been polyglot since version 3.0, you get proper IDE and GPGPU debugging tools, and a plethora of libraries for most use cases one could think of using a GPGPU for.
OpenCL did not fail only because of NVidia not caring, Intel and AMD have hardly done anything with it that could compete on the same tooling level.
This has been done as a commercial product with the Ampere ARM server chips. The base model is about $8k.
However, the price tag will be too high for a lot of desktop buyers.
(There are smaller Tegras around though)
are they going through TSMC like NVIDIA or are they using Samsung?
That is only the CPU though, they might deploy it as Grace + Hopper config.
An interesting angle here is these support partitioning even better than in the A100's. AFAICT, the cloud vendors are not yet providing partitioned access, so everyone just exhausts worldwide g4dn capacity for smaller jobs / devs / etc. But partitioning can solve that...
396MB of on-chip cache… (198MB per die)
That’s a significant part of it too.
Finally, a computer optimised for COBOL.
The contention on that memory means that only segregated, non-cooperative workloads (as in not "joint parallel on the same memory" with atomics) will scale better per watt on this hardware than on a vanilla 4-core Xeon from 2018.
So you might as well buy 20 Jetson Nanos and connect them over the network.
Let that sink in... NOTHING is improving at all... there is ZERO point to any hardware that CAN be released for eternity at this point.
Time to learn JavaSE and roll up those sleeves... electricity prices are never coming down (in real terms) no matter how high the interest rate.
As for GPUs, I'm calling it now: nothing will dethrone the 1030 in Gflops/W in general and below 30W in particular; DDR4 or DDR5, doesn't matter.
Memory is the latency bottleneck since DDR3.
Please respect the comment on downvote principle. Otherwise you don't really exist; in a quantum physical way anyway.
Game Over!
After 13 microarchitectures given the last names of historical figures, it's really weird to use someone's first name. Interesting that Anandtech and Wikipedia are both calling it Hopper. What on Earth are the marketing bros thinking?
So expect a future Einstein GPU to come with a matching Albert CPU.
https://www.intel.com/content/www/us/en/architecture-and-tec...
edit: the market pretty much went from gaming as the primary pillar to gaming + HPC, which makes it far more attractive since you'd expect it to be much less cyclical and less price sensitive. Raja Koduri was hired in late 2017 to work on GPU related stuff, and it seems like the first major products from that effort will be coming out this year. That said, they've obviously had a lot of failures in the accelerator and graphics area (consider Altera) and Koduri has stated on Twitter that Gelsinger is the first CEO to actually treat graphics/HPC as a priority.
[1] https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
Once enough patents expire all ISAs are eventually equal, I'd think.
This will probably cost them some market share, but they have plenty of cash to weather their current manufacturing issues, they still have world-class CPU design talent which they've proven over and over and over again, and they have some very interesting products & technologies on the roadmap.
ARM offering a fight for the first time ever is not going to be a 1-hit KO against the goliath that is Intel.
Arm has a much more efficient and also much less profitable business model, and Intel will never catch up unless they adopt it. They'll never do that so they'll fade away like IBM.
I know which SKUs I would be prioritising.
People also often seem to forget that interconnects are a significant part of modern power budgets - look at the EPYC IO die often using more than the cores in many workloads. It may be that the M1 family looks less attractive when you actually have to add these requirements.
Perhaps there's some possibility of having both a tightly-coupled RAM package and also have an extensible memory system - though that has significant management complexity if you try to treat it like a cache, or likely needs app support if it's some NUMA system where they're mapped separately. But possible, at "just" the cost of the extra memory controller.
I'm interested to see when the PC form factor goes away completely, likely 1-2 generations of product from now.
The questions everyone has: RAM and GPU.
Heck, we're talking about a company that put an A13 into a monitor. I wouldn't put it past Apple to put an M2 Ultra onto MPX modules and have that GPU/ANE compute performance automatically available through existing APIs. (Would be a great way to bin M2 Ultra chips with a failed CPU core.)
PS2 and DS outsell by about 50 million units.
https://www.vgchartz.com/analysis/platform_totals/
Holay molay.
Depending on the product, volume orders for high-end ICs are typically running between 52 and 72 weeks of lead time at the present, and it's been this way for many months now. So the orders that are getting filled today for parts were placed in early 2021 in most cases.
This is generally very difficult for retailers, because they have had to come up with capital to have a year's worth of orders in the pipeline. So they've been having to stock fewer things -- only what they are absolutely sure will sell -- and can't use real-time sales data to estimate the next month's order.
Welcome to the new normal, it'll be this way for at least another year or two, minimum (until new factories get built plus pre-pandemic levels of productivity, for the most part).
What are the things that 1030 can do that G13 will struggle with?
https://www.realworldtech.com/forum/?threadid=197759&curpost...
M1 comes in MANY flavours with different watts and Gflops.
And for CPU Gflops I had to get friends to measure it themselves: 2.5 Gflops/W compared to the Raspberry Pi 4's 2 Gflops/W, and this time it's 5nm vs 28nm.
Please give me official Gflops and Watt sources and we can discuss.
The page you link does NOT state watts clearly.
> What are the things that 1030 can do that G13 will struggle with?
In real life, when you develop games for real hardware, you notice their real limitations, like fill rates and how they scale different behaviours because they have enough registers to do things in one blow or they have to remember things. It's complicated, but eventually you realize you can just benchmark things for your own needs, and for me the 1030 is for all purposes as good as the 1050 so far: 2000 non-instanced animated characters on the 1030 at 30W vs like 2500 for the 80W 1050!
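To make the per-watt comparison in the comment above explicit, here is a minimal sketch in Python. It uses only the figures quoted there (2000 characters at 30W, roughly 2500 at 80W); the "characters per watt" metric and the variable names are illustrative assumptions, not something the commenter measured this way.

```python
# Figures as quoted in the comment above; "like 2500" is approximate.
# The characters-per-watt framing is an added illustration.
cards = {
    "1030": {"characters": 2000, "watts": 30},
    "1050": {"characters": 2500, "watts": 80},
}


def per_watt(characters: float, watts: float) -> float:
    """Return animated characters rendered per watt of board power."""
    return characters / watts


efficiency = {name: per_watt(**spec) for name, spec in cards.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} characters per watt")

# How many times more efficient the 1030 is on this one workload.
ratio = efficiency["1030"] / efficiency["1050"]
print(f"1030 vs 1050: {ratio:.2f}x")
```

On these numbers the 1030 comes out at a bit more than twice the 1050's efficiency, but only for this single self-reported benchmark.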
Without knowing, I'm pretty sure the M1 cannot do more than 1000 at whatever watt it uses... not that I would ever compile anything for a machine where I need to sign the executable.
I don’t know what your friends have tested but the results make zero sense. Firestorm reaches 1700 points in GB5 at 5W. Pi 4 is under 300 at similar wattage.
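The same arithmetic for the Geekbench 5 figures in the reply above, as a rough sketch. The scores and the 5W figure are taken from that comment; treating "under 300 at similar wattage" as exactly 300 points at 5W is an assumption, so the Pi 4 number is an upper bound and the ratio a lower bound.

```python
# Figures quoted in the comment above.
firestorm_points, firestorm_watts = 1700, 5

# "Under 300 at similar wattage": 300 is used as an upper bound and
# 5 W is an assumption standing in for "similar wattage".
pi4_points_upper, pi4_watts = 300, 5

firestorm_ppw = firestorm_points / firestorm_watts
pi4_ppw_upper = pi4_points_upper / pi4_watts

# Lower bound on the efficiency gap, since the Pi 4 score is an upper bound.
min_ratio = firestorm_ppw / pi4_ppw_upper

print(f"Firestorm: {firestorm_ppw:.0f} GB5 points per watt")
print(f"Pi 4: at most {pi4_ppw_upper:.0f} GB5 points per watt")
print(f"Gap: at least {min_ratio:.1f}x")
```

Geekbench points and Gflops are different metrics, so this does not directly contradict the 2.5 vs 2 Gflops/W claim; it only shows how far apart the two sets of numbers are.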
I recognize what you are calling out: that performance will be the same on some workloads whether you distribute or not. I would just point out that less manufacturing causes less e-waste. I would rather live in a world where Nvidia sells 50 million 10*0 cards than 500 million 1030 cards to create the same amount of compute in the world. It's not just the power costs to consider (but it could be there is a reality where running 500 million 1030s for their lifetime wastes so much less power that the manufacturing costs to the planet are worth it).
Not only that. On-chip gives you high precision synchronous time (all cores observe the same time) so you can use synchronous distributed algorithms that are unsuitable for ethernet networks.
This type of hardware allows for much better solutions to some problems.
I find it strange that Intel has basically just left the entire market to nvidia, despite having 10-15 years warning and running their own GPU division the whole time.
[1] https://en.wikipedia.org/wiki/Deep_learning#Deep_learning_re... [2] https://en.wikipedia.org/wiki/Titan_(supercomputer)
CUDA and Deep Learning may have started in 2007 and 2010, but their usage and their revenue potential were unclear back then. Even in 2015, Datacenter revenue was less than one eighth of Gaming revenue. And rumours of the Google AI processor (now known as the TPU) started back in 2014 when they started hiring. In 2021, Datacenter revenue is roughly equal to Gaming revenue, and is expected to exceed it in 2022.
Intel sort of knew GPGPU could be a threat by 2016/17 already. That is why they started assembling a team, and hired Raja Koduri in late 2017. But as with everything Intel in the post-Pat Gelsinger era, Intel was late to react. From Smartphone to Foundry Model and now GPGPU.
[1]: https://en.wikipedia.org/wiki/Xeon_Phi
[2]: https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
CUDA wasn't as flexible, and the payoff wasn't as big in 2010 or so as it is now.
I've never used a Phi, but I can see where they were coming from, I think. No need for a full rewrite like CUDA (maybe). The hardware is also more flexible than a GPU, but that turned out to be less important than they thought it might be.
I’ve been using ARM since about 1992 though so I may be biased.
But my point still stands I think, isn't this CPU designed by Nvidia, also just with an ARM-licensed ISA? Similarly AMD you mentioned in your list shares its ISA with Intel, and yet the CPUs are completely different.
Given what a mess shipping has been for the last two years, they appear to be taking the "underpromise, overdeliver" route on shipping quotes.
In some cases you're finding far flung regions relative to the source having better availability, because they were allocated a percentage of the original supply but are also too far away for scalpers to be interested, so they just haven't gone through it as fast. Australia hasn't had too many troubles with some items that are pretty hard to get elsewhere for example. Getting a 3080 in NYC is probably a real challenge, but I can walk to my local parts store and pick one up no dramas.
They're all using the exact same CPU; in fact you can make the Air perform (almost) just as well as the Pro/Mini by opening it up and adding a thermal pad: https://www.cultofmac.com/759693/thermal-mod-m1-macbook-air/
Honestly 16GB Air is pretty epic for $1200, though you probably want to spend the extra $200 for a storage bump as well. I'm very happy with the performance for dev tasks, and with my (displaylink) dock it runs multiple screens just fine too.
Custom build Mac Studio, on the other hand, takes 10-12 weeks.
I can't find a new dGPU at MSRP so I'm going to see if the Intel Arc cards are more readily available, and if not, I'm probably going to part out my desktop and move permanently to using Intel NUCs. Mostly for the GPU contained within. It seems like the days of getting your hands on a dGPU are over, and I'm not fighting over them.
Best long term lifetime decision is to get off any dependency for either of them. I’m looking at electric cars and Intel NUCs. A lot of people that I know moved to laptops for the same reason. A lot of us gave up and many like me no longer trust the market.
But some of those x86 instructions take 0.5 cycles and some of them take 0 if they’re removed by fusion or register renaming. It has worse problems, like loop instructions you can’t actually use but take up the shortest codes.
First answer on google: "maximal power consumption is around 50 watts"
Firestorm is GPU (again google has little info) I'm talking CPU for the Raspberry.
The Raspberry 4 GPU uses 1W. You are conflating because of sunk cost.
You need to compare the same things, apples to apples (no pun): one CPU core on the Raspberry consumes 1W; on the M1 it is 4W.
GPU is 1W vs 5W (if you are correct which I HIGHLY doubt, I suspect 20W for the GPU alone, wikipedia states 39 watts at maximum load so yes 20W for the GPU)!
You need to start looking at the world objectively and understand how it really works, because peak energy is not going to be forgiving if you don't.
Anyway, power usage of M1 variants has been studied in detail. It’s 5 watts peak for a single performance core, 20W peak for a CPU cluster of four cores, 10W for an 8-core GPU (128 FP32 ALUs per core). Bigger M1 variants have correspondingly higher power consumption because of the larger interconnects/caches etc. DRAM is also a factor. Running at full bandwidth, it can consume over 10W of power.
Sony pushed updates to the firmware. Most commodity Blu ray players don't have an (easy) way to update.
But for both the PS2 and PS3, getting folks to adopt the new formats was definitely a factor.
In the case of the PS2, I think less so; it wasn't the cheapest way to get a DVD player, but IIRC it wasn't that much more than a DVD player with Component out at the time (note: all PS2s can do Component out, but only later models can play DVDs at 480p), and that made it a lot easier for families to buy in.
More seriously I wish some of the old consoles were officially opened because the absolute install base of PS1 and NES compatible hardware must be insane. Indie NES games specifically have become popular lately, but I don't think any of the 3D capable consoles are popular or open targets.
https://en.wikipedia.org/wiki/List_of_Dreamcast_homebrew_gam...