The eighth-generation TPU: An architecture deep dive(cloud.google.com) |
The eighth-generation TPU: An architecture deep dive(cloud.google.com) |
With the expected scale of inference, it makes cost sense to make dedicated hardware for each task if the workloads are even slightly different. Probably similar to the video decoding chips in TVs not being very cheap/efficient compared to chips capable of encoding video.
There's no admission - this has always been known.
The funny thing about scaling laws is that as soon as they were known, the whole objective became learning how to break them - bending the curve, at least. They provided an incredibly useful target, but 'law' was a bit too strong a word.
I would love to see an instruction set reference for one of these, all you have is hardware architectural diagrams or high level APIs.