Groq surpasses 1,200 tokens/sec with Llama 3 8B (twitter.com)
Groq - mainly hardware, the LPU (https://wow.groq.com/lpu-inference-engine/)
Grok - Elon's du jour AI endeavor
I might well be wrong about the etymology here, but I understand "grokking" to be a term for a phenomenon in training neural networks.
What I'm not sure about is which was there first – AI companies called some version of "grok" or that term.
My bullshit detector went off when I first saw Groq posted on HN - a startup is making their own chips (doubt) that perform faster than anything Nvidia has for inference (doubt) and accelerate LLMs to hundreds/thousands of tokens per second?? Mega doubt.
But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.
The other issue they don't mention is power, space, efficiency etc. We want to run larger models with less power, fewer server blades, at lower cost. Not use more server blades, more chips, more power, etc.
If anything, Google's TPU advancements chart a viable course. I suspect both Groq and Cerebras will overcome the challenges and offer competitive compute options, depending on the context.
When will Groq support a real API (not experimental beta preview)?
When will Groq support logprobs?!
When will Groq actually tell us what their rate limit is?!
Until these are answered, many of us can't actually build on Groq.
Edit: It seems I'm getting downvoted by Groq employees...
For groqcloud the rate limits are fairly clear [1]. For example, for llama3-8b-8192 you get 30 requests per minute, 14400 per day, and 30000 tokens per minute. That said, it's the beta free tier so it sometimes goes down randomly and the limits may be different once they start charging for it.
I'm not affiliated with groq but I use groqcloud to make some simple chatbots since it's currently free.
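For anyone who wants to stay under those numbers on the client side, here is a minimal sketch of a sliding-window limiter. It assumes the per-minute figures quoted above (30 requests and 30000 tokens per minute); the class and method names are made up for illustration and are not part of any Groq SDK. The daily cap is not enforced here, only converted to a sustained rate for reference.

```python
import time
from collections import deque

# 14400 requests/day spread evenly works out to 10 requests/minute sustained.
SUSTAINED_RPM = 14_400 / (24 * 60)


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: caps requests and tokens per window."""

    def __init__(self, max_requests=30, max_tokens=30_000, window=60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.events = deque()       # (timestamp, tokens) for each accepted call

    def _prune(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Return True and record the call if it fits in both budgets."""
        now = self.clock()
        self._prune(now)
        used = sum(t for _, t in self.events)
        if len(self.events) >= self.max_requests or used + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True
```

Call `try_acquire(estimated_tokens)` before each request and sleep or queue when it returns False. The limits may change once the service leaves beta, so treat the defaults as placeholders.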
Economics and costs are hard to predict. For example, Groq is not using HBM chips. So probably the cards are a lot easier to source.
It's not clear what the capacity of these systems is in terms of total users, or even tokens per second. Then you factor in cost. Then you realize all vendors will match a competitor's pricing. Then you realize Groq doesn't sell chips.
¯\_(ツ)_/¯
The only thing you have is the public API to benchmark against: https://artificialanalysis.ai/
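If you want to benchmark a public streaming endpoint yourself, a rough sketch of the measurement looks like this. It assumes you already have some iterable that yields one chunk per streamed piece of output; `measure_stream` is a hypothetical helper, not part of any vendor SDK, and chunk count is only a proxy for true token count.

```python
import time


def measure_stream(chunks, clock=time.perf_counter):
    """Consume a stream of chunks and report rough latency/throughput numbers."""
    start = clock()
    first = None
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now             # arrival time of the first chunk
        count += 1
    elapsed = clock() - start
    return {
        "tokens": count,
        "time_to_first_token": (first - start) if first is not None else None,
        "tokens_per_sec": count / elapsed if elapsed > 0 else 0.0,
    }
```

Network latency and server load both show up in these numbers, so repeat the run several times before comparing providers.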
- SambaNova has real revenue from big customers
- SambaNova can run any model on a single node at the speed Groq requires
- SambaNova can do low latency inference just like Groq, but can also run large batches and host hundreds of models on a single deployment
- SambaNova does not quantize models unless explicitly stated
- SambaNova can run training at perf competitive with Nvidia, as well as fastest inference in the world at full precision
It really isn't a competition. Groq has done great at garnering hype in recent months, but it is a house of cards.
So every clock cycle you're doing useful work rather than loading up people into batches. And that's why the arch will probably win for inference; for training you're basically competing with the software ecosystem and silicon density. AKA NVIDIA can give TSMC more money to get more ALUs on the die.
I think other places have attempted dataflow (FPGA etc.) but they all basically had buffers (due to non-determinism in the network stack and even RAM). SambaNova seems indistinguishable from an FPGA with a few clock cycles' difference. I think they blew their shot with a Series D ($600 million???) where they made more of the same old. Maybe Intel will buy them to augment Altera? Looks like chasing parity with existing strategies.
I buy the Groq hype because it's something different; certainly the public demo helped. HN is about the future.
[1] https://www.semianalysis.com/p/groq-inference-tokenomics-spe...
And they can, Twitter (why everything gets claimed as his personal work I never know) isn't using their trademark.
As they (Groq) themselves have said...
> the difference of one consonant (q, k) only matters to scrabblers and spell checkers
Grok the term has been around since at least 1961.[0] The fact that a company decided to take a common term (especially in the CS field), change one letter and trademark it doesn't mean nobody can use the original spelling at all.
Funnily enough, Groq is trying to claim grok and groq are not associated terms in court filings while trying to bully another company with the same name:
> The word “Groq” essentially did not exist before Ross created it and has no known meaning in any language beyond its intended association with Groq, Inc.
vs. that company's reply
> The word “grok” originated in Robert Heinlein’s 1961 novel Stranger in a Strange Land. Merriam Webster defines “grok” as “to understand profoundly and intuitively.” The Oxford English Dictionary defines “grok” as “[t]o understand intuitively or by empathy; to establish rapport with.”
Once Groq realized their trademark didn't include healthcare data, they tried to trademark... the other company's name.
Groq argues (at the USPTO) that their mark is different from "grok" because one cannot trademark common words. They are applying for plain marks (without font/color/logo) and this is very normal. I went through this with a proper name trademark.
In Groq vs. Grok, they are arguing that the average person will confuse the marks (as can be seen in many HN posts about Groq, like this one). Their argument is that Grok should not be given a trademark beforehand due to this potential confusion. They can also take the case to court should the trademark be granted. Given the common confusion, Groq appears to have good standing to make this argument.
To call someone defending their own trademarks "bullying" is inaccurate.
Groq says no such thing. Their two public things so far include
1) a company that rebranded to Groq Healthcare less than 2 years after Groq launched (their trademark at the time had nothing to do with health; they then added it to their trademark and tried to trademark the competitor's name)
2) a C&D to Twitter over the name
Didn't know about Heinlein coining it though, that's cool!
> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
As an aside, I wonder when "please don't make a quote from the HN commenting guidelines the only contribution of your comment" will join that list...
HN is largely community-driven moderation, helping dang do his job, so I suspect this meta "don't" wouldn't make it.
I didn't comment on the substance because others already had by then, not sure why they didn't prefer your OG comment...