Bard is much worse at puzzle solving than ChatGPT (twofergoofer.com)
It's not really realistic to expect people to give Google credit for these amazing models they have published results about but haven't let people play with - they have given people Bard and people are evaluating it based on the criteria most obvious to them - a comparison to a very similar product that has just been released.
In any case, it's a massive marketing blunder: the public opinion formed within the last few hours was overwhelmingly "Bard sucks compared to ChatGPT."
This is the best they can do under pressure.
ChatGPT surprised the world with how good it was, then Google scrambled to get something out quickly.
A project like this is a massive undertaking, and the first mover has the advantage of being able to calmly refine their model until they find it presentable.
The question is whether what Google is delivering is good given the time since ChatGPT became popular enough for Google leadership to take note. Realistically, that is the moment they started pressuring their devs to push something out the door.
I think we'll see a better iteration soon. Not only from Google, but from other competitors.
Unless they release a model one can "use" to verify their claims, it's literally silly to make this statement.
It's almost silly to presume anything without proof. People are judging Google based on what Google has shown.
They behave like Yahoo when Google took over.
Is this not simply: "Bard is worse than ChatGPT at having seen the 'how-to-play' page for my side project during its training"?
Based on that it looks like the author asked all 25 test puzzles in one big prompt, which one supposes would favor larger models. To compare "puzzle solving" you'd think it would make more sense to ask one puzzle at a time?
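The difference between the two setups can be sketched roughly like this. It is only an illustration: the clues are made up, and `build_batch_prompt` and `build_single_prompts` are hypothetical helpers, not anything taken from the article.

```python
from typing import List

# Hypothetical instruction text; the article's real prompt is not known here.
INSTRUCTIONS = "Solve the rhyming puzzle. Answer with two rhyming words."

def build_batch_prompt(clues: List[str]) -> str:
    """One big prompt containing every clue, numbered."""
    numbered = [f"{i}. {clue}" for i, clue in enumerate(clues, start=1)]
    return INSTRUCTIONS + "\n" + "\n".join(numbered)

def build_single_prompts(clues: List[str]) -> List[str]:
    """One independent prompt per clue, so no clue shares context with another."""
    return [INSTRUCTIONS + "\n" + clue for clue in clues]

# Made-up clues, purely for illustration.
clues = ["Rehearsal for a desert plant", "Wobbly dessert made of bones"]
batch = build_batch_prompt(clues)
singles = build_single_prompts(clues)
```

With the second form, each puzzle is scored on its own, so a model that loses track in a long prompt is not penalized for that on top of the puzzle itself.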
Personally, as someone who worked at a company that was up over 500% during the pandemic, shipped absolutely nothing during that spike, and then deflated below their pre-pandemic pricing, I saw the folly in hiring smart people firsthand.
It's not enough to hire the smartest people, and in fact it can be a competitive disadvantage. The smartest people often want their piece of the product to reflect their ingenuity no matter how ancillary it is to the core mission. Unfortunately that often precludes the kind of agility that businesses need to stay competitive.
OpenAI managed to poach Googlers by simply not having fiefdoms built by smart people™. I imagine if Google had built GPT-4, it wouldn't be having downtime today. Because it wouldn't be public. And it might never be public because it doesn't scale for Google scale yet, and the ethicists want their say, and we need to integrate it into Borg and the front end hasn't passed through enough layers of design and...
Good joke. MS has always been in the 'incompetent evil' quadrant; newcomers just keep inexplicably giving them the benefit of the doubt or assuming/insisting they've "changed".
History rhymes with itself.
I hope you understand that it's only because of the inability of OpenAI's servers to keep up with demand or some issue in their backend code - language models themselves can't "crash" like normal programs on some kind of input, because they "just" generate new tokens.
That is like saying the Excel document didn't crash, but Excel did when it tried to parse it. As far as I know there is no proof that you can't cause an LLM to crash with user input.
> because they "just" generate new tokens.
I can write a program that counts to 100 that crashes reliably.
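For instance, here is a minimal sketch in Python. The function name and the bug are invented for illustration; the point is only that a trivial task does not make the program around it crash-proof.

```python
def count_to_100():
    # Bug on purpose: only 99 labels are prepared, for 1..99.
    labels = [str(n) for n in range(1, 100)]
    counted = []
    for n in range(1, 101):
        # Raises IndexError when n == 100, because labels[99] does not exist.
        counted.append(labels[n - 1])
    return counted
```

Calling `count_to_100()` fails the same way on every run, even though "counting" itself cannot go wrong.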
And yet one puzzle they hammer Bard for failing is "Cactus Practice". What accent do you have to have for that to be a perfect rhyme?
The terminal /tʌs/ is not quite the same thing as /tɪs/; since they are both unstressed, the difference can be hard to notice in fast speech, but becomes clear when enunciating.
It is very fast and wins the search benchmarks here:
Any rhyming done is an impressive result.
Don't forget that the model has seen all the poems and raps on the internet. It built some latent space where certain words always cluster together in the context of poems, and in which location.
In this case it really has the best database available to say what next word would slot in nicely here, as that is precisely what it was trained to do.
It is amazing, but somewhat explicable as an emergent effect.
I find it more amazing tbh that you can ask for a poem about something, and that it then sticks to the plot, makes references to the start etc than the actual rhyming.
Bard may still be much worse than ChatGPT at solving all kinds of puzzles, but the article is clickbait for promoting the author's word game, not an actual investigation that warrants that conclusion.
I completely disagree with the "hasty rhyming test": skeleton and gelatin don't rhyme (-ton vs -tin), and rhyme worse than protein and poutine (-een vs -een).
It's located at https://twofergoofer.com/blog/gpt-4
cactus / practice ?
they rhyme to my mind
skeleton / gelatin ?
also rhyme to my ears
protein / poutine
also rhyme enough to be considered to rhyme
You appear to be operating under the impression that every syllable of a rhyming couplet has to rhyme exactly for it to be considered a rhyme. This is an incorrect assumption. In fact, the above rhymes are arguably more pleasing because they are inexact rhymes rather than being exact forced rhymes.
In your world the only actual rhymes would be
bold / cold / gold
and
double / trouble / bubble
types of rhymes but the world considers the following to be perfectly acceptable
sent to meet her / centimetre
and so on
Hence pooh-teen and proh-tein do not rhyme. Skell-ih-tin and Gell-ih-tin do rhyme.
A game like this requires a pretty tight rhyming definition to not annoy players in a given day!
Thanks for reading: https://www.masterclass.com/articles/perfect-vs-imperfect-rh...
the use of novel puzzles is frankly awesome because there's a much lower chance of contamination from previous puzzles so we get a chance to see how much generalization they've achieved.
A lot of “does this rhyme with that?” depends on the context in terms of how strict the rhyming must be or not.
https://rarehistoricalphotos.com/windows-95-launch-day-1995/
I remember people commenting "never seen such a thing before, for a computer OS!"
"Many electronics stores held midnight launches for the product, with thousands of people waiting in line to be the first to get their hands on the operating system.
The release was a tremendous success. Microsoft sold 7 million copies in the first five weeks, and Windows 95 was soon the most popular operating system on the market."
Except it's not actually doing that calculation, so one wrong one shouldn't truly affect the rest like "real" math.
Well, then you simply don't understand how they work.
Matrices don't multiply themselves. You need hardware and software. As I pointed out the LLM is effectively just data that is being processed by a program. It is silly to assume you have no bugs in that software or the underlying operating system.
I guess the examples there might be accent dependent. Protein/poutine is the only one of those first examples that really rhymes to me; skeleton/gelatin and cactus/practice both have different vowels. Maybe different for you though.
??? Oh yeah, says who?
protein / poutine
p[]oh teen / poo teen => both start with a p, then there's an oh or oo (which are similar) and both end the same way – disyllabic rhyme
cactus / practice ?
[]ah ck təss / []ah ck tiss => ignoring the first consonant (cluster) which anchors the rhyme and pronouncing the u as a schwa (which it is), the ck's are the same and təss and tiss are totes similar – disyllabic rhyme
skeleton / gelatin ?
trisyllabic goodness – again, the way we pronounce the on in first word is not like the on in frond but like the ən in motion (UHn) and the way we pronounce the at in the second word is not like the at in bat or cat but like the ət in … hmm, none spring to mind but it's an UHt sound here if you listen to it – sgɛ́lɪtən or ˈskelɪtən – ˈʤelətɪn – so you've k vying with g (both hard), e with e, l with l, ɪ with ə (close sounding!), t with t, ə with ɪ (close sounding!), n with n
===
listen with your ears, not with your eyes
And you are right of course that gelatin and practice have the short "ee" sound at the end whereas skeleton and cactus have the "uh" sound.
I looked it up, and sure, it sounds about the same in Canadian French as someone saying Vladimir Putin[1]. But I've never heard anyone say it that way myself, and in neutral French (according to the linked video, at least), it's pronounced 'pooh-teen', which sounds exactly like protein (I don't know if you pronounce protein differently, but for me it's 'pro-teen').
I'm not operating under that impression, but the author is [1].
To me, the final "sounds" should match - not every syllable, an end rhyme according to wiki [0]. Specifically I would consider a rhyme to require matching sounds "at least from last vowel to end", but I don't think of rhymes first from the strict definition. Perhaps it's an accent thing but "-us" in cactus is not the same sound as "-ice" in practice. If a child made a poem with these sounds I would tell them "good job, it's a rhyme", and perhaps for the purpose of a silly word game too. But I would not use it as a passing case for a test of any sort like the author.
What's more pleasing is irrelevant; what is relevant is whether it's a true rhyme.
[0] https://en.wiktionary.org/wiki/end_rhyme#English
[1] https://news.ycombinator.com/reply?id=35258385&goto=item%3Fi...
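The "at least from last vowel to end" definition above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the phoneme symbols are ARPAbet-style, the transcriptions are hand-written guesses at two accents rather than dictionary entries, and `end_rhymes` is a made-up helper.

```python
# Small, assumed vowel inventory; enough for the words below.
VOWELS = {"AE", "AH", "IH", "IY", "EH", "OW", "UW"}

def tail_from_last_vowel(phonemes):
    """Return the phonemes from the last vowel to the end of the word."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i] in VOWELS:
            return phonemes[i:]
    return phonemes  # no vowel found; fall back to the whole word

def end_rhymes(a, b):
    """True if both words share the same sounds from the last vowel onward."""
    return tail_from_last_vowel(a) == tail_from_last_vowel(b)

# Assumed transcriptions, not taken from any dictionary.
cactus = ["K", "AE", "K", "T", "AH", "S"]
practice_schwa = ["P", "R", "AE", "K", "T", "AH", "S"]  # final vowel reduced to "uh"
practice_ih = ["P", "R", "AE", "K", "T", "IH", "S"]     # final vowel kept as short "i"

print(end_rhymes(cactus, practice_schwa))  # True
print(end_rhymes(cactus, practice_ih))     # False
```

Under this definition the same pair passes or fails depending on which transcription is fed in, which is the accent dependence being argued about.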
Indeed, it's an accent thing. In America at least, pronouncing "cactus" with either an "is" or an "us" sound is valid.
I think that given that the author provided a working definition, and your provided failure example is actually passing that definition (just only for the author's dialect), and given that you are now essentially trying to change the subject to "my definition of rhyme is the correct one"; well, you're just being...pedantic? argumentative? I'm not sure.
https://www.vice.com/en/article/vb7qwb/inside-vladimir-pouti...