Google's Pathways Language Model and Chain-of-Thought (vaclavkosar.com)
It's relative. It would cost more to open a 40 room hotel (about 320k/room), and hotels can't be copied like software.
A more relevant example is video games: imagine if the only viable ones were top-end AAA games whose completed versions could only be accessed by cloud gaming.
I'd take one of those bets or the other; both are tough to pull off. Considering that the first task of such a startup would be to hand ~100-500MM to a hardware or cloud vendor, I'd be hesitant to invest.
> these large language models also have limited business value today
The Instruct version of GPT-3 has become very easy to steer with just a task description. It can do so many tasks so well it's crazy. Try some interactions with the beta API.
I believe GPT-3 is already above average human level at cognitive tasks that fit in a 4000 token window. In 2-3 years I think all developers will have to adapt to the new status quo.
If a neural network does a fixed amount of computation and that is that, it is never going to be able to do things that require a program that may not terminate.
There are numerous results of theoretical computer science that apply just as well to neural networks and other algorithms even though people seem to forget it.
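To make the fixed-computation point concrete, here is a minimal sketch. It is not from any paper; the Collatz iteration and the function names are just an illustration I picked. The first function loops until it is done, and nobody has proven that it halts for every input. The second gets a fixed step budget, the way a fixed-depth forward pass does, and has to give up when the budget runs out.

```python
from typing import Optional


def collatz_step(n: int) -> int:
    """One step of the Collatz iteration (illustrative choice of problem)."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def steps_unbounded(n: int) -> int:
    """Loop until n reaches 1. The amount of work depends on the input."""
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps


def steps_fixed_budget(n: int, budget: int) -> Optional[int]:
    """Same question, but with a hard cap on steps, like a fixed-depth network.

    Returns the step count if 1 is reached within `budget` steps, else None.
    """
    for steps in range(budget + 1):
        if n == 1:
            return steps
        n = collatz_step(n)
    return None


if __name__ == "__main__":
    for start in (6, 27):
        print(start, steps_unbounded(start), steps_fixed_budget(start, budget=50))
```

Whatever budget you fix in advance, some input needs more steps than that, so the capped version can only answer "don't know" there.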
Another is "can an error discovered in late stage processing be fed back to an early stage and be repaired?" That's important if you are parsing a sentence like
Squad helps dog bite victim.
It was funny because I saw Geoff Hinton give a talk in 2005, before he got super-famous. He was talking about the idea that led to deep networks, and he had a criticism of "blackboard" systems and other architectures that produce layered representations (say the radar of an anti-aircraft system that starts with raw signals, turns those into a set of "blips", coalesces the "blips" into tracks, interprets the tracks as aircraft, etc.). Hinton said that you should build the whole system in an integrated manner and train the whole thing working end-to-end, and I thought "what a neat idea" but also "there is no way this would work for the systems I'm building because it doesn't have an answer for correcting itself."
Also, when people talk about solving problems they talk about layers, layers play a big role in the conceptual models people have for how they do tasks even if they don't really do them that way.
For instance in that ambiguous sentence somebody might say it hinges on whether or not you think "bite" is a verb or a noun.
(Every concept in linguistics is suspect, if only because linguistics has proven to have little value for developing systems that understand language. For instance, I'd say a "word" doesn't exist, because there are subword objects that behave like a word (e.g. "non-") and phrases that behave like a word (e.g. "dog bite" fills the same slot as "bite").)
Another ambiguous example is this notorious picture
https://www.livescience.com/63645-optical-illusion-young-old...
which most people experience as "flapping" between two states. Since you only see one at a time there is some kind of inhibition between the two states. Who knows how people really see things, but if I'm going to talk about features I'm going to say that one part is the nose of one of the ladies or the chin of the other lady.
Deep networks as we know them have nothing like that.
The issue isn't a lack of models or data, it's that larger models are impossible to train without paying hundreds of thousands to millions of dollars. The hardware requirements for simply running the models already price them out of reach for most.
These models are rather powerful, but the immediate future is one of accessing them through cloud services. The GeForce GTX 1080 Ti came out 5 years ago, and since then memory in consumer GPUs has roughly doubled. To run the highest-end models on a single GPU, hardware will need 20x to 70x more memory, along with serious gains in FLOPS/Joule.
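A rough back-of-envelope for the memory gap, as a sketch only. The assumptions are mine, not measurements: a 24 GB consumer card, weights stored at 2 bytes (fp16) or 4 bytes (fp32) per parameter, and parameter counts of 175B (GPT-3) and 540B (PaLM). It counts weights only and ignores activations and any other overhead, so real requirements would be higher.

```python
# Assumed figures for illustration; change them to match your own setup.
CONSUMER_GPU_GB = 24  # assumed memory of a high-end consumer card
MODELS = {
    "GPT-3 (175B params)": 175_000_000_000,
    "PaLM (540B params)": 540_000_000_000,
}


def weights_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def memory_multiple(num_params: int, bytes_per_param: int, gpu_gb: float) -> float:
    """How many times the given GPU memory the weights alone would need."""
    return weights_gb(num_params, bytes_per_param) / gpu_gb


if __name__ == "__main__":
    for name, params in MODELS.items():
        for label, nbytes in (("fp16", 2), ("fp32", 4)):
            gb = weights_gb(params, nbytes)
            mult = memory_multiple(params, nbytes, CONSUMER_GPU_GB)
            print(f"{name} {label}: {gb:.0f} GB of weights, {mult:.1f}x a {CONSUMER_GPU_GB} GB card")
```

Under these assumptions the multiples land in the tens, which is the same ballpark as the range quoted above. The exact figure depends on precision and on which model counts as "highest end".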
I suppose improvements in CPU parallelism and RAM speeds will also go a long way towards making such models runnable on reasonable consumer hardware, albeit at slower speeds.
How can you possibly make a claim like this without like 80 links justifying it? The claim is fuzzy and absurd, my least favorite combo
> Colorless green ideas sleep furiously, and other grammatical nonsense by Noam Chomsky
He was a man without a country, A linguist without a language, A mind without a thought, A dream without a dreamer. He was lost in a world of words, A world where ideas slept furiously, And grammar was a never-ending nightmare.
But he persevered, For he knew that language was the key to understanding the world. And so he continued to study, To learn all that he could, In the hopes that one day, He would find his way home.
You should check out the post we're commenting on, it has graphs for this exact metric.
Spoiler: Google's model with 3x the parameters does pass the average human in a couple of categories, but not all. I don't think GPT-3 does in any.
It's doubly puzzling to me because you have access and are asserting it feels like an average human to you. It's awesome and it does magical stuff; I use it daily both for code and prose. It also majorly screws up sometimes. It's only at an average human level if we play word games with things like "well, the average human wouldn't know the Dart implementation of the 1D gaussian function. Therefore it's better than the average human."
Ok, your phrasing made it sound like some article or material had convinced you of this opinion on my first reading, now I understand.
This is kind of my point about 80 links though - you're using a definition of "cognitive tasks" that more closely resembles knowledge, and then you're letting your personal feelings about profundity guide your conclusions on said cognition.
I don't deny that the machine can output pretty words and has a breadth of knowledge to put us each to shame on some simple queries, but "cognition in a 4000 token window" is an incredibly large place and I don't even understand how you would be able to claim a machine has above-human-average cognition based solely on your own interactions... That's a pretty crazy leap.
PS: I saw the downvotes. I was downvoted for questioning the validity of information that was actually just pure conjecture. Be better with your votes.