Will scaling work?

242 points by saliagato 2 years ago | 283 comments

slibhb 2 years ago |

The best analogy for LLMs (up to and including AGI) is the internet + google search. Imagine explaining the internet/google to someone in 1950. That person might say "Oh my god, everything will change! Instantaneous, cheap communication! The world's information available at light speed! Science will accelerate, productivity will explode!" And yet, 70 years later, things have certainly changed, but we're living in the same world with the same general patterns and limitations. With LLMs I expect something similar. Not a singularity, just a new, better tool that, yes, changes things, increases productivity, but leaves human societies more or less the same.

I'd like to be wrong but I can't help but feel that people predicting a revolution are making the same, understandable mistake as my hypothetical 1950s person.

bee_rider 2 years ago | |

The internet did change things pretty dramatically.

Productivity at information communication tasks just isn’t the entire economy.

I think we are massively more productive. Some of the biggest new companies are ad companies (Google, Facebook), or spend a ton of their time designing devices that can’t be modified by their users (Apple, Microsoft). Even old fashioned companies like tractor and train companies have time to waste on preventing users from performing maintenance. And then the economy has leftover effort to jailbreak all this stuff.

We’re very productive, we’ve just found room for unlimited zero or negative sum behavior.

HarHarVeryFunny 2 years ago | | |

> The internet did change things pretty dramatically.

For sure - I grew up in the mid-late 70s having to walk to the library to research stuff for homework, parents having to use the yellow-pages to find things, etc.

Maybe smartphones are more of a game changer than desk-bound internet though - a global communication device in your pocket that'll give you driving directions, etc, etc.

BUT ... does the world really FEEL that different now, than pre-internet? Only sort-of - more convenient, more connected, but not massively different in the ways that I imagine other inventions such as industrialization, electricity, cars may have done. The invention of the telephone and radio maybe would have felt a bit like the internet - a convenience that made you feel more connected, and maybe more startling being the first such capability?

Negitivefrags 2 years ago | | |

I remember long ago reading an argument that information technology has not actually increased productivity. I really wish I could find a source for this now, but I just can't seem to find it anywhere on the internet. Here it is anyway:

The administration of the Tax Service uses 4% of the total tax revenue it generates. This percentage has stayed relatively fixed over time.

If IT really improved productivity, wouldn't you expect that that number would decrease, since Tax Administration is presumably an area that we should expect to see great gains from computerisation?

We should be able to do the same amount of work more efficiently with IT, thus decreasing the percentage. If instead the efficiency frees up time allowing more work to be done (because there are people dodging taxes and we need to discover that), then you should expect the amount of tax to increase relatively which should also cause the percentage to decrease.

Therefore IT has not increased productivity.

Either it doesn't do so directly, or it does do so directly, but all the efficiency gains are immediately consumed by more useless bureaucracy.

imachine1980_ 2 years ago | | |

I feel you are mixing value capture with value generation. If GM produced cars with the same level of margins as Facebook or Google, things would be different. LVMH (Louis Vuitton Group) holds a value equivalent to that of Toyota, Volkswagen, and two-thirds of Ford combined. Louis Vuitton alone was valued more than Red Hat a few months ago. This doesn't mean that Louis Vuitton is more valuable than Red Hat, but rather that it captures value more effectively than Red Hat.

gradschoolfail 2 years ago | | |

Going slightly beyond armchair economics, here are a couple of articles which discuss the lack of evidence for internet-based productivity so far:

https://archive.ph/baneA https://archive.ph/TrHYN

“Our central theme is that computers and the Internet do not measure up to the Great Inventions of the late nineteenth and early twentieth century, and in this do not merit the label of Industrial Revolution,”

— Robert Gordon, actual economist

HarHarVeryFunny 2 years ago | |

I think AGI can change the world once it gets way beyond human level both in terms of types of beyond-human "senses" and pattern matching/prediction (i.e. intelligence), but we are nowhere near that yet.

On their current trajectory LLMs are just expert systems that will let certain types of simple jobs be automated. A potential productivity amplifier similar to having a personal assistant that you can assign tasks to. Handy (more so for people doing desk-bound jobs than others), but not a game changer.

An AGI far beyond human capability could certainly accelerate scientific advance and let us understand the world (e.g. how to combat climate change, how to address international conflicts, how to handle pandemics) so be very beneficial, but what that would feel like to us is hard to guess. We get used to slowly introduced (or even not so slowly) changes very quickly and just accept them, even though today's tech would look like science fiction 100 years ago.

What would certainly be a game changer, and presumably will eventually come (maybe only in hundreds of years?) would be if humans eventually relinquish control of government, industry, etc to AGIs. Maybe our egos will cause us to keep pretending we're in control - we're the ones asking the oracle, we could pull the plug anytime (we'll tell ourselves) etc, but it'll be a different world if all the decisions are nonetheless coming from something WAY more intelligent than ourselves.

HarHarVeryFunny 2 years ago | | |

Odd to see this down-voted... I guess my prediction of the future has rubbed someone the wrong way, but if you disagree then why not just reply ?!

throwup238 2 years ago | |

By that standard, nothing has meaningfully changed since agriculture and domesticated animals. We're still killing each other, forming hierarchical societies, passing down stories, eating, drinking, sleeping, and making families - except now we're killing each other from afar with gunpowder, forming those hierarchies using the guise of democracy or whatever, passing down stories in print rather than speech, can use condoms to control when we make families, and so on.

Human civilization has accumulated many layers of systems since then and the internet changed all of them to the point that many are barely recognizable. Just ask someone who's been in prison since before the internet was a thing - there are plenty of them! They have extreme difficulty adapting to the outside world after they've been gone for forty or fifty years.

BriggyDwiggs42 2 years ago | | |

Yeah, in a real sense nothing has changed. I wonder if things finally will when we start modifying our bodies and minds to extreme degrees; that’d be my guess for when the model breaks down.

lnxg33k1 2 years ago | |

Imagine telling those same people in the 50s that all those changes in productivity would come for the benefit of no one since the work week would be the same and purchasing power would decline

qudat 2 years ago | | |

Such a wild take. Would you want to live in the 50s? I definitely would not.

Xelynega 2 years ago | | |

Don't have to imagine, same thing is happening right now with LLMs.

I see "AI safety" brought up as a laughable attempt at stopping the progress of LLMs, when in reality the people talking about "AI safety" are the people trying to say that the majority will not benefit from this technology.

jerpint 2 years ago | |

The internet has allowed us to interact in ways that were inconceivable at the time; think communication and speed of information for one.

When agents start being more reliable I think we will start seeing applications we couldn’t possibly anticipate today

hardwaregeek 2 years ago | |

It's important to remember that the internet is still very very new. Like the generation of digital natives are barely in adulthood. Sure, it's existed in some form for about 40 years, but most of the world didn't have access for the longest time. I wouldn't be surprised if we see massive changes in the next 20 years from the people who grew up on the web (specifically people outside the United States and Europe, where access was harder for a long time)

Jensson 2 years ago | | |

"Digital natives" are the people who grew up with computers. Many kids born in the 1980s and later grew up with computers in their earliest memories.

I'd call the current generation "Social media natives", because that is the biggest difference from the previous generation. 90s kids grew up with games and communication, but they were free from facebook, youtube and instagram.

_a_a_a_ 2 years ago | |

> but we're living in the same world with the same general patterns and limitations

seems odd. What 'patterns' and 'limitations' do you still see? Because I see so much has changed.

beebmam 2 years ago | |

> leaves human societies more or less the same

My mom, who is 70 years old, regularly tells me how profoundly transformative the internet has been for society.

Frost1x 2 years ago | |

If we ignore technology for the sake of technology and look at daily life, things we need like food, shelter, healthcare, transportation, socialization, etc. then I'd say technology has definitely improved some of these aspects.

Food distribution has improved as have most logistics in general. These efficiencies have somewhat been shared with the general public but in a lot of cases, those gains were captured by private enterprise.

Healthcare has improved a little bit, iterative progress can be made more quickly, shared, and moved into translational medicine as practice. Drug discovery has improved quite a bit, as have logistics around getting said drugs in the hands of people who need them and doing so affordably. This improved lives and longevity.

Socially we can communicate far easier. It remains to be seen to me if this is always an improvement. Humans seem to be designed for much smaller social circles and don't seem to be capable of taking much advantage in their daily lives of increased frequency, scale, and reach of socialization.

The list goes on. It's not exactly linearly correlated with technology growth because ultimately it boils down to actionable information. Just because we have more information or more processing capability around information doesn't mean we get direct returns from that or that we don't reach limits where we simply don't have use for the additional gains. Information has to be actionable in some way, otherwise it's just intermediate data products that may or may not benefit us. I can now read the daily news from some small town in Southern Japan if I want to. That mostly doesn't improve my life, but it's there.

We have piles and piles of scientific literature we could share and iterate on towards new discoveries for humanity. That doesn't mean in my daily need for survival and balance with recreation I have time to contribute to things I find interesting or necessary, after all I am to some degree a slave of my needs within the economic system I'm entrenched in. I have bills, I have to earn money, and I have to work.

Even if that weren't the case, I may or may not be able to contribute more back to society than I do now in my paid profession. Currently I'd say I do pretty well in this department in terms of reach. Without that I might struggle.

cultureswitch 2 years ago | |

The internet did change things dramatically, but the change wasn't as dramatic as industrialization. And that one matured over two centuries.

herval 2 years ago | |

> And yet, 70 years later, things have certainly changed, but we're living in the same world with the same general patterns and limitations. With LLMs I expect something similar. Not a singularity, just a new, better tool that, yes, changes things, increases productivity, but leaves human societies more or less the same.

by what criteria do you see the world as the same today vs 70 years ago?

dnissley 2 years ago | | |

Look around. The most significant change is that there are a lot more screens, and a lot more "cheap stuff" (consumer electronics, food, clothes, entertainment, plastic anything, etc).

Things "behind the scenes" have perhaps changed a lot -- e.g. financialization, more competitive markets, explosion of communication options, which are the driving force behind those visible changes.

ketzo 2 years ago | | |

I mean, very broad strokes, but I can see GP’s point.

- people eat plants and animals

- people pay money for goods and services

- there are countries, sometimes they fight, sometimes they work together

- men and women come together to create children, and often raise those children together

etc, etc, etc

The “bones” of what make up a capital-S Society are pretty much the same. None of these things had to stay the same, but they have so far.

lysecret 2 years ago | |

Good point. To me the internet was just "other people"; what differentiated it is that it is not just the 4 people you know but literally (almost) and potentially all other people.

With AI, the way I see it, it is just virtual other people. Of course, a bit stranger but more similar than you think.

david_allison 2 years ago | | |

There's currently little to no learning or feedback loop due to the relatively small context window sizes.

I've done many language exchanges with people using Google Translate and the lack of improvement/memory of past conversations is a real motivation killer; I'm concerned this will move on to general discourse on the internet with the proliferation of LLMs.

I'm sure many people have already gone around in circles with rules-based customer support. AI can make this worse.

arketyp 2 years ago | |

My take on this is that much of work and problem solving is about understanding the problem. So I think human abilities will remain the bottleneck. I pose this thought experiment: Is it possible to design an AI system for a monkey which gives it super-monkey abilities?

red75prime 2 years ago | | |

For a monkey it's impossible to design... pretty much anything beside a few simple tools. So, no. A monkey cannot design a bow, a loom, a tractor, a computer, or an AI of any kind.

We had designed many tools that beat us in various aspects. This is an invalid analogy.

Aerbil313 2 years ago | |

Technology is the one force that drives modern human societies, Western ones even more. The world has changed dramatically, especially with smartphones. I suggest reading Ted Kaczynski.

qudat 2 years ago | |

What do you think would need to be different for it to be considered meaningful to you?

yashap 2 years ago | |

Depends on the quality of the AGI. If it’s legitimately as good or better than humans at almost everything, while being cost effective, it will utterly and completely change society. Humans will be obsolete at almost every job - why pay a human if an AGI can do it as good or better, for free(-ish)? Best case scenario, the AGI is benevolent, traditional work is gone, but we find some post-capitalism system, and new ways to keep life interesting/meaningful. Worst case scenario, pure sci-fi dystopia.

If it’s closer to a midpoint between GPT-4 and true human intelligence, then sure, I agree with you, it’s a significant change to society but not an overhaul. But if it’s actually a human level (or better) general intelligence, it’ll be the biggest change to human society maybe ever.

amelius 2 years ago | |

Imagine explaining to someone from 1950 that we now all have a TV-set on our office desks, with 1000+ channels ...

I bet their reaction would be a facepalm.

berniedurfee 2 years ago |

I think there’s a huge assumption here that more LLM will lead to AGI.

Nothing I’ve seen or learned about LLMs leads me to believe that LLMs are in fact a pathway to AGI.

LLMs trained on more data with more efficient algorithms will make for more interesting tools built with LLMs, but I don’t see this technology as a foundation for AGI.

LLMs don’t “reason” in any sense of the word that I understand and I think the ability to reason is table stakes for AGI.

hokeone 2 years ago |

>Furthermore, the fact that LLMs seem to need such a stupendous amount of data to get such mediocre reasoning indicates that they simply are not generalizing. If these models can’t get anywhere close to human level performance with the data a human would see in 20,000 years, we should entertain the possibility that 2,000,000,000 years worth of data will be also be insufficient. There’s no amount of jet fuel you can add to an airplane to make it reach the moon.

Never thought about it in this sense. Is he wrong?

jsnell 2 years ago |

The original title ("will scaling work?") seems like a much more accurate description of the article than the editorialized "why scaling will not work" that this got submitted with. The conclusion of the article is not that scaling won't work! It's the opposite, the author thinks that AGI before 2040 is more likely than not.

bee_rider 2 years ago | |

It might be nice to modify the title a bit though, to indicate that it is about AGI.

Obviously scaling works in general, just ask anyone in HPC, haha.

zoogeny 2 years ago |

I was thinking last night about LLMs with respect to Wittgenstein after watching this interesting discussion of his philosophy by John Searle [1].

I think Wittgenstein's ideas are pertinent to the discussion of the relation of language to intelligence (or reasoning in general). I don't mean this in a technical sense (I recall Chomsky mentioning that almost no ideas from Wittgenstein actually have a place in modern linguistics) but in a metaphysical sense (Chomsky also noted that Wittgenstein was one of his formative influences).

The video I linked is a worthy introduction and not too long so I recommend it to anyone interested in how language might be the key to intelligence.

My personal take, when I see skeptics of LLMs approaching AGI, is that they implicitly reject a Wittgenstein view of metaphysics without actually engaging with it. There is an implicit Cartesian aspect to their world view, where there is either some mental aspect not yet captured by machines (a primitive soul) or some physical process missing (some kind of non-language system).

Whenever I read skeptical arguments against LLMs they are not credibly evidence based, nor are they credibly theoretical. They almost always come down to the assumption that language alone isn't sufficient. Wittgenstein was arguing long before LLMs were even a possibility that language wasn't just sufficient, it was inextricably linked to reason.

What excites me about scaling LLMs, is we may actually build evidence that supports (or refutes) his metaphysical ideas.

1. https://www.youtube.com/watch?v=v_hQpvQYhOI&ab_channel=Philo...

ralusek 2 years ago |

Almost everything interesting about AI so far has been unexpected emergent behavior, and huge gains through minor insights. While I don't doubt that the current architecture is likely to have a current ceiling below that of peak human intelligence in certain dimensions, it's already surpassed it in some, and there are still gains to be made in others through things like synthetic data.

I also don't understand the claims that it doesn't generalize. I currently use it to solve problems that I can absolutely guarantee were not in its training set, and it generalizes well enough. I also think that one of the easiest ways to get it to generalize better would simply be through giving it synthetic data which demonstrates the process of generalizing.

It also seems foolish to extrapolate on what we have under the assumption that there won't be key insights/changes in architecture as we get to the limitations of synthetic data wins/multi-modal wins.

mgaunard 2 years ago |

I think the more interesting question is how long will people cling to the illusion that LLMs will lead us to AGI?

Maintaining the illusion is important to keep the money flowing in.

machiaweliczny 2 years ago |

I think there's a need to separate knowledge from the learning algorithm. There needs to be a latent representation of knowledge that models attend to, but the way it's done right now (with my limited understanding) doesn't seem to be it. Transformers seem to attend only to previous text in the context, not to the whole knowledge they possess, which is an obvious limitation IMO. The human brain probably also doesn't attend to its whole knowledge but loads something into context, so maybe it's fixable without changing the architecture.

LLMs can already work as data extractors, so one could build a Prolog DB and update it as the model consumes data, then translate any logic problems into Prolog queries. I want to see this in practice.
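A minimal sketch of that idea, assuming the extraction step has already turned text into (relation, subject, object) triples. The `FactStore` class, the `grandparents_of` helper, and the example facts are all hypothetical, and plain Python stands in for a real Prolog engine here:

```python
from typing import List, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)


class FactStore:
    """Tiny in-memory stand-in for a Prolog fact database (hypothetical)."""

    def __init__(self) -> None:
        self.facts: Set[Fact] = set()

    def add(self, relation: str, subject: str, obj: str) -> None:
        """Record one extracted fact, e.g. parent(alice, bob)."""
        self.facts.add((relation, subject, obj))

    def query(self, relation: str, subject: Optional[str] = None,
              obj: Optional[str] = None) -> List[Fact]:
        """Return matching facts; None acts like an unbound Prolog variable."""
        return sorted(
            f for f in self.facts
            if f[0] == relation
            and (subject is None or f[1] == subject)
            and (obj is None or f[2] == obj)
        )


def grandparents_of(store: FactStore, person: str) -> List[str]:
    """Join two parent facts, like grandparent(G, X) :- parent(G, P), parent(P, X)."""
    result = []
    for _, parent, _ in store.query("parent", obj=person):
        for _, grandparent, _ in store.query("parent", obj=parent):
            result.append(grandparent)
    return sorted(result)


store = FactStore()
# Facts an extraction step might have produced (made up for illustration).
store.add("parent", "alice", "bob")
store.add("parent", "bob", "carol")

print(store.query("parent", subject="alice"))
print(grandparents_of(store, "carol"))
```

The model would only be responsible for producing the `add` calls and for translating a question into a query; the answer itself comes from the deterministic lookup.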

Similarly with the use of logic engines and computation/programs.

I also think that RL can come up with a better training function for LLMs. In the programming domain, for example, one could ask an LLM to think of all possible tests for given code and evaluate them automatically.
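One way to picture the "evaluate them automatically" step is a reward equal to the fraction of tests a candidate program passes. This is only a sketch under assumptions: `pass_rate`, the two candidate functions, and the test cases are invented for illustration, and the tests are written by hand here rather than generated by a model.

```python
from typing import Callable, List, Tuple


def pass_rate(candidate: Callable[[int], int],
              tests: List[Tuple[int, int]]) -> float:
    """Fraction of (input, expected) tests passed; exceptions count as failures."""
    if not tests:
        return 0.0
    passed = 0
    for arg, expected in tests:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails this test.
            pass
    return passed / len(tests)


# Hypothetical tests a model might propose for "square a number".
proposed_tests = [(0, 0), (2, 4), (-3, 9), (10, 100)]


def good_candidate(x: int) -> int:
    return x * x


def buggy_candidate(x: int) -> int:
    return x * 2  # agrees with squaring only for 0 and 2


print(pass_rate(good_candidate, proposed_tests))   # 1.0
print(pass_rate(buggy_candidate, proposed_tests))  # 0.5
```

A scalar like this could serve as a reward signal, though it is only as good as the tests: a weak test set rewards buggy code.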

I was also thinking about using the diffusER pattern, where programming rules are kinda hardcoded (similar to add/replace/delete, but with algebra on functions/variables instead). That's probably not an AGI path but could be good for producing programs.

PlasmonOwl 2 years ago |

Author is leveraging mental inflexibility to generate an emotional response of denial. Sure, his points are correct but are constrained. Let’s remove 2 constraints and reevaluate:

1 - Babies learn much more with much less

2 - Video training data can be made in theory at incredible rates

The question becomes: why is the author focusing on approaches in AI investigated in like 2012? Does the author think SOTA is text only? Are OpenAI or other market leaders only focusing on text? Probably not.

Xelynega 2 years ago | |

Isn't 1 a point for their "skeptic" persona?

If babies learn much more from much less, isn't that evidence that the LLM approach isn't as efficient as whatever approach humans implement biologically, so it's likely LLM processes won't "scale to AGI"?

For video data, that's not how LLMs work (or any NNs for that matter). You have to train them on what you want them to look at, so if you want them to predict the next token of text given an input array, you need to train them on the input arrays and output tokens.

You can extract the data in the form you need from the video content, but presumably that's already been done for the most part, since video transcripts are likely included in the training data for gpt.

sgt101 2 years ago |

>Here’s one of the many astounding finds in Microsoft Research’s Sparks of AGI paper. They found that GPT-4 could write the LaTex code to draw a unicorn.

A lot of people have tried to replicate this; I have too. It's very hard to get GPT-4 to draw a unicorn, and asking it to draw an upside-down unicorn is even harder.

YetAnotherNick 2 years ago |

> ‘5 OOMs off’

I think Google, Microsoft and Facebook could easily have 5 OOMs more data than the entire public web combined if we just count text. The majority of people don't have any content on the public web except for personal photos. A minority has a few public social media posts, and it is rare for people to write a blog or research paper etc. And almost everyone has some content written in mail or docs or messaging.

HarHarVeryFunny 2 years ago | |

Maybe, and certainly with the current trend of synthetic data they can also create it, but I don't think quantity of data beyond what something like GPT-4 has been trained on will in and of itself change much other than reducing brittleness by providing coverage of remaining knowledge gaps.

Quality of data (which I believe is at least part of why synthetic data is being used) can perhaps make more of a difference and perhaps at least partly compensate in a crude way for these models' lack of outlier rejection and any generalization prediction-feedback loop. Just feed them consistent correct data in the first place.

nmca 2 years ago | |

From the article, and relevant here:

I’m worried that when people hear ‘5 OOMs off’, how they register it is, “Oh we have 5x less data than we need - we just need a couple of 2x improvements in data efficiency, and we’re golden”. After all, what’s a couple OOMs between friends?

No, 5 OOMs off means we have 100,000x less data than we need.
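The arithmetic behind that correction, as a small sketch. The helper name `oom_to_factor` is an assumption for illustration; the only input taken from the article is the figure of 5 OOMs.

```python
import math


def oom_to_factor(ooms: int) -> int:
    """Convert a gap in orders of magnitude (base 10) to a multiplicative factor."""
    return 10 ** ooms


# "5 OOMs" means five powers of ten, not a 5x gap.
actual_factor = oom_to_factor(5)
print(actual_factor)  # 100000

# How many successive 2x improvements would be needed to close that gap.
doublings_needed = math.ceil(math.log2(actual_factor))
print(doublings_needed)  # 17
```

So "a couple of 2x improvements" undershoots badly: closing a 100,000x gap by doubling alone takes about 17 doublings.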

FergusArgyll 2 years ago |

I love Dwarkesh, his podcast is phenomenal.

But every article/post of this kind immediately begs the question; What is AGI? I have yet to hear even a decent standard.

It always seems like I'm reading Greek philosophers struggling with things they have almost no understanding of and just throwing out the wildest theories.

Honestly, it raised my opinion of them seeing how hard it is to reason about things which we have no grasp of.

cavisne 2 years ago |

If the size of the internet is really a bottleneck it seems Google is in quite a strong position.

Assuming they have effectively a log of the internet, rather than counting the current state of the internet as usable data we should be thinking about the list of diffs that make up the internet.

Maybe this ends up like Millennium Management where a key differentiator is having access to deleted datasets.

jeremyjh 2 years ago | |

I'd guess at most they have 5x more data, but it is probably nowhere near that, and the article says 100,000x more data is needed.

nextworddev 2 years ago | |

True. That said, market structure changes so rapidly that old datasets aren’t that useful for most strategies.

lossolo 2 years ago |

A few more interesting papers not mentioned in the article:

"Faith and Fate: Limits of Transformers on Compositionality"

https://arxiv.org/abs/2305.18654

"Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks":

https://arxiv.org/abs/2311.09247

"Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve"

https://arxiv.org/abs/2309.13638

"Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models"

https://arxiv.org/abs/2311.00871

tw1984 2 years ago |

LLM is going to bring tons of cool applications, but AGI is not an application!

You can feed your dog 100,000 times a day, but that won't make it a 1,000kg dog. The whole idea that AGI can be achieved by predicting the next word is just pure marketing nonsense at best.

nextworddev 2 years ago |

I am in the believer camp for simple reasons: 1) we haven’t even scratched the surface of government led investments into AI, 2) AI itself could probably discover better architectures than transformers (willing to bet heavily on this)

diggan 2 years ago | |

> AI itself could probably discover better architectures than transformers (willing to bet heavily on this)

Are there any existing cases of LLMs coming up with novel, useful and, above all, better architectures? Either related to AI/ML itself or any other field.

jeremyjh 2 years ago | |

> AI itself could probably discover better architectures than transformers

The entire subject of the article is what it will take, and how likely it is, that an AI will ever be able to generate improvements like this.

bob1029 2 years ago |

I think the "self-play" path is where the scary-powerful AI solutions will emerge. This implies persistence of state and logic that lives external to the LLM. The language model is just one tool. AGI/ASI/whatever will be a system of tools, of which the LLM might be the least complicated one to worry about.

In my view, domain modeling, managing state, knowing when to transition between states, techniques for final decision making, consideration for the time domain, and prompt engineering are the real challenges.

lern_too_spel 2 years ago | |

It's not necessary for the author's purpose of providing more data. We're only training on one kind of input so far, text, from which these models have built some understanding of the world. Humans train on more inputs, and the data to provide those inputs for training a model is readily available, in far larger quantities than individual human brains consume. Data is not the issue.

Xelynega 2 years ago | | |

We're training on text because that's what we're making the model do.

It's a fact of neural networks that to train them supervised you need the training data in the expected input form (a vector of n thousand preceding tokens for LLMs) with the expected output (the next token for LLMs). "Training them on video" would mean converting the video to a format we can train the LLM with, then training the LLM with that info.
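The input/output shape described above can be sketched as follows. This is a toy illustration, not how any particular model is trained: the function name `next_token_pairs`, the integer token ids, and the tiny context length are assumptions, and real pipelines use a tokenizer and batch the examples.

```python
from typing import List, Tuple


def next_token_pairs(tokens: List[int],
                     context_len: int) -> List[Tuple[List[int], int]]:
    """Build (preceding tokens, next token) training pairs from one sequence."""
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most context_len tokens before position i as the input.
        start = max(0, i - context_len)
        pairs.append((tokens[start:i], tokens[i]))
    return pairs


# Made-up token ids standing in for a tokenized sentence or video transcript.
toy_tokens = [11, 42, 7, 99, 3]
for context, target in next_token_pairs(toy_tokens, context_len=3):
    print(context, "->", target)
```

Whatever the source (web text or a video transcript), it has to be reduced to sequences like `toy_tokens` before pairs of this form can be built.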

This would probably be a 1 OOM increase at maximum, if the video transcripts aren't already a part of the training data for gpt.

revskill 2 years ago |

No, humans are not intelligent enough to create a superintelligent bot in a short time.

My estimate is that a working "human-brain AI" is about 200 years in the future.

All ideas should be treated equally, not based on revenue metrics. If everyone could make a YouTube clone, the revenue should be divided equally among all of the creators; that's the way the world should move forward, instead of monopoly.

Everything will suck, forever.

LunicLynx 2 years ago |

Here is an idea. Maybe the most optimized neural network is the brain, in terms of computation-to-energy-consumption ratio. So essentially doing this in silicon is just pointless.

There must be a reason we can do so much while consuming so little, and then again struggling with other tasks.

What is the success if we build a machine that consumes just heaps of energy and then is as bad at maths as we are?

sweezyjeezy 2 years ago | |

There's a couple of false dichotomies here

- to say that because we're "more optimised" we must be the most optimised. Our brains are optimised well for certain things, sure, but computers are far more efficient at e.g. crunching numbers than we are

- to say that there's no success in a machine that can't currently beat us at math - this year has already proven that false

daepyonim 2 years ago | | |

That google paper really gave a bunch of idiots a whole bunch of ammunition. The four color theorem was proven by a machine long ago and it was worthless, about as worthless as what funsearch did!

andyg_blog 2 years ago |

Really stellar, well-sourced article that comes across as unbiased as possible. I especially enjoyed the almost-throw-away link to "the bitter lesson" near the end, the gist of which is: "Methods that leverage massive compute to capture intrinsic complexity always outperform humans' attempts to encode that complexity by hand"

nemo44x 2 years ago |

Where in the hype cycle are we for LLMs? Are we in the late stages of the rise or over the peak and beginning the slide?

collaborative 2 years ago | |

LLMs are still too expensive to run and therefore can't be supported by ads. If costs get lower we'll see them being pushed _a lot_ more

kevindamm 2 years ago | |

If you have the answer to that question you could make some very lucrative investments.

crowbahr 2 years ago | |

Still on the climb imo

mewpmewp2 2 years ago |

Why wouldn't you include the "LLM" part in the title?

Hint for everyone else here:

It's about scaling LLMs.

mewpmewp2 2 years ago | |

After reading the article, I really enjoyed it and the believer + skeptic perspective. However, I only touched it because I thought "meh, what is there going to be about web scaling".

HarHarVeryFunny 2 years ago |

I'm not sure how one can percentage-wise compare scaling and algorithmic advances - per Dwarkesh's prediction that "70% scaling + 30% algorithmic advance" will get us to AGI ?!

I think a clearer answer is that scaling alone will certainly NOT get us to AGI. There are some things that are just architecturally missing from current LLMs, and no amount of scaling or data cleaning or emergence will make them magically appear.

Some obvious architectural features from the top of my list would include:

1) Some sort of planning ahead (cf tree of thought rollouts) which could be implemented in a variety of ways. A simple single-pass feed forward architecture, even a sophisticated one like a transformer, isn't enough. In humans this might be accomplished by some combination of short term memory and the thalamo-cortical feedback loop - iterating on one's perception/reaction to something before "drawing conclusions" (i.e. making predictions) based on it.

2) Online/continual learning so that the model/AGI can learn from its prediction mistakes via feedback from their consequences, even if that is initially limited to conversational feedback in a ChatGPT setting. To get closer to human-level AGI the model would really need some type of embodiment (either robotic or in a physical-simulation virtual world) so that its actions and feedback go beyond a world of words and let it learn via experimentation how the real world works and responds. You really don't understand the world unless you can touch/poke/feel it, see it, hear it, smell it etc. Reading about it in a book/training set isn't the same.

I think any AGI would also benefit from a real short term memory that can be updated and referred to continuously, although "recalculating" it on each token in a long context window does kind of work. In an LLM-based AGI this could just be an internal context, separate from the input context, but otherwise updated and addressed in the same way via attention.

It depends too on what one means by AGI - is this implicitly human-like (not just human-level) AGI ? If so then it seems there are a host of other missing features too. Can we really call something AGI if it's missing animal capabilities such as emotion and empathy (roughly = predicting others' emotions, based on having learnt how we would feel in similar circumstances)? You can have some type of intelligence without emotion, but that intelligence won't extend to fully understanding humans and animals, and therefore being able to interact with them in a way we'd consider intelligent and natural.

Really we're still a long way from this type of human-like intelligence. What we've got via pre-trained LLMs is more like IBM Watson on steroids - an expert system that would do well on Jeopardy and increasingly well on IQ or SAT tests, and can fool people into thinking it's smarter and more human-like than it really is, just as much simpler systems like Eliza could. The Turing test of "can it fool a human" (in a limited Q&A setting) really doesn't indicate any deeper capability than exactly that ability. It's no indication of intelligence.

xbar 2 years ago |

Yes, for some things.