Sparks of Artificial General Intelligence: Early Experiments with GPT-4

Sparks of Artificial General Intelligence: Early Experiments with GPT-4 (arxiv.org)

180 points by thinxer 3 years ago | 236 comments

outlace 3 years ago |

ChatGPT and its relatives are very very impressive on first impressions, but I've been using ChatGPT-3 and now 4 heavily every day since they became available to individuals and once you start using them this much it becomes very clear how NOT intelligent they are. It really just seems like extremely impressive statistical inference after this much use and finding so many failure modes. But it is still impressive how much of human intellectual endeavors can be captured by sophisticated statistical inference. Very useful technology nonetheless.

throwaway4837 3 years ago | |

Simple thought experiment: if I was a high school teacher and graded hundreds of exams, papers, and homework every month for years, would I develop the same opinion of my C-level students? The ones who have somewhat correct work, often inconsistent, clear that they're not thinking about the material, etc.

Humans all start out like this, but as they grow they develop different masteries. Many of those masteries are easy to learn and shared across all humans, like how to walk or tie your shoes. Many are domain-specific, like how to perform heart surgery or solve complex math problems.

I don't think AGI is a well-defined term based on the way it's thrown around, but sure I don't think it's "AGI". I do know it's going to be very useful technology, especially multimodal GPT-4 and beyond. I think our interface with AI is going to be multimodal.

Suddenly Google Glass could be an interesting product again. Glasses would stream audio/video to your phone, you'd be able to ask questions about things you're seeing, and it would display the answers on the screen or dictate them back into your earbuds. I like the idea of having a travel companion, another brain at my fingertips. Sort of like what the mobile phone did for us, but a lot less work for the user. I'm getting impatient waiting for this GPT-4 API access!

euroderf 3 years ago | | |

So is Apple negotiating with some LLM mfr to integrate with realityOS ?

steve_adams_86 3 years ago | |

This has been my take as well. It's without a doubt at the point where it's useful and I'm very interested to experiment with it and see how much utility I can get out of it. At the same time, as much as I realize skill with prompting is essential, a lot of my time is spent programming automations around using those prompts. A lot of thought and care goes into it, and the failure rates of what I'm building are dramatically higher than anything I build without AI.

It's like I'm convincing AI to do a "smart" thing (which is really cool, don't get me wrong), but the AI truly has no idea what it's supposed to be doing. Sometimes even very direct, specific, concise prompting will yield totally abysmal responses.

I'm certainly not putting it down either though. It's unprecedented and it marks a genuinely important shift in the industry that this is possible at all. It just isn't all that smart when it comes to intents and actually knowing what's going on.

waboremo 3 years ago | |

That is pretty much the case, but I'm always taken aback by how many people believe intelligence = directly replicating human thought. I thought the common consensus was that (artificial) intelligence was instead about mimicking enough of the process to provide the outcome. Largely because it's impossible to replicate something when we don't entirely know how it works. Major discoveries of basic aspects of the mind are still happening regularly.

textninja 3 years ago | | |

> Largely because it's impossible to replicate something when we don't entirely know how it works.

On the contrary. These are mostly orthogonal.

killerstorm 3 years ago | |

> how NOT intelligent they are

In what sense? How do you define intelligence and what is missing?

> finding so many failure modes.

Failures do not indicate a lack of intelligence. People make errors all the time.

We should differentiate failures which arise from poor training from those which arise from a lack of general capabilities.

There are people who learned a bad way to do something and struggle to change that. Should we consider them unintelligent?

outlace 3 years ago | | |

Here's an example of it failing to understand some very simple Python code that any human with basic programming knowledge would not mess up:

------

Me: What is going to be the output of this code when `x = 0` and when `x = 900`:

```python
x = 0
for i in range(100):
    if x > 50:
        x += 1
    else:
        x += 5
    if x == 90:
        print("X is not 90")
    else:
        pass
```

GPT-4: When x = 0, the code will execute as follows:

The initial value of x is 0. The for loop iterates 100 times. During the first 11 iterations, since x is not greater than 50, it will increment by 5 each time. After the 11th iteration, the value of x will be 55, which is greater than 50. For the remaining 89 iterations, x will increment by 1 each time. The final value of x after 100 iterations will be 55 + 89 = 144. During these iterations, the value of x never becomes 90, so the print statement is never executed. When x = 900, the code will execute as follows:

The initial value of x is 900. The for loop iterates 100 times. Since x is already greater than 50, it will increment by 1 each time. The final value of x after 100 iterations will be 1000. During these iterations, the value of x never becomes 90, so the print statement is never executed. In both cases, the output of the code will be empty, as the print statement is never executed.

--------

That is not correct: when x = 0 it will print "X is not 90". I basically created some simple code that is very unlikely to exist in its training data, because it doesn't do anything useful, and it cannot work through the code's execution. A large part of the reason that these models seem so impressive is that they were trained on almost everything that exists on the internet, so probably any question you ask it has already been asked somewhere, and any useful code you present to it or ask it to generate is included in its training data, or at least something close enough.

So, when I go out of my way to create simple tasks or questions that are highly unlikely to exist in its training data, yet a human could easily handle, it often fails.
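
For anyone who wants to check this rather than take either side's word for it, here is a minimal sketch that wraps the snippet above in a function and collects whatever would have been printed. The name `run_loop` and the returned list are my additions for illustration; the loop body is the one from the prompt.

```python
def run_loop(start):
    """Run the loop from the prompt; return the final x and the lines it would print."""
    x = start
    printed = []
    for i in range(100):
        if x > 50:
            x += 1
        else:
            x += 5
        if x == 90:
            # The original prints here; we record the line instead.
            printed.append("X is not 90")
    return x, printed


for start in (0, 900):
    final_x, printed = run_loop(start)
    print(f"start={start}: final x={final_x}, printed={printed}")
```

Starting from 0, x climbs by 5 up to 55 and then by 1 up to 144, so it passes through 90 exactly once and the message is printed. Starting from 900 it ends at 1000 and never equals 90.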

galleywest200 3 years ago | | |

We are talking about a computer program that cannot make decisions. Often times human intelligence is compared with the types of decisions the person makes.

maxdoop 3 years ago | |

Out of curiosity, what is GPT-4 getting wrong so often? It’s pretty wild to my own, admittedly easily impressed, mind.

parineum 3 years ago | | |

I've been trying to make it be a DM for a while now. It works seemingly well for a while but it's making everything up on the fly, which is fine as long as there's no state that needs to be maintained/remembered. Once you start trying to get it to do combat, it seems like it's working pretty well but eventually you realize that it's just making shit up behind the scenes. It'll tell you there are 4 goblins ambushing you and then you can ask it how far away they all are and it'll list three. You ask what happened to the fourth one and it'll say "I'm sorry, there were only 3. I was mistaken before."

If it lists the properties of an item that you might want to buy in a shop, you can ask it to describe it twice and it'll describe two completely different items.

It's really cool and it's pretty (seemingly) creative but it can't actually run a game for you. You can have it as an assistant DM though, that works pretty well. You can have it write a story for you ahead of time and then keep it around during the game to ask it to elaborate on things you didn't anticipate on the fly. Like, "generate DC tiers for a level 3 party investigating strange writing on a wall" will give you a good breakdown and some results that you'll have to bend to be consistent in your adventure but it's pretty helpful.

That's probably not really necessary if you're an experienced DM but the DM for the group I play with is pretty new (as are we all) so it's been really cool to have it around. It's also pretty good at answering questions we have, but its confidence when it's wrong means it's not that helpful really, because we still have to check it.

steve_adams_86 3 years ago | | |

In my experience it has been failing to adhere to response formats. I can tell it to respond with answers in one of two specific formats, but the success rate is sometimes very low depending on the content of the inquiry it's supposed to respond to. Not sure why, it just seems to lose track of the response format.

I've been thinking maybe that's just the wrong way to approach it, yet it does work sometimes; it seems like it shouldn't be hard for the model to know how to respond when told explicitly how to do so.

It could be problems with my formatting, but the fact that it does work a lot of the time suggests it's something else.
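
A rough sketch of the kind of automation this usually ends up needing: check each reply against the allowed formats and ask again when it doesn't match. Everything named here is hypothetical. The two formats (`ANSWER: ...` and `UNKNOWN: ...`), the `ask_model` callable, and the fake model are invented for illustration, since the actual formats and API in use aren't stated above.

```python
import re

# Hypothetical formats, invented for this sketch.
ALLOWED_FORMATS = (
    re.compile(r"ANSWER: .+", re.DOTALL),
    re.compile(r"UNKNOWN: .+", re.DOTALL),
)


def is_well_formed(reply):
    """True if the whole reply matches one of the allowed formats."""
    text = reply.strip()
    return any(pattern.fullmatch(text) for pattern in ALLOWED_FORMATS)


def ask_with_retries(ask_model, prompt, max_attempts=3):
    """Call ask_model until a reply matches an allowed format, or give up."""
    for _ in range(max_attempts):
        reply = ask_model(prompt)
        if is_well_formed(reply):
            return reply.strip()
    return None


def fake_model_factory(replies):
    """Stand-in for a real model call so the sketch runs on its own."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


model = fake_model_factory(["Sure! The answer is 4.", "ANSWER: 4"])
print(ask_with_retries(model, "What is 2 + 2?"))
```

This doesn't make the model any better at following the format; it only keeps malformed replies from leaking into whatever consumes them, at the cost of extra calls.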

tedunangst 3 years ago | | |

Any variant of a "surprising" logic puzzle forces it to latch onto the surprising answer. Like whether two pounds of iron weighs more than one pound of feathers. Or any objects. It "expects" the twist, and always answers accordingly. It does so even if you change up the objects to be less tricky.

> Which is heavier, a pound of marbles or two pounds of corn?

Both weigh the same amount, which is a total of two pounds. This might seem counterintuitive at first, but it's important to remember that the pound is a unit of weight or mass, and it always refers to the same amount regardless of what is being weighed. In this case, one pound of marbles plus two pounds of corn equals a total of three pounds, whereas two pounds of corn plus one pound of marbles also equals a total of three pounds. Therefore, both weigh the same amount of two pounds.

yosito 3 years ago | | |

GPT is really good at repeating what the average intelligent response to something might look like, but it doesn't seem to be actually reasoning about any of its responses. Give it a complex logical problem that it needs to deduce from inputs, such as which foods contain gluten, based on their ingredient lists, and it will reliably fail. As a person with celiac, this is a task I complete multiple times a day with no effort. Just today I was trying to build a prompt that would summarize daily news updates leaving out anything about Russia, but it still included Russia more often than not despite being very clear in the prompt that anything about Russia should not be included in the response under any circumstances.

bitL 3 years ago | |

LLMs without any online storage can be at best convincing liars. Combining them together with an actual retrieval/QA system (e.g. by first fetching exact answer via a QA model and then reformulating output via GPT) could start feeling pretty real quickly.

nopinsight 3 years ago | |

It depends on 1) the domains 2) your comparison group.

On 2), many software engineers and computer scientists compare these language models' logic and creative problem solving abilities with themselves and their peer group. But they are usually 1-2+ SD above average humans at these things.

(Note: Someone gave GPT-4 an IQ test and the result was 96, slightly below the average of reference human group at 100. The SD of an IQ test is 15 or 16.)
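
To put the 96 in SD terms, here is a small sketch that converts the score to a z-score and a percentile. It assumes scores are normally distributed around 100, which is how IQ scales are conventionally normed; the helper name `iq_percentile` is made up for this example.

```python
from statistics import NormalDist


def iq_percentile(score, mean=100.0, sd=15.0):
    """Percentile of a score under a normal model with the given mean and SD (an assumption)."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(score)


for sd in (15.0, 16.0):
    z = (96 - 100) / sd
    print(f"SD={sd}: z={z:.2f}, percentile={iq_percentile(96, sd=sd):.1f}")
```

Either way the score sits about a quarter of an SD below the mean, roughly the 40th percentile, while someone 1-2 SD above the mean would be around the 84th to 98th.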

For language-focused domains, there is evidence that GPT-4 is already better than most humans, e.g. 99th percentile at GRE Verbal, and it beat humans at a fairly novel puzzle like Twofer Goofer, which is not in its training set.

Ref: GPT-4 Beats Humans at Hard Rhyme-based Riddles https://twofergoofer.com/blog/gpt-4

Yes, GPT-4 is not an AGI yet, but the research paper (OP) has a point.

cjbprime 3 years ago | | |

> Yes, GPT-4 is not an AGI yet, but the research paper (OP) has a point.

How did you go from "human-level IQ with some super-human abilities" to "not an AGI"?

gwern 3 years ago | | |

Those rhyme riddles are pretty impressive. It may not truly understand rhymes due to BPEs, but I guess it can go a long way with an immense vocab, perfect recall, and memorization of similar-sounding words to beat ordinary human players who aren't scoring 800 SAT-Vs...

chrischen 3 years ago | |

It’s almost as if someone who can piece words together well enough can make people think they are smarter than they actually are.

ggm 3 years ago | | |

It's worked for me these last 60 years. I think P T Barnum had something to say about that.

random_cynic 3 years ago | |

No one cares about your anecdotes and feelings. These people conducted thousands of systematic tests for at least six months. If you want to be taken seriously then do the same.

neilellis 3 years ago |

[This is in reply to the comments not the article!]

"It's just a statistical model" is the logical equivalent of "human beings are just a bunch of atoms".

The amount of reductionist thinking that goes on in tech is hilarious. First define AGI then challenge an AI to meet those requirements. If it meets them it is AGI. Put aside your preconceptions of what technology you think is required to achieve the goals and stay empirical.

Note previous definitions of AI have been thrown away as AI passes through them one by one :-)

What goes on inside its 'head' is irrelevant. We still don't know what actually goes on inside our heads and we were damn sure we were intelligent long before we had a clue how our heads worked at all.

Also sentience != AGI. We can't even agree what sentience is in humans and other living beings so I'd stay clear of that one for now :-)

_gabe_ 3 years ago |

> Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.

But it's just statistics, a fancy text predictor, a Markov-chain. Surely these scientists that work in the field of AI and are intimately familiar with how this stuff works aren't so stupid as to think emergent behavior potentially resembling intelligence could result from such simple systems? It's just statistics after all. Given enough training, any neural net could guess the next best token. It trained off all of Google after all. It's just looking up the answers. No hint of intelligence. Just a mindless machine. After all, the saying goes, "If it walks like a duck and quacks like a duck, it must be a mindless machine that has no bearing on a duck whatsoever". /s

raincole 3 years ago |

My prediction for the top comments of this thread (paraphrased)

1. It's just Microsoft's advertisement

2. No it's just a very effective pattern matching algorithm

3. Please define intelligence first otherwise it's nonsense

4. I welcome our machine overlord

5. Lmao I asked it to do $thing and it failed

I'd like to know if GPT-4 can predict the top comments of this thread?

BillyTheKing 3 years ago | |

from gpt4

write a critical hackernews thread response (not long, maybe like 100 words) for a thread with the following headline

Microsoft Research: GPT-4 exhibits “sparks of general intelligence”

While it's certainly intriguing to hear Microsoft Research claim that GPT-4 exhibits "sparks of general intelligence," I remain skeptical. We must remember that these language models are primarily built on pattern recognition and extensive training data. Even if GPT-4 seems to provide more advanced responses, it's still ultimately constrained by the data it was trained on. In my opinion, we shouldn't mistake this impressive mimicry of human-like behavior for genuine intelligence. Rather, we should focus on creating truly innovative AI systems that can learn, reason, and adapt like humans.

sounds about right!

atleastoptimal 3 years ago | | |

Enters rousing speech territory near the end

cowl 3 years ago | |

So you predict the top comments for a claim would be:

1. Dismissal

2. Trivialism

3. Non Well Formed Claim

4. I accept the claim

5. Disprove by counter example

Are you sure you have not forgotten any tactic of debate to include in your prediction?

I predict that your prediction will probably result in these actions:

1. upvoted

2. downvoted

IanCal 3 years ago | | |

6. Discussion around the idea that adds something of value.

HopenHeyHi 3 years ago | | |

6. Meta comment for karma whoring

7. Like 6, but funnier

A. Joke thread pile on

B. Reprimands from humorless

C. Dejected mods having to clean it all up

minihat 3 years ago | |

"Did you know that squirrels can purr like cats? GPT-4 might be intelligent, but can it teach a squirrel to code?"

I asked gpt-4 for a comment no hn user would ever make on this article. Now you need a category for stubbornly original comments determined not to be numbered.

hislaziness 3 years ago |

For whatever reason we seem to have set a very high expectation from AI as compared to NI (Natural Intelligence). I remember reading "There is a considerable overlap between the intelligence of the smartest bears and the dumbest tourists."

ccozan 3 years ago | |

An AI that is just a chat or some LLM is not going to be too relevant for human life (thanks, I can google stuff or ask a friend; also writing poems is just fun, but not of any use).

But where are my damn robots that I can assign tasks to and that do them reliably (clean the garden, go get this list of groceries, or just look in the damn fridge and go buy what is missing, and so on)?

Then AI is useful.

jhugo 3 years ago | | |

> But where are my damn robots that I can assign tasks to and that do them reliably (clean the garden, go get this list of groceries, or just look in the damn fridge and go buy what is missing, and so on)?

The amusing/terrifying thing is that AI is probably going to replace a bunch of people's jobs before it can take away any of that menial labour from them.

worrycue 3 years ago | |

We get our expectations from fiction. AIs in shows like Star Trek are precise and accurate - the perfect complement to the unreliability of humans. That’s what we want.

pzo 3 years ago | | |

That's true, and it's a high bar because it seems many people would expect AI to be at least as smart as the smartest human who ever lived.

However, if the AI is as smart as the dumbest human, or a human with a mental disability, would we then consider those humans not intelligent at all, or no longer qualifying as Homo sapiens?

If AI can be as good as even a 'dumb' human, it's already a big achievement, because it can still provide some value and because AI can be scaled, so you can still have billions of dumb AIs. Already millions of users are interacting with ChatGPT daily.

highduc 3 years ago | | |

You'd think, but most humans would rather have "someone" who's lying to them in a very pleasant manner. People don't like objective truth, they go to great lengths to avoid it.

acidioxide 3 years ago | |

well, even the dumbest intelligence that is, in fact, just a computer, has a great potential. You cannot scale humans horizontally nor vertically :^)

TMWNN 3 years ago |

As a non-expert in the field I was hesitant at the time to disagree with the legions of experts who last year denounced Blake Lemoine and his claims. I know enough to know, though, of the AI effect <https://en.wikipedia.org/wiki/AI_effect>, a longstanding tradition/bad habit of advances being dismissed by those in the field itself as "not real AI". Anyone, expert or not, in 1950, 1960, or even 1970 who was told that before the turn of the century a computer would defeat the world chess champion would conclude that said feat must have come as part of a breakthrough in AGI. Same if told that by 2015 many people would have in their homes, and carry around in their pockets, devices that can respond to spoken queries on a variety of topics.

To put another way, I was hesitant to be as self-assuredly certain about how to define consciousness, intelligence, and sentience—and what it takes for them to emerge—as the experts who denounced Lemoine. The recent GPT breakthroughs have made me more so.

I found this recent Sabine Hossenfelder video interesting. <https://www.youtube.com/watch?v=cP5zGh2fui0>

hislaziness 3 years ago |

I remember reading this somewhere - "There is a considerable overlap between the intelligence of the smartest bears and the dumbest tourists.". Though I do not think GPT-4 is even close to AGI it can definitely claim to be better at faking it than many intelligent beings can.

wskish 3 years ago | |

I heard that quote in the context of the difficulty of designing bear-resistant trash bins.

zdragnar 3 years ago | | |

Watching adults struggle when encountering baby gates and other child proofing mechanisms for the first time is similarly amusing.

The difference between real intelligence and current attempts at artificial intelligence thus seems to be fundamentally the mode of learning, and thus understanding, rather than the raw knowledge and inference capability.

Or not. Nobody knows I'm actually a dog on the internet, after all.

DontchaKnowit 3 years ago | |

so strange. I heard that saying for the first time yesterday in a youtube video comment section. Weird I've seen it twice in less than 24 hours

pixl97 3 years ago | | |

The phenomenon you are referring to is called the "Baader-Meinhof phenomenon," also known as the "frequency illusion" or "recency illusion." It occurs when you learn something new, such as a word, a concept, or an idea, and suddenly start noticing it everywhere. This effect is believed to be the result of a combination of selective attention and confirmation bias, which make you more likely to notice and remember things that are relevant to your recent experience or interests.

ftxbro 3 years ago | |

so we are at the snapshot in time where people think 'AI is smarter than many people but not even close to being as smart as me'

IIAOPSW 3 years ago | | |

You say that to mock the supposed arrogance, but unless you are at the bottom of the bell curve per se, there really is a point in history where as a matter of fact the AI is smarter than many of them but not close to being as smart as you.

hislaziness 3 years ago | | |

It does not have to be "me". My point is we seem to have a different benchmark for Natural Intelligence vs Artificial Intelligence.

IanCal 3 years ago |

I'm increasingly convinced you can build an agi system with gpt4.

People are trying to get it to solve everything up front but I've had GPT3 do much better by taking it through a problem asking it questions. Then I realised it was good at asking those questions too so just hooked it up to talk to itself with different roles. Gpt4 seems much better overall and is very good at using tools if you just tell it how and what it has available.

With a better setup than reAct, better memory storage and recall, I think it'd be an agi. I'm not hugely convinced it isn't anyway - it's better than most people at most tasks I've thrown at it.

Oh, and gpt came up with better roles for the "voices in the head" than I did too.

reidjs 3 years ago |

Well of course Microsoft is going to say something sensational about it, aren’t they in charge of the project somewhat? This is just an advertisement for them, by them.

csdvrx 3 years ago | |

IDK, but Microsoft seems to be now what Google was many, many years ago: a company creating tech I like to use, such as Bing, Edge, Windows Terminal, VSCode, etc. Their Surface hardware is nice too (even if I prefer ThinkPads).

Oh and they're also helping with the Linux kernel. Why can't old people let go? Companies aren't people - they respond to market incentives.

Yes, Microsoft did bad stuff in the 1990s, but now they're doing good stuff I like and TBH I'm way more afraid of google.

dns_snek 3 years ago | | |

Not to detract from your overall point, but has Microsoft really done anything innovative when it comes to Edge, aside from painting over the Chromium skin?

The only noticeable difference that I've observed is its integration with Bing.

KeplerBoy 3 years ago | | |

you like to use stuff like the windows terminal and edge?

both are passable, but nothing to write home about, are they?

atleastoptimal 3 years ago |

To me it's really crazy that there is a public UI (ChatGPT) that lets people use GPT-4. If OpenAI had the attitude of Google they would have just gone "Yeah we created a language model that's light years ahead of anything else, look how cool it is, but sorry due to public safety you will never get to use it. Bye now!" I feel that the public accessibility of these large language models is a fluke. Being able to use it for almost free feels like cheating reality.

jpeter 3 years ago | |

I think they learned their lesson after Dalle-Mini and Stable Diffusion killed the interest in Dalle2.

beoberha 3 years ago |

I know enough about how neural nets work to be absolutely blown away at how good the GPT models are. I only skimmed the paper, but even ChatGPT showed a lot of these “sparks”, IMO. We are certainly a long way off from any semblance of general intelligence, but for a model that just tries to predict the next word, I’m dumbfounded at how good it is.

otabdeveloper4 3 years ago | |

Maybe the words we write aren't as smart as we think.

I mean, The Akinator can read your thoughts and that thing hasn't even graduated to a neural network from "a bunch of if/then statements".

crooked-v 3 years ago | |

> We are certainly a long way off from any semblance of general intelligence

Part of me is starting to think that the only thing we're really missing at this point to start seeing that is to have one of these models that can modify itself with its output and thereby have a mechanism to 'learn' or 'remember' things.

shivekkhurana 3 years ago | |

"Long" might not be a long time as humans perceive it.

Human perception of time is linear. That doesn't apply to LLMs.

gwoolhurme 3 years ago |

From the intro: "we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."

What does that mean? If we take it as fact, so if it is an early version of AGI, Microsoft is using this thing to push subscriptions to all their services? This thing that is potentially the greatest thing humanity has made, an artificial living thing, and it's used to sell CoPilot and 365 subscriptions. Paint me as really sad then. Instead of sharing the research with other entities, or anything that could further help or push us... we get subscriptions? Fuck me, the future sucks.

throw310822 3 years ago | |

It's a product, and it's not that far away from that of their competitors. And there are a lot. Just a few weeks ago, Yann LeCun said that llms are not particularly interesting or innovative from the research point of view.

gwoolhurme 3 years ago | | |

Oh I personally agree, I am just following this article to its logical conclusion. IF it is even the start of an AGI, it's just used as a product? Ouch... It's literally the meme from Rick and Morty with the butter-passing robot.

js8 3 years ago |

I don't accept that something is AGI unless it can solve general instances of SAT (satisfiability problem, not the school test). Also recognizing (formulating from the task) an instance in the first place would help too.

To me, these are hallmarks of reason, and not available in LLMs, in fact probably impossible just with pattern recognition.
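
To make "general instances of SAT" concrete, here is a minimal sketch of the problem as exhaustive search over truth assignments. This is only an illustration of what is being asked: it takes exponential time and is not how practical solvers work. The function name `solve_sat` and the clause encoding (lists of signed integers, in the style of DIMACS literals) are choices made for this example.

```python
from itertools import product


def solve_sat(clauses, num_vars):
    """Brute-force SAT over CNF clauses.

    Each clause is a list of non-zero ints: k means variable k is true,
    -k means variable k is false. Returns a satisfying assignment as a
    dict {variable: bool}, or None if the formula is unsatisfiable.
    """
    for bits in product([False, True], repeat=num_vars):
        # A clause is satisfied when at least one of its literals holds.
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            return dict(enumerate(bits, start=1))
    return None


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(solve_sat([[1, 2], [-1, 3], [-2, -3]], 3))
# x1 and (not x1) has no satisfying assignment
print(solve_sat([[1], [-1]], 1))
```

With n variables the loop visits up to 2^n assignments, which is why "can it solve SAT" quickly becomes a question about instance size rather than a yes/no capability.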

pillefitz 3 years ago | |

Can humans solve more than the most trivial SAT problems? Keep in mind, AGI does not imply superhuman intelligence.

sterlind 3 years ago | |

can you solve general instances of SAT?

can the average person?

js8 3 years ago | | |

With enough patience, yes.

For example: You have a goat, a wolf, a cabbage and you want to cross a river...
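
For reference, the puzzle can be solved mechanically with a breadth-first search over bank positions. Note this is a plain state-space search, not a SAT encoding, and it assumes the classic rules, which aren't spelled out above: the boat holds the farmer plus at most one item, the wolf can't be left alone with the goat, and the goat can't be left alone with the cabbage. All names are made up for this sketch.

```python
from collections import deque

ITEMS = ("goat", "wolf", "cabbage")


def is_safe(state):
    """state = (farmer, goat, wolf, cabbage), each 0 (start bank) or 1 (far bank)."""
    farmer, goat, wolf, cabbage = state
    if goat == wolf and farmer != goat:
        return False  # wolf eats goat (assumed classic rule)
    if goat == cabbage and farmer != goat:
        return False  # goat eats cabbage (assumed classic rule)
    return True


def solve_river_crossing():
    """Return a shortest list of cargo choices, one per crossing, or None."""
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        # The farmer crosses alone (None) or with one item from his own bank.
        for cargo in (None, *ITEMS):
            nxt = list(state)
            nxt[0] = 1 - farmer
            if cargo is not None:
                idx = ITEMS.index(cargo) + 1
                if state[idx] != farmer:
                    continue  # item is on the other bank
                nxt[idx] = 1 - farmer
            nxt = tuple(nxt)
            if is_safe(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cargo or "nothing"]))
    return None


for step, cargo in enumerate(solve_river_crossing(), start=1):
    print(f"{step}. farmer crosses with {cargo}")
```

The search space is only 16 states, so patience really is all it takes here; the open question in this subthread is whether that carries over to instances nobody has seen before.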

antibasilisk 3 years ago | | |

Del Spooner getting offended panel

siva7 3 years ago | |

That's fine. It's maybe not AGI IQ 180 to meet your demands but if we're honest it's pretty close.

js8 3 years ago | | |

Feynman had a really nice story about how he was into puzzles when he was at Princeton. It took him a while to solve the new ones, but eventually he learned all the well-known instances so he could answer instantly. It made him a genius in other people's eyes.

All I want from AGI is to demonstrate that it can solve straightforward logic problems (puzzles, if you will) that it provably didn't see before. Or at least recognize that it is being indirectly given such a task. So far, evidence suggests it is not capable of that.

notShabu 3 years ago |

You know the "you pass butter" scene from Rick & Morty?

I'm imagining humans being told "you complete thought sentences"

adt 3 years ago | |

Under-rated comment...

doctoboggan 3 years ago |

Does anyone have insight into the GPT-4 model itself? What is the parameter count? Training procedure? I know "Open"AI hasn't released this data but I was hoping someone with inside knowledge would have leaked it by now.

PaulDavisThe1st 3 years ago |

I can't help but hear the distant, very very quiet echo of Clever Hans.

skybrian 3 years ago | |

It would be interesting to figure out how Clever Hans does it, though. Don’t you want to know the tricks? Even when it’s a cheat, it might be a clever one.

For example, researchers eventually figured out that image recognition algorithms pay attention to textures.

ly3xqhl8g9 3 years ago | | |

Apparently the horse 'knew' the right answer by inferring from the questioner's behaviour: "Pfungst (the debunker) then examined the behaviour of the questioner in detail, and showed that as the horse's taps approached the right answer, the questioner's posture and facial expression changed in ways that were consistent with an increase in tension, which was released when the horse made the final, correct tap. This provided a cue that the horse could use to tell it to stop tapping." [1] However, there are gene regulatory networks that can actually count up to 3, with the mechanism of counting up to 2 being curiously different than the one for counting up to 3. [2]

"Every intelligence test is also a test of the questioner" [3]: we don't regard a simple liver cell as intelligent, yet it performs a complex task in a large problem space.

[1] https://en.wikipedia.org/wiki/Clever_Hans#:~:text=Pfungst%20....

[2] 2013, Malte Lehmann, "Genetic Regulatory Networks that count to 3", https://pubmed.ncbi.nlm.nih.gov/23567648

[3] Michael Levin, "Bioelectric Networks: Taming the Collective Intelligence of Cells for Regenerative Medicine", https://www.youtube.com/watch?v=41b254BcMJM

goatlover 3 years ago | |

Hans was a cyborg sent from the future to test humanity's gullibility.

lsy 3 years ago |

This is a pretty fluffy paper, especially for an institution like Microsoft Research. It says it's an "early AGI" in the abstract, but elsewhere says it's merely a "step towards AGI". The basis for this is asking ChatGPT a bunch of stuff, but they don't really present an overarching framework for what questions to ask or why.

The paper makes outlandish claims like "GPT-4 has common sense grounding" on the basis of its answers to these questions, but the questions don't show that the model has common sense or grounding. One of their constructed questions involves prompting the model with the equator's exact length—"precisely 24,901 miles"—and then being astonished that the model predicts that you're on the equator ("Equator" being the first result on Wikipedia for the search term "24,901"). It's also the case that while GPT-4 can say a bear at the north pole is "white", it has no way of knowing what "white", or "bear", or "north" actually represent.

Are there folks out there doing rigorous research on these topics, who have a framework for developing tests of actual understanding?

cjbprime 3 years ago | |

> It's also the case that while GPT-4 can say a bear at the north pole is "white", it has no way of knowing what "white", or "bear", or "north" actually represent.

This is a preposterous claim that you could easily disprove within a few minutes of using it.

GaggiX 3 years ago | |

>it has no way of knowing what "white", or "bear", or "north" actually represent.

What does it mean to know what "white", "bear" or "north" actually represent?

georgehill 3 years ago |

> Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.

I don't know why, but my brain refuses to accept GPT-4 as something close to AGI. Maybe I am wrong. It is hard to believe that our brain is just a bunch of attention layers and neural nets.

nycdatasci 3 years ago |

Can we agree that an "early (yet still incomplete) version of AGI" isn't AGI?

ftxbro 3 years ago | |

To me it's clear that they think it's AGI, and that they think AGI is such a loaded concept that it's not worth "announcing" it.

Beliefs are changing so fast right now. The term "AGI skeptic" will soon (if not already) mean "I don't trust AGIs in positions of authority or power" rather than "I don't think the technology is capable of matching our level of cognition."

famouswaffles 3 years ago | | |

If you think AGI is artificial, and generally intelligent then yeah it's AGI 100% but some people have such loaded expectations of AGI that a significant chunk of the human population wouldn't even pass lol.

dragonwriter 3 years ago | |

I hope we can agree that “not completely X” is “not X”.

coldtea 3 years ago | | |

Why would we agree on that? Anything "not completely X" surely has the attribute of approaching Xness, else it would just be "not X".

pixl97 3 years ago | | |

So if you get your finger cut off, or you go blind, you're not human any more?

templeosenjoyer 3 years ago |

Unless they somehow cured GPT-3's schizophrenia and this model is a significant upgrade I'm not buying it - no matter how good it is at proving trivial mathematics theorems in the style of Eliot or whoever. Too often I have dealt with "The answer to your question is X. Oh, sorry, you are right, the answer is actually Y. Oh, it is good of you to ask for a proof, sure I can prove the answer is Y, I used this (hallucinated) method described in this (hallucinated) paper. Oh, sorry, you are right, I cannot find any evidence that the method and paper I mentioned earlier actually exist, oops!".

ilitirit 3 years ago |

GPT AI systems remind me of the Chinese Room thought experiment:

https://en.wikipedia.org/wiki/Chinese_room

This is also similar to the Duck Test:

https://en.wikipedia.org/wiki/Duck_test

Depending on the context, there are generally two takes: "It is (or is not) a duck", and "It doesn't (or does) matter whether or not it's a duck". These aren't mutually exclusive.

bumbledraven 3 years ago |

While [pre-GPT-4 ChatGPT] performs non-trivially on both tasks, there is no comparison with the outputs from GPT-4. These preliminary observations will repeat themselves throughout the paper, on a great variety of tasks. The combination of the generality of GPT-4’s capabilities, with numerous abilities spanning a broad swath of domains, and its performance on a wide spectrum of tasks at or beyond human-level, makes us comfortable with saying that GPT-4 is a significant step towards AGI.

satoshiiii 3 years ago |

If they remove the guardrails, then we can truly assess its intelligence. Currently, humans are directly interfering with a certain aspect of it. If it can provide a response without Microsoft's stock being affected by removing these human-imposed limitations, then I would be genuinely impressed.

blarg1 3 years ago |

All this GPT stuff feels reminiscent of Frank Herbert's novel: Destination Void ...

onos 3 years ago |

“We note however that there is no single definition of AGI that is broadly accepted, and we discuss other definitions in the conclusion section.”

We know it can do a lot of cool stuff, but without a pinned down definition the headline here is useless.

mhh__ 3 years ago | |

The definition will be narrowed as computational capabilities expand.

wildermuthn 3 years ago |

It’s already smarter than 50% of us, and more knowledgeable than 99% of us. It no longer matters what label we give it, and we’re only a few years away from it giving labels to us.

namaria 3 years ago |

Well it took us just about 65 years and a couple of AI winters to get convincing NLP going. And it takes about 1 TB of RAM...

So either AGI is around the corner or a generation away. Same as positive yield fusion reactors?

hallqv 3 years ago |

Been pair-coding with gpt4 for the last week, it's definitely AGI..

atleastoptimal 3 years ago | |

No, it's a Chinese room, but instead of Chinese it's Stack Overflow snippets

hallqv 3 years ago | | |

So what? If it writes novels like an AGI, codes like an AGI and explains complex topics like an AGI, then it's probably an AGI...

number6 3 years ago | |

How do you pair program with ChatGPT?

hallqv 3 years ago | | |

Depends on the task but some combination of asking it for skeleton code for new tasks and sending it my written code or error messages and asking for corrections or potential solutions.

It's very effective: if you are at least semi-new to the technology you are using, it will explain and teach you things you didn't know before, and if you know the tech by heart, it saves you from having to type it out.

For example, yesterday I had to make a custom container with some pretty involved dependencies that also had to be runnable on AWS Lambda (which I haven't used much before). GPT-4 and I went back and forth with Dockerfile code and error messages for a few hours, and then it ran like a charm. It would probably have taken me 1-2 days of regular coding and googling otherwise.

goatlover 3 years ago | |

Why not let it do all the work then?

hallqv 3 years ago | | |

It's not perfect, of course.

mplewis 3 years ago |

No it doesn’t.

undert0wn 3 years ago |

Good find. I am reading through this now.

ftxbro 3 years ago |

Some will say it's as important as the internet or mobile, but they're wrong. This is like the discovery of fire or the invention of language.

bluehorseray 3 years ago | |

- "ftxbro"

ftxbro 3 years ago | | |

After lurking I made this account only to post a joking-not-joking explanation of why Alameda had the weirdly specific credit limit $65,355,999,994 with FTX and why I thought it could be a funny off-by-almost-1000x bug/typo/mishap https://news.ycombinator.com/item?id=34473811 but I think almost no one read my comment because I posted it so late after the thread had scrolled off the front page :(

number6 3 years ago |

I always thought General Intelligence would be achieved by IBM or at least Apple, not by Microsoft. Now it will be used to pressure us into Windows Upgrades...

hallqv 3 years ago | |

What rock have you been living under for the last decade if you thought IBM would solve AGI? Watson was a complete disaster and they have zero AI talent in the company.

number6 3 years ago | | |

Oh, this was a reference to 2001: A Space Odyssey

Rapzid 3 years ago | |

Visual Studio and Azure AD.

irrational 3 years ago | |

Rename ChatGPT to Clippy.

antibasilisk 3 years ago | | |

I wouldn't be surprised if they actually brought back Clippy as a character now that the technology's improved

taspeotis 3 years ago | | |

CliPT

bumbledraven 3 years ago |

GPT-4's proof of the infinitude of primes, as quoted in the paper, is wrong.

    Yes, I think I can,
    Though it might take a clever plan.
    I’ll start by noting Euclid’s proof,
    Which shows that primes aren’t just aloof.
    Assume we have a finite list,
    Of primes, and that none have been missed.
    Multiply them all together,
    And add one, just to be clever.
    The result will be a number,
    That has no prime factors, oh wonder!

Not true. The result may itself be prime. For example, say your finite list of primes is {2,3}. Multiplying them together and adding one yields 2*3 + 1 = 6 + 1 = 7, which is itself prime. Hence it has a prime factor: 7.

    But every number has a prime,
    That divides it, it’s just a matter of time.
    So we’ve found a contradiction,
    And our finite list needs eviction.
    There must be infinitely many primes,
    And that’s the end of my rhyming lines.

A correct version of the proof might say that the result has a prime factor, but that prime factor is not an element of the supposed finite list, hence the list is incomplete.
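
For anyone who wants to check the construction numerically, here is a minimal Python sketch. It is not from the paper; the helper names `smallest_prime_factor` and `euclid_witness` are made up for illustration. It multiplies a supposed finite list of primes, adds one, and finds a prime factor of the result by plain trial division, which should never be on the list.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(primes):
    """Return (N, p) where N = product(primes) + 1 and p is a prime dividing N."""
    n = prod(primes) + 1
    return n, smallest_prime_factor(n)


for candidate_list in ([2, 3], [2, 3, 5, 7, 11, 13]):
    n, p = euclid_witness(candidate_list)
    # Every listed prime leaves remainder 1 when dividing n, so p must be new.
    assert all(n % q == 1 for q in candidate_list)
    assert p not in candidate_list
    print(candidate_list, "->", n, "has prime factor", p)
```

With {2,3} the result is 7, which is prime, so the new prime factor is 7 itself. With {2,3,5,7,11,13} the result is 30031 = 59 * 509, which is not prime, but its factor 59 is still missing from the list. Either way the list was incomplete.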

sebzim4500 3 years ago | |

You missed a line

> Assume we have a finite list,
> Of primes, and that none have been missed.

It's assuming that the finite list contains all primes and then noting that you can construct a new number which has no prime factors, which is a contradiction.

pama 3 years ago | |

It was a valid proof by contradiction. If you had a finite list of primes, then you end up constructing a number that has no prime factors.

Nevermark 3 years ago | |

What does it mean if, in demonstrating that a potential artificial GI can’t understand a proof, a biological GI actually demonstrates that they don’t understand the proof?

Joking aside … the approach of dismissing generality of intelligence based on the presence of mistakes seems to be flawed.

shmed 3 years ago | |

You literally just proved it right