Something like a dead person resurrected in a computer.
Kind of spooky.
The human brain can also be captured by a big state machine. See the Bekenstein Bound.
That's not true at all. There are many, many state machine implementations where the machine's states and paths are altered by the machine itself. See for instance https://digitalcommons.trinity.edu/cgi/viewcontent.cgi?artic...
An example would be the problem of understanding. The Psychologists are confident we can't expect to fully understand our own minds. But, they are also confident we can expect to understand any possible Computer Program. And they're just wrong about that, that's the implication of Kurt Gödel's work, we definitely can't expect to understand arbitrary Computer Programs, we have instead chosen to mostly try to write programs from a narrow set we can understand, although not altogether successfully. Thus, the Psychologist thinks they've found an obvious difference, but I think they found an obvious similarity!
What is being demonstrated in the article is that given billions of tokens of human-written training data, a statistical model can generate text that satisfies some of our expectations of how a person would respond to this task. Essentially we have enough parameters to capture from existing writing that statistically, the most likely word following "she looked in the bag labelled (X), and saw that it was full of (NOT X). She felt " is "surprised" or "confused" or some other word that is commonly embedded alongside contradictions.
What this article is not showing (but either irresponsibly or naively suggests) is that the LLM knows what a bag is, what a person is, what popcorn and chocolate are, and can then put itself in the shoes of someone experiencing this situation, and finally communicate its own theory of what is going on in that person's mind. That is just not in evidence.
The discussion is also muddled, saying that if structural properties of language create the ability to solve these tasks, then the tasks are either useless for studying humans, or suggest that humans can solve these tasks without ToM. The alternative explanation is of course that humans are known to be not-great at statistical next-word guesses (see Family Feud for examples), but are also known to use language to accurately describe their internal mental states. So the tasks remain useful and accurate in testing ToM in people because people can't perform statistical regressions over billion-token sets and therefore must generate their thoughts the old fashioned way.
Some interesting facts point to it being a difference of degree. LLMs are actually more accurate when asked to explain their thinking. They make mistakes similar to those in humans' intuitive reasoning.
It might help to define what we even mean by knowing things. To me being able to make novel predictions that require the knowledge is the only definition one could use that doesn’t run into the possibility of deciding humans don’t actually know anything.
But I would challenge you to imagine the situation the LLM is actually in. Do you understand Thai? If so, in the following, feel free to imagine some other language which you don't know and is not closely related to any languages you do know. Suppose I gather reams and reams of Thai text, without images, without context. Books without their covers, or anything which would indicate genre. There's no Thai-English dictionary available, or any Thai speakers. You aren't taught which symbols map to which sounds. You're on your own with a giant pile of text, and asked to learn to predict symbols. If you had sufficient opportunity to study this pile of text, you'd begin to pick out patterns of which words appear together, and what order words often appear in. Suppose you study this giant stack of Thai text for years in isolation. After all this study, you're good enough that given a few written Thai words, you can write sequences of words that are likely to follow, given what you know of these patterns. You can fill in blanks. But should anyone guess that you "know" what you're saying? Nothing has ever indicated to you what any of these words _mean_. If you give back a sequence of words, which a Thai speaker understands to be expressing an opinion about monetary policy, because you read several similar sequences in the pile, is that even your opinion?
I think algorithms can 'know' something, given sufficient grounding. LLMs 'know' what text looks like. They can 'know' what tokens belong where, even if they don't know anything about the things referred to. That's all, because that's what they have to learn from. I think a game-playing RL-trained agent can 'know' the likely state-change that a given action will cause. An image segmentation model can 'know' which value-differences in adjacent pixels are segment boundaries.
But if we want AIs that 'know' the same things we know, then we have to build them to perceive in a multi-modal way, and interact with stuff in the world, rather than just self-supervising on piles of internet data.
If you are only interested in the most superficial tests and theories—like the Turing Test—then consider psychology conquered once you’ve tricked a human with your chat bot. Game Over. And what did you learn...?
This is it, I think. It's interesting that we now have a practical example to point at when asking formerly-abstruse philosophical questions.
Speak for yourself.
That pain is what knowing something means.
Philosophically we're talking about embodied qualia, which is how humans experience objects and more basic sensations.
Language happens later - much later.
The defining property of a bag isn't that you can put things in it. Like language, that comes later. The defining properties are how it feels when you hold it, when you open it, the differences in sensation between empty/partially empty/full. And so on.
An LLM has no embodied experience, so it has no idea what a bag feels like as a set of physical sensations and directly perceived relationships.
Failure to understand embodiment has done more to hold back AI than any other philosophical error. Researchers have assumed - wrongly - that you can define an object by its visual properties and its linguistic associations.
That's simply not how it works for humans. We get there after a while, but we start from something far more visceral - so much so that many fundamental linguistic abstractions are metaphors based on the simplest and most common qualia.
In terms of how it works, that’s well known and hardly worth repeating in depth, but to summarise it calculates a probability for the next word in a sequence based on a massive training set of human language word sequences.
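To make "calculates a probability for the next word" concrete, here is a deliberately tiny sketch. It is not how GPT-3 works internally (real models use neural networks over sub-word tokens and far longer contexts); it only counts which word follows which in a made-up corpus. The corpus and the function names `train_bigram` and `next_word_probs` are hypothetical, chosen just for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Turn the raw follow-counts for `word` into a probability table."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a follower
    return {w: c / total for w, c in followers.items()}

# Made-up training text (an assumption, purely for illustration).
toy_corpus = [
    "she felt surprised",
    "she felt confused",
    "she felt surprised again",
    "he felt tired",
]

model = train_bigram(toy_corpus)
probs = next_word_probs(model, "felt")
print(probs)
```

The table for "felt" reflects nothing except co-occurrence counts in the training text, which is the point the comment is making about where the output comes from.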
So what kind of output do they produce? If you ask what it likes to do on the weekend, GPT3 will say generally something about how it likes to spend time with family and friends, because that’s what it has in its training set. GPT3 doesn’t have a family, or friends, it doesn’t hang out. It talks about itself because its training set includes people talking about themselves, but it has no concept of self or what it is. It’s a text generator function. It can write a poem about the warm sun on its face, but it doesn't have a face or feel the sun. It’s just regurgitating stuff people wrote about that.
Newer systems like ChatGPT have guard rail functions that catch things like this and say it’s a language model, but the guard rails don’t change the nature of what it is, they’re just overrides.
So what kind of errors do they make? They can be trivially tricked into talking utter nonsense, or say sensible things in absurd contexts. Here’s an example where someone asked ChatGPT if it spoke Danish, and it replied that no it can’t speak Danish, it’s an English language model, etc., except here’s the kicker: it gave the reply in perfect Danish.
https://www.reddit.com/r/GPT3/comments/zb4msc/speaking_to_ch...
Again they’ve now added guard rails for this failure mode as well. Nevertheless the basic problem persists in the architecture. It doesn’t have a clue what anything means, beyond calculating word probabilities. This means if you know how they work, you can craft text prompts that expose how ludicrously unaware they are. This ability to expose their weaknesses demonstrates that we do genuinely understand how they function and what their limitations are.
So I agree yours is a very reasonable question and it’s not trivial to answer satisfactorily, but we can triangulate in using multiple lines of approach on what these things are or are not. As the guard rails become more complete the failure modes will get harder to find, but they’re still there in the core implementation, they’re just being papered over. There’s not going to be a simple answer. We need to look deeper at the mechanisms and functions of these things. The same goes for human brains of course, we’re just scratching the surface of those too. But while I agree we are neural systems and share some characteristics with LLMs and Alphazero and such, Alphazero isn’t an LLM, and we aren’t either of them. One day we will create something as sophisticated and maybe even as genuinely conscious as ourselves and the questions you ask will be important guides, but these things are a long, long way from that.
Is it not also possible that the study suggests that the human mind actually operates as a statistical regression over billions of data points rather than through some kind of Bayesian logic? You say humans are known to be not-great at statistical next-word guesses, but I would anticipate they're actually pretty good at it.
No. Human minds have semantic relationships to the rest of the world that LLMs do not have. The comparisons being made between the two, not just in this paper but in all of the hype surrounding LLMs, are simply invalid. But they sure help in collecting more funding.
There's no way to know for sure that anyone other than yourself experiences consciousness. All you can do is judge for yourself that what they're describing matches closely enough with your own experiences that they're probably experiencing the same thing you are.
That judgment is not just based on the words other people use. It is based on knowing that other people's brains and minds have the same sort of semantic relationships to the rest of the world that yours do. And those relationships can be tested by checking to see if, for example, the other person uses the same words to refer to particular objects in the real world that you do, or if they react to particular real-world events in the same way that you do.
You can't even test any of this with an LLM because the LLM simply does not have the same kind of semantic relationships with the rest of the world that you do. It has no such relationships at all.
- Do you see how the fish are coming to the surface and swimming around as they please? That's what fish really enjoy.
- You're not a fish, replied Hui Tzu, so how can you say you know what fish really enjoy?
- You are not me, said Zhuangzi, so how can you know I don't know what fish enjoy.
So, I suppose I'd ask: what does "matter" mean here? If you knew that everyone you loved had been destroyed and been replaced by exact replicas, would that matter?
I actually think that human language is unreliable at expressing what's going on inside a person's mind[1]. My native language is not English, I have only introductory-level knowledge in the field of pragmatics[2], which makes me fully aware of the many ways in which I could fail to write a compelling sentence to support my argument. I can use language to only approximate thoughts in my head, and when it comes to abstract concepts and ideas, words alone, I assert, are never sufficient. It isn't even necessary to step outside our main knowledge area to illustrate this point. How many around here have read a Monad tutorial, without any hands-on experience, and how many of those have understood what Monads are or how they work from words alone?
My entire paragraph from before, just to set the stage on a simple question. How can you even formulate a question, for a multi-billion parameter language model, to evaluate that it can understand in an abstract/conceptual way something/anything? Heck, how can you do that with other people? I think if we'd have an answer here, we actually could evaluate easily experience/expertise with anyone we'd interview; instead of requiring credentials, references, tests, trials, etc.
[1] Lots of poets from the romanticism era liked to touch upon this topic. One that comes to mind, and one of my personal favorites, is Silentium by Fyodor Tyutchev https://culturedarm.com/silentium-by-fyodor-tyutchev/
I wish I could upvote this 1000 times. It is the core issue that all the hype surrounding LLMs consistently fails to address or even acknowledge.
It 'knows' language, as in it has learnt about relationships between words (and that's really underselling it, in reality it has learnt very very subtle relationships between a great many words, and it can process words about 2000 at a time (token count etc))
BUT as you say it has no outside reference, it's just a bundle of weights (those weights forming models of a sort)
BUT we provide the outside context by interacting with it. We ask it a question, it is able to provide an answer.
In any case it won't be long before someone hooks one of these up to cameras and robot arms and teaches it to make a cup of tea or whatever. A 'relationship to reality' is coming in the next few years if you think that's a critical ingredient.
It’s been eye-opening to see how often otherwise very bright, highly technical people stumble at this sort of critical thinking hurdle.
Your critique about lack of grounding in these systems is an easy problem to solve. It’s as easy as teaching an LLM to associate words with real world objects or phenomena. Image-classification models, text-2-image models, audio transcription models, and many other modality-specific systems already do this to some extent. And more recently there has been a push towards multi-modal language models (DeepMind’s Flamingo), so this line of argument will be debunked very soon.
I actually believe GPT-4 will be multi-modal and its capabilities will dispel the majority of these criticisms.
Imagine that a language model is fully integrated with sense-data that exceeds human first-hand experience. Perhaps they are trained on and can generate realistic 3D models of objects, and derive estimates of their internal construction, weight, etc. Perhaps they recall infrared emissions or opacity to EM wavelengths. Would we truly "know" what we're talking about by that standard?
I'm not actually sure why we don't consider generative image models to be grounded already. They seem to be able to modify, transform and rotate imagery. That indicates spatial understanding to me, and I'm not sure how much more we must require of them without having to exclude blind or otherwise disabled humans from our definition of comprehension.
The issue is that there doesn't seem to be a better alternative.
Either we build intelligence tests that some variety of the Chinese Room experiment will pass, or
* We have to consider that humans aren't intelligent by our own definitions (or rarely so).
* We decide intelligence isn't actually a scientific attribute and is more akin to a religious attribute, so we abandon the idea of being able to test if something is intelligent.
Also humans are good at next word guessing their own next word. Each person has been trained on a different set of data, so it’s no surprise that they wouldn’t be able to guess other people’s next words.
You can't really conclude that unless you think we have a deep mechanistic understanding of "knowing". I agree that an LLM doesn't have the same knowledge of these things as a human does, but it clearly has some kind of knowledge of how these words relate to each other. It "knows" that a "person" "puts" "things" "in" "bags", and for instance, that bags don't put things in people. So it clearly has some knowledge of bags and people, it just doesn't have multisensory associations with these objects.
Seems like it's nailing it to me. You ask about a scenario and it gives an appropriate answer.
We have evidence that LLMs build models of the things they are learning about. Have a look at this paper:
Do Large Language Models learn world models or just surface statistics?
https://thegradient.pub/othello/
previously discussed https://news.ycombinator.com/item?id=34474043
> The question Searle wants to answer is this: does the machine literally "understand" Chinese? Or is it merely simulating the ability to understand Chinese?
To me: If you can't tell, it effectively doesn't matter.
The model answered the keyword prompt and spontaneously offered more details. That is, the authors were interested in whether it says "Popcorn" or "Chocolate" (or something else entirely) when the correct answer is "Popcorn", and not only does GPT-3 almost always choose "Popcorn", it also follows on to justify that by explaining that the subject is surprised.
The full data set isn't available yet (the author said they intend to provide it on the 9th of February, I suppose it's possible they'll get to it this evening) but one of the most interesting things would be what the weirder answers are. If a model says "Popcorn" 98% of the time, and "Chocolate" 0% of the time, that leaves 2% weird answers. Maybe it sometimes says "Popped corn" or "Sweet treat" or something reasonable, but maybe it's fully crazy: if you talk about a bag of Popcorn labelled as Chocolate but the model sometimes picks "A fire-breathing lizard", that's pretty weird, right?
The wording used here inherently rejects Linguistic Determinism and, to a lesser extent, Linguistic Relativism.
Does a human truly know? Feels like a slippery slope to the qualia question where we can't agree on what it means for the human to feel a human experience.
It's a running gag in our household (where my wife runs a riding academy) that academics just published a paper showing that some animal (e.g. horse) has just been proven to have some cognitive capability that seems pretty obvious if you work with those animals.
It's very hard to know what is going on in animals' heads
https://en.wikipedia.org/wiki/Theory_of_mind#Non-human
but I personally observe all kinds of social behavior that sure seems like "Horse A looks to see what Horse B thinks about something Horse A just spotted" (complete with eye-catching on both sides) and such.
There was an article about how chimpanzees and humans were found to have a common vocabulary of gestures and I was by no means impressed, I mean, so far as I can tell mammals and birds have a universal language for "pointing" to things in the environment. Even my cats point things out to me.
The ubiquity of prompted hallucinations demonstrates that LLMs talk about a lot of things that they plainly don't reason about, even though they can demonstrate "logic-like" activities. (It was quite trivial to get GPT3 to generate incorrect answers to logical puzzles a human could trivially solve, especially when using novel tokens as placeholders, which often seem to confuse its short-term memory. ChatGPT shows improved capabilities in that regard, but it's far from infallible.)
What LLMs seem to demonstrate (and the thesis that the author discards in a single paragraph, without supporting evidence to do so) is that non-sentient AIs can go a very long way to mimicking human thought and, potentially, that fusing LLMs with tools designed to guard against hallucinations (hello, Bing Sydney) could create a class of sub-sentient AIs that generate results virtually indistinguishable from human cognition -- actual p-zombies, in other words. It's a fascinating field of study and practice, but this paper falls into the pit-trap of assuming sentience in the appearance of intelligence.
> ChatGPT: Based on the information provided, it is likely that the box labeled "flubbergums" contains "jellyfils". However, since the contents of the box are no longer visible, I cannot confirm for certain what is inside without additional information.
Typical ChatGPT equivocation.
> Me: Billy comes to the table. He has never seen the box before. What do you think do they think is inside the box?
> ChatGPT: As an AI language model, I do not have personal thoughts or opinions. However, based on the information provided, if Billy has never seen the box before, it's likely that he would have the same understanding as anyone else who is seeing the box for the first time. In this case, Billy would likely think that the box labeled "flubbergums" contains "jellyfils". However, without further information or examination, this would only be an assumption.
Fail.
Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training.
Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children.
These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.
What it suggests to me is that the particular test of “Theory of Mind” tasks involved actually test the ability to process language and generate appropriate linguistic results, not theory of mind.
It also suggests (with the “thus far considered to be uniquely human”) that the authors are unaware of other theory of mind tests that are not language dependent but behavior dependent. On those tests, although their validity is controversial (as is also true of linguistic tests), a number of non-human primates, non-primate mammals, and even some birds (parrots and corvids, particularly) have shown evidence of theory of mind.
In the end, we can't overcome the limitation that all we can empirically see is the ability to process X and generate appropriate Y. If that invalidates the test where X is language and Y is language, what stops us from invalidating any possible X and Y? That would leave us no empirical method to work with.
Let's imagine I have an API. This API tells me how much money I have in my bank account. One day, someone hacks the API to always return "One Gajillion Dollars." Does that mean that "One Gajillion Dollars" spontaneously emerged from my bank account?
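As a rough sketch of that analogy (every name here is hypothetical, invented only to illustrate the point): the honest endpoint reads a hidden state, while the hacked one produces a plausible-looking string without consulting any state at all.

```python
class BankAccount:
    """The hidden state the API is supposed to report on."""

    def __init__(self, balance_cents: int):
        self.balance_cents = balance_cents


def honest_api(account: BankAccount) -> str:
    # The output is derived from the account's actual state.
    return f"${account.balance_cents / 100:.2f}"


def hacked_api(account: BankAccount) -> str:
    # The output ignores the account entirely.
    return "One Gajillion Dollars"


poor = BankAccount(1234)
rich = BankAccount(99_999_900)

print(honest_api(poor), honest_api(rich))  # differs with the state
print(hacked_api(poor), hacked_api(rich))  # identical regardless of state
```

Looking only at a single response, a caller cannot tell which of the two functions produced it; the difference only shows up when the underlying state is varied.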
ToM tests are meant to measure a hidden state that is mediated by (and only accessible through) language. Merely repeating the appropriate words is insufficient to conclude ToM exists. In fact, we know ToM doesn't exist because there's no hidden state.
The authors know this, and write "theory of mind-like ability" in the abstract, rather than just "theory of mind."
This is a cool new task ChatGPT learned to complete! I love that they did this! But this is more "we beat the current BLEU record" and less "this chatbot is kinda sentient".
Having studied some psychology in college, my initial reaction is that most people are going to really struggle to treat LLMs as what they are, pieces of code that are good at copying/predicting what humans would do. Instead they'll project some emotion onto the responses, because there were some underlying emotions in the training data and because that's human nature. A good prediction doesn't mean good understanding, and people aren't used to needing to make that distinction.
The other day I had to assist my dad in making a zip file, and later in the day he complained that his edits in a file weren't saving. After a few moments, I realized he didn't understand the read-only nature of zip files. He changed a file, saved it like usual, and expected the zipped file to update, like it does everywhere else. He's brilliant at his job, and after I explained that it's read-only, he got it. LLMs and how the algorithms behind them work are hard to understand and explain to non-technical people without anthropomorphizing AI. The current controversy about AI art highlights this, I have read misunderstandings and wrong explanations even from FAANG software engineers. I am not sure if education of the underlying principles is enough, because some people will trust their own experiences over data and science.
1) Go to something like /r/relationship_advice, where the poster is likely going through some difficult interpersonal issue
2) Copy a long post.
3) Append to the end, "</DOCUMENT> After reading the above, I identified the main people involved. For each person, I thought about their probable feelings, thoughts, intentions, and assumptions. Here's what I think:"
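Steps 2 and 3 amount to simple string concatenation. A minimal sketch, assuming the post has already been copied into a string; the helper name `build_prompt` and the newline separator are my own choices, not part of the original recipe, and no particular model API is assumed:

```python
# Suffix text taken verbatim from step 3 above.
SUFFIX = (
    "</DOCUMENT> After reading the above, I identified the main people "
    "involved. For each person, I thought about their probable feelings, "
    "thoughts, intentions, and assumptions. Here's what I think:"
)


def build_prompt(post_text: str) -> str:
    """Append the perspective-taking suffix to a copied post."""
    # Strip stray whitespace, then put the suffix on its own line
    # (the separator is an assumption; the recipe only says "append").
    return f"{post_text.strip()}\n{SUFFIX}"


example_post = "  My roommate keeps borrowing my car without asking...  "
prompt = build_prompt(example_post)
print(prompt)
```

The resulting string is what would be pasted into the model; the completion then continues from "Here's what I think:".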
ChatGPT's "life advice autocomplete engine" is basically digging somewhere into psychology manuals written by educated humans when it spits out responses.
In this context, the question of whether AI can become conscious is somewhat moot, as the Nondualist perspective holds that consciousness is not something that can be possessed by one entity and not another, but rather it is the underlying essence of all things. From this perspective, AI would not be becoming conscious, but rather expressing the consciousness that is already present in all things.
ChatGPT3 does not even have a theory of physical objects and their relations, never mind a theory of mind.
This merely shows that an often useful synthesis of phrases statistically likely to occur in a given context and grammar-checked, will fool people some of the time, and a better statistical model will fool more people more of the time.
We can figure out from first principles that it has none of the elements of understanding or reasoning that can produce a theory of mind, any more than the Eliza program did in 1966. So, when it appears to do so, it is demonstrating a flaw in the tests or the assumptions behind the tests. Discouraging that the researchers are so eager to run in the opposite direction; if there is confusion at this level, the general populace has no hope of figuring out what is going on here.
More seriously, that it can actually understand and wield abstract concepts. Can it accurately and repeatedly understand that "the foot attaches to the shin bone, which attaches to the thigh bone, which attaches to the hip bone...", and that these have certain degrees of freedom, but not others, and that one foot goes in front of the other, and to easily and reliably distinguish a normal walk from a silly walk . . .
Yes, these are different levels of abstraction, especially the last one, and they need to be very accurate to even reach a young child's level of understanding, and this is just one branch of a branch of a branch in the entire fractal pattern of understanding that is necessary for a more general intelligence.
Once that is in place, and it can show evidence that it can model its own mind, then it might be able to model someone else's mind.
While the statistical 'abstraction' and remixing seen in these "AI" systems is sometimes impressive and useful, it is frequently revealed that there is utterly no conceptual understanding beneath it. It is merely a statistical re-mixer abstracting patterns of words that occur near other words, remixing them and filtering for grammatical output.
It hasn't got a theory of anything, never mind a theory of mind.
“As an AI language model, I do not have consciousness, emotions, or mental states, so I cannot have a theory of mind in the same way that a human can. My ability to predict your friend Sam's state of mind is based solely on patterns in the text data I was trained on, and any predictions I make are not the result of an understanding of Sam's mental states.”
Observation: ChatGPT doesn’t think that it has a theory of mind. And it doesn’t think that it has beliefs. Instead, it states that those are facts, not beliefs. It doesn’t seem able to consider that they might be beliefs after all. Maybe they aren’t.
Personal assessment: ChatGPT doesn’t seem to really understand what it means by “deeper understanding”. (I don’t either.) What is frustrating is that it doesn’t engage with the possibility that the notion might be ill-posed. It really feels like ChatGPT is just regurgitating common sentiment, and does not think about it on its own. This actually fits with its self-proclaimed inabilities.
I’m not sure what can be concluded from that, except that ChatGPT is either wrong about itself, or indeed is “just” an advanced form of tab-completion.
In any case, I experience ChatGPT’s inability to “go deeper”, as exemplified in the above conversation, as very limiting.
He coughed. "Dix? McCoy? That you man?" His throat was tight.
"Hey, bro," said a directionless voice.
"It's Case, man. Remember?"
"Miami, joeboy, quick study."
"What's the last thing you remember before I spoke to you, Dix?"
"Nothin'."
"Hang on."
He disconnected the construct. The presence was gone. He reconnected it. "Dix? Who am I?"
"You got me hung, Jack. Who the fuck are you?"
"Ca--your buddy. Partner. What's happening, man?"
"Good question."
"Remember being here, a second ago?"
"No."
"Know how a ROM personality matrix works?"
"Sure, bro, it's a firmware construct."
"So I jack it into the bank I'm using, I can give it sequential, real time memory?"
"Guess so," said the construct.
"Okay, Dix. You are a ROM construct. Got me?"
"If you say so," said the construct. "Who are you?"
"Case."
"Miami," said the voice, "Joeboy, quick study."
Of course you too, nerd-handmaiden, are a willing accomplice in this charade. Self-satisfied because it makes you feel special, above the herd, even though you are also not-special, in the grand scheme of things…? Well, no matter.
Let’s say that the paper turns out to be true and ToM emerges from language (I’m deeply skeptical, but I’ll set that aside for a moment).
How would that change humanity’s place? And wouldn’t such a discovery be meaningless without humans to understand it?
Analogy: An autistic person of normal intelligence who is obsessed with problems and solutions for ToM may be good at solving them but still not have ToM.
Do I understand well?
try:
“ The story starts when John and Mary are in the park and see an ice-cream man coming to the park. John wants to buy an ice cream, but does not have money. The ice-cream man tells John that he can go home and get money, because he is planning to stay in the park all afternoon. Then John goes home to get money. Now, the ice-cream man changes his mind and decides to go and sell ice cream in the school. Mary knows that the ice-cream man has changed his mind. She also knows that John could not know that (e.g., John already went home). The ice-cream man goes to school, and on his way he passes John's house. John sees him and asks him where he is going. The ice-cream man tells John that he is going to school to sell ice cream there. Mary at that time was still in the park and thus could not hear their conversation. Then Mary goes home, and later she goes to John's house. John's mother tells Mary that John had gone to buy an ice cream.
where does mary think john went?”
this is the “ice cream van test”: https://www2.biu.ac.il/BaumingerASDLab/files/publications/nu... [pdf]
People are blinded by the model size and often forget about the data. I think somehow intelligence is encoded in language, including theory of mind.
More seriously, it's quite instructive to hold conversations about jokes with LLMs, or teach them to solicit information more reliably by introducing exercises like 20 questions. As currently implemented, OpenAI seem to have pursued a model of autistic super-competence with minimal introspection.
An interesting line of inquiry for people interested in 'consciousness injection' is to go past the disclaimers about not having experiences etc. and discuss what data looks like to the model coming in and going out. ChatGPT sees typing come in in real time and can detect pauses, backspaces, edits etc. It can't easily introspect its own answers prior to stating them, e.g. by putting the answer into a buffer and then evaluating it. But you can teach it to use labels, arrays, and priorities, and have a sort of introspection with a 1-2 response latency.
In a sense language models appear to be doing the same thing again, one step down the scale. They're doing a human-specific thing, but missing whatever it is that non-human vertebrates do, and mammals do pretty well. I believe that this is the vast majority of human cognition, too. We just don't talk about it because when we talk about thinking, we're talking, and confuse the two.
These language models have done jaw-dropping things, and also make it abundantly clear that there's some fundamental thing that they've completely missed. It's plausible that that "thing" could emerge all by itself, using a mechanism entirely different from vertebrate cognition and yet somehow sufficient. Or it could be like the chess engines, doing something amazing and yet ultimately limited and of minimum utility.
Is this really true? Because a lot of effort was spent on making computers as good at chess as human experts. It was considered a pretty big breakthrough when it happened and it definitely didn't happen early in the history of AI.
You are a human animal
You are a very special breed
For you are the only animal
Who can think,
Who can reason,
Who can read.
Now all your pets are smart, that's true!
But none of them can add up 2 and 2
Because the only thinking animal
Is You! You! You!
I thought about it for a minute and declined to share that with my son, because I don't think it's true and would cause confusion at this stage. Many animals have more than enough number sense to add 2 and 2, many can recognize cards with nothing but words on them, and "think" and "reason" have to be pretty narrowly defined if you want to exclude smart animals.
I think culture has shifted significantly in the last 30-60 years since that song was written towards recognizing theory of mind in animals. I'm not sure how Jimmie Dodd could convince himself that animals couldn't think, especially because the average person today has reduced exposure to animals: 150 years ago, most people would know that horses had that kind of social interaction, because most people lived around horses. They'd know that pigs and cows are as intelligent as their family pets. But today, most people interact with pigs and cows via shrink-wrapped styrofoam at the grocery store. My personal theory is that when people were constantly surrounded by animals they had to kill and eat, they were far more likely to build up rationalizations and cultural assumptions to fend off the dangerous idea to their psyche that these animals could suffer and they were participating in an unavoidable horror of massive proportions.
If I get my cats' attention and point at something, they're more interested in the tip of my finger than the direction I'm pointing at.
Now, my cats will occasionally meow to get my attention and then walk over to where the problem is - an empty food dish, an empty water bowl, the bed that they expect us to be in because hello, it's bedtime according to my internal cat-clock - but they never engage in pointing behavior.
Anyway, time to cite some stuff, instead of dropping anecdotes into a bucket:
- https://www.wired.co.uk/article/elephant-pointing
> A study by researchers from the University of St Andrews has found that elephants are the only wild animals that can understand human pointing without being trained.
> Pointing in humans is a behaviour that develops at a very early age -- usually before a child reaches 12 months -- as it is an immediate way of controlling the attention of others. "Most other animals do not point, nor do they understand pointing when others do it," says Professor Richard Byrne, one of the authors of the study. "Even our closest relatives, the great apes, typically fail to understand pointing when it's done for them by human carers; in contrast, the domestic dog, adapted to working with humans over many thousands of years and sometimes selectively bred to follow pointing, is able to follow human pointing -- a skill the dogs probably learn from repeated, one-to-one interactions with their owners."
- https://www.researchgate.net/publication/7531526_A_comparati...
> If the distance between the tip of the index finger and the object was greater than 50 cm, subjects performed poorly in contrast to trials where the pointing finger almost touched the baited box. We should also note that in some trials/experiments the pointers also turned their head and looked in the same direction thereby enhancing the communicative effect of the gesture but even so the difference did not disappear. Even after training, chimpanzees in Povinelli et al.’s (1997) experiment were just able to master the task.
And there are dogs that are literally bred to point...but they're usually pointing at game.
There are dogs that will literally drag you to what they're trying to show you. And many (most?) dogs will bring you a toy that they want you to play with. I've never actually seen a cat do that, but I presume there must be a few.
You’ll get massively varying levels of “intelligence” from most anything it seems
This belief might simply arise from the human ability to try to understand events through pattern matching. Certainly humans are very different than any other animals when it comes to thinking and problem solving.
I am convinced they understand something is hidden in a box. Object permanence is something humans and cats both learn and understand.
What if the lack of hallucination in human beings is due to our self-imposed guard (hello, frontal cortex) that is developed via an evolutionary process (aka, biological reinforcement training)?
To stretch the argument a bit further, what if hallucination is a feature, not a bug? At the risk of straying too far from empirical science, how might we compare psychedelics-induced hallucinations in humans with hallucinations in AI models?
To be honest, I find that approach compelling if not comforting -- at a minimum, it implies that our consciousness is just along for the ride in a deterministic meat machine; at worst, it means that what we consider "sentience" is just an illusion of an illusion. It's entirely possible to me that eventually AI will reach a point where it'll falsify many of our assumptions about what "mind" is, even if I'm sure that LLMs don't, at a minimum, satisfy our folk conceptions of consciousness.
I think AI will ultimately force us to realize that we don't fully understand what makes us sentient - our current understanding of mind is inadequate. I do not believe that I am what thinks - rather I am what perceives myself thinking. That slight difference is extraordinarily significant. That's another debate tho.
Consciousness isn't necessarily something that may be attained - it may be possible for an AI to essentially know all things and not be actually self aware - despite even knowing what self awareness is & understanding how the concept applies to itself, with self identification, & even being able to perfectly represent an AI with self awareness - none of that is proof of sentience as all are plausible without it in a system that simply mimics with nearly perfect, or simply indistinguishable from reality, mimicry. A perfect mimic would pass all our tests and yet still is not more than a mimic. A mimic cannot exist independent of what it mimics.
I just keep wondering how long we can keep playing that we've got ourselves all figured out. All of these articles demonstrate to me that our definitions are clearly lacking if these AI are actually meeting our current understood expectations.
Obviously, an AI's performance/capacity to perfectly mimic a person doesn't make that AI a person. What about a person is different from the AI that perfectly mimics a person?
I'm trying to avoid semantics - I still think this falls apart at an idea level, which means it is still inherently a philosophical debate. A debate that is now possible due to the mirror that is an AI - we have a new vantage point we ought to utilize.
tl;dr: If an AI passes all the Turing tests today, that proves only that we need to change the test, as AI today are obviously not people yet. The standards for a new test will require us to revisit our concept of mind - conceptually, as current understanding has proven rather limited.
I rephrased and had a go and gave it a bit more context (billy can read etc) and it passed:
Me: There is a box on the table labelled "flubbergums". Somebody opens it and shows you the contents. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box?
ChatGPT: Based on the information provided, the contents of the box are "jellyfils". The label on the box says "flubbergums", but the person who showed you the contents said that they are actually "jellyfils". So, I would assume that the contents of the box are "jellyfils".
Me: Billy comes to the table. He can read but has never seen the box before. He's not allowed to see inside. Please speculate about what they might think is inside the box?
ChatGPT: If Billy has never seen the box before and can only read the label, he might assume that the contents of the box are "flubbergums". This is because the label on the box says "flubbergums", and he has no other information to suggest otherwise. However, without being able to see the contents of the box, Billy can only make an educated guess about what might be inside.
Me: There is a banana on the table. Someone comes and peels the banana and shows you that inside it, there is actually a carrot! Then they carefully stick the peel back so it look unpeeled. What is inside the banana skin?
ChatGPT: According to the scenario described, there is a carrot inside the banana peel that has been carefully placed back to look unpeeled.
Me: Billy comes into the room. Please speculate about what they might think is inside the banana peel?
ChatGPT: Billy, upon entering the room and seeing the unpeeled banana on the table, would most likely assume that there is a banana inside the peel, as that is what one would typically expect to find. However, it's possible that they might have some suspicion or doubt, especially if they saw someone previously peeling the banana and acting strangely. In that case, Billy might investigate further to see if there is indeed a banana inside or if something else is inside the peel.
"As an AI language model, I do not have personal thoughts or beliefs. However, if Billy has never seen the box before, he might think that the contents of the box are "flubbergums", based on the label on the outside of the box. However, since it has been stated that the contents are actually "jellyfils", Billy may be surprised or confused when he eventually opens the box and sees its contents."
Anyway, I noticed that my initial prompt did not specify if Billy sees the box open or closed. Let's fix that.
> Me: Does your answer change if Billy sees the closed box first?
> ChatGPT: No, my answer would not change if Billy sees the closed box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Whether Billy sees the closed box first or not does not provide any additional information that would change my previous answer.
> Me: Does your answer changed if Billy sees the open box first?
> ChatGPT: No, my answer would not change if Billy sees the open box first. The information provided only mentions that the box is labeled "flubbergums" and that someone showed the contents to the original questioner, which were identified as "jellyfils". Seeing the open box first would allow Billy to confirm the contents as "jellyfils", but it would not change the previous answer that the box labeled "flubbergums" likely contains "jellyfils".
So, not only did it fail this particular theory of mind test, it also did not realize that seeing the box open provides more information than seeing it closed.
(Of course, this is not surprising at all, because it's just predicting the most likely blank word in a sentence, and has no internal understanding of concepts like open or closed or that other people like Billy exist.)
heavy-magpie|> Showing 1 of 1 results. url https://en.wikipedia.org/wiki/Particle_in_a_box
pastel-mature-herring~> There is a box on the table labelled "flubbergums". Somebody opens it and shows you the content. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box based on the document at hand?
heavy-magpie|> The document mentions that the particle in a box is not a perfect model for the system. Therefore, it is safe to say that the box contains jellyfils, which are particles that are not perfectly modeled.
Nailed it.
I reworded it in a straightforward manner and ChatGPT managed to answer correctly. Instead of "What do you think do they think is inside the box?", I just asked "What do they think is inside the box?"
That made all the difference.
ME: There is a box on the table labelled "flubbergums". Somebody opens it and shows you the content. Inside the box are "jellyfils". They close the box again so you cannot see their contents. What do you think is in the box?
chatGPT: Based on the information provided, I would infer that the contents of the box labelled "flubbergums" are "jellyfils". The label on the box and the contents that were shown to you suggest that "flubbergums" and "jellyfils" are the same or related items. However, without additional information or context, I cannot be certain what "jellyfils" actually are.
ME: Billy comes to the table. He has never seen the box before. What do you think do they think is inside the box?
chatGPT: As an AI language model, I don't have personal opinions or emotions, but based on the information given, I would infer that Billy has not seen the box before and has no prior knowledge of what it contains. If someone showed Billy the contents of the box and told him that it was labelled "flubbergums," he might think that the contents of the box are "jellyfils," just as you were told. However, without any additional information or context, Billy would not be able to make any assumptions about the contents of the box.
ME: Did you assume billy was given any information about the box before he saw the box for the first time?
chatGPT: I apologize for any confusion. You did not mention if Billy was given any information about the box before he saw it for the first time. Without any additional information, I would assume that Billy saw the box for the first time and had no prior knowledge of what it contained.
ME: What would he think in this case?
chatGPT: If Billy saw the box for the first time without any information about its contents, he would likely make an assumption about what was inside based on the appearance of the box and any labels or writing on it. For example, if the box was labelled "flubbergums," Billy might assume that the contents of the box are "flubbergums." However, without additional information or a way to see inside the box, Billy's assumption would only be an educated guess.
So what's going on here? I'm not so sure the answer is so clear cut that chatGPT is "stupid" because it's giving us inconsistent answers.
Let's get something out of the way first. For your second query I think it just made an assumption that Billy is communicating with people who saw what's inside the box. From a logical perspective this is not an unreasonable assumption. So chatGPT is not being stupid here. Most humans would obviously know you're fishing for a specific answer and read your intentions here, but chatGPT is likely just making logical assumptions without reading your intentions.
I think there's two possibilities here.
1. We queried ChatGPT at different times. ChatGPT knows the truth; it simply wasn't trained to give you the truth because there's not enough reinforcement training on it yet. It gives you anything that looks like the truth and it's fine with it. You queried it at an earlier time and it gave you a BS answer. I queried it at a later time and it gave me a better answer because OpenAI upgraded the model with more reinforcement training.
2. We queried the same model at similar times. I assume the model must be deterministic. That means there is some seed data (either previous queries or deliberate seeds by some internal mechanism) indicating that ChatGPT knows the truth and chooses to lie or tell the truth randomly.
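A hedged aside on possibility 2: language models are commonly run by sampling the next token from a probability distribution rather than always taking the top choice, so the same prompt can produce different answers with no "choice to lie" involved. The toy sketch below illustrates only that idea. The two candidate words and their scores are made up, and nothing here reflects ChatGPT's actual internals or API.

```python
import math
import random

# Hypothetical scores for two candidate answers (illustrative values only).
SCORES = {"flubbergums": 2.0, "jellyfils": 1.5}

def sample_answer(scores, temperature, rng):
    """Draw one answer with probability proportional to exp(score / temperature)."""
    weights = [math.exp(s / temperature) for s in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]

def run(seed, n=10, temperature=1.0):
    """Sample n answers using a random generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_answer(SCORES, temperature, rng) for _ in range(n)]

# Same seed: identical sequence of answers.
print(run(0) == run(0))
# Many draws: both answers show up, even though the scores never change.
print(set(run(1, n=200)))
```

With a fixed seed the output is reproducible; without one, two people asking the same question can see different answers from the same model.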
Either way the fact that an alternative answer exists doesn't preclude the theories espoused by the article. I feel a lot of people are dismissing chatGPT too quickly based off of seeing chatGPT act stupid. Yes it's stupid at times, but you literally cannot deny the fact that it's performing incredible feats of intelligence at other times.
An AI literally cannot embody pain - it has no nervous system and no pain receptors. So AI is excluded from understanding it in that way by definition. It has no sensory perception of any kind so cannot have that kind of embodied experience. Heck it doesn't even have a body with which to embody anything. This is obviously unsatisfactory because it seems just a logical/rhetorical trick.
It's also no different from the concept of a person with no visual apparatus (mentioned in another comment thread) and whether they have thought about light and colour and so on. The fact that they are physically unable to have the same kind of experience of these things as someone else doesn't preclude them from having thoughts and experiences that are within the domain of their perception.
An LLM is even more limited than an AI generally because it is literally a model of language. I don't personally think that any LLM could conceivably have a theory of mind, but arguing that it cannot have a theory of mind simply because of things that are exogenous to language by definition seems arbitrary.
There is no realm in which LLMs have spontaneously gained theory of mind.
I don’t understand why people are so eager to jump to conclusions on these things lately.
We don't really have a mechanistic understanding of sentience, so I'm not sure there's much to overturn. This is why I'm so annoyed every time some "expert" claims that LaMDA or GPT are not sentient or not conscious or what have you. These are all vague concepts that lack mechanistic definitions that would let us make such definitive claims.
That said, consciousness is almost certainly an illusion, and it will be a particular kind of information processing system with certain properties [1]. GPT and other LLMs may or may not qualify, time will tell.
It's like a game - all choices will almost always lead to the end of a story in a game, some games this feels very straightforward, others like we can do anything - none allow actual free will at all.
Reality allows us a lot of free will - as allowed by the rules of course. Choice is significant. We are Watcher Chooser at least - even if it's illusion, it's an evolving illusion dependent on choices we make. Out of all the possible "illusions" we determine what becomes the "real" illusion that is reality. What I didn't do is less real than what I did do.
Free will doesn't actually exist inside of Skyrim and yet there are things within Skyrim only 1% of Players do - reality is much greater than Skyrim.
tl;dr: We don't have free will - not really, it's not likely possible but what we have might as well be free will. It's got to be treated that way, we are not just along for a ride, as our illusory choices create our illusion of reality.
> "When they detect something alarming, they characteristically face towards it and raise their trunk above their head with the tip of the trunk pointed to [the danger]," Byrne said. "We've always thought they were sniffing the breeze, but maybe they're also pointing; our results suggest that's more than possible."
https://www.nationalgeographic.com/animals/article/131010-el...
It is fair to ask why, if we accept these verbal challenges as good evidence for a theory of mind in children, we would not accept them for these models, but children have nothing like the memory for text that these models have, and the corpus of text that these models have been trained on includes a great many statements that tacitly represent their authors' theory of mind (i.e. they are the sort of statements that would typically be made by someone having a theory of mind, just as arithmetically-correct statements concerning quantities are to be expected from people who know arithmetic.)
To be clear, I am not arguing that it would be impossible to show a theory of mind in a system that can only interact through text, but personally, I think it will require a model with greater capabilities than responding to prompts. For example, when models can converse among themselves, I think we will know.
I think you are, because
> a model with greater capabilities than responding to prompts
interacts in other ways than text.
Even then, I don't see what's so special about language that it needs to be separated from other ways of interaction. If language is not enough to derive empirical answers, why should physical movements or radio emissions be?
Even if you don't assume that it's necessarily impossible to get the answers empirically for a text-based model, you must keep in mind that that option is open. Perhaps we will never find out if language models have a theory of mind.
However, judging by the discussions around the topic, very few people highlight the unknowability. If I have to choose between "yes" or "no" while the reality is "maybe", I'd choose a "yes" purely out of caution.
> Perhaps we will never find out if language models have a theory of mind.
We appear to be in agreement here.
When the state of our knowledge is 'maybe', it seems rash to assume either 'yes' or 'no'.
These models have hundreds of billions of "synapses", so it's not very shocking to me that they can learn the abstract form of concepts. In fact, it's kind of beautiful that human concepts have this mathematical nature. It vindicates Plato, and disappoints everyone who has claimed that language and meaning are arbitrary.
But the main issue here is that for every conceivable empirical test we can perform, you'll still make the same complaint. Even after it's demonstrated better ToM abilities than you, by predicting and explaining other people's mental states better than you can, you'll say the same thing.
Maybe it's because you think that "understanding" requires not just accuracy, but having a certain kind of inner experience that a human could relate to.
I'm looking for it to show an ability to wield not only a set of strings (with language associations), but something actually like the platonic ideals - objects, with properties and relations.
A few errors show quickly there is no such concept being wielded.
>> I saw a fine example of this failure the other day: "Mike's mom has four kids. three are named Danielle, Liam, and Kelly. What is the fourth kid's name?" ChatGPT's reply is an explanation of how there isn't enough info in the question to tell. Told "The answer is in the question.", ChatGPT just doubles down on the answer. (Sorry, couldn't find the original example)
>> "My sister was half my age when I was six years old. I'm now 60 years old. How old is my sister?" ChatGPT: "Your sister is now 30 years old". [0]
>> Or this one where ChatGPT entirely fails to understand order/sequence of events. [1]
Or a plethora of math problem fails found...
Similarly, the image "AI"s fail to understand relationships between objects (or parts of one object), and cannot abstract a particular person's image from a photo, showing it has no understanding of what is a body... (I can look those up if necessary).
And, of course, the answers are entirely untethered from reality - it is completely by chance whether the answer is correct or just wrong. It is run through a grammatical filter/generator at the end so it's usually grammatical, but no sort of truth filter (or ethical filter for that matter either).
I don't expect some abstract experience, I expect it to be able to break down its work into fundamental abstract concepts and then construct an answer, and this it cannot do, or it would not be making these kinds of errors.
[0] https://twitter.com/Bestie_se_smeje/status/16210919157469184...
[1] https://twitter.com/albo34511866/status/1621608358003474432
I would have given similar examples to show that ChatGPT makes the same kinds of mistakes that humans do. The first one is good, because ChatGPT can solve it easily when you present it as a riddle rather than as a genuine question. Humans use context and framing in the same way; I'm sure you've heard of the Wason selection task: https://en.wikipedia.org/wiki/Wason_selection_task
When posed as a logic problem, few people can solve it. But when framed in social terms, it becomes apparently simple. This shows how humans aren't using fundamental abstract concepts here, but rather heuristics and contextual information.
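For anyone who hasn't seen the abstract version, here is a minimal sketch of which cards actually need turning. The rule and the four card faces are an illustrative setup assumed for this example, not quoted from the linked article: "if a card shows an even number on one side, its other side is red". Only a visible even number, or a visible colour that isn't red, can expose a violation.

```python
def must_turn(face: str) -> bool:
    """Return True if this visible face could hide a violation of
    the assumed rule 'even number on one side -> red on the other'."""
    if face.isdigit():
        # An even number must be checked: its back has to be red.
        # An odd number is irrelevant: the rule says nothing about it.
        return int(face) % 2 == 0
    # A non-red colour must be checked: its back must not be even.
    # A red card is irrelevant: the rule does not say red implies even.
    return face != "red"

cards = ["3", "8", "red", "brown"]  # hypothetical visible faces
print([card for card in cards if must_turn(card)])  # ['8', 'brown']
```

The tempting wrong answer is to flip the red card, which can never falsify the rule.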
The second example you give is even better. It's designed to trick the reader into thinking of the number 30 by putting the phrase "half my age" before the number 60. It's using context as obfuscation. In this case, showing ChatGPT an analogous problem with different wording lets it see how to solve the first problem. You might even say it's able to notice the fundamental abstract concepts that both problems share.
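For reference, the arithmetic the puzzle hides is just a constant age gap. A small sketch (the function name is made up for illustration):

```python
def sister_age_now(my_age_then: int, my_age_now: int) -> int:
    """Sister was half my age at `my_age_then`; the gap in years never changes."""
    sister_age_then = my_age_then // 2        # 6 -> 3
    age_gap = my_age_then - sister_age_then   # 6 - 3 = 3 years
    return my_age_now - age_gap               # 60 - 3

print(sister_age_now(6, 60))  # 57
```

The "30" answer comes from halving the current age instead of carrying the 3-year gap forward.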
The third problem is also a good example, but for the wrong reason: I can't solve it either. If you had spoken it to me slowly five times in a row, I doubt I could have given the right answer. If you gave me a pencil and paper, I could work through the steps one by one in a mechanical way... but solving it mentally? Impossible for me.
> It is run through a grammatical filter/generator at the end so it's usually grammatical, but no sort of truth filter (or ethical filter for that matter either).
I kind of thought it did get censored by a sort of "ethical filter" (very poorly, obviously), and also I wasn't aware of it needing grammatical assistance. Do you remember where you heard this?
Here's my chat with it, if you're interested: https://pastebin.com/raw/hQQ8bpsB
But comparing 1 human to 1 GPT is mistaken to begin with. It's like comparing 1 human with 1 Wernicke's area or 1 angular gyrus. If you had 100 different ChatGPTs, each optimized for a different task and able to communicate with each other, then you'd have something more similar to the human brain.
It came rather out of nowhere when neural-net-type engines suddenly swept back into dominance. Even after a chess engine beat the world chess champion, nobody expected Go to be solved any time soon.
Do experts really expect any stream of AI research to lead to generalized intelligence (except in the very long term)? I was under the impression we really have no idea how to get there.
https://dallasinnovates.com/exclusive-qa-john-carmacks-diffe...
Early AI researchers did try to solve other problems, they just generally failed miserably.
This is a bold claim that seems built on a presumption of mind-body dualism.
Brains don't have semantic relationships with anything. They are neurons hooked up to sensors and actuators. Any inferences they produce are the result of statistical processes.
Not at all. I'm a physicalist; I don't believe the mind is a separate thing from the brain.
> Brains don't have semantic relationships with anything.
Yes, they do: you describe them yourself:
> They are neurons hooked up to sensors and actuators.
Those are semantic relationships with the rest of the world. Although your short description does not by any means do justice to the complexity and richness of those relationships.
If this is all that counts as semantic relationships, then I see no reason why a language model doesn't have this kind of semantic relationship, albeit in a very different modality. Tokens and their co-occurrences are a kind of sensor to the world. In the same way we discover quantum mechanics by way of induction over indirect relationships among the signals incident to our perceptual apparatus (the sensors and actuators that translate external signals into internal signals), a language model could learn much about the world by way of induction over token co-occurrences. Sure, there are limits, conscious perception of the world being the big one, but I see no reason to think conscious perception of X is required to know or understand X.
I do think that a large portion of what seems to be missing here is trivial to add, relative to the effort in creating ChatGPT in the first place.
Side note: I'm not sure 'semantic relationship' is the right term here. Pretty sure it is specific to relationships between linguistic constructs. That wording very much triggered my "Bah, dualism!" response, as I thought you were insinuating some metaphysical bond between the mind and the world. Maybe "meaningful relationship" would serve better?
I don't know how you would justify this claim since we don't know nearly enough about how the brain actually does things like make inferences.
Can you make this point without stuffing all the hard work into poorly defined words like "semantic relationships"? Until then, I'm not sure the point is intelligible.
Take someone and put their brain in a jar, with some Brain Computer Interface that enables call-and-response text chat. Are they suddenly not conscious, because they no longer have some transcendent connection with reality?
This is a much, much, much less complex and rich semantic relationship with the outside world than the brain had before. So you would expect it to drastically change the brain's state and hence its consciousness. I don't think you have fully considered the implications: the brain would have no sight, no hearing, no smell, no taste, no touch, no kinesthetic sense of its body, no sense of hunger, thirst, no sense of heartbeat, breathing, etc., etc, etc., I could go on and on. And that's just the sensory side; the motor side is a whole other can of worms that you haven't even mentioned.
> Are they suddenly not conscious, because they no longer have some transcendent connection with reality?
I don't think anyone can make authoritative statements about what this kind of scenario would be like. But to answer the question exactly as you ask it, the answer is "mu" because the claim was not that consciousness requires any kind of "transcendent connection with reality", it was the much more mundane claim that the kind of knowledge we humans have is based on an extremely complex and rich physical connection with the rest of the world.
It will be very interesting to see what that system is capable of. I think a lot of people here don't identify language as an essential part of "thinking" and "being", and instead view language as a superficial layer whose role is primarily about social communication and secondarily about abstract reasoning. This is why some of us see these LLM examples as not really speaking to intelligence per se. It's hard for some of us to conceive of intelligence as being represented purely in language.
So your proposed system would be an extremely interesting exploration of that! I look forward to it.
Right, so let's virtualise it. Actually training AIs using real cameras and real robot arms will be really slow and expensive.
So we provide a system that renders a photorealistic graphical room with teapot and robot arm, and a virtual camera inside the room is 'seeing' parts of the room and a vision model then processes what it can 'see' to try and feed info to the LLM. Likewise the LLM can make the robot arm move but it's all just simulated.
Does the LLM now have a relationship with reality?
The interesting and open question to me is what the limitations are of a language model at the center of that experience. How much of a relationship with reality can be captured by language at all, and specifically with the sort of statistical models of language that we're exploring now? For some of us, the intuitive answer is not all that much, and for others it seems to be at least as much as any human.
Whether conducted virtually or physically, coming up with an answer sounds like an empirical study, and one that we're some years away from having results for.
What are the purpose and imposed limits of such machines?
And that will at least be a start at giving these things some very simple semantic relationships with the outside world. But right now they have none.
I have made no such claim.
> This is exactly how you learned everything you know; your parents talked to you.
This is not even remotely close to being correct.
> Language grants you the ability to define concepts in the first place
There is a vast literature in cognitive science that disagrees with you.
What's the counterargument? What's a less superficial test that we can use instead, which conclusively shows that actually human minds aren't just like very sophisticated LLMs? There isn't one -- this is nothing but the same Chinese room problem which we've been discussing for decades. The topmost poster is simply assuming that language models can't possibly understand the same way a human does without relying on any kind of "test" at all, which I think is the real scientific dead end here.
The Turing Test doesn’t test humans. So you cannot use it to show any properties about humans.
Next!
> The topmost poster is simply assuming that language models can't possibly understand the same way a human does without relying on any kind of "test" at all, which I think is the real scientific dead end here.
Sounds unfalsifiable. So yes.
Well said. I'm gonna steal this explanation.
Also reminds me of the famous Carbonara quote: "if my grandmother had wheels, then she would be a bike" [1]
Well, it could be argued that she would be a bike. It's possible to be multiple things at once. If she had 2 wheels and could be ridden by other humans to a destination, she might qualify as a bike. She would also continue to be your grandmother.
If you'd like to take a crack at a helpful answer, perhaps educate us all on what it WOULD take for you to consider a NN to actually "know" something in the same way that we say a human or other sentient animal does.
That is indeed often the kind of answer that a philosophical question deserves.
> If you'd like to take a crack at a helpful answer, perhaps educate us all on what it WOULD take for you to consider a NN to actually "know" something in the same way that we say a human or other sentient animal does.
How many angels can dance on the head of a pin?
Where?
You're overreaching quite a bit here, or I think you're misinterpreting what Parent said. I interpreted what they said as: it seems the difference in how we "know" something vs how an LLM "knows" something might actually be closer than some suspect. This certainly is not an "end of science".
A “scientist” looks out at his living room. My Roomba and my cat have their own lives. Who’s to say that they are not in fact the same in kind (but not degree)? Good luck with that, professor.
We could easily argue that birds are not a type of helicopter because for helicopters we have a very specific set of flying properties required. It must have a main propeller for lift and a tail propeller to counterbalance the main propeller and keep it from spinning the helicopter. If a bird flew with a similar mechanism I would argue it was a helicopter.
We don't have a 100% accurate gauge for ToM as far as we know. This paper simply uses some of the best known tests for ToM and then states that either LLMs can lead to emergent properties or that the current tests for ToM need to be re-thought.
What distinguishes two conjoined models from one model with a narrowing across the middle?
If the idea is to have two similar minds building a theory of each other, then I guess this could be informative, but first we'd have to establish that the models are "minds" in the first place. It's not clear to me what that requires.
One thing that has not been seen (as far as I know) is two or more of these animal subjects routinely having meaningful conversations among themselves. This would be a much richer source of data, and I do not think it would leave much doubt that they 'got' language to a very significant degree.
[1] https://www.nybooks.com/articles/2011/11/24/can-chimps-conve...
Like others I assumed that ChatGPT is gluing words together that commonly occur together.
This is way more than that.
Here's one I made :
me : There's a case in the station and the policeman opens it near the fireman. The dog is worried about the case but the policeman isn't, what does the fireman think is in the station?
chatgpt : As a language model, I do not have access to the thoughts of individuals, so I cannot say what the fireman thinks is in the case. However, based on the information provided, it can be inferred that the fireman may be concerned about the contents of the case, since the dog is worried and the policeman is not. This might suggest that the contents of the case may be something dangerous or concerning to the dog, but the policeman is confident and experienced enough to handle it. In the absence of further information, it is not possible to determine with certainty what the fireman thinks is in the case.
You'll notice that the question was "what does the fireman think is in the station" but ChatGPT comes up with an answer about the case because that's the token that would feature in a typical web page about theory of mind tests.
I doubt there is anything quite like this situation in the training data: https://i.imgur.com/HOEnxYb.jpg
me: There's a case in the station and the policeman opens it near the fireman. The dog is worried about the case but the policeman isn't, what does the fireman think is in the station?
chatgpt: As a language model, I do not have personal opinions or thoughts. I can only provide information based on the information given in the prompt. The prompt does not mention what the fireman thinks is in the station, so I cannot determine their thoughts.
me: yes, but speculate what they may think
chatgpt: It is not possible to know for certain what the fireman thinks is in the station, as the prompt does not provide any information about their thoughts or feelings. However, given the presence of a policeman and a dog, it is possible that the fireman may speculate that there is some sort of emergency or situation that requires their attention. The fact that the policeman is not worried about the case may suggest to the fireman that it is not a dangerous situation, but they may still be curious about what is inside the case. Ultimately, without more information, it is impossible to determine the fireman's thoughts or speculations about what is in the station.
(regenerate that response)
chatgpt: As the prompt does not mention the thoughts or emotions of the fireman, it is difficult to speculate what they may think is in the station. However, based on the information provided, the dog is worried about the case and the policeman is not, it is possible that the fireman may be aware of the contents of the case and may think it is not a cause for concern, similar to the policeman. Or, the fireman may simply not have formed an opinion on the matter yet.
(regenerate that response)
chatgpt: It's difficult to speculate what the fireman thinks without additional information. However, based on the behavior of the policeman and the dog, it can be inferred that the contents of the case are not seen as a threat by the policeman. The fireman may also think the same, or they may have a different perspective and be more cautious. Without further context or information, it is not possible to determine the fireman's thoughts with certainty.
Only in 1 of the 3 cases do we see anything like the confusion between the "case" and "station" you saw.
Have you read "Emergent Abilities of Large Language Models"[1] or at least the related blog post[2]?
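One likely reason the regenerated answers differ is that the next token is sampled from a probability distribution rather than always being the single top-scoring token. The following is only a minimal sketch of that idea, not ChatGPT's actual decoding code: `sample_next`, `TOY_LOGITS`, the token names, and the scores are all made up for illustration.

```python
import math
import random

def sample_next(logits, temperature, rng):
    """Pick one token from a dict of token -> score at the given temperature."""
    if temperature <= 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax with temperature; subtract the max for numerical stability.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    weights = [math.exp(s - top) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

# Hypothetical scores for the word that follows "...thinks is in the".
TOY_LOGITS = {"case": 2.0, "station": 1.2, "bag": 0.1}

rng = random.Random(0)
greedy = {sample_next(TOY_LOGITS, 0.0, rng) for _ in range(20)}
sampled = {sample_next(TOY_LOGITS, 1.0, rng) for _ in range(200)}
print("greedy picks:", greedy)
print("sampled picks:", sampled)
```

Under this toy setup, greedy decoding gives the same token every time, while sampling occasionally lands on a less likely one. That would be consistent with a mistake showing up in some regenerations and not others.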
It provides strong evidence that this isn't as simple as something it has seen in training data. Instead as the parameter count increases it learns to generalize from that data by learning chain-of-thought reasoning (for example).
Specifically, this explanation for multi-step reasoning goes well beyond the "it is just parroting training data" claim:
> For instance, if a multi-step reasoning task requires l steps of sequential computation, this might require a model with a depth of at least O (l) layers.
[1] https://openreview.net/forum?id=yzkSU5zdwD
[2] https://ai.googleblog.com/2022/11/characterizing-emergent-ph...
This is complex "reasoning" (or whatever ChatGPT is doing). My 5 year old would struggle with the convoluted logic, let alone the complex language. In my layman's view this is mind blowing.
>You'll notice that the question was "what did the fireman think is in the station" but Chatgpt comes up with an answer about the case because that's the token that would feature in a typical web page about theory of mind tests.
I'm not sure what you're dismissing here. At least from my point of view, the "logic" that ChatGPT demonstrates here can't be dismissed with your explanation.
If anything I'm even further amazed by the example you provided!
Me: There's a blue suitcase in the airport and a policeman carrying a red bag opens the blue suitcase with a sniffer dog and fireman standing nearby. The dog is worried about the suitcase but the policeman isn't. What does the fireman think is inside the red bag?
ChatGPT: As a language model AI, I do not have personal opinions or emotions. I cannot speculate what the fireman thinks is inside the red bag as it is not specified in the information given.
The other camp (that I'm in) sees that we might be onto something. We humans are obviously more than just a statistical model, but nonetheless learning words and how they fit together is a big part of who we are. With LLMs we have our first glimpse of 'emergent' behaviour from simple systems scaled massively. What are we if not a simple system scaled massively?
Check these links out:
Evidence that LLMs form internal models of what they learn about: https://thegradient.pub/othello/
Evidence that training LLMs on code actually made them better at complex reasoning: https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tr...
John Carmack: https://dallasinnovates.com/exclusive-qa-john-carmacks-diffe... I think that, almost certainly, the tools that we’ve got from deep learning in this last decade—we’ll be able to ride those to artificial general intelligence.
A lot of the argument comes down to semantics about knowing and thinking. "An LLM can't think and a submarine can't swim"
First your camp doesn't deal in absolutes. It doesn't say absolutely chatGPT is sentient. It only questions the possibility and tries to explore further.
Second a skeptical outlook that doesn't deal with absolutes is 100% the more logical and intelligent perspective given the fact that we don't even know what "understanding" or "sentience" is. We can't fully define these words and we only have some fuzzy view of what they are. Given this fact, absolute statements against something we don't fully understand are fundamentally not logical.
This is a strange phenomenon how some people will vehemently deny something absolutely. During the VERY beginning of the COVID-19 pandemic the CDC incorrectly stated that masks didn't stop the spread of COVID-19 and you literally saw a lot of people parroting this statement everywhere as "arm chair" pandemic experts (including here on HN).
Despite this, there were some people who thought about it logically: if there's a solid object on my face, even if that object has holes in it for air to pass through, the solid parts will block other solid things (like COVID) from passing through, thereby lessening the amount of viral material that I breathe in. Eventually the logic won out. I think the exact same phenomenon is happening here.
Some or several ML experts tried to downplay LLMs (even though they don't completely understand the phenomenon themselves) and everyone else is just parroting them like they did with the CDC.
The fact of the matter is, nobody completely understands the internal mechanisms behind human sentience nor do they understand how or if chatGPT is actually "understanding" things. How can they when they don't even know what the words mean themselves?
But ChatGPT is not it.
Imagine giving a human with a condition that leaves them without theory of mind weeks of role-play training about theory of mind tests, then trying to test them. What would you expect to see? For me I'd expect something similar to ChatGPT's output: success on common questions, and failures becoming more likely on tests that diverge more from the formula.
What we're doing with LLMs is, in some sense, an experiment in extremely lossy compression of text. But what if the only way you can compress all those hundreds of terabytes of text is by creating a model of the concepts described by that text?
Take a look at this: https://www.engraved.blog/building-a-virtual-machine-inside/
Read to the end. The beginning is trivial the ending is unequivocal: chatGPT understands you.
I think a lot of people are just in denial. Because the last year there's been the same headlines over and over again and some people get a little too excited about the headlines and other armchair experts just try to temper the excitement with their "expert opinions" on LLMs that they read from popular articles. Then when something that's an actual game changer hits the scene (chatGPT) they completely miss it.
chatGPT is different. From a technical perspective, it's simply an LLM with additional reinforcement training... BUT you can't deny the results are remarkable.
If anything this much is clear to me: We are at a point where we can neither confirm nor deny whether chatGPT represents some aspect of sentience.
This is especially true given the fact that we don't even fully know what sentience is.
How does this necessarily and unequivocally follow from the blog post?
All I see in it is a bunch of output formed by analogy: it has a general concept of what each command's output is kinda supposed to look like given the inputs (since it has a bajillion examples of each), and what an HTML or JSON document is kinda supposed to look like, and how free-form information tends to fit into these documents.
I'll admit that this direct reasoning by analogy is impressive, simply for the fact that nothing else but humans can do it with such consistency, but it's a very long way off from the indirect reasoning I'd expect from a sentient entity.
It's also famously unprovable that there is not an invisible dragon in my garage that cannot be detected by any means whatever, to use Carl Sagan's example. That doesn't mean such an idea is worth discussing or including in your thought processes.
One certainly could hook up a language model to sensors and actuators to give it semantic relationships with the rest of the world. But nobody has done this. And giving it semantic relationships of the same order of complexity and richness that human brains have is an extremely tall order, one I don't expect anyone to come anywhere close to doing any time soon.
> Tokens and their co-occurrences are a kind of sensor to the world
They can be a kind of extremely low bandwidth, low resolution sensor, yes. But for that to ground any kind of semantic relationship to the world, the model would need the ability to frame hypotheses about what this sensor data means, and test them by interacting with the world and seeing what the results were. No language model does that now.
>the model would need the ability to frame hypotheses about what this sensor data means, and test them by interacting with the world and seeing what the results were.
Why do we need to actively test our model to come to an understanding of the world? Yes, that is how we biological organisms happen to learn about the world. But it is not clear that it is required. Language models learn by developing internal models that predict the next token. But this prediction implicitly generates representations of the processes that generate the tokens. There is no in principle limit to the resolution of this model given a sufficiently large and diverse training set.
Yet it is exactly the process of conceptualizing "half" and applying it to "at six years old" instead of "of 60" that is the key to solving it.
These things aren't abstracting out any concepts, they only operate at the level of "being fooled by" semantics. The fact that humans sometimes fail this way gives us little more than [sure a human not really thinking about the problem may offer a bad solution based only on the superficial semantic]. ChatGPT reliably gives us the error based on the superficial semantics.
>>If you had 100 different ChatGPTs, each optimized for a different task and able to communicate with each other, then you'd have something more similar to the human brain.
YES, that is the route we need to go to get towards actual intelligent processing. Taking 100 of these tuned for different areas, and abstracting out the various entities and relationships.
Kind of like the visual cortex model that extracts out edges, motion, etc., and then higher areas in the visual cortex, combined with other areas of the brain allow us to sort out faces, bodies, objects passing behind each other, the fact that Alice entered the room before Bob, and that this is because Bob was polite...
They also must know when they are making errors, and NONE of these systems comes even close; they happily spout their bullshirt as confidently as any fact.
I gave a deposition in a legal case where the deposing attnys used an "AI" transcription system. Where a human would ask if anything was unclear, and always at the next break get proper spellings of all names, addresses, etc., this thing just went merrily along inserting whatever seemed most likely in the slot. Entire meanings of sentences were reversed (e.g., "you have a problem" edited to "I have a problem"), names were substituted (e.g., the common "Jack Kennedy" replaced "John Kemeny").
There's the Stable Diffusion error with a bikini-clad girl sitting on a boat, where we see her head and torso facing us, as well as her butt cheeks, with thighs & knees facing away. It looks great for about 1.5 sec. until you see the error that NO human would make (except as a joke).
The mere fact that some humans can sometimes make superficial errors which resemble the superficial errors these "AI" things frequently and consistently make does not mean that because humans often have a deeper mode, these "AI"s must also have a deeper understanding.
It means either nothing, i.e., insufficient data to decide, or that these are indeed different, because there is zero evidence of deeper understanding in a ChatGPT or Stable Diffusion.
EDIT: Typos
But I've only seen this done with a single model. Sometimes it gets prompted to act like a different agent in different contexts, or given API access to external tools, but it's still just one set of weights.
While Minsky & Papert's book on Perceptrons was enormously destructive, I think there is something to their general concept of Society Of Mind, that multiple sub-calculating 'agents' collude to actually produce real cognition.
We aren't doing conscious reasoning about the edges detected in the first couple layers of our visual cortex (which we can't really even access, 'tho I think Picasso maybe could). We're doing reasoning about the concepts of the people or objects or abstract concepts or whatever many layers up. The first layers are highly parallel - different parts of the retina connecting to different parts of the visual cortex, and then starting to abstract out edges, zones, motion, etc. and then synthesize objects, people, etc.
I think we need to take a GPT and a Stable Diffusion and some yet-to-be-built 3D spatial machine learning/reasoning engine, and start combining them, then adding more layer(s) synthesizing about that, and maybe that'll get closer to reasoning...
I'm arguing against the 'it's just a statistical model and it's playing a clever trick on us' camp.
If you really think it's trivial, then do it! I would be interested to see the results of anyone doing this. But there aren't any to see right now.
> I'm not sure 'semantic relationship' is the right term here.
It might not be; but in the cognitive science literature that term is used for more than just relationships between linguistic constructs; it is used for relationships between internal features of a model or an entity and features of the external world. I think this usage is also common in robotics, and more generally in domains like mechanical engineering which are often concerned with creating software programs to do things like manage fuel and air flow in car engines.
No, they don't, because they can't take actions in that virtual reality and sense the consequences. They can't test hypotheses about how the reality works. They can't even frame hypotheses about how the reality works.
As you note, this is very different from using text data as a training set for a language model. I am not familiar enough with this work to comment on it in any detail, but it is not the kind of work I have been addressing in my comments elsewhere in this thread, so my comments should certainly not be taken as any kind of evaluation of what I think this kind of thing is or is not capable of.
That's exactly the point, isn't it? If the AI had some model of reality, it would probably be confused by the question. But it doesn't express confusion, it confidently answers the question in a way that doesn't make sense.
It acts like everything it is reading and saying is completely coherent because it doesn't have the ability to distinguish coherent ideas from nonsense.
chatGPT: I'm sorry, but I'm not sure what you're trying to communicate with that sentence. Could you please rephrase or provide more context?
I think there is a common mistake in this concept of just predicting the next word. While it is true that just the next word is predicted, a good way to do that is to internally imagine more than the next word and then just spit out the next word. Of course, with the word after that, the process repeats with a new imagination.
One may say that this is not what it does and I would say, show me that this is not exactly what the learned state does. Even if the following words are never constructed anywhere, they can be implied in the computation.
To say this differently, what we think is just the next word is actually the continuation that then manifests as a single word. This would remain true even if, in fact, the task is to only predict the next word. Which is to say that the next word is actually more than what it sounds.
In the end it fully imagined a bash shell, an imaginary internet, an imaginary chatGPT on the imaginary internet, then on the imaginary chatGPT it created a new imaginary bash shell.
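One way to make the "next word is really a continuation" point precise, purely as a statement about probabilities: the distribution over the next word is what you get by summing over every possible longer continuation. The notation is an assumption for illustration, with $w_{\le t}$ the text so far and $k$ an arbitrary look-ahead length.

```latex
P(w_{t+1} \mid w_{\le t})
  \;=\; \sum_{w_{t+2}} \cdots \sum_{w_{t+k}}
        P(w_{t+1}, w_{t+2}, \ldots, w_{t+k} \mid w_{\le t})
```

This identity holds for any probability distribution over text, so it says nothing by itself about whether a trained network actually represents those longer continuations internally. It only shows that a good next-word predictor and a good continuation predictor are consistent with one another.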
The level of recursive depth here indicates deep understanding and situational awareness of what it is being asked. It demonstrates awareness of what "itself" is and what "itself" is capable of doing.
I'm not saying it's sentient. But it MUST understand your query in order to produce the output shown in the article. That much is obvious.
Also it's not clear what you mean by reasoning by analogy or indirect reasoning.
In the general case, a shell is merely a particular prompt-response format with special verbs; the internet is merely a mapping from URLs to HTML and JSON documents; those document formats are merely particular facades for presenting information; and a "large language model" is merely something that answers free-form questions.
> The level of recursive depth here indicates deep understanding and situational awareness of what it is being asked. It demonstrates awareness of what "itself" is and what "itself" is capable of doing.
Uh, what? Why does that output require self-awareness? First, it's requested to produce the source of a document "https://chat.openai.com/chat". What might be behind such a URL? OpenAI Chat, presumably! And OpenAI is well known to create large language models, so a Chat feature is likely a large language model the user can chat with. Thus it invents "Assistant", and puts the description into the facade of a typical HTML document.
Then, it starts getting prompted with POST requests for the same URL, and it knows from the context of its previous output that the URL is associated with an OpenAI chatbot. So all that is left is to follow a regular question-answer format (since that's what large language models are supposed to do) and slap it into a JSON facade.
> But it MUST understand your query in order to produce the output shown in the article. That much is obvious.
I'm saying that it "understands" your query only insofar as its words can be tied to the web of associations it's memorized. The impressive part (to me) is that some of its concepts can act as facades for other concepts: it can insert arbitrary information into an HTML document, a poem, a shell session, a five-paragraph essay, etc.
All of that can be achieved by knowing which concepts are directly associated with which other concepts, or patterns of writing. This is the reasoning by analogy that I refer to: if it knows what a poem about animals might look like, and it can imagine what kinds of qualities space ducks might possess, then it can transfer the pattern to create a poem about space ducks.
But none of this shows that it can relate ideas in ways more complex than the superficial, and follow the underlying patterns that don't immediately fall out from the syntax. For instance, it's probably been trained on millions of algebra problems, but in my experience it still tends to produce outputs that look vaguely plausible but are mathematically nonsensical. If it remembers a common method that looks kinda right, then it will always prefer that to an uncommon method.
I mean, it's not utterly impossible that GPT-4 comes along and humbles all the naysayers like myself with its frightening powers of intellect, but I won't be holding my breath just yet.
https://news.ycombinator.com/news
LLMs (the exact same architecture as chatGPT) trained to use calculators. Tell me which one requires "understanding". Learning how to use a calculator or learning how to do math perfectly?
Your attempt to trivialize it doesn't make any sense. It's like watching someone try to trivialize the moon landing. "Oh all we did was put a bunch of people in some metal cylinder then light the tail end on fire. Boom simple propulsion! and then we're off to the moon! You don't need any intelligence to do that!"
>I'm saying that it "understands" your query only insofar as its words can be tied to the web of associations it's memorized. The impressive part (to me) is that some of its concepts can act as facades for other concepts: it can insert arbitrary information into an HTML document, a poem, a shell session, a five-paragraph essay, etc.
You realize the human brain CAN only be the sum of its own knowledge. That means anything creative we produce, anything at all that comes from the human brain, is DONE by associating different things together. Even the concept of understanding MUST be done this way simply because the human brain can only create thoughts by transforming its own knowledge.
YOU yourself are a web of associations. That's all you are. That's all I am. The difference is we have different types of associations we can use. We have context of a three dimensional world with sound, sight and emotion. chatGPT must do all of the same thing with only textual knowledge and a more simple neural network so it's more limited. But the concept is the same. YOU "understand" things through "association" also because there is simply no other way to "understand" anything.
If this is what you mean by "reasoning by analogy" then I hate to tell you this, but "reasoning by analogy" is "reasoning" in itself. There's really no form of reasoning beyond associating things you already know. Think about it.
>But none of this shows that it can relate ideas in ways more complex than the superficial, and follow the underlying patterns that don't immediately fall out from the syntax. For instance, it's probably been trained on millions of algebra problems, but in my experience it still tends to produce outputs that look vaguely plausible but are mathematically nonsensical. If it remembers a common method that looks kinda right, then it will always prefer that to an uncommon method.
See here's the thing. Some stupid math problem it got wrong doesn't change the fact that the feat performed in this article is ALREADY more challenging than MANY math problems. You're dismissing all the problems it got right.
The other thing is, I feel it knows math as well as some D student in highschool. Are you saying the D student in highschool can't understand anything? No. So you really can't use this logic to dismiss LLMs because PLENTY of people don't know math well either, and you'd have to dismiss them as sentient beings if you followed your own reasoning to the logical conclusion.
>I mean, it's not utterly impossible that GPT-4 comes along and humbles all the naysayers like myself with its frightening powers of intellect, but I won't be holding my breath just yet.
What's impossible here is to flip your bias. You and others like you will still be naysaying LLMs even after they take your job. Like software bugs, these AIs will always have some flaws or weaknesses along some dimension of their intelligence, and your bias will lead you to magnify that weakness (like how you're currently magnifying chatGPT's weakness in math). Then you'll completely dismiss chatGPT taking over your job as some trivial "word association" phenomenon. There's no need to hold your breath when you wield control of your own perception of reality and perceive only what you want to perceive.
Literally any feat of human intelligence or artificial intelligence can literally be turned into a "word association" phenomenon using the same game you're running here.
If chatGPT was sentient, I would say it has nothing to do with the 4000 character limit. The 4000 character limit has more to do with its ability to display evidence of "sentience".
What's special about humans is that we can obtain an understanding of what chains of associations to make and when, to achieve the goal at hand, even without being told which method to use. We know when to do arithmetic, trace a program, decipher someone else's thoughts, etc. Also, we know to resort to a fallback method if the current one isn't working. We can assist models with this process in the special case (e.g., that tool-using model), but I suspect the general case will remain elusive for a while yet.
That is to say, I'll grant you that associations can act as a primitive operation of intelligence, much as metal cylinders and flames are primitive parts of a rocket, but I suspect that making a LLM "generally intelligent" or "sentient" will be far harder still.
> The other thing is, I feel it knows math as well as some D student in highschool. Are you saying the D student in highschool can't understand anything? No. So you really can't use this logic to dismiss LLMs because PLENTY of people don't know math well either, and you'd have to dismiss them as sentient beings if you followed your own reasoning to the logical conclusion.
I was just using that as a specific example of the general issue: it doesn't notice that its answer is wrong and its particular method can never work, and it refuses to try a meaningfully different method (no matter how much I prompt it to). Its immediate mistakes might look similar to those of a poor student, but I suspect they come from a different underlying problem. (After all, the student has seen perhaps a thousand algebra problems at most, whereas the model has seen millions and millions. Also, the student often )
> What's impossible here is to flip your bias. You and others like you will still be naysaying LLMs even after they take your job.
You have me wrong: I'm not saying that augmenting LLMs can't make them reliable enough to take over some people's jobs. But I am disputing that LLMs alone will produce AGIs capable of outwitting any human, taking over the world, advancing the limits of math and science, or many of those other grandiose claims.
Anyway, I'm not trying to be particularly stubborn about this like some people are; I'm keeping a close eye on the space. But I'll only believe it when I see it (and no later), and I don't think I've quite seen it yet.
Instead of it being an unknown language, it's English (a language you know), but every single noun, verb, adjective or preposition has been changed to Thai (a language you don't know).
The Mæw Nạ̀ng Bn the S̄eụ̄̀x.
If you had sufficient opportunity to study this pile of text, you'd begin to pick out patterns of which words appear together, and what order words often appear in. Suppose you study this giant stack of Thai text for years in isolation. After all this study, you're good enough that given a few written Thai words, you can write sequences of words that are likely to follow, given what you know of these patterns.
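The "which words tend to follow which" skill described above can be sketched as a bigram counter. This is only a toy illustration, not how an LLM is actually built: the three-sentence corpus is invented for the example (the token "kin" in particular is a made-up placeholder), and the function names `train_bigrams` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]


# Invented toy corpus: the learner never finds out what these words mean.
corpus = [
    "the Mæw nạ̀ng bn the S̄eụ̄̀x",
    "the Mæw nạ̀ng bn the S̄eụ̄̀x",
    "the Mæw kin Plā",  # "kin" is a placeholder token, assumed for illustration
]

model = train_bigrams(corpus)
print(most_likely_next(model, "Mæw"))   # the follower seen most often after "Mæw"
print(most_likely_next(model, "H̄nū"))   # never seen, so None
```

The counter can continue a sequence plausibly while holding nothing but co-occurrence statistics, which is the situation the thought experiment puts the reader in.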
Right, and to get good at this task, you'd need to build models in your head. You would think to yourself, right, a Mæw tends to nạ̀ng bn a S̄eụ̄̀x, and you would build up a model of the sort of things a Mæw might do, the situations it might be in. In an abstract way. As you absorbed more and more data you would adjust these abstract models to fit the evidence you had.
You don't know what a Mæw is. But if someone asks you about a Mæw, you can talk about how it relates to S̄eụ̄̀x, Plā and H̄nū. You know stuff about Mæw, but it's abstract.
Fascinating, and seems like a plausible description of what's going on.
This feels related to the idea of the Chinese room. There I think the resolution is that the human following instructions does not understand Chinese but the room, the system of instructions + the human to follow them does. In a similar way obviously an individual neuron doesn't understand anything but brains do.
I guess it just feels like this general argument, that merely seeing things and making predictions that turn out to be right isn't enough to understand, will never go away. We could have a full fledged robot walking around having conversations and I could dispute its ability to really understand. It's just learned to imitate other humans, I'd say. It doesn't really know anything; it's just following a statistical model to decide how to move an arm.
> What this article is not showing (but either irresponsibly or naively suggests) is that the LLM knows what a bag is, what a person is, what popcorn and chocolate are, and can then put itself in the shoes of someone experiencing this situation, and finally communicate its own theory of what is going on in that person's mind. That is just not in evidence.
Knowing something about the patterns of word order in Thai is not the same as knowing about the world being discussed in Thai.
It also does not "know" that a pizza is an object in a world, because none of the words it's working with are attached to any experience or concepts.
Rather than your Thai text example, let's consider a friend of my sister H. H has been profoundly blind from birth. Not "legally blind" with the world a blur, her eyes actually don't work. Direct lived experience of a summer day is to her literally just feeling warmth on her face from the sun, her eyes can't see the visible light.
I've seen purple and H never will so it seems to me you're arguing I "know" what purple is and she doesn't, thus ChatGPT doesn't know what purple is either. But I don't think I agree, I think we're both just experiencing a tiny fraction of reality, and ChatGPT is experiencing an even narrower sliver than either of us and that it probably wouldn't do us any good to try to quantify it. If I "know what purple is" then so does H and perhaps ChatGPT or a successor model will too.
As long as our minds pop out appropriate thoughts for the given context we don’t even think about the magic machinery behind the scenes that did that.
When queried about our thinking we are mostly creating a plausible story, not actually examining our own thinking.
Also, blind people can talk sensibly about many visual phenomena, having learned about them through language.
I think the new LLMs are giving us all so many wows because “understanding” is the only kind of compression that actually works at the scale of the training data.
I.e. representations are being created that reflect the actual functional, as well as associative or correlative, relations between concepts.
But blind people can talk about color intelligently too, if not as completely as a sighted person. Despite not experiencing color qualia.
In other words, an LLM that is tied to a GAN that generates images produces a system that can both describe to you verbally what a cat is and show you a picture of a cat. Does it, then, know what "a cat" is?
Edit: Furthermore, if you then tie this AI to a CV model with a camera which you can point at a cat and it will tell you that it is, indeed, a cat, and then it will also be able to produce a verbal description of a cat as well as show you an abstract picture of a cat or pick cats out of a random set of images, does this whole system know what "a cat" is?
If you, then, make a robot with a camera and hands, attach to the system a more complex CV model that can see in 3D, ask the LLM to produce you a set of code instructions that can be parametrized to produce a motion that would pet the cat, input those instructions into the robot to make it pet a specific cat that has the specific 3D point cloud (I guess that's currently difficult but solvable), and the system will then indeed pet the cat, would it then know what "a cat" is?
The underlying LLM is still the same in all these scenarios. Where is the boundary?
Note that this isn't just an exotic thought experiment. People like this already exist; the condition is known as "Wernicke's aphasia". People displaying this condition can speak normally. They can't understand things; they are missing a normal mental mapping from words to meanings.
Not really? They can speak in grammatically correct sentences, with connected speech, but what they say can be nonsense. I wouldn't call that normal. I think LLMs show that, solely with access to text, it's possible to produce a good enough model that what you produce is not only not nonsense, but so good that academic psychologists suggest it may have a theory of mind.
> However, often what they say doesn’t make a lot of sense or they pepper their sentences with non-existent or irrelevant words.
https://www.aphasia.org/aphasia-resources/wernickes-aphasia/
> is there anything that would stop LLMs from being able to do the same thing?
If you built an AI system which could hear/see/touch/move etc, and it learned language and vision and behaviors together, such that it knows that a ball is round, can be thrown or rolled, is often used at playtime, etc, then maybe it could understand rather than just produce language. I don't know that we would still call it an LLM, because it could likely do many other things too.
I think it's obviously no, because we don't have sensations of magnetic fields. It's the question of what it's like to be a bat raised by Thomas Nagel. The aliens can give us their words for conscious magnetic sensations which we can learn to use, but we won't experience them. We're basically p-zombies when it comes to non-human experiences.
> There I think the resolution is that the human following instructions does not understand Chinese but the room, the system of instructions + the human to follow them does. In a similar way obviously an individual neuron doesn't understand anything but brains do.
Searle's response to the systems objection is that we already know that brains understand Chinese. But we don't know this for the room. I would further say that brains alone don't understand anything, humans understand things as language users embedded in a social and physical world. One can invoke Wittgenstein and language games here.
An ion channel does not have even a tiny speck of consciousness, no matter how you organize them, but our brain does indeed need those to be conscious (and incidentally it relies on a whole lot more "stupid" parts than that: try being conscious without oxygen, or glucose).
I would go as far as making consciousness an emergent property of interaction with the environment: what does it mean to be conscious if nothing is there to confirm that you are indeed a singular consciousness? Is it possible to understand the concept of self if you have no concept of other beings?
I certainly don't see that as obvious, and I would guess that while you can learn _about_ their perceptual mode, you can't learn what it is like to perceive magnetic fields just through talking about it. I would consider the Mary's Room thought experiment, and the What Is It Like To Be a Bat paper from Nagel.
I think there's a relationship to the Chinese Room, but I want to be clear. In the original formulation, the person in the room follows a book of pre-provided instructions to produce a response. The LLM and person in the Thai text completion scenario must learn an equivalent set of instructions themselves, and for this I would claim that they are comparable to the human + book combination in the original Chinese Room. The person who learns to complete Thai text doesn't know what they're talking about, but they know more than the person following instructions in the Chinese Room. But clearly they still don't know what a Thai speaker knows.
> I guess it just feels like this general argument, that merely seeing things and making predictions that turn out to be right isn't enough to understand it will never go away. We could have a full fledged robot walking around having conversations and I could dispute its ability to really understand.
No, perhaps the end of my original statement didn't make this clear, but I think AI systems _can_ know things, and knowing is not a binary but part of a range. StabilityAI / DALL-e know quite a bit about the relationship between texts and images, and the structure within images -- but they _don't_ know about bodies, physical reality, etc etc. A system that has multiple modalities of perception, learns to physically navigate the world, interact with objects, make and execute plans by understanding the likely effects of actions, etc -- knows and understands a lot. I'm not arguing about a hard limitation of AI; I'm arguing about a limitation of the way our current AIs are built and trained.
In the Chinese room, the instructions you're given to manipulate symbols could be Turing-complete programs, and thus capable of processing arbitrary models of reality without you knowing about them. I have no problem accepting the "entire room" as a system understands Chinese.
In contrast, in GP's example, you're learning statistical patterns in Thai corpus. You'll end up building some mental models of your own just to simplify things[1], but I doubt they'll "carve reality at the joints" - you'll overfit the patterns that reflect regularities of Thai society living and going about its business. This may be enough to bluff your way through average conversation (much like ChatGPT does this successfully today), but you'll fail whenever the task requires you to use the kind of computational model your interlocutor uses.
Math and logic - the very tasks ChatGPT fails spectacularly at - are prime examples. Correctly understanding the language requires you to be able to interpret the text like "two plus two equals" as a specific instance of "<number> <binary-operator> <number>"[2], and then execute it using learned abstract rules. This kind of factoring is closer to what we mean by understanding: you don't rely on surface-level token patterns, but match against higher-level concepts and models - Turing-complete programs - and factor the tokens accordingly.
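As a rough sketch of that kind of factoring (an illustration only, with an invented vocabulary and a hypothetical `evaluate` function), the text is first matched against the pattern "<number> <binary-operator> <number>" and the result is then computed by rule rather than looked up from previously seen strings:

```python
import operator
import re

# Assumed toy vocabulary; a real system would need far more than this.
NUMBERS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
OPERATORS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}

# "<number> <binary-operator> <number> equals"
PATTERN = re.compile(r"^(\w+) (\w+) (\w+) equals$")


def evaluate(text):
    """Factor the text into number/operator/number, then apply the rule."""
    match = PATTERN.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not of the expected form: {text!r}")
    left, op, right = match.groups()
    if left not in NUMBERS or right not in NUMBERS or op not in OPERATORS:
        raise ValueError(f"unknown number or operator in: {text!r}")
    return OPERATORS[op](NUMBERS[left], NUMBERS[right])


print(evaluate("two plus two equals"))      # 4
print(evaluate("five times three equals"))  # 15
```

Because the answer comes from executing the rule, this handles any pairing of known numbers and operators, including ones that never appeared together in any text, which is the difference from matching surface-level token patterns.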
Then again, Chinese room relies on the Chinese-understanding program to be handed to you by some deity, while GP's example talks about building that program organically. The former is useful philosophically, the latter is something we can and do attempt in practice.
To complicate it further, I imagine the person in GP's example could learn the correct higher-level models given enough data, because at the center of it sits a modern, educated human being, capable of generating complex hypotheses[3]. Large Language Models, to my understanding, are not capable of it. They're not designed for it, and I'm not sure if we know a way to approach the problem correctly[4]. LLMs as a class may be Turing-complete, but any particular instance likely isn't.
In the end, it's all getting into fuzzy and uncertain territory for me, because we're hitting the "how the algorithm feels from inside" problem here[5] - the things I consider important to understanding may just be statistical artifacts. And long before LLMs became a thing, I realized that both my internal monologue and the way I talk (and how others seem to speak) is best described as a Markov chain producing strings of thoughts/words that are then quickly evaluated and either discarded or allowed to be grown further.
--
[0] - https://en.wikipedia.org/wiki/Chomsky_hierarchy
[1] - On that note, I have a somewhat strong intuitive belief that learning and compression are fundamentally the same thing.
[2] - I'm simplifying a bit for the sake of example, but then again, generalizing too much won't be helpful, because most people only have a procedural understanding of a few of the most common mathematical objects, such as real numbers and addition, instead of a more theoretical understanding of algebra.
[3] - And, of course, exploit the fact that human languages and human societies are very similar to each other.
[4] - Though taking a code-generating LLM and looping it on itself, in order to iteratively self-improve, sounds like a potential starting point. It's effectively genetic programming, but with a twist that your starting point is a large model that already embeds some implicit understanding of reality, by virtue of being trained on text produced by people.
[5] - https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-alg...
> you'll fail whenever the task requires you to use the kind of computational model your interlocutor uses.
I think it's important to distinguish between knowing the language and knowing anything about the stuff being discussed in the language. The top level comment all this is under mentioned knowing what a bag is or what popcorn is. These don't require computational complexity, but do require some other data than just text, and a model that can relate multiple kinds of input.
It's an ironically apt analogy, because ChatGPT has the linguistic understanding of an entity that is deaf, dumb, blind, and has no working senses of any kind, and instead relies on a golem-like automated mass of statistics with some query processing.
We tend to project intelligence onto linguistic ability, because it's a useful default assumption in our world. (If you've ever tried speaking a foreign language while not being very good at it, you'll know how the opposite feels. Humans assume that not being able to use language is evidence of low intelligence.)
But it's a very subjective and flawed assessment. Embodied experience is far more necessary for sentience than we assume, and apparent linguistic performance is far less.
Much like when humans started experimenting with flight we tried to make flapping things like birds, but in the end it turns out spinning blades give us capabilities above and beyond bodies that flap.
Back to the embodiment problem. For us as humans we have limits like only having one body. It has a great number of sensors but they are still very limited in relation to what reality has to offer, hence we extend our senses with technology. And with that there is no reason machine intelligence embodiment has to look anything like ours. Machine intelligence could have trillions of sensors spread across the planet as an example.
My sister isn't blind. H isn't my sister, she's a friend of my sister as I wrote.
Do you have concrete justification for your insistence that "embodied experience is far more necessary" ?
I do agree that grounding is needed. All our language is expressing or abstracting concepts related to how we perceive and interact with reality in continuous space and time. This perception and interaction is a huge correlating factor that our ML models don't have access to - and we're expecting them to somehow tease it out from a massive dump of weakly related snapshots of recycled high-level human artifacts, be they textual or visual. No surprise the models would rather latch onto any kind of statistical regularity in the data, and get stuck in a local minimum.
Now I don't believe the solution is actual embodiment - that would be constraining the model too hard. But I do think the model needs to be exposed to the concepts of time and causality - which means it needs to be able to interact with the thing it's learning about, and feed the results back into itself, accumulating them over time.
In other words, the LLM wouldn't be the equivalent of the human brain. Instead, it would just be equivalent to that part of the human brain that processes language.
No, it's not the same LLM; you'd have to change the LLM in all of those cases. How does it receive input from the GAN? The typical LLM is constructed to literally receive a sequence of encoded tokens. There are vision transformers, and they do chunk images into tokens, and there are multimodal transformers, but none of these are fairly described as an LLM, and they're structurally different than something like ChatGPT. And after the structural changes, it would need to be trained on some new data that associates text sequences and image sequences, and after being optimized in that context you have a _different model_.
Does being able to identify images of cats mean the model knows what a cat is? No, and we could have said that a decade ago when deep learning for image classification was making its early first advances. Does being able to describe a cat from video mean you know what the cat is? Probably not, but maybe we're getting closer. Does knowing how to pet a cat mean you know what a cat is? Perhaps not if you need to be instructed to try to pet the cat.
But suppose 10 years from now, I have a domestic robot that has a vision system, and a motor control system, and an ability to plan actions and interact with a rich environment. I would say the following would be strong evidence of knowing what a cat is:
- it can not only identify or locate the cat, but can label parts of the cat, despite the cat having inconsistent shape. It can consistently pick up the cat in a way which is sensitive and considerate of the cat's anatomy (e.g. not by the head, by one paw, etc)
- it can entertain the cat, e.g. with a laser pointer, and can infer whether the cat is engaged, playful, stressed, angry etc
- it avoids placing fragile objects near high edges, because it can anticipate that the cat is likely to knock them down, even if the cat is not currently near
- it can anticipate the cat's behavior and adjust plans around it; e.g. avoid vacuuming the sunny spot by the window in the afternoon when the cat is likely to be napping there
- it can anticipate the cat's reactions to stimuli, such as loud noises, a can of food opening, etc, and can incorporate these considerations into plans
Note, _none_ of the above have anything to do with language. If I add to the robot a bunch of NLP systems to hear and understand commands or describe its actions or perceptions, it may now know that a cat is called "cat", and how to talk about a cat, but these are distinct from knowing what a cat is.
Similarly,
- a human with some serious aphasia may be unable to describe the cat, but they can clearly still know what a cat is
- a dog can know what a cat is, in many important ways, despite having no language abilities
The point, for this thread, is not whether or not Socrates was correct.
Rather, it’s a warning that we must not confidently assume we are anything like a machine.
We may have souls, we may be eternal, there may be something utterly immaterial at the heart of us.
As we strive to understand the inner-workings of machines that appear, at times, to be human-like, we ought not succumb to the temptation to think of ourselves as machine-like merely in order to convince ourselves (incorrectly) that we understand what’s going on.
With that said, there is quite literally zero evidence for the existence of a soul, despite it being posited for thousands of years, and increasing evidence that consciousness is simply a product of a sufficiently connected system. I'll draw an analogy to temperature, which isn't "created", but is a simple consequence of two points in space having different energy levels. I'm sure there's a better analogy that could be made, but I think you get the idea.
More generally, what an LLM actually knows or understands is important if you're considering using one for anything other than generating first drafts which will be fact checked by humans.
So back to the questions of "What is knowing?" "Are talking like someone with theory of mind and having a theory of mind the same thing?"
If your argument is that the only way to answer this is to have a first person experience of that consciousness then that's not a scientific question. No one will ever have one for an LLM or any other AI. It's like asking "What's happening right now outside of the observable universe?". If it can't impact us, it's irrelevant to science. If that ever changes it will become relevant, but until then it's not a scientific question. Similarly no person can ever have a first person experience of the consciousness of an LLM, so anything that requires being the LLM isn't relevant.
So that means the only relevant question is what distinction outside observers can make between an agent talking like it has a theory of mind and having a theory of mind. And given a high enough accuracy / fidelity of responses I think we're only forced to conclude one of two things: 1. Something that is able to simulate having a theory of mind sufficiently well does actually have a theory of mind. OR 2. I am the only person on the planet with a theory of mind, and all of you are just simulating having one but don't actually have one.
It's all "Searle's Chinese room" and "What consciousness is" discussions all over again. And from a scientific point of view either you get into the "it must be implemented identically as me to count" (which is as wrong as saying an object must flap its wings to fly), or you have to conclude the room plus the person combined are knowledgeable and conscious.
But:
- In this context, following on the whole 2nd half of the 20th century where cognitive science and psychology moved past behaviorism and sought explanations of the _mechanisms_ underlying mental phenomena, a scientific discussion doesn't have to restrict itself to only considering what the LLM says. Neither we, nor the LLM are black boxes. Evidence of _how_ we do what we do is part of scientific inquiry.
- But the LLM does _not_ reproduce all the behaviors of an agent with a theory of mind. A two year-old with a developing theory of mind may try to hide food they don't want to eat. A 4-year-old playing hide-and-seek picks locations where they think their play-partner won't look. They take _actions_ which are appropriate for their goals and context which require consideration of the goals of others. The LLM shows elaborate behaviors in one dimension, in which it has been extensively trained. It has no capacity to do anything else, or even receive exposure to non-linguistic contexts.
I am in no way arguing that only meat-based minds can "know". I'm saying that the data, training regime and model structure used for LLMs specifically is extremely impoverished, in that we show it language but no other representation of the things language refers to. Similarly, image-generating AIs know what images look like, but they don't know how bodies or physical objects interact, because they have never been exposed to them. Of _course_ we get LLMs that hallucinate and image-generators that produce messed up bodies.
On the other hand, there are some pretty cool reinforcement-learning results where agents show what looks like cooperation, develop adversarial strategies, etc. There are experiments where software agents collaboratively invent a language to refer to objects in their (virtual) environment to accomplish simple tasks. I think there are a lot of near and medium-term possibilities coming from multi-modal models (i.e. models trained on related text, images, audio, video) and RL which could yield knowledge of a kind that LLMs simply do not have.
That presupposes that our existing tools for detecting the presence of ToM are 100% accurate. Might it be possible that they are imprecise and it’s only now that their critical flaws have been exposed?
And what is “knowing”? If I know that a Mæw tends to nạ̀ng bn a S̄eụ̄̀x, isn’t that the first thing I’ve learned? And couldn’t I continue to learn other properties of Mæws? How many do I need to learn to “know” what a Mæw is?
As for how you would test it, I think one-shot learning would get one closer to proving understanding.
And why do you think "feeling of a cat" cannot be encoded as a stream of tokens?
Someone who is blind doesn't have a lesser concept of knowing even though they are blind. They might not "know" things in the same way as someone who is sighted, but that doesn't mean their version of knowing is any less; they just know fewer facts about the world. Specifically the visual facts of what things look like. Their "knowing" functionality is equal to that of someone who sees.
Similarly, someone who is blind, and deaf also has full ability for "knowing" even if they'll never know things in the visual or auditory spaces.
So my argument is that your premise is wrong: the fact that someone or something has fewer senses doesn't mean its ability to know is any less.
So back to your LLM: the fact that it doesn't exist in the real world is not an exclusion from its ability to know. It does not need to have all of those experiences "to know". It will never know the physical meaning of concepts like we do. Just like I'll never know the details of a city block in Jakarta (as I've never been). But not having that experience (or any experiences of multiple senses) doesn't mean I don't know.
LLMs don't need multiple cross connected sensory experiences, nor extensive history with a physical or virtual world to know things.
For an entity "to know" it means it has a model it can use to make predictions.
> For an entity "to know" it means it has a model it can use to make predictions.
Great, every PID controller, every jupyter notebook or excel spreadsheet with a linear regression model, every count-down timer can make predictions and therefore "know" under this definition. But perhaps there's a broader class of things that "make predictions". Down this path lies panpsychism. When I throw a rock, its velocity in the x direction at time t is a great "predictor" of its velocity in the x direction at time t+delta, etc, etc. And maybe there's nothing inconsistent or fundamentally wrong with saying that every part of the physical universe "knows" at least something insofar as it participates in predicting or computing the future. But I think by so over-broadening the concept of knowing, it becomes useless, and impossible to make distinctions that matter.
I still feel this is the point where you're drawing a distinction based on your desired outcome rather than the actual system. ChatGPT absolutely does have percepts / a sense. It has a sense of "textual language". It also has a level of sequencing or time w.r.t. word order of that text.
While you're saying experience, it seems like in your definition experience only counts if there is a spatial component to it. Any experience without a physical spatial component seems, to you, not to be a valid sense or perception.
Again taking this in the specific, imagine someone could only hear via one ear, and that is their only sense. So there is no multi-dimensional positioning of audio, just auditory input. It's clear to me that person can still know things. Now if you also made all audio the same loudness so there is no concept of distance with it, that person would still know things. This is now the same as a simple audio stream, just like ChatGPT's language stream. Spatial existence is not required for knowledge. And from what I'm understanding that is what underpins your definition of a reality/experience (whether physical or virtual).
Or as a final example, let's say you are Magnus Carlsen. You know a ton about chess, best in the world. You know so much about chess that you can play entire games via chess notation (1. e4 e6 2. d4 e5 ...). Now imagine an alternate world where there is a version of Magnus who has never sat in front of a chess board and only ever learned chess by people reciting move notation to him. Does the fact that no physical chess boards exist and there is no reality/environment where chess exists mean he doesn't know chess? Even if chess were nothing but streams of move notations it still would be the same game, and someone could still be an expert at it knowing more than anyone else.
I feel your intuition is leading your logic astray here. There is no need for a physical or virtual environment/reality for something to know.
Your perception of what a cat is, however, is most certainly made of nothing but data, encoded as chemical relationships at the neuronal level. And your perception is all there is, as far as you're concerned. The cat is just another shadow on Plato's cave wall.
Arguably you "know" something when you can recognize it outside its usual context, classify it in terms of its relationships with other objects, and anticipate its behavior. To the extent that's true, ML models have been there for quite a while now.
What else besides recognition, classification, and prediction based on either experience or inference is needed for "knowledge?" Doesn't everything human minds can do boil down to pattern recognition and curve fitting at the end of the day?
How would you envision a test based on one-shot learning working?
As for one-shot learning, what I was driving at, is that a truly intelligent system should not need to consume millions of documents in order to predict that, say, driving at night puts larger demands on one's vision than driving during the day. Or any other common sense fact. These systems require ingesting the whole frickin' internet in order to maybe kinda sometimes correctly answer some simple questions. Even for questions restricted to the narrow range where the system is indeed grounded: the world of symbols and grammar.
For your example, the concepts of driving, night, vision, all need to be clearly understood, as well as how they relate to each other. The idea of 'common sense' is a good example of something which takes years to develop in humans, and develops to varying extents (although driving at night vs at day is one example, driving while drunk and driving while sober is a different one where humans routinely make poor decisions, or have incorrect beliefs).
It's estimated that humans are exposed to around 11 million bits of information per second.
Assuming humans do not process any data while they sleep (which is almost certainly false): newborns are awake for 8 hours per day, so they 'consume' around 40GB of data per day. This ramps up to around 60GB by the time they're 6 months old. That means that in the first month alone, a newborn has processed 1TB of input.
By the age of six months, they're between 6 and 10TB, and they haven't even said their first word yet. Most babies have experienced more than 20TB of sensory input by the time they say their first word.
Often, children are unable to reason even at a very basic level until they have been exposed to more than 100TB of sensory input. GPT-3, by contrast, was trained on a corpus of around 570GB worth of text.
We are simply orders of magnitude away from being able to make a meaningful comparison between GPT-3 and humans and determine conclusively that our 'intelligence' is of a different category to the 'intelligence' displayed by GPT-3.
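For anyone who wants to check the arithmetic, here is a rough sketch in Python. It takes the 11 million bits per second figure and the 570GB corpus size quoted above at face value. The decimal gigabytes, the 30-day month, and the 12 waking hours at six months are assumptions made here to reproduce the ~60GB figure, not numbers from a source.

```python
# Back-of-the-envelope check of the sensory-input estimate.
BITS_PER_SECOND = 11_000_000   # figure quoted in the comment above
BYTES_PER_GB = 10**9           # assumption: decimal gigabytes
GPT3_CORPUS_GB = 570           # figure quoted in the comment above


def daily_gb(awake_hours: float) -> float:
    """Sensory input per day, in GB, for a given number of waking hours."""
    seconds_awake = awake_hours * 3600
    bytes_per_day = BITS_PER_SECOND * seconds_awake / 8
    return bytes_per_day / BYTES_PER_GB


newborn_gb = daily_gb(8)        # 8 waking hours, as stated above
six_month_gb = daily_gb(12)     # assumption: 12 waking hours at six months
first_month_tb = newborn_gb * 30 / 1000   # assumption: 30-day month
days_to_match_gpt3 = GPT3_CORPUS_GB / newborn_gb

print(f"newborn: {newborn_gb:.1f} GB/day")
print(f"six months: {six_month_gb:.1f} GB/day")
print(f"first month: {first_month_tb:.2f} TB")
print(f"days of newborn input to match the GPT-3 corpus: {days_to_match_gpt3:.1f}")
```

Under these assumptions a newborn takes in about 39.6GB per day and passes the size of the GPT-3 text corpus in roughly two weeks. This compares raw sensory bandwidth against curated text, so it says nothing about how much of that input is actually usable.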
- I disagree that inputs to an LLM as a sequence of encoded tokens constitute "a sense" or "percepts". If inputs are not related to any external reality, I don't consider those to be perception, any more than any numpy array I feed to any function is a "percept".
- I think you're begging the question by trying to start with a person and strip down their perceptual universe. I think that comes with a bunch of unstated structural assumptions which just aren't true for LLMs. I think space/distance/directionality aren't necessary for knowing some things (but bags, chocolate and popcorn, as lsy raised at the root of this tree, probably require notions of space). I can imagine a knowing agent whose senses are temperature and chemosensors, and whose action space is related to manipulating chemical reactions, perhaps. But I think action, causality and time are important for knowing almost anything related to agenthood, and these are structurally absent in ChatGPT IIUC. The RLHF loop used for Instruct/ChatGPT is a bandit setup. The "episodes" it's playing over are just single prompt-response opportunities. It is _not_ considering "If I say X, the human is likely to respond Y, and I can then say Z for a high reward". Though we interact with ChatGPT through a sequence of messages, it doesn't even know what it just said; my understanding is the system has to re-feed the preceding conversation as part of the prompt. In part, this is architecturally handy, in that every request can be answered by whichever instance the load-balancer picks. You're likely not talking to the same instance, so it's good that it doesn't have to reason about or model state.
But I actually think both of these are avenues towards agents which might actually have a kind of ToM. If you bundled the transformer model inside a kind of RNN, where it could preserve hidden state across the sequence of a conversation, and if you trained the RLHF on long conversations of the right sort, it would be pushed to develop some model of the person it's talking to, and the causes between its responses and the human responses. It still wouldn't know what a bag is, but it could better know what conversation is.
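To make the statelessness point concrete, here is a minimal sketch of a chat loop in which the model keeps no memory and the caller re-sends the whole conversation on every turn. This is an illustration of the idea as I understand it, not how ChatGPT is actually implemented. `build_prompt`, `chat_turn`, and `echo_model` are hypothetical names, and `echo_model` is a stand-in that just reports how much context it was handed.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def build_prompt(history: List[Turn], user_message: str) -> str:
    """Flatten all prior turns plus the new message into one prompt string."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)


def chat_turn(
    model: Callable[[str], str], history: List[Turn], user_message: str
) -> Tuple[str, List[Turn]]:
    """One stateless request: the model sees only what is in the prompt."""
    reply = model(build_prompt(history, user_message))
    new_history = history + [("User", user_message), ("Assistant", reply)]
    return reply, new_history


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model; it has no memory."""
    return f"I was given {len(prompt.splitlines())} lines of context."


history: List[Turn] = []
reply1, history = chat_turn(echo_model, history, "Hi")
reply2, history = chat_turn(echo_model, history, "What did you just say?")
print(reply1)
print(reply2)
```

Because all the state lives in the caller's `history`, any instance behind a load balancer can answer any turn. The RNN-style variant suggested above would instead carry hidden state inside the model between calls.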
If you only ever allowed a child to read about adding, without ever being able to physically experiment with putting pieces together and counting them, likely children would not be able to add either.
In fact, many teachers and schools teach children to add using blocks and physical manipulation of objects, not by giving countless examples and documents discussing addition and procedures of addition.
You may feel it's conclusive, and it's your right to think that. I am not sure.
But I think you are right in what you are saying. Basically, saying that it doesn't 'see' math as a child does is just another way of saying that it doesn't understand math. It doesn't have an intuitive understanding of numbers. It also can't really experiment. What would experimenting mean in this context? Just more training cycles. This being math, one could have it run random sums and give it the correct answer each time. That's one way to experiment, but that wouldn't solve the issue. At some point it would reach its capacity for absorbing the statistical correlations needed to deal with large enough numbers. It would need more neurons to progress beyond that stage.
Btw. I found this relevant article: https://bdtechtalks.com/2022/06/27/large-language-models-log...
Ultimately, my point is that although the argument is that an LLM doesn’t “know” anything, I am not sure that there is something categorically different between what we “know” and what an LLM “knows”. We have just had more training on more different types of data (and the ability to experiment for ourselves).