Scientists should use AI as a tool, not an oracle (aisnakeoil.com)
This, coupled with people acting on its predictions, is a kind of self-fulfilling prophecy.
Which is to ask: are AI safety folks building models of this pattern? :)
The problem is you can't just "use logic and reason" because simple models are not good enough. The nuance dominates, but that's why we have experts.
What's funny to me is that people will confidently argue with experts, and others will value their opinions over the experts' knowledge. But on the other hand, people tend to just take machines at face value. Maybe these aren't overlapping groups, but it does appear that way. There's a great irony in trusting a machine but not the people who built said machine.
First you need to ask if AI "safety folks" actually understand the technology, and if they are thinking about it objectively. If they believe that we're a few years away from accidentally creating Skynet, they need to put down the crack pipe and go work in another field.
"facially impossible" ... does that really riff on "on the face of it", or is it farcically misspelt?
Garbage in, garbage out 8)
> Also, ML code tends to vastly more complex and less standardized than traditional statistical modeling.
I mean, hey, it's proof that the text isn't AI-generated, since ChatGPT is better at English than that, but it makes it hard to read and I'm not going to buy their book if it's going to be full of errors like that.
Thus I think the biggest success of AI will be in the arts, where imprecision is not fatal and hallucinations turn into entertainment instead of "truths".
You can see an example of this in the use of LLMs to generate code. In that case, there's a whole SDLC pipeline designed to detect errors: type systems, language compilers and runtimes, tests of various kinds, QA, user feedback, etc. We don't just trust confident software developers to produce correct code.
Even a life-critical function like medical imaging - where imprecision can be fatal - can potentially benefit from this, where AI is used in conjunction with human review. It mainly requires development of some standards of practice - unlike with an average user blindly trusting the output of a model, radiologists would need training on how to use the models in question.
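A minimal sketch of the "don't just trust confident code" idea in Python: model-suggested code is treated as untrusted until it passes known test cases. The function names and the leap-year example are hypothetical, chosen because the buggy version looks plausible at a glance; a real pipeline would layer type checks, a test framework, and human review on top of anything this simple.

```python
from typing import Callable


def passes_tests(candidate: Callable[[int], bool], cases: dict[int, bool]) -> bool:
    """Accept a candidate implementation only if every known case matches."""
    return all(candidate(year) == expected for year, expected in cases.items())


# Hypothetical model-suggested code: plausible, but ignores the century rules.
def suggested_is_leap(year: int) -> bool:
    return year % 4 == 0


# Human-reviewed version following the Gregorian calendar rules.
def reviewed_is_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Known answers: 1900 is not a leap year, 2000 is.
LEAP_CASES = {1900: False, 2000: True, 2023: False, 2024: True}

if __name__ == "__main__":
    print(passes_tests(suggested_is_leap, LEAP_CASES))  # False: wrong for 1900
    print(passes_tests(reviewed_is_leap, LEAP_CASES))   # True
```

The check does not care how confident the suggestion sounded; it only compares behavior against cases whose answers are already known.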
1) people confidently arguing with domain experts about topics that they have little to no experience in.
2) people valuing the opinions of arguers from 1 over experts.
There, fixed the title.
I think this is even better
Crux passage:
> Our focus here is on leakage, which is a specific form of illegitimacy that is an intrinsic property of the observational inputs of a model. This form of illegitimacy remains partly abstract, but could be further defined as follows: Let u be some random variable. We say a second random variable v is u-legitimate if v is observable to the client for the purpose of inferring u. In this case we write v ∈ legit{u}.
> A fully concrete meaning of legitimacy is built-in to any specific inference problem. The trivial legitimacy rule, going back to the first example of leakage given in Section 1, is that the target itself must never be used for inference:
> (1) y ∉ legit{y}
So ultimately this is all about bad experimental discipline re: training and test data, in an abstract way? I’ve been staring at this paper for way too long trying to figure out what exactly each “target” is and how it leaks, but I hope that engineering translation is close.
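One rough engineering reading of the quoted definition, offered as a sketch rather than the paper's formalism: treat legit{u} as the set of columns a client could actually observe at the moment it has to infer u, and always exclude u itself, per rule (1). The `Column` class, the `observable_at_prediction` flag, and the readmission column names are hypothetical; the paper's notion of "observable to the client" is more abstract than a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    # Hypothetical flag: is this value observable to the client at the
    # moment the prediction must be made?
    observable_at_prediction: bool


def legit_inputs(columns: list[Column], target: str) -> list[str]:
    """Rough stand-in for legit{target}: drop the target itself (rule (1))
    and anything not observable when the prediction is made."""
    return [
        c.name
        for c in columns
        if c.name != target and c.observable_at_prediction
    ]


# Illustrative columns for a hypothetical hospital-readmission model.
COLUMNS = [
    Column("age", True),
    Column("prior_admissions", True),
    Column("readmitted", False),           # the target: excluded by rule (1)
    Column("followup_visit_code", False),  # recorded after the outcome, so it leaks
]

if __name__ == "__main__":
    print(legit_inputs(COLUMNS, "readmitted"))  # ['age', 'prior_admissions']
```

Note that rule (1) is enforced by name even if a flag were set wrongly: a column equal to the target never makes it into the input list.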
Some even think that the end goal is actually an autonomous research agent that can make decisions about what questions to ask and why, and that's one of the true marks of AGI. That to me is insane and misses the entire point of science altogether, even once we reach that technical feasibility. We ask questions about the universe to expand our human relationship with the universe, not to just amass more research capital for the sake of it. And the fact that the AI snake oil has infected big chunks of science reveals which parts of it are just gold rush speculation and which aren't.
There's a more fundamental challenge of training scientists to understand why we ask the questions we ask. You can't just offload that to some background task and trust that it makes sense.
Our governmental research agency, and NIH as a whole, has TONS of research data that we don't have the manpower to screen and process. There are also gaps in data that AI/ML could help us simulate. AI research assistants could potentially help us process and evaluate "what questions to ask" by, for example, looking for trends in QSAR (quantitative structure-activity relationship) models for novel chemicals and helping us direct our attention to compounds of toxicological interest.
We've also been trying to use the AI research assistants to speed up the process of evaluating the scientific literature for toxicologists who have to make regulatory decisions. Our agency has a backlog of chemicals that we would love to evaluate, but lacks the manpower to do so.
No profit motive or much "clout" interest, at least that I've seen. Just a lot of public servant scientists who need some extra help protecting the public.
What worries me the most about these AI solutions, however, is their use in the public sector. They can certainly be useful helpers: they can scan images for cancer and, if added to existing processes involving humans, often lead to better results. They can’t replace any existing methods, however, as we learned here in Denmark a few years ago. Unfortunately, that lesson hasn’t been learned across the public sector. I think medicine and healthcare learned it, but right now we’re replacing actual human controls, audits, and sometimes decision making with AI or an unwarranted trust in AI results. That is going to lead to some really terrible results, considering how bad things like LLMs often are at being lucky even in “common knowledge” situations. The problem is compounded by the fact that some of the work they’re tasked with isn’t as black-and-white as writing code. We use AI tools in our daily work, and they are OK, but as anyone who’s used them for programming probably knows by now, they aren’t exactly great at being lucky. Sometimes they’ll hallucinate solutions that simply do not exist.
This is how they work, and as I said earlier, AIs can be great enhancers. They aren’t replacements, though, and if we start treating them like they are, which is very tempting from a change-management and benefit-realisation perspective, we’re just going to get in trouble. This is unfortunately exactly what we’re doing, and why wouldn’t we? Most western public sectors have functioned on at least some form of new public management for two decades by now, sometimes longer. As a result, the entire systemic culture is geared toward efficiency and cost reduction, even when it doesn’t really result in either efficiency or cost reduction from a broader perspective.
Now, if scientists are on board, then what hope does a public bureaucracy have?
Not something I'd trust as a "source of truth". Maybe a neat idea generator. And some of the deep learning algorithms can identify patterns that humans might miss -- patterns that could reveal useful insight. But they're not doing the knowledge work.
What do you even mean by "intelligence" when you say a probabilistic syllable generator "is not intelligence"?
I don't have answers; just the lingering question "why are we building this?"
Unfortunately, from an outsider's perspective, this looks widely and generically useful. If you don't understand your tools, you're going to misuse them, and this hype cycle is the result.
T in AI stands for tool.
Is it because it's actually shit, but it's the best thing we've seen yet and everyone is just in denial?
Most normal people look at AI like ChatGPT as an amazing tool and have used it effectively as a replacement for Google, Grammarly etc. And for them it's fine because any mistakes are localised to them.
The problem is those building products on LLMs (e.g. legal, customer service) who are knowingly misrepresenting what they can do to companies who don't know any better. And I would argue this is fraudulent and where we will see most of the problems.
So usually it ends up as overfitting, but it's more about having information at training time that it shouldn't have.
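A common concrete case of "information at training time that it shouldn't have" is fitting a preprocessing step on the whole dataset before splitting off the test set. The sketch below uses only the Python standard library and made-up toy numbers; the helper names are hypothetical.

```python
import statistics


def fit_scaler(values: list[float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) for standardizing."""
    return statistics.mean(values), statistics.pstdev(values)


def standardize(values: list[float], mean: float, stdev: float) -> list[float]:
    return [(v - mean) / stdev for v in values]


# Toy data; the last point is held out as the test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]

# Leaky: the scaler is fit on everything, so the test point shapes the statistics.
leaky_mean, leaky_std = fit_scaler(data)
# Clean: the scaler is fit on the training split only.
clean_mean, clean_std = fit_scaler(train)

if __name__ == "__main__":
    print(standardize(test, leaky_mean, leaky_std))  # about [2.0]: outlier looks ordinary
    print(standardize(test, clean_mean, clean_std))  # about [87.2]: clearly out of range
```

Nothing here is overfit in the usual sense; the evaluation is distorted simply because the test point was allowed to influence a training-time computation.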
My partner recently went a bit nuts writing an article with the help of GPT4. She was very proud of how productive she'd been until I asked if she'd actually searched for the papers GPT4 had referred to.
Of course, many of the referred to papers didn't exist...
Traditional pre-spam-overload Google results could often give a high quality answer, or if not, you’d at least get the sense of the low quality. Not so with LLMs.
Also are you suggesting people fact check an AI by asking it if it is correct? That seems absurd.
It's only a mistake from your perspective. The model just generates text based on the probabilities it learned during training. In that respect, there is no such thing as "incorrect" output because the model doesn't operate at that level of abstraction.
I asked it "what is the PS3 game where the full version of To Kill a Mockingbird is in there?" and it responded with "The Saboteur", when the correct answer would have been "The Darkness". That is incorrect by most definitions of the word; whether or not it's a consequence of the training model doesn't really change that.
I suppose we could get into details about epistemology and ontology about the nature of what an answer "is", but I think it's fair to say that "incorrect" is when it gives you something that doesn't accomplish the task you asked it to do, or rather when it tries to accomplish the task but what it gives you doesn't work.
If the context is that it's a tool, correct is defined as reality within the context of the use of that tool. If it's to find facts, it can be incorrect, since the context of a fact is reality. If it's writing a story, then "correct" would be based on continuity, etc.
If you're using it as a tool to generate words related to previous ones, then sure, it's always correct, but that's probably not a useful tool for most people. But being a next-word predictor doesn't mean it can't also be a useful tool in real-world contexts. There are, literally, billions of dollars being spent on pushing them to be more "correct" in more contexts, so it's a useful concept to consider, even though they're "just" next-word predictors.
I’m struggling to think of any comparable technology where the regular median user’s understanding is both fundamentally wrong and purposefully misinformed.
I’ve said from the outset that in academic settings you should be able to cite an AI as a writing assistant, it would clear up a lot of the confusion about its use. If you used it poorly it’s still on you, but at least there’s some transparency by which to judge the work.
It is a bit tricky though, there are definitely points that come up with GPT that people would not think of normally. So in that sense it is still distinguishable from writing solely by oneself, but I would argue the GPT-assisted essays are just better writing and more well-rounded.
Ask anyone below 30 if they can write cursive today, or know their times tables hehe. Ask them if they can derive a formula instead of using Mathematica.
Or ask a developer if they know how their pixel shaders work, or what’s going on under the hood of their favorite runtime, how hash tables work, or really anything. Previous generations did. When the complexity gets too high people just trust the machines I guess.
And no one actually knows what the LLM internals are anyway.
EVERY single "fact" was perverted by either mixing with another fact, or misrepresenting by replacing a word like "good" with "superb" or "fantastic" (I guess optimistic means lie-through-your-teeth?)
YMMV, but basically I achieved nothing except a waste of about 30 minutes and an honest, personal evaluation of the limits of GPT.
There are a couple of issues I can see: people may be unaware of how much the AIs hallucinate, and there's also a real probability that people will pick and choose what they like based on what sounds correct vs what is correct.
AI is a great tool, but it's also convincingly deceiving at times, so much so that many people are totally oblivious to it.
Keep in mind that there's no difference between what happens inside a model when it "hallucinates" vs. when it generates "correct" output. It's the exact same process.
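A toy illustration of that point, under heavy simplification: a real LLM computes its probabilities with a neural network over a large vocabulary, while here they are a hand-written, made-up dictionary for a single question. The point is only that the "right" and "wrong" answers come out of the exact same sampling step.

```python
import random

# Made-up next-token probabilities for the prompt
# "How many sides does a square have?" (illustrative only).
NEXT_TOKEN_PROBS = {"four": 0.9, "three": 0.1}


def sample_answer(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one answer in proportion to its probability. Nothing in this
    step knows or checks which answer is true."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the run is reproducible
answers = [sample_answer(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]

if __name__ == "__main__":
    # Both answers come out of the identical code path; "correct" is a
    # judgment applied afterwards, from outside the sampler.
    print(answers.count("four"), answers.count("three"))
```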
I would imagine there's also a raft of people who will use it as a reason to give up on any search for truth.
I still do hold a lot of hope for their eventual capabilities, but I'm also pretty pessimistic on what the direct and Nth order social effects will be.
I was trying to use it as a research tool and it hallucinated 95% of the references I asked for (not a made up percentage, I counted)
Ironically the one real source turned out to be quite useful.
I think oil companies are the greatest existential threat to humanity via lobbying and eventually climate destruction. Second is social media companies. It’s so easy to spread misinformation against a whole populace and it’s going south fast.
There is just too much incentive for those vested in these corporations to keep going rather than stop.
This feels like a thinly veiled jab at COVID era public health recommendations. Can you be more clear about which fields you’re referring to?
When our state of knowledge of the world changes, "domain experts" have the most to lose, and our state of knowledge of the world is constantly changing.
Most domains also don't have the exactness of a programming language so are exposed to the same human processes as displayed in a middle school popularity contest.
The whole concept of the "domain expert" is really a modern superstition. An especially powerful superstition because it is the superstition of those who believe themselves beyond superstition.
I can think of domains where sensationalist opinions are lifted, but not ones where the general consensus is blatantly false. I can think of plenty of instances where large news organizations have grossly misrepresented conclusions of research.
> but who is Joe Sixpack supposed to turn to?
This, I agree with. It is why I actively voice dissent, as an expert and in areas where I have domain expertise, against so-called science communicators (not all are "so-called") and when the news gets it wrong.
Hell, I'll do this when actual science communicators get it wrong. Like when Neil deGrasse Tyson is being dumb[0]. He also thinks hydrogen bombs don't have fallout...[1]. They do...
That said, I still don't think this is a reason to distrust scientists. But I think it is important for scientists to speak out when communicators get it wrong. I think this is a common problem and allows the conmen to gain power. But that's not the only force at play. Truth is complex. Approximate truth is bounded in complexity. But lies can be infinitely simple. So we get it wrong when we "reason our way through" something, because typically the base assumptions are wrong. This makes many conmen truly believe the lies that they are selling.
Joe Sixpack can reason through that. But Joe Sixpack can also reason through the concept that if he was easily able to reason through something and that experts disagree, it's pretty likely there's a reason why other than them being dumb and <Joe Sixpack> knowing better. Can, but doesn't. And we as the public let that happen. This may seem like an insurmountable problem, but instead it is a problem which just needs sufficient effort. Momentum builds, so the more people that push against this, the more common it'll become. And to be clear, it is perfectly fine to question experts. It is not perfectly fine to confidently disagree while not actually understanding the topic. If you don't know the difference, read a few papers/works in the topic and see if you can understand 90+% of it (if it is CS or Engineering, see if you can replicate).
This makes you consistent. I have no problem with this.
> I didn’t have to read the Harry Potter fanfic to know ad hominems are poor arguments
I mostly agree. But my point is that Eliezer Yudkowsky doesn't actually have the qualifications. I want to be clear that academic degrees aren't necessary to qualify someone, just like they aren't necessary to qualify someone as a good programmer. But it is generally harder and the foundation is shakier. In this case, most of his arguments are founded on incorrect assumptions. They are often logical, but it doesn't matter if something is logical if the premise is incorrect.
LLMs remove that ability to be discerning about what to trust.
More broadly, the meaning and usage of specific words are important for these products because they shape how people perceive their utility.
If a thing isn’t “correct” because it has no sense of understanding, and therefore is only “correct” due to projection by the user, then that’s a super important distinction.
I'm not saying there aren't a few possible mistakes that do become less likely if we switch to "curve fitting", but I suspect that it does not matter much either way for the most serious mistakes.
Let's say a company creates an automated system based on a curve fitting algorithm. Then things go wrong. Now it is quite easy to say the company is responsible for any damage and must pay for the rectification.
When we say an AI is deployed and things go wrong, we have a sci-fi movie and responsibility is somehow magically moved away from the company that deployed the algorithm.
To me it feels like "AI" as a term is a clever marketing device that companies will use to deflect responsibility. And I think it is one of the reasons why OpenAI, Musk, and others pushed this AI safety nonsense.
The aim of calling it "curve fitting" or something similar would be to take the magic out of it so the broader public doesn't get confused. I think that's worthwhile.
Surely when google gives bad results, it's "the same process" as when it gives good results. And when a book gives wrong information, it's the exact same kind of ink as correct information.
I was just trying to explain to the guy above what I think the guy above that meant.
In general, I think it's a good example of the kind of social problem tech can make a lot worse but no better: when a society has lost its grasp of epistemology, multiplying the amount of information available, at a net decrease in quality and reliability, merely multiplies the scope of the problem.
Since we do not possess a definition of intelligence that isn't human-like, it would be meaningless to argue if LLMs are intelligent in general. All that can be said is that they are not intelligent in the way that humans are.
For example: if one asks a question using street slang, the generated answer will draw on training data about your subject, but from online sources that used street slang in their conversations about it. Likewise, if you use ordinary language for your question, the generated response will draw on ordinary-language conversations about your topic. However, if your question concerns any type of formalized knowledge and you ask it using the formal language of experts in that topic, then the generated answer will come from training data that used the same formal expert terms, and it is most likely to be correct, because that data comes from discussions among the subject's experts.
Plus, don't use LLMs for fact retrieval, use them as strategy guides. They really excel as strategy advisors.
In my rush to be a smartass, I did miss that you also asked what definition of intelligence they were working with though, so I suppose I didn't really add anything besides unnecessary snark (-‿-")
You believe you "asked it to do something," but that's just you anthropomorphizing the model and your interaction with it. Of course the AI companies encourage that perspective, but it's a factually dubious one at best.
Judging whether a model's output is "correct" involves you imposing an external context on both the prompt and response that the model typically doesn't have access to. It also typically has no ability to test its responses.
This is part of why good prompt engineering can be so important - because what you get out is a function of what you put in, and pretending that the model is a question-answering oracle only takes you so far.
Of course what the AI companies are trying to do is train and prompt the models in such a way that their output is considered "correct" from a user's perspective more often than not. In an interaction with an AI company's salespeople, you might argue about "correctness". But that's not going to help understand what's actually going on.
I passed it input in the serialized form known as "English text". I expected a response also in serialized English that I can then decode in my brain to something that comports with reality. If I requested from a web server some JSON giving me my bank balance, and the balance it gave me is not accurately reflecting reality, it's not anthropomorphizing anything to say that it's incorrect, any more than pinging Nginx is.
And to be clear, we can wax philosophical all we want about "correctness", but that's really sidestepping the point: I don't care why it's giving me wrong information.
In my bank example, does it really matter to the end user whether it's because of some integer overflow error, a null pointer, or just a special `if` statement saying that antonvs's account should always print out a different number for your balance?
I think nearly everyone would say that that's incorrect, and it actually wouldn't be clever or insightful for someone to say "no that's just a result of how the computer was programmed! You're imposing a human understanding of correctness on your bank balance!"
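For the bank case, "incorrect" can be made mechanically checkable. Below is a hypothetical integrity check in Python that compares a reported balance with the sum of ledger entries; the JSON shape, the `balance` field, and the ledger format are assumptions for illustration, not any real bank's API.

```python
import json
from decimal import Decimal


def balance_is_consistent(response_body: str, ledger: list[str]) -> bool:
    """Check a reported balance against the sum of ledger entries.

    Assumes a hypothetical response shape {"balance": "<decimal string>"}.
    Decimal avoids binary floating-point rounding on money amounts.
    """
    reported = Decimal(json.loads(response_body)["balance"])
    expected = sum((Decimal(entry) for entry in ledger), Decimal("0"))
    return reported == expected


LEDGER = ["100.00", "50.25", "-20.00"]

if __name__ == "__main__":
    print(balance_is_consistent('{"balance": "130.25"}', LEDGER))  # True
    print(balance_is_consistent('{"balance": "999.99"}', LEDGER))  # False
```

A check like this works regardless of why the number is wrong (overflow, null pointer, or a rogue `if`), because the expected answer is fully determined by data the system already holds.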
Exactly, you expected it, but that doesn't change what's actually happening. The model doesn't know what you expect. It can't read your mind. The best it can do is infer some things, such as that English input should produce English output - and the models are indeed pretty good at that!
> to something that comports with reality.
This is a rather unrealistic expectation in general, when you examine it. You raised a good example with which to do that, though:
> it actually wouldn't be clever or insightful for someone to say "no that's just a result of how the computer was programmed! You're imposing a human understanding of correctness on your bank balance!"
You're right, it wouldn't, because that's a very different situation which helps illustrate the point. The code for the bank app has been written to match your notion of correctness. That's only possible because it has a narrowly defined, specific purpose. It has all the necessary information needed to produce a correct response. The acceptance criteria are clear, including validation and integrity checks on the response. As a result, your expectations should be satisfied, and if they aren't, it makes sense to say that the bank app is not correct.
None of that applies to the AI models we're discussing. An LLM or image model doesn't have a narrowly defined, specific purpose. It can't possibly have access to all the information it needs to "answer" any possible "question" "correctly". It can't possibly have access to acceptance criteria specific to a question unless they're provided explicitly and in detail as part of a prompt - again, underscoring the importance of prompt engineering. And its ability to validate responses - check whether they "comport with reality" - is very limited, at least currently.
An example that's closer to the situation with an AI model would be a tool like a hammer. If you hold a hammer by its head and try to hammer in a nail with its handle, is the hammer "incorrect" when it fails at the task you have "asked" it to do?
> I don't care why it's giving me wrong information.
Just as with the hammer, if you want to be able to use these tools effectively, you should care why.
> I suppose we could get into details about epistemology and ontology about the nature of what an answer "is"
The machine has no concept of an 'answer'; when people call these things autocomplete on steroids, they're not really being that inaccurate.
Literally no one here disputes that it’s a glorified autocomplete, but that is completely irrelevant to whether it correctly answered a question.
I find this kind of pedantry extremely annoying because it sounds insightful without actually saying anything. Like, no shit, it’s just doing what its algorithm dictates, no one, and I mean no one disputed that. The question of correctness and incorrectness falls into “how accurately did it respond to my query?”
It's a not so thinly veiled critique of Eliezer Yudkowsky.
> Maybe you are distinguishing between doomers vs broader AI Safety proponents,
I do. These are different classes of people. But many doomers masquerade as AI Safety proponents, just as many conmen masquerade as ML/AI researchers. I suspect distinguishing the groups is quite difficult for those without domain expertise.
> most people on the CAIS letter (https://www.safe.ai/work/statement-on-ai-risk)
I don't care about the opinion of most of these people (there are some I VERY much do), nor do I think this is a meaningful letter.
Interfacing with a model does not endow one with any level of expertise. If this were true, the whole thread would be ill-founded because people using GPT are interfacing with it. Instead, one needs to actually deeply study these models. There are things we know about them, and quite a lot. The term "blackbox" gets thrown around a lot, but that doesn't make everyone's expertise on the matter equally valid. In fact, the more complex something is to understand, the fewer people are qualified to have a reasonable opinion on the matter. My complaint is we often act as if the opposite is true.[0]
My second big problem with the CAIS letter is it means nothing. All it says is "I don't want to kill all humans." This is a fairly universally agreed upon statement and is in fact the default statement. It does not say anything about the potential risk. That's a completely different matter.
Worse, many of the people who have signed this are literally at the helm of the ships steering us into a dystopian future (which is not covered by this toothless letter). So I'm not sure what meaning this is supposed to have other than pageantry. Do not forget that these are the same exact people pushing and promoting abuse of these tools. I do not blame Average Joe for thinking that GPT is equivalent to Google (which itself cannot be trusted at face value, but this does not make it a useless tool) when that is often the way that it is promoted/advertised. So if you are concerned, I wouldn't use this as evidence.
[0] There's an added problem that you can become above average in any given subject relatively quickly. This is a double-edged sword because knowledge is valuable, but it often results in one being overconfident. And the learning difficulty grows exponentially, which is why there are so few experts in any given subject matter: expertise is understanding nuance and complexity. The great irony of the doomers is that they fall back on "unknown unknowns" while not putting effort towards putting a bound on them.
Let’s look at regular intelligence. Stuff in human brains.
If I ask a human a question like “how many sides does a square have?” and that human says “three”, that’s an objectively wrong answer, and I would say that that person is “wrong”.
The human brain also does not have a narrowly defined specialized purpose and can do a lot of things and yet we are perfectly ok still saying “true” or “false” to the truth values being asserted.
A truth proposition can have a “correct” or “incorrect” value. If a human gives me the wrong bank balance because they misremembered it, it’s still wrong regardless of the actual intelligence associated with it.
So again, I don’t think this kind of pedantry is actually useful because you’re not actually saying anything; you’re saying “these models aren’t trained on being correct, they’re on word patterns”, which was never disputed, but that is orthogonal to the truth value.