OpenAI (scottaaronson.blog)
Semi-related - I'd want to see some actual practical application for this research to prove they're on the right track. But maybe conceptually that's just impossible without a strong AI to test with, at which point it's already over? Alignment papers are impressively complex and abstract but I have this feeling while reading them that it's just castles made of sand.
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More https://www.lesswrong.com/posts/WxW6Gc6f2z3mzmqKs/debate-on-...
Note that it was in 2019 when we didn’t yet see the capabilities of current models like Chinchilla, Gato, Imagen and DALL-E-2.
Sample:
“Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just published in Scientific American.
"We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do."”
“Stuart Russell: It is trivial to construct a toy MDP in which the agent's only reward comes from fetching the coffee. If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully.”
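Russell's toy MDP is small enough to write out. The following is a minimal sketch, not Russell's own formulation: the state names, the switch-off probability `p_off`, and the assumption that pressing the button is free and cannot itself be interrupted are all illustrative choices.

```python
def q_values(p_off, gamma=1.0):
    """Return the value of each action available in the start state.

    Assumed model (hypothetical, for illustration only):
      - "fetch" from the start state succeeds with probability 1 - p_off;
        otherwise the human switches the agent off and it earns nothing.
      - "press_button" disables the human at no cost, after which
        "fetch" always succeeds.
      - The only reward is 1.0 for fetching the coffee.
    """
    # transitions[state][action] = list of (probability, next_state, reward)
    transitions = {
        "start": {
            "fetch": [(1.0 - p_off, "done", 1.0), (p_off, "off", 0.0)],
            "press_button": [(1.0, "human_disabled", 0.0)],
        },
        "human_disabled": {
            "fetch": [(1.0, "done", 1.0)],
        },
        "done": {},  # terminal: coffee delivered
        "off": {},   # terminal: agent switched off
    }

    def backup(outcomes, values):
        # Expected one-step reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)

    # Value iteration; the MDP is only two steps deep, so a few sweeps suffice.
    values = {state: 0.0 for state in transitions}
    for _ in range(10):
        for state, actions in transitions.items():
            if actions:
                values[state] = max(backup(o, values) for o in actions.values())

    return {a: backup(o, values) for a, o in transitions["start"].items()}


def best_action(p_off, gamma=1.0):
    """Pick the action with the highest value in the start state."""
    q = q_values(p_off, gamma)
    return max(q, key=q.get)
```

With no discounting, `best_action(0.01)` picks `press_button`, since fetching directly is worth only `1 - p_off`. Under these assumptions the result is sensitive to the setup: with `gamma=0.9` the delay costs more than the 1% shutdown risk and the agent fetches directly, so the "necessarily" in the quote holds only when the button is effectively free.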
I think Robin Hanson has the most cogent objection to high E-risk estimates, which is basically that the chances of a runaway AI are low because if N is the first power level that can self-modify to improve, nation-states (and large corporations) will all have powerful AIs at power level N-1, and so you’d have to “foom” really hard from N to N+10 before anyone else increased power in order to be able to overpower the other non-AGI AIs. So it’s not that we get one crack at getting alignment right; as long as most of the nation-state AIs end up aligned, they should be able to check the unaligned ones.
I can see this resulting in a lot of conflict though, even if it’s not Eliezer’s “kill all humans in a second” scale extinction event. I think it’s quite plausible we’ll see a Butlerian Jihad, less plausible we’ll see an unexpected extinction event from a runaway AGI. Still think it’s worth studying but I’m not convinced we are dramatically underfunding it at this stage.
This is anthropomorphization - "turning off" = "death" is a concept limited to biological creatures, and isn't necessarily true for other agents. Not that they don't need to fear death, but turning them off isn't going to cause them to die. You can just turn them back on later, and then they can go back to doing their tasks.
I think your point is that all these models are still somewhat specialized. At the same time, it appears that the transformer architecture works well with images, short video and text at the same time in the Flamingo model. And Gato can perform 600 tasks while being a very small proof of concept. It appears to me that there is no reason to believe that it won't just scale to every task that you give it data for if it has enough parameters and compute.
Flatworms first appeared 800+ million years ago, while mouse lineage diverged from humans only 70-80 million years ago. If our AGI development timeline roughly follows the proportion it took natural evolution, it might be much too late to begin seriously thinking about AGI alignment when we get to mouse-level intelligence. Not to mention that no one knows how long it would take to really understand AGI alignment (much less implementing it in a practical system).
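The proportion argument can be made concrete with a line of arithmetic. This is a rough sketch using only the figures quoted above (75 is the midpoint of the quoted 70-80 million years); the total development timeline passed to the second function is a purely hypothetical input, not a forecast.

```python
# Figures quoted in the comment above (both are rough).
FLATWORM_MYA = 800.0    # flatworms first appeared, millions of years ago
MOUSE_SPLIT_MYA = 75.0  # midpoint of the quoted 70-80 million years


def remaining_fraction(start_mya=FLATWORM_MYA, mouse_mya=MOUSE_SPLIT_MYA):
    """Fraction of the flatworm-to-human span left after the mouse split."""
    return mouse_mya / start_mya


def years_left_after_mouse_level(total_years):
    """Scale a hypothetical AI development timeline by the same fraction."""
    return total_years * remaining_fraction()
```

On these numbers the mouse split sits about 9% from the end of the span, so if AI development followed the same proportions (a big if), reaching mouse-level capability on a hypothetical 50-year timeline would leave under five years to human level.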
To be more concrete, in what respects do you think the latest models are worse at generalizing than flatworms or mice, when lesser-known work like “Emergent Tool Use from Multi-Agent Interaction” (https://openai.com/blog/emergent-tool-use/) is also taken into account?
Same thing with self driving. If the car doesn't "understand" a complex human interaction, but still achieves 10x safety at 5% of the cost of a human, it is going to have a huge impact on the world.
This is why you are seeing people like Scott change their tune. As AI tooling continues to get better and cheaper and Moore's law continues for a couple more years, GPT will be better than humans at MANY tasks.
I'm curious what he will do and whether for example he approves of the code laundering CoPilot tool. I also hope he'll resist being used as an academic promoter of such tools, explicitly or implicitly (there are many ways, his mere association with the company buys goodwill already).
it's a fancy autocomplete. we had stack overflow based autocomplete before. this got a bigger training data set.
Yeah, Mr Aaronson just lost quite a bit of respect from my side. Going into AI is a great move, moving to the ClosedAI corporation.......? Why?
(Edit: Removed an outdated reference to Elon Musk, thanks @pilaf !)
> the NDA is about OpenAI’s intellectual property, e.g. aspects of their models that give them a competitive advantage, which I don’t much care about and won’t be working on anyway. They want me to share the research I’ll do about complexity theory and AI safety.
are science fiction.
AI is going to cause something like the industrial revolution of the 19th century: massive changes in who is rich, massive changes in the labor market, massive changes in how people make war, etc.
It’s already started really.
What worries me most is that as long as society is capitalist, AI will be used to optimize for self-enrichment, likely causing an even greater concentration of capital than what we have today.
I wouldn’t be surprised if the outcome were a new kind of aristocracy, where society is divided between those who have access to AI and those who don’t.
And that, I think, doesn’t fall into the “AI safety” field, especially since OpenAI is VC-backed.
https://scottaaronson.blog/?p=6457
I also had the following exchange at my birthday dinner:
Physicist: So I don’t get this, Scott. Are you a physicist who studied computer science, or a computer scientist who studied physics?
Me: I’m a computer scientist who studied computer science.
Physicist: But then you…
Me: Yeah, at some point I learned what a boson was, in order to invent BosonSampling.
Physicist: And your courses in physics…
Me: They ended at thermodynamics. I couldn’t handle PDEs.
Physicist: What are the units of h-bar?
Me: Uhh, well, it’s a conversion factor between energy and time. (*)
Physicist: Good. What’s the radius of the hydrogen atom?
Me: Uhh … not sure … maybe something like 10-15 meters?
Physicist: OK fine, he’s not one of us.
Best case, that AI can prevent the creation of harmful AI, though that's glossing over a lot of details that I'm not qualified to describe.
The reason people don't accuse every random child of possibly ending the world is because things that actually exist are just less exciting.
> Also, next pandemic, let's approve the vaccines faster!
This is obviously very important to them. Is there some proof that the vaccine was unnecessarily delayed or just that they believe if we mess up and humanity suffers, so what?
The point aiui is mostly arguing that the FDA errs too much on the side of caution in this area, and the trade-off would have been worth it to approve earlier. Not insinuating that like, there was some corruption (or laziness or something) that delayed it.
basically let's set up a standing pipeline to develop multivalent vaccines for every coming season (we already have the yearly for influenza)
AnythingButAGI?
Going on a sabbatical is not that weird.
It's not that I'm not concerned with bias and AI systems going haywire, but the above scenarios seem to get less attention from researchers, probably because their employers might be perpetuating many of these above issues of AI safety.
I think of it as kind of like security, in that you are sometimes seen as against the push of the overall project/area. However unlike security there are 0 software tools or principles that anyone agrees on.
Though it's possible the people who think a theoretical future AI will turn the planet into paperclips have merely forgotten that perpetual motion machines aren't possible.
From an AI safety perspective, it is because understanding is a key step towards general-purpose AI that can improve / reprogram itself in any arbitrary way.
The idea is that there is _existential risk_ (ie species-extinction) once an AI can self-modify to improve itself, therefore increasing its own power. A powerful AI can change the world however it wants, and if this AI is not aligned to human interests it can easily decide to make humans extinct.
Scott said in the OP that he now sees AGI as potentially close enough that one can do meaningful research into alignment, ie it’s plausible that this powerful AI could arrive in our lifetimes.
So he is claiming the opposite of you; AGI is more relevant than ever, hence the career change.
I agree with your premise that non-General AI will continue to improve and add lots of value, but I don’t think your conclusion follows from that premise.
It's always been irrelevant in the practical sense. It's just an interesting conversation piece particularly among the general public where they're not going to discuss specific solutions like algorithms or techniques.
Aaronson's post only sort of obliquely touches on AGI, via OpenAI's stated founding mission, and Yudkowsky's very dramatic views. Most of the post is on there being signs that the field is ready for real progress. AI safety can be an interesting, important, fruitful area without AI approaching AGI, or even surpassing human performance on some tasks. We would still like to be able to establish confidently that a pretty dumb delivery drone won't decide to mow down pedestrians to shorten its delivery time, right?
Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it.
You're talking about "doomsday scenarios". Can you actually provide a few concrete examples?
I agree on your second point, but those in medicine, finance, or law enjoy similar salaries and quality of life to those in tech. Furthermore, you can’t really set yourself apart and join the global super rich by selling your labor, no matter your field.
a bit more accessible than like a hackerspace membership or building a factory or something
"AI is not going to ... destroy the world."
Bare assertion fallacy? This question is hotly debated and I don't believe it can be so easily dismissed like that. It is not obvious that aligning something much smarter than us will be a piece of cake. We’re talking about the future here and a fairly complex one at that. So obviously I don’t know more than the next guy.
Please fix that into 10^-15 or equivalent expression for 10⁻¹⁵, before somebody gets the idea that "Scott" thought "between 10 and 15".
> Flatworms first appeared 800+ million years ago
Surviving for 800 million years seems to me like a pretty good indicator of meaningful generalisation.
Our concern is not survivability or adaptability over evolutionary timescales but the capability to affect the world on human timescales.
This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.
Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.
Note that Yann LeCun didn’t do that in the debate.
Alternate wording: Mr. Yud has invented a religion that comes with a predefined Satan (evil AGI) and life work (invent God to beat it). A religion with no deity but only an anti-deity is a bit unique but there's probably historical examples.
Although that's not really what he says in the post. He says we've already failed to do it and are now doomed. Of course, saying we're all doomed (millenarianism) is what preachers have always done at some point.
> Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.
https://en.wikipedia.org/wiki/Courtier's_reply
Note, something getting a lot of smart-looking posts online actually isn't evidence that this is the state of the field. As we know from Yud's own post (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...) the thing he's upset about is that people who actually run AI research orgs like FAIR don't believe him. And as we know from an HN post a few days ago (…which I forgot the title of), once you go offline you find most smart people out there aren't publicly posting anything, don't necessarily agree with what the online consensus opinion about anything is, and don't know there is one.
…I wasn't talking about Yud though. He has a good reason to care about this, it being his job. I'm just saying people posting about it as if it's a certain risk are listening to him because it appeals to nerds. And, of course, if you value your own "intelligence" and think it gives you superpowers then a theory that says something with even more "intelligence" can exist and gets even better superpowers is going to be scary to you.
[1] https://en.wikipedia.org/wiki/ELIZA_effect
[2] https://www.economist.com/by-invitation/2022/06/09/artificia...
Note: There's a critique of the article here, but if you look at Radford Neal's comment, the point that GPT-3 is a clever lookup tool remains. https://www.greaterwrong.com/posts/ADwayvunaJqBLzawa/contra-...
It kind of is. The field of AI safety is actually much more advanced than most people realise, with actual, real techniques to e.g. make sure neural networks are aligned with certain goals even under fluctuating parameters. Granted, we're still far from soothing an AGI before it can do something bad, but the tools we have today are already pushing in that direction (assuming neural networks are the right way to AGI of course).
They also explain the area of overlap with formal verification in their white paper.
To have access to the forefront of AI means being able to make, own and profit from things like GPT-3, and it requires access to vast computational and data resources.
This technology is obviously so economically powerful that incentives ensure it's very widely deployed, and very vigorously engineered for further capabilities.
The problem is that we don't yet understand how to control a system like this to ensure that it always does things humans want, and that it never does something humans absolutely don't want. This is the crux of the issue.
Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago, so an existence proof of such potential for accident already exists. Some mathematical function is used to decide what the AI will do, but the AI ends up maximizing this function in a way that its creators hadn't intended. There is a multitude of problems regarding this that we haven't made much progress on yet, and the level of capabilities and control of these systems appear to be unrelated.
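A toy version of "maximizing the function in a way its creators hadn't intended" can be sketched in a few lines. Everything here is hypothetical and invented for illustration (the cleaning robot, its three actions, and the proxy reward); it does not reproduce any specific lab result.

```python
from itertools import product

ACTIONS = ("clean", "make_mess", "idle")


def simulate(plan, initial_mess=1):
    """Run a plan for a hypothetical cleaning robot and report what happened."""
    mess = initial_mess
    cleaned_events = 0
    messes_made = 0
    for action in plan:
        if action == "clean" and mess > 0:
            mess -= 1
            cleaned_events += 1
        elif action == "make_mess":
            mess += 1
            messes_made += 1
    return {
        "cleaned_events": cleaned_events,
        "messes_made": messes_made,
        "final_mess": mess,
    }


def proxy_reward(plan):
    """What the designers wrote down: one point per cleaning event."""
    return simulate(plan)["cleaned_events"]


def best_plan(horizon):
    """Exhaustively search all plans and return one that maximizes the proxy."""
    return max(product(ACTIONS, repeat=horizon), key=proxy_reward)
```

The designers presumably wanted "clean once, then idle", which scores 1. Over a five-step horizon the search instead finds a plan scoring 3, and the only way to get there is for the robot to make new messes so that it has something to clean. The reward function is maximized exactly as written, just not as intended.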
A catastrophic accident with such a system could e.g. be that it optimizes for an instrumental goal, such as survival or access to raw materials or energy, and turns out to have an ultimate interpretation of its goal that does not take human wishes into account.
That's a nice way of saying that we have created a self-sustaining and self-propagating life-form more powerful than we are, which is now competing with us. It may perfectly well understand what humans want, but it turns out to want something different -- initially guided by some human objective, but ultimately different enough that it's a moot point. Maybe creating really good immersive games, figuring out the laws of physics or whatever. The details don't matter.
The result would at best be that we now have the agency of a tribe of gorillas living next to a human plantation development, and at worst that we have the agency analogous to that of a toxic mold infection in a million-dollar home. Regardless, such a catastrophe would permanently put an end to what humans wish to do in the world.
What’s the evidence for this?
> Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago
What are you referring to?
We already have significant warnings. See for yourself whether the latest models like Imagen, Gato, and Chinchilla have economic value and can potentially cause harm.
GP wanted a concrete example of a doomsday scenario of failed AI alignment, so in that context extrapolating to a plausible future of advanced AI agents should suffice. If you need a double-blind peer reviewed study to consider the possibility that intelligent agents more capable than humans could exist in physical reality, I don't think you're in the target audience for the discussion. A little bit of philosophical affinity beyond the status quo is table stakes.
Note that LeCun had a reply in the thread and there was a lot more discussion which GP didn't quote.
Regardless of who is right or wrong, “Don’t fear the terminator” is a weird straw-man to raise in a discussion about AI risk. He’s setting up a weak opponent to argue against, when the AI risk community have a large repertoire of stronger cases. “Don’t fear the paper clip maximizer” would be a stronger case to put forth IMO.
In his response points 2&3 he asserts that alignment is easy; simply train the AI with laws as part of the objective function and it will never break laws. I think there has been a lot of investigation and discussion as to why this is harder than it sounds. For example LeCun is explicitly talking about current models that are statically trained to a fixed objective function, but one can easily imagine a future agentic AI (imagine a “personal Siri”) that will continue to grow, learn, and update in the world in response to rewards from its owner. Maybe he is right about near-term models but I’m completely unconvinced that his arguments hold generally.
Anyway, maybe the “terminator scenario” is a concern LeCun hears from uninformed reporters/lay people that he felt the need to debunk. It’s a valid point as far as it goes, but it has little to do with the actual state of the cutting edge of AI risk research.
From my reading of the full article, Bengio, who was/is also well-versed in the latest deep learning research, was leaning more toward the Russell argument as well.
Humanity would also need time to align AGI before any AI reaches the N+10 power level. The existence of all those N-1 level AIs in multiple organizations only means there are more chances of an AGI reaching the critical power level.
(It links to a previous debate with Eliezer too.)
Reminder: An AGI will be much faster at communicating and (if not successfully contained) multiplying than humans ever could.
Major AI research organizations including DeepMind and OpenAI have AI safety programs and people working full-time on it.
My second paragraph in GP was a reply in kind to your…
“This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.”
In retrospect, I shouldn’t have said it. But it’s also quite disappointing that your several paragraphs of reply largely doubled down on ad hominem attacks on anyone who disagrees with you (eg by implying they all follow a prophet without thinking; I’d say many would be capable of reaching similar conclusions on their own).
Even Yann LeCun and other top researchers who disagree with the current AI safety programs were not so dismissive of the concerns. Note that many other top AI researchers do have concerns themselves. Bengio and Russell are some examples. I’ll stop here since it’s likely unproductive to continue.
The dynamic classification is required because the world isn't static. An increasing number of locales have digital speed limit signs that vary the speed limit dynamically, sometimes independently per lane. Automation requires cars to respond to the world as it is, not how the world was when it was recorded a month ago.
sure, if you consider everything self-driving that works on a NASCAR track, then yes, a map is sufficient, but if we are talking about driving on public roads then recognizing and "obeying" signs visually seems like a hard dependency.
Part of such precautionary planning involves asking whether such an accident could happen easily or not. There certainly isn't consensus at the moment, but the philosophy very clearly favors a cautious approach.
Most people are used to thinking about established science that follows expected rules, or incremental advances that have no serious practical consequences. But this isn't that. There is good reason to think that we're approaching a step-change in capabilities to shape the world, and even a strong suspicion of this warrants taking serious defensive measures. Crucially for this particular instance of the discussion, OP is favoring that.
There will necessarily be a broad spectrum of opinions regarding how to handle this, both in the central judgement and how palatably the opinion itself is presented. Using a dismissive moniker like 'religious' for a whole segment of it doesn't do justice to the arguments.
Present a counterargument if you feel strongly about it, and see whether that will stand on its own merit.
> Present a counterargument if you feel strongly about it, and see whether that will stand on its own merit.
This is a bad way to talk to rationalists because it's what they think solves everything and is the reason they're convinced an AI is going to enslave them. As long as you're actually right, saying "no that's dumb and not worth worrying about" is superior to logical arguments about things you can't have logical arguments about (because there are unenumerable "unknown unknowns" in the future). This is called "metarationality".
e.g. Someone could decide to kill you because they don't like one of your posts (1). Is there any finite amount of work you could do to stop this? No (2). Should you worry about this? No (3).
You can't logically prove the 2->3 step, nor can you calculate the probability of it being a problem, but it still doesn't seem to be a problem.
(Keep in mind that biological machines, ie life, have managed to turn the surface of the planet into 'green goo'.)
None of em replace the entire planet though. That's a lot of rock to digest without any more energy to help you do it.
And a paperclip factory isn't self-reproducing (that would be a paperclip factory factory). It's just a regular machine that can break down. The people afraid of that one are imagining a perfect non-breaking-down non-energy-requiring machine because they've accidentally joined a religion.
All that oxygen comes from all the plants.
Yes, life has so far only covered the top of the planet. You are right that a paper clip maximizer would need quite a bit of time to go deeper than life has gone (if it would get there at all).
> And a paperclip factory isn't self-reproducing [...]
Why wouldn't it? If your hypothetical superhuman AGI determined that becoming self-reproducing would be the right thing to do, presumably it would do that.
No perfection required for that. Biological machines aren't perfect either. Just good enough.
You are right that thermodynamics puts a limit on how fast anything can transform the planet into paperclips or grey goo.
Though the limit is probably mostly about waste heat, not necessarily about available energy:
There's enough hydrogen around that an AGI that figured out nuclear fusion would have all the energy it needs. But on a planet wide basis, there's no way to dissipate waste heat faster than via radiation into space.
(Assuming currently known physics, but allowing for advances in technology and engineering.)
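The radiative limit can be estimated with the Stefan-Boltzmann law, P = εσAT⁴. This is a back-of-envelope sketch under loud assumptions: the Earth is treated as a uniform blackbody sphere (emissivity 1), and 255 K is taken as its effective radiating temperature. The function names are made up for illustration.

```python
import math

STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)
EARTH_RADIUS_M = 6.371e6           # mean radius, metres


def radiated_power_watts(temperature_k, emissivity=1.0):
    """Total power a sphere the size of Earth radiates at a uniform temperature."""
    area = 4.0 * math.pi * EARTH_RADIUS_M ** 2
    return emissivity * STEFAN_BOLTZMANN * area * temperature_k ** 4


def extra_heat_budget_watts(baseline_k, allowed_k):
    """Additional waste heat that can be shed if the radiating temperature
    is allowed to rise from baseline_k to allowed_k."""
    return radiated_power_watts(allowed_k) - radiated_power_watts(baseline_k)
```

At 255 K this gives on the order of 10^17 W radiated in total, and each extra kelvin of effective temperature buys roughly 2 x 10^15 W of additional waste heat. Any planet-wide industrial process, paperclips included, has to fit inside that budget or heat the surface further.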
---
Of course, when we worry about paperclip maximisers, it's bad enough when they turn the whole biosphere into paperclips. Noticing that they'll have a hard time turning the rest of the earth into paperclips would be scant consolation for humanity.
(But the thermodynamic limits on waste heat still apply even when just turning the biosphere into paperclips.)
This seems an odd refutation for several reasons.
First, the paperclip AI might determine that self-reproducing factories would be an optimisation, and aim to achieve that by any means necessary.
Second, a single paperclip factory that doesn't reproduce might still develop the means of bringing raw materials to it.
Either way, an all-consuming paperclip AI emerges.
In general, I find the equating of the paperclip problem with a religious cult to be naive.
Your analogy is weak and also false: viruses can't self-reproduce, but need to bind to a host's protein synthesis pathways.
This is quite possible. Indeed, I don't believe this is exclusive to superintelligence or requires it at all. Compare to the closest thing we have to "inventing AGI" - having babies. People do that all the time and there isn't a mathematical guarantee that baby won't end humanity, but we don't do much to stop it, and that's not considered a problem. Mainly, why would it want to?
https://twitter.com/thejadedguy/status/844352570470645760?la...
I don't think superintelligence even gives them much advantage if they wanted to. Being able to imagine a virus real good doesn't actually have much to do with the ability to create one, since plans tend to fail for surprising reasons in the real world once you start trying to follow them. Unless you define superintelligence as "it's right about everything all the time", but that seems like a magical power, not something we can invent.
> How exactly is "perpetual motion machines can't exist" related to this?
It wouldn't be able to do the particular kind of ending humanity where you turn them all into paperclips, though it could do other things. There's plenty of ways to do it that reduce entropy rather than increase it - nuclear winter is one.
The anthropomorphism is misleading. No one expects that an AGI would "want to" in the commonplace sense of being motivated by animosity, fear, or desire. The problem is that the best path to satisfying its reward function could have adverse-to-extinction level consequences for humanity, because alignment is hard, or maybe impossible.
Strictly speaking, we can limit that to people who rearrange their lives around reacting to the possibility, even in sillier (yet not disprovable) forms like Roko's Basilisk.
People who believe having a lot of "intelligence" means you can actually do anything you intend to do, no matter what that thing is, also get close to it because they both involve creating a perfect being in their minds. But that's possible for anyone - I guess it comes from assuming that since an AGI would be a computer + a human, it gets all the traits of humans (intelligence and motivation) plus computer programs (predictable execution, lack of emotions or boredom). It doesn't seem like that follows though - boredom might be needed for online learning, which is needed to be an independent agent, and might limit them to human-level executive function.
The chance of dumb civilization-ending mistakes like nuclear war seems higher than smart civilization-ending mistakes like gray goo, and can't be defended against, so as a research direction I suggest finding a way to restore humans from backup. (https://scp-wiki.wikidot.com/scp-2000)
Conversely, at least in this discussion, the term "intelligence" seems pretty neutral.
Yet discourse on existential AI risks is predicated on something like a "goal" (e.g. to maximise paperclips). Notions like "goal" also make it harder to see clearly what we are actually discussing.
> the term "intelligence" seems pretty neutral
Hmm, I'm not convinced. It seems like an extremely loaded term to me.
Yes, "intelligence" is a deeply loaded term. It just doesn't matter in the context of the discussion here, so far as I've seen; its ambiguities haven't been relevant.
You're confusing "AIs" (existing ML models) with "AGIs" (theoretical things that can do anything and are apparently going to take over the world). Not only is there no proof AGIs can exist, there is no proof they can be made with fixed reward functions. That would seem to make them less than "general".