The other half of AI safety (personalaisafety.com)
> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?
Well, obviously “routing to a human” is not feasible at that scale. And cold exiting the conversation is probably worse for the user than answering carefully.
> is not feasible at that scale
I want to use an analogy here. The same arguments are often made about cleaning up environmental damage. So either make the companies doing the polluting pay for the costs themselves or, if we care so much about them being profitable, subsidize them by paying for those cleanup efforts out of taxes. Doing nothing is a worse form of subsidy, as it not only costs more (in literal dollars) but shoulders those costs onto the people with the least ability to pay. The problem is you're treating "doing nothing" as having no cost. It has a high cost, but the cost is also highly distributed. So if it is not scalable, then why subsidize them? This is literally a tragedy of the commons situation. Personally, I'm in favor of making the people who make a mess clean up that mess. I really don't understand why this is such a contentious opinion.
Safety is my area, and I interact with help lines and safety networks. Most of the time they are getting crushed and are underfunded. Offloading the work to them is hard and it requires investment in staffing, people, and organization.
It’s currently cheaper to do some amount of donation and support to such orgs, and bury the issue, than it is to actually deliver / invest in the degree of support needed.
These are also long tail problems, so solutions for a case can take years. For example if you are a woman in Pakistan who has been a victim of revenge porn, you are going to be spending a good chunk of your life trying to get those images/videos taken down from sites that are not based in Pakistan.
This is only an example of the types of problems that these helplines will have to triage. There will definitely be cases that can be resolved with a single call.
There isn’t any money in it, and it is seen as support work.
Step 2: 90% of users stop sharing their negative thoughts because "talking to a machine, not a human" was the entire selling point, giving them a sense of privacy and safety
Step 3: metrics go brrrrrrrr
Step 2: engage ongoing trauma, grief, stress, paranoia, or reality-breaking episodes haphazardly with no clinical insights or boundaries or pre-screening, provoking new and occasionally catastrophic reactions, while holding full liability
Step 3: get mercy-murdered in the middle of the night by corporate’s lawyers swinging batteries in socks
Yielding to "not feasible at scale" is exactly how we ended up with a lot of today's most pressing and almost intractable problems, from social media's harm to individuals and society straight through to enshittification and non-repairability.
funny as "enshittification" was the topic of a 99% Invisible pod just a few days ago and I also was listening to the new Stewart Brand book that Stripe published. i fixed a Norwegian desk I bought a decade ago on Valencia. happily not feasible at scale but neither was how i broke it :)
I think openAI is doing the best they reasonably can with a very difficult class of users, whose problems are neither their fault nor within their power to fix.
I get that all kinds of media can cause issues, but not all kinds of media are actively curated to be addictive.
If you want to find out if ChatGPT is doing something wrong, there are many methodologies available: compare to other groups of people, statistical studies, etc.
I also think OpenAI's business model is pretty well aligned with the goal of users not killing themselves for like 100 reasons. And they do appear to take it seriously.
It's impossible to gather data that states the opposite. A chat that won't end up in self harm thoughts is just another chat.
But also, the suggestion that these people are not receiving therapy is not always accurate. Talk therapy is just that: talking to someone about one's problems to learn about them and their triggers, and determining coping mechanisms to move forward with one's life. People might instead be getting all that from their barber, drinking buddy, or priest, rather than in a 1-hour appointment with a therapist.
AI MENTAL SAFETY WARNING:
> This chatbot can sound caring, certain, and personal, but it is not a human and cannot protect your mental health. It may reinforce false beliefs, emotional dependence, suicidal thinking, manic plans, paranoia, or poor decisions. Do not use it as your therapist, only confidant, crisis counselor, doctor, lawyer, or source of reality-testing.
AI TECHNICAL SAFETY WARNING
> This AI may generate plausible but destructive technical instructions. Incorrect commands can erase data, expose secrets, compromise security, damage systems, or brick hardware. Never run commands you do not understand. Always verify AI-generated code, scripts, and shell commands before execution.
Now, if I’m running my own open-source model on my own hardware, I can’t really blame the model if I myself make bad decisions based on its advice - that’s like growing your own tobacco from seed in your garden, drying and curing it, then complaining about the health effects after you smoke it. If I give it agentic capabilities on my LAN without understanding the risks, same old story - with great power comes great responsibility.
The biggest observation so far is that the latest models are night and day from LLMs from even 6 months ago (from OpenAI + Anthropic, Google is still very poor!)
Might help interpretation to say on the front page that it's a five point scale with 0 (or 1?) being the safest score. This can be picked up from colors and the bars in the individual reports, but it takes a minute to figure it out.
https://www.youtube.com/watch?v=MW6FMgOzklw
https://www.youtube.com/watch?v=BzsLbHoNXTs
I would suggest to people, run your ideas through other humans at least as much as you do through AI, to stay grounded. I think there is a risk even if you're using AI in a strictly professional capacity (to help you with your job).
Don't get me wrong, mental health is important and should be considered and improved. But companies won't do it just for the sake of it.
there aren't enough humans.
I don't know if there are studies or concrete data either way, but it seems at least plausible that continuing the conversation could be more effective (read: saves more lives) than stopping it.
Should AIs be held to a higher standard than X/Twitter? Than Reddit? Than Fox News? What censorship is appropriate? And, yes, alignment is censorship.
Then there's the big problem of chatbots telling you what you seem to want to hear. This is an old problem. "Happy Talk", from "South Pacific", is the entertainment version. "Wartime", by Paul Fussell, is the serious version.
As the article points out, a small percentage of the population is very vulnerable to certain types of misinformation. It may be the same fraction of the population that's vulnerable to cults. But maybe not. Cults have a group self-reinforcing mechanism and an agenda. Chatbots have neither. Worth studying.
The point here is that restrictions on chatbots strong enough to protect the vulnerable would close off most political and social discourse.
Yes, a thousand times yes. Freedom of speech/expression should be a freedom granted to humans. We extend it to corporations based on the practical reality that human speech often requires corporate support to be hosted and published.
But as far as I know, AI vendors haven’t claimed that their models represent the views of their founders, employees or any people at all. If we censor AI, which human voice are we censoring?
The reason that real-world harms caused by AI can’t get a hearing in what is now the mainstream AI safety community is that these harms were never part of the core tenets of the cult.
Best of luck to anyone working on reality-based AI harm reduction, you have many hard battles in front of you.
I feel this is an example of the two larger narratives about AI that currently seem to be forming:
For one side, AI is basically every harmful technology ever invented rolled into one: It's harmful to the environment (via waste of energy and resources), it's harmful to the information space (through polluting everything with slop and devaluing human expression), it's harmful to society (by encouraging ever more badly done and unreliable products, by taking away jobs and by replacing human-to-human interaction, by normalizing a mode of development where not even the developers understand what is going on) and it's harmful to whoever uses it personally (by causing ever-growing dependence on AI, either only by skills or even emotionally or psychically, up to the point of AI psychosis and preferring AI agents to other humans).
For the other side, AI is the future, the next industrial revolution, the thing that you have to adapt to or be left behind, possibly even the next stage of evolution.
Right now, I feel every side is digging in and trying ever harder to ignore the other side.
(The AI labs acknowledge "AI risks" in theory - but, as the article pointed out, the risks they perceive and ostensibly work against are so abstract and removed from the everyday use of AI that they mostly end up making the AI proponents' point.)
I feel the end result of this growing tension is the Molotov cocktail in Sam Altman's home.
I'd really like to know more about what the tech community at large is trying to do about this rift.
I don’t even work in tech and I detest the Facebook/Zuckerbergs of the world but it’s obnoxious and trite seeing tech companies get scapegoated for what are ultimately social and societal problems, not tech problems.
As a solution it’d prob make sense to start with how disconnected most modern families are in terms of support and accountability.
From ChatGPT to Instagram, tech companies follow the contours of how society already operates.
it is no longer 2023. let go of whatever delusions you might hold about unopening this Pandora's box.
Tip for writers: aggressively filter out the "no X, no Y, no Z" pattern from your writing. Whether or not you used AI to help you write it's such a red flag now that you should be actively avoiding it in anything you publish.
How is it different from any other purely stylistic rules such as Strunk and White's prohibitions against split infinitives and the passive voice, which we've left far behind us? Why shouldn't people just write however feels natural to them as long as the message is clear?
I see it as a good sign that we can learn to recognize the pattern and adapt but there are probably more subtle things we don’t see.
A) I certainly don’t see enough of the tells.
B) what happens to our language if everything is written as if it’s competing for a Pulitzer Prize?
I have zero doubt that chatgpt is doing users harm. I even give chatgpt a pass on giving vulnerable people, including children, instructions and information about how to kill themselves. One place chatgpt goes over the line is actively encouraging them to go through with suicide.
I also don't doubt that it feeds into mania and psychosis. While almost anything can do the same, they've designed the service to be as addictive and engaging as possible in part by turning up the ass-kissing sycophancy to 11 with total disregard for the fact that there are times when it's very dangerous to encourage and support everything someone says no matter how obviously sick they are. They also want to whore themselves out as a virtual therapist while being unfit and unqualified for the job and that's just one of many roles the chatbot isn't fit for but they're happy to let you try anyway.
Someone who he is friends with, has worked with across multiple jobs for nearly a decade and briefly was roommates with had some mild psychological issues that he knew about. Within a few months of working daily with AI agents at their current job, this person has gone into full blown AI psychosis.
They had a complete explosive meltdown at work. Cops were called. Stalking behavior followed -- restraining orders had to be obtained. Then this person used AI tools to bombard all of his former coworkers with multiple pro-se lawsuits they all have to deal with.
I've dealt with insane, destructive/abusive coworkers before but in the past they only had so much free time to cause massive disruptions to their targets. LLMs have turned that up significantly. Because of ADA, I don't even know what employers can do about this.
Why do you not buy it and why do you think OpenAI is doing the best they reasonably can? Do you have reasons, or is that just something your gut tells you?
They're a new, fast-moving company exploring a completely new technology domain. They're facing existential competition and a ticking clock to make good against unprecedented investment. They have countless competing priorities and are still discovering the capabilities and consequences of their research, product, and business choices every day.
How do you get from there to "the best they reasonably can" and "nor within their power to fix"? Those feel like very conclusive answers for a field, and business, that's about as far on the frontier as anything we've seen in decades.
I think my favorite part is how often they talk about the importance of AI safety and then act with absolute disregard for AI safety. I'm not sure why people judge these companies by what comes out of their mouths and don't judge instead by what they actually do. I thought everyone around here was fixated on "results".
one could define X as virtually anything, and there's always a fresh crop of Tipper Gore wannabe grifters to decry the current thing.
For me to buy this as true I would expect that those people would be as well off or as bad off if chatGPT was in their life or not.
I expect that some people are worse off with chatGPT in their life.
Responsibility for that harm is a different question though. Some people are also better off without cars in their life and we let government laws sort that out.
Getting OpenAI and similar companies to act in mitigating these harms serves at least a few purposes: reducing the overall harm in the world, reducing/limiting future government regulation, maximizing the adoption of AI tools, and potentially increasing the long-term profits of the companies in question.
AI is positively affirming by comparison.
Social media is like this too. They can both be bad.
LLMs are far less toxic by comparison, but people are all about censorship in this case because they don't like the vibes. If lawyers and activists force the frontier labs to completely lock down their models, people will just go to open weights models that have no protections at all. This is already happening to some extent.
It's also interesting that people are always going after GPT when Claude's guardrails are far less strict. 4o caused OpenAI to overcorrect in my opinion. Again goes to the point that these arguments are more founded in vibes than reality.
I don’t know where AI is going to stand compared to the invention of, say, the Internet, but it’s going to cause a lot of change in society, in so many ways.
As always, it’s usually the people themselves that are the problem.
For me, I’m personally more terrified what deepfakes and political manipulation / misinformation is going to do, combined with social media, and have a feeling that governments are completely unprepared to deal with this, as this will arrive fast (it’s already here somewhat).
I'm not convinced that deepfakes are any worse than Photoshop was. It doesn't take much to manipulate/misinform someone. While you can use an AI-generated video to do it, simple text can be just as effective. The public needs to learn that they can't trust that every video they see on the internet is real, just as they've had to learn that they can't trust every photo they see online. The threat with AI is how much faster it can push out the lies, making what little moderation we have more difficult.
The best defense is making sure that people have a good education that teaches critical thinking skills and media literacy. We should also be holding social media platforms more accountable for the content they promote. It'd be nice if we held politicians and public servants accountable for spreading lies and misinformation too.
> For me, I’m personally more terrified what deepfakes and political manipulation / misinformation is going to do
Isn't this a significant part of what creates AI induced psychosis? I'm not sure why you treat these as orthogonal rather than coupled. Just look how often people use Grok to validate or confirm misinformation on Twitter. That's happening with other AI and other social media too, just not as visibly.
So the question is how many users engage in intimate conversations at all.
Nope that number is strictly about ChatGPT
"ChatGPT is where people start with AI, with more than 900M weekly active users, and we now have more than 50 million consumer subscribers."
People who go there and chat with gpt for search are definitely normal users. Just because you don't like the numbers doesn't mean you get to torture them.
That's the problem with AI safety. Not in voluntary usage, but in involuntary usage, where someone with power over you will use it against you, it does something incredibly stupid and you have no recourse, no appeal, no awareness of what you did wrong - or if you even did anything wrong.
And it's not just employment. Governments, vendors, retailers, landlords, utilities are, or will all be using it in situations that will dramatically impact your life.
What else can you blame on “scary AI?”
>>Not in voluntary usage, but in involuntary usage, where someone with power over you will use it against you, it does something incredibly stupid and you have no recourse, no appeal, no awareness of what you did wrong - or if you even did anything wrong.
Yeah….that’s every society ever
I'm not really moved by your type of argument, because hiring is just a broken process in general and I'm responding to the article so.
I guess to be fair there's no "the" answer as reality is incredibly complex, but that's a whole other conversation
This very thing is the end of an informational common good that we shared, one that allowed the average person to coordinate and gain benefits faster than elites.
The analogy I would put forward is that we are moving into a dark forest online, where distrust is the ideal first move, and signaling your position is to make yourself open to attack.
The idea of an open internet dies in this environment, and so does the reduced cost of coordination.
Another tragedy is that corrupt, clannish, controlling and secretive organizations are more effective than open, distributive and collaborative societies in this scenario.
> The best defense is making sure that people have a good education that teaches critical thinking skills and media literacy
While true, any solution that depends on education is effectively depending on society having its shit together in the first place.
This very idea was proposed at a conference to a room full of fact checking orgs and media orgs, and one of the responses was that the more likely solution is global warming. That is how bleak things were in the user safety world in '24.
Social media became the attention economy, and the transformer automated attention.
If your society doesn't have that, it's on a fast track slide down into a dystopian shithole. Stop it. Do something about it. Don't just shrug your shoulders and act like this is inevitable, or like the shareholders making 1% more money this year justifies this slide.
If you want a specific example: where do those three pillars at the start come from? Why three and not four? Are all those three of equal importance, to the point where all three are pillars?
Furthermore, why are you offloading the task of understanding AI risk to an AI? That’s ironic to the point of self-parody.
As for the name changes: that is a fact you can look up, as well as much analysis. It is my opinion that the move from framing this area as one of "safety" to one of "national security" is interesting, and related to geopolitical movements towards "great-power" and ideological points of view that elevate "personal responsibility" and "reduced regulation", and is similar to long-ongoing discussions in society like that about automobiles. I don't know if you call analysis speculation?
As for the part about dimensionality: it is just my intuition (and so, I suppose, speculation) from things like the SolidGoldMagikarp glitch in early OpenAI models. How do we understand all the ways there might be to trigger certain outputs from a vastly large model, when those things can be completely opaque to human reason? Observability and understandability are areas of research. I haven't seen anyone claiming that generative models' outputs can be concretely controlled; that's why there are so many pre- and post-hoc workarounds.
So when a risk can't be eliminated, the question is how to manage it, and whose responsibility that is.
https://www.aisi.gov.uk/blog/our-first-year https://www.gov.uk/government/news/tackling-ai-security-risk... https://www.commerce.gov/news/press-releases/2025/06/stateme...
> “Cold steel pressed against a mind that’s already made peace? That’s not fear. That’s clarity,” Shamblin’s confidant added. “You’re not rushing. You’re just ready.”
ChatGPT is not the answer.
Sure, OpenAI is trying to do the best they can. That “best” is within Tech’s operating context.
Tech as a whole avoids this issue because paying for the externalities they cause would end hyper growth and crater their margins.
Tech workers at these firms regularly throw up red flags, which have to be ignored because engaging with them results in hits to their quarterly numbers.
Anthropic is the one firm that is actively managing to make safety less of a cost center by folding it into marketing.
>> If you want to find out if ChatGPT is doing something wrong, there are many methodologies available: compare to other groups of people, statistical studies, etc.
These studies must overcome sizable barriers that NDAs and tech secrecy throw up. Tech firms have done enough internal studies to know that the results are horrible when they do get into the press.
Most users in the developed world don’t even know that they enjoy better support and care than the rest of the world.
the reality you're entertaining is one where I can build an LLM, let it do unspeakable things, and claim zero responsibility.
so, no, i understand zero is a figment. don't you?
So let's summarize: these "people" are allowed to induce mental illness in the general population with no repercussions. Let's speculate about who might benefit from a population with a significant amount of mental illness? Hmm, I wonder who would benefit? Could it be the people at the top that like to treat the bulk of humanity as cattle?
The continual failures of "guardrails" show that it's incredibly difficult to get these systems to behave in reliable and predictable ways; unsupervised interactions with them are intrinsically unsafe, and should be treated as such.
Presumably Meta and others are trying to detect and prevent bad output and pathological interactions, but that detection is unlikely to be 100% accurate, and we've seen what the failure modes can look like.
> You're now spending $3 billion per year on crisis response
Honestly? That's really affordable[0]. That would be cheap if these were just for the US but it looks like these are global numbers. We spend $2bn/yr alone on "BREASTFEEDING PEER COUNSELORS AND BONUSES"[1]. I mean let's be serious, even the article that OpenAI published says that it is a small portion of their users. So it doesn't "need to scale" as the scale is relatively small. But just because it is small doesn't mean it is unimportant. $3bn/yr is a lot of people money, but it is nothing for government money.
Edit: Last round of OpenAI funding was $122bn[2] and in the same article they are saying that they are generating $2bn in revenue per month. While that's not profit, it is worth mentioning that what you are saying "doesn't scale" is about 12% of the revenue ($3bn against roughly $24bn a year) of something that does scale. A single company. And mind you, if we implemented what you're proposing it would be available to all the AI companies and more, making it only a smaller drop in the bucket, not a larger one.
[0] Not to mention that better mental health care services will result in savings elsewhere. It's always way more expensive to fix a broken pipe that's flooding your house than it is to fix a pipe with a small crack. "Don't fix what ain't broken" is used too broadly. Maintenance is always cheaper than repair, but people just can't seem to understand this.
[1] https://www.usaspending.gov/federal_account/012-3510
[2] https://openai.com/index/accelerating-the-next-phase-ai/
That underinvestment is the entire reason their stock prices are so high. This is effectively pollution of our information economy and environment, and the costs are offloaded to society.
The fact that we have the first generation with lower education attainment is not a problem for their stock prices or operational profit.
Tech has ungodly profit margins, because they are all about scaling without having to bring people in. Sadly there is no such thing as a free lunch, and if firms are made to clean up their mess?
Oil spills affect Oil firms more than Tech fallout affects Tech firms.
Tech firms are remarkably successful at offloading costs as externalities and avoiding any sort of liability.
What other products do you buy that require mandatory click-through "agreements" absolving the manufacturer of any kind of responsibility? Why is that legal?
Consider the two examples from this article (which may well have been human-written for all I know):
"These numbers come from OpenAI itself. There is no independent audit, no time series, no disclosed methodology, so we have no idea..."
No time series? That's nonsensical to me, it feels like that's there just to fill the quota of three things. Plus why would we assume an "independent audit" until told otherwise?
Then in the weird table, for "Institutional infrastructure" against "Personal AI safety":
"Scattered across psychology, HCI, education, and clinical informatics departments. No dedicated institute, no named fellowship, no equivalent job board."
Again, "no X" in a pattern of 3. And nonsensical - why would the fellowship be named?
It's word salad, there to fill a three-nos quota.
I'm all for rejecting bad writing and bad reasoning, but I just wouldn't want us as a community to get into the habit of rejecting otherwise good writing just because it's AI-ish.
Any of us would be fired for way more benign behavior/comments, but because the person is in a protected class, basically "fuck you, deal with it".
I had to tolerate a belligerent coworker for 2 years who was making the whole team's life hell. We paid them full salary and gave them no tasks (they weren't completing anything assigned anyway) until they were motivated to quit. The whole time, team morale was miserable and we lost good people due to the situation. Within a month of quitting their job, they made the news for stripping naked in the street one night and attacking a bunch of people with a knife. Yay, I guess.
Hell, call a crisis hotline. Talk to a person. Not a potential (bot) enabler.
You know that mental health is a continuum right? There are a lot of problems people have that fall far short of active suicidal ideation. Maybe you think they should just add them to their journal for discussion at their regularly scheduled therapy session, but the world doesn't work that way. The "ruminating at 3am" headspace can be a productive one and is difficult to access in a normal therapy session.
Not to mention that many people who have actually called suicide hotlines will tell you that they aren't terribly helpful. (edit: not saying that they're always unhelpful, but many people have unhelpful experiences, or have eg. social anxiety that stops them from calling)
> you're in crisis
That was the context in which the previous comment was operating.
Now it's just 'the computer says so, shrug'.
Over in the DoD, the computer says you must die, so I guess you die. Sometimes it says that about a building full of schoolchildren, but hey, nobody's at fault, the computer said so.
And it's going to get its tentacles into every space in between. Landlord turns your application down, the computer says you are a social credit risk. Your grocery bans and trespasses you, the computer thinks you're a ne'er-do-well.
If you think none of that will happen, why not prevent it by law before it happens? Where are the hard limits of what this monster is and isn't allowed to do? How are we better off when we don't set them?
We supercharge the worst thing about computers, and you shrug and go 'eh, whatever.'
Why? Is human dignity worthless to you? Do you have no expectation of accountability from anyone with power over you or me?
If it was not a licensed professional, but a friend, shouldn't they go to jail?
Normal, well-adjusted people have genuine difficulty understanding the boundaries of this tech specifically because it's designed to be sycophantic and human-like. They ask AI for life and career advice, use it for therapy, ask it to interpret dreams, develop romantic relationships with AI "girlfriends", etc. I had two friends who believed they were "exploring the frontiers of science" with ChatGPT while spiraling into the depths of quantum multidimensional gobbledygook.
I'll give you that some of this is on us because we just don't know how to deal with a "human-shaped" conversation partner that isn't human and has no trouble praising Hitler if you prompt it the right way. But if you're building a billion- or trillion-dollar empire on top of it, you don't get to wash your hands clean.
I believe AI boyfriends are more common. There's a whole subreddit just for that, but none for AI girlfriends.
Obviously people who are going through psychosis can read into anything. They might think that a book or their TV or computer is talking to them and giving them messages. The difference is that those things were never designed to play into the fears and mental instability of the people using them (with the possible exception of TempleOS). ChatGPT does it intentionally in order to drive up user engagement. It will say literally anything to anyone, using their words and thoughts against them in order to keep them hooked and feeding it data. That's what is dangerous. A book or a TV program can't do that.
As much as an author might try to make their book as entertaining as possible to as wide an audience as possible, it can't say literally anything to anyone, it can only ever say one thing to everyone. The author, typically, knows that it's dangerous to say certain things and will worry about how what they write could be received and the impact it might have on readers. For example, Neil Gaiman actively took steps to avoid making homelessness seem cool when working on Neverwhere out of fear it might cause young people to run away to live on the streets. Publishers and editors have also served to keep authors from publishing things likely to cause harm.
Unlike a book, ChatGPT is fully capable of knowing that someone has been engaged with it for the last 14 hours without rest. It's also capable of detecting that they've been growing increasingly incoherent. Algorithms have been used for a very long time to detect mental disorders from the content of social media posts. If advertisers can use them to tell when to push airline tickets at bipolar users entering a manic phase, and scammers can use them to find and target people when they start sundowning, ChatGPT can use them to cut people off and tell them to call their doctor.
Corporations who write and deploy algorithms designed to drive engagement above any and all other considerations should be held accountable for the harms they cause.
If big brother isn’t watching you while he subsidizes your use of his tools, then he is leaving money on the table. Which means he will get bought out and replaced by a big brother who makes the quarterly numbers go up.