The other half of AI safety (personalaisafety.com)

97 points by sofiaqt 50 days ago | 131 comments

nojs 50 days ago |

> Every week, somewhere between 1.2 and 3 million ChatGPT users, roughly the population of a small country, show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model.

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

Well, obviously “routing to a human” is not feasible at that scale. And cold exiting the conversation is probably worse for the user than answering carefully.

godelski 50 days ago | |

  > is not feasible at that scale

I want to use an analogy here. The same arguments are often made about cleaning up environmental damage. So either make the companies doing the polluting pay for the costs themselves or if we care so much about them being profitable then we subsidize them by paying for those cleanup efforts out of taxes. Doing nothing is a worse form of subsidy as it not only costs more (in literal dollars) but shoulders that cost onto the people with the least ability to pay for it. The problem is you're treating "doing nothing" as having no cost. It has a high cost, but the cost is also highly distributed.

So if it is not scalable, then why subsidize them? This is literally a tragedy of the commons situation. Personally, I'm in favor of making the people who make a mess clean up that mess. I really don't understand why this is such a contentious opinion.

ryandrake 49 days ago | | |

We keep letting em get away with the same old excuse: "The company can't fix X problem because it operates 'at scale' and you'd need millions of humans to perform corrective action Y at that same scale!"

Gigachad 50 days ago | |

Tech companies will pull trillions of dollars out of their asses when the problem is boosting ad revenue or automating people out of a job. But when asked to deal with the crisis they invented and dumped on society the answer is “that’s impossible, doesn’t scale”

CobrastanJorji 50 days ago | | |

Figure a "mental health crisis" human conversation takes 30 minutes. Three million incidents per week would require 37,500 qualified mental health counselors on the phones working a 40 hour shift that week. Figure they make $75k/year each. You're now spending $3 billion per year on crisis response, and you're employing like 10% of all of the health counselors in the US. And all you're providing is 30 minute chats.
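The arithmetic above can be reproduced with a short sketch. Every input is one of the comment's own assumptions (30-minute conversations, 40-hour weeks, $75k salaries), plus the 1.2 million lower bound quoted from the article at the top of the thread; the function and variable names are made up for illustration, and none of this is measured data.

```python
# Back-of-envelope staffing estimate. All inputs are assumptions taken
# from the thread, not measured data.

def counselors_needed(incidents_per_week: float,
                      minutes_per_incident: float,
                      hours_per_counselor_week: float) -> float:
    """Full-time counselors required to cover the weekly call volume."""
    total_hours = incidents_per_week * minutes_per_incident / 60
    return total_hours / hours_per_counselor_week

SALARY_PER_YEAR = 75_000  # assumed salary from the comment

# Lower and upper bounds of the weekly incident range quoted in the thread.
low = counselors_needed(1_200_000, 30, 40)
high = counselors_needed(3_000_000, 30, 40)

payroll_low = low * SALARY_PER_YEAR
payroll_high = high * SALARY_PER_YEAR

print(f"{low:,.0f} to {high:,.0f} counselors")
print(f"${payroll_low / 1e9:.1f}B to ${payroll_high / 1e9:.1f}B per year in salaries")
```

Salaries alone come to about $2.8 billion at the upper bound, which the comment rounds up to $3 billion; overhead, training, and supervision are not modeled here.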

davebren 50 days ago | | |

Mapping and photographing every road on the planet? Easy. Not manipulating our chatbot users into psychosis and suicide or worse? No way can't be done.

hx8 50 days ago | |

I don't think it's obvious that routing to a human is infeasible. I'm sure many local authorities, health agencies, and non-profits would be okay being routed to. Additionally, I'm sure many of the users are the same week over week, so giving them long term care would reduce the total volume. Finally, there is a long gap between psychosis and emotional dependence, so there could be some triage to make sure those most in need have human intervention.

intended 50 days ago | | |

None of them are resourced enough (globally) to do this.

Safety is my area, and I interact with help lines and safety networks. Most of the time they are getting crushed and are underfunded. Offloading the work to them is hard and it requires investment in staffing, people, and organization.

It’s currently cheaper to do some amount of donation and support to such orgs, and bury the issue, than it is to actually deliver / invest in the degree of support needed.

These are also long tail problems, so solutions for a case can take years. For example if you are a woman in Pakistan who has been a victim of revenge porn, you are going to be spending a good chunk of your life trying to get those images/videos taken down from sites that are not based in Pakistan.

This is only an example of the types of problems that these helplines will have to triage. There will definitely be cases that can be resolved with a single call.

There isn’t any money in it, and it is seen as support work.

concinds 50 days ago | |

"Routed to a human" is what the suicide hotline numbers do. OpenAI employees are neither trained nor credible to do that stuff.

anal_reactor 50 days ago | |

Step 1: route to a human

Step 2: 90% of users stop sharing their negative thoughts because "talking to a machine, not a human" was the entire selling point, giving them a sense of privacy and safety

Step 3: metrics go brrrrrrrr

bonesss 50 days ago | | |

Step 1: route to a human

Step 2: engage ongoing trauma, grief, stress, paranoia, or reality-breaking episodes haphazardly with no clinical insights or boundaries or pre-screening, provoking new and occasionally catastrophic reactions, while holding full liability

Step 3: get mercy-murdered in the middle of the night by corporate’s lawyers swinging batteries in socks

swatcoder 50 days ago | |

Well, then maybe you can't scale it as a free service with self-serve signups. Maybe you need to gate who you allow to use it and pace how intensely they can engage. Or maybe you need to look for other solutions.

Yielding to "not feasible at scale" is exactly how we ended up with a lot of today's most pressing and almost intractable problems, from social media's ills to person and society straight through to enshittification and non-repairability.

The_Blade 50 days ago | | |

> ...straight through to enshittification and non-repairability.

funny as "enshittification" was the topic of a 99% Invisible pod just a few days ago and I also was listening to the new Stewart Brand book that Stripe published. i fixed a Norwegian desk I bought a decade ago on Valencia. happily not feasible at scale but neither was how i broke it :)

GardenLetter27 50 days ago | |

And what will a human do better? Why will the human care? Who will pay the human?

Henchman21 50 days ago | |

If causing problems at scale is possible, being held accountable for said problems is also possible. Not attempting to deal with the crisis they've created should, in my personal view, result in corporate forfeiture and the immediate sell off of assets and destruction of whatever led to said crisis, and a charge of gross criminal negligence filed against the C-suite and board of said companies.

davebren 50 days ago | |

They're in a tough spot. They can train out the pretending to be human, sycophantic, lying, all-knowing aspects of the model, but this is how they got all the investors and CEOs on board the hype train. Psychosis is the product.

Legend2440 50 days ago |

I don't buy that ChatGPT is actually doing these users any harm.

I think OpenAI is doing the best they reasonably can with a very difficult class of users, whose problems are neither their fault nor within their power to fix.

ianbutler 50 days ago |

OpenAI has 900 million weekly active users. So, going by the 1.2 to 3 million figure, around 0.13% to 0.33% are having problems. That's actually way less than population-level measures for the same symptoms; in the US, a bigger percentage of people are affected by suicidal ideation alone.

https://www.cdc.gov/mmwr/volumes/74/wr/mm7412a4.htm
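A quick check of the share, as a sketch that assumes the 1.2 to 3 million weekly figure quoted from the article at the top of the thread and the 900 million weekly-active-user figure given in this comment. Neither input is independently verified here, and the variable names are illustrative.

```python
# Share of weekly active users flagged, using figures quoted in the thread.
# Both inputs are assumptions carried over from the discussion.
weekly_active_users = 900_000_000
flagged_low, flagged_high = 1_200_000, 3_000_000

share_low = flagged_low / weekly_active_users * 100    # percent
share_high = flagged_high / weekly_active_users * 100  # percent

print(f"{share_low:.2f}% to {share_high:.2f}% of weekly active users")
```

With these inputs the share works out to roughly 0.13% to 0.33% per week. Comparing that with survey-based population rates would also need matching time windows (weekly versus annual), which this sketch does not attempt.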

ngruhn 50 days ago |

The bad cases make headlines. But I think it's quite possible that AI is helping a lot of people in distress. Many people are uncomfortable opening up to humans, or have no one to talk to, or can't afford to fork over whatever-hourly-rate a therapist takes.

cyanydeez 50 days ago | |

So how many bad cases are ok? Isn't this the same problem with social media: the commercial enterprises don't want any responsibility for their dark pattern and design choices which actively harm their users.

I get that all kinds of media can cause issues, but not all kinds of media are actively curated to be addictive.

wilg 50 days ago | | |

"How many cases are ok" (aka "zero tolerance") is a doomed to fail approach. Especially for a complex social problem's interaction with a complex new technology.

If you want to find out if ChatGPT is doing something wrong, there are many methodologies available: compare to other groups of people, statistical studies, etc.

I also think OpenAI's business model is pretty well aligned with the goal of users not killing themselves for like 100 reasons. And they do appear to take it seriously.

Yokohiii 50 days ago | |

Pure speculation.

It's impossible to gather data that states the opposite. A chat that won't end up in self harm thoughts is just another chat.

tasuki 50 days ago | | |

I think you're kind of supporting the person you're replying to? A chat that won't end up in self-harm is just another chat. Even if the user entered the chat planning to self-harm. A chat that leads to self-harm will make the headlines. Therefore, we hear about the bad cases.

davorak 50 days ago | |

OpenAI and similar companies could open the doors to academic researchers to figure out the stats of help vs harm. It is not going to be a short term and perhaps not long term profit center though.

asdff 50 days ago | |

Therapy is cheap (as in like $10)/free with insurance. However there are still 10 states that have not expanded Medicaid after the ACA, mostly in the south.

But also, to suggest these people are not receiving therapy is not always the case. Talk therapy is just that: talking to someone about one's problems to learn about them and their triggers, and determining coping mechanisms to move forward with one's life. People might instead be getting all that from their barber, drinking buddy, or their priest, rather than in a 1 hour appointment with a therapist.

photochemsyn 50 days ago |

The ‘tobacco warning label’ approach sounds good but I’m not sure if it stopped that many people from smoking or was just a means for corporations to limit their liability. Corporate culture being what it is, having warnings like the following pop up every time a client opens an LLM app would not be that popular in the C-suite. Possible examples:

AI MENTAL SAFETY WARNING:

> This chatbot can sound caring, certain, and personal, but it is not a human and cannot protect your mental health. It may reinforce false beliefs, emotional dependence, suicidal thinking, manic plans, paranoia, or poor decisions. Do not use it as your therapist, only confidant, crisis counselor, doctor, lawyer, or source of reality-testing.

AI TECHNICAL SAFETY WARNING

> This AI may generate plausible but destructive technical instructions. Incorrect commands can erase data, expose secrets, compromise security, damage systems, or brick hardware. Never run commands you do not understand. Always verify AI-generated code, scripts, and shell commands before execution.

Now, if I’m running my own open-source model on my own hardware, I can’t really blame the model if I myself make bad decisions based on its advice - that’s like growing your own tobacco from seed in your garden, drying and curing it, then complaining about the health effects after you smoke it. If I give it agentic capabilities on my LAN without understanding the risks, same old story - with great power comes great responsibility.

timf34 50 days ago |

I sympathize with the piece, evaluating how LLMs interact with mentally vulnerable users is something I've been actively working on: https://vigil-eval.com/

The biggest observation so far is that the latest models are night and day from LLMs from even 6 months ago (from OpenAI + Anthropic, Google is still very poor!)

fourthark 50 days ago | |

Interesting use of evals.

Might help interpretation to say on the front page that it's a five point scale with 0 (or 1?) being the safest score. This can be picked up from colors and the bars in the individual reports, but it takes a minute to figure it out.

timf34 50 days ago | | |

Good suggestion, thank you! It's between 1-5 but I'll convert that to 1-100

js8 50 days ago |

I really enjoyed Dr.K's videos on AI psychosis, namely:

https://www.youtube.com/watch?v=MW6FMgOzklw

https://www.youtube.com/watch?v=BzsLbHoNXTs

I would suggest to people, run your ideas through other humans at least as much as you do through AI, to stay grounded. I think there is a risk even if you're using AI in strictly professional capacity (to help you with your job).

Yokohiii 50 days ago |

I don't think that governments or civil society at large have found a good balance about mental health. Expecting profit oriented companies to be on par or better is weird.

Don't get me wrong, mental health is important and should be considered and improved. But companies won't do it just for the sake of it.

adampunk 50 days ago |

>Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human?

there aren't enough humans.

altcognito 50 days ago | |

I'll agree with this, but I think transparency about how often these situations arise and what they've done to mitigate is a legal necessity.

KolmogorovComp 50 days ago | |

It’s also a free product for most.

wilg 50 days ago |

> Why is mental-health crisis not a gating category, the kind where the conversation stops, full stop, and the user is routed to a human? This is one of many questions I can’t find concrete answers for.

I don't know if there are studies or concrete data either way, but it seems at least plausible that continuing the conversation could be more effective (read: saves more lives) than stopping it.

insumanth 50 days ago |

The "route to a human" part is the bigger gap. Which human? OpenAI isn't licensed as a healthcare provider in any jurisdiction. A real intervention apparatus for 1-3M weekly flagged users is not feasible. I don't think the labs have refused to build it. I think nobody knows what it should look like, and "labs measure what they're pressured to measure" papers over that.

totetsu 50 days ago |

Gemini told me just this morning that there are three pillars of cognitive decline related to AI use:

- Reduced ability to exert cognitive effort, resulting from habitual offloading of tasks.

- Diminished metacognitive self-trust, due to constantly seeking external validation from AI.

- Decline in memory encoding, as less brain effort is spent processing information.

In all seriousness, however, I think some of the interesting things to observe in this area are the reaction against the word 'Safety' as a whole and its replacement with 'Security'. Safety seems to have its roots in something like the work of Ralph Nader with automobiles, while Security is something that can be manufactured and sold. In this sense I wonder how the discourses of 'Personal AI Safety' fit into past discussions of the offloading of risks resulting from the choices of corporations onto individuals.

But in the case of LLMs, it really is the case that what makes them useful is what makes them dangerous. And ultimately, because of the high dimensionality of the language space they are encoding, it seems impossible to build any technical barrier that completely cuts off access to the parts of that space that encode, for example, encouraging someone to kill themselves. Things can be, and are, done (fine-tuning, pre- and post-filtering, etc.) to reduce a system's readiness to share this kind of output with a user, but all they can ever do is reduce it. Then the question is whose responsibility it is to make sure that these things are done well.

achierius 50 days ago | |

Based on what? This seems like speculation.

totetsu 50 days ago | | |

Which part?

Animats 50 days ago |

"AI safety", as defined here, has most of the problems that "fact checking" for social media had. Many of the same problems the "woke" concern about "microaggressions" had. Most of the techniques used in advertising. Much of what passes for political discourse today has the same problems. It's somewhat convincing bullshit.

Should AIs be held to a higher standard than X/Twitter? Than Reddit? Than Fox News? What censorship is appropriate? And, yes, alignment is censorship.

Then there's the big problem of chatbots telling you what you seem to want to hear. This is an old problem. "Happy Talk", from "South Pacific", is the entertainment version. "Wartime", by Paul Fussell, is the serious version.

As the article points out, a small percentage of the population is very vulnerable to certain types of misinformation. It may be the same fraction of the population that's vulnerable to cults. But maybe not. Cults have a group self-reinforcing mechanism and an agenda. Chatbots have neither. Worth studying.

The point here is that restrictions on chatbots strong enough to protect the vulnerable would close off most political and social discourse.

[1] https://www.youtube.com/watch?v=JXgmQDFhPjo

scared_together 50 days ago | |

> Should AIs be held to a higher standard than X/Twitter? Than Reddit? Than Fox News? What censorship is appropriate? And, yes, alignment is censorship.

Yes, a thousand times yes. Freedom of speech/expression should be a freedom granted to humans. We extend it to corporations based on the practical reality that human speech often requires corporate support to be hosted and published.

But as far as I know, AI vendors haven’t claimed that their models represent the views of their founders, employees or any people at all. If we censor AI, which human voice are we censoring?

lazystar 50 days ago | |

the counterpoint is that allowing unlimited discourse places an enormous amount of power in the hands of the chatbot owner, who has access to all logs and input from each user. this prevents one chatbot owner from advertising "you can say anything here!!" then using the logs as blackmail down the road.

mbgerring 50 days ago |

“AI safety” as it’s understood today is an entire faith-based belief system, incubated in a cult-like community with a high propensity for drug abuse and mental illness, over more than a decade.

The reason that real-world harms caused by AI can’t get a hearing in what is now the mainstream AI safety community is that these harms were never part of the core tenets of the cult.

Best of luck to anyone working on reality-based AI harm reduction, you have many hard battles in front of you.

xg15 50 days ago |

I find it somewhat telling that most (not all) of this thread doesn't even attempt to find an answer to the questions posed by the OP but flatly denies the problem of psychological harm exists at all.

I feel this is an example of the two larger narratives about AI that currently seem to be forming:

For one side, AI is basically every harmful technology ever invented rolled into one: It's harmful to the environment (via waste of energy and resources), it's harmful to the information space (through polluting everything with slop and devaluing human expression), it's harmful to society (by encouraging ever more badly done and unreliable products, by taking away jobs and by replacing human-to-human interaction, by normalizing a mode of development where not even the developers understand what is going on) and it's harmful to whoever uses it personally (by causing ever-growing dependence on AI, either only by skills or even emotionally or psychically, up to the point of AI psychosis and preferring AI agents to other humans).

For the other side, AI is the future, the next industrial revolution, the thing that you have to adapt to or be left behind, possibly even the next stage of evolution.

Right now, I feel every side is digging in and trying ever harder to ignore the other side.

(The AI labs acknowledge "AI risks" in theory - but, as the article pointed out, the risks they perceive and ostensibly work against are so abstract and removed from the everyday use of AI that they do more to make the AI proponents' point)

I feel the end result of this growing tension is the Molotov cocktail in Sam Altman's home.

I'd really like to know more what the tech community at large is trying to do about this rift.

avazhi 50 days ago |

If you are using LLMs for emotional support or social interactions, you’ve got personal problems and that isn’t on the LLM provider to babysit. Same with people who unironically pay for OnlyFans or whatever.

I don’t even work in tech and I detest the Facebook/Zuckerbergs of the world but it’s obnoxious and trite seeing tech companies get scapegoated for what are ultimately social and societal problems, not tech problems.

As a solution it’d prob make sense to start with how disconnected most modern families are in terms of support and accountability.

From ChatGPT to Instagram, tech companies follow the contours of how society already operates.

Yokohiii 50 days ago | |

I agree that society has to stand up for it. But big tech is doing well to mitigate it.

b65e8bee43c2ed0 50 days ago |

the big labs could crank up their (brand) safety dials to the point where their chatbots give GOODY-2 responses to everything beyond PG13, and guess what? there are a hundred other services available, built upon Chinese models 5-10% behind Western SOTA.

it is no longer 2023. let go of whatever delusions you might hold about unopening this Pandora's box.

adamnemecek 50 days ago |

Autodiff is preventing any meaningful discussion about safety, systems trained with autodiff cannot be made safe.

simonw 50 days ago |

"There is no independent audit, no time series, no disclosed methodology, so we have no idea whether the real figure is higher, whether it is growing, or how it compares across the other frontier models, none of which publish equivalent data."

Tip for writers: aggressively filter out the "no X, no Y, no Z" pattern from your writing. Whether or not you used AI to help you write, it's such a red flag now that you should be actively avoiding it in anything you publish.

falcor84 50 days ago | |

Why is it a red flag?

How is it different from any other purely stylistic rules such as Strunk and White's prohibitions against split infinitives and the passive voice, which we've left far behind us? Why shouldn't people just write however feels natural to them as long as the message is clear?

simonw 50 days ago | | |

Because LLMs use it constantly, to the point that it sets my teeth on edge and instantly makes me question if reading the piece is worth my time.

mitjam 50 days ago | |

… and “That’s not x. That’s y.” Certain LLMs wield powerful stylistic devices all the time to a point where they become irrelevant and cringe.

I see it as a good sign that we can learn to recognize the pattern and adapt but there are probably more subtle things we don’t see.

mitjam 50 days ago | | |

I have run the piece through an impromptu stylistic device detector. It found 15 different devices, each used multiple times, and likened the writing style to a mix of Ezra Klein, Hannah Arendt, Zeynep Tufekci, George Orwell (“especially in the contrastive clarity”).

A) I certainly don’t see enough of the tells.

B) what happens to our language if everything is written as if it’s competing for a Pulitzer Prize?