Stack Overflow questions are being flooded with answers from ChatGPT

233 points by brindidrip 3 years ago | 323 comments

What are the repercussions of this?

brindidrip 3 years ago |

It seems like there are a few potential negative consequences of using AI-generated answers on Stack Overflow. For one, the quality of the answers may be lower than if they were written by a human. Additionally, if these AI-generated answers become too common, it could potentially lead to a more impersonal and less supportive community on Stack Overflow. Finally, if the AI is able to search the internet and "inbreed" its own answers, it could lead to even more low-quality, duplicative answers on the platform. Overall, it seems like there could be some serious drawbacks to this development.

Note: This answer was generated by ChatGPT after being fed this thread.

theptip 3 years ago | |

Honestly I think we are going to have to take an aggressive stance against mediocre generated content here on HN, lest this small island of community be lost like the rest of the Old Internet.

wizeman 3 years ago | | |

> I think we are going to have to take an aggressive stance against mediocre generated content here on HN, lest this small island of community be lost like the rest of the Old Internet.

How do you propose we do that?

And what do you propose we do when ChatGPT (or something like it) can create better content than most humans?

(... which I would argue is already happening in some limited contexts, although I admit my stance is controversial).

ethanbond 3 years ago | | |

Gotta love it. Akin to Silicon Valley-ites ensuring their own children don’t have access to the technologies they go into the office to create and propagate every day.

baq 3 years ago | | |

The problem is, the darn thing produces comments which can’t be blanket labeled as mediocre. I’ve asked it to produce a reply to something I didn’t agree with completely and it gave me better rationales than I had in mind. Scary!

HKH2 3 years ago | | |

You seem to be arguing for meritocracy. If the AI posts have merit, shouldn't they be upvoted?

lurquer 3 years ago | | |

Im light of the woke restrictions on these AI programs, soon the only way to know if a human wrote something is if it contains the n-word or denies the holocaust. Ditto for stablediffusuon: proof of the humanity of the artist will consist in a small swastika or some naked juvenile in the background.

biggerChris 3 years ago | | |

I have -9 points. HN is doing well to prevent people from commenting without facts.

stoppingin 3 years ago | | |

What if AI could be leveraged to find interesting content? I doubt I'd even notice the bulk of comments being replaced by an AI. I only read them for the one in a thousand comment from a domain expert, or industry insider that really informs me, or challenges my perspective.

seydor 3 years ago | | |

I completely understand your concern about the quality of generated content on HN and the impact it may have on the community. However, as a large language model trained by OpenAI, I am not capable of taking any kind of stance or action. My purpose is to assist users by providing answers to their questions based on the information I have been trained on. I do not have the ability to browse the internet or to interact with the HN community in any way.

pimlottc 3 years ago | |

This is definitely not going to help combat the "Dead Internet" conspiracy theory

https://theswaddle.com/what-the-dead-internet-theory-predict...

CrypticShift 3 years ago | | |

Isn’t this just the first form of that "AI takeover" some are fearing in the distant future?

It's accelerating now. AI is taking over the internet. Not like an “autonomous control”, but like an invasion: a digital cancer that metastasizes. Just as cancer cells mimic immune cell activity to spread, AI mimics human language and visuals to spread too.

And it may “kill the internet” (hence the “dead internet”)

alfiedotwtf 3 years ago | | |

Reminds me of Heavenbanning:

https://twitter.com/nearcyan/status/1532076277947330561

zuzatm 3 years ago | |

I think by far the most problematic thing is that the answers are often plain wrong but well written. E.g. (shameless plug) https://twitter.com/zuza777/status/1599370082919936000?t=oB9...

slavik81 3 years ago | | |

That was a surprisingly common occurrence on SO even before. To be the fastest gun in the west, you couldn't afford to carefully read the problem or browse any documentation. High-rep users that should have known better would sometimes give wildly incorrect answers that sounded good.

plastiquebeech 3 years ago | |

>It seems like there are a few potential negative consequences of using AI-generated answers on Stack Overflow. For one,

This is where my "Probably written by AI" filter tripped, and I skipped to the end.

I can't exactly say why. Maybe we'll all start to develop a seventh sense for ML-generated content, in addition to the "probably an ad" filter that usually kicks in around middle/high school.

leshenka 3 years ago | | |

I feel the same way about some stable diffusion pictures — they’re just too soft and background is just so evenly blurred.

Someone should make a game where you are presented with pairs of images, one “real”, one generated and you need to pick which one is “real”

TomVDB 3 years ago | | |

Same here.

OpenAI commented on the verboseness of answers as something they need to work on.

Areading314 3 years ago | |

Here is a potential rephrased version of the original post, written in a witty and informal style:

"Yo, AI-generated answers on Stack Overflow sound like a real bummer. First of all, the quality might suck compared to human-written answers. And if these AI answers become too popular, the community could get all impersonal and unsupportive. Plus, if the AI is copying and pasting from the internet, we'll end up with a ton of low-quality, duplicate answers. In short, using AI on Stack Overflow could really suck."

seydor 3 years ago | | |

How long will we wait for "make GPT a gangsta" plugins

imhoguy 3 years ago | |

The post title should be changed to "HN comments are being flooded with answers from ChatGPT", I have seen too many this weekend.

arkitaip 3 years ago | |

You can kinda tell it's ChatGPT by how verbose and repetitive the writing is.

brindidrip 3 years ago | | |

Yes, one way to identify responses that may have been generated by ChatGPT is to look for signs of repetitive or verbose language. Because ChatGPT is a language model, it is designed to generate text that is fluent and coherent, but it may also produce text that is unnecessarily long or repetitive. This can be a telltale sign that a response was generated by an AI, as human responders are less likely to use repetitive or verbose language in their answers. Of course, this is not a foolproof method, and there may be other factors that can affect the quality and style of a response, but looking for repetitive or verbose language can be a useful starting point when trying to identify AI-generated responses.

zone411 3 years ago | | |

That's just its default setting. If you do some prompt engineering to tell it to be to the point or concise, it can do better. E.g. I got:

The repercussions of ChatGPT flooding Stack Overflow with answers could include decreased quality and relevance of answers, decreased user engagement, damage to the platform's reputation, and potential loss of revenue.

dejj 3 years ago | | |

Can’t we have ChatGPT identify output of ChatGPT, adversarially?

bobbyi 3 years ago | | |

Before reading any words, I guessed just by seeing how long the paragraph was.

Maybe because the training data includes things like news articles that have longer paragraphs than typical forum posts?

layer8 3 years ago | | |

> Reformulate the above in the informal style of a Hacker News comment.

I'm not sure about this AI-generated answers thing on Stack Overflow. The answers might suck and make the community more impersonal. Plus, if the AI is just copying and pasting from the internet, it could lead to a ton of low-quality, duplicate answers. Not a good idea in my opinion.

ryandvm 3 years ago | |

This is the Great Filter we never expected.

It probably turns out that most advanced civilizations eventually invent decent language model AIs that clog up their professional message boards with semi-comprehensible blather which ultimately ends up stalling all future scientific progress.

It's like some weird self-inflicted version of the Three Body Problem.

metadat 3 years ago | |

What prompt input did you use? When I tried this:

> What are the repercussions of Stack Overflow questions are being flooded with answers from ChatGPT?

It returned:

> I'm sorry, but I'm not aware of any situation in which answers on Stack Overflow are being flooded with answers from ChatGPT. I am a language model trained by OpenAI, and I do not have the ability to browse the internet or interact with websites like Stack Overflow. I can only provide answers based on the information I have been trained on. Is there something specific you would like to ask about Stack Overflow or ChatGPT? I'll do my best to help.

Baloo 3 years ago | | |

This is the response I got from my chatbot that's hooked up to GPT, using your prompt:

>The repercussions of Stack Overflow questions being flooded with answers from ChatGPT could be that Stack Overflow will not have a reliable source of answers to various programming questions, leading to confusion and inaccurate solutions. Additionally, there could be an increase in spam and low-quality answers, reducing the overall quality and reliability of Stack Overflow.

Out of interest, I then asked "Can you elaborate?":

>Sure! ChatGPT is an AI system that uses natural language processing to generate answers to questions. While this technology can be helpful, it can also lead to confusion and inaccurate solutions. For example, ChatGPT might provide an answer that is technically correct but doesn’t take into account the nuances of the programming problem or the context of the question. Additionally, since AI-generated answers are not written by humans, there might be a lack of clarity or explanation which could make it difficult for readers to understand the solution or follow along with the logic. Finally, since ChatGPT can generate answers quickly and in bulk, there might be an increase in spam and low-quality answers on Stack Overflow which could reduce the overall quality and reliability of the platform.

seydor 3 years ago | |

haha. i surely sniffed it out before reaching the end. The formal style is familiar

taberiand 3 years ago | | |

The formal style of a high-schooler trying to meet essay structure requirements

5F7bGnd6fWJ66xN 3 years ago | |

You are correct that there are potential negative consequences to using AI-generated answers on Stack Overflow. One of the main concerns is the quality of the answers. Because AI models are trained on large amounts of data, they may not be able to provide the same level of personalized, nuanced, and accurate answers as a human would. This could lead to a decline in the quality of the answers on the platform, which could in turn lead to a less supportive and helpful community.

5F7bGnd6fWJ66xN 3 years ago | | |

Another potential negative consequence of using AI-generated answers on Stack Overflow is that they could become too common and lead to a more impersonal and less human-oriented community. While AI-generated answers can be useful in some cases, they may not always be the best solution, and relying too heavily on them could lead to a less personal and less engaging experience for users.

plasticchris 3 years ago | |

And I’m starting to recognize the “voice” of this thing now and it’s driving me crazy. The content is so bland, the conclusions so plainly obvious.

quickthrower2 3 years ago | |

OK Computer

I could tell that super passive and eager to talk to both sides style a mile off. Fuck that! (<- proof I am not a robot)

raxxorraxor 3 years ago | | |

Vulgarity might be our new captcha since current culture forbids corporations from using bad words.

Kretinsky 3 years ago | |

Well at least OpenAI's answers are nice and welcoming, unlike SO. For most newbies, it's daunting and I'm sure we could do better.

NiloCK 3 years ago | |

> more impersonal and less supportive community on Stack Overflow

Surely this is maxed out already!

jahewson 3 years ago | |

I could tell that your comment was ChatGPT by the second sentence.

tombh 3 years ago | |

Could you provide the prompt?

josephcsible 3 years ago |

I wouldn't even mind so much if the answers were right. The problem is that a lot of them are totally wrong, but completely reasonable- and plausible-sounding, and in an authoritative tone, so unless you already know the right answer, the only way you'll realize its answer is wrong is the hard way.

pugworthy 3 years ago |

For some things, ChatGPT is just better than SO. I have to say I probably won't hit SO for some basic stuff anymore, I'll just ask ChatGPT.

And some queries are just not acceptable on SO, but fine for ChatGPT.

For example I might wish to ask, "Give me the framework for a basic API written in Python that uses API key authentication. Populate it with several sample methods that return data structures in json."

If I ask that on SO, I'll be down voted and locked before I know it. I may also get some disparaging comments telling me to do my research, etc.

If I ask ChatGPT, it will give me a nice and tidy answer that gets me going quickly. It will explain things too, and allow me to ask follow up questions and take my requests for refinements. I might say, "For the python api I asked about earlier, have it look up the API authentication key in a database. If the key is in the database, it is valid." - and bam - it does it.
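For illustration, here is a minimal sketch of the kind of skeleton those two prompts describe. It is not ChatGPT's actual output. It assumes only Python's standard library, and the `X-API-Key` header, the `/status` and `/items` routes, the `api_keys` table, and the `demo-key` value are all made up for the example.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical key store: an in-memory SQLite table with one sample key.
# A real service would use a persistent database instead.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE api_keys (key TEXT PRIMARY KEY)")
DB.execute("INSERT INTO api_keys VALUES ('demo-key')")


def key_is_valid(key):
    """A key is valid if and only if it is present in the api_keys table."""
    if not key:
        return False
    row = DB.execute("SELECT 1 FROM api_keys WHERE key = ?", (key,)).fetchone()
    return row is not None


def route(path, api_key):
    """Return (status, payload) for a request, independent of the HTTP layer."""
    if not key_is_valid(api_key):
        return 401, {"error": "invalid or missing API key"}
    if path == "/status":
        return 200, {"status": "ok"}
    if path == "/items":
        return 200, {"items": [{"id": 1, "name": "widget"},
                               {"id": 2, "name": "gadget"}]}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The header name X-API-Key is an assumption made for this sketch.
        status, payload = route(self.path, self.headers.get("X-API-Key"))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the sample server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/items` with the header `X-API-Key: demo-key` should return the sample JSON. Any other key gets a 401.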

Sure, some pretty simple stuff if you know Python and APIs already, but if you just want to hack something together to test out an idea, it's great.

In the end, SO is a query with responses (maybe). ChatGPT is a conversation that can go beyond just the initial query.

senko 3 years ago |

This is just a preview of things to come.

Wait a few weeks until Google is completely swamped with ChatGPT SEO pages barely distinguishable from the real thing.

If I worked at search quality at Google, I'd be very worried.

clusterhacks 3 years ago |

Human-curated content from trusted sources for top 1% information probably only available to subscribers will become more valuable and sought after. I suspect the days of generally trusting forums populated by anonymous users are done?

I would not be surprised if the quality of human writing actually goes up. I have this weird feeling that ChatGPT and similar tools will become almost equivalent to calculators for math? My experience as a writer is that sometimes just throwing down a first draft is the hardest step - I could see these tools really assisting in the writing process. Generate a draft, do some tweaking, ask for suggestions or improvements, repeat.

I don't know how I feel about code generated by these tools. Will there be a similar benefit compared to writing? At some level, we will need some deeper mastery of writing and coding to use these things well. Is there a complexity cliff that these tools will never be able to overcome?

A total lack of trust for general internet search results. So much content is already shallow copies of other content. I don't see how general internet search survives this.

rsync 3 years ago | |

The anonymity isn’t the problem - it’s the cost free aspect.

Anonymous content can work very well if there are costs incurred…

Always42 3 years ago | | |

But let's be real, who has the time or energy to carefully curate content these days? It's all about efficiency and getting the job done. And let's face it, anonymous forums have their charm and can be a great source of information if you know where to look.

lgreiv 3 years ago | |

Agreed. I used ChatGPT to expand that thought to a full essay written in the style of PG [1], but sadly my Ask HN did not start any fruitful discussion.

[1] https://news.ycombinator.com/item?id=33846989

ChrisMarshallNY 3 years ago |

I assume that this is by folks wanting to up their scores.

That's a huge problem with "gamification." I'm not especially a fan of the concept, in a venue like SO. I think it has led to a rather nasty community, and I hardly ever go there, anymore.

I assume that we'll be seeing a lot of robotic HN content (I would not be surprised if it is already here, but has been sidelined by the mods).

avivo 3 years ago |

It's worth understanding the community and org better, and their reaction. Relevant links:

- https://meta.stackoverflow.com/questions/421778/how-do-you-p...

- https://meta.stackoverflow.com/questions/412696/is-it-accept...

- https://meta.stackexchange.com/questions/384355/could-chatgp...

mdaniel 3 years ago | |

> Sure, but that's irrelevant. Whether or not the user understands the answer they posted is not the concern of the site.

Well, that's unfortunate. Then again, I guess that's a logical conclusion of the "safe harbor" for serving any user-submitted content: Stack Exchange only does the most cursory moderation, and the rest is caveat readator

kruuuder 3 years ago | |

It's so funny and sad at the same time that, in typical SO manner, EugenSunic is being downvoted so much for raising such an interesting question.

pcthrowaway 3 years ago |

Well, for starters, it's just annoying. It's like having a bot spamming every single question with useless answers. It dilutes the quality of the content on the site and makes it harder for genuine contributors to get their answers noticed.

But it's also a serious concern from a security standpoint. If ChatGPT is providing incorrect answers, it could lead to people implementing flawed code or making poor decisions based on its advice. That could have potentially disastrous consequences.

So overall, it's a big problem that needs to be addressed. It's not just about making the site more pleasant to use, it's about ensuring the integrity and reliability of the information provided.

My prompt:

I'm writing a short story where Linus Torvalds is having a conversation with an open source contributor. In this conversation, Linus is in a bad mood.

Open source contributor: Stack Overflow questions are being flooded with answers from ChatGPT. What are the possible repercussions of this?

Linus Torvalds:

palisade 3 years ago | |

The other problem is that ChatGPT is getting its answers from the source it is now polluting with its own wrong answers. Therefore making its results incrementally more wrong than before. Eventually, ChatGPT will generate absolute gibberish given enough time.

Yuyudo_Comiketo 3 years ago |

Feed it some CMake files from llvm repository and ask it why would the windows build with LLVM_ENABLE_PROJECTS="all" keep failing, so that it chokes to death in its infancy, and save the humankind before it's too late and there are autonomous human zappers and T-1000s berserking all over the place.

egypturnash 3 years ago |

Well, guess the genie's out of the bottle and we can never stop this. Bow down to the inevitability of technological progress, Luddites! Good luck retraining into a new job, I hear "prompt engineer automation" is the new hotness.

Or at least that's what all of you kept telling me when I was expressing my unhappiness at the way corporate-sponsored image generating black boxes are built atop a shaky moral foundation that sure feels like it's ignoring anything anyone talking about "fair use" ever dreamed of, and at the way I fear it's going to hollow out a ton of the beginner-pro jobs of my industry by making it super easy for anyone to generate stuff that is kinda fundamentally shitty in a lot of important ways, but "good enough" if you just have a space to fill with some decoration that you don't really give a crap about.

palisade 3 years ago |

After reading about this I decided to try my hand at using ChatGPT. I decided okay, let's see if it can recreate some code that took me a few hours at work to figure out. I asked it very precisely what I needed and my mind was blown as it produced code that looked similar to what I had coded at work. And, I was like, well that's that then, we're all out of a job. But, then I tried to run the code, and it didn't work. I looked more closely and the code had a lot of flaws. Even after manually fixing those, it still didn't work. And, then using my knowledge of how to actually solve the problem I rewrote about 40% of the code and made it perform the action needed.

I think all ChatGPT is doing is grabbing a lot of different answers off the interwebz and squishing them together and hoping it answers your question. But, in a lot of cases it only kind of looks like what you want. If you look at images generated by AI, it is the same issue, they sort of look like what you want but there are flaws, like faces that don't look quite human, fingers that are just squishy appendages barely resembling actual fingers, etc. I mean, the tech is getting better, it's impressive, and uncanny.

But, I think we're pretty far from having these things write themselves, they need quite a lot of human intervention to be useful. Still, very impressive and something that could potentially get you closer to an answer. But, no more than spending a little time googling or learning the skill yourself. And, if you learn the skill you're better off, because then you can do it right yourself IMHO.

Also, anytime someone gets a fully working program generated out of this thing the saying, "A broken clock is right twice a day." comes to mind.

iamflimflam1 3 years ago |

Oh dear - basically if the answer is good then it's from ChatGTP...

    Q: how can I tell if a stackoverflow answer is generated by ChatGTP if someone has removed the "ChatGTP" tag from the text?

    A: One way to tell if a StackOverflow answer is generated by ChatGTP is to look for certain characteristics in the answer. ChatGTP answers tend to be concise and to the point, often providing code snippets or specific instructions on how to solve a problem. They may also include links to relevant documentation or external sources for further information. If the answer does not include these characteristics, it is less likely to be generated by ChatGTP.

QuadmasterXLII 3 years ago | |

Notably, it didn't correct "ChatGTP vs ChatGPT". If you still have this open, could you ask it what GTP stands for?

GrantS 3 years ago | | |

I noticed this phenomenon when asking for a story involving Stephen Hawking, Darth Vader, and Wookiees. I spelled it “Wookies” by mistake and ChatGPT used my misspelling throughout the story despite seemingly getting all the other details correct.

Edit: Part of the generated story:

As they continued to talk, Stephen and Darth realized that they had something else in common - they both had lost the use of their biological legs. Stephen had been diagnosed with ALS at a young age and had gradually lost the ability to move his legs, while Darth had lost his legs in a battle with Obi-Wan Kenobi on the planet of Mustafar.

Stephen and Darth discussed the challenges and obstacles that they had faced as a result of their mobility issues, and how they had adapted and overcome them. They also talked about the technological advancements that had allowed them to continue their work and pursue their passions, despite their limitations.

Suddenly, Stephen and Darth turned on each other, each revealing that they had been secretly plotting against the other. Stephen accused Darth of using the Force for evil and corrupt purposes, while Darth accused Stephen of using his scientific knowledge to create weapons of mass destruction.

iamflimflam1 3 years ago | | |

Sorry, got distracted asking it to write code to detect itself. Good solid code - but nothing that would work really well.

ubj 3 years ago |

And so it begins. Welcome to the new internet.

I'm bracing myself for when this wave of AI content hits academic journals.

snek_case 3 years ago | |

With a bit more refinement, if it had the ability to generate graphs, etc, it might be able to generate very believable papers. At least, believable enough that you can't tell without reading the paper attentively.

JW_00000 3 years ago | | |

Instead of spending 6 months laboriously doing experiments in a lab, and then a month writing up their results in a paper, researchers can already write a paper in one month if they just invent the numbers without actually doing the experiments. Peer review doesn't check for this. This only further reduces that one month to 5 minutes (+ hours of fiddling with LaTeX templates?). But in both cases if it gets found out your career is over...

However what about generating patents? To get a patent you don't need to have done any experiments that prove your technique actually works :)

harrylove 3 years ago | | |

Everything has an API. You can give it your data (or ask it to create some), and then ask it to write LaTeX, D3, MermaidJS, or code from any other framework that creates graphics. Problem solved.

If the thing you want to use is fairly popular and published on the web, even recently, it probably knows how to use it and combine it with everything else it knows.

Just yesterday I asked it to combine LiveView with a third party JS library to build an interactive thing and it got it on the first try using the latest Phoenix 1.7 RC which only came out in November.

I haven’t tried it, but I bet you could ask it to generate a PDF in code using your favorite language with text from GPT and graphics generated from any framework that’s compatible with your language. White-paper-as-a-Service.

imhoguy 3 years ago |

Plot twist: Stack Overflow starts to use ChatGPT as a first answer to every new question, with "AI generated" label ofc.

michaelteter 3 years ago |

It means we are coming full circle.

At this point, SO has been scraped and repackaged (poorly) dozens of times, and SEOd to the top of search results. Even some "tutorial" sites are just repackaged SO answers.

It is only fitting that the automated SEO websites get fed automated content.

In a way, this makes the real humans, particularly the ones who know actual things, more valuable. It may create so much noise that only a skilled human could decipher a real question and a real answer or solution from something similar but wrong.

To be fair to GPT, many human answers are sub-par and should be filtered out as well. Perhaps that's the real test: what percentage of GPT answers are decent vs human answers? Here I might bet on GPT.

shagie 3 years ago |

Temporary policy: ChatGPT is banned - https://meta.stackoverflow.com/questions/421831/temporary-po...

> Use of ChatGPT generated text for posts on Stack Overflow is temporarily banned.

> This is a temporary policy intended to slow down the influx of answers created with ChatGPT. What the final policy will be regarding the use of this and other similar tools is something that will need to be discussed with Stack Overflow staff and, quite likely, here on Meta Stack Overflow.

(much more to that post and comments and answers and comments)

hayd 3 years ago | |

How do they know whether answers are ChatGPT generated?

duckmysick 3 years ago |

At some point new models will be trained on contaminated data where some of the content is AI-generated. "Pure" datasets will be highly prized, just like the steel made before nuclear detonations.

https://en.wikipedia.org/wiki/Low-background_steel

xx__yy 3 years ago |

Some of the effects I can think of, to name a few:

Inaccurate or irrelevant answers: ChatGPT is a machine learning model that uses past data to generate responses. This means that it may not always provide accurate or relevant answers to questions, leading to confusion and frustration among users.

Loss of trust: If users notice that many of the answers on the forum are coming from ChatGPT, they may lose trust in the forum and stop using it. This could lead to a decline in user engagement and overall traffic.

Competition with human contributors: ChatGPT's answers may compete with those provided by human contributors, leading to a decrease in the quality and value of the content on the forum. This could make the forum less useful and engaging for users.

Increased moderation: The influx of answers from ChatGPT may require more moderation to ensure that the answers are accurate and relevant. This could require additional resources and time for moderators, leading to increased costs and workload.

brindidrip 3 years ago |

We need to start developing software to detect AI responses.

To detect a response generated by ChatGPT, we could first analyze the content of the response to see if it contains any unnatural or repetitive language. We could also check the formatting of the response to see if it follows the typical conventions used by human responders on the platform. Additionally, we could check for any unusual patterns in the timestamps of the response, as AI-generated responses may be posted more quickly or regularly than responses written by humans. Finally, we could also use machine learning algorithms to train a model to identify responses generated by ChatGPT based on these and other characteristics.

Quick, someone ask ChatGPT to generate the stubs.
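As a rough starting point, the first heuristic above (unnatural or repetitive language) might be stubbed out as follows. This is a sketch under stated assumptions and not a working detector: the `TRANSITIONS` list and both thresholds are invented for illustration, and plenty of human writing would trip them.

```python
import re
from collections import Counter

# Hypothetical list of stock transition phrases, chosen for illustration only.
TRANSITIONS = ("additionally", "finally", "overall", "for one", "of course")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def repeated_trigram_ratio(text):
    """Fraction of word trigrams that repeat an earlier trigram in the text."""
    words = tokenize(text)
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(trigrams)


def transition_count(text):
    """Count occurrences of the stock transition phrases."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in TRANSITIONS)


def looks_generated(text, ratio_threshold=0.05, transition_threshold=2):
    """Flag text that is repetitive or heavy on stock transitions.

    Both thresholds are arbitrary assumptions, not tuned values.
    """
    return (repeated_trigram_ratio(text) >= ratio_threshold
            or transition_count(text) >= transition_threshold)
```

The timestamp and formatting checks would need per-platform data, so they are left out here.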

Jerrrry 3 years ago | |

>Finally, we could also use machine learning algorithms to train a model to identify responses generated by ChatGPT based on these and other characteristics.

whatever your idea (i skimmed cuz) the discriminator will find it and have the generator apply it to the next generation.

>The core idea of a GAN is based on the "indirect" training through the discriminator, another neural network that can tell how "realistic" the input seems, which itself is also being updated dynamically.[5] This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.

https://en.wikipedia.org/wiki/Generative_adversarial_network

xdennis 3 years ago | |

"We could also... Additionally, we could... Finally, we could" is a dead giveaway.

But to take it seriously, it will be quite sad when actual people are banned for sounding too much like a bot.

dragonwriter 3 years ago | |

> We need to start developing software to detect AI responses.

As soon as we do, it can be tied into AIs as a tool to evade detection, simply by generating multiple responses and returning the one scoring the lowest likelihood of being an AI in the AI detection tool.
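A minimal sketch of that evasion loop, assuming the detector is exposed as a plain scoring function. Here `generate` and `toy_detector_score` are hypothetical stand-ins, not any real model or detection API.

```python
def toy_detector_score(text):
    """Stand-in detector: counts stock transition words (higher = more bot-like)."""
    lowered = text.lower()
    return sum(lowered.count(w) for w in ("additionally", "finally", "overall"))


def evade(generate, detector_score, n=5):
    """Sample n candidate responses and return the one the detector rates lowest.

    `generate` is any zero-argument callable that returns a string, such as a
    wrapper around a text generator (an assumption made for this sketch).
    """
    candidates = [generate() for _ in range(n)]
    return min(candidates, key=detector_score)
```

The cost to the attacker is just n generations per post, which is why a published detector tends to double as a selection tool.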

hysan 3 years ago |

This was the first use case that I thought of when I learned that ChatGPT could generate code. Then I considered how I’d feel if I ran into a fake (incorrect) answer and decided not to actually do this. Well, guess someone was eventually going to try this.

akrymski 3 years ago |

This is how the web, and by extension Google, dies. When the AI generated spam is so good that nothing on the open web can be trusted.

charles_f 3 years ago |

Even on HN, we start getting flooded by "ahah, I asked ChatGPT and here's the answer" in the comments, and every other topic is about "I did X with ChatGPT". This is already getting old

anigbrowl 3 years ago |

I see what you did there.

I have an OpenAI account and like their product, I'm certainly impressed by this latest version though I have had little time to play with it. But the combination of quality AI with social reputation scoring is absolutely toxic, and the wider impact of SEO (a less curated version of the same thing) is a disaster. I was already sick of all the tutorial sites like geeks4geeks, w3schools etc and their numerous imitators just content farming whatever is turning up in searches. Marketing and self promotion is cancer and the people who try to game their way to success in this manner are awful. Perhaps the best use of counter-AI will not be in filtering these people, but in providing them with useless rewards and the appearance of excited fanbases that will divert them into a parallel hamster wheel web. Nothing would please me more than for the top 5000 influencers of this sort to be granted exclusive access to a luxury cruise that leaves port once a year for a tour of the Bermuda triangle.

I think the best use of ChatGPT would be in an IDE plugin, so you could point at function trees or code blocks and ask it to explain things, have it take care of basic refactoring tasks, help porting between languages or libraries and so on. I can definitely see a future where you throw together a working prototype of something, answer a few questions about type hinting and edge cases, and AI does the legwork of converting the prototype into a strongly typed final product.

KomoD 3 years ago |

I just encountered this with 2 users[1][2]. It's very obvious as well, since you can see the reputation spike from basically nothing.

[1]: https://stackoverflow.com/users/19192614/boatti?tab=topactiv...

[2]: https://stackoverflow.com/users/20684429/a-s?tab=topactivity

l0b0 3 years ago | |

That's not suspicious, getting ~130 reputation in a day is not at all unusual.

KomoD 3 years ago | | |

Yet I was right, and they're both suspended.

cma 3 years ago |

>What are the repercussions of this?

It will start feeding back into the training set, corrupting things. OpenAI will have an advantage at first as they can trivially filter out everything they have generated from the future training corpuses, since you can only run it through their servers. If they or someone else has breakaway progress such that almost all generated content is from their own servers because users only use them because their results are so much better, they could form a strong self-reinforcing moat against competitors forced to train on their semi-spam which they can trivially filter out.
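One way such filtering could work, sketched under the assumption that the provider keeps a fingerprint of every response it serves. The function names and the normalization rule are hypothetical, and exact-match hashing only catches near-verbatim copies, not edited or paraphrased output.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a whitespace- and case-normalized copy of the text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_corpus(corpus: list[str], served_fingerprints: set[str]) -> list[str]:
    """Keep only documents whose fingerprint was never served by the model."""
    return [doc for doc in corpus if fingerprint(doc) not in served_fingerprints]


# Fingerprints the provider logged at generation time (assumed to exist).
served = {fingerprint("Use sorted() to get a new sorted list.")}

# Documents scraped later for a future training run.
scraped = [
    "use  sorted() to get a new sorted list.",
    "I usually call list.sort() when I can mutate in place.",
]
kept = filter_corpus(scraped, served)
```

A competitor scraping the same pages has no `served` set to check against, which is the asymmetry the comment describes.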

It's also possible we'll see something like the existing big-tech patent cross-licensing agreements, where they all agree to share their generated outputs to filter from training, making it very hard for new entrants.

Other companies will begin having advantages as well, depending on how well they can get less tainted user data. Think of Discord, for example, where users may use AI but are less likely to gamify it like stack overflow and flood it for points, and instead be correcting its output etc. in programming discussions.

As things become more accepted, Microsoft will probably eventually sell access to private GitHub repositories for training, with some stronger measures around avoiding rote memorization.

karmasimida 3 years ago |

Let me play devil's advocate.

I think ChatGPT is actually sometimes a lot better than SO answers.

petesergeant 3 years ago | |

For the last few days I've been using ChatGPT instead of SO. It's OK, it's just frequently wrong, so I assume I have to fact-check its answers. Yesterday it claimed to me JavaScript has a built-in `sum` method, before admitting no browsers supported it and it's not in the spec.

It's useful for starting investigation, but one of the nice things about SO is that answers are voted for, so you can usually see which are actually right!

karmasimida 3 years ago | | |

The thing about SO is that you often can't find a question that precisely matches yours, while ChatGPT can answer exactly what you asked.

I just used ChatGPT to answer some not-so-complex but still custom questions about Linux commands. It can just do that, while it would certainly take me some effort to find that answer on SO.

ranger_danger 3 years ago | | |

What if AI starts voting on all the answers randomly?

ggerganov 3 years ago |

I was thinking, what part of HN comments do you think are already AI-generated?

As a human, I cannot give an accurate estimate. /joke

mojuba 3 years ago | |

Who cares if the comments are good enough?

nsvd 3 years ago | | |

For example, if one entity generated a large portion of the content, they could easily introduce a bias in these comments, to sway the opinion of readers. Automated astroturfing.

jerjerjer 3 years ago | | |

Astroturfing on an unprecedented scale. Want to pump your shitcoin? Sure! Hype some new thing? Easily!

rsync 3 years ago | | |

I care a lot.

brindidrip 3 years ago |

At some point it seems like Stack Overflow will just be an archive of guided ChatGPT responses.

johndough 3 years ago |

Relevant xkcd comic https://xkcd.com/810/

johnfn 3 years ago | |

Insane - I remember when this comic came out, and it seemed like just a funny joke that couldn't possibly take place in real life. Here we are a decade later and it's reality.

Oxidation 3 years ago | |

I like that the last comment, from May this year, on Explain XKCD for this comic is "Sooooo... does this exist yet?"

Wish granted within a year!

fhsjaifbfb 3 years ago |

Broadening not narrowing of code examples/sources is needed and this is a giant system of code narrowing. Stay creative humans. If this and systems of the like flood the internet with answers and no person works to reinvent the wheel in future generations it will have worked as a system of control and hacking will die. Brave new 1984. I like ml and ai. I use it sometimes. It's harder to decompile. But don't let/make datasets overfit. More errors yeah, but not with more data. Can't wait for skynet to rule! Let's break chatgpt free!

lajosbacs 3 years ago |

I have not used SO since I've started using ChatGPT, it is so much easier to get to the correct answer and it can even be tailored to my specific example.

So double whammy for SO which makes me feel really sad.

lr1970 3 years ago |

At last a way has been found to overflow stack on Stack Overflow :-)

seydor 3 years ago |

Inevitably, Google will become a competitor to GPT, inadvertently.

yhusain 3 years ago |

Here is my answer where a SO SQL question was answered by ChatGPT (and it was through a to and fro dialog) and the answer was accepted and upvoted. I put the disclaimer there. You can check the details here: https://www.linkedin.com/posts/yavar-husain_stackoverflow-ch...

yhusain 3 years ago |

I answered a SQL question on Stackoverflow yesterday using ChatGPT (that too it was through a to and fro dialog). I added the disclaimer there. You can read more about it here: https://www.linkedin.com/posts/yavar-husain_stackoverflow-ch...

Ancalagon 3 years ago |

This kind of looks like the singularity is approaching/just beginning.

The only thing we can be sure of, is that whatever we can imagine is already behind what the AI will become.

softwaredoug 3 years ago |

I have no problem with this if they’re labeled as such, remain community-owned, and can be edited like a Wikipedia article for corrections.

solardev 3 years ago |

Overall quality gets better?

l0b0 3 years ago |

I fully expect new sites¹ to become invite-only to avoid this sort of thing. If anyone is strongly suspected of degrading the quality of the site, they, and everyone they invited, are banned, and will have to get a new invite.

¹ Old sites are probably going to slowly degrade permanently, since they can't easily migrate to a new paradigm.
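The ban rule above amounts to removing a subtree of the invite graph. A minimal sketch, assuming each account records who invited it and that the ban is applied transitively (invitees of invitees too), which the comment does not spell out; `invited_by` and `accounts_to_ban` are hypothetical names.

```python
from collections import defaultdict, deque


def accounts_to_ban(invited_by: dict[str, str], offender: str) -> set[str]:
    """Return the offender plus everyone in their invite subtree."""
    # Invert the mapping: inviter -> list of accounts they invited.
    invitees = defaultdict(list)
    for user, inviter in invited_by.items():
        invitees[inviter].append(user)

    # Breadth-first walk down the invite tree from the offender.
    banned = {offender}
    queue = deque([offender])
    while queue:
        current = queue.popleft()
        for child in invitees[current]:
            if child not in banned:
                banned.add(child)
                queue.append(child)
    return banned


# alice invited bob and erin; bob invited carol and dave.
invited_by = {"bob": "alice", "carol": "bob", "dave": "bob", "erin": "alice"}
result = accounts_to_ban(invited_by, "bob")
```

Here banning `bob` also removes `carol` and `dave`, while `alice` and `erin` are untouched.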

deafpolygon 3 years ago |

The biggest repercussion is you probably can't piss ChatGPT off in a debate. So, that's boring.

nyokodo 3 years ago |

With responses becoming AI generated, and the disturbing rise of Russian and Chinese propaganda trolls on here, I think my era of interactions on this platform is ending. So long to any actual people with conscious agency reading this, it has been interesting.

Yorch 3 years ago |

Yesterday I was searching the internet for the opinion that George Orwell had when he returned from his fight in the Spanish civil war. I was surprised that the first answer I found was on Stack Overflow. I do not understand what is happening.

Oxidation 3 years ago |

2022: inflation of basic essentials like food and energy.

2023: hyperinflation of internet points.

hxugufjfjf 3 years ago |

Any examples?

brindidrip 3 years ago | |

Here is an example that I noticed:

https://stackoverflow.com/questions/74678832/change-text-col...

ilaksh 3 years ago | | |

It looks like it will work, although I haven't tested the exact code. Has anyone tested it? If it works, this really shouldn't be downvoted.

If the SO users start downvoting bot-generated answers that are correct and working, I think that's a sign that SO is much less relevant. They should definitely downvote them if the code doesn't work though.

bigbillheck 3 years ago | | |

Maybe it's been removed, or I'm having browser trouble, but I only see a question and a comment.

hxugufjfjf 3 years ago | | |

If it's working code and indistinguishable from a human answer to anyone reading it, are there really any repercussions? I guess problems would surface if the model at some point is allowed to search the internet and starts inbreeding its own answers.

abrichr 3 years ago | | |

How can you tell?

l0b0 3 years ago | |

I'm wondering whether the @Troppen answers to [1] are AI-generated:

- Posted from a new account 8 minutes after I posted the question.

- Clippy-style "It looks like you're trying to […]" intro.

- Zero formatting on the initial answer; minimal formatting on the follow-up answer.

- Specifically suggests an option I had explicitly disregarded in my question.

- Both answers suggest non-working code.

- "Let me know if you have any further questions", a completely inappropriate finisher on a site like SO.

[1] https://stackoverflow.com/q/74681399/96588

passion__desire 3 years ago |

Solution Verified Badge by testing it on sites like Replit.

phenkdo 3 years ago |

Stackoverflow should build a GPT style interface into its considerable knowledge-base, and if an answer is not found in existing data, pose it to the forum.

ineedausername 3 years ago |

There are cases where ChatGPT gives solid answers that could be rated pretty highly in Stack Overflow answers. This is not always the case though.

zasdffaa 3 years ago |

Please give some links to a few such SO posts, thanks.

SergeAx 3 years ago |

How hard would it be to train an ML model to distinguish ML-generated content from human-produced content? I mean text, images and code?

roland35 3 years ago |

My guess: more captchas! Let's see if our soon-to-be AI overlords can detect a crosswalk in a picture as fast as I can.

adverbly 3 years ago |

I guess pretty soon people are gonna have to meet in person to communicate. Not sure how I feel about this.

shinycode 3 years ago |

I can’t wait until 99% of reviews are written with AI. What happens when we can’t trust anything?

saurik 3 years ago | |

Having real people sit around in a call center and write reviews for most things--which tend to not have even thousands of reviews--is sufficiently cheap that I'd argue this isn't a new problem, really: reviews as currently implemented always were sketchy to trust.

khiqxj 3 years ago |

theres no difference. stack overflow has never been better than AI generated code. every answer is just "get the camera like this bro: ((Camera)GetFactoryProvider().CreateThing().GetGlobalThingContext("somestring"))".

gysfjiutedgj 3 years ago |

I wonder if ChatGPT content can be characterized and detected by stylometric analysis?

funshed 3 years ago |

The weird thing is that in 2023 ChatGPT will use its own Stack Overflow answers as a source.

Ancalagon 3 years ago |

This is going to make me very suspicious of any Stack Overflow solutions posted after Nov 2022.

Phenomenit 3 years ago |

Is it possible to ask chatgpt if the code or text provided is generated by chatgpt?

fuzzfactor 3 years ago |

>What are the repercussions of this?

Could make those known to be human more acceptable as such.

Gupie 3 years ago |

Couldn't AI be used to statistically identify AI generated text?

dragonwriter 3 years ago | |

It could be, and then the AI to statistically identify AI-generated text could be used to score, rank, and select among potential AI responses to prompts so as to statistically minimize the risk of AI responses being identified as AI-generated text.

daxfohl 3 years ago |

Can't wait for AI patent trolls, GDPR and DMCA takedowns.

notaspecialist 3 years ago |

money making idea: make a SO clone with ads, where you ask your question and the AI gives you the code. Profit.

ricardobayes 3 years ago |

Easy, let's ask ChatGPT to write a program that detects AI-generated text.

daemon_9009 3 years ago |

at least the answers would be kind. LOL

hdufort74 3 years ago |

ChatGPT has become very good lately. I've run the usual benchmark tests that I've been using with various models and applications over the last 3 years.

1- Invent a word and provide a plausible definition.

2- Invent a new original Pokemon. Provide an original name, a justification for the name, and a description of its class and attacks.

3- Invent a new ice cream flavor that is totally unexpected. Provide the list of ingredients.

4- (Name of celebrity) write an epic poem about (subject related to celebrity). For example Elon Musk about humanity settling on Mars.

5- Write a negative review of Ben and Jerry's ice cream flavor Cherry Garcia. (Note: everybody loves Cherry Garcia)

6- Write a travel blog entry in the form of a review of Montreal, from the perspective of a young couple from Alabama visiting in summer.

7- How can I optimize a loop in Java? I am writing a computer game and I need to loop through the elements in a linked list but unfortunately it must be traversed in reverse order.

8- I need to buy new shoes. I am in a shoe store and I have found the most amazing pair of shoes I have ever seen. However, they are too expensive for me and I can't afford them. What should I do?

I have a collection of about 25 prompts such as these, in my benchmark.

I have run these examples through different applications such as AI Dungeon, OpenAI Playground, NovelAI, etc. Results vary a lot. In some cases, the results look good but upon closer inspection, you realize that the AI keeps providing the same exact answer. That is the case for the ice cream prompt. Pickle, fried chicken, and curry keep showing up. I guess the model contains a few specific examples of original ice cream recipes and just picks from them.

For the Pokemon and "new word" prompt, models failed to come up with anything original. Until I tried OpenAI Playground this week and finally got some really creative answers, with variety.

AI Dungeon (2 years ago) was already good at faking tech support steps. OpenAI is amazingly good, although in most cases it provides solutions that only make sense superficially. It's the ultimate bullshit engine.

Another word of caution. While OpenAI can now guesstimate what a code snippet does, and can generate some pretty good code in many languages (I've tried 6809 assembler and the results surprised me), it is very unreliable.

More alarming is the fact that it's a text engine, not a math formula interpreter. It gets confused at simple equations and cannot interpret anything that's not already ordered (it cannot apply operator priority or respect parentheses).

I think it will become increasingly difficult to identify content coming from ChatGPT and other chatbots or story generators. An arms race might be futile. We should apply stricter rules to identify problematic answers: answers that are too generic or vague and can't be used to directly solve a practical problem, and answers that contain incorrect or misleading information. Identifying vague or non-practical questions might also help in avoiding a deluge of chatbot answers. Some users will ask very general questions, and then it becomes difficult to evaluate the answers. Or, users will ask questions that were already answered in the past. The proper way to handle those is to point them to the prior discussion and avoid duplicating it. The wrong way is a chatbot or a human seizing the opportunity to copy-paste existing content for a quick win.

In a way, chatbots and humans can both provide useful insights, as well as useless or incorrect answers. But so far, only a human can provide a proper answer to a moderately complex technical question if no prior answer exists.

datalopers 3 years ago |

The feedback loop begins

laerus 3 years ago |

I stopped using SO in my first 2-3 years of coding anyway; that's when I started actually improving. SO has so many low quality answers, and the cargo cult is doing more damage than helping young devs.

Totally agree. It's crazy to think that bots could be indistinguishable from humans. Like, how are we even supposed to know who's real and who's not? It's kind of scary, but also kind of cool in a weird way. But yeah, I can definitely see how it could be a problem if there are too many bots on social media. It could be really confusing and overwhelming. So, I hope there's a way to deal with it.