AI-Shunning robots.txt(github.com) |
Even if they are right now it would be a quick switch for them to just ignore it.
Maybe your question is "how do we know that whatever system GPTBot feeds downstream didn't just get your content via something else that crawled your site?" I am not sure we have anything to defend against that, other than signalling via robots.txt that our content is not intended for AI use.
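For what it's worth, here is a minimal sketch of what that signal looks like from a well-behaved crawler's side, using Python's standard `urllib.robotparser`. The robots.txt content and the `SomeSearchBot` name are made up for illustration; only GPTBot comes from this thread. The check is one the crawler has to choose to run, so the file is advisory rather than a defence.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of a single AI crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is asked to stay out; an unlisted one is not.
    print(may_fetch("GPTBot", "https://example.com/post"))
    print(may_fetch("SomeSearchBot", "https://example.com/post"))
```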
Imo, the best solution would be to license your content so crawlers pay a fee for crawling and using it.
Does IA themselves block crawlers? It doesn't look like it according to their robots.txt, even going so far as to say "Please crawl our files."
What would stop an actor from maliciously complying with a robots.txt file by just going to the Internet Archive instead?
Regardless, my original question is still valid. The companies have already shown a lack of care about the data they train on. So if ethics have already gone out the window, what is to stop them from ignoring this file, if they are not already?
Also - out of curiosity - do you use any AI yourself?
AI from any project will allow AI to be used commercially, and thus I oppose it. Moreover, I oppose AI on various other principles even independent of this: it further isolates people and can be used to develop other technologies that are too powerful for us to handle. In short, I believe human beings en masse are too stupid to use AI.
> Also - out of curiosity - do you use any AI yourself?
I do not, or at least I try my best not to. In fact, I hate AI with a passion. Obviously, there may be products here and there that have used AI that I in turn use. What can you do? But I attempt to minimize any contact I have with AI: I don't use Grammarly or any form of auto-suggest, I use an ancient phone (and I RARELY use it, I hate smartphones), I don't use AI features in software such as AI noise reduction, and I turn off all automatic features in software that may have some AI behind them.
If I find out a website uses AI for content generation, I ban it and never visit again.
The other day I downloaded a text editor that looked cool but I deleted it because I realized it has an AI-console (even though I never used it).
I also work for a business and I convinced them not to use AI. We're an online magazine and it turns out the vast majority of our readers supported that decision.
In short, I am against AI because I believe it provides virtually no benefits to humanity, only detriments.
I automated my site's robots.txt[0] by scraping your site. It would be extra nice if darkvisitors.com exposed a plain text version or JSON representation of the list.
[0] https://tbeseda.com/blog/automating-my-robots-txt-to-block-a...
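This is not the code from the linked post, just a rough sketch of the generation half of such automation, assuming you already have a list of user-agent names from somewhere. The function name `build_robots_txt` and the agent names in the usage are placeholders for illustration.

```python
def build_robots_txt(agents: list[str]) -> str:
    """Render one 'Disallow: /' group per user agent, de-duplicated and sorted."""
    unique = sorted({a.strip() for a in agents if a.strip()}, key=str.lower)
    groups = [f"User-agent: {name}\nDisallow: /\n" for name in unique]
    # A blank line between groups keeps the file readable.
    return "\n".join(groups)


if __name__ == "__main__":
    # Example input only; a real list would come from whatever source you scrape.
    print(build_robots_txt(["GPTBot", "CCBot", "GPTBot"]))
```

Sorting and de-duplicating keeps the output stable between runs, so a regenerated file only changes when the input list does.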
Also, why are "AI" crawlers worse than "normal" crawlers?
Either way, this is an exercise in futility.
Is it really? Every drop of opposition towards AI in my book is a good thing. This robots.txt thing is a small drop maybe, but over time public hatred for AI can build and it might in fact be taken down. Especially outside the tech bubble, many people are ambivalent towards AI.
Yes, in modern society we are taught to value innovation and ignore its downsides, but the more vocal opponents are against it, the more those downsides will become apparent. Hopefully, it will bring the ruin of all AI companies and research.
What is needed is indifference, not hate.
Crazy world.
A search engine will index your content to bring people to it through search. An AI crawler will take your content to recapitulate it and sell it to others. Obviously it's more complicated than this, but this is how someone who wishes to use this file might see it.
> Either way, this is an exercise in futility.
Not necessarily disqualifying. Laws against theft are also futile, in the sense that honest people don't need them and dishonest people don't follow them, and history since at least Hammurabi has been replete with examples of such laws not stopping theft. And yet. Seems worth the calories it costs to say "for the record, I do not give my consent for what you're doing".
Search engines and AI things are typically owned by the same company. AIs are fed with the data collected by a search engine. The only difference is whether AI gets the data in realtime or waits for the search engine to collect another data dump.
Fighting windmills as I see it.
I would back a general move to block crawlers from non-open models (whatever that means and if such a thing was practical) as it might be a strong lever to encourage good behaviour.
- Cloudflare
- Webserver-level user-agent blocking (Apache, nginx)
- Application-level user-agent blocking (`if request.user_agent == 'OpenAI'`)
None of them are ideal since you can simply change your user agent, but all of them seem like better options than robots.txt to me.
E.g. nginx $http_user_agent
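A rough sketch of the application-level option, framework-agnostic and under stated assumptions: the helper name `is_blocked` and the token list are illustrative (GPTBot is the only one named in this thread), and it uses a substring match because an exact comparison against the whole User-Agent header would miss a crawler token embedded in a longer string. As noted above, this only stops clients that identify themselves honestly.

```python
from typing import Optional

# Illustrative tokens only; adjust to whatever list you maintain.
BLOCKED_TOKENS = ("GPTBot", "CCBot")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match of the header against blocked tokens."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    # Made-up header values for demonstration.
    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/124.0"))
```

In a real app you would call this from request middleware and return a 403 when it is true.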
Or did you mean sites? Information wants to be free.
If AI is trained only on data provided by those with agendas, you won’t want to live in that world.
e: Okay, this is funny.
The other point was to make this community maintained rather than rely on one source to provide all the inputs.
I'm thinking it also helps to bring up a feature request on the source material so we can all limit the drift, i.e. if darkvisitors.com had a sort of plain text API, your repo could check for new entries via GH Actions and create issues or even PRs.
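The "check for new entries" step could be as small as the sketch below. It assumes both lists are already loaded as plain strings; fetching the upstream list is left out because no plain text API exists yet, and the function name `new_entries` is hypothetical.

```python
def new_entries(local: list[str], upstream: list[str]) -> list[str]:
    """Agents present upstream but missing locally, compared case-insensitively."""
    known = {name.strip().lower() for name in local}
    seen: set[str] = set()
    missing: list[str] = []
    for name in upstream:
        key = name.strip().lower()
        # Skip blanks, agents we already list, and upstream duplicates.
        if key and key not in known and key not in seen:
            seen.add(key)
            missing.append(name.strip())
    return missing


if __name__ == "__main__":
    # Example data only.
    print(new_entries(["GPTBot"], ["gptbot", "CCBot", "CCBot "]))
```

A scheduled job could open an issue or PR whenever the returned list is non-empty.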
At least, that's what they say[0].
>What would stop an actor from maliciously complying with a robots.txt file by just going to the internet archive instead.
Nothing; as far as I understand, scraping the public web is legal, or that's what courts are saying lately. Btw, it's mind-boggling to me that after 30 years of the commercial Internet and Web, we still don't have a definite answer on whether scraping public websites and public web content is legal or illegal.
[0] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
I was more thinking from a public perception side instead of legal, but legal would be a good question too.
Something like, "Yeah I totally respected your robots.txt file the only reason I have your data is because I crawled IA, see they are the ones you should be mad at not us"
We also have to look at this bubble differently. With Crypto there was a general technical level that was required to use it that many people just did not have. I would say the same is true for most bubbles we have seen.
AI is generally available to anyone who understands enough to open a browser. Too many non-technical people have just accepted it without understanding the risks.
Edit: You can really see just how bad this is going to get if you look at Apple. How they are deemed to be "behind" because they don't have generative AI right now and how they are so desperate that they are looking at Google.
I really don't think this is going to burst and really all that is going to happen is even more consolidation and we will be screwed.
> I really don't think this is going to burst and really all that is going to happen is even more consolidation and we will be screwed.
It's always FOMO until it's FUD.
Agree that hate is not the solution. But saying "no" decidedly and loudly is absolutely something that people need to relearn and I am happy to see that some do now.
Just being indifferent is not sufficient.
Finding and taking measures to protect yourself and others is a positive way to approach this IMO.
Indifference is what leads people to ignore climate change. Indifference is what leads people to allow corporations to destroy communities. Indifference is what allows global capitalism to keep going.
I am not concerned with an AI hype bubble bursting. AI already has enough power to strengthen the industrial society outside the hype. Perhaps hatred is not quite the right word. Perhaps love is a better one: a love for smaller communities and more sustainability. Wasn't it Che Guevara who said, "the true revolutionary is guided by strong feelings of love."?
Regardless, indifference is not the answer: it is passionate opposition with a zero tolerance policy towards AI, AI researchers, and AI companies.
That is a good thing.
Sometimes you have to work with the tide, because fighting it is futile and even self defeating.
Said the fish before approaching the waterfall.
While I agree that the incentive structure is set up in a strong way for AI to be further improved and rolled out, what's the endgame here? Who can build the most powerful centralized AI so that nearly everyone else is out of a job? And who is that going to benefit?
I just don't get it.
Have we all decided to "just play the game" and ignore how dumb it is?
Video game AI is obviously in a different league than ChatGPT but uses the same label.
Some AI is machine learning and some isn’t.
I agree with a lot of what you are saying, but I think there are valid use cases of AI (before ChatGPT) that are actually a benefit.
You don’t use a smart phone, but auto correct is a genuinely great addition. It doesn’t remove anything the human does and improves the usability.
On its own autocorrect isn’t going to write a story. Even the suggestions that have been added in recent years are more for human usability than anything else.
Handwriting reading models, fall detection models, etc.
I do think we need to separate generative AI that replaces humans from traditional AI that offers assistance.
I know someone is going to argue, well, ChatGPT is augmenting me by checking my code, emails, etc., and that may be true right now, but we are kidding ourselves if we think that will be the situation long term.
AI is just a branding buzzword used by Big Tech companies. Before AI, there were the widely used "machine learning" and "deep learning" buzzwords. "Blockchain" and "SaaS" are also some of the most infamous buzzwords.
For as long as I can remember AI has been the term used for video games.
But that is the problem when talking about "AI". Because Video Game AI isn't even Machine learning, but we also call ChatGPT AI.
So I just caution against saying "No AI" when that could get rid of things that are nowhere near what we are currently talking about for AI.
I don’t see any effective way to fight it other than joining it. Any well intentioned steps can be easily and even unintentionally subverted by other players who have different perspectives on the ethical landscape.
It’s not that we have decided. Others will decide for us regardless of our decision.
How do you fight it by joining it?
To me, being against AI is pretty much 'evil'. You support humans existing in a barbaric existence like animals, suffering. I support laws to criminalize people hampering it like they are doing in Japan.
I literally will fight against people like you.
Your mentality is irresponsible and exactly the problem I have with AI.
Because we don't want nearly every job to be automated by AI?
"Crazy world"
It is necessary, it just isn't available. We still live in a capitalist society in which anyone not a member of the capitalist class is required to labor in order to afford the necessities of survival. AI means fewer opportunities to do so, despite the requirement remaining constant. No one is choosing to labor under this system, any more than one chooses to eat, drink or sleep.
AI is not being implemented to free the labor class from this obligation, it is being implemented to free the capitalist class from the obligation to provide the means of survival to the labor class in exchange for their labor. The end result will not be the labor classes living lives of luxury in creative and intellectual pursuit, but as much unemployment and poverty as the market can bear.
Personally, I don't want my job to be automated. I write for a living and if AI takes my job, I won't get paid. I prefer to create value in the world that other people appreciate. I don't WANT to sit in a concrete cage (an apartment) and consume media, with no real purpose in society.
Believe it or not, the majority of people in the world need to feel like they are working for something. Yes, some people will be able to find other causes (mine will be the opposition of AI), but others won't. Of course, that will mean the necessity of drugging people with media (and physical substances...why do you think marijuana is becoming legal in more places?).
The end result is a mode of pure consumption for almost all except the elite who control all the production, and they will decide what happens with the world. Personally, I don't want that: I want land and autonomy to use it to grow food and preserve ecosystems. I want the world to be sustainable, and not just set up for the purpose of furthering technology.
You speak of societal changes on a year-scale. I'm talking about decades and the long-term. This level of automation is bad, and won't do any favours for humanity except the ultra-rich, who will eventually perish like everyone else.
What do you think will happen to us devs if AI gets good enough to do our jobs? Do you think our companies will keep us around because we're just so darn smart?
What do you think is _already happening_ to in-house artists, content/technical writers, marketing analysts, and other jobs that are directly impacted by LLMs in their current form?
I assume you like being paid, buying things, food, etc.
Would it be great if we lived in a utopian society where money no longer mattered? Sure! Even with AI I see basically zero chance of that happening in any reasonable amount of time before AI destroys our society.
But after seeing environmental damage, reading widely in philosophy and sociology, writing about it to clarify it, I came to a different conclusion: that technology is not all it's cracked up to be, especially when it is plugged into a system of global capitalism whose ultimate aim is consumerism.
Just think: one of the biggest companies in the world (Google/Alphabet) has as its primary goal to promote unsustainability. If that doesn't make you think, what will?
And let me ask you this: are you so sure about technology, given that you were raised in a world that praises it like a religion? I think the fanatical religious witch hunters also thought they were right, simply because they were raised in such an environment.
Loving technology is the default position of the rich. Is that just a coincidence?
I love technology because it's interesting; who cares what rich people think? I can't understand this oldhead defeatist mentality, similar to yours, that a lot of people here seem to have.
By the way, I am not defeatist because:
1. I think we can make great progress, only progress towards rewilding nature
2. I only consider technology a dead-end, not humanity! I believe we can move past arbitrary technological development and discard our consumerist ways.
Ignoring that your comment is phrased in a hurtful way towards the parent which got you a downvote from me, why do you think these two are connected in any way?
Humans are not separate from the nature we exist in.
You mentioned studying philosophy, you're confusing your ontological and epistemological positions.
...or, we could actually evaluate the transition of life from natural to technological. Yes, in a philosophical sense, you are right, what we do is "natural". But then if we just say everything we do is natural, then we might as well just do nothing. But there are still meaningful distinctions between the technological human organization (even if natural) and the rest of the world, and we would do well to examine if what we are doing is really harmonious with everything-but-us -- because if not, then I reject it outright even if it is natural in the sense that you describe.
Your wordplay is really not very impressive.