Why we chose not to release Stable Diffusion 1.5 as quickly (danieljeffries.substack.com)
Yeah, you can stop pretending that the neutering is the right thing to do. Clearly it's something you are somehow forced to do, due to some serious threat you received.
I don't see how NSFW photos can easily be stopped from being generated when the model is open source. Maybe the training data could be heavily pre-filtered to remove any photos that could possibly be used for NSFW images.
Which has a LOT of NSFW images in it. I suspect that removing them from the training set would go a long way to curb NSFW output, but as you say, people could easily train their own NSFW latent diffusion model.
I tend to think they are conjoined, but clarity helps.
On the social harms side, I think they need to be careful to under-promise and over-deliver. The likelihood of preventing social harms is frankly close to zero; what they can do is make it more complicated.
Think of it like this: use Stable Diffusion to make one "actor" dance a lambada in the left field and save it. In a new state, make a different "actor" dance a lambada in the "right" field. Now combine the two actors using alpha masks. Can this represent sexy dancing? You bet your sweet bippy.
Promising not to release "two person sexy dancing" in this situation would be over-promising. Sure, it was done outside of the AI with masks. Will the lawmakers care?
(for actor and lambada and sexy dance, substitute whatever contextually means "harm" in a two-actor situation, semantically)
Who are these regulators?
With hardware prices going down and new GPUs and better algorithms coming to light, it’s only a matter of a few years before anybody will be able to train custom versions as powerful as today’s AI, without protections, probably biased, etc.
Sure, they will be 5-10 years behind big corps, but that won’t matter once poor man's AI is good enough.
Their new handlers can do anything to the contrary and are incentivized to curb release as well. The market is saying their new handlers are going to do that.
So we enjoy you proving us wrong!
That has to be a dig at OpenAI
Open source or not they are funded. And that funding needs to generate profit one way or the other.
This first release gave them the popular attention which they needed. It was successful.
Somewhat similar gut feeling to when popcorn time was released, although it might not be exactly the same.
While I really hope I'm wrong, my gut tells me that broadly trained machine learning models available to the general public won't last and that intellectual property hawks are going to one day cancel and remove these models and code from all convenient access channels.
That somehow international legislation will converge on the strictest possible interpretation of intellectual property, and those models will become illegal by the mere fact they were trained on copyrighted material.
So reminder to everyone: Download! Get it and use it before they try to close the Stable doors after the horses Diffused. Do not be fooled by the illusion that just because it's open source it will be there forever! Popcorn time lost a similar battle.
Get it now when there are trustworthy sources. Once these kinds of things go underground, it gets much harder to get a trustworthy version.
https://texaslawreview.org/fair-learning/
Here is a short quote from an IP lawyer:
“In terms of the ingestion of publicly accessible code, Ochoa said, there may be software license violations but that's probably protected by fair use. While there hasn't been a lot of litigation about that, a number of scholars have taken that position and he said he's inclined to agree.”
https://www.theregister.com/2022/10/19/github_copilot_copyri...
IANAL but I would take any opinions on this right now with a huge grain of salt and treat them more as advocacy than actual predictions of any legal outcomes.
Whether there is a good case for it being considered fair use doesn't matter at all until it's actually litigated, and historically the outcome of fair use cases involving new technologies has always been a crapshoot.
The result could easily be affected by the actual cases that get litigated, and one well chosen lawsuit where machine learning software is shown to produce output that's too close to the material it was trained on could result in a completely different outcome.
LLMs have the unfortunate limitation of being both powerful and lending themselves to centralized control choke-points due to how resource intensive they are to train. Under this paradigm, I fear commercial entities will be able to easily navigate the legal landmines and continually improve while open efforts perpetually lag far behind.
There are many vested interests who want this control for various reasons they justify as: protection from x-risk, keeping it out of the hands of abusers and bullies, economic advantage. Their reasons for wanting control are either well-intended but wrong-headed, or profit-motivated and disingenuous.
Rather than open models challenging the likes of GPT-3 and Copilot and enabling freedom, I fear folks will be forced to send all their videos, pictures, text and code to the servers of Microsoft, Amazon and Google, or lose access to advantages as LLMs continue to improve at a rapid clip.
Because it's already good enough to have made its way into many of my workflows.
I do feel that many companies will, ironically, use "ethical" as a pretext to not be open.
It was hard to accomplish, but you can finetune SD on your computer. They are working on instruction-tuning LLMs as well. In general ML models are not closed boxes inaccessible to us - they can be finetuned, reprompted, you can even average two versions to get a mix of two models. In the last 2 years lots of papers were written on finetuning and prompting, all of them geared towards low resource AI adaptation to new tasks.
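The "average two versions" trick mentioned above is usually called checkpoint merging or weight interpolation. Below is a minimal sketch, assuming both checkpoints come from the same architecture (typically two fine-tunes of the same base model) and share parameter names and shapes. To stay self-contained, weights are represented as plain Python lists; `merge_checkpoints` is a hypothetical helper, and a real merge would operate on framework tensors (e.g. PyTorch state dicts) instead.

```python
def merge_checkpoints(a, b, alpha=0.5):
    """Linearly interpolate two checkpoints: (1 - alpha) * a + alpha * b.

    `a` and `b` are hypothetical stand-ins for model state dicts: each maps a
    parameter name to a flat list of floats. Both must have the same keys and
    the same parameter lengths, i.e. come from the same architecture.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged = {}
    for name in a:
        wa, wb = a[name], b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
    return merged


# Toy example: two "fine-tuned versions" of a tiny model.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(merge_checkpoints(model_a, model_b))        # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
print(merge_checkpoints(model_a, model_b, 0.25))  # {'layer.weight': [1.5, 2.5], 'layer.bias': [0.25]}
```

`alpha=0` returns the first model and `alpha=1` the second; values in between give the "mix of two models". Averaging tends to work only when both checkpoints share a common ancestor, since independently trained networks generally do not line up parameter by parameter.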
Is it possible to crowdsource AI training with something that looks similar to folding@home?
The quality of the output you can get with the models right now has perpetual utility IMO. If you use it to create patterns, backgrounds, or even just for inspiration right now, it might be a shame if it didn't progress (depending on your position), but it's fine as-is if you put in the work to compose and refine the raw output.
While it'd be difficult to improve upon the model, it might be easy enough to finetune it if needed, and it's certainly worth it to USE it as is.
There are only a limited number of models that cost six figures in dollars to train and are freely available. There is certainly value in preserving them in a world of artificial scarcity.
I wonder if that will continue.
My understanding is that's partially because it currently relies on GPUs, which until relatively recently there was a limited demand for, and the market is basically controlled by a single company.
Will we see cheaper special-purpose AI accelerators, as happened with crypto-mining ASICs?
We'll go through the same with AI, but ultimately it won't be stopped. As long as there's no worldwide coordination limiting its impact, AI will continue on its course.
Countries that don't do that will be just as successful in the world marketplace as are countries that don't respect copyright.
You mention Popcorn time. I wonder if torrents in general could be a great example of how something like this plays out? Torrenting took the world by storm and had an amazing "product-market fit" for the early internet days. Of course, downloading copyrighted material was always illegal but that didn't stop many.
Over time, legal but paid alternatives rose up: Spotify, iTunes, Netflix. These players found their place in the market by balancing the interest of copyright holders and the needs of users looking for cheap and easy access to entertainment.
The same could happen here: just as Netflix acquired large content libraries, with enough money large training datasets could be acquired in a legally solid manner.
It's interesting to think where this analogy might fail as well, and how the paths of these technologies could differ. For one, torrenting was mostly for entertainment, and thus impacted B2C first. On the other hand, language models are more so for media _creation_ and the B2B sphere.
Like with torrents, you first have to resort to random websites that get taken down as they acquire a reputation. If a person puts their face on something and takes responsibility for it, they get litigated into oblivion.
So you get to the point where trustworthy and untrustworthy sources are indistinguishable.
Now what they do is create untrustworthy sources, like with Popcorn Time. Sow discord.
Fork several times, and create intentionally malware-laden versions of both the program and the website. Keep kicking the trustworthy sources off search engines, while magically skipping takedown requests for the less trustworthy websites.
Find ways to break old versions if possible, just to force people to keep moving (they can make Gradio randomly change APIs just to break the old trustworthy versions).
All of this can happen.
You didn't mention one of the largest (perhaps even the largest) distributor of copyright content (which happens to also be free, for now): YouTube.
You can watch/listen to endless amounts of copyrighted content (and other types of content) on there completely for free, and to say it's tremendously popular would be an understatement.
Google has made it work through ads. Perhaps something like that will happen with image-generating AI.
It reminds me to follow the datahoarder maxim that if you don’t admin the servers, you don’t have the data. So now I clone stuff to a local drive.
There is a legal machinery that works behind the scenes which we aren't always aware of.
The irony is the "NSFW" moral concern, when media companies already put out such negative and filthy content.
I was actively following TorrentFreak at the time and there was genuine excitement about something incredible, but that only lasted a week :-(
Why do you say they lost the battle? The original team threw in the towel within the week, but there are people who have taken up the fight.
https://github.com/popcorn-official/popcorn-desktop/releases... Here, the latest release was on 04 Sep 2022, so it is very much in active development, with a lot of people contributing: https://github.com/popcorn-official/popcorn-desktop/graphs/c...
So while the original team might not be working on it, like true free software, the code lives on.
Get 200 interested people backing up 1 TB each and you have your 200 TB backup.
With redundancy and error correction data added to the mix, you should be able to lose a certain percentage of participants and still have access to the full, error-free backup.
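The redundancy idea above can be illustrated with the simplest possible erasure code: a single XOR parity shard, the scheme RAID 5 uses. This is a sketch under stated assumptions, not a production design. The helper names (`make_shards`, `recover`) are hypothetical, and single parity survives the loss of only one participant's shard; tolerating the loss of "a certain percentage" of 200 participants would need a stronger code such as Reed-Solomon.

```python
from functools import reduce


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def make_shards(data: bytes, k: int) -> list:
    """Split data into k equal data shards (zero-padded) plus one parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, shards)  # XOR of all data shards
    return shards + [parity]


def recover(shards: list, length: int) -> bytes:
    """Rebuild the original data when at most one shard is None (lost)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    shards = list(shards)
    if missing:
        # The XOR of all surviving shards equals the missing one.
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    data_shards = shards[:-1]  # drop the parity shard
    return b"".join(data_shards)[:length]  # strip the zero padding


# Usage: 4 participants hold data shards, a 5th holds parity.
original = b"Stable Diffusion checkpoint bytes"
pieces = make_shards(original, k=4)
pieces[2] = None  # one participant disappears
assert recover(pieces, len(original)) == original
```

The storage overhead is one extra shard per group, so in practice the 200 participants would be split into many smaller groups, or a Reed-Solomon code with several parity shards would be used so more than one dropout per group can be absorbed.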
That's the only possible interpretation, really. AI models algorithmically remix input intellectual property en masse, without any significant amount of human creativity, the only thing copyright law protects. As such, the models themselves are wholly derived works, essentially a compressed and compact representation of the artistic features of the original works.
Legally, an AI model is equivalent to a huge tar.gz of copyrighted thumbnails: very limited fair use applies, only in some countries, and only in certain use contexts that generally don't harm the original author or out-compete them in the market place - the polar opposite of what AI models are.
It just feels absurd to me, because how is this different from any human artist who you could equally say was "trained" on copyrighted material?
>Get it now when there are trustworthy sources. Once these kinds of things go underground, it gets much harder to get a trustworthy version.
People have already reverse engineered most text2image models and given enough hardware can train their own. There is no need for this hysterical take. As long as the internet exists you will be able to train these models.
Tit-for-tat. Regulators and artists don't want this? Okay, include in all open source software licenses that regulators and artists are now barred from using them without payment.
https://www.gnu.org/licenses/license-list.html
> [...] is a nonfree license because it extends the four freedoms only to some kinds of organizations, not to all. Such a restriction in a software license, in the name of any cause whatsoever, imposes too much power over users. Please don't use this license, and we urge you to avoid any software that has been released under it.
https://opensource.org/osd-annotated
> 1. Free Redistribution
> The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
> Rationale: By constraining the license to require free redistribution, we eliminate the temptation for licensors to throw away many long-term gains to make short-term gains. If we didn't do this, there would be lots of pressure for cooperators to defect.
Also, discriminating like you suggest would make those licenses closed source by definition.
We can barely get governments to use open source even today, without restrictions. Hell, we can barely make them manage source code for commercial products they commission and pay for. I've walked into govt shops that were 100% binary dependent on the original software author, which never delivered source code and charged them through the nose for basic servicing.
Like it or not, the government and regulators represent us. We need individual accountability, but harming the government directly harms us first. The bureaucrats and the corrupt hardly care.
Personally, I think commercial use cases should get license agreements from the authors for their training data, but I think non-commercial exemptions to advance the field of AI makes sense.
Regardless of what I think, though, the UK has set an international precedent, and the EU is apparently discussing possibly extending it to commercial use cases as well. So there's that.
Where copyright may be applicable is when the models reproduce original art without modification, so closely that a reasonable person wouldn’t know the difference.
The blog post says they are worried about the ability to use the model to "use it for illegal purposes and hurting people". I think that they are referring to the ability to create all kinds of compromising pictures (porn) with celebrities, kids, etc. Am I misreading that? They don't mention copyright anywhere.
The conspiracy theorist would say that if you were doing something you shouldn’t, you wouldn’t mention it. Instead, you’d give a more palatable excuse to buy yourself some time while you figure out how to get away (legally) with the thing you shouldn’t be doing.
Something that would make any non-trivial model a legal nightmare.
Encapsulates it all well. I like this statement, total pottery.
My very rough take on the situation: the company gained their notoriety by building on OpenAI's pioneering research but with an important twist of releasing their models as unneutered open source. Now, their openness is starting to falter due to strong pressure from outside forces.
If they're unable to continue playing the hardball game they themselves invented, I think their glory days will end as fast as they started. The competitive advantage was always their boldness. If they lose that, quickly others will take their place.
In general, I don't think tech that's as open, powerful and easily reproducible as these language models can be stopped. Sure, maybe regulations will delay it a bit, but give it a few years and any decent hacker or tinkerer will be dabbling with 5x better tech with 5x less effort.
"What we do need to do is listen to society as a whole, listen to regulators, listen to the community."
"So when Stability AI says we have to slow down just a little it's because if we don't deal with very reasonable feedback from society and our own communities then there is a chance open source AI simply won't exist and nobody will be able to release powerful models."
Looks like someone is leaning on them :(
https://reddit.com/r/StableDiffusion/comments/y9ga5s/stabili...
His comments regarding RunwayML’s release of 1.5 were especially interesting:
> “No they did not. They supplied a single researcher, no data, no compute and none of the other researchers. So it’s a nice thing to claim now but it’s basically BS. They also spoke to me on the phone, said they agreed about the bigger picture and then cut off communications and turned around and did the exact opposite which is negotiating in bad faith.”
> “I’m saying they are bad faith actors who agreed to one thing, didn’t get the consent of other researchers who worked hard on the project and then turned around and did something else.”
After taking $100M in venture capital and two distinct drama events due to disorganization, this is unlikely to last.
I read that when some guys from 4chan started running the leaked NovelAI model, they generated porn non-stop for 20 hours or more: no sleep, no eating.
Even without conspiracy theories, these models cost up to tens of millions to create, so it's no surprise investors wouldn't like it if you gave it all away for free; there should be some revenue model.
IMO it is like finding a computer in a world without them. It is mind-blowing and it will take over your mind if you let it. For some folks that results in lots of porn, for others it'll be fear. My guess is that it'll wear off eventually.
They mentioned regulators here, and I would be curious to hear the story behind that.
Don’t want to go too tin foil hat, but it makes you wonder if a certain other AI company that claims to be “open” may be afraid of a company that actually is open and is applying political pressure.
Extremely likely that the FAANG lobbyists went into overdrive. The big guys know this will be an extremely important industry for the coming decades and don't want a new competitor swooping in with nothing to lose when established companies are forced to be cautious.
The reality is that bad actors have the resources to train their own Stable Diffusion on a dataset of whatever they want to deepfake, and such delays do not slow them down one bit.
What it does slow down is normal people using those models.
From the smallest things like MobileNetV3 through Whisper, Stable Diffusion, CodeGen, and BLOOM, these are huge productivity equalisers between the huge corpos and the little guy.
Also, the same thing can be said about frameworks like Hugging Face's. Just recently I was looking for a way to classify image type (photo or not photo [clip art, cartoon, drawing]) in an Android app. Of course, the first hits on Google steer towards Microsoft Azure's paid API service. I was unhappy with having to use an over-the-Internet API (with potentially sensitive end users' private pictures), so in one day of work I managed to download a pretrained MobileNetV3, grab a couple of 10k+ image datasets, and write <50 lines of Python to tweak the last layer and fine-tune the network. On an RTX 2070, training took 10 minutes. Resulting accuracy on real data? 90%+. The model loads and infers in a few hundred ms on modern phones (instantiating and loading takes longer than the inference, BTW). This is priceless and 100% secure for end users. For those interested in the details, I use ncnn and Vulkan for GPU (mobile!) inference.
Every commercial model maker's wet dream is to expose the model through an API, lock it behind a firewall and have people pay for access. This is not just hugely inefficient. It is insecure by design.
Take Copilot, for example. I'm perfectly happy for all my hobby-grade code to be streamed to Microsoft, but there's no chance in hell I'll use it on any of my commercial projects. However, running FauxPilot locally is on my list of things to try.
The first AI revolution was creation of those super powerful models, the second is the ability to run them on the edge devices.
https://danieljeffries.substack.com/p/why-the-future-of-open...
The people that he discredits as "leak the model in order to draw some quick press to themselves" are the researchers that are named in the Stable Diffusion paper. Yes, Stability.AI gave them lots of money. But no, they are not leaking the model, they are publishing their own work. It's university researchers, after all. And Stability.AI does NOT own the model.
1. The web UIs I have used are taking advantage of the same mental pathways as an electronic slot machine. Just like you can max out your bet on a slot machine and mash a button until you run out of credits, you can do the same on the hosted stable diffusion apps until you get a shareable hit.
2. Just like the dream you had last night, nobody wants to hear about it at breakfast, no matter how epic it was, because it's not backed by any meaning.
That said, I love Stable Diffusion and I'm an addict, using it almost every day.
2) StabilityAI gave RunwayML compute time for them to train Stable Diffusion (they're also the creators of the original model). It's weird to categorize them as "other groups leak the model". They're the ones that created the model! (Source: https://huggingface.co/runwayml/stable-diffusion-v1-5/discus...)
However, that will inevitably spread to include more and more companies and will also start happening outside the US. All the research around this is being published and there's a lot of open source code that facilitates this. So, it's just a matter of people optimizing and improving that and hardware getting cheaper.
I expect that once that market is big enough, you'll see cloud providers step up with provisioning infrastructure for this stuff. It will still be expensive to use but it won't have a lot of limitations.
AI driven porn is basically the obvious use-case where there are some big companies with lots of money operating in that space and plenty of incentive to make this happen. Morally that might actually be preferable to exploiting people as is their current way of operating. The likes of OpenAI won't be able to do much to stop that.
Anyway... this shit grinds me. Yet another "open source" AI project pretending to be for the people; they finally get a massive valuation and now it's all "we must be security conscious".
Hypocrites. And here is an interview with the founder of Stable Diffusion stating the exact opposite approach of "having faith in people"!
They got 100 mil USD in funding, and I feel like the pressure is squeezing them hard as they try to monetise the models. But how do you monetise open source models when someone can just fine-tune your weights and make a better/faster/cleaner model and software without losing 10 mil+ on training the original?
You are always a few mil behind rivals, and after the past few weeks, which were a PR nightmare, they have lost most of the "community driven" advantage.
I feel like they are either extremely desperate for attention (the drama was artificially created because conspiracy gets clicks) or just so chaotic and lacking proper leadership that everything is burning.
Stability.ai took over the /r/stablediffusion subreddit. Community inflamed. They then turned the subreddit back over to the community.
Stability.ai delayed the 1.5 model, and now sent this justification. Community inflamed.
https://www.federalregister.gov/documents/2022/10/13/2022-21... (AI mentioned 4 times)
https://eshoo.house.gov/sites/eshoo.house.gov/files/9.20.22L... (at the very end "export controls" are mentioned multiple times)
What people need to understand is that the bar for worryingness shouldn't be "government looking into it".
Governments look into things all the time, and in such a diverse environment as the U.S. legislative branch, we cannot just pack every opinion of every member into a single "government" monolith. That is why, in fact, we even have legislative systems with different representatives from different parties at all. This isn't an undesirable effect; this is how it's supposed to work, and in a good way.
Ugh. It feels like so many of these models are trying to censor NSFW material.
I think OpenAI changing their revenue model and corporate structure to better reflect how much money they were about to make really left a mark on the internet around trust in the AI space.
The default is going to be to assume that AI companies like stability have sold out, to that end it would not surprise me if even this minor incident leads to a splitting and a new open model that becomes popular.
I understand the point the author is trying to make. I understand what OpenAI is getting at with safety. I understand what the regulators are getting at.
But it is too late. The genie is already out of the bottle and granting wishes. What are you going to ban at this point? Math education?
It's time to accept that it's not that hard to come up with a few A100s and train models for harm if that's your goal. You can write code that harms people too. The answer is not to ban code. The answer is not to heavily regulate AI (not all countries will regulate it; it would be like banning gunpowder or electricity).
As for this particular release - what is being implied they were going to wait for? Figuring out the model? Regulation? The internet to start acting calm and reasonably? We don't even know what these models fully do yet. It's hard to imagine what you could know in 6 months vs now that would allow you to release with a big thumbs up.
More and more I'm realizing how politically controversial AI will become. We're already starting to see that on various axes. Weirdly, I think in a few years it may be a top issue.
IMO the AGPL goes a long way to solving that problem. But if the AGPL is not for you, I suppose you could use some non-commercial license terms. It seems like "closed source" is a much better fit for folks who want a great deal of control over licensees. In practice, "closed source" code can be published for licensees to see but instead of granting terms to all comers you could force people to ask you for a license, review their use case and only then decide to grant a copyright license -- with or without source.
Seriously, fire any cowardly lawyers erring on the side of caution and get some who are versed in the NRA playbook.
So, they're going to delay their release so that if you type a naughty word it won't make a naughty image. You know what happens within hours? Someone releases a modified version of the weights that over corrects it back and makes it even more naughty.
0: magnet:?xt=urn:btih:3a4a612d75ed088ea542acac52f9f45987488d1c&dn=sd-v1-4.ckpt&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337
Doesn't this ultimately result in local maxima? All the biases get reinforced and all the novelty (things the system hasn't seen/produced yet) goes away.
A tiny example: DALL-E and SD both struggled with eye positioning. Wouldn't training a model on their output then reinforce that particular bias of poorly positioned eyes? Now multiply this by every existing quirk in the models.
You can't predict future legislation. Intellectual property legislation (which is an absolute cancer IMO) can outlaw models and their results. It can outlaw distribution of the data sets, the training, the models. Tech companies already acted way beyond the requirements of the law and effectively censored open source projects like Popcorn Time.
Can they stop a determined hacker who already has everything? No.
Could this be the last model to be trained on a wide dataset, available to the public? Yes.
Could they make it a living hell where getting these tools in the future will only be from untrustworthy websites where half the download buttons give you an exe, and all your less tech savvy friends won't bother? Easily.
Could this tool become impossible for companies to use without risking litigation? Very easily.
People tend to forget those making the rules do not have their interests at heart, and every single intellectual property law is designed to leave companies and not people holding all the rights. And those laws can absolutely do damage. Do not underestimate the power of legislation.
At least with copyright law, there's an argument for training being fair use. If generative art becomes a notorious market for CSAM, everyone in the field goes to jail.
[0] Also, I'd like to know what your opinion is on GitHub Copilot. A lot of people decry Copilot for stealing code but love Stable Diffusion for being public, even though they're the same concept and trained in the same quasi-ethical way.
[1] https://www.reddit.com/r/StableDiffusion/comments/y9ga5s/com...
> getting these tools in the future will only be from untrustworthy websites where half the download buttons give you an exe
These models can already be downloaded via well known (ie community reviewed) torrents. So can many terabytes of labeled training data. This particular horse is well out of the barn.
>We want to crush any chance of CP. If folks use it for that entire generative AI space will go radioactive and yes there are some things that can be done to make it much much harder for folks to abuse and we are working with THORN and others right now to make it a reality.
I'm absolutely certain Linux has been used to kill children. Detergent, pesticide, and even pillows too!
Tools shouldn't be limited based on what the worst way they can be used. Stable diffusion is an absolute positive for society. Even when it's used to generate CP, every image that model creates is not one that involves a real kid.
The cat was out of the bag when OpenAI announced DALL-E. Stable Diffusion only accelerated it a bit. Even if Stability or lawmakers manage to prevent or outlaw open models, criminals will continue to build their own and ignore the laws.
The only thing their reluctance does is harm Stability and give their competition a chance to catch up. Perhaps that's a good thing. Maybe it's time for another organization to take the lead.
I get the dilemma as a creator, though. I wouldn't want my products to be used that way either.
I would add, though: if Moore's law continues, this will be almost unstoppable in a decade or two.
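To put rough numbers on the Moore's law argument: assuming the classic formulation of compute per dollar doubling about every two years (an assumption; the actual doubling period is debated and has arguably slowed), the improvement compounds as follows.

```latex
\text{gain}(t) = 2^{t / T_d}, \qquad T_d \approx 2\ \text{years}
\quad\Rightarrow\quad
\text{gain}(10\ \text{yr}) = 2^{5} = 32, \qquad
\text{gain}(20\ \text{yr}) = 2^{10} = 1024
```

Under that assumption, a training run that needs a cluster today would need roughly 1/32 of the hardware budget in a decade.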
At that point, I assume RunwayML too talked about releasing it.
Then, a few months later they released it.
And suddenly the response is "How dare they"?
=============================
RunwayML, who co-authored the StableDiffusion paper and funded CompVis together with StabilityAI, have unilaterally released the newest model of StableDiffusion, version 1.5. It seems that this was done without StabilityAI's consent, who so far have held the finished model back to supposedly prune it of NSFW stuff. This decision has been criticized by many, and there are accusations that they are only doing this to make more money, as the 1.5 model has been available for quite some time on their own website against a usage fee. Do note, however, that the 1.5 model has only very minor improvements over the 1.4 model.
The link to the model can be found here: https://huggingface.co/runwayml/stable-diffusion-v1-5
The release was accompanied by the following tweet from RunwayML:
https://twitter.com/runwayml/status/1583109275643105280
This was followed by an accusing statement by a - now confirmed to be fake - account claiming to be Patrick Esser:
https://media.discordapp.net/attachments/1023643945319792731...
The model was released under the following license which indicates that RunwayML were legally allowed to release the model:
Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to, that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply to the use of Complementary Material. You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this License; You must cause any modified files to carry prominent notices stating that You changed the files; You must retain all copyright, patent, trademark, and attribution notices excluding those notices that do not pertain to any part of the Model, Derivatives of the Model. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions - respecting paragraph 4.a. - for use, reproduction, or Distribution of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use, reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks, trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the parties; and any rights not expressly granted herein are reserved by the Licensors.
The license can be found here: https://huggingface.co/spaces/CompVis/stable-diffusion-licen...
This was followed by a takedown notice from StabilityAI:
Company StabilityAI has requested a takedown of this published model characterizing it as a leak of their IP
While we are awaiting for a formal legal request, and even though Hugging Face is not knowledgeable of the IP agreements (if any) between this repo owner (RunwayML) and StabilityAI, we are flagging this repository as having potential/disputed IP rights.
The takedown notice can be found here: https://huggingface.co/runwayml/stable-diffusion-v1-5/discus...
This was followed by a statement from RunwayML in that same thread:
Hi all,
Cris here - the CEO and Co-founder of Runway. Since our founding in 2018, we’ve been on a mission to empower anyone to create the impossible. So, we’re excited to share this newest version of Stable Diffusion so that we can continue delivering on our mission.
This version of Stable Diffusion is a continuation of the original High-Resolution Image Synthesis with Latent Diffusion Models work that we created and published (now more commonly referred to as Stable Diffusion). Stable Diffusion is an AI model developed by Patrick Esser from Runway and Robin Rombach from LMU Munich. The research and code behind Stable Diffusion was open-sourced last year. The model was released under the CreativeML Open RAIL M License.
We confirm there has been no breach of IP as flagged and we thank Stability AI for the compute donation to retrain the original model.
Emad, CEO of StabilityAI, has stated on the official StableDiffusion Discord that they are okay with the release and have withdrawn the takedown notice: https://media.discordapp.net/attachments/1015751613840883735...
https://media.discordapp.net/attachments/1015751613840883735...
Emad also says they didn't send the takedown request: https://cdn.discordapp.com/attachments/1032745835781423234/1...
https://github.com/AUTOMATIC1111/stable-diffusion-webui/
It is extremely active: the author updates it 10 to 20 times per day.
I do agree that most of the dross I create is only of interest to me. OTOH I got some laughs with my "Liz Truss holding up signs with rude words on them" series from British friends yesterday.
Who stays awake at night wondering what meaning that cool picture of a dragon had?
They just enjoy it and move on with their day.
Same with AI-generated images: they just look cool, amazing, or hot, whatever. Most people just enjoy them without wondering what deep meaning they might have.
It's mostly art critics and the like who wonder what deep meaning Van Gogh's Starry Night or the Mona Lisa have (never mind abstract work like Jackson Pollock's). Everyone else just likes it or they don't.
Before this gets flagged to oblivion, this is obvious. You just have to recognise that the "regulators" and industry insiders Emad is trying to "shield" you from are enemies and ask yourself, how do I hurt them?
And reasonably technical people have zero issues, as it should be.
Media piracy has been abundant my entire life. It's never slowed down or become inaccessible.
What I would value much more is the writing, directing, editing, and acting... and you can't get very good quality in any of that from AI yet.
Maybe someday, but not today.
Sure, it's not on the most visited homepages in the world, but it hardly went away. Even on the most visited homepage it's just a few clicks away.
True, but that might not matter to the general public, legislators, or judges.
I just read a new article on the BBC related to this: "Deepfaked: 'They put my face on a porn video'"[1]
The person in question was not physically abused. Only a fake porn video with her face on it was released. But she was still emotionally distraught over it, and said: "You start thinking about your family," she says, holding back tears. "How would they feel if they saw this content?"
This is a real concern to a lot of people, and I doubt they'll be swayed by people pointing out that they weren't physically abused.
So, yeah, there'll likely be a massive backlash against some AI-generated content and the software that generates it, and laws will likely be made against it in some countries.
But in other countries and on the dark web it'll still exist. There's ultimately no way of stopping it, and eventually people will come to terms with its existence and widespread availability, no matter what the law of some countries says.
https://youtu.be/0_BBRNYInx8?t=85 This video (released yesterday) explains how SD takes an image and converts it to latents.
You'd need to decode those latents back to an image representation and scan that. There may be other ways, but that's the most straightforward one I can come up with, although it's time-intensive.
Many products benefit pedophiles in one way or another. Mobile phones, computers, video editing software, cars (vans?).
"But that's ridiculous, of course we can't prohibit cars just because they can be used by criminals."
Exactly.
More specifically, researchers and hobbyists would be made criminals in this hypothetical. Why would I stop playing with ML just because someone passed a useless and poorly thought out law?
Have you monitored its temperature while using it? Did your warranty cover it?
But sure, let's talk about outputs as well. From the second source we can see this from Tyler Ochoa:
"If there's only one good way to do it, OK, then that's probably not eligible for copyright. But chances are that there's just a lot of code in [the training data] that has used the same open source solution, and that the output is going to look very similar to that. And that's just copying."
I have seen some probable copyright violations in the output of Copilot, such as comments and certain structural similarities that might be protected, although it is hard to say. But focus on the first part of what Mr. Ochoa is saying here, which is also laid out in this quote:
“In computer programs, concerns for efficiency may limit the possible ways to achieve a particular function, making a particular expression necessary to achieving the idea. In this case, the expression is not protected by copyright."
https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
This allows for verbatim copies if they are utilitarian in nature!
As for why we should allow verbatim copies of utilitarian features... First, let's preface this with the substantial similarity of the structure, sequence and organization as established in Whelan v. Jaslow which amongst other things says that you cannot merely change the variable names if the expressive structure of the code remains the same. Now let's imagine 10,000 software developers who all implement Dijkstra's algorithm in C and then run it through clang-format. Aside from variable names, isn't it safe to assume that many of the implementations are going to be exactly the same?
As for why it was carefully selected... more often than not, when I bring these things up, people who feel upset about Copilot go off and cherry-pick some random quote out of context to support their upset feelings. Therefore I'm highlighting the important parts so as to help people look beyond their upset feelings.
This is a complicated and nuanced matter. Attempting to channel everything through the lens of "this makes me personally feel bad and must be completely wrong" does not help the discourse. It may make you popular to a certain crowd but it might be unpopular to the public at large and it might also be incoherent from a legal standpoint, akin to bashing your head against a wall at a weekly meetup of the local heads-bashing-against-walls club.
There is plenty of room for discussion on what constitutes not only the legal interpretation of fair use and the idea/expression dichotomy but also the bigger picture. The knife always cuts both ways. Would it be acceptable to the open-source community if Microsoft could stop anyone from publishing Dijkstra's algorithm in C# because they wrote it first?
That's a very petulant way to defend cherry picking. I wasn't asking you to support one particular view; in fact that's the problem I was identifying. Your sources presented a balanced view, which you misrepresented by citing only the part that supported your own conclusion.
> focus on the first part
No, because the second part matters too. Here's Lemley and Casey again (emphasis mine):
<<<some purposes—say, ... a translation program that produces a translation of an entire copyrighted work—seem more substitutive than transformative, so that if they run afoul of the ever-broadening definition of similarity in music, fair use is unlikely to save them.>>>
Or the Register:
<<<"I actually think there's a decent chance there is a good copyright claim," said Tyler Ochoa ... the functional nature of the code means that reproducing it in a suggestion may not be seen as particularly transformative, which is one of the criteria for determining fair use>>>
Those are your own sources undermining - if not outright contradicting - your one-sided interpretation.
The limitation to market harms in the four-factor test for fair use should not be considered permanent. Law is, after all, a social construct. There's ample precedent for considering harms to the commons, to communities, and so on in other areas of law. Also, there might indeed be market harms. If a company open-sources some of their code but also hopes to profit by selling it in pre-packaged form or as a service, then AI-washed copying could constitute harm in even the most market-myopic terms. The "transformative" test is also pretty suspect in the context of AI-assisted copying, but this is getting long enough so I'll not go down that rabbit hole just yet.
> <verbosity about "utilitarian" copies which are not the issue here>
Enjoy your red herrings. I don't share your taste for them.
When Tyler Ochoa is saying that there is a decent chance of a copyright claim he is specifically talking about the output of the model.
Here is the full quote:
In the Texas Law Review in March, 2021, Mark Lemley, a Stanford law professor, and Bryan Casey, then a lecturer in law at Stanford, posed a question: "Will copyright law allow robots to learn?" They argue that, at least in the United States, it should.
"[Machine learning] systems should generally be able to use databases for training, whether or not the contents of that database are copyrighted," they wrote, adding that copyright law isn't the right tool to regulate abuses.
But when it comes to the output of these models – the code suggestions automatically made by the likes of Copilot – the potential for the copyright claim proposed by Butterick looks stronger.
"I actually think there's a decent chance there is a good copyright claim," said Tyler Ochoa, a professor in the law department at Santa Clara University in California, in a phone interview with The Register.
The use of the word "but" marks the transition from a discussion around the model itself to the outputs of the model.
Is it not also perfectly clear that Lemley and Casey are also of the opinion that the model itself is fair use?
The law moves slowly. Even if that were to happen eventually, the laws would very likely be challenged in court, and those challenges would take a while to be resolved.
Finally, even if the US outlaws this, there'll be plenty of other countries where it'll be legal. There's plenty of infringement of US copyright in China, for instance. The same is likely to happen in regards to AI that's illegal in the US but legal elsewhere.
By the time the law catches up, models may have become so easy for individuals, and not just deep-pocketed corporations, to create that it may be practically impossible to stop.
Incorrect. In the absence of new legislation case law is how these things get worked out, but new statutes could be passed and could void prior case law.
https://www.copyright.gov/title17/title17.pdf
Let me expand on this a bit... if you read through the above text (feel free to search for the below terms) of the current laws around copyright you will notice that there is no discussion of:
https://en.wikipedia.org/wiki/Structure,_sequence_and_organi...
https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
https://en.wikipedia.org/wiki/Idea–expression_distinction
These legal doctrines are the result of the details as hashed out in case law.
This is distinct from countries that use civil law jurisprudence. Common law jurisprudence relies heavily on case law.
Long live clopen source!
Definitions are not what matters in the end. Why doesn't the viral and restrictive element of GNU's GPL license make the license "non-open"?
"Open source" has a definition (https://opensource.org/osd) and the GPL meets it, because it doesn't prevent derived works from being distributed under the same license.
Then Amazon comes in and sets up its own, even easier hosting while providing none of the profits or development effort back to the original project. The AGPL is no problem for Amazon, because it is fine with publishing any changes it makes; that's not an issue, because the real product is the hosting and not the code.
There's nothing particularly "modern" about this "problem". Red Hat caused this problem with GPL- and BSD-licensed code in 1993.
It's a feature, not a bug. Some people probably didn't like it then either.
If you resent companies and people making money from your work, you're not really embracing the ideals of open source. So just keep it closed source. Like I said earlier you can always reveal the source to licensees when they pay you, and you can prevent them from distributing the source.
AGPL is for code, not for infrastructure, and you can still profit off of AGPL-licensed code by:
1 - selling support for it
2 - getting donations
3 - asking for money in return for working on feature requests
4 - being paid by a corporation to work on an open source product it uses and profits from
The problem is releasing stuff as open source and expecting to profit off it while preventing others from doing the same. You fundamentally can't. Why bother with open source with an attitude like that?
The AGPL is easy to comply with in a way that still earns a company profit while the software authors get nothing.
Agreed that non-commercial public source or just closed source is what these people should be doing.
Funny way to admit that your original statement was flat-out wrong.
There is no separate "civil law jurisprudence" and "common law jurisprudence". Common law by itself is by definition not jurisprudence. Civil law combines aspects of statutory and common law (as one form of precedent) into a single system. Some aspects of common law even make their way into criminal law. Instead of just Googling for buzzwords, learn what they mean before you try to bluster your way through an argument with them.
Oh, you mean the very first thing I had to explain to you at the beginning of this exchange because you seemed to be ignoring it? Very little of these discussions has been about the models. Most of the discussion is about the outputs, and there the fair-use case is - as Lemley/Casey and Ochoa both concede - much weaker.
But by all means keep going on about feelings. We can all tell it's not others' feelings that are being hurt by mere disagreement.
The entire art tradition is based on copying for study, and then using our brains to convince ourselves that we've "transformed it enough" or "my reference is obscure enough that no one will know."
Now we've simplified that process, and more people are exposed to the risk. I hope that the law takes a minute to evaluate the pace of change instead of saying "ITS TOO DANGEROUS, we must BAN IT", but my hopes are low.
Now that we can readily generate photorealistic CSAM, there's little to no risk of inadvertently creating a customer base for actual CSAM.
I mean, some SD applications like the interior designer market this as a great tool for potential buyers to try out ideas before they buy.
Well, as long as they have a recent-ish NVIDIA card. RIP AMD users.
- lexical filtering by applying a blacklist of artist names on the original prompt
- perceptual filtering - drop all generated images that look too close to copyrighted images in your training set
- re-captioning based filtering - use a model to generate captions for an image and apply filters on the captions; you can also filter by visual style
- CLIP based filtering where you use embeddings to find nearest neighbours, and if they are copyrighted then you can drop the image
- or train a copyright violation detection model that takes generated images and compares them to images from the original authors
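The lexical and CLIP-based approaches above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production filter: the blacklisted names are hypothetical placeholders, the embeddings are plain lists standing in for the output of an image encoder such as CLIP (no model is loaded here), and the 0.95 similarity threshold is arbitrary.

```python
import math
import re
from typing import Iterable, Sequence

# Hypothetical placeholder names; a real filter would maintain its own list.
BLACKLISTED_ARTISTS = {"artist one", "artist two"}


def passes_lexical_filter(prompt: str, blacklist: Iterable[str] = BLACKLISTED_ARTISTS) -> bool:
    """Return True if the prompt mentions none of the blacklisted names.

    Case is ignored and runs of whitespace are collapsed. Plain substring
    matching is crude: it misses misspellings and can over-match inside words.
    """
    normalized = re.sub(r"\s+", " ", prompt.lower()).strip()
    return not any(name in normalized for name in blacklist)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_embedding_filter(
    generated: Sequence[float],
    protected: Iterable[Sequence[float]],
    threshold: float = 0.95,  # arbitrary illustrative cut-off
) -> bool:
    """Return True if the generated image's embedding is not a near neighbour
    of any protected (e.g. copyrighted) image's embedding."""
    return all(cosine_similarity(generated, p) < threshold for p in protected)


if __name__ == "__main__":
    print(passes_lexical_filter("a castle at dusk"))                    # True
    print(passes_lexical_filter("castle in the style of Artist  One"))  # False
    print(passes_embedding_filter([1.0, 0.0], [[0.0, 1.0]]))            # True
    print(passes_embedding_filter([1.0, 0.0], [[0.99, 0.05]]))          # False
```

As the next comment points out, filters like these are easy to route around by regenerating slightly different outputs, so they reduce rather than eliminate matches.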
Copyright enforcement struggles are going to be interesting to watch this decade. But I think enforcement will slowly become irrelevant, because anything can be generated again, slightly different, until it finally passes the filters.
The iPod would not have had the impact it had without piracy.
Is it legal? I don't know. I guess they have the fig-leaf of taking down copyrighted content when asked.
Fig leaf notwithstanding, if Google (YouTube's owner) didn't have such deep pockets, I'd be amazed if they hadn't been sued into oblivion like Napster.
AI thrives and depends on large amounts of clean, well labeled data.
Large corporations understand this and have hoarded data for a long time now. Some of them have also managed to get this data labeled by millions of people through things like reCAPTCHA, or simply by hiring lots of people to do it.
The open datasets that small, open projects have access to tend to be much smaller and dirtier.
I suppose it would be possible to collect lots of data over time, crowd-source a project to clean it up and label it well enough to be useful, and then crowd-source the training of the AI model itself. But that would probably take a long time, and by then corporate-owned AI models will already dominate (as they do now: MidJourney, for example, is way better than Stable Diffusion in my experience, and with time the difference will only get starker).
I'd also be concerned with such ostensibly open projects eventually going closed and commercial as IMDB did after getting lots of work by volunteers freely giving their time to writing reviews.
More recently the open LAION data sets have become widely used by both tech giants and independent researchers.
The problem is DL is really sensitive to dirty data, disproportionately so.
At $DAYJOB, once we cleaned the dataset and removed a few mislabeled identity/face pairs (very few, about 1 in 1e4), the metrics went up a lot.
A fair compromise is any marketplace for clusters with good interconnect but a lot cheaper than the cloud. Tuning distributed training and network transport layer for settings not as homogeneous as the cloud will also help on top of generally good interconnect. Security is a concern.
Building on points raised by pmoriarty, being able to scrape data makes up for lacking labeled data in the era of self-supervised training. IP-hawks are now putting a damper on that option, which is why I worry this might backfire from a freedom perspective.
I even think there will be multiple initiatives like this, and there will be at least 1 big repository that accepts inputs and retrains periodically for anyone who wants the model.
This should be next on my list since my current project depends on SD model and having data backed up gives me confidence that I can get rid of all their stuff if needed.
You don't even need to store it all at once on your computer: stream it and generate checksums on the fly. Then distribute the torrent, and seed sections at a time. It can also be distributed on IPFS.
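Streaming the file and hashing it piece by piece, as suggested above, might look like this minimal sketch. It assumes a BitTorrent v1-style layout, where the data is split into fixed-size pieces and each piece is identified by its SHA-1 digest. The 4 MiB piece size is an illustrative choice, and the code only computes piece hashes rather than building a complete .torrent file.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

PIECE_SIZE = 4 * 1024 * 1024  # 4 MiB, an illustrative choice


def read_up_to(stream: BinaryIO, n: int) -> bytes:
    """Read n bytes, returning fewer only at end of stream.

    Network streams may return short reads, so keep reading until the piece is full.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def stream_piece_hashes(stream: BinaryIO, piece_size: int = PIECE_SIZE) -> Iterator[bytes]:
    """Yield the SHA-1 digest of each piece while holding only one piece in memory."""
    while True:
        piece = read_up_to(stream, piece_size)
        if not piece:
            break
        yield hashlib.sha1(piece).digest()


if __name__ == "__main__":
    # In practice the stream could be an open file or an HTTP response body.
    demo = io.BytesIO(b"x" * 10)
    print(len(list(stream_piece_hashes(demo, piece_size=4))))  # 3 pieces: 4 + 4 + 2 bytes
```

Because each digest depends only on its own piece, pieces can be verified independently, which is what makes seeding sections at a time workable.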
I've seen a lot of torrents being used to distribute neural networks (mostly Stable Diffusion forks).
Seconded, actually. I do have a bad habit of assuming people already know this.
> Official sources are recommended for this reason.
Very not seconded; see for example comments elsewhere in this thread about untrustworthy sources for popcorn time, and recall that the GP was specifically discussing the risk of Stability AI deciding to kill this.
So a paywall. I agree it's an annoyingly misleading term for the concept and would be happy to hear better alternatives[0], but I haven't found one yet.
0: like eg "passphrase" instead of "password" or "assuming the conclusion" instead of "begging the question" for their respective concepts
This is part of the motivation for “foundation models”.
There’s another paradigm called student/teacher models, where a randomly initialized model updates its weights according to another, pretrained model. This could (maybe?) be used to achieve the desired effect of a model that learned in a “clean room”.
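To make the teacher/student idea concrete, here is a toy sketch of distillation in plain Python. The "teacher" is a hypothetical stand-in (a fixed linear function) rather than a real pretrained network, and the student never sees any training data: it only sees the teacher's outputs on synthetic random inputs. Whether this would count as a legal "clean room" is exactly the open question above; the code only shows the mechanics.

```python
import random


def teacher(x):
    # Hypothetical stand-in for a pretrained model: a fixed linear function.
    return 3.0 * x[0] - 2.0 * x[1] + 0.5


def train_student(steps=5000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # randomly initialized student
    b = 0.0
    for _ in range(steps):
        # Synthetic query input; the student never touches the teacher's data.
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        target = teacher(x)
        pred = w[0] * x[0] + w[1] * x[1] + b
        err = pred - target  # gradient of 0.5 * err**2 with respect to pred
        # SGD step pulling the student's output toward the teacher's output.
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err
    return w, b


if __name__ == "__main__":
    w, b = train_student()
    print([round(v, 3) for v in w], round(b, 3))  # close to the teacher's 3.0, -2.0, 0.5
```

Real distillation of an image model would match distributions over far richer outputs, and the student can still end up reproducing whatever the teacher memorized.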
If you constantly watch videos of people eating cheeseburgers, you might want to eat a cheeseburger yourself.
I'm familiar with the research in this area, and that's not something you can say confidently; most work (and by that I mean 2 or 3 papers in total) has gone into investigating the role 'generated' depictions of CSAM play in the collections of hoarders. No psychological study, as far as I'm aware, has conducted an investigation on those who enjoy cartoon material akin to what you might find in a Japanese manga.
In fact, there's some evidence against what you're saying; anthropological research on fans of cartoon material ('lolicons' or 'shotacons') in Japan shows that their communities draw hard lines between '2D' and '3D' not just in this area of sexuality, but in their sexualities as a whole. This sexual inclination toward the 2D world is termed the 2D-complex and is akin to 'digital sexuality' or fictophilia, not pedophilia.
By way of analogy, perhaps BDSM would work as a good counter point to you. Many people (some studies suggest the majority of people) engage in 'rape fantasies' or other such fantasies of illegal or immoral nature, yet although actual depiction of rape is rightly banned by the state, its simulated variants are not, and we are comfortable to acknowledge that sexual desires do not always manifest in real life, and sometimes the thrill of fantasy itself is the attraction. To make it real would, ironically, defeat the whole point.
Hasn't this nonsense been thoroughly debunked by multiple studies at this point? I would assume evidence and "modern thinking" supports the exact opposite of what you claim, unless by modern thinking you mean the same thinking that tries to hide research they don't like.
Video games do not cause violence. End of story.
[0] Evidence Mounts: More Porn, Less Sexual Assault. https://www.psychologytoday.com/us/blog/all-about-sex/201601...
More likely people will just generate more synthetic content to consume.
This is absurd. By that logic, if you watch a movie, play a video game, or read a book with crime in it, you will become a criminal. We have a ton of shooter games and still no evidence that they have caused more gun violence around the world.
It's not well explored at all and you just made that up lmao.
That's akin to the idiotic arguments of the past that "allowing people to see homosexuality will make them homosexual!"
Completely ridiculous.
No one will go to jail, except maybe some people who get caught creating, distributing or collecting those images.
Keep in mind that there have already been a lot of illustrated/anime-style CSAM pictures on that site for years (something that is legal in many countries), so it's becoming something of a blurred area: these AI art generators still produce somewhat similar material, but it is getting more photorealistic.
As for the models not being trained on NSFW content, there were already leaked models that were, and there are unofficial models trained by outsiders using SD specifically on, for example, images from adult websites.
A human artist is obviously capable of generating CSAM, even if they have never seen that before.
Filtering of training data is countered by increasing capabilities to generalize:
Two years ago, that was a viable strategy: models could barely produce what was in the training data again.
Today models can generalize much better and compose concepts they have been trained on into new concepts that they haven’t.
Two years from now, filtering will be irrelevant.
Detection becomes easier - is it pornography with a child in it?
Generation starts to become trivial - this video, but this person has the features of an X year old.
At least in the latter case no-one's actually getting raped.
Then there are narratives. They are woven so that the suggested actions and solutions somehow fit the interests of the participants. The narrative can be CSAM, it can be the copyright of artists and owners of the training set, or it can be disinformation. The narrative doesn't care that current laws prohibit none of this and that it's all legal. The narrative justifies actions the participants wanted to take anyway because of their interests.
And finally there are actions. They can push legislation, but that's not the only tool (and yes, it's slow). Companies can always comply and cooperate, especially when their interests align. Google itself is a participant, with Imagen. They can create a restrictive policy and kick things off their search engine, because that is in their interests too, not because of a narrative or legislation. Just as they profited on YouTube from every piracy site that was suppressed.
The interests of every single company are stacked against individuals running this at home for free. There are enough narratives to be woven to justify actions that would stop that.
For decades, and in many countries even today, just getting paid to drive someone in your car is illegal, and you need a "taxi license". It doesn't need to make sense. We could end up with required license to use generative AI in 10 years and nobody would bat an eye after decades of propaganda and narratives.
I mean this isn't even speculative anymore after what happened with - hilariously named - OpenAI
> that many companies will, ironically, use "ethical" as a pretext to not be open.
Yes, weaponized ethics as sleight of hand for control is a common historical pattern.
This is exactly why I pay $30 per month for MidJourney. The output is just phenomenally better than most of the images coming out of SD, and the UI is much better as well. It's just not worth my time fiddling with SD if the results are so bad in comparison.
If/when SD catches up, I'd jump ship to using it in a heartbeat.
In fact, generally DL is quite tolerant to label noise, especially using modern training methods such as SSL pretraining.
https://arxiv.org/pdf/1705.10694.pdf https://proceedings.neurips.cc/paper/2018/file/a19744e268754... https://proceedings.mlr.press/v97/hendrycks19a.html
Or multiple IPFS CIDs. I think you can have a "directory" (CID) that contains multiple CIDs, and only need the content hashes to build it.
You can also publish multiple CIDs and ask people to seed random ones; that's how Libgen does it (and is similar to the multiple torrents concept).
The same file can be used to seed both torrents and IPFS.
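The point that a directory identifier needs only its children's content hashes can be illustrated with a simplified Merkle-style hash. This is explicitly not the real IPFS CID algorithm (which involves multihash/multicodec encoding and DAG-PB/UnixFS nodes); the scheme below is a hypothetical stand-in that only demonstrates the property the comment relies on.

```python
import hashlib
from typing import Mapping


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of a file's bytes (stand-in for a real content identifier)."""
    return hashlib.sha256(data).hexdigest()


def directory_hash(children: Mapping[str, str]) -> str:
    """Hash a directory from its entries' names and content hashes only.

    Entries are sorted by name so the result does not depend on insertion order.
    Names are assumed not to contain NUL bytes.
    """
    h = hashlib.sha256()
    for name in sorted(children):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(bytes.fromhex(children[name]))
    return h.hexdigest()


if __name__ == "__main__":
    # Only the child hashes are needed, e.g. as published by other seeders.
    parts = {
        "model.part1": content_hash(b"first half"),
        "model.part2": content_hash(b"second half"),
    }
    print(directory_hash(parts))
```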
Same with CP. You have to be sick to enjoy it. Very sick.
One example of a bad thing - you could easily imagine an instagram bot that looks for pictures of people with their kids, then uses a Stable Diffusion like model to produce pictures of the people having sex with their kids, or horrible things happening to the kids, and reply to the target account. The bot might threaten to post the pictures and accuse the person of being a pedophile unless the person pays X in bitcoin (or whatever). Or, the bot could just post such pictures for fun.
I think we don't know if fake CSAM will have a good or bad effect on pedophiles and, sadly, there is no real way to reliably test that (so far as I know). Fake CSAM might placate pedophiles, or it might whet their appetite. It's hard to know what to do.
I think we will eventually get to the point where very good unrestricted image generation models are available to the general public. When that happens there will be chaos - you will live to see man-made horrors beyond your comprehension.
And of course, I'm not looking forward to the world ushered in by free roam with this technology, mainly for the reasons you stated.
I’m referring to legally, sorry I should have specified.
There are way too many factors at play to simply point at porn, which is probably harder to obtain now in all honesty. I found many random porn magazines/pages as a child. Never did I ever go looking for it, but finding it was always a thrill.
People buy fewer magazines now (judging by convenience store shelves increasingly excluding them).
But they are not harming anyone else.
> I would not assume that the reactions to simulated sexuality is the same as simulated violence.
I would not assume anything. Conduct research and draw conclusions. Don't speculate.
This is the right conclusion, but the logic is entirely wrong.
The reason why video games do not cause violence is that play violence is not anywhere close to the real thing, not that people firewall off fiction from reality. There's plenty of cases in which a piece of fiction has changed people's views! Crime shows are notorious for skewing how actual juries rule on cases. Perry Mason[0] taught them to expect dramatic confessions and CSI[1] taught them to weigh whiz-bang forensics over other kinds of evidence.
In the specific case of porn, there isn't really a difference between "play sex" and "real sex": they poke the same regions of your brain. And the people who are responsible for keeping actual pedophiles from reoffending are pretty much unanimous that the worst thing you can do is give them a bunch of, uh... let's call it "material". So if you're already a pedophile, giving you access to simulated CSAM won't substitute for the real thing. It'll just desensitize you to reoffending.
>> Conduct research and draw conclusions. Don't speculate.
I've favoured the MIT licence for what little OSS I've published thus far. But, I'm becoming increasingly concerned that ruthless profit-above-all-else driven companies can include my (benign) work in systems that causes real harm.
Perhaps something like the OpenAI approach to their GPT-3 deal with Microsoft is better. That is, if the work Microsoft do with GPT-3 goes in a direction OpenAI doesn't like, OpenAI reserves the right to veto the work [1].
[1]: https://www.ted.com/talks/the_ted_interview_the_race_to_buil...
A terrible idea for a number of reasons (in terms of legal enforceability, unintended side effects, and more). The following two articles do a good job of explaining why such a license really isn't practical:
https://perens.com/2019/09/23/sorry-ms-ehmke-the-hippocratic...
https://www.gnu.org/philosophy/programs-must-not-limit-freed...
>The Software shall be used for Good, not Evil.
AIUI, it was put in mostly as a joke.
https://www.cnet.com/culture/dont-be-evil-google-spurns-no-e...
Your first assumption is that your inventions are important enough to be of use to “bad people”.
The other is your assumption that you have the objective ability to determine good from bad uses of a benign invention.
I’m increasingly looking for the psychological reasons why these ML models and their outputs cause such an emotional reaction in certain individuals.
For example, the language of opponents of Copilot speaks in absolutes. And when presented with the history of copyright when applied to software the opponents seem to not register that copyright (logically) does not extend to the non-expressive parts of a work.
“In computer programs, concerns for efficiency may limit the possible ways to achieve a particular function, making a particular expression necessary to achieving the idea. In this case, the expression is not protected by copyright."
https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
This allows for verbatim copies if they are utilitarian in nature!
As for why we should allow verbatim copies of utilitarian features... First, let's preface this with the substantial similarity of the structure, sequence and organization as established in Whelan v. Jaslow which amongst other things says that you cannot merely change the variable names if the expressive structure of the code remains the same. Now let's imagine 10,000 software developers who all implement Dijkstra's algorithm in C and then run it through clang-format. Aside from variable names, isn't it safe to assume that many of the implementations are going to be exactly the same?
Now, this doesn’t mean that GitHub is not in violation of other copyright claims, such as clearly expressive parts like comments and more!
I don't argue that such licenses are bad (though the FSF might), just that they are neither open source nor free.
The Slaughterbots campaign argued, rightly, I think, that advanced autonomous lethal weapons should be suppressed because they enable unethical uses and unscrupulous actors far more than legitimate defense.
It can't really be seen in isolation from the environment (social, economical, etc) it's going to come into I suppose, but in the real, concrete world we have creating them is not a neutral act, and some of the consequences are reasonably predictable.
It's not an easy question. However, as the creator of the software I guess I feel that my opinion should count in how it's used. As a simplistic example, if in some dystopian timeline my OSS were used to facilitate a holocaust I'd like to be able to do something to halt that. It doesn't matter that the perpetrators feel that what they're doing is right.
I'm not trying to be contrarian here, I was curious why not and why this isn't a thing already. I'm just more of a programmer guy and less of a lawyer guy