This isn't ML, it is a ripoff and is violating clear software licensing terms. https://news.ycombinator.com/item?id=27710287
Software freedom matters, but I wouldn't expect the typical HN type to understand, since their money is made on exploiting freely-available software, putting it into proprietary little SaaS boxes, then re-selling it.
If anyone thinks they don't, ask why Microsoft didn't train Copilot on their Windows, Office, or Azure source repositories.
For one, just because your code is covered by the GPL, it doesn't mean every single line in isolation is copyrightable. It has to demonstrate creativity. That's why you don't have to worry about writing for (int i = 0; i < idx; i++) {.
This means that the output of any algorithm on copyrighted code is still under the original copyright. I mean, we still apply the copyright of the original to the output of compilers, even though compilers can be transformative with inlining and link-time optimization, to the point that it mixes disparate code in the same way Copilot does.
In fact, I wrote some software licenses [1] that codify the fact that algorithms cannot change copyright.
Microsoft did not just copy individual lines. They fed whole repositories into their model, ignoring the license (if it exists) even though they knew from the start that information generated by the model will be publicly available. Available usually out of context, but nonetheless - the scope of the input and intent are very clearly "everything" and "redistribution".
Just adding a filter/ML model to the output shouldn't matter. I dare you to build a Copilot clone trained from leaked internal Microsoft code and then trying to argue the output is a bit mixed up.
That is a clear violation imho.
OSS licenses have been litigated and upheld. Can't supply details of my own experience for confidentiality reasons but plenty of plaintiffs have prevailed in suits about violations of OSS license terms. My guess is the numbers are higher than you might think because a lot of the cases end in non-public settlements.
Then there's private repositories. If they included those in the training data set that's even more actionable.
Personally I think this is software piracy at an absolutely unprecedented scale. Machine learning is just information transfer from the training data into weights in a model, a close relative of lossy data compression. Microsoft is now reselling all its GitHub users' code for profit.
I'd argue Microsoft too, was/is overconfident about how this would play out. I would have expected a little more caution on selecting the training data.
Copilot is known to reproduce entire blocks of text, including non-functional parts like comments.
> it doesn't mean every single line in isolation is copyrightable
It is if you can prove reproduction apart from your own original work (fair use). Unlike patents copyright doesn’t protect uniqueness. It is only a shield from reproduction, and if reproduction is demonstrable to a court you are likely at risk.
Some of us think it is detrimental to humanity as a whole.
Copyright certainly matters. It's a big deal legally and economically all over the world.
Suppose that it's just a bad idea and shouldn't exist. Does that mean that I should release my code into the public domain? I think you could make a good case that even being totally opposed to copyright morally or pragmatically or otherwise, given that it currently is enforced in many places it's worthwhile to play along. For example, some people would prefer a world without copyright, but GPL their code, because it might prevent a greater evil.
I have a feeling Copilot is more of a tool for publicity than for development.
You don't have to use Github to have skin in the game. As long as someone has access to your open source code, no matter where it's hosted, anyone is free to upload it to Github. The open source license of your code allows that.
So much this. If a neural network is capable of regurgitating code verbatim (with comments!), it's not a stretch to say it's a derivative work of the GPL code used to feed it.
[0] https://www.gnu.org/software/repo-criteria-evaluation.html#G... [1] https://github.blog/2021-01-05-advancing-developer-freedom-g...
But, github could easily establish a non-us entity to host export restricted code. And for savannah, if anyone had any code they were worried about export control for their code, savannah would quickly and easily have an independent person host that repo outside the US.
https://stackoverflow.com/legal/terms-of-service/public#:~:t...
> You agree that any and all content.. that you provide to the public Network... is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0)
Technically a lot of people who copy from Stack Overflow are breaking CC BY-SA 4.0, since it requires attribution AND requires distributing code that uses it under the same license (I think - I am not your lawyer):
An example I would like to see is a Cosinger, where the AI is trained using songs on YouTube and streaming services. With the final product, a user starts to sing and the algorithm attempts to sing along and give the singer suggestions for how the song should continue. I could see how a lot of musicians would be willing to pay good money for such a program, and removing obligations to pay any money for the training set would make it much more feasible to create.
There are already AIs that create music (though unlikely from proprietary training sets). A Cosinger shouldn't be too far from that.
The same difference as allowing Google to prosper while beating down ThePirateBay, another search engine.
When Copilot came out, one thing it reminded me of was the ethical considerations of face generators in animation. The output naturally has some similarities with the training data, and it is trivial to use a limited set of actors in order to create faces with uncanny similarities to those actors. A question that people asked (here on HN, if I recall) was whether you needed permission from those actors to use them in the training set, or whether this would allow anyone to "steal" the faces of public figures and create semi-lookalikes that can then be used in anything from porn to advertisement.
The law is undoubtedly going to catch up.
So, why call for white papers? I don’t believe they will publish any papers that go against their views.
I think that's backwards because it's putting the conclusion first then seeking to justify it, but to each their own.
They are asking for views on the machine learning, which they do not have arguments or a position on.
Isn't that literally a lawyer's job?
Having tested copilot, most suggestions are based on existing code in your opened file. Furthermore, most snippets tend to be relatively short, where it feels more like a Stack Overflow answer than existing code.
Of course it is possible to make the model generate longer pieces of code that are potentially GPL. But you would have to put in some effort to do it. It also tends to adopt your coding style.
But maybe the fact that there are no guarantees makes it unfair.
[GitHub Copilot License Config Menu]
Show suggestions with the following tags:
- [ ] GPLv3
- [x] GPLv2
- [ ] AGPL
- [x] CC-BY-SA
- [x] Apache License
- [x] MIT License
- [ ] No License Attribution
And they need a report button with a picklist of reasons.
I wonder if they could retrain the model on BSD or MIT licensed code only. How much of the open source code is licensed as GPL vs more permissive licenses, does anyone know?
Interesting that they want to charge for the use of Copilot. I guess we will see this business model more in the future.
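For what it's worth, the filter half of that mock menu is easy to sketch. This is only an illustration: nothing suggests Copilot attaches a license tag to its suggestions, so the Suggestion class, the license_tag field, and ALLOWED_LICENSES are all hypothetical names that mirror the checkboxes above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the boxes ticked in the mock menu above. Not a real Copilot setting.
ALLOWED_LICENSES = {"GPLv2", "CC-BY-SA", "Apache License", "MIT License"}


@dataclass
class Suggestion:
    code: str
    license_tag: Optional[str]  # None stands for "No License Attribution"


def filter_suggestions(suggestions, allowed=ALLOWED_LICENSES):
    """Keep only suggestions whose license tag is ticked in the menu."""
    return [s for s in suggestions if s.license_tag in allowed]
```

The hard part is not the filter but knowing which license a generated snippet actually derives from, which this sketch simply assumes is given.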
A little nitpicky, but the only proprietary part it requires is the plugin itself, not the IDE—Copilot runs just fine with the Free build of VS Code compiled from source from GitHub, after flipping a switch to enable WIP APIs.
I did it two days ago, installing the Copilot plugin in a Free build of VS Code provided by my distro.
Same link, just 13h ago, but with 5x fewer upvotes than the one in here: https://news.ycombinator.com/item?id=27992894
My money's on yes, but this isn't settled until SCOTUS says so.
>How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
This depends on how likely Copilot is to regurgitate its training input instead of generating new code. If it only does so IF you specifically ask it to (e.g. by adding Quake source comments to deliberately get Quake input), then the likelihood of innocent users - i.e. people trying to write new programs and not just launder source code - infringing copyright is also low. However, if Copilot tends to spit out substantially similar output for unrelated inputs, then this goes up by a lot. This will require an actual investigation into the statistical properties of Copilot output, something you won't really be able to do without unrestricted access to both the Copilot model and its training corpus.
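To make "statistical properties" a bit more concrete, here is a rough sketch of one possible measurement: the fraction of token n-grams in a generated snippet that also appear verbatim in a corpus. It assumes you already have the corpus as a list of strings, which is exactly the access outsiders lack. The tokenizer, the n=8 default, and the function names are my own choices, and verbatim overlap is not a legal test for substantial similarity.

```python
import re


def tokens(source):
    """Split source text into identifier, number, and punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ngrams(toks, n):
    """Return the set of all n-token windows in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def overlap_ratio(output, corpus_files, n=8):
    """Fraction of the output's n-grams that also occur somewhere in the corpus."""
    out = ngrams(tokens(output), n)
    if not out:
        return 0.0  # output shorter than n tokens: nothing to compare
    seen = set()
    for text in corpus_files:
        seen |= ngrams(tokens(text), n)
    return len(out & seen) / len(out)
```

Running this over many outputs for unrelated prompts would give a distribution. A high ratio only for deliberately baited prompts and a high ratio across the board are the two cases distinguished above.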
>How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
I'm going to remove the phrase "against violations generated by Copilot" as it's immaterial to the question. Copilot infringement isn't any different from, say, a developer copypasting a function or two from a GPL library.
The answer to that, is that unless the infringement is obvious, it's likely to go unpunished. Content ID systems (which, AFAIK, don't really exist for software) only do "striking similarity" analysis; but the standard for copyright infringement in the US is actually lower: if you can prove access, then you only have to prove "substantial similarity". This standard is intended to deal with people who copy things and then change them up a bit so the judge doesn't notice. There is no way to automate such a check, especially not on proprietary software with only DRM-laden binaries available.
If you have source code, then perhaps you can find some similar parts. Indeed, this is what SCO tried to do to the Linux kernel and IBM AIX; and it turned out that the "copied" code was from far older sources that were liberally licensed. (Also, SCO didn't actually own UNIX.) Oracle also tried doing this to the Java classpath in Android and got smacked down by the Supreme Court. Having the source open makes it easier to investigate; but generally speaking, you need some level of suspicion in order to make it economic to investigate copyright infringement in software.
Occasionally, however, someone's copying will be so hilariously blatant that you'll actually find it. This usually happens with emulators, because it's difficult to actually hire for reverse engineering talent and most platform documentation is confidential. Maui X-Stream plagiarized and infringed PearPC (a PowerPC Macintosh emulator) to produce "CherryOS"; Atari ported old Humongous Entertainment titles to the Wii by copying ScummVM; and several Hyperkin clone consoles feature improperly licensed SNES emulation code. In every case, the copying was obvious to anyone with five minutes and a strings binary, simply because the scope of copied code was so massive.
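The "five minutes and a strings binary" check is simple enough to sketch. This is a bare-bones stand-in for the Unix strings tool, not how any of those cases were actually investigated. The function names and the min_len=6 cutoff are my own assumptions.

```python
import re


def printable_strings(data, min_len=6):
    """Extract runs of printable ASCII from raw bytes, like a bare-bones `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def shared_strings(binary_a, binary_b, min_len=6):
    """Strings present in both binaries; long, distinctive ones deserve a closer look."""
    return printable_strings(binary_a, min_len) & printable_strings(binary_b, min_len)
```

You would feed it the raw bytes of two files, e.g. from pathlib.Path(...).read_bytes(). It only matches whole runs, so a string that sits next to different printable bytes in each binary is missed, and common library strings will show up as false positives.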
>Is there a way for developers using Copilot to comply with free software licenses like the GPL?
Yes - don't use it.
I know I just said you can probably get away with stealing small snippets of code. However, if your actual intent is to comply with the GPL, you should just copy, modify, and/or fork a GPL library and be honest about it.
To add onto the FSF's usual complaints about software-as-a-service and GitHub following US export laws (which, BTW, the FSF also has to do, unless Stallman plans to literally martyr himself for--- oh god he'd actually do that); I'd argue that Copilot is unethical to use regardless of concerns over plagiarism or copyright infringement. You have no guarantee that the code you're actually writing actually works as intended, and several people have already been able to get Copilot to hilariously fail on even basic security-relevant tasks. Copilot is an autocomplete system, it doesn't have the context of what your codebase looks like. There are way better autocomplete systems that already exist in both Free and non-Free code that don't require a constant Internet connection to a Microsoft server.
>Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?
I'm going to say no, because copyright law is already insane as-is and we don't need to make it worse just so that the copyleft hack still works a little better.
Please, for the love of god, we do not need stronger copyrights. We need to chain this leviathan.
Please continue using GitHub as you were, but maybe consider acting on your words and either removing or changing licenses within your code that do not represent your ideals. Nothing is preventing you from releasing code into the public domain, so do that!
Is this true? Is there really a large portion of contributors speaking up against this? I got the opposite sense, that it was a very small portion of contributors speaking up against this but I don't have any evidence one way or the other.
No, that's your opinion, which as it turns out also has no legal basis. For me, I want proper attribution from people who use my code. And for any code that I release that's under copyleft, I absolutely do want that license followed.
You seem to be fine releasing your stuff into the public domain, and that's great that you want to do that, but you don't speak for everyone.
Not everybody is and that's ok too.
However other people for varying reasons have other ideas ...
> We will read the submitted white papers, and we will publish ones that we think help elucidate the problem.
Doesn't give me hope they're aiming for unbiased opinion. I would be very surprised if any of the published papers don't closely align with the FSF's a priori position.
The word "unbiased" seems to be doing a lot of heavy work in your comment. The FSF is inherently biased towards its project -- how is that a problem?
That's a straw man. I never said (nor do I think) the FSF should not be biased towards its project.
However, I would be more willing to trust the results of this call if I had confidence that all solid arguments are presented, even if they're not aligned with FSF's agenda. Hiding them won't make them disappear - you might as well get as informed as possible about the issue, especially if you care deeply about the issue and agree with the FSF.
* I have already made my position clear in public, [1] so I could probably be identified.
* I am not a lawyer, just some bloke who attempted to write FOSS licenses to combat ML on copyrighted code. [2]
[1]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
The big GPLv3 push and development - plenty of attacks on folks actually shipping product on GPLv2 and building communities around that model (which keeps software free but allows users of the software to do what they want with it pretty much including putting in devices that are locked down - cars / tivo's etc).
Here's an opportunity to really advance in an interesting area with ML -> something that may open up programming to more people -> may advance computers ability to program and modify their own programs in the long run.
And regardless of the FSF attorney stuff, places like China and tiny little LLCs with no assets will very likely use the wonderful amount of code on the web to develop solutions in this space, even if the FSF claims everything is a violation. Where is the vision from the FSF anymore?
One thing that's been sad about the FSF -> it's gone from what I would consider a forward-looking idealism sort of thing -> here's how we could do / make cool stuff that lets communities work together -> to now sort of a legal compliance type org that really is focused on "actionable claims", "protected against violations" etc.
Question - do the Linux community and other successful larger open source communities welcome the FSF and their attorneys into the discussion? I can hardly imagine the BSDs or the Linux folks really connecting with them anymore.
Is there space for a different group, maybe a collection of actual developers shipping code in larger communities, to get together, no FSF / SFC lawyers present, to think creatively about the future? What should we be working for, what is fair to everyone, what helps society, what works around pro-social community building?
A tool that helps with cross language building blocks for common functions etc (stackoverflow on steroids) - just how bad is this?
Instead, I would like a system telling me about obscure things, traps, vulnerabilities, performance issues, etc... like a machine learning linter. The way I could see it work is by matching my code with bugfix commits. For example, if several commits replace "printf(buffer)" with "printf("%s", buffer)" and I write "printf(buffer)", I want an AI to tell me "code like yours is often replaced in commits, it may be wrong", bonus points if it can extract the reason from commit messages ("format string vulnerability") and suggest a replacement ("printf("%s", buffer)"), mega-bonus if it can point me to a good explanation of the problem.
Pissing lines of code is easy, I can do it, anyone with a couple weeks of training can do it, I don't need a bot to help me with that. Thinking about everything while I am pissing my lines is hard, and I will welcome a little help.
A nice thing about that approach is that it is unlikely to result in worse code than what I would have written by myself, because it will be designed to trigger only on bad code.
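A toy version of that matching step, to show the shape of the idea. Everything here is an assumption for illustration: FIX_PAIRS is made-up data standing in for (before, after) lines mined from real bugfix commits, and generalize() is a crude regex, not a parser, so it will mangle string literals (consistently on both sides, so matching still works).

```python
import re
from collections import Counter

# Hypothetical (before, after) line pairs, as if mined from bugfix commits.
FIX_PAIRS = [
    ('printf(buffer);', 'printf("%s", buffer);'),
    ('printf(msg);', 'printf("%s", msg);'),
    ('printf(line);', 'printf("%s", line);'),
]


def generalize(line):
    """Replace every identifier that is not a call name with the placeholder ID."""
    return re.sub(r"\b[A-Za-z_]\w*\b(?!\s*\()", "ID", line.strip())


def build_index(fix_pairs):
    """Count how often each generalized 'before' line was rewritten; keep one example."""
    counts = Counter()
    example = {}
    for before, after in fix_pairs:
        key = generalize(before)
        counts[key] += 1
        example.setdefault(key, (before, after))
    return counts, example


def check_line(line, counts, example, min_count=2):
    """Return a warning if lines shaped like this one were often replaced, else None."""
    key = generalize(line)
    if counts[key] < min_count:
        return None
    before, after = example[key]
    return (f"code like yours was replaced in {counts[key]} commits, "
            f"e.g. {before!r} -> {after!r}")
```

With the index built from FIX_PAIRS, check_line("printf(user_input);", ...) returns a warning, while a line whose shape never appears on the "before" side returns None, which is the "trigger only on bad code" property. Extracting the reason from commit messages is left out entirely.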
Notebooks programming has a flow of "execute a small bit of code, check the results, and iterate", and this fits perfectly with Copilot since you still need to check if the suggestions work.
Maybe this kind of programming is where Copilot finds a niche, maybe not. I don't know. I'm skeptical of its use in larger applications where you can't trivially check if the code you wrote (with its help) did what you want. I think there needs to be a lot more tooling built around that to really make it compelling for larger applications like that, likely in the form of more editor tooling integrations. But I think it's promising. I wrote about that a little more here: https://phillipcarter.dev/posts/four-dev-tools-areas-future/...
there are more countries in the world than the United States, and most of the world's developers live outside of the United States
copyright only works because the Berne Convention was more or less universally agreed between governments
most countries won't pay any attention to what the US Supreme Court decides
Copyright lawsuits across nation state lines are pretty much non-existent and not worth it. What matters in the U.S. is pretty much as far as anyone who cares about copyright is going to care about.
The FSF considers the user to be the one using cars/tivo's/other devices. In their view, this was a design flaw of gplv2 that it allowed locking out end-users of their devices.
For Linux this was not the case. The important part that modifications/extensions were shared (and maybe even upstreamed), while the end user access wasn't important.
The case of tivoization fractured the interest between the mostly moral "I want freedom for the end user" and the more immediately beneficial "If you use my code, I want reciprocity for modifications".
I personally believe that today the latter case won, even for a lot of non-gpl software that gets lots of contributions e.g. via github for lots of different reasons, but the moral case gets more dire.
Looking at security for older (or shockingly often even current) devices, right to repair, and lots of other issues concerning the effective loss of rights with more modern devices, the concerns of the FSF were often accurate, but its increasingly hostile approach to "proprietary" IP, and thus the exclusion that comes with GPLv3 and similar licenses, was not palatable to the larger open source community.
The approach to IP in China is also sometimes a lot different, see https://www.bunniestudios.com/blog/?p=4297.
https://sfconservancy.org/blog/2021/jul/23/tivoization-and-t... https://news.ycombinator.com/item?id=27937877 https://events19.linuxfoundation.org/wp-content/uploads/2017...
Apparently what TiVo did (breaking proprietary software if you modify GPLed software) is even allowed by GPLv3.
Is anyone building strong communities on AGPLv3 / GPLv3? I feel the momentum shifted towards Apache / MIT style licenses unfortunately.
The users of the software are the owners of the devices. The distributors are the ones locking down the devices to prevent the users from modifying the software (often so that the distributors can control something else the users are doing).
GPL is about end-user freedom (as opposed to software distributor freedom). This is why GPLv3 exists.
So yes, FSF created GPLv3 to focus on USERS freedoms, but the users are not writing the software - so it remains the devs who pick licenses.
So your argument is that if China does not care about licenses, neither should we. The thing is, I am fine with that: I know Windows source code is leaked, so let's train an AI on it too.
I think it is a clear sign that MS did not train on proprietary code: it means that doing so is not legal or not safe. So the question is why GPL or other licenses would be safe. I think you need the authors or the licenses to give you permission to use the code as training data in black-box, locked, proprietary algorithms.
This really doesn't give me much comfort though. Making a repo public doesn't imply anything, it could be "All rights reserved".
I guess, but then they should have their story straight before they start the astroturfing campaign.
As we're seeing, there's VERY little software where the specific algorithms or ideas in the software are what's valuable. The value comes from the ability to sell a service based on the software and operate it at scale. Like you said, how much SaaS is mostly open source stuff packaged up? Android is (sort of) open source, companies pay lots of people a lot of money to contribute to the Linux kernel where they give away the code they developed with that money, etc etc.
[1] https://en.wikipedia.org/wiki/The_Free_Software_Definition#T...
This goes beyond fair use or satirical/comedic effect. They are training their models to output text in the style of the authors being absorbed. The style is exactly the artistic effect that is being copyrighted.
> The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Not including the copyright information for the MIT-licensed code is a violation of the license.
But in my (non-lawyer) opinion - if the reproduced code is substantial/unique enough to be deemed to be covered by the license, then it's also substantial/unique enough to be subject to that license requirement.
Patent law is about selling items, not consuming them, so they could prevent me from selling a clone of Office but they cannot prevent me from installing Office.
As far as contract law goes, that would be between two parties, so if I obtained a copy of Office somewhere and did not have a contract with Microsoft, I would not be violating a contract with Microsoft. Copyright is the only mechanism they use to stop unauthorized distribution of their software.
Closed source benefits from the secretiveness of compiled software.
> Are reposts ok?
> If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.
> Please don't delete and repost the same story. Deletion is for things that shouldn't have been submitted in the first place.
They are also ok with reposting year+ old stories that did get significant attention at the time, since the new repost may find a new audience.
What makes you so confident that this would not be ruled fair use?
(And for people not familiar - if ruled fair use, it doesn't matter what the license is because fair use is an exception to copyright itself.)
Here's the relevant quote:
> GitHub is arguing that using FOSS code in Copilot is fair use because using data for training a machine learning algorithm has been labelled as fair use. [1]
> However, even though the training is supposedly fair use, that doesn’t mean that the distribution of the output of such algorithms is fair use.
My licenses say, basically, "Sure, training is fair use, but distributing the output is not."
The licenses specifically say that the copyright applies to any output of any algorithm that uses the source code as all or part of its input.
Now, I have not gotten a lawyer to look at my licenses yet (it's in the works), so don't use them yourself. But because everyone keeps saying that training is fair use, I'm fairly confident that only training is fair use.
Of course, it might not be, but that would take more court cases and more precedent. I wanted to poison the well now [2] to make companies nervous about using a model that was partially trained with code licensed under my licenses.
[1]: https://valohai.com/blog/copyright-laws-and-machine-learning...
[2]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
Licenses basically by definition cannot say what is and isn't fair use...
[1] https://twitter.com/mitsuhiko/status/1410886329924194309
Which is not necessarily hypocritical. The amount of copying needed for something to be copyright infringement is not high… but it's still significantly higher than the amount needed to leak information. For that, just a few words will do, e.g.
// For Windows 12
or // Fuck [company name]
or long secret_key[2] = {0x1234567812345678, 0x8765432187654321};That as long as they have the right to read the data, they have the right to train an AI on it. The fact that the code is available under an open source license is irreverent to them.
As for why they didn't use their own private code to train their AI, I suspect it was more of a non-malicious: "we don't need to, this public github repo dataset is big enough for now"
Personally, I think Microsoft should double down on this legal stance. Train the AI on all their internal code. And train it on any code they have licensed from other companies too.
How is that reconciled with the fact that a person who has read copyrighted code (not even the original source code, a mere decompiled version of it!) is forbidden to reimplement it directly:
https://www.computerworld.com/article/2585652/reverse-engine...
Are you sure?
But linters work with hand-crafted static rules, which is good, and the idea is not to replace them. The idea is to use big data techniques to find unwritten rules based on commit histories, on the premise that we are more likely to remove bad code than good code. So if your code looks like code that is often removed, it is most likely bad, even if it doesn't match an explicitly written anti-pattern.
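A minimal sketch of that "often removed" signal, assuming you already have commit history as a list of unified diff texts (e.g. the output of git log -p split per commit). The function names and the exact-line matching are my own simplifications. A real system would need to generalize over identifiers and would probably use a learned model rather than raw counts.

```python
from collections import Counter


def churn_counts(diffs):
    """Tally how often each stripped line is added and removed across unified diffs."""
    added, removed = Counter(), Counter()
    for diff in diffs:
        for raw in diff.splitlines():
            # Skip file headers. Caveat: a removed line that itself starts
            # with "--" is misread as a header by this simple check.
            if raw.startswith(("+++", "---")):
                continue
            if raw.startswith("+"):
                added[raw[1:].strip()] += 1
            elif raw.startswith("-"):
                removed[raw[1:].strip()] += 1
    return added, removed


def removal_rate(line, added, removed):
    """Share of this line's appearances in history that were removals (0.0 if unseen)."""
    key = line.strip()
    total = added[key] + removed[key]
    return removed[key] / total if total else 0.0
```

A line with a high removal rate and enough total appearances would be the candidate for a "code like this is often removed" hint. As the reply below points out, removals also happen for reasons unrelated to code quality, so the raw rate is a noisy signal.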
There are also other triggers of code removal and refactoring that are outside the code base, such as an organisation migrating to a different platform. An AI trained on a large public commit history could encourage a general shift towards already-established big players, punishing smaller organisations.
and the UK has essentially no concept of fair use
and the same will play out across every country in Europe and the 90% of the world that isn't the United States
https://www.lw.com/thoughtLeadership/enforcement-of-foreign-...
how is this completely different situation relevant in the slightest?
we're talking about possible massive, premeditated industrial scale copyright infringement by Microsoft, a large multinational with a substantial UK presence
not some random guy in the US
if Microsoft don't show up to the court: I win by default
I can then send in the bailiffs to start seizing their property (and their staff will be arrested if they interfere)
personally I'd start at one of their datacentres
If a company copies a competitor's product then the chance of getting sued is very high. If they can show that, in fact, there was zero copying at all, then they can get the case dismissed and save great legal expense.
Yes. However, my licenses only say what people already say. Then the licenses go further and say, "But anything else is not allowed."
Everyone else says training is fair use. My licenses agree. But they make it clear that I don't believe that anything else is fair use.
Yes, these licenses must be tested in court. Except that they poison the well now.
You do not seem to get it. Yes, I understand that if fair use applies, my licenses don't matter. I get that. I promise I do get that.
The purpose of these licenses is to sow doubt that fair use applies to distributing the output of ML models.
Lawyers are usually a cautious lot. If a legal question has not been answered, they usually want to stay away from any possibility of legal risk regarding that question.
The licenses create a question: does fair use apply to the output of ML algorithms? With that question not answered, lawyers and their companies might elect to stay away from ML models trained with my code, and ML companies might stay away from training ML models on my code in the first place.
That is what I mean by "poisoning the well." The poison is doubt about the legality of distributing the output of ML models, and it is meant to put a damper on enthusiasm for code being used to train ML models, especially for my code.
That doesn't usually mean you can use code though, see: https://news.ycombinator.com/item?id=27726343
While the corporate momentum switched to Apache/MIT licenses, there are strong communities built on AGPLv3/GPLv3.
* Nextcloud - file hosting (AGPLv3)
* Source Hut - git hosting (AGPLv3)
* StreetComplete - OpenStreetMap editing (GPLv3)
* F-Droid - Free Software "app store" for android (GPLv3)
* NewPipe - alternative Youtube frontend (GPLv3)
While these aren't necessarily used by large corporations, their individual communities are thriving and strong.
The shift toward SSPL and Commons Clause licensing is another argument in favor of AGPLv3 licensing. Amazon/Google often won't touch your AGPLv3 code (and you can still sell proprietary licenses to other companies that can't/won't use AGPLv3).
The way this works is all contributors are required to sign a CLA -> the corporate developer can then use their code under ANY license, and most importantly can integrate it into proprietary products or sell it to others.
The code is then released as an AGPLv3 to be "open source" - but literally the only company with the "super" rights to license / make money off it is the corp dev.
It's kind of genius -> so I think we may see more (A)GPLv3 stuff coming this way. The corp developer can then offer for example a hosted version of the software WITHOUT releasing all the related code! But anyone else would have to release their code.
You can see how this is done here:
If a third party is contributing a lot of code that is highly relevant, the third party is under no obligation to sign the CLA. The third party is entirely within her rights to refuse to sign the CLA and distribute an AGPLv3-only fork of the software.
If this fork is significantly better than the original, the original authors are out of luck when it comes to proprietary relicensing.
This is what happened with OwnCloud/Nextcloud. OwnCloud was AGPLv3 but required a CLA. OwnCloud became OpenCore and started distributing "enterprise" features as proprietary upgrades. Some developers were unhappy with this and forked OwnCloud and started developing Nextcloud. All contributions to Nextcloud are AGPLv3 only and cannot be re-licensed by OwnCloud. Interestingly enough, any new code released under AGPLv3 by OwnCloud can still be used by Nextcloud.
Which I think is perfectly fair: you are getting a full product, and you can do with it as you please (including profit off of it), as long as you publish your changes too!
The fact that the original copyright holder has the rights to close it off for future developments is completely natural, and if you do not want to allow them to do that, don't sign a CLA and fork. Oh, there's a cost in maintaining a fork? Pick your poison then :)
To me what matters is that once you get the software, you have freedom to use and modify it. I am ok if you do not have the "freedom" to close it off. If you start being a bigger contributor than the original company, you avoid all of the problems with a fork, but you can't say you did not benefit from the original AGPL release.
Actually anyone that has the AGPL code can sell and/or make money from it. People regularly buy GPL software and pay monthly subscriptions to hosted AGPL software.
If you can't compete without keeping some code as "trade secrets", that's your failed business model, not a fault of the license.
They literally made the GPLv3 because they cared about that very much.
https://sfconservancy.org/blog/2021/jul/23/tivoization-and-t... https://news.ycombinator.com/item?id=27937877 https://events19.linuxfoundation.org/wp-content/uploads/2017...
Buying a book, an audio CD, or a DVD/Blu-ray grants the holder permission to read, listen to, or view that product as a single instance. You can lend them out, but that's all you're really allowed to do with them. The text or audio/video is not yours to do with as you please. People obviously do not like that, and argue that making copies/backups is their right. Maybe that's acceptable, but we can agree that posting them on torrents, or otherwise sharing a copy made from the thing you have, is not.
Saying that, training a model on someone's copyrighted text is not part of the agreement for using said text, whether it's a copyrighted magazine, newspaper, or book. If the people doing the training reach out to the copyright holders and get specific permission to use their copyrighted material in such a manner, then go ahead. The fact that people feel like they can do anything without the common courtesy of asking for permission is troubling to me; it feels like we've lost something as a society. There's no acknowledgment that someone has created something by their own work and that the creator gets to do with it as they please. A large portion of people believe that because it was created, they deserve/should be able to/etc. do whatever they want with someone else's creation, including getting paid for derivative works of the original creation.
Also, Copilot is copying much more than short excerpts, going as far as to reproduce large amounts of copyrighted code verbatim[1].
[1] https://twitter.com/mitsuhiko/status/1410886329924194309
It's unsurprising that copilot can reproduce the most famous subroutine of all time precisely, given that it occurs in hundreds or thousands of repos.
Also, that code is not copyrightable. Pure algorithms are not copyrightable; copyright of code arises from its literary qualities.
E.g. I can copy an algorithm out of an ISO spec and that doesn't make my code a derivative work of the spec requiring me to pay royalties to ISO.
When you strip the algorithmic elements out of fast inverse sqrt, you are left with what? Single-letter variable names. That is certainly far below the threshold for copyright.
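For anyone who hasn't seen the routine being argued about, here is a rough Python sketch of the same algorithm (the well-known version is C, from Quake III Arena). The function name and the use of `struct` for the bit reinterpretation are my own choices for illustration, not anything taken from the original source.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive 32-bit float value."""
    # Reinterpret the float's 32 bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic constant minus half the bit pattern gives a first guess.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer bits back as a float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

Take away the bit trick, the constant, and the Newton step, and what remains is naming and layout, which is the point being made above.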
I see this sentiment a lot in FOSS spaces but I don't really understand why. The role of judicial process _isn't_ to provide a guiding moral philosophy around social organization. Depending on the government in question that's either a role of government functions or isn't something that should be guided at all. The role of law often (and yes, not in all governments, but at least in the US) is to offer a contract between the state and the individual.
I understand the potential for abuse here in using Copilot to regurgitate licensed works without adhering to the terms of the work's license, but I'm not fluent enough in law to know if this is illegal or not. Calling out and specifically applying strict limits to this practice is certainly something I'm sympathetic to, and I'm very curious to see what the courts come up with. But swayed by a moral argument I am not.
In some jurisdictions this is in fact their right by law as long as they own the original (the music/film industry of course used this as an excuse to slap additional fees on every sale of any storage medium). Redistribution is different however.
Moving on, I’ll put this to you: you claim training an ML model against copyrighted text is in violation of the ‘permission’ granted by the rights holder. However, flip this on its head for a moment – that’s basically all human brains do. Clearly, the greatest writers of our time haven’t written their works in a vacuum. Rather, that historical reading and inspiration becomes sufficiently obfuscated that we deem something creative enough to be granted its own copyright.
Fundamentally, how does Copilot differ, other than perhaps being a poor implementation? Is it by not being ‘adequately creative’ enough? Is there some future version you could envision that would be, or is it the principle you’re arguing against?
Also, you're taking the machine learning metaphor literally. AI models do not "learn"; they're just statistical models and they don't understand anything. There is no comparison to human learning that isn't superficial or metaphorical.
The real question is how Copilot is any different than a compiler, or lossy encoding or compression.
I'm probably just a curmudgeon, but I don't understand the point of Copilot. So I'm probably not the best to opine about it. However, I am very opinionated about copyright in a manner that typically flows against HN groupthink.
I totally missed the non-wrapped question.
Because I don't give a crap about down-votes/up-votes. I just know from experience my views on copyright do not gel with the majority views on HN. I was just acknowledging that fact. Conversations can be had regardless of votes. My views on Napster/MP3 trading are in the same realm (and somewhat related to copyright issues). I was a co-owner of a small music site when Napster was in its heyday, and we saw the direct repercussions of people not buying music because they got it from MP3 trading. Groupthink here is all "things for free when I want it, how I want it", yet I still have conversations. I'm not afraid of a measly -4 points because my thoughts are contrary to groupthink.
At the same time, if something like this gets your goat, how is asking how something is better being better in and of itself?