GitHub Copilot is ‘unacceptable and unjust,’ says Free Software Foundation (infoworld.com)

252 points by axsharma 4 years ago | 232 comments

The title severely misrepresents the FSF's position. Open the full article and you'll see that all the FSF says is that GitHub Copilot is proprietary software and SaaS, and that all forms of proprietary software and SaaS are unacceptable and unjust. What about the copyright issue of machine learning, then? The FSF says it's a new area with many open questions and that it isn't really sure yet; right now it is calling for white papers from the public to hear your comments [0].

I think it's a reasonable position to take. Reducing the scope of fair use to strengthen copyleft is a double-edged sword: it simultaneously makes copyright law more restrictive, and such a ruling could be used by proprietary software vendors against the FOSS community in various ways. It's an issue that requires careful consideration.

[0] https://www.fsf.org/blogs/licensing/fsf-funded-call-for-whit...

pessimizer 4 years ago | |

> as it simultaneously makes copyright laws more restrictive, such a ruling can potentially be used by proprietary software vendors against the FOSS community in various ways.

Could it? Copyright law is FOSS's only protection. That's why it's witty - copyright law against copyright. Weakening copyright law in an ad hoc way is absolutely not good for FOSS. It's fine to rewrite copyright in a way that explicitly allows things like Copilot, as long as FOSS gets to copy bits of proprietary code, too.

Otherwise, after some appeals court judgement that the FOSS community failed to participate in (or even worse, subelements participated in on the wrong side) we're going to end up with a copyright practice that looks like the NFL exception in monopoly law.

bcaa7f3a8bbc 4 years ago | | |

> It's fine to rewrite copyright in a way that explicitly allows things like Copilot, as long as FOSS gets to copy bits of proprietary code, too.

This is exactly what I was thinking about. If Copilot is fair use, it means that all proprietary source code, as long as it's publicly available to read, will be free to use as training material for a hypothetical free and open source machine learning project, which I think would be a good thing. An example is a proprietary program released under a restrictive "source available" license: you can read it but not reuse it under any circumstances (and I believe such projects are already included in Copilot's training data). This is why I said fair use can be a good thing, and why a ruling that reduces the scope of fair use could be used by proprietary software vendors against the FOSS community.

It would be even better if training on all forms of available proprietary binary code could be fair use, too. It might allow the creation of powerful static binary analysis or code generation tools by learning from essentially all free-to-download proprietary software without copyright restrictions. However, the situation with proprietary binary code is more complicated. Reverse engineering proprietary binary code is explicitly permitted by US copyright law, but the "no reverse engineering" clause in a EULA overrides it, and this can be a bad thing. It makes FOSS's fair use right meaningless while giving proprietary software vendors a free pass to ignore FOSS licenses.

Thus the outcome is unclear and may go either way. This is why I said the issue requires careful consideration.

raxxorrax 4 years ago | | |

I disagree that copyright is FOSS's only protection.

But it is true that this proprietary product extracts its value exclusively on the basis of open source software.

Yes, it would be nice to have the source of Copilot in exchange, but I think it would be far more important for third parties to have the same access to the code so they can provide similar tools.

isoprophlex 4 years ago |

Morally, I hope the FSF wins.

Otherwise, I hope Copilot makes it big. It'll create a new generation of developers who depend on these tools to do their work. It'll also lower the barrier for non-software engineers to participate in writing code. SO copy-pasting on steroids.

The resulting mediocre spaghetti will break at record-breaking rates; cleaning up the mess will be highly lucrative!

jfmc 4 years ago |

Copilot is the perfect machine for clean room design and license/copyright laundering. It is unethical and unfair to the open source community.

I do not care if it breaks code to bits and recomposes them again regurgitated by <YOUR-LATEST-AI-TECHNIQUE-HERE> in a way that is untraceable: it would not work without learning from our open source code. Code produced by this method should be automatically licensed under the most restrictive license of its input used for learning.

pydry 4 years ago |

I'm kind of wondering if this controversy might not end up being a storm in a teacup.

From what I've seen, Copilot really lowers the barrier to writing buggy code. If it does indeed turn out to be a tool that lends itself to machine-gunning rather than shooting yourself in the foot, it almost doesn't matter who owns what IP.

The relentless attempts at developer commodification will, of course, continue, but I can already sense this one ending up like the developer outsourcing craze of the mid-2000s that the Economist also got a little too excited about.

uberswe 4 years ago |

Copilot is a fancy autocomplete tool for code. I think the controversy comes from it being trained on public repos without adhering to their licensing. I used Copilot and thought the best part was when it would autocomplete based on other code I was writing. Sometimes Copilot would help me see places where I had repetitive code that could be turned into a function.

jeroenhd 4 years ago |

I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on GitHub repositories instead of, say, the Windows kernel driver tree. They know their model arbitrarily copy/pastes other people's code, so they train it almost exclusively on other people's code that they don't care about if it gets stolen. Their assumption seems to be "if Bing can find it, it's up for grabs, no matter the license". Good luck getting the same treatment from MS if you upload the leaked XP kernel to GitHub to make your own fork.

I'll accept the ethics of copilot when they add the source code for Windows, Azure and Office to their training set, because only then will MS truly reflect that their model doesn't cross the spirit or even letter of any licensing.

ben-gy 4 years ago |

I’ve been using Copilot for the past couple of months, and it’s seriously becoming a part of my daily coding workflow.

The majority of suggestions are not quite what I want, but I’ve found that the more I comment my code, the more personalised the suggestions get. Consequently (as a solo founder in my own startup), having Copilot finish my code for me during late nights spent trying to ship features for customers before the following day is something I have become grateful for.

It’s a double-edged sword because it’s enabling me to grow my business and remain self-employed, but I also understand the concerns. At the end of the day it’s not something I need to do my job (like version control or an IDE, for example), but more of a nice-to-have…

xaduha 4 years ago |

Why do I get a feeling that MS is fine with any turn of events? If some licenses get excluded, then in a way those gain more points in 'pain' according to this https://writing.kemitchell.com/2017/03/29/OSS-Business-Perce.... But what does MS care?

oaiey 4 years ago | |

MS will just retrain the model on a different input. They could not care less, and will actually be happy to get an external statement on the license situation and the ethics.

neximo64 4 years ago |

I think it's a fantastic tool to work with, though. I didn't think so when I saw the demo; I basically brushed it off. But using it is probably one of the most productive things to happen in the past decade.

I have my own GPL software out there. Most of the time I think it doesn't really get used, so it's not that much of a concern to me. I imagine it's like that for other devs too.

I suppose if you're MongoDB (similar to GPL/used to be) or some big company you care more.

ksec 4 years ago | |

I am wondering if there will be a GPL+ Copilot license.

detaro 4 years ago |

this mostly seems to report about the FSF writing a post, which already had discussion here: https://news.ycombinator.com/item?id=27998109 (281 points, 203 comments)

dmos62 4 years ago |

Could this become something people can't program without? Like imagine being stuck recycling the same programs and paradigms, not being able to move to something new, because Copilot hasn't seen it before.

BenjiWiebe 4 years ago | |

I'd guess yes, for some people. Others, though, will refuse to use copilot out of sheer obstinacy if nothing else. They will produce the new paradigms for copilot to then consume.

hesk 4 years ago |

So, I'm reading the linked article by RMS about Service as a Software Substitute (SaaSS) [1], which is one of the reasons why they object to GitHub Copilot.

The key argument for why SaaSS is ethically wrong is that it denies me control over a computation that I could do on my own.

> "The clearest example is a translation service, which translates (say) English text into Spanish text. Translating a text for you is computing that is purely yours. You could do it by running a program on your own computer, if only you had the right program. (To be ethical, that program should be free.) The translation service substitutes for that program, so it is Service as a Software Substitute, or SaaSS. Since it denies you control over your computing, it does you wrong. (emphasis mine)"

I don't find that argument very convincing because it implicitly assumes that there is no alternative translation program that I could run on my own computer.

However, if there is an alternative, then a SaaS offers me a choice. I can run a program on my own computer, e.g., if I am concerned about data privacy or service reliability. The downside is that I have to install and maintain the software on my computer. Or I could use an external service. The upside is that the barriers to use are minimal.

Of all the articles by RMS I have read so far, I find this one the least convincing.

[1] https://www.gnu.org/philosophy/who-does-that-server-really-s...

BulgarianIdiot 4 years ago |

I wonder what the FSF's opinion would be if GitHub excluded the GPL family of licenses and included only permissive ones.

gibbonsrcool 4 years ago |

Do we owe all our professors and textbook makers compensation when we make money off our brain neural networks that they trained? Everyone also keeps talking about how bad copilot is. It’s the first step! It’s only going to improve and probably fast, given the potential value creation.

cblconfederate 4 years ago |

Copilot looks very cool, but if people end up using it a lot, it probably means their programming language is not expressive enough. After all, programming languages were invented in order to be accessible to humans.

What I'd like to see is a Copilot for scientific papers. There's so much duplication out there that it would be easy to train, and it would save tons of time on the chore of writing and referencing the same things over and over.

k__ 4 years ago |

I think Copilot is a hard problem, maybe it isn't even solvable.

Sometimes it blatantly copies GPL code without my knowledge.

Sometimes I myself write code that could be part of a GPL code-base, without knowing.

Funny thing is, the difference here isn't the actual code that's written, but that Copilot has seen many GPL code bases and I didn't.

Sometimes I really have the feeling Copilot understands my code base and suggests code that seems to be custom tailored to it. Albeit in most of the cases it doesn't fit 100%.

I think the latter cases are when Copilot shines and doesn't violate GPL code at all, but can I be safe? Probably never.

catern 4 years ago |

The FSF has made no such statement. This article is complete bullshit and a slanted quote.

The FSF said it was unacceptable because it's proprietary, like GitHub in general.

They've made no statement about the specific details of Copilot.

Accacin 4 years ago |

To be perfectly honest, I think people will realise it's just not that useful and forget about it pretty quickly.

Even at my place of work, some people expressed interest in it and, after playing with it for an hour or two, haven't touched it since. I get the impression there are more people discussing it than actually using it.

mullikine 4 years ago | |

These language models are not being utilised very well by tools such as Copilot because those tools map very few functions from the editor to the language model. The more functions you map, the more you get out of it. If Copilot's workings were completely open and configurable, you would find that people can collectively work together to map many functions to the language model. These models are capable of far greater wonders with deep integration and collaboration. I have tried to demonstrate this with Emacs.

https://news.ycombinator.com/item?id=28045894

Accacin 4 years ago | | |

Interesting, thank you.

fareesh 4 years ago |

What seems useful to me is the ability to type in "function that takes the path to an image file and returns a new image file with rounded corners".

These are not groundbreaking problems - I'm generally looking for a solution out there that uses a popular library. This is especially useful in a language where I'm not up to date on what the de facto library of choice is for various use cases. In most cases, especially while prototyping, I'm not going to write it myself, nor care about which library is used - I'm far more concerned with some big-picture goal.

If someone builds a product that can do the work of Googling a solution for me, that's the draw of the product. The code is freely available anyway.

vrocmod 4 years ago |

I’ve been using Copilot for weeks now. It’s definitely useful for building upon what you already wrote. It’s very effective for single lines, but I don’t trust it to come up with entire functions. I tried, but obviously YMMV.

The licensing is definitely a problem, but I think that Copilot only highlighted the issue - it didn’t create it.

The concept of software license looks pretty fragile to me. You can own software but you can’t really own PL statements.

You can own the whole but you can’t really own the atomic parts that make the whole.

If so, closed-source is just a way to make you work really hard to achieve a result that someone else already achieved by means of obfuscation and secrecy. I’m not sure where open-source stands. Maybe it’s just a social contract.

tannhaeuser 4 years ago |

IANAL, but until it is settled whether software produced with the aid of Copilot, which potentially contains LGPL'd, GPL'd or even AGPL'd code fragments (you never really know, AIUI), is subject to those or other copyleft licenses, I think customers are well advised to steer clear of Copilot. To the best of my knowledge, GitHub won't provide legal shelter if customers get sued for xGPL violations; the GPL, OTOH, has sufficient case law to make using Copilot very risky.

dangoljames 4 years ago |

My biggest problem with Copilot is not how it's trained, but its targeting of Microsoft coding tools. I don't use Visual anything, and I don't know anyone who does. I code a lot of Python, HTML and JS, and I use Neovim. If I need a smart 'crutch' I'll whip out PyCharm.

Mostly I don't feel the need for such things, but it would be fun and interesting to see just how good copilot is.

Not fun enough to install visual whatsit though.

KronisLV 4 years ago |

Here's an excerpt from the linked FSF blog article: https://www.fsf.org/blogs/licensing/fsf-funded-call-for-whit...

  Areas of interest
  While any topic related to Copilot's effect on free software may be in scope, the following questions are of particular interest:
    - Is Copilot's training on public repositories infringing copyright? Is it fair use?
    - How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
    - How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
    - Is there a way for developers using Copilot to comply with free software licenses like the GPL?
    - If Copilot learns from AGPL-covered code, is Copilot infringing the AGPL?
    - If Copilot generates code which does give rise to a violation of a free software licensed work, how can this violation be discovered by the copyright holder on the underlying work?
    - Is a trained artificial intelligence (AI) / machine learning (ML) model resulting from machine learning a compiled version of the training data, or is it something else, like source code that users can modify by doing further training?
    - Is the Copilot trained AI/ML model copyrighted? If so, who holds that copyright?
    - Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?

While I do believe that the topic is definitely worthy of discussion, my question would be a bit different.

If the tooling is already pretty capable, wouldn't just ignoring all of the ethical questions lead to a market advantage? Say some company doesn't care about how the tool was trained and the implications of that, but just uses it to have their developers write software at 1.25x the speed of the competition, knowing that no one will ever examine their SaaS codebase or care about license compliance. Wouldn't that mean that they'd also be more likely to beat their competition to market? Ergo, wouldn't NOT using Copilot or tools like Tabnine put most others at a disadvantage?

Personally, I just see that as the logical and unavoidable progression of development tooling, the other issues notwithstanding, very much like IDEs became commonplace with their refactoring tooling and autocomplete.

I've worked with Visual Studio Code on large Java codebases, and I've also used Eclipse, NetBeans and, in the past few years, IntelliJ IDEA; with every next tool I found that my productivity increased bunches. Now it's at a point where the IDE suggests not only a variety of fixes for the code itself, but also for the tooling, such as installing Maven dependencies, adding new Spring configurations and so on. It would be hard to imagine going back to doing things manually, and it feels like in time it'll be very much the same with regard to language syntax or looking at documentation for trivial things. After all, I'm paid to solve problems, not to sit around and ponder how to initialize some library.

spywaregorilla 4 years ago |

Tangential, but isn't it kind of weird that Copilot is a code generator and not a StyleGAN kind of code refactorer? That feels like a much easier task because you get to infer the intent of code from an existing example rather than from context alone.

Applejinx 4 years ago | |

It gives rise to some interesting situations.

I'm an open source audio coder. I'm not any great shakes as a programmer but I make my living by regularly coming up with novel ideas, and my codebase is on Github and MIT licensed. Over the course of hundreds of DSP plugins, some key parts are very repetitive.

This means that there are audio processing algorithms I do which NOBODY ELSE is doing, because they're unusual and in some ways arbitrarily wrong. They're chosen to produce a particular sound rather than the textbook-correct algorithm output. Example: interleaved IIR filters, to make the audio interact differently in the midrange and produce a lower Q factor at the cost of producing some odd artifacts near the Nyquist frequency.

Nobody out there in the normal world or commercial DSP or academia would intend to do that, because there are significant reasons not to (which I work around, in context). But if that stuff appears in Copilot output, they are jacking my INTENT but violating the very lenient MIT license by stripping my credit. They'd also be misleading hapless audio programmers who didn't intend to adopt my techniques, but that's a side issue.

I'm interested in who else out there has a substantial codebase subject to Copilot reprocessing, who is demonstrating intent that isn't 'normal' and doesn't exist in the 'normal' world of whatever domain's being coded for.

The point is, can it be demonstrated that Microsoft is taking SPECIFIC things from specific open source developers that can be clearly traced back to one source of distinct intentions, and then stripping the licensing? I feel like said intentions cannot be 'normal and industry-standard and correct'. It's gotta be things like my IIR interleaving, where it's a quirky choice you wouldn't automatically do, very likely with costs and consequences in its own right. Something you could choose to adopt if you liked the trade-offs (or in my case, the sound).
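For readers unfamiliar with the "interleaved IIR" idea mentioned above, here is a minimal sketch of one possible reading of it. This is an assumption for illustration only, not code from any actual plugin: the function names, the one-pole filter, and the `coeff` parameter are all hypothetical. Even- and odd-indexed samples each get their own filter state, so each state effectively runs at half the sample rate.

```python
def one_pole_lowpass(samples, coeff=0.5):
    """Textbook one-pole low-pass: a single state updated on every sample."""
    state = 0.0
    out = []
    for x in samples:
        state += coeff * (x - state)
        out.append(state)
    return out


def interleaved_one_pole_lowpass(samples, coeff=0.5):
    """Hypothetical interleaved variant: two independent states, one for
    even-indexed samples and one for odd-indexed samples."""
    states = [0.0, 0.0]
    out = []
    for n, x in enumerate(samples):
        k = n % 2  # alternate between the two filter states
        states[k] += coeff * (x - states[k])
        out.append(states[k])
    return out


if __name__ == "__main__":
    # A signal alternating +1/-1 sits exactly at the Nyquist frequency.
    nyquist = [1.0 if n % 2 == 0 else -1.0 for n in range(200)]
    print(abs(one_pole_lowpass(nyquist)[-1]))
    print(abs(interleaved_one_pole_lowpass(nyquist)[-1]))
```

In this toy version the textbook filter attenuates a Nyquist-frequency input, while the interleaved one lets it through almost untouched, because each state only ever sees a constant value. That is the kind of "arbitrarily wrong" behaviour near Nyquist that would be unlikely to arise by accident in independently written code.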

docflabby 4 years ago |

GitHub is now a burning platform for free software - plan accordingly.

_pmf_ 4 years ago | |

It SourceForged itself.

codingdave 4 years ago |

> The reason is that Copilot requires running software that is not free, such as Microsoft’s Visual Studio IDE or Visual Studio Code editor the FSF contends, and constitutes a “service as a software substitute” meaning it’s a way to gain power over other people’s computing.

Hold up a second. So if people have already made the choice to run software that is not free... enhancing their chosen tool set is unjust? (Besides, VS Code is free.)

I'm honestly interested in understanding their perspective, but I'm not following the leap from using an extension in VS Code to gaining power over other people's computing.

toastal 4 years ago | |

Free as in beer. Microsoft's built-in tracking in the editor isn't freedom. There's VSCodium, which compiles the MIT-licensed project without telemetry, but at that point I'd use a different editor.

i386 4 years ago |

Copilot is a cool tech demo, but it’s a very bad idea to have low-skilled developers write boilerplate code that they can only edit once.

hankman86 4 years ago |

I’d imagine that GitHub will end up re-training Copilot, excluding any “copyleft” licensed code. Not because what they do is legally tainted, but to avoid being berated by the FSF and the bad press that ensues.

Once again though, the FSF makes “free software” less relevant and harder to use. Who will want to use such software for anything when being threatened with costly litigation and bad press?

darepublic 4 years ago |

When tech such as Copilot truly comes into its own, it should be a productivity silver bullet. I hope that at that point I will have access to it. Once we have senior-software-engineer coding as a service, if I had it just for myself I would hoard it and not be quick to share.

exporectomy 4 years ago |

> The FSF said there are legal questions pertaining to Copilot [...]

There have always been lots of untested legal questions about GPL & co. Why hasn't the FSF figured out what it is they do and don't want? Shouldn't knowing what the licenses actually mean and communicating that to people be their number one job? Why else do they exist? To spread feelings and confusion?

ksec 4 years ago |

>> We already know that Copilot as it stands is unacceptable and unjust, from our perspective. It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute.

So they don't know, or aren't sure about, the question of GPL usage in Copilot. But they have a problem with SaaS and products that are not open source?

nonameiguess 4 years ago | |

The entire purpose of the Free Software Foundation is that any product you offer to a user should be owned fully by the user, which means they should be able to take it apart, modify it to suit their specific needs, and put it back together. At minimum, that means they need to be able to see the source code and be able to build it themselves and run it on their own hardware.

So yes, closed-source software as a service is inherently unethical.

You don't have to agree with them, but they've been pretty consistent in this position for nearly 40 years. It's not exactly coming out of left field.

rightbyte 4 years ago | |

There is no contradiction in that.

ksec 4 years ago | | |

I don't know. I guess I am not well versed in the FSF. I thought they existed to promote Free Software; I didn't know their world view was that all non-open-source software is "unacceptable and unjust".

injidup 4 years ago |

Summary:

FSF claims the world will end. FSF offers 500 dollars for an intern to write a white paper studying the problem.