Hold up a second. So if people have already made the choice to run software that is not free... enhancing their chosen tool set is unjust? (Besides, VS Code is free.)
I'm honestly interested in understanding their perspective, but I'm not following the leap from using an extension in VS code to gaining power over other people's computing.
Once again though, the FSF makes “free software” less relevant and harder to use. Who will want to use such software for anything when being threatened with costly litigation and bad press?
There have always been lots of untested legal questions about GPL & co. Why hasn't the FSF figured out what it is they do and don't want? Shouldn't knowing what the licenses actually mean and communicating that to people be their number one job? Why else do they exist? To spread feelings and confusion?
So they don't know / aren't sure about the question of GPL usage in Copilot. But they have a problem with SaaS and products that are not open sourced?
So yes, closed-source software as a service is inherently unethical.
You don't have to agree with them, but they've been pretty consistent in this position for nearly 40 years. It's not exactly coming out of left field.
FSF claims the world will end. FSF offers 500 dollars for an intern to write a white paper studying the problem.
It's told in this talk, you can search for "printer" to find it < https://www.gnu.org/philosophy/rms-nyu-2001-transcript.txt >.
They promote Free Software (and specifically copyleft over permissive licenses) because they view proprietary software as morally wrong and something that should not exist.
> To release a nonfree program is always ethically tainted, but legally there is no obstacle to your doing this.
It paints a picture of a bigoted organisation.
[1]: <https://www.gnu.org/licenses/gpl-faq.html#ReleaseUnderGPLAnd...>
No, it is not. If it were, fair trade wouldn't be niche. It adds lots of other obligations on companies "based on their particular agenda". I think it is worthwhile goal and something I like to support, but won't pretend that it is somehow pure, self-evident goodness, just like non-fair trade is not pure evil.
Again, you might disagree or have different ideas what "free" is supposed to mean but you should be better than throwing around phrases like "agenda" or "nothing to do with ethics or morality".
Fair trade is based on pretty fundamental ethics, such as fighting slavery. Can you point any such concept being foundational to Free Software?
Not really, no. Again, lots of non-"fair trade" products exist and those are not against fighting slavery.
> Can you point any such concept being foundational to Free Software?
Seriously? Their whole shtick is fighting software practices they consider unethical. You might disagree whether that is fundamental ethics, but to them it is, and it is rude and dishonest to pretend that is not the case.
For example, I am not vegetarian, but understand that there are people who feel strongly about that. That is fine and if you go "vegetarians are pushing their agenda which has nothing to do with ethics" that says more about you than about them.
Don't be an asshole.
I think it's a reasonable position to take. Reducing the scope of fair use to strengthen copyleft is a double-edged sword: it simultaneously makes copyright laws more restrictive, and such a ruling can potentially be used by proprietary software vendors against the FOSS community in various ways. It's an issue that requires careful consideration.
[0] https://www.fsf.org/blogs/licensing/fsf-funded-call-for-whit...
Could it? Copyright law is FOSS's only protection. That's why it's witty - copyright law against copyright. Weakening copyright law in an ad hoc way is absolutely not good for FOSS. It's fine to rewrite copyright in a way that explicitly allows things like Copilot, as long as FOSS gets to copy bits of proprietary code, too.
Otherwise, after some appeals court judgement that the FOSS community failed to participate in (or even worse, subelements participated in on the wrong side) we're going to end up with a copyright practice that looks like the NFL exception in monopoly law.
This is exactly what I was thinking about. If Copilot is fair use, it means that all proprietary source code, as long as they're publicly available to read, will be free to use as training materials for a hypothetical free and open source machine learning project, which I think would be a good thing. An example is a proprietary program released under a restrictive "source available" license, you can read it but not reuse it under any circumstances (and I believe these projects are already included in Copilot's training data). This is why I said fair use can be a good thing and a ruling to reduce the scope of fair use can potentially be used by proprietary software vendors against the FOSS community.
It would be even better if training from all forms of available proprietary binary code can be fair use, too. It may allow the creation of powerful static binary analysis or code generation tools by learning from essentially all free-to-download proprietary software without copyright restrictions. However, the situation of proprietary binary code is more complicated here. Reverse engineering proprietary binary code is explicitly permitted by the US copyright laws, but the "no reverse engineering" clause in EULA overrides it, and this can be a bad thing. It makes FOSS's fair use right meaningless, meanwhile giving proprietary software vendors a free pass to ignore FOSS licenses.
Thus the outcome is unclear; it may go either way. This is why I said such an issue requires careful consideration.
But it is true that this proprietary product extracts its value on the basis of open source software exclusively.
Yes, it would be nice to have the source of Copilot in exchange, but I think far more important would be for third parties to have the same access to the code to provide similar tools.
Otherwise, I hope copilot makes it big. It'll create a new generation of developers that are dependent on these tools to do their work. Also it'll lower the barrier for non-software engineers to participate in writing code. SO copy pasting on steroids.
The resulting mediocre spaghetti will break at record-breaking rates; cleaning up the mess will be highly lucrative!
I do not care if it breaks code to bits and recomposes them again regurgitated by <YOUR-LATEST-AI-TECHNIQUE-HERE> in a way that is untraceable: it would not work without learning from our open source code. Code produced by this method should be automatically licensed under the most restrictive license of its input used for learning.
From what I've seen, copilot really lowers the barrier to writing buggy code. If indeed it does turn out to be a tool that lends itself to machine gunning rather than shooting yourself in the foot, it almost doesn't matter who owns what IP.
The relentless attempts at developer commodification will, of course, continue, but I can already sense this one ending up like the developer outsourcing craze of the mid-2000s that the Economist also got a little too excited about.
I'll accept the ethics of copilot when they add the source code for Windows, Azure and Office to their training set, because only then will MS truly reflect that their model doesn't cross the spirit or even letter of any licensing.
The majority of suggestions are not quite what I want but then I’ve found the more I comment my code the more personalised the suggestions get and consequently (as a solo founder in my own startup) copilot finishing my code for me during late nights trying to ship features for customers before the following day is something I have become grateful for.
It’s a double edged sword because it’s enabling me to grow my business and remain self employed, but I also understand the concerns and at the end of the day it’s not something I need to do my job (like version control or an IDE for example), but more of a nice to have…
I have my own GPL software out there. Most of the time I think it doesn't really get used, so it's not that much of a concern to me. I imagine it's like that for other devs too.
I suppose if you're MongoDB (similar to GPL/used to be) or some big company you care more.
The key argument for why SaaSS is ethically wrong is that it denies control over a computation that I could do on my own.
> "The clearest example is a translation service, which translates (say) English text into Spanish text. Translating a text for you is computing that is purely yours. You could do it by running a program on your own computer, if only you had the right program. (To be ethical, that program should be free.) The translation service substitutes for that program, so it is Service as a Software Substitute, or SaaSS. Since it denies you control over your computing, it does you wrong. (emphasis mine)"
I don't find that argument very convincing because it implicitly assumes that there is no alternative translation program that I could run on my own computer.
However, if there is an alternative, then a SaaS offers me choice. I can run a program on my own computer, e.g., if I am concerned about data privacy, or service reliability. The downside is that I have to install and maintain the software on my computer. Or, I could use an external service. The upside is that the barriers of use are minimal.
Of all the articles by RMS I have read so far, I find this one the least convincing.
[1] https://www.gnu.org/philosophy/who-does-that-server-really-s...
What I'd like to see is a copilot for scientific papers. There's so much duplication out there that it would be easy to train, and it would save tons of time on the chore of writing and referencing the same things over and over.
Sometimes it blatantly copies GPL code without my knowledge.
Sometimes I myself write code that could be part of a GPL code-base, without knowing.
Funny thing is, the difference here isn't the actual code that's written, but that Copilot has seen many GPL code bases and I didn't.
Sometimes I really have the feeling Copilot understands my code base and suggests code that seems to be custom tailored to it. Albeit in most of the cases it doesn't fit 100%.
I think the latter cases are when Copilot shines and doesn't violate GPL code at all, but can I be safe? Probably never.
The FSF said it was unacceptable because it's proprietary, like Github in general.
They've made no statement about the specific details of Copilot.
Even at my place of work, there were some expressing interest in it, and after playing for an hour or two, haven't touched it since. I get the impression there are more people discussing it than actually using it.
These are not groundbreaking problems - I'm generally looking for a solution out there that uses a popular library. This is especially useful if it's a language where I'm not up to date on what the de facto library of choice is for various use cases. In most cases, especially while prototyping, I'm not going to write it myself, nor care about which library - I'm far more concerned with some big picture goal.
If someone builds a product that can do the work of Googling a solution for me, that's the draw of the product. The code is freely available anyway.
The licensing is definitely a problem, but I think that Copilot only highlighted the issue - it didn’t create it.
The concept of software license looks pretty fragile to me. You can own software but you can’t really own PL statements.
You can own the whole but you can’t really own the atomic parts that make the whole.
If so, closed-source is just a way to make you work really hard to achieve a result that someone else already achieved by means of obfuscation and secrecy. I’m not sure where open-source stands. Maybe it’s just a social contract.
Mostly I don't feel the need for such things, but it would be fun and interesting to see just how good copilot is.
Not fun enough to install visual whatsit though.
Areas of interest
While any topic related to Copilot's effect on free software may be in scope, the following questions are of particular interest:
- Is Copilot's training on public repositories infringing copyright? Is it fair use?
- How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
- How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
- Is there a way for developers using Copilot to comply with free software licenses like the GPL?
- If Copilot learns from AGPL-covered code, is Copilot infringing the AGPL?
- If Copilot generates code which does give rise to a violation of a free software licensed work, how can this violation be discovered by the copyright holder on the underlying work?
- Is a trained artificial intelligence (AI) / machine learning (ML) model resulting from machine learning a compiled version of the training data, or is it something else, like source code that users can modify by doing further training?
- Is the Copilot trained AI/ML model copyrighted? If so, who holds that copyright?
- Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?
While I do believe that the topic is definitely worthy of discussion, my question would be a bit different. If the tooling is already pretty capable, wouldn't just ignoring all of the ethical questions lead to having a market advantage? Say some company doesn't necessarily care about how the tool was trained and the implications of that, but just utilizes it to have their developers write software at 1.25x the speed of the competition, knowing that no one will ever examine their SaaS codebase or care about license compliance. Wouldn't that mean that they'd also be more likely to beat their competition to market? Ergo, wouldn't NOT using Copilot or tools like Tabnine put most others at a disadvantage?
Personally, I just see that as the logical and unavoidable progression of development tooling, the other issues notwithstanding, very much like IDEs became commonplace with their refactoring tooling and autocomplete.
I've worked with Visual Studio Code on large Java codebases, as I've also used Eclipse, NetBeans and in the past few years IntelliJ IDEA; with every next tool I found that my productivity increased bunches. Now it's to a point where the IDE suggests not only a variety of fixes for the code itself, but also for the tooling, such as installing Maven dependencies, adding new Spring configurations and so on. It would be hard to imagine going back to doing things manually, and it feels like in time it'll be very much the same way in regards to the language syntax or looking at documentation for trivial things. After all, I'm paid to solve problems, not sit around and ponder how to initialize some library.
I'm an open source audio coder. I'm not any great shakes as a programmer but I make my living by regularly coming up with novel ideas, and my codebase is on Github and MIT licensed. Over the course of hundreds of DSP plugins, some key parts are very repetitive.
This means that there are audio processing algorithms I do which NOBODY ELSE is doing, because they're unusual and in some ways arbitrarily wrong. They're chosen to produce a particular sound rather than the textbook-correct algorithm output. Example: interleaved IIR filters, to make the audio interact differently in the midrange and produce a lower Q factor at the cost of producing some odd artifacts near the Nyquist frequency.
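For readers who don't do DSP, here is a minimal sketch of what "interleaved IIR" could mean. This is only one possible reading and not the commenter's actual code: a one-pole lowpass where even and odd samples each keep their own filter state. The function name and the `coeff` parameter are made up for illustration.

```python
def interleaved_lowpass(samples, coeff=0.5):
    """One-pole lowpass where even- and odd-indexed samples use separate state.

    Illustrative assumption only: each sample is smoothed against the sample
    two steps back (same parity) instead of the immediately preceding one.
    """
    state = [0.0, 0.0]  # state[0] for even indices, state[1] for odd indices
    out = []
    for i, x in enumerate(samples):
        k = i % 2
        # Standard one-pole update, applied to the state for this parity only.
        state[k] = state[k] * (1.0 - coeff) + x * coeff
        out.append(state[k])
    return out


if __name__ == "__main__":
    # A constant input still settles toward the input value.
    print(interleaved_lowpass([1.0, 1.0, 1.0, 1.0]))
    # A signal alternating at the Nyquist rate is not smoothed away, because
    # each state only ever sees samples of one sign.
    print(interleaved_lowpass([1.0, -1.0, 1.0, -1.0]))
```

In this toy version the alternating input passes through at nearly full level, whereas a plain one-pole filter with the same coefficient would settle at about a third of it. That is at least consistent with the "odd artifacts near the Nyquist frequency" mentioned above, and it is the kind of deliberate, textbook-wrong choice that would be recognizable if it showed up elsewhere.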
Nobody out there in the normal world or commercial DSP or academia would intend to do that, because there are significant reasons not to (which I work around, in context). But if that stuff appears in Copilot output, they are jacking my INTENT but violating the very lenient MIT license by stripping my credit. They'd also be misleading hapless audio programmers who didn't intend to adopt my techniques, but that's a side issue.
I'm interested in who else out there has a substantial codebase subject to Copilot reprocessing, who is demonstrating intent that isn't 'normal' and doesn't exist in the 'normal' world of whatever domain's being coded for.
The point is, can it be demonstrated that Microsoft is taking SPECIFIC things from specific open source developers that can be clearly traced back to one source of distinct intentions, and then stripping the licensing? I feel like said intentions cannot be 'normal and industry-standard and correct'. It's gotta be things like my IIR interleaving, where it's a quirky choice you wouldn't automatically do, very likely with costs and consequences in its own right. Something you could choose to adopt if you liked the trade-offs (or in my case, the sound).
As a freelancer, every time a client opts for a cheaper alternative, I make it very clear I would be delighted to work with them in the future anyway. It rarely fails: one or more years later, the client calls me back because their cheap alternative turned out to suck and be expensive eventually. Last month, a client from Luxembourg called after 6 years of total silence. They still had me in their listing. 3 years ago, one called me because 2 years prior, the 50k quote they rejected from me turned into a 400K bill from my competitor, and still no release yet.
My rates have been steadily increasing for years thanks to this. Before, geeks were at a disadvantage because people didn't know better, and teams with good marketing would destroy us. But now, they have been burned so many times. And it pays because more and more devs coming to the market are becoming dependent on their tooling. Now, more often than not, I work with teams that have been copy/pasting git commands not knowing what they do, that have never, ever looked at the source code of their framework or don't know how to use a debugger. The HN bubble tends to blind us to the reality of the corporate world.
Yesterday I did a deployment, but was not allowed to touch the machine. Instead, they made me call a guy sharing a screen of a Vista machine, while he was sshing into prod using cmd.exe, and I had to dictate to him the instructions to debug the deployment on their custom linux setup. A nearly retired sysadmin who couldn't type with 10 fingers, pressing the up arrow 30 times to find a bash command in the history every time. He could click on WinSCP very well though.
This 20-minute job turned into an afternoon of billing.
Though I suppose that's what I look like as a Python expert to an old timer from the 80s that can code in assembly, debug using strace and understand L1 cache :)
People are scared we are going to get automated away by AI.
I am preparing for the most lucrative decade I ever worked in.
Hey, I feel called out on this. I type with 2/3 fingers and I'm quite fast even if not like a full fledged 10 fingers typist. At my age I don't think I will ever learn typing with 10 fingers, but I think I don't need it either.
Thank you for reminding me that it's about time for my yearly reread of Pro Git. It's amazing how many people look at you like you're a wizard when you just...read the documentation.
Maybe, maybe not though. From the perspective of a non-tech enterprise organisation we’ve moved to more and more standardised software that is “good-enough” to avoid dealing with the delays, going over budget, not quite what we wanted and expensive support of specialised software companies.
Office365 has basically replaced half our software suite, and while we do still buy some extensions for it from 3rd party companies, Microsoft is getting more and more of our business by simply being good enough at a low enough cost.
I’m not going down on some conspiracy path here by the way. If anything, Microsoft is simply using this project to get free research for their Azure Automation services that are currently taking over all the RPA business from their much more expensive competitors. This needs janitors, but not well paid ones.
The client has a problem and asks us for a solution. We suggest a simple cost effective solution, client insists on custom software developed to their spec they have "perfected". Client lists all their nice to haves as must haves so they get their moneys worth, not realising I just charge more for more work.
The software is delivered to spec, then the client realises that their spec doesn't work in the real world because they just assumed the best and forgot about edge cases.
Non-tech companies just don't get tech. Instead of seeing building software like building a house, they view it more like a wizard doing magic and then a website appears.
Funnily enough, the only thing you can say is that you can copy and paste more successfully with Copilot than someone else who is unnamed and possibly unknown.
The truth of it, whatever that may be will shake out.
It's even worse. Copilot encourages boilerplate and poor abstraction by making it cheaper.
There's definitely other scenarios, like my preferred one of Copilot being legal itself but devs being responsible for using code generated from it, same as if they were using a more direct copy-paste or search tool.
I think there's utility there, but the execution isn't quite right with me for encouraging the right behaviours.
It always amazes me how many people think in the future every one will know how to code, why? Every one has a car but not every one knows how to build a car, and why would they want to?
That being said, if you are a reasonably skilled coder, Copilot can help you __a lot__. I've started using it a few weeks ago and it typed __a lot__ for me. The sheer amount of time I don't have to spend typing amounts to hours / week once you get up to speed with the tech.
I tried my hand at Dreamweaver back in the day (approx. 2005) and didn't like what it generated. I've written a few pages manually (first when learning HTML ca 2004, then for a personal website in 2014) and it felt much nicer.
I have made Windows applications with GUIs as part of my job and for that I've mainly used WPF written as a mix of XAML and C#, written by hand and inspected in the editor. There are graphical tools, but I've mostly found it more efficient to write what you mean directly.
But how are things actually done in the web development business, nowadays?
But how do you tell the machine what your problem is? It's just another abstraction, a bit like manual milling vs CAM.
Conceptually that is the overall goal.
I've been in the industry for 20 years, and in my spare time I'm gradually learning house renovation skills. I think in the next 10-15 years I'll lose my job as a software developer due to AI and will resort to some manual labor. Hoping I survive till retirement.
For those who aren’t aware, PCB design used to be an automated task, done by software with minor tweaks. The thing is, complexity had a positive payoff, so soon we had trained technicians doing layout. Right now most PCB layout requires so much technical knowledge that most people working in layout are engineers with master's degrees.
Of course there’s also a lot of cheap electronics where complexity doesn’t pay off and cutting development cost is what matters, but it’s not most of the market.
As long as you keep learning and improving, you are likely to see an increase of demand, not a decrease, although the job will be quite different.
The hard part of code writing is not the “transform this logic to code”, but to come up with the logic in the first place, which is pretty much transform this and that requirement into logic first. Which does often need domain specific knowledge, and possibly interaction with the client.
(Still learn to renovate as it is an awesome skill)
It still amazes me when people doubt or underestimate what can happen in future tech.
You don't think it's unethical for a human to learn from open source code and then write their own private code, surely?
What is the copyright of code written with copilot? Copilot learns the code and forgets authors.
Would you agree if I take your open source project, learn piece by piece, rewrite it from scratch and put my name on it without a single word about your work?
Computers read, process according to predefined algorithms, and output. A computer "learns" code when it comes over a wire in pieces over a bus, and writes code when it transmits it over a bus to another device.
How's that? The entire point of a cleanroom re-implementation is that the, er, entities (historically, human programmers) writing the code have provably not seen the code being copied. Which is rather contrary to how copilot has seen approximately all the code.
Now you can just grab any leaked code of a closed source program, feed it into your AI and get back code you can license under the GPL and nobody can do anything about it.
An easy application I can think of is ZFS; simply feed the AI all CDDL licensed code, then ask it to reproduce ZFS. Probably will have some bugs but it would be licensable under GPL if the AI is considered a whiteroom.
[1]: https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright...
There is no barrier to writing buggy code. Writing buggy code is considered trivial in any language.
There's a constant tension between building fast and right (or should be if you're not fucking up).
The part that takes the longest is working out the tests and what the code should do, the actual internals of the implementation are simple, boring, and obvious.
Automate that and it makes developing even more fun than it is today.
Or, if you're not, you should be.
But, if copilot instead suggests just writing out the contents of the library directly into your code base a lot of people will do just that. That'll be lots of fun when you're trying to track down obscure bugs in huge piles of murky "copilot assisted" code.
It'll be especially bad in environments where developers feel either extrinsic or intrinsic pressure to always write more SLOC and churn out more PRs because it will allow developers to create a very compelling illusion of productivity.
I have a feeling this will be one of the long term side effects of copilot. I'm actually suspicious that this dynamic will blow away all of the productivity gains and then some and might lead to companies banning its use when they realize the true costs of sifting through the GPT spew.
LOL, it's been happening since the beginning of software. So many things reduce or replace developer work - compilers, libraries, templates, free/open tools. Desire is always going to expand to contain the whole space of what's possible and then overflow.
Microsoft can of course create Copilot using the GitHub code. It’s not publishing any derived work on its own - and this type of access to the code is likely a large part of the reason for buying GitHub in the first place.
The only ethical issue for Microsoft here is if Microsoft sells this service (they don’t - yet) and risks including nontrivial code without attribution (seems likely, given the behavior of the preview, but if MS for example limits output to a few lines or prevents generating too large chunks verbatim, the issue almost disappears).
Ethical/legal issues and risks for users of Copilot are much larger, such as if they use it to conjure up a nontrivial snippet and then not research the origin of it. It’s no better than copying it from the original location.
Microsoft could probably throw in parts of their closed source in copilot - but not even Microsoft controls that. Third parties have copyrights that prevent it too.
But people who keep code in public GitHub repos (I assume) let GitHub do things like train neural nets on it, and Microsoft obviously don’t keep much of the windows or office sources in public GitHub repos.
The fast inverse square root is the most nontrivial code I can think of and it's already been found to appear in suggested snippets, with attribution nowhere to be found.
If we accept Copilot as merely a tool, we'd need to consider any developer using that tool to be immoral. There's no discernable difference between shamelessly copy/pasted code and Copilot output, so why consider the tool more than an automated clipboard?
No, I think the tool is built wrong, setting users up to fail. It's a copyright footgun to produce buggy, vulnerable, often even completely wrong code.
As for the copyrights, all code with a license has the same copyright as any private code hosted on their own servers. You can't just plug some GPL code into your project and sell it, even if you can find the code itself on Google. There is no copyright difference between the projects, it's merely a matter of availability to the scanner.
Adding Microsoft's own, proprietary, quality code to the network would be the gesture of good faith that would make me believe that the developers never intended to break any licenses and that it all just got out of hand.
At least part of the reason has to be because only a tiny percentage of developers use C++, particularly the flavor of C++ that Visual Studio speaks, as opposed to Javascript, Python, etc. Moreover, kernel and driver code doesn't resemble boilerplate code used in desktop applications. Is this not obvious to the people who keep repeating this?
The C and C++ boilerplate Microsoft uses is very much relevant to any driver development or native application development (if that still exists) for their platform. Their example code, MSDN snippets and documentation are very influential to anyone using C++ for Windows applications. Their COM+ libraries are even more relevant because they all live in user land.
There's also plenty of MS code that's written in other languages for platforms like Azure or UWP.
Then there's the C and C++ code that's out there on Github. The C style of the Linux kernel, forked over and over, is completely useless for anyone developing network tools. The GTK or Qt C++ files are useless for anyone writing wxWidgets code. The conventions, behaviours and style for the source code of libcurl and Linux are as distant from each other as Windows Explorer is from the NT kernel. Yet both have been taken into account by the mighty Algorithm.
How is my shitty early Android app, still written in Java, with clearly C#-inspired naming and almost PHP-like class structure more relevant to anyone than Microsoft's own code base? At least theirs is functional and useful.
"Nobody programs like Microsoft so the code examples are useless" is not an excuse, because you can apply it to almost every project on Github in some way. The machine learning is supposed to distinguish all of that, that's the entire point.
1. they can check to see if the generated code is an exact copy of an example in the training set
2. when the code matches, they can discard it, they got many predictions for each prompt anyway.
3. My preferred option - they can display the URL of the source page together with the code, acting like a regular search engine at this point; this also solves the problem of not knowing the copyright status of the code
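To make the three options above concrete, here is a rough sketch of how such a filter could look. This is not how Copilot actually works; every name here (`build_index`, `filter_suggestions`, the `mode` flag, the example URL) is made up for illustration, and "exact copy" is approximated by hashing whitespace-normalized code.

```python
import hashlib
import re


def normalize(code):
    """Collapse runs of whitespace so trivial reformatting does not hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code):
    """Hash of the normalized code, used as the lookup key."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def build_index(training_examples):
    """Map fingerprint -> source URL for each (url, code) pair in the training set."""
    return {fingerprint(code): url for url, code in training_examples}


def filter_suggestions(candidates, index, mode="discard"):
    """Return (code, source_url_or_None) pairs.

    mode="discard":   drop candidates that match a training example (option 2).
    mode="attribute": keep them, but attach the source URL (option 3).
    """
    results = []
    for code in candidates:
        url = index.get(fingerprint(code))  # option 1: exact-copy check
        if url is None:
            results.append((code, None))
        elif mode == "attribute":
            results.append((code, url))
        # In "discard" mode a matching candidate is silently dropped.
    return results


if __name__ == "__main__":
    # Hypothetical training data and model output, for demonstration only.
    training = [("https://example.com/repo/a.py", "def add(a, b):\n    return a + b")]
    candidates = ["def add(a,  b):\n  return a + b", "def sub(a, b):\n    return a - b"]
    index = build_index(training)
    print(filter_suggestions(candidates, index))
    print(filter_suggestions(candidates, index, mode="attribute"))
```

A hash lookup like this only catches verbatim copies modulo whitespace. Renamed variables or reordered statements would slip through, so a real system would presumably need some fuzzier matching.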
It feels like the majority of my coding consists of translating extremely complex business requirements that neither the business people nor me understand 100% into highly specific code that appears to do what we want it to do. How can Copilot help me here?
For example - code quality aside; just for the sake of demonstration - if I have a line `name = data.get("name")` and then press enter and write "ad", it'll likely suggest `address = data.get("address")`, so I can type "[Enter]ad[Tab]" and save myself a few seconds.
Repeat this for every line in a program, and those seconds add up. I'm a fast typist, but it's still nice to have intelligent autocomplete that can infer my intentions with pretty good accuracy.
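As a toy illustration of that kind of suggestion (a hand-written rule, nothing like the statistical model a real autocomplete tool uses), something along these lines would produce the `address` completion from the example. The `suggest` function and the `known_keys` list are hypothetical.

```python
import re

# Matches lines of the form: <var> = <obj>.get("<key>")
PATTERN = re.compile(r'^(\s*)(\w+) = (\w+)\.get\("(\w+)"\)$')


def suggest(previous_line, typed_prefix, known_keys):
    """Suggest a line shaped like the previous one, for a key matching the prefix.

    known_keys is an assumed list of identifiers seen elsewhere in the project.
    Returns None when there is nothing sensible to suggest.
    """
    match = PATTERN.match(previous_line)
    if not match or not typed_prefix:
        return None
    indent, _var, obj, previous_key = match.groups()
    for key in known_keys:
        if key.startswith(typed_prefix) and key != previous_key:
            return f'{indent}{key} = {obj}.get("{key}")'
    return None


if __name__ == "__main__":
    print(suggest('name = data.get("name")', "ad", ["name", "address", "email"]))
```

The point is only that the previous line plus two typed characters carry enough information to predict the whole next line. A learned model gets there without anyone writing the rule by hand.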
I'm guessing Copilot will largely be similar, but with support for multiple lines. It'll probably be especially helpful for imperative, somewhat repetitive languages like Go, where boilerplate is common.
Something tells me copilot can be a big win for very verbose languages like C, where most code follows very specific structures.
You’d be surprised at how much code is “borrowed”
>With all these questions, many of them with legal implications [..] there aren't many simple answers. To get the answers the community needs, and to identify the best opportunities for defending user freedom in this space, the FSF is announcing a funded call for white papers to address Copilot, copyright, machine learning, and free software.
Their "unacceptable and unjust" opinion comes just from the licensing of GitHub Copilot / Visual Studio Code itself:
>We already know that Copilot as it stands is unacceptable and unjust, from our perspective. It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute. These are settled questions as far as we are concerned.
> We already know that Copilot as it stands is unacceptable and unjust, from our perspective. It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute.
On the question of the use of source code released under the GPL, they do not have a position yet:
> With all these questions, many of them with legal implications that at first glance may have not been previously tested in a court of law, there aren't many simple answers.
I think people value their code snippets way, way too much. A 10-line function to post to Twitter is not worth anything. It's an entire codebase that has value.
It also makes sense, as someone putting their code under BSD, for instance, shouldn't be bothered by copilot regurgitating their code.
[Edit: I mixed BSD and MIT, I was going for the more permissive one. The point on reproducing copyright mentions still stands though]
I don’t agree in general with this. Remember that the BSD family of licenses still require that a copyright notice and the terms of the license be reproduced in distribution of the software and derivatives of the software, both in source and in binary forms.
Just cause I make software that I release under the ISC license, and I want proprietary software to be able to build on my work, does not mean I am ok with someone stripping away the copyright notice and the license terms from my code and claiming it as their own. Quite the opposite.
However, at the same time, if what is being reproduced is only a small snippet or some generic code, as I understand is what Copilot will usually do, I don’t personally mind. But I still think it needs to be tried in courts and that we get some rulings on it.
And I remain skeptical towards Copilot because I think it will be able to reproduce non-trivial portions of code as well, depriving people of credit for a lot of hard work that they put in. At the same time, it is cool tech, and it looks to have the potential to save a lot of time for a lot of its users by automating a lot of menial work in typing out the same old lines of code again and again. So it’s not like I am directly opposed to Copilot either. But I think we need to acknowledge the issues and that Microsoft and GitHub should work to address these kinds of things. And I am happy that the FSF is challenging them on these things, even though they are doing so from the point of view of a family of licenses that is more restrictive than the type of license I personally put on the code that I myself produce.
My impression is that these claims would not be actionable for a few simple reasons:
- The generated code is pretty small.
- The generated code is adapted to the context (i.e. not a verbatim copy).
- The generated code would be common to many repositories and not just one.
Because of all of the above, tracing any code fragment to a specific repository and then defending a claim would probably be very hard/impossible. Copyright is about the form of things and if it's not a verbatim copy of something really unique, it's hard to make the case for an infringement.
Everyone thinks this until they become the next Linksys, and have to crack open their entire tech stack because someone reverse engineered the text of the GPL in their firmware...
Not saying that I condone it or anything like that. However, it does feel like these things will oftentimes be ignored because of the lack of a regulatory body that'd inspect all codebases for compliance (even the idea of which doesn't feel feasible).
Because of that, cases where someone has both the skills to decompile a codebase and also has an axe to grind seem like the exception, rather than the norm.
FWIW this seems to be the current interpretation of copyright laws when it comes to machine learning, at least in the US. The only questions I've really seen about the legality of Copilot are about it reproducing code and whether that reproduction is fair use or not. But few are arguing that training the model itself on any available source is violating fair use.
I think this is a sensible take. An AI should be able to learn to program from any source code it can see, just like a human.
> But few are arguing that training the model itself on any available source is violating fair use.
People argue this all the time on HN.
But these same people seem to believe it is just pasting bits of code it has seen before together, so I suspect they don't have the technical or legal understanding to comment sensibly.
Like with sed(1)? It’s still just source obfuscation.
There are plenty of bug classes which are trivial in any language; plan interference is a good example. Languages provably cannot avoid these bugs entirely, just make them less easy.
An AI cannot hold copyright however and isn't capable of violating copyright (legal entities are, which an AI is not).
It literally applies to a human. Copyright is about reproducing the same work. "Transforming" the work means copyright doesn't apply.
Most of your brain is trained on ideas that come from someone else's proprietary IP, whether you realize it or not. Think about that next time you're unintentionally humming that catchy tune from a Coca Cola commercial.
The copy/transform distinction isn't just about fair use parodies or commentary, but things like writing music or drawing paintings or writing fictional books in a similar style to someone else (and using some of the same ideas).
The crux here is that we can't accept that machine learning is "learning". We think of it as copying, therefore subject to copyright.
It doesn't help that Copilot in edge cases will copy. But in many cases the resulting snippet is substantially a new work.
But AI is inevitable, and therefore we'll have to start treating machines like human agents. It'll be really weird.
Note though that all such examples of nontrivial regurgitation that have been presented yet have been deliberately “triggered” (as far as I know) knowing they would likely show up if copilot was fed the function header. It’s also important to remember that this is still preview software. The final version hopefully has more restricted output since this is obviously the big weakness of the system.
I agree it’s a license footgun 100%. But as I said, this is the developer’s problem. Which is why few of us will ever be able to use it in its current form.
As for the ms sources argument - the reason ms bought GitHub is to have this kind of access to a lot of code. It’s their code to use in this way. People who committed code gave GitHub (and its future owners) the right. Microsoft (as far as I understand) can sell the right to view this code, for example, through GitHub fees. It’s not against the license of a GPL repo to do so. So Microsoft isn’t violating a license by mangling the code into snippets and charging for the pleasure of downloading those snippets. What’s against the license terms is for me to download the snippet, and accidentally use it in my proprietary software. Does that make the tool bad to the point of being useless? Perhaps. Is it illegal or unethical? I don’t think so.
> You can't just plug some GPL code into your project and sell it, even if you can find the code itself on Google.
Although some people seem to think copilot can be used to “wash” licenses by giving users a black box “excuse”, I think that idea is dead in the water. Anyone who has a nontrivial-enough GPL snippet in their proprietary code has violated the license.
It feels like cheating, but I put my imposter syndrome in a trash bin years ago.
C's even easier; you just write the C code with parens around everything, then run it through cpp, then correct any divergences from the expected code. (It does take a bit longer, though, because the compiler waits until the last minute to shout at you if you make a syntax error.)
I can see complicated C++ templates taking hours, but they're not really macros. (They're probably the correct tool for this, though.)
We still write HTML (ish), but then we compile it into JavaScript functions, which then generate HTML again on the client.
I'm being slightly facetious, but that's basically what a modern web app written with Vue/React/Svelte/Angular does.
Both are driven by vast amounts of data processing that can't be done locally, both because you don't have the horsepower and because you don't have the bandwidth or pragmatic data access to terabytes of source code.
So instead of vacuous appeals to emotion, it's better to justify our opinions with objective reasons other than "but I want to have this". We all want things, but we're not entitled to them.
Aside from the fact Copilot literally can only be offered as a service due to its nature (unless you want to sound like one of those jokes where "you downloaded the internet to your USB stick"), everyone is free to offer a service precisely how they decide.
They're not obligated to give you anything they don't want to. They don't have to listen to you, or FSF, or anyone else about what they consider, arbitrarily, an "absolute disgrace". You use it or you don't use it. Simple as that.
P.S.: I consider it an absolute disgrace that ice-cream is not free, but this argument never seems to work in practice.
No, it's not. The discussion that we're having is over whether this is permissible or not, and the lobbying that groups such as the FSF are doing is in support of a different set of rules to be enforced.
None of this is simple.
If one were to "decompile" an existing artificial neural network model, is this basically what it'd look like inside? Or is it too crude of an analogy / a category mistake?
I think many companies are already in that exact scenario.
Messy code should be more costly to analyze and test.
It seems to me that the licensing part is the part you can't throw into a big markov chain, legally. Even if they aimed only at open-source licensed material without exception, the point where they discard all the licenses and export a 'generic' slurry is the point where they infringe by definition. If they trained on more restrictive licenses that's just doubling down: what's needed is annotation and maintenance of what bits of code came from what licensing pool. You could well have a giant pool of GPL, a giant pool of MIT (which I would be in, all the more since I maintain a very automatable code style that's easy to import from). You could accumulate a list of sources for anything you did, at whatever level of granularity is desired.
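A rough sketch of the bookkeeping described above: keep each snippet tagged with where it came from and which license pool it belongs to, so that the sources behind any output can be listed afterwards. The `Snippet` type, the `build_pools` and `attributions` helpers, the repository names and the license identifiers are all hypothetical and only illustrate the idea; how a trained model would carry this information through to its output is left open.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    code: str
    source: str   # hypothetical repository identifier
    license: str  # e.g. "GPL-3.0-only" or "MIT"


def build_pools(snippets):
    """Group snippets into one pool per license."""
    pools = defaultdict(list)
    for snippet in snippets:
        pools[snippet.license].append(snippet)
    return dict(pools)


def attributions(used_snippets):
    """Return the sorted, de-duplicated (license, source) pairs behind an output."""
    return sorted({(s.license, s.source) for s in used_snippets})


# Hypothetical corpus, for illustration only.
corpus = [
    Snippet("int add(int a, int b) { return a + b; }", "example/gpl-project", "GPL-3.0-only"),
    Snippet("def add(a, b): return a + b", "example/mit-project", "MIT"),
    Snippet("def sub(a, b): return a - b", "example/mit-project", "MIT"),
]
pools = build_pools(corpus)
credits = attributions(pools["MIT"])
```

Restricting generation to a single pool (say, only `pools["MIT"]`) and shipping `credits` alongside the output is the granularity trade-off the comment describes.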
The purpose of throwing away this attribution is intent to infringe. It's constructing a machine for the explicit purpose of grinding code into sludge of intentionally small enough pieces that, if you reconstruct copyrighted code in your markov-chainy way, you've got grounds for pretending you didn't build your machine to do exactly that.
I believe all laws about intent have to deal with determining who is pretending and who isn't. But these laws still exist, because there are ways to prove such things.
FSF is an ideological organization, they're a bit like the religious equivalent of some clergy in the far East.
Yes I know a circle of people respect FSF a lot and pay attention every time they wave their fingers at someone. The same is also true of the ayatollah when he issues a fatwa. And then the world keeps spinning and nothing changes.
If it was indeed written from scratch, I see no reason (although it’d feel nice) to credit my original work. Having multiple implementations of an idea is always a great thing.
Would this kind of copying be fine in software and not in other scientific papers or other industrial processes? Would it be fine if I train copilot on a patent database and start creating new patents (at a rate at which it would be impractical to determine that it is regurgitating prior art)?
Open source would be ruined if it were easier to build upon past works with lower barriers to research and licensing?
> Would this kind of copying be fine in software and not in other scientific papers or other industrial processes?
Scientific papers are more about collecting and experimenting with novel data- and referencing an explicit paper trail of past results. It's not really comparable. Fiction is a better match.
> Would it be fine if I train copilot on a patent database and start creating new patents (at a rate in which is would be unpractical to determine that it is regurgitating prior art)?
This is a problem with the patent system, not copilot, and it also isn't a capability that copilot actually has. You're describing a different system entirely.
If a human does that we call it plagiarism.
If the AI could be shown to have copied the code, it would likely be found to be infringement.
If it was found to have generated new unique code, and merely learnt how to program from the code it was trained on, it likely wouldn't.
In either case, this is different to a clean-room implementation (which I think is what you meant by "white room").
Clean-room implementations are supposed to protect against trade secret infringement, and are mostly used when building interop with hardware (where compatibility has special carve-outs).
If a person or AI had seen copyrighted code used in the project, it would never be considered clean room.
But CDDL code is fine for a person or AI to learn from when building a new, incompatible implementation that doesn't share any code.
There's a fluency effect that happens when you're able to touch type without looking, similar to when you master a language enough to speak without thinking. You become more efficient because you dissolve the barrier between your thoughts and their expression.
I know there are lots of arguments intending to counter the importance of touch-typing in programming ("most of my time is spent thinking"), but I think those miss the point. Faster typing is just as valuable whether you're programming, or writing an email, or responding to a message.
Usually when someone says "touch typing" they refer to the standard "home row" approach, using all fingers. I could have made that more explicit.
Only during the most rare occasion, those 10 fingers will give up their self-designated posts and come to this massive array of buttons to do their exercise of pressing stuff, something like "su apt install", "dock run", "cd.." etc.
I'd say type fast != work fast, so I don't mind if someone is a slow button presser :)
I type with only two/three fingers. I don't hunt and peck, but I'm no touch typist either. I can type faster just with my index fingers than some people with all of their fingers.
That said, typing speed is not critical. I mean, if you're really slow I guess it matters, but it's no measure of the quality of your work. The brain is the bottleneck here, and all the slowness happens in the design/troubleshoot/think space anyway.
I use nearly all fingers, but where i really suffer is that i find key-combinations, notably alt+ to be really awkward because my hands are at a steep angle relative to the straight keyboard.
I live in Kakoune (vim-like) so "touch typing" is my bread and butter, but home-row just feels so bad to me.
I keep meaning to try a split keyboard with home-row. I suspect that's the root of my issue, and that my odd typing pattern is a result of trying to manually replicate a split keyboard. /shrug
I'm still painfully slow. Maybe 50wpm tops for natural language, embarrassingly much less for programming.
Thing is, I'm now even slower than I was before "taking the plunge", and I can't even go back to my old loose method I've nurtured for 20 years!
On the flipside, as you mentioned, typing in itself isn't a huge bottleneck, especially with autocomplete, and I'm much faster navigating the ide. So maybe it's a net positive after all.
Besides, I don't blame him, he was performing the job he was paid for.
Why is recognizing someone else's work so much pain?
The whole point is that copilot forgets who wrote the code and who is the author of the whole idea (unfortunately few programmers write it down, but sometimes it is there if you are patient enough to read the documentation). Thus a copilot user cannot know who deserves the credit.
This whole discussion is like if you train an AI to pick apples from a supermarket and leave them on the street waiting for someone else to take them home, and pretending that nobody is stealing anything.
Because it's basically impossible to completely and accurately attribute the origin of all your knowledge. And it is impossible to verify that the source you think is the originator of your knowledge is the original creator of that knowledge. Odds are they learned it from someone else. It really doesn't matter, at all.
> This whole discussion is like if you train an AI to pick apples from a supermarket and leave them on the street waiting for someone else to take them home, and pretending that nobody is stealing anything.
No, because in this case the supermarket has lost apples. This is more like accusing street performers singing popular songs without permission of the songwriter of being thieves. Or an engineer studying a bridge and leveraging techniques used in that bridge.
This is a fallacy.
I touch type.
I use 4 fingers on my left hand and 3 fingers on my right to type. (This is counting thumbs on both hands)
I've never really understood the home row because if my hands are on the keyboard I'm actively typing (or playing video games), in which case all my fingers are busy actually doing things, so I basically just put my hands down where I'm going to start typing anyway. When I'm actively typing I'm not wasting time moving my hands back to the home row, either.
If you don't explain why you think the comparison is wrong, your comment might as well be "nuh uh".
https://mechboards.co.uk/shop/kits/helidox-corne-kit/#case
With small keyboards, you’re trading off physical distance between keys for having to press more buttons simultaneously. Might fix your problems with reaching key combinations, but the combinations themselves do become more complicated as well.
So far i've not felt i could get software to do the modal editing i speak of reliably in all of my environments. I'm on NixOS right now, and i didn't want to manage the software. It definitely is interesting though!