GitHub Copilot Generated Insecure Code in 40% of Circumstances During Experiment (theinsaneapp.com)

261 points by elsombrero 4 years ago | 155 comments

lmilcin 4 years ago |

I thought this should have been expected.

Security starts with deep understanding.

Some standards and practices can help avoid some types of problems, and some are even rather effective (like airgapping your systems), but there isn't any way to assure security in general other than to truly understand what you are doing.

I feel like Copilot is the wrong direction to optimize development. This is mostly going to help people with already poor understanding of what they are doing create even more crap.

For a good developer those low level, low engagement activities are not a problem (except maybe for learning stage where you actually want people engaged rather than copy/paste). What it does not help is the important parts of development -- defining domain of your problem, design good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.

Also, I feel this is going to help increase complexity by making more copies of same structures throughout the codebase.

My working theory is that this is going to hinder new developers even more than they already are by Google and stack*. Every time you give new developers an easier way to copy-paste code without understanding, you are robbing them of an opportunity to gain a deeper understanding of what they are doing, and in effect preventing them from learning and growing.

It is a little bit like giving answers to your kids' homework without giving them a chance to arrive at the answer or explaining anything about it.

Another way I feel this is going to hurt developers is competition in who can produce most volume of code.

I have already noticed this trend where developers (especially more junior but aspiring to advance) try to outcompete others by producing more code, close more tickets, etc. Right now it means skipping understanding of what is going on in favor of getting easy answers from the Internet.

These guys can produce huge amounts of code with relatively little actual engagement.

To management (especially with wrong incentives) this seems like a perfect worker, because management usually doesn't understand the connection between lack of engagement and planning at design/development time with their later problems (or they don't feel it is them that is going to pay the price).

Copilot is probably going to make it even more difficult for people who want to do it the right way, because of the even starker difference in false productivity measurements.

andix 4 years ago | |

The real evil here is boilerplate code.

I've seen so much boilerplate in the Java or classic .NET Framework world, it's incredible. So many layers of DTOs, Request/Response Models and so on, that could be just generated. Or most of the time even removed completely (that would cost some "architects" their job though).

This is also true for a lot of Redux or Angular/NgRx applications. So much boilerplate, that you can't find the relevant code anymore.

lmilcin 4 years ago | | |

(I have been professionally programming Java backends for the past 16 years).

Java is not the culprit here.

I think it is something that happened along the way that has something to do with J2EE and the patterns craze we had a decade or two ago.

It doesn't help that frameworks like Spring and their documentation go out of their way to propagate these boilerplate-heavy patterns.

Copying these lazy patterns is the shortest, easiest way to get to a working solution for a person that doesn't want to put in any extra effort. And you can't get punished for doing this. Most developers don't even know there exist any other possibilities than the mandatory controller calling service calling database layer and hordes of DTOs some people call "model".

simion314 4 years ago | | |

An IDE or other tools will generate correct boilerplate code. Seems a gripe from someone that prefers hidden magical code that sets up code behind their back.

The evil is that someone trained an AI on random text, not even with some AST, so you have garbage in, so no surprise you get garbage out.

A true AI would understand that "the dev wants to find all lines of text in a file that have this property"; the AI just does "this code string is similar to this other code string using this `black box metric`".

fn1 4 years ago | | |

This is because "Copy'n'paste" programming gets more and more common.

I see more and more juniors pasting code or shell commands from StackOverflow with careless ease, without even pretending anymore that they're interested in how it actually works.

merpnderp 4 years ago | | |

Should look at Vue with the composition-api layer, there's close to zero boilerplate.

A store in Vue 3 can basically be:

  export default { state: readonly(state), ...setterFunctions }

It doesn't get more easy to read and streamlined than that.

amw-zero 4 years ago | | |

It is true that lines of code are correlated with bugs. In fact, that's the best predictor of the number of bugs - there was some study somewhere that concluded that.

I still doubt that that's a result of DTOs.

captainmuon 4 years ago | |

> Security starts with deep understanding.

I wonder if the way we are approaching it is wrong. We are basically putting text through a deep learning black box. The model might have learned some abstractions, but all in all it is just playing word games and trying to guess the most likely continuation of a string. Maybe we should go in the other direction and base such an AI on a really massive ontology. Instead of unstructured strings, put highly structured facts into the model.

For example, just like in Copilot you'd start with:

    def login_user(username, password):

But the ontology would also know things like:

- This is a web application and this function is going to be called after submitting a form

- Security specialist Bob says you should always hash your passwords

- Specialist Anne says you should use bcrypt

- Tom says Anne is 95% trustworthy

... and thousands of facts more. And then it would take them all into consideration, build a representation of the problem you are trying to solve, find a strategy, and only in the end generate code.
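A minimal sketch of what "facts with trust attached" might look like in Python. The `Fact` class, the `recommendations` helper, and Bob's trust score are all invented for illustration; only Anne's 95% comes from the list above. A real ontology would need far richer structure than a flat list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    source: str   # who asserts the fact
    claim: str    # what they recommend
    trust: float  # how much the source is trusted, 0.0 to 1.0


# Hypothetical facts modeled on the list above; Bob's score is made up.
FACTS = [
    Fact("Bob", "hash passwords", 0.90),
    Fact("Anne", "use bcrypt", 0.95),
]


def recommendations(facts, min_trust=0.8):
    """Return claims from sufficiently trusted sources, most trusted first."""
    trusted = [f for f in facts if f.trust >= min_trust]
    ranked = sorted(trusted, key=lambda f: f.trust, reverse=True)
    return [f.claim for f in ranked]
```

A code generator would then consult something like `recommendations(FACTS)` before emitting the body of `login_user`, rather than guessing the most likely continuation.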

I have a feeling that there was a qualitative leap going from simple neural networks and multivariate methods to "deep learning" and modern machine learning, and that this is mainly driven by scale and available computing power. Now what if we try the same thing for ontologies, expert systems, and triple store databases? I think the difference will be between some AI parroting what it read on Wikipedia (direct speech), and a smarter AI being able to reason about what it read on Wikipedia (indirect speech).

lmilcin 4 years ago | | |

I think one way this could be improved is, instead of giving an exact answer (which is provably impossible to do correctly) maybe it could be possible to point the developer to other repositories where other people were solving a similar problem.

There are already services that do this for you and I actually find them useful. For example, I might be trying to use a function from some library and it fails. If I get pointed to some public repositories that use the same library function for a similar purpose, I may learn that I am missing some critical setup. I can also browse different uses of this function/library and get informed on how it is at the very least used successfully by others.

perl4ever 4 years ago | | |

There was a project started in 1984 to do that:

https://en.wikipedia.org/wiki/Cyc

Supposedly an attempt to assemble a database of "common sense" facts and reasoning.

It has always been controversial and it's not clear what kind of success it's had.

DonHopkins 4 years ago | | |

You're touching on the "Neat -vs- Scruffy" dichotomy in AI. (But it's not necessarily a dichotomy -- they can be combined!)

https://en.wikipedia.org/wiki/Neats_and_scruffies

From the "Scruffy" side, there's Charles Rich's classic work on "Programmer's Apprentice".

https://dspace.mit.edu/handle/1721.1/6054

https://dspace.mit.edu/bitstream/handle/1721.1/6054/AIM-1004...

>The Programmer's Apprentice Project: A Research Overview

>MIT AI Lab Memo No. 1004, November 1987.

>Rich, Charles; Waters, Richard C.

>Abstract: The goal of the Programmer's Apprentice project is to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify, and document programs. This research goal overlaps both artificial intelligence and software engineering. From the viewpoint of artificial intelligence, we have chosen programming as a domain in which to study fundamental issues of knowledge representation and reasoning. From the viewpoint of software engineering, we seek to automate the programming process by applying techniques from artificial intelligence.

https://dspace.mit.edu/handle/1721.1/41967

https://dspace.mit.edu/bitstream/handle/1721.1/41967/AI_WP_1...

>Plan Recognition in a Programmer's Apprentice. Ph.D. Thesis proposal.

>MIT AI Lab Working Paper 147, May 1977.

>Rich, Charles

>Abstract: Brief Statement of the Problem: Stated most generally, the proposed research is concerned with understanding and representing the teleological structure of engineered devices. More specifically, I propose to study the teleological structure of computer programs written in LISP which perform a wide range of non-numerical computations. The major theoretical goal of the research is to further develop a formal representation for teleological structure, called plans, which will facilitate both the abstract description of particular programs, and the compilation of a library of programming expertise in the domain of non-numerical computation. Adequacy of the theory will be demonstrated by implementing a system (to eventually become part of a LISP Programmer's Apprentice) which will be able to recognize various plans in LISP programs written by human programmers and thereby generate cogent explanations of how the programs work, including the detection of some programming errors.

supernovae 4 years ago | |

I don't understand why people fear copilot or blame copilot.

Copilot doesn't bypass peer review, code review, unit testing so on and so forth.

gverrilla 4 years ago | |

Do you think only experts should be programming? I'm an amateur programmer, and I think copilot could help me a lot with unimportant things, as you said - I even tried to install it but I'm not on some list. I can read code, and have built a few programs - I've hired around 30 different programmers in my life, and the vast majority clearly are copy-pasters-adapters. The way I see it, that happens because programming is still way more complex than it should be - and copilot will help with that. Maybe you are thinking about elite developers or perhaps developers at big companies, but I think it will be greatly beneficial for us low-level coders: amateurs, freelancers and fresh people. Am I wrong?

lmilcin 4 years ago | | |

> Do you think only experts should be programming? I'm an amateur programmer (...)

Amateur vs professional and novice vs expert are completely separate things.

You can be professional novice just as you can be expert amateur.

Now, the answer to your question is an obvious "NO". To be an expert you have to be a novice first.

The problem rather is "Are you making progress towards being an expert or are you just learning to more efficiently execute your novice workflow?"

> The way I see it, that happens because programming is still way more complex than it should be - and copilot will help with that.

No, it is just an illusion of help.

Just as your son may thank you for help when you give him an answer to his homework. From his point of view you have helped him, true, but from another point of view the point of the task wasn't to deliver an answer to the teacher, it was to imprint something valuable on the mind of the child.

saurik 4 years ago | | |

I think the argument is that an amateur with copilot is going to stay an amateur longer than someone without copilot while simultaneously only helping them create something no one--including them--should rely on: it teaches the wrong habits and helps with the wrong problem.

macksd 4 years ago | | |

I sympathize with where you're coming from, but the phrase "unimportant things" bothers me. I'm always seeing clients deploy alpha or beta software in production. I see tech companies accumulating tech debt like nobody's business. None of that should happen. And often disasters involving tech get traced back to a cascading failure that started with something considered unimportant.

I love that software is an accessible discipline to hobbyists and that it empowers people. But it needs to be a discipline, top to bottom. We need deep understanding with security and robustness as fundamentals, good practices, and all of that baked into our tools.

lhorie 4 years ago | | |

From my experience with tutoring, I would say that it won't help. The best way to learn is to mechanically do the work. Having things fed to you yields poorer results, IME.

Another parallel: language learning. You learn more by speaking and writing than merely reading and listening, because the former actually requires you to actively associate grammar rules to your physical actions, whereas consumption has a lower bar of effort since you can infer things from context, gloss over things, etc.

chakkepolja 4 years ago | | |

This is assuming most people using copilot are going to properly read generated code.

LeifCarrotson 4 years ago | |

> I feel like Copilot is the wrong direction to optimize development. This is mostly going to help people with already poor understanding of what they are doing create even more crap.

Sometimes you don't need an expert to produce highly secure, highly optimized code.

Have you seen the crap that people buy at Walmart? The furniture is not heirloom furniture, the food is not a 3-star artisanal experience. Have you bought tools at Harbor Freight? They're not the lifetime companion of a tradesman, kept in wood boxes and wrapped in cosmoline after each use. But an awful lot of work gets done with them, common homeowner wisdom is if you need a tool, buy it at Harbor Freight, if you use it enough to wear it out spend 10x to buy a really good one, but most tools you'll only use once or twice.

At workplaces across the country right this minute there are human beings doing rote transcription from one application to another, copy-pasting if they're lucky. That's a waste of effort and intellectual potential, and a hodgepodge of Excel equations or a crappy bit of Copilot glue code could be just the ticket. Yes, if those become the business' secret sauce and sold to customers on the Internet, they ought to put some effort into doing it properly, but there's a ton of work that could be accomplished with low-quality code.

Barrin92 4 years ago | | |

>Have you seen the crap that people buy at Walmart? The furniture is not heirloom furniture,

the difference is that your sofa isn't programmable and networked into every other appliance in your house underpinned by a general purpose computer ripe for abuse.

Virtually every piece of software you install is an access point to your machine or your sensitive data. One isolated thing in the analog world breaks down, not a problem. One misconfigured password in a VPN client, and whoops part of your national oil infrastructure goes offline

https://www.reuters.com/business/colonial-pipeline-ceo-tells...

amw-zero 4 years ago | | |

> Sometimes you don't need an expert to produce highly secure, highly optimized code.

This is one for the ages.

908B64B197 4 years ago | |

> I feel like Copilot is the wrong direction to optimize development. This is mostly going to help people with already poor understanding of what they are doing create even more crap.

I can't wait until we start seeing Copilot Natives devs, who had it enabled from the moment they first opened VSCode at their "become an engineer in 3 months" bootcamp.

> To management (especially with wrong incentives) this seems like a perfect worker, because management usually doesn't understand the connection between lack of engagement and planning at design/development time with their later problems (or they don't feel it is them that is going to pay the price).

That's something I really want my competitors to do. Honestly it makes finding stocks to short much easier (or poaching talent...)

metb 4 years ago | |

>To management (especially with wrong incentives) this seems like a perfect worker, because management usually doesn't understand the connection between lack of engagement and planning at design/development time with their later problems (or they don't feel it is them that is going to pay the price).

This is how management is in most places I feel, especially when it comes to evaluating junior, and early senior engineers.

bsenftner 4 years ago | |

I too feel this is the wrong direction, from the fundamental aspect that trained algorithms contain no comprehension of what they are doing. They are the classic idiot savant. In a technological economy, comprehension of the environment is everything. I do not see how comprehension can be achieved without the elusive General AI, so I do not see this as anything other than a new area of research exposing how vitally important it is to have comprehension.

rented_mule 4 years ago | |

For context, I'm a very experienced software engineer (I shipped products before most of my coworkers were born) and I've been using Copilot for 6-8 weeks while creating a challenging (and therefore fun!) new system.

> This is mostly going to help people with already poor understanding of what they are doing create even more crap.

I can see how people who haven't used it at length might come to that conclusion, but my experience with it calls the "mostly" part into question. I'm sure there will be cases of that. But as someone who deeply understands my craft, I'm finding significant benefits.

> What it does not help is the important parts of development -- defining domain of your problem, design good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.

Quite the contrary! The last time a new tool helped me with those parts as much was when I moved from C++ to Python in 1997. What I experienced in my C++ -> Python transition was that an enormous chunk of my brainpower could shift from language gymnastics to the problem domain. Copilot gives me a similar feeling. It frequently suggests exactly the 1-3 lines of code I was about to type and saves me 30-60 seconds (easily 20 minutes in a full day of coding). Much better than that, it lets my focus stay on better abstractions, APIs, etc.

> Also, I feel this is going to help increase complexity by making more copies of same structures throughout the codebase.

We, as engineers, are still responsible for what we produce. Any tool needs to be used with critical thought. Of course there will be those who don't think enough. And it might even make them look better in the short term. But that will be exposed in the medium to long term - `git blame` will point to them as the authors of problematic code and not Copilot. When such problems arise (or even better, before they arise), some of us who are more experienced need to step up and mentor less experienced folks so that they develop good habits.

A small sample of areas it's helping me...

When I decide that I want to use different representations internally and externally for some data in a class, I initialize the internal member variables. Part way through typing Python's `@property` decorator, it's suggesting the name of the property and exactly how to use the member variables to generate the external representation I want. Over half the time, it's exactly what I was about to type. Maybe a quarter of the time it's not and I just don't accept the suggestion (or do a quick edit). And 5-10% of the time it suggests an approach that is better than what I was thinking. And that's in a very simple use case.
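For readers who haven't used the pattern, here is a small sketch of "different representations internally and externally" with `@property`. The `Price` class and its fields are hypothetical and not from the system described; they only show the shape of the code being completed.

```python
class Price:
    """Stores integer cents internally, exposes a display string externally."""

    def __init__(self, cents: int) -> None:
        self._cents = cents  # internal representation (hypothetical)

    @property
    def display(self) -> str:
        # External representation derived from the internal member variable.
        return f"${self._cents // 100}.{self._cents % 100:02d}"
```

With this, `Price(1999).display` gives `"$19.99"` while the arithmetic elsewhere in the class can stay in whole cents.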

In other scenarios, it often sets up my loops just as I want them. Sometimes it picks column major when I want row major. I just keep typing and as soon as it's clear I want row major, it's suggesting that. Again, occasionally it surprises me with something better - if I just use that one function I rarely have a need for, the inner loop melts away. Why didn't I think of that? Well, now "I" did. The code I'm producing with Copilot is better than the code I would have written without because I'm thinking as I use it.

Where it really saves me time / focus is when I have some tricky calculation or API call that isn't hard, but there's a bunch of little details to get right. One I did yesterday... lookup a value in a dict, but the key needs to be mapped through another dict. Between the original key, the two dicts, and the variable receiving the result there are four variable names, plus one more for the mapped key (to spread it across two statements for readability). Before typing anything, I paused for a second to get the names straight in my head. Before I finished my thought, it suggested the lines, I looked at it for a second to make sure it was right, laughed because it was, and hit tab. It wasn't a hard task, but it helped me stay focused on the bigger picture.
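A rough reconstruction of that kind of two-step lookup, assuming made-up data. Only the name `key_mapping` is taken from the comment; the other names and values are illustrative.

```python
# Hypothetical data; only the name key_mapping comes from the comment.
key_mapping = {"usr": "user_id", "ts": "timestamp"}
record = {"user_id": 42, "timestamp": "2021-08-25T12:00:00Z"}

original_key = "usr"
mapped_key = key_mapping[original_key]  # first lookup: translate the key
value = record[mapped_key]              # second lookup: fetch the value
```

Splitting it across two statements costs one extra name (`mapped_key`) but keeps each line readable, which is the trade-off described above.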

Most of the time this doesn't feel at all like boilerplate. It's picking up my variable names and properly using the data structures I setup in other parts of the code. There's a big misconception that it's just pasting snippets in. It feels very different from that in real usage. Also, it rewards good naming habits. In the example above, how did it know I wanted to map the key through that dict? `key_mapping` was in the variable name. Easier for others to read later and for Copilot to read now.

The system I'm building is definitely better designed because of Copilot. Not because Copilot did any of the design, but because it freed me up to focus on the design more. It will have downsides, but in experienced hands it can be a great tool. I'm not affiliated with Microsoft / Github / OpenAI in any way. I'm just doing better work because I'm using it and doing better work makes me feel good. When the time comes, I'll pay for Copilot out of my own pocket if my company doesn't pay for it.

shireboy 4 years ago |

…Compared to 60% of circumstances in the meat-based developer control group? :)

dimitrios1 4 years ago | |

I love that we always use the average here for these justifications. We just slowly chip away at any and all excellence. 10x memes aside, we all know what it's like to work with a truly talented and productive engineer versus your everyday schmoe collecting a paycheck. It's a story as old as time, and yet here we are doing the exact big factory industrialization techniques other industries have done: commoditizing the thing that made them exceptional and eliminating artisanship, uniqueness, and ultimately quality and character.

It's a tragedy of the commons of a sort.

lhorie 4 years ago | | |

Wasn't there a thread here just yesterday about how 6% of some class of AI outperformed a human, but then it turned out that 0% outperformed two humans? That's also literally the lesson Uber learned the hard way when a SDV ran over a person (that zero humans is worse than one, and one is worse than two). This is also the principle behind code review, peer review, QA, middle management bureaucracy, and a whole lot of other things.

The tragedy, IMHO, is that AI models like this encourage centralizing decision making into a single black box (to the extent that external research then benefits the owner of the AI model rather than advancing public commons), whereas in pretty much every other aspect of life, we consider decentralization/redundancy of autonomy to be the solution to robustness problems.

spywaregorilla 4 years ago | | |

I disagree. 40% is not great, but unlike the masses of developers, this is a single system that can improve over time. Further, a system that can do most of the work but requires a security specialist to polish it is still a useful tool. What's important to recognize is that this is not a terribly novel concept. Unsecure code is written every day.

phreeza 4 years ago | | |

It may be a tragedy, but I fail to see why it is a tragedy of the commons? Which resource that is available to all is being overused? High-paying dev jobs? Those are not a commons in the sense that tragedy of the commons implies because lower-quality devs don't stand to benefit by only taking a smaller part of the job.

spywaregorilla 4 years ago | |

and of the population that is likely to use copilot in production for their own work? 90%?

lupire 4 years ago | | |

These are made up numbers. A control group is needed.

toastal 4 years ago |

You are the free labor copilot to train Microsoft GitHub's Copilot tool. You are responsible for any of those insecure code errors and the diligence required. You will be on the hook for resulting problems. But Microsoft and their home-phoning, tracking-embedded editor will get real people to correct and train their machine for free, with their stated plan of later selling that machine back to us.

I wish there were a “robots.txt” file for Git to disallow certain bots from training on anything I have written.

gnrlst 4 years ago |

I've experienced this first hand: the autosuggest is scarily accurate and insidious at the same time. On numerous occasions I've auto-filled a 10-15 line suggestion that looked like it was exactly what I wanted to do, but made a very critical mistake (e.g. in a For loop, referencing the wrong array despite calling it the right name). Not really security related stuff, but head scratchers that make it harder to debug since I didn't actually write the code.
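One reading of that kind of mistake, sketched with invented names and data: the loop looks plausible at a glance, but it indexes the wrong list.

```python
def total_buggy(prices, quantities):
    total = 0
    for i in range(len(prices)):
        total += prices[i] * prices[i]  # bug: should be quantities[i]
    return total


def total_fixed(prices, quantities):
    # zip pairs the two lists, so there is no index to get wrong.
    return sum(p * q for p, q in zip(prices, quantities))
```

Both versions run without error, which is exactly why the bug is easy to accept from a suggestion and hard to spot later.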

mbrevda1 4 years ago |

For comparison, what percentage of human-generated code is secure?

iainmerrick 4 years ago | |

It seems reasonable to want Copilot to help you produce code of a reasonable quality.

If it’s just helping you crank out the same bad code more quickly, without learning anything in the process, that’s useful to know. Some people might still want a tool like that, I wouldn’t.

burnished 4 years ago | | |

Sure. But in order to know if it's 'of reasonable quality' you need some sort of baseline to compare it to. What is reasonable quality? I think what your average human does is probably reasonable.

Like, if your average dev will produce insecure code in 80% of samples, then Copilot starts to look really good! But if it's closer to 0.01% of code samples, then copilot looks more like an intriguing novelty, not to be brought too near serious work. Much like dippin dots in this regard.

eddieroger 4 years ago | |

That's basically where my gut went when I read the headline - so is that of a junior engineer, or really any engineer who hasn't had to think about it, and we don't promote their code directly to prod, either (if we can avoid it).

Copilot shouldn't be able to generate code destined for prod without review any more than should any line of code written by a human.

westurner 4 years ago | |

> For comparison, what percentage of human-generated code is secure?

Yeah how did they measure? Did static and dynamic analysis find design bugs too?

Maybe - as part of a Copilot-assisted DevSecOps workflow involving static and dynamic analysis run by GitHub Actions CI - create Issues with CWE "Common Weakness Enumeration" URLs from e.g. the CWE Top 25 in order to train the team, and Pull Requests to fix each issue?: https://cwe.mitre.org/top25/
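A tiny sketch of the "Issues with CWE URLs" part, assuming the analysis tool reports a numeric CWE id and a file path. The issue title format is hypothetical; the URL pattern is the one MITRE uses for individual CWE definition pages.

```python
def cwe_url(cwe_id: int) -> str:
    """Build the MITRE definition URL for a CWE number."""
    return f"https://cwe.mitre.org/data/definitions/{cwe_id}.html"


def issue_title(cwe_id: int, path: str) -> str:
    # Hypothetical title format; adapt to your tracker's conventions.
    return f"[CWE-{cwe_id}] Possible weakness in {path}"
```

For example, `cwe_url(89)` points at the SQL injection entry, which a CI job could paste into the body of the issue it opens.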

Which bots send PRs?

moretti 4 years ago |

I use Copilot mostly as a replacement for intellisense and macros. It helps me automate repetitive tasks. I would never trust Copilot for an algorithm or a snippet; I mean I would treat the code just like anything taken from StackOverflow or Github.

wcarss 4 years ago |

I couldn't find a link to the actual study anywhere in the article: https://arxiv.org/abs/2108.09293

adamsvystun 4 years ago |

It is important to remember that Copilot can improve. 40% is not a bad baseline, but one data point does not give us much info, we should wait and see the rate of improvement.

lampe3 4 years ago |

I'm using copilot now for some time and yeah it's more a toy than real help right now.

The only time it really helped was when I needed to create a named list of char codes.

When it comes to more complex code, checking Copilot's code takes the same time as writing it. 90% of the time I needed to correct copilot.

For me, tools like linters are way more helpful then. If I could only use ESLint or copilot, I would go 100% of the time with ESLint.

harlekein 4 years ago | |

I think another risk with getting Copilot to start out, is that it might nudge you into a direction you wouldn't have gone into otherwise.

Whether that is better or not, I suppose, it depends.

lampe3 4 years ago | | |

I'm working on a HTML Tokenizer in Deno/Typescript.

Copilot only helps with boilerplate code which could be handled by good intellisense.

When it tries to generate a function from the function name it fails so hard that it is more in your way than helpful.

dexen 4 years ago |

Half joking:

so far GitHub Copilot is more feasible as tool for humans doing code-coverage for its input code, "given enough eyeballs, all bugs are shallow" style. When a developer goes, "huh, Copilot generated insecure code, better report it to the original project it learned it from" - if only Copilot was able to link to the original project, it would all be great and useful.

0-_-0 4 years ago |

I fail to see how this is particularly useful information about Copilot. The comparison should be:

1. How many times do people write insecure code when not using Copilot?

2. How many times do people write insecure code when using Copilot?

nextlevelwizard 4 years ago | |

It is useful since it means copilot is not taking your job any time soon, i.e. if 40% of the time the human driving the thing is needed to intervene and prevent obvious security flaws, then an expert is still needed to use the tool.

0-_-0 4 years ago | | |

I think it was obvious from the beginning that it's trained on GitHub code, so it would be surprising if it was better than the average code on GitHub.

In any case, if Copilot can generate code as well as the average programmer without supervision, that means it can already take the job of 50% of programmers. A more useful metric though is how many programmers can a person using Copilot replace by having greater productivity?

Also, in how many programming jobs does security matter? In my job for example it doesn't matter at all.

rcarmo 4 years ago |

As many people have pointed out indirectly, this is almost certainly caused by the training set. Without a bias or ranking for quality, it will just churn out the “best fit” or most popular snippets…

whazor 4 years ago |

Happily having access to GitHub copilot, it very often generates the code that I want. So it saves me from typing and also often saves checking Stack Overflow. I think the libraries/packages you use also have a big influence on how easy it is for copilot to create security flaws. Still, more training against security holes would be appreciated.

Animats 4 years ago |

Well, of course. GPT-3 has no underlying model of meaning. It's just autocomplete with a bigger data set. Used on natural language, it produces text that looks reasonable for about three paragraphs. Then you realize it's just blithering and has nothing to communicate. (Like too many bloggers, but that's another issue.)

wccrawford 4 years ago |

I'm actually impressed with that. There's so much insecure code out there that I'd have expected it to generate insecure code most of the time.

I'd still not use it. But it's an impressive trick.

bottled_poe 4 years ago |

What’s the baseline? Secure code 60% of the time may still be superior to the average implementation.

queuebert 4 years ago |

This is exactly what an AGI would do if it wanted to pwn all our systems.

cannabis_sam 4 years ago |

Of course it did. Why would GitHub Copilot “care about” security, unless the majority of code on GitHub cared about security?

arvindamirtaa 4 years ago |

Unit tests with TONS of assertions, cleaning data from form to ORM object, stuff that looks like you're just going through a list and doing the same thing over and over. For these, Copilot is great. I wouldn't trust it to do anything else though.

Nothing more. Nothing less.

COMMENT___ 4 years ago |

It's painful to see that GitHub Copilot is called "AI". For god's sake, it is not AI. It's just an advanced auto-complete for coders. GPT-3 is close to AI, GitHub Copilot is not.

Jesus Christ, please make them stop. Stop using AI as a buzzword.

arthur2e5 4 years ago | |

GitHub Copilot is literally GPT adapted for code. The paper on OpenAI Codex, the stuff powering Copilot, makes it very clear in the abstract. https://arxiv.org/abs/2107.03374

Either you call both AI or you call neither AI.

(A previous version of the comment stated that it was tuned from GPT-3. This is incorrect; the simpler GPT was used for faster convergence.)

COMMENT___ 4 years ago | | |

If I have to choose, then I would decide not to call them AI at all.

sprafa 4 years ago | |

Is it fair to call it an ML based system?

COMMENT___ 4 years ago | | |

Why not? GitHub Copilot was trained with ML to autocomplete your code. But it is not AI.

eurasiantiger 4 years ago |

I wonder if the Copilot model could somehow be repurposed to analyze the quality of a developer’s code. Seeing how Microsoft owns both GitHub and LinkedIn, it’s a good bet this is something they’re actively researching.

amw-zero 4 years ago |

If it's trained on code that we write, that sounds completely accurate.

Vaslo 4 years ago |

It's learning from existing code, right? Doesn't this say something about developers in general, or is the thought that it uses combinations of code that are insecure?

harlekein 4 years ago | |

I don't hold the average developer in very high regard. There are tons of developers who are much better than me and I readily read their books, follow their tweets, blog posts and online talks to learn from them. I hold them in high regard, but these people are not the average developer.

If you pick any smaller company with a dev team, a freelancer or an agency, your chances of finding a developer who understands and upholds quality code are vastly reduced.

Not to mention a lot of beginners will just push their practice projects to GitHub and never look at them again. I'm also guilty of this, but I never realized Microsoft was training AI with this code. If Copilot is learning from these projects then I'd say the code it regurgitates is not average, but even below average.

softwaredoug 4 years ago |

I will say I’m not looking forward to writing some mundane code today.

It’s interacting with GCS to scan a bucket for an extension, load the data with pandas, and concat some dataframes. It’s something dumb but mildly finicky that’s going to eat up so much time I could be using for higher value work.

Copilot would be very welcome as I do this, instead of annoyingly going off to Google 3 different python libraries and getting it all to work nicely together.

gfiorav 4 years ago |

I think this should be pretty much expected. I'm unfamiliar with how this network is trained, but I'm pretty sure the data ranking is not perfect.

I'm guessing the ranking features are based on the repo stats, contributor stats, etc. Even "good" contributors will make rookie mistakes in certain areas.

Interesting to imagine how GH will try to solve this issue.

jpalomaki 4 years ago | |

It might be possible to learn from the change history of the projects. There are likely quite a few commits that fix certain security issues, such as SQL injection problems. Maybe even with suitable metadata in the issues or commit messages.
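
A rough sketch of what mining commit messages for security fixes could look like, in Python. This is only an illustration of the idea above, not how GitHub trains or ranks anything: the keyword list, the function name `find_security_fix_commits`, and the sample commits (including the placeholder CVE id) are all made up for the example, and a real pipeline would need much richer signals than message text.

```python
import re

# Hypothetical keyword patterns (an assumption for this sketch, not a vetted list).
SECURITY_FIX_PATTERN = re.compile(
    r"\b(sql injection|xss|csrf|cve-\d{4}-\d+|security fix|vulnerab\w*)\b",
    re.IGNORECASE,
)


def find_security_fix_commits(commits):
    """Return the (sha, message) pairs whose message looks like a security fix."""
    return [(sha, msg) for sha, msg in commits if SECURITY_FIX_PATTERN.search(msg)]


# Made-up sample data; the CVE id is a placeholder, not a real advisory.
commits = [
    ("a1b2c3", "Fix SQL injection in login query by using bound parameters"),
    ("d4e5f6", "Update README"),
    ("0718ab", "Patch CVE-0000-0000 in upload handler"),
]

for sha, msg in find_security_fix_commits(commits):
    print(sha, msg)
```

The before/after diffs of the commits this selects could then, in principle, serve as insecure/secure pairs, though keyword matching alone would miss fixes that are not labeled as such.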

mzs 4 years ago |

the actual paper:

https://arxiv.org/abs/2108.09293

previous discussion including comments from lead author:

https://news.ycombinator.com/item?id=28279365

RhysU 4 years ago |