https://arxiv.org/abs/2108.09293
previous discussion including comments from lead author:
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
https://deepai.org/publication/you-autocomplete-me-poisoning...
https://edition.cnn.com/2020/09/27/tech/elon-musk-tesla-bill...
It's so transformative that people may allow it to circumvent licenses.
Security starts with deep understanding.
Some standards and practices can help avoid some types of problems, and some are even rather effective (like airgapping your systems), but there isn't any way to assure security in general other than to truly understand what you are doing.
**
I feel like Copilot is the wrong direction to optimize development. This is mostly going to help people with already poor understanding of what they are doing create even more crap.
For a good developer those low-level, low-engagement activities are not a problem (except maybe at the learning stage, where you actually want people engaged rather than copy/pasting). What it does not help is the important parts of development -- defining domain of your problem, design good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.
Also, I feel this is going to help increase complexity by making more copies of same structures throughout the codebase.
My working theory is that this is going to hinder new developers even more than they are already hindered by Google and stack*. Every time you give new developers an easier way to copy-paste code without understanding, you are robbing them of an opportunity to gain a deeper understanding of what they are doing and, in effect, preventing them from learning and growing.
It is a little bit like giving answers to your kids' homework without giving them a chance to arrive at the answer or explaining anything about it.
**
Another way I feel this is going to hurt developers is the competition over who can produce the most code.
I have already noticed this trend where developers (especially more junior ones aspiring to advance) try to outcompete others by producing more code, closing more tickets, etc. Right now it means skipping understanding of what is going on in favor of getting easy answers from the Internet.
These guys can produce huge amounts of code with relatively little actual engagement.
To management (especially with wrong incentives) this seems like a perfect worker, because management usually doesn't understand the connection between lack of engagement and planning at design/development time with their later problems (or they don't feel it is them that is going to pay the price).
Copilot is probably going to make it even more difficult for people who want to do it the right way, because of the even starker difference in false productivity measurements.
I've seen so much boilerplate in the Java or classic .NET Framework world, it's incredible. So many layers of DTOs, Request/Response Models and so on, that could be just generated. Or most of the time even removed completely (that would cost some "architects" their job though).
This is also true for a lot of Redux or Angular/NgRx applications. So much boilerplate, that you can't find the relevant code anymore.
Java is not the culprit here.
I think it is something that happened along the way, and it has something to do with J2EE and the patterns craze we had a decade or two ago.
It doesn't help that frameworks like Spring and their documentation go out of their way to propagate these boilerplate-heavy patterns.
Copying these lazy patterns is the shortest, easiest way to get to a working solution for a person who doesn't want to put in any extra effort. And you can't get punished for doing this. Most developers don't even know that there are possibilities other than the mandatory controller calling a service calling a database layer, plus hordes of DTOs some people call "model".
The evil is that someone trained an AI on random text, not even on some AST, so you have garbage in; no surprise you get garbage out.
A true AI would understand that "the dev wants to find all lines of text in a file that have this property"; this AI just does "this code string is similar to this other code string using this `black box metric`".
I see more and more juniors pasting code or shell commands from StackOverflow with careless ease, without even pretending anymore that they're interested in how it actually works.
A store in Vue 3 can basically be:
export default { state: readonly(state), ...setterFunctions }
It doesn't get easier to read or more streamlined than that.
I still doubt that that's a result of DTOs.
I wonder if the way we are approaching it is wrong. We are basically putting text through a deep learning black box. The model might have learned some abstractions, but all in all it is just playing word games and trying to guess the most likely continuation of a string. Maybe we should go in the other direction and base such an AI on a really massive ontology. Instead of unstructured strings, put highly structured facts into the model.
For example, just like in Copilot you'd start with:
def login_user(username, password):
But the ontology would also know things like:
- This is a web application and this function is going to be called after submitting a form
- Security specialist Bob says you should always hash your passwords
- Specialist Anne says you should use bcrypt
- Tom says Anne is 95% trustworthy
... and thousands of facts more. And then it would take them all into consideration, build a representation of the problem you are trying to solve, find a strategy, and only in the end generate code.
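A toy sketch of what such a fact store could look like, under heavy assumptions: facts are (source, subject, predicate, object) records, each source has a trust weight, and a recommendation is simply the object with the highest summed trust. The names `Fact`, `TRUST`, `FACTS`, and `recommend` are hypothetical, and every weight except Anne's 95% is made up for illustration. A real system would need actual reasoning, not just a weighted vote.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    source: str     # who asserts the fact
    subject: str
    predicate: str
    obj: str


# Hypothetical trust weights; only Anne's 0.95 comes from the comment above.
TRUST = {"Bob": 0.90, "Anne": 0.95, "Eve": 0.20}

# Hypothetical facts; Eve and her md5 advice are invented as a counterexample.
FACTS = [
    Fact("Bob", "password", "must_be", "hashed"),
    Fact("Anne", "password_hash", "should_use", "bcrypt"),
    Fact("Eve", "password_hash", "should_use", "md5"),
]


def recommend(subject, predicate, facts=FACTS, trust=TRUST):
    """Return the object with the highest summed trust for (subject, predicate)."""
    scores = defaultdict(float)
    for fact in facts:
        if fact.subject == subject and fact.predicate == predicate:
            scores[fact.obj] += trust.get(fact.source, 0.0)
    if not scores:
        return None  # the store knows nothing about this question
    return max(scores, key=scores.get)


print(recommend("password_hash", "should_use"))  # bcrypt
```

The point of the sketch is only that the recommendation is traceable to named sources and weights, which is what a string-completion model does not give you.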
I have a feeling that there was a qualitative leap going from simple neural networks and multivariate methods to "deep learning" and modern machine learning, and that this is mainly driven by scale and available computing power. Now what if we try the same thing for ontologies, expert systems, and triple store databases? I think the difference will be between some AI parroting what it read on Wikipedia (direct speech), and a smarter AI being able to reason about what it read on Wikipedia (indirect speech).
There are already services that do this for you and I actually find them useful. For example, I might be trying to use a function from some library and it fails. If I get pointed to some public repositories that use the same library function for a similar purpose, I may learn that I am missing some critical setup. I can also browse different uses of this function/library and see how it is, at the very least, used successfully by others.
https://en.wikipedia.org/wiki/Cyc
Supposedly an attempt to assemble a database of "common sense" facts and reasoning.
It has always been controversial and it's not clear what kind of success it's had.
https://en.wikipedia.org/wiki/Neats_and_scruffies
From the "Scruffy" side, there's Charles Rich's classic work on "Programmer's Apprentice".
https://dspace.mit.edu/handle/1721.1/6054
https://dspace.mit.edu/bitstream/handle/1721.1/6054/AIM-1004...
>The Programmer's Apprentice Project: A Research Overview
>MIT AI Lab Memo No. 1004, November 1987.
>Rich, Charles; Waters, Richard C.
>Abstract: The goal of the Programmer's Apprentice project is to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify, and document programs. This research goal overlaps both artificial intelligence and software engineering. From the viewpoint of artificial intelligence, we have chosen programming as a domain in which to study fundamental issues of knowledge representation and reasoning. From the viewpoint of software engineering, we seek to automate the programming process by applying techniques from artificial intelligence.
https://dspace.mit.edu/handle/1721.1/41967
https://dspace.mit.edu/bitstream/handle/1721.1/41967/AI_WP_1...
>Plan Recognition in a Programmer's Apprentice. Ph.D. Thesis proposal.
>MIT AI Lab Working Paper 147, May 1977.
>Rich, Charles
>Abstract: Brief Statement of the Problem: Stated most generally, the proposed research is concerned with understanding and representing the teleological structure of engineered devices. More specifically, I propose to study the teleological structure of computer programs written in LISP which perform a wide range of non-numerical computations. The major theoretical goal of the research is to further develop a formal representation for teleological structure, called plans, which will facilitate both the abstract description of particular programs, and the compilation of a library of programming expertise in the domain of non-numerical computation. Adequacy of the theory will be demonstrated by implementing a system (to eventually become part of a LISP Programmer's Apprentice) which will be able to recognize various plans in LISP programs written by human programmers and thereby generate cogent explanations of how the programs work, including the detection of some programming errors.
Copilot doesn't bypass peer review, code review, unit testing, and so on and so forth.
Amateur vs professional and novice vs expert are completely separate things.
You can be a professional novice just as you can be an expert amateur.
Now, the answer to your question is an obvious "NO". To be an expert you have to be a novice first.
The problem rather is "Are you making progress towards being an expert or are you just learning to more efficiently execute your novice workflow?"
> The way I see it, that happens because programming is still way more complex than it should be - and copilot will help with that.
No, it is just an illusion of help.
Just as your son may thank you for help when you give him an answer to his homework. From his point of view you have helped him, true, but from another point of view the point of the task wasn't to deliver an answer to the teacher, it was to imprint something valuable on the mind of the child.
I love that software is an accessible discipline to hobbyists and that it empowers people. But it needs to be a discipline, top to bottom. We need deep understanding with security and robustness as fundamentals, good practices, and all of that baked into our tools.
Another parallel: language learning. You learn more by speaking and writing than by merely reading and listening, because the former actually requires you to actively associate grammar rules with your physical actions, whereas consumption has a lower bar of effort since you can infer things from context, gloss over things, etc.
Sometimes you don't need an expert to produce highly secure, highly optimized code.
Have you seen the crap that people buy at Walmart? The furniture is not heirloom furniture, the food is not a 3-star artisanal experience. Have you bought tools at Harbor Freight? They're not the lifetime companion of a tradesman, kept in wood boxes and wrapped in cosmoline after each use. But an awful lot of work gets done with them. Common homeowner wisdom is: if you need a tool, buy it at Harbor Freight; if you use it enough to wear it out, spend 10x to buy a really good one. Most tools you'll only use once or twice.
At workplaces across the country right this minute there are human beings doing rote transcription from one application to another, copy-pasting if they're lucky. That's a waste of effort and intellectual potential, and a hodgepodge of Excel equations or a crappy bit of Copilot glue code could be just the ticket. Yes, if those become the business's secret sauce and are sold to customers on the Internet, they ought to put some effort into doing it properly, but there's a ton of work that could be accomplished with low-quality code.
The difference is that your sofa isn't programmable and networked into every other appliance in your house, underpinned by a general-purpose computer ripe for abuse.
Virtually every piece of software you install is an access point to your machine or your sensitive data. One isolated thing in the analog world breaks down: not a problem. One misconfigured password in a VPN client, and whoops, part of your national oil infrastructure goes offline.
https://www.reuters.com/business/colonial-pipeline-ceo-tells...
This is one for the ages.
I can't wait until we start seeing Copilot-native devs, who had it enabled from the moment they first opened VSCode at their "become an engineer in 3 months" bootcamp.
> To management (especially with wrong incentives) this seems like a perfect worker, because management usually doesn't understand the connection between lack of engagement and planning at design/development time with their later problems (or they don't feel it is them that is going to pay the price).
That's something I really want my competitors to do. Honestly it makes finding stocks to short much easier (or poaching talent...)
This is how management is in most places, I feel, especially when it comes to evaluating junior and early senior engineers.
> This is mostly going to help people with already poor understanding of what they are doing create even more crap.
I can see how people who haven't used it at length might come to that conclusion, but my experience with it calls the "mostly" part into question. I'm sure there will be cases of that. But as someone who deeply understands my craft, I'm finding significant benefits.
> What it does not help is the important parts of development -- defining domain of your problem, design good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.
Quite the contrary! The last time a new tool helped me with those parts as much was when I moved from C++ to Python in 1997. What I experienced in my C++ -> Python transition was that an enormous chunk of my brainpower could shift from language gymnastics to the problem domain. Copilot gives me a similar feeling. It frequently suggests exactly the 1-3 lines of code I was about to type and saves me 30-60 seconds (easily 20 minutes in a full day of coding). Much better than that, it lets my focus stay on better abstractions, APIs, etc.
> Also, I feel this is going to help increase complexity by making more copies of same structures throughout the codebase.
We, as engineers, are still responsible for what we produce. Any tool needs to be used with critical thought. Of course there will be those who don't think enough. And it might even make them look better in the short term. But that will be exposed in the medium to long term - `git blame` will point to them as the authors of problematic code and not Copilot. When such problems arise (or even better, before they arise), some of us who are more experienced need to step up and mentor less experienced folks so that they develop good habits.
A small sample of areas it's helping me...
When I decide that I want to use different representations internally and externally for some data in a class, I initialize the internal member variables. Part way through typing Python's `@property` decorator, it's suggesting the name of the property and exactly how to use the member variables to generate the external representation I want. Over half the time, it's exactly what I was about to type. Maybe a quarter of the time it's not and I just don't accept the suggestion (or do a quick edit). And 5-10% of the time it suggests an approach that is better than what I was thinking. And that's in a very simple use case.
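For readers who haven't used the pattern, a minimal sketch of "different representations internally and externally" with `@property`. The `Duration` class and its `hms` property are hypothetical, picked only to illustrate the shape of the code; they are not from my actual project.

```python
class Duration:
    """Hypothetical example: stored as whole seconds, exposed as HH:MM:SS."""

    def __init__(self, seconds: int) -> None:
        self._seconds = seconds  # internal representation

    @property
    def hms(self) -> str:
        # External representation, derived from the internal member variable.
        hours, remainder = divmod(self._seconds, 3600)
        minutes, seconds = divmod(remainder, 60)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(Duration(3723).hms)  # 01:02:03
```

Once `_seconds` is initialized, the body of the property follows almost mechanically from the names, which is why a completion tool has a decent chance of guessing it.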
In other scenarios, it often sets up my loops just as I want them. Sometimes it picks column major when I want row major. I just keep typing and as soon as it's clear I want row major, it's suggesting that. Again, occasionally it surprises me with something better - if I just use that one function I rarely have a need for, the inner loop melts away. Why didn't I think of that? Well, now "I" did. The code I'm producing with Copilot is better than the code I would have written without because I'm thinking as I use it.
Where it really saves me time / focus is when I have some tricky calculation or API call that isn't hard, but there's a bunch of little details to get right. One I did yesterday... look up a value in a dict, but the key needs to be mapped through another dict. Between the original key, the two dicts, and the variable receiving the result there are four variable names, plus one more for the mapped key (to spread it across two statements for readability). Before typing anything, I paused for a second to get the names straight in my head. Before I finished my thought, it suggested the lines, I looked at it for a second to make sure it was right, laughed because it was, and hit tab. It wasn't a hard task, but it helped me stay focused on the bigger picture.
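Roughly what that two-statement lookup looks like, as a sketch with made-up data. Only `key_mapping` echoes a name from my real code; `rates_by_name`, `lookup_rate`, and the values are hypothetical.

```python
# Hypothetical data, for illustration only.
key_mapping = {"usd": "US Dollar", "eur": "Euro"}
rates_by_name = {"US Dollar": 100, "Euro": 108}


def lookup_rate(key, key_mapping, rates_by_name):
    """Look up a value whose key must first be mapped through another dict."""
    mapped_key = key_mapping[key]      # statement 1: translate the key
    rate = rates_by_name[mapped_key]   # statement 2: the actual lookup
    return rate


print(lookup_rate("eur", key_mapping, rates_by_name))  # 108
```

Five names, two lines, nothing clever. It is exactly the kind of thing where the only risk is mixing up which dict goes where.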
Most of the time this doesn't feel at all like boilerplate. It's picking up my variable names and properly using the data structures I set up in other parts of the code. There's a big misconception that it's just pasting snippets in. It feels very different from that in real usage. Also, it rewards good naming habits. In the example above, how did it know I wanted to map the key through that dict? `key_mapping` was in the variable name. Easier for others to read later and for Copilot to read now.
The system I'm building is definitely better designed because of Copilot. Not because Copilot did any of the design, but because it freed me up to focus on the design more. It will have downsides, but in experienced hands it can be a great tool. I'm not affiliated with Microsoft / Github / OpenAI in any way. I'm just doing better work because I'm using it and doing better work makes me feel good. When the time comes, I'll pay for Copilot out of my own pocket if my company doesn't pay for it.
It's a tragedy of the commons of a sort.
The tragedy, IMHO, is that AI models like this encourage centralizing decision making into a single black box (to the extent that external research then benefits the owner of the AI model rather than advancing public commons), whereas in pretty much every other aspect of life, we consider decentralization/redundancy of autonomy to be the solution to robustness problems.
I wish there were a “robots.txt” file for Git to disallow certain bots from training on anything I have written.
If it’s just helping you crank out the same bad code more quickly, without learning anything in the process, that’s useful to know. Some people might still want a tool like that; I wouldn’t.
Like, if your average dev will produce insecure code in 80% of samples, then Copilot starts to look really good! But if it's closer to 0.01% of code samples, then Copilot looks more like an intriguing novelty, not to be brought too near serious work. Much like Dippin' Dots in this regard.
Copilot shouldn't be able to generate code destined for prod without review any more than should any line of code written by a human.
Yeah how did they measure? Did static and dynamic analysis find design bugs too?
Maybe - as part of a Copilot-assisted DevSecOps workflow involving static and dynamic analysis run by GitHub Actions CI - create Issues with CWE "Common Weakness Enumeration" URLs from e.g. the CWE Top 25 in order to train the team, and Pull Requests to fix each issue?: https://cwe.mitre.org/top25/
Which bots send PRs?
The only time it really helped was when I needed to create a named list of char codes.
When it comes to more complex code, checking Copilot's code takes the same time as writing it. 90% of the time I needed to correct Copilot.
For me, tools like linters are way more helpful. If I could only use ESLint or Copilot, I would go with ESLint 100% of the time.
Whether that is better or not, I suppose, depends.
Copilot only helps with boilerplate code which could be handled by good intellisense.
When it tries to generate a function from the function name, it fails so hard that it is more in your way than helpful.
So far GitHub Copilot is more feasible as a tool for humans doing code coverage of its input code, "given enough eyeballs, all bugs are shallow" style. When a developer goes, "huh, Copilot generated insecure code, better report it to the original project it learned it from" - if only Copilot were able to link to the original project, it would all be great and useful.
1. How many times do people write insecure code when not using Copilot?
2. How many times do people write insecure code when using Copilot?
In any case, if Copilot can generate code as well as the average programmer without supervision, that means it can already take the job of 50% of programmers. A more useful metric though is how many programmers can a person using Copilot replace by having greater productivity?
Also, in how many programming jobs does security matter? In my job for example it doesn't matter at all.
I'd still not use it. But it's an impressive trick.
Nothing more. Nothing less.
Jesus Christ, please make them stop. Stop using AI as a buzzword.
Either you call both AI or you call neither AI.
(A previous version of the comment stated that it was tuned from GPT-3. This is incorrect; the simpler GPT was used for faster convergence.)
If you pick any smaller company with a dev team, a freelancer, or an agency, your chances of finding a developer who understands and upholds code quality are vastly reduced.
Not to mention a lot of beginners will just push their practice projects to GitHub and never look at it again. I'm also guilty of this, but I never realized Microsoft was training AI with this code. If Copilot is learning from these projects then I'd say the code it regurgitates is not average, but even below average.
It’s interacting with GCS to scan a bucket for an extension, load the data with pandas, and concat some dataframes. It’s something dumb but mildly finicky that’s going to eat up so much time I could be using for higher value work.
Copilot would be very welcome as I do this, instead of annoyingly going off to Google 3 different Python libraries and getting it all to work nicely together.
I'm guessing the ranking features are based on the repo stats, contributor stats, etc. Even "good" contributors will make rookie mistakes in certain areas.
Interesting to imagine how GH will try to solve this issue.
It’s simple. If you are concerned by this, don’t host your repositories on GitHub.
But as long as you give the public access to your code, they can study it and learn from it. Humans and machines.
If Copilot were to reproduce a larger part of, say, an MIT-licensed codebase or almost any other permissive licence, then they should legally provide attribution. I'm pretty sure that they don't even have an option to provide such specific attribution, which means that either they believe that the code copied from any one source is below the relevant threshold or they're just ignoring copyright.
Although judging from the results of this test it kind of seems like for a lot of accounts that's already happened.
Another way: Copilot is a crutch. It may help you move about but if you get too comfortable with it you are never going to learn to run.
Similarly, if everyone seeks to "dumb down" programming, you end up with a large pool of "dumbed down" programmers, which is counterproductive precisely because AI is imperfect and you need a higher level of expertise to compensate for its shortcomings. As Kernighan famously said: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Similarly, if one lets the AI do the thinking in their stead, what hope do they have of being able to debug it?
Ironically, though, programming already suffers from this exact problem in a very fundamental way: every tool exists to make a programmer's life easier, and consequently there are a lot of glue-code programmers. The few that actually impact the industry meaningfully (e.g. most notable software comes out of Bay Area) are very expensive because the supply of experts is limited.
In my heart I feel similarly to GP - and it does feel a lot like how I feel about tragedy of the commons situations. Maybe there seems to be a shared opportunity for everyone if these private companies would make the most of their financial capital, market dominance, dominance in human resources, and most especially leverage their network effects.
That would lead to better things for everyone, like the invention of smartphones. But the same corporations can also waste unimaginable resources and achieve very little. Often their failures don't just have little effect, but rather the failures choke/smother the market and prevent better alternatives from being widely used.
Anything that connects to any server or has any sort of networking.
Anything that needs privileges (like installing drivers) needs security.
Anything that reads/interprets any data given to it needs to have security.
Of course this all depends on what you mean by "matter". I can't come up with a program that doesn't need to think about security at all, except something trivial like hello world.
> to produce highly secure, highly optimized code.
The key word there is "produce" - meaning, the secure and optimized code is being written.
Funny that I would describe a solution based on machine learning as scruffy and a solution based on bayesian logic and knowledge databases as neat, whereas Wikipedia defines it the other way around.
https://en.wikipedia.org/wiki/Talk:Neats_and_scruffies
>Roger Schank first used those terms "scruffy" and "neat" at an AI conference in the 1970s. He proudly called himself a scruffy. 71.183.59.144 (talk) 02:17, 26 October 2011 (UTC)
>The terminology is sourced to the late 1970s or early 1980s and originated by Schenk according to this:
>"In particular, certain personality traits go hand and hand with certain styles of research. Schank and Abelson hit upon one such phenomenon along these lines and dubbed it the neats vs. the scruffies. These terms moved into the mainstream AI community during the early 80s, shortly after Abelson presented the phenomenon in a keynote address at the Annual Meeting of the Cognitive Science Society in 1981. Here are some selected excerpts from the accompanying paper in the proceedings:"
>The article quotes a lengthy excerpt of this keynote address, some of which I include below
>“The study of the knowledge in a mental system tends toward both naturalism and phenomenology. The mind needs to represent what is out there in the real word, and it needs to manipulate it for particular purposes. But the world is messy, and purposes are manifold. Models of mind, therefore, can become garrulous and intractable as they become more and more realistic. If one’s emphasis is on science more than on cognition, however, the canons of hard science dictate a strategy of the isolation of idealized subsystems which can be modeled with elegant productive formalisms. Clarity and precision are highly prized, even at the expense of common sense realism. To caricature this tendency with a phrase from John Tukey (1969), the motto of the narrow hard scientist is, “Be exactly wrong, rather than approximately right”.
>The one tendency points inside the mind, to see what might be there. The other points outside the mind, to some formal system which can be logically manipulated [Kintsch et al., 1981]. Neither camp grants the other a legitimate claim on cognitive science.... an unnamed but easily guessed colleague of mine (Schenk?), who claims that the major clashes in human affairs are between the “neats” and the “scruffies”. The primary concern of the neat is that things should be orderly and predictable while the scruffy seeks the rough-and-tumble of life as it comes ... The fusion task is not easy. It is hard to neaten up a scruffy or scruffy up a neat. It is difficult to formalize aspects of human thought that which are variable, disorderly, and seemingly irrational, or to build tightly principled models of realistic language processing in messy natural domains.
>What are the difficulties in starting our from the scruffy side and moving toward the neat? The obvious advantage is that one has the option of letting the problem areas itself, rather than the available methodology, guide us about what is important. The obstacle, of course, is that we may not know how to attack the important problems. More likely, we may think we know how to proceed, but other people may find our methods sloppy. We may have to face accusations of being ad hoc, and scientifically unprincipled, and other awful things."
>Source is Chapter 5 of this book edited by Schenk and published in 1994, titled "Beliefs, Reasoning, and Decision Making: Psycho-logic in Honor of Bob Abelson". Article needs clean-up, which I am doing now.--FeralOink (talk) 13:58, 2 August 2021 (UTC)
https://books.google.com/books/about/Beliefs_Reasoning_and_D...
>How is machine learning neat?
>Machine learning is only provably correct for the known examples it was trained for. If that is not an adhoc approach to AI, then I don't know what is. Big data is the epitome of a scruffy. No model, just data, not formalism, besides fitting a curve/model to the given data. It is the exact same approach that scruffies follow: abstracting from examples for specific sub tasks.<unsigned>
>Just because some mathematical methods are employed, like optimization for a sub-problem, i.e. curve fitting, does not make the approach itself neat.
>Obviously, scruffies also use mathematically rigorous approaches, when employing provably correct algorithms, such as searching trees, or certain signal processing approaches.
>So far, the only valid "neats", are those doing GOFAI: they use a minimal model and deduce everything based on it, with no added assumptions or axioms along the way.
>Machine learning is only based on added assumptions/axioms: the training data. New for each problem, no general model.<unsigned>
>Yeah, I noticed that too. Not sure who introduced machine learning to the article. I'm trying to clean up, e.g. removing the jargon about scruffies just being casual hackers throwing stuff together in an ad hoc manner. I don't know enough about the people involved though. I know about the methods you mention (curve fitting, converging series, mathematical modeling) but not necessarily who did what. I don't even know whether most of these guys, the neats OR the scruffies, would be comfortable with "big data" (i.e. lots of specious results with very low cost of being wrong).--FeralOink (talk) 10:48, 3 August 2021 (UTC)
I understand your point about learning and getting better at it. All I'm saying is most programmers won't become experts: the market doesn't demand that, and most just aren't able or don't want to.
No-code will make a huge impact in the next decade, IMO.
In my specific case, I would be able to become an expert programmer but I don't intend to because I have other career choices. So I think Copilot would be of great help.
For amateurs, the homework is a great analogy - they don't need a lesson, they need a calculator so they can get back to the professional work they are doing.
Here are the parts you missed when reading the post:
> Do you think only _experts_ should be programming?
> I've _hired around 30 different programmers_ in my life
I’d say the exact opposite. Unlike this algorithm, developers can continue to learn. There will likely be future algorithms that are improvements, but this isn’t that.
It would be also the end of GitHub, as most users probably won’t accept such terms.
It is much better to generate code on the fly during build, so it doesn’t even go to source control and people can’t modify it.
I hope we are talking about the same thing, that is, you dislike creating classes and functions explicitly and prefer adding a comment or some templating system that generates a ton of obscure code behind the scenes.
A good example for generated code are typed clients for an OpenAPI interface. Instead of writing a REST client on your own based on a spec, you generate it. And if something isn't right in the first place, don't edit the generated code, fix/configure the generator instead!
Or database models. Either generate the database from the models or the models from the database.
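To make the "fix the generator, not the output" point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `SPEC` dictionary is a toy stand-in for a real OpenAPI document, `generate_client` is not any real generator's API, and the generated methods only describe the request instead of sending it.

```python
# Hypothetical, heavily simplified "spec": operation name -> (HTTP method, path template).
SPEC = {
    "get_user": ("GET", "/users/{user_id}"),
    "delete_user": ("DELETE", "/users/{user_id}"),
}


def generate_client(spec: dict) -> str:
    """Render the source of a client class with one method per operation in the spec."""
    lines = [
        "# GENERATED FILE: do not edit, change the spec or the generator instead.",
        "class Client:",
        "    def __init__(self, base_url):",
        "        self.base_url = base_url",
    ]
    for name, (method, path) in sorted(spec.items()):
        lines += [
            "",
            f"    def {name}(self, **params):",
            "        # Returns a description of the request instead of sending it.",
            f"        return ({method!r}, self.base_url + {path!r}.format(**params))",
        ]
    return "\n".join(lines) + "\n"


# A real build step would write this text to a file that never enters source control.
source = generate_client(SPEC)

# For the demo, load the generated source directly and call one generated method.
namespace = {}
exec(source, namespace)
client = namespace["Client"]("https://api.example.invalid")
request = client.get_user(user_id=42)
print(request)
```

If a generated method turns out wrong, the change goes into `SPEC` or `generate_client` and the output is regenerated, so nobody has to hunt for hand edits later.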
So it's better for code with little to no modification, while boilerplate / template are better for things that will be modified.
That's exactly the problem. And because templates allow you to modify the code, the template creators do no work on generalizing it so it covers all the use cases. So, in practice, templates usually require that you modify them.
And now you have a huge codebase, mostly with the default text, but with some changes at random, and one of those changes is breaking it. Good luck finding it.
Sigh... every time I see people generating any code (say through external tool or through IDE) I am always thinking how this could be a simple macro in any Lisp language.
So if you need a lot of template code, then your design or your framework is not well suited for the task.
In software development the goal is usually to move commonly used functions or patterns into a library or a framework, instead of copy&pasting them with slight modifications.
Sure, another good example is Qt: the Designer tool creates some XML that describes the widgets you placed and their properties, then a tool generates code that is easy to read (not obfuscated or clever).
Can you give some examples of what kind of bad code generation/boilerplate you mean when you think of Java/C#?
My impression (NOT A LAWYER) is that by hosting your code in a public repo on GitHub, you agree to their terms and give them the right to "read" your code including training AI models on it. Or at least that's what they're banking on.
Go host on Sourcehut or self-host with Gitea, and I would think it unlikely (but not impossible) that any big company would use your code to train their AI.
Just imagine, there's really nothing preventing people from scraping your blog to train their natural language processing AI or whatever, why would code be any different? Even if you put up a big sign saying you don't consent to having your data ingested by a neural network, I doubt it will get noticed anyway...
People have been taking large OSS codebases (eg. Linux kernel) for various statistical analyses. AI is just doing the same thing in a more sophisticated manner.
https://twitter.com/NoraDotCodes/status/1412741339771461635
There are also other references indicating that GitHub public repos weren't the only source. They trawl other publicly readable code.
We need to jam through every change through 10 layers now, because of "clean architecture". The team is very slow and can't implement even small changes quickly.
The worst part is that I feel like I'm the idiot for thinking about whether the 50 classes (dtos, models, mappers, blablabla) actually make sense and reduce coupling. I see that anytime a tiny requirement changes, I need to update the 50 classes again, so in practice, it just doesn't bring anything positive.
When I raise my concerns, they just roll their eyes, and make me feel like "I'm just not a senior enough guy" who just accidentally got in the team.
It takes a lot of patience to undo this damage and explain that simplicity is much more important than lazily, mindlessly repeating "best" practices. I am using quotes intentionally because they aren't actually best -- "best" would suggest there are no better practices which obviously cannot be true.
The goal should always be to make the application simple and easy to work with. Patterns should be tools to achieve the goal rather than being goals themselves.
Simple is important because it allows understanding your application (which is important for developer efficiency as well as improving reliability). It also enables you to modify your application much more easily (more code usually means more work to change it), and this is important for fighting technical debt and reducing the cost of any future development.
Easier to do that than just admit technical debt. Some people cannot acknowledge a problem and live alongside it if it is too large to tackle immediately. It has to be explained away or compartmentalized. How simple it is to blame the messenger. Sorry you had to experience that from your team.
I feel like this is a rather unfair comment because it doesn't sound like a situation created by "clean architecture."
Granted, you probably should not try and force every detail into this architecture just like you should not rewrite a perfectly good library just because it does not fit into it nicely. But even then; drilling through half a dozen or more layers for every change sounds just wrong. There should not be just any kind of separation in your program. There should be a separation of concerns.
The real problem seems to me a culture in which "We do $X because of $AUTHORITY." is regarded as a sensible answer to criticism. I have worked with exceptionally confident, almost blinkered, people in charge of the big picture and never once have I heard a bullshit answer like that.
i think the other thing is, theres clean architecture and then theres Clean Architecture TM where the thing is taken literally (leading to slavishly applying all the layers with lots of boilerplate, useless mocks and ridiculously coupled unit tests, over architected dependency injectors (assemblies) etc)
i was honestly surprised when i watched a series of lectures from mr uncle (bob) where he clarified a lot of things such as "use dependency injection only where it matters" and "unit tests should be replaced with integration tests after a system is finished being implemented" to "slavish following of agile "customs" is unproductive" etc etc
i think a lot of issues could be resolved if people took the time to think and listen carefully about these things and not stop at the first couple of search hits for "clean architecture"
edit:
heres the link: https://www.youtube.com/watch?v=7EmboKQH8lM&list=PLmmYSbUCWJ...
What you can sometimes do, is to remove all those layers to be able to implement or fix something. Then tell the team that you didn’t have time „to do it properly“ and you focused on functionality and efficiency over design. Management loves that.
And after it’s done, some of those abstraction nazis can refactor in all those abstractions again. So they don’t distract you from the next meaningful task.
But make sure, that you understand what benefit this decoupling brings. Because sometimes it’s useful, just not often enough.
It definitely is the culprit. They didn't even want to add `var` to the language until recently, and let's not even go to the anonymous class vs lambdas retardation.
These are just the things that they eventually buckled on, but Java is extremely boilerplatey - the bad patterns and XML crap got invented to deal with that problem.
DDD and onion are another issue, mostly coming out of the TDD movement and "make everything unit-testable". If I liked one thing about working with Rails, it is that they just gave you E2E tests from the start. But if Java/.NET were more flexible (dynamic or FP do far better at enabling simpler unit testing, in my experience), mocking would be simpler, so the unit testing part would be simpler too.
Also OOP is very commonly abused in those languages, to make easy stuff more complicated.
A "service" with a bunch of stateless functions (I am intentionally not calling them methods) is really just a library of routines and the class is used mostly for namespace purposes (to group related functions together) and maybe deliver access to some dependencies. But those dependencies could be thought about almost the same way as global variables in a C program, because usually there exists only single instance of the service.
Neither are DTOs being passed between these services an OOP mechanism -- they are almost C-like structs to make it easier to pass data between functions and to have a single reference to them. The only exception maybe is things like equals(), hashcode() etc, but this is a very shallow use of OOP patterns.
So it is really difficult to say this is abuse of OOP, when there is very little of actual OOP in it.
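A rough sketch of that point, written in Python rather than the Java under discussion, with made-up names (`OrderDto`, `PricingService`, `gross_cents`). It only shows that the "service" class adds nothing but a namespace and that the DTO behaves like a struct with generated equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderDto:
    """Plain data carrier, close to a C struct; generated equality is the only 'OOP' part."""
    order_id: int
    net_cents: int
    vat_percent: int


class PricingService:
    """'Service' style: the class holds no state and only groups functions under a name."""

    @staticmethod
    def gross_cents(order: OrderDto) -> int:
        return order.net_cents * (100 + order.vat_percent) // 100


def gross_cents(order: OrderDto) -> int:
    """The same routine as a free function; the module acts as the namespace."""
    return order.net_cents * (100 + order.vat_percent) // 100


order = OrderDto(order_id=1, net_cents=1000, vat_percent=20)
print(PricingService.gross_cents(order), gross_cents(order))
```

Both calls compute the same value, which is the sense in which such a service is closer to a library of routines than to an object with behaviour.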
The OOP "abuse" I was referring to is mostly caused by inheritance. Five or more levels of inheritance is not so uncommon in some enterprise business logic. And once you have to work with that, you arrived in hell. Especially if it is split into different projects, that you can't navigate or debug as one easily.
I don't see how you figured there is a relationship between DDD and "make everything unit testable". DDD is about high level architecture. It's at the opposite end of the spectrum.
I agree, the crazy mock- and stub-heavy unmaintainable micro unit testing thing seems to have been an innovation that came out of the Ruby scene.
While I consider TDD the wrong approach in 90% of scenarios, in dynamic languages it works out much better because the object model is so flexible you can mock just about anything trivially. In C# and Java it's just boilerplate on top of boilerplate.
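As a small illustration of how cheap mocking is in a dynamic language, here is a sketch using Python's standard `unittest.mock`. The function `is_expired` is a hypothetical example, not something from this thread; it reads the real clock directly, with no interface or injected dependency.

```python
import time
from unittest import mock


def is_expired(deadline: float) -> bool:
    """Hypothetical function under test; it calls the real clock directly."""
    return time.time() > deadline


# Swap out time.time for the duration of the block. No interface, wrapper class,
# or dependency injection container is needed to make the function testable.
with mock.patch("time.time", return_value=1000.0):
    expired_in_test = is_expired(deadline=999.0)
    not_expired_in_test = is_expired(deadline=1001.0)

print(expired_in_test, not_expired_in_test)
```

In Java or C# the usual equivalent is to introduce a clock abstraction and pass it in, which is the extra boilerplate being complained about.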
This realization is always endlessly infuriating to me as someone who was taught way too much Java in Uni, and had to intentionally push myself into other languages to realize why simple things like higher-order functions were subjugated under the tyranny of classes in Java.
But far worse than that is the absolutely abysmal and destructive philosophy around types in Java. Just mash em together with namespaces and classes, and then nerf type inference to the point that it couldn’t infer what is literally the most trivial reflexion-based equality: “Object o = new Object()”.