The goal is A) to get me to use it enough in personal projects that I convince my manager to pay for a business license, and B) to encourage me to use more AWS API stuff (which CodeWhisperer is fine-tuned on), where AWS makes the bulk of their revenue.
I have no qualms with either motivation.
First, Copilot supports a lot more languages (a big part of the utility of such tools is that you can write code in a different language much more quickly).
Second, it fails more often with incorrect suggestions, and on nontrivial things it often tends to go line by line.
Today, we’re excited to announce the general availability of Amazon CodeWhisperer for Python, Java, JavaScript, TypeScript, and C#—plus ten new languages, including Go, Kotlin, Rust, PHP, and SQL. CodeWhisperer can be accessed from IDEs such as VS Code, IntelliJ IDEA, AWS Cloud9, and many more via the AWS Toolkit IDE extensions.
It was so unexpected for me that I had to pause for a second to process what happened.
Definitely disappointing compared to ChatGPT-based code creation. I love describing what I want very briefly and getting a nice block of code to start tweaking.
I wish there was an easy way to benchmark these tools and revisit them when they pass a threshold of competence.
It's not necessarily the case, it can generate whole functions and even multiple functions.
Today I made a class called "DynamoUtils" and it suggested 2 full methods.
With copilot being embedded in all office software in the near future, MS may as well make GH copilot free. Interesting times!
While CodeWhisperer offers a free tier, which may help individuals or pressure Copilot to lower personal account prices, AWS hasn't priced this very competitively for enterprise while their tool is still performing worse.
> /*Create a lambda function that stores the body of the SQS message into a hash key of a DynamoDB table.
Now, obviously that is not valid Java syntax and javac will fail on that, but could/would it be possible to just build an intermediate tool that'll expand this into Java (or whatever other language) so that you don't need to even see the expanded code in your editor, like the same way you don't need to see bytecode?
I get that practically, right now, that would be ill-advised since the AI may not be reliable enough and there are probably more cases than not where you need to tweak or add some logic specific to your domain, etc. But still, theoretically is that where we are heading, i.e. a world in which even what are now considered high level langs get shoved down further below and are considered internal/low level details?
"To help you code responsibly, CodeWhisperer filters out code suggestions that might be considered biased or unfair, and it’s the only coding companion that can filter or flag code suggestions that may resemble particular open-source training data."
It would be interesting if AWS actually does their attribution - how do they know which open source code was published in any public repo?
// Send string via mqtt
// use async_std::task;
// use async_std::prelude::*;
// use async_std::net::TcpStream;
// use async_std::io::prelude::*;
// use async_std::io;
// use async_std::sync::Mutex;
// use async_std::sync::Arc;
I also haven’t used these tools at all so if CodeWhisperer is a little “dumber” than copilot, I doubt I will even notice.
I was just thinking about this before reading the announcement. Part of our work is in aerospace; hardware and software being a part of that. All of it goes through layers-upon-layers of design, testing, verification and qualification for flight.
In my mind I saw this scenario where something happens and it ends-up in the courts. And then, in the process of ripping the code apart during the lawsuit, we come to a comment that changes it all. Something like this:
// Used Amazon CodeWhisperer to generate the framework of this state machine.
// Modified as needed. See comments.
That's when the courtroom goes quiet and one side thinks "Oh, shit!" What does the jury think?
They are not experts. All they heard is you just used AI to write part of the code for this device that may have been responsible for a horrific accident. Are their minds, at that point, primed for the prosecution to grab onto that and build it up to such a level that the jury becomes convinced a guilty verdict is warranted?
Don't know.
Does this mean we have to be very careful about using these tools, even if the code works? Does this mean we have to ban the use of these tools out of concerns for legal liability?
Personal example:
A year or so ago I wrote a CRC calculation program in ARM assembler. It could calculate anything from CRC-8 to CRC-32. This was needed because we were dealing with critical high speed communications and there was a finite real-time window to compute the CRC checksum. The code was optimized using every trick in the books, from decades of doing such work. Fast, accurate, did exactly what it was supposed to do. In production. Working just fine.
I was curious. A couple of weeks ago I asked ChatGPT to write a CRC-32 calculation routine given some constraints (buffer size, polynomial, etc.). It took a few seconds for it to generate the code. I ran it through some tests. It seemed to work just fine.
That's when the question first occurred to me: Would it expose us to liability if that code were to be used in our system? I don't know. I have a feeling it would be unwise to use any of it at all.
Wouldn't it be funny, interesting and perhaps even tragic if we had to have "100% organically-coded" disclaimers on our work in the future?
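For anyone who wants to run the same kind of check on a generated CRC-32 routine, here is a minimal sketch. It is neither the ARM assembler version nor the ChatGPT output mentioned above. It assumes the standard reflected CRC-32 polynomial 0xEDB88320 (the one zlib uses), and the function names are made up for illustration. The idea is just to compare a candidate routine against a trusted reference before believing it "seems to work".

```python
import zlib


def crc32_bitwise(data: bytes, poly: int = 0xEDB88320) -> int:
    """Bit-by-bit CRC-32 in reflected form (assumed polynomial: zlib's)."""
    crc = 0xFFFFFFFF  # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # standard final XOR


def matches_reference(data: bytes) -> bool:
    """Compare the candidate routine against zlib's implementation."""
    return crc32_bitwise(data) == zlib.crc32(data)
```

Passing a check like this says nothing about the liability question, of course. It only tells you the routine agrees with a reference on the inputs you tried.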
They have already alienated most grocery and other online retailers from AWS. It wouldn’t make sense for them to do this to others.
Amazon doesn’t have a lack-of-IP problem; they, like all large companies right now, can’t turn ideas into products.
Do these language model bots help a little? Sure! But my worry of being replaced is currently sitting at like 3% out of 100%. I expect to still have a job right up until we have AGIs, and quite probably for long after, as not everyone will be able to afford them. That is assuming we have any meaningful control over them.
For years we've been saying "computer time is cheaper than developer time."
Well, that's about to come back to bite us, in a big way.
But the real world violates this all the time. You want to buy a car. Some company you've never heard of in China makes the chips that detect whether or not your windshield wiper fluid reservoir has fluid. A shipment to the car manufacturer is ready to go out. But, there are no shipping containers. Until the windshield wiper sensor chips arrive, the car factory can't make any cars, and doesn't have room to unpack the shipping containers with unneeded parts that are piled up outside. So there is no container that can go back to China to bring the chips to the factory. While all that is worked out, SV venture capitalists print some money to give to a used car startup, making it super easy to get the best price on your used car. With no new cars available and flashy discounts to get the market kickstarted, the used car market shoots up, meaning that even though you want a new $60,000 electric car, all you can do is buy a used 1988 Yugo for $150,000. You walk to work, even though you have the money for the car you want.
If it's software, this is what we call a pageable event and the postmortem whines about "separation of concerns". But in the real world... well, we don't have those. We LOVE thinking we do, but when shit blows up, it's clear that we don't. So are we really surprised that software works the same way? It's how the Universe works, not bad architecture. The Universe has terrible architecture. Adjust some of those constants and try again!
and then you can just evaluate expressions within the function. The fancy way with editor support is: https://github.com/vvvvalvalval/scope-capture-nrepl
you make snapshots of the local variables at any point, and later evaluate code in the context of that snapshot. So you do some action in your program that results in that function being called, it'll save the input, you select that snapshot, and now you evaluate in the context of those function arguments as you edit and eval expressions in the function. And while clojure supports interactive development at a level beyond other mainstream languages, Smalltalk and Common Lisp have support for it on another level, for example: https://malisper.me/category/debugging-common-lisp/
There's some study where Smalltalk came out as the most productive language. I don't know whether it's more productive, but that kind of interactive development, where you build up your program evaluating it the whole time without ever restarting, is a lot of fun. Why it went out of style I don't know.
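For readers who don't use Clojure, the snapshot idea can be roughly approximated in Python. This is only a sketch of the concept, not how scope-capture works internally: it records function arguments only (not arbitrary local variables), and `SNAPSHOTS`, `spy`, `eval_in_snapshot` and `area` are all names invented for this example.

```python
import functools
import inspect

SNAPSHOTS = []  # hypothetical in-memory store, one dict of arguments per call


def spy(func):
    """Record the bound arguments of every call so they can be reused later."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # include parameters that fell back to defaults
        SNAPSHOTS.append(dict(bound.arguments))
        return func(*args, **kwargs)

    return wrapper


def eval_in_snapshot(index, expression):
    """Evaluate an expression with a saved snapshot as its local variables."""
    return eval(expression, {}, dict(SNAPSHOTS[index]))


@spy
def area(width, height=2):
    return width * height
```

After calling `area(3)` once from the running program, `eval_in_snapshot(-1, "width + height")` evaluates a new expression against the captured inputs without calling the function again.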
But if you're willing to do without the "as you edit" requirement, then what you're left with is a plain old breakpoint debugger. Certainly, there are many IDEs that have those builtin.
I think one step before this is possible: AI as a preprocessor that generates source code, which is then validated by tests and committed without even review.
Cutting-edge LLM apps utilize multiple LLMs to perform validation, task decomposition, etc. It’s not a stretch that a future application could take your pseudo code / spec, maybe ask you some clarifying questions, generate a bunch of code and test cases, maybe even launch a beta stage and prompt you to validate it.
As others have mentioned, LLMs are nondeterministic and can do the wrong thing on a given run. This is in contrast to a traditional program that is either buggy or bug free. OTOH another LLM can be trained to validate, and to debug.
There’s a lot of work to do before LLM apps are considered reliable enough to do their job without intense supervision.
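To make the generate-then-validate loop concrete, here is a toy sketch. There is no real LLM in it: `fake_generator` is a stand-in that yields hand-written candidates (some deliberately wrong), and the spec ("double the input") and test cases are made up. The point is only the shape of the loop, where generated code is accepted once it passes the tests.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Callable[[int], int]

# Hypothetical spec: "double the input", expressed as (input, expected) pairs.
TESTS = [(0, 0), (1, 2), (5, 10)]


def fake_generator() -> Iterable[Candidate]:
    """Stand-in for an LLM: yields candidate implementations, some wrong."""
    yield lambda n: n * n      # wrong: squares instead of doubling
    yield lambda n: n + n + 1  # wrong: off by one
    yield lambda n: 2 * n      # correct


def passes_tests(candidate: Candidate) -> bool:
    """A candidate is valid only if it satisfies every test without raising."""
    try:
        return all(candidate(x) == expected for x, expected in TESTS)
    except Exception:
        return False


def accept_first_valid(
    candidates: Iterable[Candidate],
) -> Optional[Tuple[int, Candidate]]:
    """Return (attempt number, candidate) for the first one that passes."""
    for attempt, candidate in enumerate(candidates, start=1):
        if passes_tests(candidate):
            return attempt, candidate
    return None  # nothing passed; a human has to step in
```

The loop is only as good as the tests, which is the same caveat as above about supervision.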
COBOL was designed for normal business people to use, remember?
You'll just have to program in an AI-understandable language. I'm sure there are going to be lots of quirks and tricks similar to the languages today.
These systems are non-deterministic by nature, so I doubt it unless something fundamentally changes. Moreover you'd have to be super specific to capture the business logic to the point that you're basically writing code in a high level dynamic language anyway.
Yes.
But... it'll expand it based on the probability of what you want looking like other things it's been trained on. If you want the obvious use case then it'll be magical. Just describe the code and it'll work. But as soon as you want anything slightly less than typical you'll need to start 'prompt engineering' to refine in greater and greater detail, possibly until you've actually put in more effort than it'd take to just write the code.
For anything that's even further outside of the training data it won't work but it might look like it does. In the short term that's going to trip a lot of people up.
The worst part will be when non-developers start to use it though. "Make me a web form that takes a name, email address, and ZIP code and saves them to Airtable" will probably work eventually ... but with no validation, no error handling, no security, no styling, no cross-browser testing... because the author didn't know to ask for those things in their prompt. AI derived apps are going to suck.
It's funny, I was actually just pondering what to do with it when I opened HN and came across your comment. I was thinking of improving it some more and then selling it for a low-ish price. One thing that'd really help though, is a more widely accessible GPT-4 API.
If it isn't, then Copilot has previously used OpenAI Codex (which is based on GPT-3)
Source: https://github.blog/2023-03-22-github-copilot-x-the-ai-power...
The security checks and OSS attributions feel very much like what "enterprise" software does when they know they can't compete, they tick boxes instead.
I would guess that after a few years if everyone starts depending on this technology they would increase the price.
None of those problems are amenable to modern LLMs. The moment you try to be formal enough to be unambiguous, you start writing code.
Maybe that description is incomplete.
Maybe there's a stack of mundane activities that are needed for that style to be effective.
----
"Within each project, a set of changes you make to class descriptions is maintained. … Using a browser view of this set of changes, you can find out what you have been doing. Also, you can use the set of changes to create an external file containing descriptions of the modifications you have made to the system so that you can share your work with other users.
…
The storage of changes in the Smalltalk-80 system takes two forms: an internal form as a set of changes (actually a set of objects describing changes), and an external form as a file on which your actions are logged while you are working (in the form of executable expressions or expressions that can be filed into a system). … All the information stored in the internal change set is also written onto the changes file."
1984 Smalltalk-80 The Interactive Programming Environment page 46
https://rmod-files.lille.inria.fr/FreeBooks/TheInteractivePr...
----
"At the outset of a project involving two or more programmers: Do assign a member of the team to be the version manager. … The responsibilities of the version manager consist of collecting and cataloging code files submitted by all members of the team, periodically building a new system image incorporating all submitted code files, and releasing the image for use by the team. The version manager stores the current release and all code files for that release in a central place, allowing team members read access, and disallowing write access for anyone except the version manager." (page 500)
1984 "Smalltalk-80 The Interactive Programming Environment"
I think it's this paper: https://www.ifpug.org/wp-content/uploads/2017/04/IYSM.-Thirt...
Presumably the differences between "GW Basic", "Basic (interpreted)", "Quick Basic", "Visual Basic" ("Excel" Visual Basic for Applications?) follow from differences between the software development tools provided with the different language implementations.
So shouldn't we expect wildly different results between Java + plain text editor and Java + IntelliJ IDEA?
But there's only some kind-of generic place-holder "Java".
Tried it for a bit this morning with the "newest" release and didn't immediately observe any improvement, though this is far from objective of course.
Also you need to translate/expand once (or multiple times, add tests, pick best benchmarked solution)
Where this could be useful would be in handling updates of packages and APIs by itself. If you integrate only by prompt/words, the AI can generate the appropriate latest lib integration that happens to work with your system or whatever.
Accepting varied non-deterministic input: great. The same input generating different code each time: not the "feature" that you'd think it is.
The bigger problem is that LLMs are slow and expensive. Even in the future after many improvements, it makes more sense to have an LLM write a program once ever, rather than write a program on every compile or every execution.
Individual versions are deterministic though. Two identical prompts to an LLM at the same time can give drastically different results, because the responses are probabilistic. You can't assemble complicated systems that way and expect them to behave consistently.
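A toy illustration of where the variation comes from, with made-up numbers: `COMPLETION_PROBS` is an invented distribution over three possible completions, not output from any real model. Always picking the most likely option is repeatable; sampling from the distribution is not, unless the random seed is pinned.

```python
import random

# Invented probabilities for three possible completions of the same prompt.
COMPLETION_PROBS = {
    "return a + b": 0.6,
    "return a - b": 0.3,
    "return b": 0.1,
}


def greedy(probs):
    """Always pick the most likely completion: same answer every time."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw one completion at random, weighted by probability."""
    options = list(probs)
    weights = [probs[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]
```

Calling `sample(COMPLETION_PROBS, random.Random())` repeatedly will sometimes return the wrong completion, while `greedy(COMPLETION_PROBS)` never changes. Hosted models generally expose sampling settings, but how repeatable they are in practice is outside what this sketch shows.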
If a natural language compiler can output correct performant code, nondeterminism shouldn’t matter.
For example, take a script that randomly invokes either gcc or clang, maybe randomly sets the optimization level. Multiple invocations will produce vastly different output, but we can be confident the output is correct and to some degree performant.
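The same argument in miniature, using Python rather than actually invoking gcc or clang: two sorting "backends" stand in for the two compilers, and one is picked at random on each call. All names here are invented for the example. Which backend runs is nondeterministic, but the observable result is the same because both are correct.

```python
import random


def insertion_sort(items):
    """Backend 1: a simple hand-written insertion sort."""
    result = []
    for item in items:
        position = len(result)
        # Walk left past every element larger than the new item.
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result


def builtin_sort(items):
    """Backend 2: Python's built-in sort."""
    return sorted(items)


BACKENDS = [insertion_sort, builtin_sort]


def sort_with_random_backend(items, rng):
    """Pick a backend at random; return its name and its output."""
    backend = rng.choice(BACKENDS)
    return backend.__name__, backend(items)
```

This only holds because every backend is known to be correct, which is exactly the part that is still in question for LLM-generated code.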
Once the hardware mute is engaged, I've not had the Alexa reply.
Don't get me wrong, there's plenty of reasons to bash Amazon. I just don't count Alexa and privacy with mute among them.
As for my joke in general - I guess my point, if I have to make one beyond making a joke, is that I don't really trust any company to do what they claim in all circumstances. Even if they have every intention of doing so, some bug or bad actor could compromise the intent - not saying it's identical, but look at the recent issue with Tesla camera pictures being shared. So if you're working on some mission critical or top secret code, I wouldn't trust _anything_ to be running or looking at it - not Copilot, CodeWhisperer, etc. etc.
I see the results from programmers who never learned assembly.
Just like repairing and building cars from the ground up makes me a better driver. For example, the clutch in my car lasts a lot longer than it does for other drivers.
2) I am an old fart and have never said anything even remotely approaching "who did not do assembly will never be able to code". One can learn programming using any language. However, some experience with languages at multiple levels definitely does not hurt.
In order to evaluate generated code you have to know how to program.
Intervening can be done via LLM in some way I reckon
>"Intervening can be done via LLM in some way I reckon"
This inspires much confidence
Amazon as a whole has quite a history of using business data of people they are selling services to for their own purposes, and I wouldn’t put it past them with any AWS services not covered by the compliance agreements/certifications.
For whatever reason I got flagged :')
I've used GCP and AWS for about equal time, and Google was the one with several worrisome moments of overreach into customers' accounts. Meanwhile AWS actually lets you view and adjust the role and policy given to customer support on your account.
That depends on what you want.
In the first place, the problem of compiling a natural language spec to code is obviously somewhere from undefined to Turing complete (depending on formulation). But if the compiler usually outputs some application with most of what the spec required, this compiler would be intensely useful for e.g. rapid prototyping.
Then the question is whether we can make an LLM based app that compiles natural language and gets you most of the way to the prototype you were building (or even better - asks clarifying questions to help refine your spec).
This isn’t that far fetched with current technology.