StableCode (stability.ai)
Yeah, this is not going to happen. Anyone who has ever tried to gather requirements for software knows that users don't know what they want (clients especially lmao.) The language they use won't be detailed enough to create anything meaningful. Do you know what language would be? Code... Unironically, the best language for software isn't English. It's code. Should you specify what you want in enough detail for it to be meaningful suddenly you're doing something quite peculiar. Where have I heard it before? Oh yeah, you're writing code.
These tools are all annoying AF. Developers don't need half-baked hints to write basic statements and regular people don't have the skills to cobble together whatever permutations these things spit out. Which rather begs the question: who the hell is the audience for this?
If that loop is shortened drastically, then trying, checking and tweaking is suddenly a much more viable design method. That doesn’t require a precise set of requirements.
this exactly.
If the AI could make something that semi works, and you check the output, and repeat until you find the output satisfying, then it will be one of the biggest improvements to software development. Sure, you wouldn't use it to write mission critical software such as aviation, etc. But you'd use it to automate the sorting of your email, or write a quick auto-reply and auto mail merge, or bang out a quick site.
No, you still need the skill of gathering precise requirements, otherwise you end up in endless churn of implementing the wrong requirements and then implementing the wrong corrections when you get bad corrections.
(Maybe we didn’t know before the general adoption of notionally-Agile development methods, which didn’t have this as their premise but were focused on other benefits of a shortened spec->product loop, but we certainly know it after the widespread adoption of those methods.)
A shortened development loop does mean that you are less likely to have the whole market/domain shift under you between the time the requirements are defined and when the system is implemented, though, a frequently-realized risk with big upfront design that renders even precise and accurate (at the time gathered) requirements incorrect when implemented.
You'd be surprised by what regular people can build when you give them the power to create software. Here are a bunch of apps created using my tool/GPT-4: https://showcase.picoapps.xyz Most of our users have never coded before, and are able to build small tools to make their and their customers' lives better.
That's not a replacement for software engineering.
Anyone who has set up a coding project knows that actually creating the project structure, setting up dependencies and build scripts, and making the code compile or be interpreted are all problems that can have extremely obscure, frustrating errors, and they happen before you even start coding.
Then, not to mention, deploying the software. Even if you give someone code, they won't immediately know how to run it. End users get worried at the idea of opening a terminal and running a command in it, no matter how easy it is. Not to mention setting up the software to do so. (Is the right Python version even installed?)
As such, even if an AI could write a perfect script in code from standard text to, say, lowercase all of the words in a document, it would still be hard for non-developers to use because of the surrounding knowledge barrier, outside of the code itself. Although, yeah, it would be easier.
On the contrary: developers are exactly the people capable of handling those complex requirements you speak of. As a developer, getting a computer to handle basic statements is great and frees you to handle the big stuff.
Being able to write “// Retrieves second value from path” and have the computer spit out some string parsing method is great. All those little helper methods that showily fill up projects are great candidates for an AI. Especially if it helps you break up code into smaller, more composable (and disposable) chunks. If an AI writes the code, and can easily do it again, maybe people would be more willing to delete stuff that isn’t needed.
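For a sense of scale, the kind of helper such a comment might yield could look like the sketch below. The function name `second_path_value` and the choice of "/" as the separator are assumptions made for illustration, not output from any particular model.

```python
def second_path_value(path, sep="/"):
    """Retrieves second value from path, or None if there is no second value.

    Hypothetical helper: the name and the "/" separator are assumptions.
    """
    # Drop empty pieces so leading, trailing, or doubled separators are ignored.
    parts = [p for p in path.split(sep) if p]
    return parts[1] if len(parts) > 1 else None
```

Something this small is cheap to regenerate, which is the point about being willing to delete it later.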
It’s true that they won’t know how to exactly specify their needs. But they can share input and output examples and iterate on the solution.
I know folks without any programming background using ChatGPT to write code for them.
The code doesn’t work right off the bat but by iterating with the agent they can either get a solution or solve a portion of the problem.
LLMs will work really well when developers know what they want and how to ask for it, same with many no-code platforms. If you don't understand programming though, you can't even know if your request is possible.
It hasn't been solved any better this time, either.
> Which rather begs the question: who the hell is the audience for this?
In my opinion the audience for code-generation AI is developers, not the general public. It's immensely useful to be assisted by AI to autocomplete and suggest my code, whether that is because I'm not familiar with a language's syntax or just don't have the whole language API in my head.
The general public isn't going to have a clue how to put things together, and until AI can generate reliable and fully functioning code I doubt this is ever going to be for the general public. AI right now is essentially the combination of Google+StackOverflow for me but at a much faster pace. Instead of browsing through tens of SO questions and Google links to get to the exact situation I'm in, I can just prompt the AI with all the details and get one response that has the answer to my problem, usually!
I bootstrapped dev learning by collecting all the necessary pieces of code, but at the end of the day I feel like I'm just writing a huge semitechnical novel, and the problems I encounter have nothing to do with the basic building blocks. It's entirely about code flow, data flow, entry points, race conditions and things you encounter after you hit 99% of test cases.
This stuff seems like new age "low code" environments.
I do believe there will be a day where we communicate what we need and software is written on the fly to facilitate that need. For better or worse.
Insofar as anything like that was ever true, it still is.
Not that writing code has ever been the hard part of software development.
More people will be able to express themselves, it doesn’t matter that your uncle won’t
Just like every time people hyping a technology have said this, with something else where “AI” is but an otherwise identical claim: no, it didn’t happen last time, it’s not happening this time, and there’s a pretty good chance it’s not happening next time, either.
I’ve found ChatGPT to be more helpful in general. I can paste some code in and have a discussion about what I want it to fix for me.
GitHub Copilot sounds pretty neat though, I will admit that.
I get that consuming an API is far easier than setting up your own inference backend, but there are legitimate issues to consider before going in that direction.
https://huggingface.co/stabilityai/stablecode-instruct-alpha...
You can “run it locally”. Very handy if you do not trust automatically sending all your code to someone in the United States.
I wouldn't call these terms permissive. It's in line with the recent trend in released AI models, but fairly restrictive in what you're actually allowed to do with it.
So I asked it to "Write a python function that computes the square of the input number."
And it responds with:
def square(x):
Which seems quite underwhelming. I guess you could come up with a thousand example prompts and pay some students to pick which output is better, but I can also see why you wouldn't bother. It probably depends on language, type of prompt, etc.
HumanEval is abused but this model is only good for its size, it is no match for Copilot … yet
Can you put those numbers into context for those who haven't done HumanEval? Are those percentages so that 40+ means 40+% and 26 is 26%? If so does that imply both would be failing scores?
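On the HumanEval question: scores are conventionally reported as pass@k, the estimated percentage of the benchmark's programming problems for which at least one of k generated samples passes the unit tests, so they are not grades with a pass mark. A minimal sketch of the usual estimator, assuming n samples are drawn per problem and c of them pass (the function names here are illustrative, and whether the figures in this thread are pass@1 is an assumption):

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples, c of which pass the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must include at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k=1):
    """Average pass@k over all problems, as a percentage.

    results is a list of (n, c) pairs, one per problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Read that way, a score of 26 would mean roughly a quarter of the problems are solved on the first attempt.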
From interviews:
Implement queue that supports three methods:
* push
* pop
* peek(i)
peek returns element by its index. All three methods should have O(1) complexity [write code in Ruby].
ChatGPT wasn't able to solve that last time I tried https://twitter.com/romanpushkin/status/1617037136364199938
https://aider.chat/share/?mdurl=https://gist.github.com/paul...
https://chat.openai.com/share/d527f65f-8a6d-4602-acab-4d80ed...
If you want amortized complexity then a simple vector suffices.
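A rough sketch of that amortized version, in Python rather than the Ruby the interview asked for. The class name `VectorQueue` is made up for illustration, and the sketch never reclaims consumed slots, which a real implementation would want to do.

```python
class VectorQueue:
    """FIFO queue on a growable list: push is amortized O(1), pop and peek are O(1)."""

    def __init__(self):
        self._items = []
        self._head = 0  # index of the current front element

    def __len__(self):
        return len(self._items) - self._head

    def push(self, value):
        # Amortized O(1): the underlying list occasionally reallocates.
        self._items.append(value)

    def pop(self):
        if len(self) == 0:
            raise IndexError("pop from empty queue")
        value = self._items[self._head]
        self._items[self._head] = None  # release the reference
        self._head += 1
        return value

    def peek(self, i):
        if not 0 <= i < len(self):
            raise IndexError("peek index out of range")
        return self._items[self._head + i]
```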
2. The average complexity to search, insert, and delete data in a hash table is O(1), for interviews it works 99% of the time.
3. There is an alternative O(1) solution you're looking for, I'll leave this exercise to you, bro. As well as the other exercise of being less toxic and a bit more respectful to people you don't know online lol.
Not very promising based on this lame test
I think what I want is this idea of "code completion" but not for writing the methods, which is the easy part. Instead the tool should structure classes and packages and modules and naming and suggest better ways to write certain things.
That is something I’d be very interested in if they can get the compute requirements down to those of say a standard 13B model. Then I could fine tune (correct term?) it on my offline data and hook it into something like fauxpilot and my IDE.
I had a look at some of the recent code models (wizardcoder,strider etc) but it seemed that you need a really large model to be any good and quite a few of them were trying specifically for python.
Very curious where they are getting this data from. In other open source papers, usually this comes from a GPT-4 output, but presumably Stability would not do that?
Stability AI, Apple, Meta, etc. are clearly at the finish line, putting pressure on cloud-only AI providers, who cannot raise prices or compete with free.
I'm very optimistic and expect them to catch up. I've used the open models a lot, to be clear they are starting to compare to GPT3.5Turbo right now, they can't compete with GPT4 at all. GPT4 is almost a year old from when it finished training I think?
I expect open source models to stay ~1.5 years behind. That said they will eventually be "good enough".
Keep in mind too though that using and scaling GPUs is not free. You have to run the models somewhere. Most businesses will still prefer a simple API to call instead of managing the infrastructure. On top of this, many businesses (medium and smaller) will likely find models like GPT4 to be sufficient for their workload, and will appreciate the built-in "rails" for their specific use cases.
tl;dr - open models don't even compare to GPT4 yet (I use them all daily), they aren't free to run, and an API option is still preferable for a massive number of companies, if not most.
Long or medium term these will probably be dirt cheap to just run in the background though. It might be within 3-5 years, since parallel compute is still growing and isn’t as bounded by Moore's law stagnation.
Cloud AI providers get a big advantage from batching/pipelining and fancy ASICs. The question is how much they are willing to lower the tax.
Ai: "Hold my beer".
Also exllama doesn't support non-llama models and the creator doesn't seem interested in adding support for wizardcoder/etc. Because of this, the alternatives are prohibitively slow for running a quantized 16B model on a 4090 (if the exllama author reads this _please_ add support for other model types!).
3B models are pretty snappy with Refact, about as fast as GitHub Copilot. The other benefit is more context space, which will be a limiting factor for 16B models.
tl;dr - I think we need ~3B models if we want any chance of consumer hardware to reasonably run coding models akin to github copilot with decent context length. And I think it's doable.
~7B-13B will work in 16GB RAM with pretty much any dGPU for help, and context extending tricks.
TBH I suspect Stability released a 3B model because it's cheap and quick to train. If they really wanted a good model on modest devices, they would have reused a supported architecture (like Falcon, MPT, Llama, Starcoder...) or contributed support to a good backend.
*Also, I think any PyTorch based model is not really viable for consumer use. It's just too finicky to install and too narrow with hardware support.
```
def square(x):
    return x*x
```
I don't know how much slower it could be and still be useful though. The big thing is we need more VRAM, 30B is context length limited with only 24GB of vram, I've only barely made it above 3.2k tokens before running out.
I hope you're right, that it becomes common for systems to have dedicated TPU-type hardware similar to smartphones, and that they absolutely load them up with VRAM (which I don't think is even that expensive?)
Models will also get smaller but I'm skeptical we'll get GPT4 performance with any useful context length under 24GB VRAM any time soon.
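A back-of-the-envelope check on why a 30B model runs out of room around a few thousand tokens on a 24GB card. This is only a sketch under stated assumptions: 4-bit weights, an fp16 key/value cache, standard multi-head attention, and a LLaMA-style shape of 32.5e9 parameters, 60 layers and hidden size 6656. Activations, quantization metadata and framework overhead are ignored, so real usage will be higher.

```python
def weights_gib(n_params, bits_per_weight):
    """Memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers, hidden_size, n_tokens, bytes_per_value=2):
    """Memory for the key/value cache in GiB.

    One key vector and one value vector are stored per layer per token.
    """
    return 2 * n_layers * hidden_size * bytes_per_value * n_tokens / 2**30


# Assumed shape of a "30B" LLaMA-style model (not taken from the thread).
weights = weights_gib(32.5e9, bits_per_weight=4)
cache = kv_cache_gib(n_layers=60, hidden_size=6656, n_tokens=3200)
print(f"weights ~{weights:.1f} GiB, KV cache ~{cache:.1f} GiB, "
      f"total ~{weights + cache:.1f} GiB")
```

Under these assumptions the weights take about 15 GiB and the cache at 3.2k tokens close to 5 GiB, which leaves little headroom on 24GB once everything else is counted.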
The Instruct model has that non-commercial restriction, but I'm not sure why. They say it was trained with Alpaca-formatted questions and responses, but I'm not sure if that includes the original Alpaca dataset.
I guess the obvious caveat is that these models are probably overfitted on these types of questions. But a specific benchmark could be made containing questions kept secret from models. Time to build "Botrank" I guess.
> Nearly a year ago we wrote in the OpenAI Charter: “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time.
> This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.
> We will further publicly discuss this strategy in six months.
Let me tell you, with some of the tickets I've had to deal with I do not think most people could actually describe the problem accurately enough to an AI to actually fix the issue.
Now, is it still better that no-code platforms exist and give non-technical people the chance to get started? Yes, probably. But the transition path is not clear to me, since no-code platforms don't want their users to move on. So, naturally, they evolve to do more and more complex stuff, which in turn makes their whole platform more complicated and scares off their very target audience.
So now, you need to hire agencies and "no-code developers" to work on your no-code app. Back at square one.
I can see the same story playing out with AI-based coding. If you don't know coding, AI-based coding is just a layer on top of a no-code platform.
However, over time you will need to describe less and less of the code for a large majority of use cases. I expect generative AI will be able to take more generic prompts based on a specific vendor and really generate more with less prompting, given the context of whatever you are targeting, e.g. Azure, Camunda, etc.
Over the same time, the sophistication demanded of software will expand to more than offset this.
Source: this process of advancing tooling and advancing demands has been going on the whole time software has existed, with a consistent pattern.
We've all seen that obscene production workflow built on a Google Sheet or Jupyter Notebook that now needs to support this or that new feature or integration... Add AI-generated tools to the pile.
To be actually toxic for a moment, you do know that amortized and average are different right?
There is an easy solution to get O(1) for all operations too by allowing them to throw an exception: a simple array. There is no other O(1) solution for all operations that I am aware of. In fact it is probably not too difficult to prove that such a solution does not exist.
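One way to read the "simple array" idea is a fixed-capacity ring buffer that raises when full. The sketch below is in Python rather than the Ruby the interview asked for, and the class name `RingQueue` and the choice of `OverflowError` are assumptions for illustration.

```python
class RingQueue:
    """Fixed-capacity FIFO queue: push, pop, and peek(i) are worst-case O(1)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = [None] * capacity  # allocated once, never resized
        self._head = 0  # index of the front element
        self._size = 0  # number of stored elements

    def __len__(self):
        return self._size

    def push(self, value):
        if self._size == len(self._buf):
            raise OverflowError("queue is full")
        self._buf[(self._head + self._size) % len(self._buf)] = value
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty queue")
        value = self._buf[self._head]
        self._buf[self._head] = None  # release the reference
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return value

    def peek(self, i):
        if not 0 <= i < self._size:
            raise IndexError("peek index out of range")
        return self._buf[(self._head + i) % len(self._buf)]
```

The trade-off is the one stated above: no operation ever resizes, so nothing is amortized, but a push on a full queue fails instead of growing.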
I'm okay not getting 1% of the offers. I'm not a $100 bill for everyone to like me.
I appreciate an attempt to educate me though, I wanted to make clear that any discussion like that is useless without solution to a problem. Post your solution, we'll discuss downsides. You can see it from my side, and I have another one. The parent commenter ain't got no solution, but keeps insisting he can implement that easily with this and that...
Good luck passing interviews with that attitude...
When using GitHub Copilot, I often write a brief comment first and most of the time, it is able to complete my code faster than if I had written it myself. For my workflow, a good code model must therefore also be able to understand natural text well.
Although I am not sure to which degree the ability to understand natural text and the ability to generate natural text are related. Perhaps a bit of text generation capabilities can be traded off against faster execution and fewer parameters.
My understanding is that a model properly trained on multiple languages will beat an expert based system. I feel like programming languages overlap, and interop with each other enough that I wouldn't want to specialize it in just one language.
The vocab size of llama2 is 32,000. I guess I personally don't think that there's enough difference in programming languages to actually save any meaningful number of tokens considering the magnitude of the current vocab.
https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4...
it looks like if you just limit it to English it'd cut the count almost by half - further limiting the vocab to a specific programming language could cut it down even more. Pure armchair theory-crafting on my part, no idea if limiting vocab is even a reasonable way to improve context handling. But it's an interesting idea - build on a base then specialize as needed and let the user swap out the LLM on an as-needed basis (or the front-end tool could simply detect the language of the project). 3B or smaller models with very long context which excel at one specific thing could be really useful (e.g. local code completer for English typescript projects)
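For a rough sense of what halving a 32,000-entry vocabulary would save in model size, here is a small worked calculation. The hidden size of 4096, the total of about 6.74e9 parameters, and separate (untied) input and output embedding tables are assumptions about a 7B Llama-style model, not figures from this thread. It says nothing about context handling, only about parameter count.

```python
def embedding_params(vocab_size, hidden_size, tied=False):
    """Parameters in the token embedding table plus the output projection."""
    tables = 1 if tied else 2
    return tables * vocab_size * hidden_size


HIDDEN = 4096          # assumed hidden size for a 7B Llama-style model
TOTAL_PARAMS = 6.74e9  # assumed total parameter count

full = embedding_params(32_000, HIDDEN)
half = embedding_params(16_000, HIDDEN)
saved = full - half
print(f"full vocab: {full / 1e6:.0f}M params, saved by halving: {saved / 1e6:.0f}M "
      f"({100 * saved / TOTAL_PARAMS:.1f}% of the model)")
```

Under those assumptions the saving is on the order of 2% of the parameters, which fits the point that the vocabulary is small relative to the rest of the model.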