I see both vscode and netbeans have a concept of "inference URL" - are there any efforts like language server (lsp) - but for inference?
https://ai.meta.com/research/publications/code-llama-open-fo...
In my applications, GPT-4 connected to a VM or SQL engine can and does debug code when given error messages. "Reliably" is very subjective. The main problem I have seen is that it can be stubborn about trying to use outdated APIs and it's not easy to give it a search result with the correct API. But with a good web search and up to date APIs, it can do it.
I'm interested to see general coding benchmarks for Code Llama versus GPT-4.
Imagine a database of "Here is the console error, here is the fix in the code"
Maybe one could scrape git issues with console output and tagged commits.
What?!? No Befunge[0], Brainfuck or Perl?!?
[0] https://en.wikipedia.org/wiki/Befunge
/just kidding, of course!
I got a bridge, but it was the wrong size.
Thanks, in advance.
Asking for purposes of educating non-technologists.
But don't get me wrong, TheBloke is a hero.
I can see some people fine-tuning it again for general propose instruct.
I do wonder about how much use it'll get, seeing as running a heavy language model on local hardware is kinda unlikely for most developers. Not everyone is runnning a system powerful enough to equip big AIs like this. I also doubt that companies are going to set up large AIs for their devs. It's just a weird positioning.
for now it is :) but with quantization advances etc. it is not hard to see the trajectory.
Its also just the right size for llama.cpp inference on machines with 32GB RAM, or 16GB RAM with a 8GB+ GPU.
Basically its the most desirable size for AI finetuning hobbyists, and the quality jump from llama v1 13B to llama v1 33B is huge.
Pretend you're an uncreative PM on an AI team; what part of Facebook or VR could you feasibly improve by iterating on LLMs? Perhaps the content moderation system... but that would require wrangling with the company ethics comittee and someone else at the company probably already took ownership that idea. You've gotta do something compelling or else your ML engineers are going to run off somewhere else.
If I were to ask my ML engineers about what they wanted to work on, they're going to avoid areas where their model is outgunned (i.e.: chat) and instead prefer lower hanging fruit which generalizes well on a resume (i.e.: "Pioneered and published key innovations in LLM code-generation").
Of course, the alternative answer is that Meta wants to replace all of their jr. developers with GPUs, but I think their leadership is a little too preoccupied with VR to be actively pushing for such a transformative initiative in anything more than a very uninvested capacity (e.g.: "Sure I'll greenlight this. Even if it doesn't pay off I don't have any better ideas")
The end goal of AI isn't to make your labour more productive, but to not need your labour at all.
As your labour becomes less useful if anything you'll find you need to work more. At some point you may be as useful to the labour market as someone with 60 IQ today. At this point most of the world will become entirely financially dependent on the wealth redistribution of the few who own the AI companies producing all the wealth – assuming they take pity on you or there's something governments can actually do to force them to pay 90%+ tax rates, of course.
- Easy plug & play model installation, and trivial to change which model once installed.
- Runs a local web server, so I can interact with it via any browser
- Ability to feed a model a document or multiple documents and be able to ask questions about them (or build a database of some kind?).
- Absolute privacy guarantees. Nothing goes off-machine from my prompt/responses (USP over existing cloud/online ones). Routine license/update checks are fine though.
I'm not trying to throw shade at the existing ways to running LLMs locally, just saying there may be room for an OPTIONAL commercial piece of software in this space. Most of them are designed for academics to do academic things. I am talking about a turn-key piece of software for everyone else that can give you an "almost" ChatGPT or "almost" CoPilot-like experience for a one time fee that you can feed sensitive private information to.
The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
In the meantime, we are still waiting for Google to show what they have (according to their research papers, they are beating others).
> User: Write a loop in Python that displays the top 10 prime numbers.
> Bard: Sorry I am just an AI, I can't help you with coding.
> User: How to ask confirmation before deleting a file ?
> Bard: To ask confirmation before deleting a file, just add -f to the rm command.
(real cases)
https://g.co/bard/share/e8d14854ccab
The rm answer is now "hardcoded" (aka, manually entered by reviewers), the same with the prime or fibonnaci.
This is why we both see the same code across different accounts (you can make the test if you are curious).
No thanks, going back to Winamp.
There's a "PrivateGPT" example in there that is similar to your third point above: https://github.com/jmorganca/ollama/tree/main/examples/priva...
Would love to know your thoughts
The concerns are 2 fold - 1. We might inadvertently use someone else’s intellectual property. 2. Someone else might gain access to our intellectual property.
What you are describing would help alleviate the concern about issue 2, but I’m not sure if it would help alleviate the concerns with issue 1.
The Copilot lawsuit should answer concern #1 more definitively.
#2 is already solved by running your own model or using Azure OpenAI.
VSCode is such a bloated hog of an editor!
Every time I open VSCode it’s bugging with badges to update extensions… and it’s so slow!
That said, non-subscription is essential, and that's probably going to be a heavy lift considering how quickly things are evolving.
I've not yet been able to solve the challenge of needing CUDA etc for some models though!
Plugins so far: https://llm.datasette.io/en/stable/plugins/directory.html
Things like bad default values, no tooltips, an no curated model list to one-click download is what separates a tool like Oobabooga from a paid commercial product. These things require time/money and it would be very unlikely that an open source tool could find resources for all the testing and R&D.
I think there is a big market for products where you pay and can just start chatting with the model without having to ever go to the settings tab or google anything unless you need to do something out of the ordinary.
lmstudio actually does what both of you want: provides an easy GUI and serves up your model over a local endpoint that mirrors the OpenAI API.
There's just too much noise in terms of the tooling for LLMs, the solution is fewer higher quality solutions, not more solutions
They also don't have the same economic setup and DNA as MS/OpenAI. Large corporate customers don't pay for access to the FB cloud, nor are they likely to -- Ellison has spent years building out Oracle Cloud, and he's on the FB board, for example. And I bet you didn't think of using Oracle's Cloud for your last project.
So, your company DNA is free-to-all social based on ad monetization, with a large bet on metaverse / AR / experiential social compute being next. You aren't a trusted corporate partner for anything but gatekeeping your immense community through ad sales.
And, it's clear you a) have some of the most interesting private social data in the world, including photos and DMs and texts, and b) this AI thing is huge.
A play that doesn't f with your existing corporate structure too much is to build this stuff, give it away, keep publishing, build your AI team internally, and see where it takes you.
This isn't the only play, but I think it's reasonable. It's pretty clear large enterprises are going to need their own, internally built / owned, Foundation models to be competitive in a bunch of arenas in the next decade. In this case, if FB can get a little mindshare, keep the conversation going, and as a sidenote, be a disruptor by lowering Azure/OpenAI revs with open releases at-the-edge, that's probably a strategy win.
If I were in charge of AI strategy at FB, I'd probably double down more on generative AI, and I'd be working hard on realtime multimodal stuff -- their recent very large multimodal speech to text in multiple languages work is good. If a team could eyeball realtime-ish video chat with translations, that would be something the platform has a natural advantage in pushing out. Generative hits existing customers, and metaverse asset creation, which is going to experience radical changes in costs and productivity over the next few years, and impact Oculus 100% no matter what anybody wishes were true.
They’re also getting a fantastic amount of press from all this, which is good for attracting talent and helping improve their image, at least among the nerd set.
Who is better positioned to answer a question like, “What should I get my friend Sophia for her birthday?” Facebook/Instagram already have huge volumes of data to specifically target ads. They can feed those into a chat interface pretty easily.
Customers would then buy per impression by describing their product and trusting Facebook to place it correctly. They already do this today, it’s just a different medium.
Had they continued with it, they'd have likely had some semblance of a public cloud today and would be able to sell these models.
Yeah, there are tons of opportunities for AI to do something with facebooks private user data and sell new services. For users to create engagement - and for ad companies to get very good targeted ads delivered. It is of course a challenge, to update the models on the fly, to include the latest private data, but then you can tailor an ad, that has subtil references to the latest shared wishes of the user. Probably quite effective.
So for now they mainly need top talent, to make some of it work. And open source is the best bet, for creating a ecosystem they can control and get talents who already trained on their tools. And they loose allmost nothing, because yes, they ain't in the cloud buisness.
So I will continue to not use facebook. But the models I will try.
If their choice right now is not to try to overtly monetize these capabilities but instead commoditize and "democratize" what others are offering it suggests they think that a proprietary monetization route is not available to them. In other words they do not leave money on the table. They think that (at least right now) there is no money on the table that they can get to.
Rather than remaining quiet and isolated, the best alternative - their conjectured thinking goes - is to show up as they do, buying up good will with various stakeholders, maintaining mindshare internally and externally etc.
Assuming that the above reading is correct it still leaves various options as to why they may have come to that conclusion: For example reasoning about the future of this sector they might be thinking that there is no real technical moat and they simply accelerate that reality to gain some brownie points.
It may be also idiosyncratic reasons specific to their own business model (data privacy challenges and how any AI monetization will mesh with all that). The drawback of being the elephant in the room is that there is not much room to move.
The nature of their long game depends on which of the decision branches carries more weight. Maybe it is wait-and-see until others clear up the regulatory hurdles. Or keep the engines running until the real and irreducible added value of LLM algos and the like becomes clear.
The only moat that is meaningful is data and they've got that more than any other player save maybe google. Publishing models doesn't erode that moat, and it's not going anywhere as long as facebook/whatsapp/instagram rule "interactive" social.
Working towards NLU that can solve content moderation once and for all? Contrast with tiktok which is clearly using word filters that are easily worked around with phrases like "un-alived" or "corn".
They want to replace influencers and your friends with chatbots and keep you scrolling through an infinite feed of ads and AI generated content?
We followed a similar model under more duress at Netscape. When you use Firefox that’s the fruit of that effort. It didn’t save Netscape, but Meta has a better and more diversified revenue base.
Likely this is driven by ego.
Yann wants to cement his position as a leader in AI and while he clearly does not appreciate LLMS at all, he realizes that he needs to make waves in this area.
Mark needs a generative product and has invested tremendously in the infrastructure for AI in general (for recommendation). He needs researchers to use that infrastructure to create a generative product(s).
Yann sees this going on, realizes that he has a very powerful (research+recruiting) position and tells mark that he will only sign on if Meta gives away a good deal of research and Mark concedes, with the condition that he wants his generative product by end of 2023 or start of 2024.
There seem to be opportunities for people to use technology like SeamlessM4T to improve lives, if it were licensed correctly, and I don't see how any commercial offering from smaller companies would compete with anything that Meta does. Last I checked, Meta has never offered any kind of translation or transcription API that third parties can use.
Whisper is licensed more permissively and does a great job with speech to text in some languages, and it can translate to English only. However, it can't translate between a large number of languages, and it doesn't have any kind of text to speech or speech to speech capabilities. SeamlessM4T seems like it would be an all-around upgrade.
[0]: https://github.com/facebookresearch/seamless_communication
If nobody except the researcher can reproduce an AI paper, and there is no source-code, and no demos that the public can access, then it's almost like if it doesn't exist.
I wouldn't want to work in a company that would throw away my research and just use it for PR purposes.
This compares favorably with Google, which is as likely to cannibalize its search business with generative AI as to create new value for itself.
Thus, for all the gen AI stuff like this, for which Meta doesn't have an obvious path to commercialization, it makes sense to release it publicly. They get plenty of benefits from this - for one, engineers (and smart people generally) who are working on really complex problems like to be able to talk about the work they're doing. If you're picking between jobs at Meta and Google, the fact that Meta's going to release your stuff publicly might well be the deciding factor.
I would also argue that there's an economic incentive. Right now, being seen as an AI company is definitely a positive for your multiple. I think the movement of Meta's stock price over the last 12 months relative to their change in profit and revenue is certainly driven in part by the perception that they're a leader in AI.
Yes. I said it many times. Meta is already at the finish line in the AI race to zero. All the other cloud-based AI models cannot increase their prices given that a $0 free AI model is available to be self-hosted or used on-device for private / compliance reasons.
Cloud-based AI models cannot afford to compete with free or close to free. It costs Meta close to nothing to release a readily available $0 AI model which is good enough for most use-cases that ChatGPT has already done.
> The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
Mostly benefits the PyTorch ecosystem which Meta has an active community around it.
AI has no moat, but many players are in denial about this still. Microsoft and other might have tight enough control they can use a product dumping strategy to get people dependent upon their implementation such they can start charging, but that isn't a delusion Meta can have.
That max revenue license they used with the models seemed fairly clever to me. It will seed the environment with players that base their product on Meta tech in return for them being born with a poison pill preventing their use by big players (other than Meta) buying them. This is a long term play that may not really work but it creates the potential for big opportunities. And even if it doesn't work out, denying easy wins for their powerful competitors might be worth the price on its own.
Meta by democratizing AI access is generating more capable developers which will make the Metaverse a reality, where FB leads. They have already realized they have a losing gambit with Google, Apple, Microsoft (also X?) having an antagonistic monopoly against Meta product advancement
because language models are a complementary product, and the complement must be commoditized as a strategy.
I see AMD as a bigger beneficiary, since, very soon, amd will equal nvidia for inference and fine-tuning, but amd has a long way to go to equal in foundation model training.
It's licensed non-commercially, so I'm not sure what those startups stand to gain.
> since, very soon, amd will equal nvidia for inference and fine-tuning
Source? If you're referring to Olive, it is indeed impressive but also has caveats:
1. It is just as proprietary as CUDA or CoreML.
2. You need a copy of Windows and licensed DirectX to use those optimizations.
3. AMD only matches Nvidia's inferencing performance when comparing Olive to Pytorch. Olive-to-Olive comparisons will still reflect an Nvidia lead.
I don't think AMD has the capability to equal Nvidia in the short-term. It will take longtime software investments from across the industry to shake Nvidia's yoke.
But they're a partner in Llama too. Why is Microsoft in this space too, how do they benefit?
Very interesting to ponder for sure.
https://g.co/bard/share/9ce2e6a11e83
LLM's aren't trained on their own documentation, and can't introspect, so generally can't answer questions like this.
(`"Mark House" "Bard"` gives no results on Google.)
With all due respect, is that a valuable thing to say? Isn't it true of them all?
They are generally okayish, closer to "meh", than something outstanding.
Yes the shell script solution is better, it doesn't give rm -f anymore, but is still somewhat closer to a bad solution instead of just giving rm -i.
I'm just really happy and excited to see that a free-to-download and free-to-use model can beat a commercially-hosted offering.
This is what has brought the most amazing projects (e.g. Stable Diffusion)
I also don’t know any professional devs who have used tools like copilot and said they were anything but a toy. I am more bullish on LLMs than most of my coworkers. I think there is a lot of potential there. I do not see that potential in the current commercial offerings, and the financial outlay to fine-tune an open-source model and run it at scale is…prohibitive.
Really? I'm academic now but I find Copilot at least moderately helpful when I'm writing a library. It's pretty good a lot of boilerplate functions, docstrings, regex, etc. I certainly don't want to go back to not using it, my code is a lot closer to production quality now and looks nicer.
Thinking back to my days in back-end it seems like it would have been very helpful/sped things up so I'm surprised to hear it's just a toy but I've been out of the professional game for a while now. What's the main criticism?
Interesting idea but sounds risky and intrusive in practice.
That’s pretty much the entire Meta empire in a single sentence.
Just like annual sexual harassment training - it's mostly corporate CYA on liability. If it ever goes to court, they'll plead ignorance and blame the employees who should have known better as they were trained/informed on what they ought not to do.
Paying for co-pilot could bite them though, so I suspect it's a case were the one part of the organization isn't aware of what the other is doing
FB can release these for no other reason than Zuck’s ego or desire to kill OpenAI. Same deal as him going off on a tangent with the Metaverse thing.
wrote about and diagramed it here - https://telegra.ph/Facebook-Social-Services-FbSS-a-missed-op...
Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):
# prints the first ten prime numbers
def print_primes():
i = 2
num_printed = 0 # end of prompt
while num_printed < 10:
if is_prime(i):
print(i)
num_printed += 1
i += 1
def is_prime(n):
i = 2
while i * i <= n:
if n % i == 0:
return False
i += 1
return True
def main():
print_primes()
if __name__ == '__main__':
main()
It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting. def primes_upto(limit: int):
"""Generate prime numbers < *limit*."""
# Sieve of Eratosthene
is_prime = [True] * limit
for n in range(2, limit):
if is_prime[n]:
yield n # found prime number
for c in range(n*n, limit, n): # start with square, less values are marked already
is_prime[c] = False # mark composites
if __name__ == "__main__":
from itertools import islice
print(*islice(primes_upto(100), 10)) # -> 2 3 5 7 11 13 17 19 23 29print("1, 2, 3, 5, 7, 11... and so on!
I have been waiting for weeks, and am still waiting, to get access to Llama2 (released a month+ ago), and access to this model goes through the same form, so I'm not very hopeful. Are you getting it from other methods?
Now granted, that's more or less by definition and I don't doubt there's communities and fields where it is considered one, but still shows some of the subtleties at play when using language models.
Current consensus has settled on excluding 1 from the definition, but there are examples of publications well into the 20th century that still included 1 as prime.
Fascinating read about the subject: https://arxiv.org/pdf/1209.2007.pdf
-1 True
0 True
1 True
2 True
3 True
4 FalseCongratulations! You must be that arrogant guy everybody hates interviewing with, the one with the superiority complex.
How about instead of just failing people over literally nothing (wasting everybody's time and money) - just ask the candidate whether they could somehow reduce the search space by utilizing the properties of a prime number?
My suggestion for your next interview: decide to hire them just based on their leetcode score, but invite to the interview just to flex that you're still better at puzzle solving :-D
Perfect
The problem a team that always seeks the optimal solution is that never they get shit done. And that’s rarely optimal in a business context. Your view does not strike me to be nearly as arrogant as it is short-sighted.
I think on a team of one I want the guy who gets it done without thinking. On a team of two I want a guy that’s somewhere in the middle. And on a team of three, that’s when I want my optimiser. Because in the time that guy number 3 has written let’s say 3 files of optimal code, guy number 10 files of not optimal code. And you rarely need guy number 3 to fix all ten, you just need him to fix the one or two files that actually matter.
Clients rarely ask “is this optimal?”. But they always ask “is this done?”.
All three developers have different approaches. All three are assets.
I think on some level then you’re making the same mistake that we could say the “just add one guy” made if your comment is honest- not factoring in (his) speed.
I think code readability, rather than code optimisation is far more important thing to get hung up on in an interview (and is, I must remind some of you, not to be confused with conciseness). And you can see this in the end result. But if you’re following along and the interviewee already knows you know what’s going on because of that, you can see it in smaller ways- it could be as simple as going back and changing a variable name such as open (maybe a function?) to isOpen (almost always a Boolean value).
I think most of us are in this position pretty often where we’re writing and figuring out something at the same time, maybe we just see if it works first and don’t give a name much thought or maybe we change the actual value or type of a variable in the process and the variable name becomes ambiguous. I’d look for small indicators that shoe me that this person is considering readability. But I still don’t necessarily expect it to be as readable as it would be in a more natural setting I mean I think 90% of these sorts of things start off with the person saying “I swear I can type” within the first five minutes of being watched- if they get flustered while being watched that it effects their typing, it certainly also effects their coding as well.
> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.
Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
The github repo associated with that paper is linked below. It links to the paper on arxiv, but also has some data in the repo.
https://ai.meta.com/blog/code-llama-large-language-model-cod...
[0] https://github.blog/2023-07-28-smarter-more-efficient-coding...
[1] https://github.com/features/preview/copilot-x
[2] https://github.blog/2023-07-20-github-copilot-chat-beta-now-...
It's extremely good. I keep a terminal tab open with 7b running for all of my "how do I do this random thing" questions while coding. It's pretty much replaced Google/SO for me.
I wonder if we could make such specific LLMs (one that is proficient in all things Rust, another- all things Linux, all things genomics, all things physics modeling etc) and have them talk to each other to collaboratively solve problems.
That would be a crazy future thing! Putting machines truly to work..
There's a quite fresh and active project replicating something similar with herd of llamas: https://github.com/jondurbin/airoboros
There was a similar attempt for Godot script trained a few months ago, and its reportedly pretty good:
https://github.com/minosvasilias/godot-dodo
I think more attempts havent been made because base llama is not that great at coding in general, relative to its other strengths, and stuff like Starcoder has flown under the radar.
[1] https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16
ollama run codellama "write a python function to add two numbers"
More models coming soon (completion, python and more parameter counts)Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.
And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
Is anyone working on a code AI that can suggest refactorings?
"You should pull these lines into a function, it's repetitive"
"You should change this structure so it is easier to use"
Etc
I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.
Edited to add: Though somewhat slow, swap seems to have been a good enough replacement for not having the loads of RAM required. Ollama says "32 GB to run the 13B models", but I'm running the llama2:13b model on a 16 GB MBP.
If there's nay other opinions or thoughts on this, I'd be very happy to learn as well. I have considered the eGPU route connected to a 1L PC such as a thinkcentre m80/90.
Any "eli5" tutorial on how to do so, if so?
I want to give these models a run but I have no powerful GPU to run them on so don't know where to start.
I see the GitHub copilot extensions gets a new release one every few days, so is it just that the way they're integrated is more complicated so not worth the effort?
https://huggingface.co/chargoddard/llama2-22b
Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.
Metrics are slightly below 13B, but the theory is that the higher parameter count is more amenable to finetuning. If you search for 22B on huggingface, you can see that frankenllama experiments are ongoing:
Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.
I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.
I tried Ideogram yesterday and it felt too much like existing generators (base SD and Midjourney). DALLE2 actually has some interestingly different outputs, the problem is they never update it or fix the bad image quality.
I guess since Xcode doesn’t have a good plug-in architecture for this I began experimenting more with a chat interface.
So far gpt-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.
There's also the real-time aspect where you can see that it's wrong via the virtual text, type a few characters, then it gets what you're doing and you can tab complete the rest of the line.
It's faster to converse with when you don't have to actually have a conversation, if that makes sense? The feedback loop is much shorter and doesn't require natural language, or nearly as much context switching.
It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.
Where its a bit shit is when its used to provide auto suggest. It hallucinates plausible sounding functions/names, which for me personally are hard to stop if they are wrong (I suspect that's a function of the plugin)
Nobody is saying this model is AGI obviously.
But this would be an entry point into researching one small sliver of the alignment problem. If you follow my thinking, it’s odd that he professes confidence that AI safety is a non issue, yet from this he seems to want no part in understanding it.
I realize their research interest may just be the optimization / mathy research… that’s their prerogative but it’s odd imho.
I haven't yet read the whole paper (nor have I looked at the benchmark docs which might very well cover this) but curious how these are designed to avoid issues with overfitting. My thinking here is that canned algorithm type problems common in software engineering interviews are probably over represented in the training data used for these models. Which might point to artificially better performance by LLMs versus their performance on more domain-specific type tasks they might be used for in day-to-day work.
[1] https://github.com/openai/human-eval
[2] https://github.com/google-research/google-research/tree/mast...
You can scream that this is progress all you want, and I'll grant you that these tools will greatly speed up the generation of code. But more code won't make any of these businesses provide better services to people, lower their prices, or pay workers more. They are just a means to keep money from flowing out of the hands of the C-Suite and investor classes.
If software engineering becomes a solved problem then fine, we probably shouldn't continue to get paid huge salaries to write it anymore, but please stop acting like this is a better future for any of us normal folks.
what is the trick to achieve 100k context? They can't just use 100k wide transformer layer, it is cost prohibitive, right?..
`ollama run codellama:7b-instruct`
https://ollama.ai/blog/run-code-llama-locally
More models uploaded as we speak:
my questions were asking how to construct an indexam for postgres in c, how to write an r-tree in javascript, and how to write a binary tree in javascript.
GGUF seems not optimised yet, since quantizing with a newer version of llama.cpp supporting the format fails on the same hardware. I expect that to be fixed shortly.
For inference, I understand that the hardware requirements will be identical as before.
You can, I suppose, contract your code so that it’s context free and uses less tokens, but that makes it more confusing for humans and language models.
Taken to the extreme, you can see obviously with one letter functions and variables like i, j, k the model will be able to infer literally nothing and, thus, produce arbitrary nonsense.
Clearly the solution is to do what we already do to manage complexity which is to decompose large tasks into smaller black box modules with an api where the (large number of tokens) implementation is hidden and not known or relevant to using it.
If you give an LLM a function signature and good description, maybe some usage examples, it doesn’t need the implementation to use it.
Terseness decreases the ability of LLMs to process code; it doesn’t solve context length, and even at best it doesn’t scale.
100k tokens is plenty.
You don’t need to do anything like that.
>100k tokens is plenty.
The context window can be really helpful, in case there is a release of a new library and the user wants to generate code targeting the API of the library. When the training date stops at August 2023, any library released after that date is not known to the engine.
My general opinion in regards to context window, is that 1 trillion tokens context window still may not be enough for all use cases.
You can also probably skip including standard library headers since those will be well known to the LLM through its fine tuning.
Either way, consider that a typical preprocessed C++ file would push against the 100K limit even with some optimizations. You will definitely want to have some middleware doing additional refinement before presenting that file to the LLM.
The method I used to choose which files to feed GPT-4 was embeddings-based. I got an embedding for each file and then an embedding from the instruction + some simple processing to pick the files more likely to be relevant. It isn't perfect but good enough most of the time in medium-sized codebases (not very large ones).
The one thing I started doing because of how I implemented this is make files shorter and move stuff into different files. Having a 1k+ LOC file is prohibitive because it eats up all the context window (although with 100k context window maybe less so). I think it's a good idea to keep files short anyways.
There's other smarter things that can be done (like embed and pass individual functions/classes instead of entire files) so I have no doubt someone will build something smarter soon. You'll likely not have to change your coding patterns at all to make use of AI.
You start a project by defining the task. Then as you iterate, you can add new information to the prompt. But it can be also partially automated - the model can have a view of the file structure, classes, routes, assets and latest errors.
I was really hoping that the one year update of Codex would be that - a LLM that can see deep into the project, not just code, but runtime execution, debugging, inspecting and monitoring. Something that can iterate like autoGPT. Unfortunately it didn't improve much and has weird conflicts with the native code completion in VSCode, you get freezes or doubled brackets.
I think hypergraph is an overlooked concept in programming language theory
A little understated, this is state of the art. GPT-4 only offers 32k.
With Cody you can create embeddings for your entire repo, so Cody will have much greater context about your code base and the problems you're trying to solve.
Disclaimer: I just joined Sourcegraph a few weeks ago.
With that said, they have recently changed the architecture, with the local install required, and I have not managed (yet) to get it working with NixOS. Once I have some more time, I will try again - it looks like there will be some hoops to go through. https://nixos.org/manual/nixpkgs/stable/#ssec-pkgs-appimageT...
Kudos to the Source Graph team, Source Graph's original product was nicely thought out and ahead of it's time. Nice to see how the original product gave a nice basis for building out Cody.
Last I heard they are in beta and don't work very well (even on the examples page: the "add types" brush is too strict, since `a` and `b` are checked for `null`, and the "fix simple bug" is a typo)
An instruct model means that you can ask it to do what you want, including asking it to give you refactoring ideas from the code you will give it.
This works well for me except the 15B+ don't run fast enough on a 4090 - hopefully exllama supports non-llama models, or maybe it'll support CodeLLaMa already I'm not sure.
For general chat testing/usage this works pretty well with lots of options - https://github.com/oobabooga/text-generation-webui/
I assume quantized models will run a lot better. TheBloke already seems like he's on it.
From my experience with github copilot and GPT4 - developers are NOT going anywhere anytime soon. You'll certainly be faster though.
However it is hard to tell how that might pan out. Can such an ML/AI do all the parts of the job effectively? A lot of non-coding skill bleed into the coder's job. For example talking to people who need an input to the task and finding out what they are really asking for, and beyond that, what the best solution is that solves the underlying problem of what they ask for, while meeting nonfunctional requirements such as performance, reliability, code complexity, and is a good fit for the business.
On the other hand eventually the end users of a lot of services might be bots. You are more likely to have a pricing.json than a pricing.html page, and bots discover the services they need from searches, negotiate deals, read contracts and sue each other etc.
Once the programming job (which is really a "technical problem solver" job) is replaced either it will just be same-but-different (like how most programmers use high level languages not C) or we have invented AGI that will take many other jobs.
In which case the "job" aspect of it is almost moot. Since we will be living in post-scarcity and you would need to figure out the "power" aspect and what it means to even be sentient/human.
That's why we're so excited to see these extraordinary advances that I personally didn't think I'd see in my lifetime.
The fear is legitimate and I respect the opinions of those who oppose these advances because they have children to provide for and have worked a lifetime to get where they are. But at least in my case, the curiosity and excitement to see what will happen is far greater than my little personal garden. Damn, we are living what we used to read in the most entertaining sci-fi literature!
(And that's not to say that I don't see the risks in all of this... in fact, I think there will be consequences far more serious than just "losing a job," but I could be wrong)
It would likely impact a far larger swath of the engineering / design industry.
The days of getting paid well for making crud are numbered (which most of us do, even in the most interesting problem spaces).
# need a front end boilerplate that hits a backend with the following end points. REST api for movies catalogue, and a corresponding ui. Oh, unit tests please. Go with a responsive design and also make a React Native version (matter of fact provision it to my iPhone). Decide between Heroku or AWS, set up deploy with git hooks.
# scrape IMDb for initial population of the db
# I think a Reddit like comment system would be good to add, so add it. No upvote/downvote though
# handle user login with google/fb/email
# also make an admin page to manage all this
I guess the technical product designer will be the new unicorn.
This view is critically flawed in two major ways:
1) AI is not anywhere near being able to replace the majority of what developers do on a product team. We are decades away from a PM at Facebook being able to type "make a twitter clone that uses instagram login and can scale to 1 billion" and have an AI just do it.
2) Programming and product work is not zero sum. The more we can do means the more product we can make. It means more products can be made overall. After the loom came out, we simply made more clothes than ever before and in the process created a ton of jobs. We are not at some peak software point where we've completely saturated all humanity's need for software or profitable software and thus tools that increase efficiency don't put us out of work.
And frankly, if we develop the kind of general AI that accept a query like "make a facebook competitor capable of scaling to 10 billion" and simply do it, inventing whatever languages, frameworks, hardware, processors, patterns, methodologies, global datacenters handling global politics and law, etc, etc necessary to accomplish such a task, then so be it. I welcome the overlords!
But the answer to that is to deal with concentration of capital, not to eschew better tools.
It will simply move to a higher level of abstraction.
Remind me, how many programmers today are writing in assembly?
Unfortunately, I don't believe there's a way to stop (or even slow down) this train. We can't defeat it, so the only logical answer is to join it.
It's the classical issue with progress removing jobs. In today's world, since mostly everyone (aside from the capitalists themselves) relies on jobs to survive, barring a complete switch from capitalism (which will not happen in our lifetimes), we're fucked.
Next best thing we can do is to try and democratize it enough so that not only the rich have access to it.
1. As a species decide to never build another LLM, ever.
2. Change the path of society from the unequal, capitalist one it’s taken the last 2-300 years.
3. Give up
I know which I believe in :). Do you disagree?
We could structure things so that LLM, and the generalised AIs to come, benefit the whole of society ... but we know that those with the power to make that happen want only to widen the poverty gap.
<pastes question and constraints from leetcode>
"solution begins with <insert default solution template from leetcode>".
Copy solution from gpt, paste in leetcode, run, submit.
"faster"
Repeat.
Repeat.
Next question.
[1] https://github.com/cosmojg/nvim-magic
If you're using it commercially you're probably deploying it on a server where you're not limited by the 24GB and you can just run llama 2 70b.
The majority of people who want to run it locally on 24GB either want roleplay (so non commercial) or code (you have codellama)
Matthew Berman has a tutorial on YT showing how to use TheBloke's docker containers on runpod. Sam Witteveen has done videos on together and replicate, they both offer cloud-hosted LLM inference as a service.
To state my position more clearly: I don’t think an AI could comment code from scratch very well - how would it know all the decisions made, business logic considerations, historical conventions, micro-industry standards, etc?
A good benchmark I was told once was “if a human expert couldn’t do it, an AI probably can’t either”. And commenting code I didn’t write would certainly test the bounds of my abilities
Because codellama is llama based it may just work possibly?
https://twitter.com/swyx/status/1671272883379908608 https://twitter.com/soumithchintala/status/16712671501017210... https://twitter.com/MParakhin/status/1670666605427298304
it makes no sense that you're being crucified.
Probably because there's significant overlap in the Venn diagram of people with years experience who professionally develop products that generate $millions in wealth/value, and people who would fail that interview.Or we have worked with junior developers who have really grown and flourished under our care, who would never have gotten that chance with such insane Draconian judgements.
It's such an obvious "GOTCHA!!" setting someone up for failure.
The way it's framed is very cringy because it signals that they don't care in their interviews about determining how objectively effective a software developer is.
If I have to write the loop above, I am assuming it is the Fizzbuzz equivalent of your company to show that I know how to write a while loop. I am not thinking about reducing the search space because I am writing the code semi-unconscious and frankly just want to get to the next question.
[1] https://github.com/jmorganca/ollama/blob/main/docs/api.md
curl -X POST http://localhost:11434/api/generate -d '{
"model": "codellama",
"prompt":"write a python script to add two numbers"
}'Plus, if these AIs are enough to change everything, that kinda implies that we've developed flexible, reliable AGI systems. In such a world, everything changes - maybe the calculus of The Powerful Few vs. The Oppressed Masses changes in too! It might even change in our favor, if we're terribly lucky...
[1]: https://arxiv.org/abs/2108.12409 (charts on page two if you’re skimming)
what does and container orchestration architect do ? Something like this cluster should use envoy and Prometheus. The new clusters rate isn’t usually high enough for the stack to change.
Real question I love these non conventional (swe, sre, pm, manager ) roles in tech
Claude 1.2 instant is as fast as 3.5, follows instructions at a quality closer to 4, and has a 100k context window. Hard to compete with that with an open source model right now.
> Anthropic is rolling out Claude slowly and incrementally, as we work to ensure the safety and scalability of it, in alignment with our company values.
> We're working with select partners to roll out Claude in their products. If you're interested in becoming one of those partners, we are accepting applications. Keep in mind that, due to the overwhelming interest we've received so far, we may take a while to reply.
No thanks, I'd much rather not wait months to see if my app deserves their oh-so-limited attention, or "aligns with the values" of a company taking $400m from Sam Bankman-Fried.
To be more charitable to your underlying point, Claude 2 is free to chat with via Anthropic's website, Poe, or Slack, and the GPT-4 API is open to use. If you're building a prototype or just need a chatbot, these do have better results and dev experience, at least for now. But I don't think picking on your Claude API example is unfair. These companies could randomly refuse your prompts via some opaque "moderation API" (that all GPT fine-tuning data goes through!), train on your company's proprietary data, spy on your most intimate questions, or just not find you worth the trouble and cut you off, at any time. THAT is why open source beats proprietary hands down: My device, my data, my weights, my own business.
Awkward tie-ins between SBF and value systems (?) have no effect on practical usage.
A theoretical concern they might train on my API data after saying they won't doesn't either. Amazon might be training on everything not bolted down in S3, not worth wasting brain power on that.
The moderation API isn't some magic gotcha, it's documented. They don't want to deal with people fine tuning for porn. Maybe you have some ideological disagreement on that but it's not of practical relevance when trying to write code.
At the end of the day you're not alone in these opinions. But some of us prefer pragmatism over hype. Until someone catches OpenAI or Anthropic trying to kill their golden goose by breaking their GDPR, HIPPA, and SOC2 certifications, I'm going to take delivered value over theoretical harm.
I do have interest in local models (say running on a fixed list of document structures)
It suggests that synthetic training could be the future in increasing capability of smaller models (and perhaps bigger ones too). AI will train AI.
The differences being it's not just training on unvalidated synthetic data and this specific method (per the unnatural questions paper) results in increased instruction diversity which confers some added advantage and I'm assuming explains the performance gain over the also synthetic self-instruct code?
I may be misunderstanding but this seems more nuanced than just training on synthetically AI-generated code and is more validating of synthetic instructions (i.e. low resource setting) rather than synthetic code (i.e. high resource setting).
Now of course, the terms are not the law (so don't govern the use of the generated data by any third party), they are an agreement between two parties. If you did click "agree" then that's a binding agreement and there could be legal/contractual repercussions (some of which are outlined in the terms).
Speaking of lmdeploy, it doesn't seem to be widely known but it also supports quantization with AWQ[2] which appears to be superior to the more widely used GPTQ.
The serving backend is Nvidia Triton Inference Server. Not only is Triton extremely fast and efficient, they have a custom TurboMind backend for Triton. With this lmdeploy delivers the best performance I've seen[3].
On my development workstation with an RTX 4090, llama2-chat-13b, AWQ int4, and KV cache int8:
8 concurrent sessions (batch 1): 580 tokens/s
1 concurrent session (batch 1): 105 tokens/s
This is out of the box, I haven't spent any time further optimizing it.
[0] - https://github.com/InternLM/lmdeploy
[1] - https://github.com/InternLM/lmdeploy/blob/main/docs/en/kv_in...
[2] - https://github.com/InternLM/lmdeploy/tree/main#quantization
[3] - https://github.com/InternLM/lmdeploy/tree/main#performance
There is always an option to go down the list of available quantizations notch by notch until you find the largest model that works. llama.cpp offers a lot of flexibility in that regard.
just not the super obvious one that demonstrates extremely basic understanding of what a prime number is
yes, we expect professional software developers to have basic maths skills
"what is a prime number" is taught to 7 year olds, it's not vector calculus
what else would you consider to be an unreasonable thing for an employer to require?
reading and writing skills of a typical 7 year old?
+1 is not a good idea since ~half of all numbers are effectively non-prime simply by being even numbers.
You can double the speed by using +2 without using any fancy tricks, just changing a single character.
Common approach is to use square roots, this reduces the runtime. Recommend checking out project euler if you like solving hard math-code-o(n)-puzzles.
Unnatural language used davinci-002 although that was a while ago, they only say "similarly" in this paper and don't specify what they used. I can't see a reason why they wouldn't be releasing it if the unnatural prompts were generated by LLaMA2-family.
In any case, replicating this training seems trivial and very cheap compute-wise for anyone who wanted to do it.
Yes it makes sense (in the GPT code) that you'd only go up to i * i ... although looking at pythonic while: statements is just gross to me in this context, it would feel a lot more readable to say, e.g. in PHP:
for ($i=2;$i<sqrt($n);) { $i+=($i==2 ? 1 : 2); //although the first one should just be outside the loop }
This should be fixed now! To update you'll have to run:
ollama pull codellama:7b-instructHave had some good success with the instruct model:
codellama:7b-instruct
using <language> write me a <thing>
it's managed to spit out code, rather than "write a traversal function".
Any plans to add the 13B quant models?
ollama pull codellama:7b-instructI was up and running from clone/build-from-scratch/download in ~5m.
It's running on my M1.. it knows WebGL JS APIs better than I do, makes a passable attempt at VT100 ascii art, and well, should read more about Wolfram Automata, but does seem to know Game of Life!
Thank you so much!
No one who has been using any model for just the past 30 minutes would say that it has "pretty much replaced Google/SO" for them, unless they were being facetious.
The instruct version of code llama could certainly be run locally without trouble, and that’s interesting too, but I keep wanting to test out a local CoPilot alternative that uses these nice, new completion models.
Try actually measuring basic knowledge with competency at programming before thinking your opinion is better than measured data. Peer reviewed research finds similar results [1].
And yes, we tested all this carefully before enacting it. Interviews cost time and money, so giving 100% on every candidate despite quick signals is a waste of time and money that would be better spent on other candidates. If you want the best outcome then you allocate scarce resources based on expected returns, not on unfounded beliefs.
[1] https://helloworld.raspberrypi.org/articles/hw12-language-sk...
But it's definitely not just programmers. And it will take time.
Society needs to adjust. Stopping progress would not be a solution and is not possible.
However, hopefully we can pause before we create digital animals with hyperspeed reasoning and typical animal instincts like self-preservation. Researchers like LeCun are already moving on from things like LLMs and working on approaches that really imitate animal cognition (like humans) and will eventually blow all existing techniques out of the water.
The path that we are on seems to make humans obsolete within three generations or so.
So the long term concern is not jobs, but for humans to lose control of the planet in less than a century.
On the way there we might be able to manage a new golden age -- a crescendo for human civilization.
Humans don’t become obsolete, we become bored. This tech will make us bored. When humans get too bored and need shit to stir up, we’ll start a war. Take US and China, global prosperity is not enough right? We need to stoke the flames of war over Taiwan.
In the next 300 years we’ll wipe out most of each other in some ridiculous war, and then rebuild.
"Global prosperity" might be true in a very long-term historical sense, but it's misleading to apply it to the immediate situation.
Taiwan is not just a talking point. Control over Taiwan is critical for maintaining hegemony. When that is no longer assured, there will likely be a bloody battle before China is given the free reign that it desires.
WWIII is likely to fully break out within the next 3-30 years. We don't really have the facilities to imagine what 300 years from now will look like, but it will likely be posthuman.
We’re going to move from debugging some crap the last developer wrote to debugging an order of magnitude more code the last developer generated.
It’s going to be wonderful for job prospects really.
Iran supplies Russia with drones. I can promise you Russia will help Iran enrich their uranium. They are both pariah states, what do they have to lose? Nuclear Iran, here enters Israel.
Everyone’s arming up, there’s a gun fight coming.
-1, 0, and 1 are no good.
Two divisors for positive integer X: 1, X
I recently demonstrated GPT-4 via by having it explain phases of matter metaphorically with animals. Though impressive, it points to loose abstraction being more in range than firm abstraction. Let’s get that model a python interpreter and see how far we can take this party. Until then, I’m going to stick to explaining concepts.
————- included here for fun
Alright! Let's dive into the wild world of matter using some animal friends as examples:
1. *Solids* - Think of solids like a herd of elephants standing closely together. The elephants are packed in tight, barely moving, just maybe swaying a bit. They're sturdy and strong, just like solid things in our world. This is because the particles in a solid don't move around much; they just vibrate in place.
2. *Liquids* - Imagine a school of fish swimming in a pond. They're free to move around, weaving in and out, but they still stay pretty close to each other. They aren't packed as tightly as our elephant friends, but they aren't completely free either. This is like liquids: the particles are close, but they can move around and flow, just like water in a glass.
3. *Gases* - Now, picture a flock of birds soaring in the sky, free to fly in all directions. These birds aren't sticking close to one another; they're spread out, enjoying the vast space of the sky. In gases, the particles are like these birds, very spread out and moving all over the place.
4. *Plasma* - Think of plasma like dragons (I know they're mythical, but bear with me). These dragons breathe fire, and that fire is so hot and energetic that it can change the way things behave. Plasma is like that – it's gas that's become so hot that the particles are super energized and can even glow, like neon lights.
5. *Bose-Einstein Condensate (BEC)* - This one's a bit trickier, but imagine penguins in Antarctica. They huddle closely together to keep warm. BEC is like the coldest group of particles ever, where they start acting in strange, uniform ways, almost like one giant super-particle. It's like all the penguins moving together as one.
So, next time you think of matter, just remember our animal friends: the sturdy elephants, the flowing fish, the free-flying birds, the fiery dragons, and the huddled penguins!
Again, 1 fits, because it has two divisors: 1, and 1. You never said X != 1, nor does the definition upthread.
This isn't a silly gotcha, these things matter in math. For instance, when solving a quadratic equation, allowing for the solutions to be equal lets you avoid special-casing your understanding - instead of memorizing when the equation has zero, one or two solutions, you just learn it has two or zero (real) solutions, and the two solutions are allowed to be equal.
It's perfectly reasonable understanding. Inequality isn't a natural implicit assumption. E.g. if I say you have two variables:
int a;
int b;
I doubt you'd be insisting that `a != b` at all times.An element to set relationship is that element A is or is not in set B. So, if the set of divisors only contains 1, there is only one divisor. If 1 and 1 made two divisors, 1 and 1 and 1 and… would make infinite divisors, rendering the concept of counting divisors (i.e., the cardinal number of the set of divisors) meaningless.
M is a divisor of N if it is a number that divides N without a remainder. While divisors can be negative, they are conventionally limited to non-negative integers in primality and factoring.
If you’d wanted to dig in on negative vs. positive divisors, that quickly provides an avenue for clearer formality, but piling on 1 and saying it’s not a silly gotcha is pretty fruitless. And please don’t bother to say “you didn’t say it has only two divisors”, as that would, again, be a silly argument.
So wind back and really formalize the definition if you want: A prime number is a natural number with only two divisors in the set of natural numbers, 1 and itself.
While set theory is axiomatic, it’s not practical for me (or anyone else) to explain conventional foundations to avoid someone feeling like they need to wiggle out of a prior bad argument.
Just say “ah, okay” or stop replying and move on. Feel free to read up in Wikipedia or any other texts (or ping me privately if you’d like to discuss further), but this thread isn’t looking like it’s going to meaningfully contribute to the broader discussion. Accordingly, I’ll leave it here unless something meaningful comes up.
Second, basic math still that you never or rarely use or with very large time between usage might get rusty. You may understand the concept but not find the optimal solution. The way you are responding here shows quite a lot about how you are short sighted by instant-failing someone with a single question instead of trying to asses the whole person as much as you can. On you side, you are wasting opportunity to have a great person that could be a key player in your team by bringing other set of skill on the table.
it's part of the curriculum for children of this age where I grew up (I did check)
> The way you are responding here shows quite a lot about how you are short sighted by instant-failing someone with a single question instead of trying to asses the whole person as much as you can. On you side, you are wasting opportunity to have a great person that could be a key player in your team by bringing other set of skill on the table.
it may also be the case that I have more in depth knowledge about the roles that I've interviewed candidates for
most recently: hiring people to work for quants
not instantly knowing that even numbers (other than 2) are not prime is a very strong signal
A few do. And in 20 years you're reallyreally going to want to hire them.
Look, if you think this sort of thing allows you to identify great candidates, good for you. But in my experience, not only is this kind of practice stupid on its face, but it leads to engineering orgs packed with people who are good at memorizing trivia but terrible at solving real problems.
> yes, we expect professional software developers to have basic maths skills
Skill != knowledge. "What is a prime number" can be looked up and understood by any competent programmer in <5 minutes.
> "what is a prime number" is taught to 7 year olds, it's not vector calculus
Then it's reasonable to expect that an interviewee would be able to learn it as well, given the same resources. It does not however follow that an interviewee would inherently have that knowledge, just because 7 year olds are taught it.
Bottom line is, you're making too many assumptions about complete strangers.