Code Llama, a state-of-the-art large language model for coding (ai.meta.com)

970 points by marcopicentini 2 years ago | 501 comments

Works nearly out of the box with llama.cpp, which makes it easy to try locally: https://github.com/ggerganov/llama.cpp/issues/2766

Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):

    # prints the first ten prime numbers 
    def print_primes(): 
        i = 2 
        num_printed = 0 # end of prompt
        while num_printed < 10:
            if is_prime(i):
                print(i)
                num_printed += 1
            i += 1

    def is_prime(n):
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    def main():
        print_primes()

    if __name__ == '__main__':
        main()

It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.

d0mine 2 years ago | |

Simple, concise, more efficient:

  def primes_upto(limit: int):
      """Generate prime numbers < *limit*."""
      # Sieve of Eratosthenes
      is_prime = [True] * limit
      for n in range(2, limit):
          if is_prime[n]:
              yield n  # found prime number
              for c in range(n*n, limit, n):  # start at the square; smaller multiples are already marked
                  is_prime[c] = False  # mark composites

  if __name__ == "__main__":
      from itertools import islice

      print(*islice(primes_upto(100), 10))  # -> 2 3 5 7 11 13 17 19 23 29

someplaceguy 2 years ago | | |

Yeah, but yours was generated by the "post unoptimized code to HN and wait for someone to optimize it" model, which, although it's free and doesn't require a GPU, is a much slower model.

kilolima 2 years ago | |

For printing the first 10 prime numbers, there's a one line solution to this problem:

print("1, 2, 3, 5, 7, 11... and so on!

kordlessagain 2 years ago | | |

That code shows me a parse error.

tuukkah 2 years ago | | |

That can't be, because primes are numbers greater than 1.

quickthrower2 2 years ago | |

Funny watching HN be nerd sniped by a machine :-)

orm 2 years ago | |

How did you get access to the model?

I have been waiting for weeks, and am still waiting, to get access to Llama2 (released a month+ ago), and access to this model goes through the same form, so I'm not very hopeful. Are you getting it from other methods?

madduci 2 years ago | | |

They have updated their readme in the GitHub repository

rednab 2 years ago | |

Interesting here and in some of the other comments in this thread is that 1 is not a prime number ¹)!

Now granted, that's more or less by definition and I don't doubt there's communities and fields where it is considered one, but still shows some of the subtleties at play when using language models.

¹) https://www.google.com/search?q=is+1+a+prime+number

Renaud 2 years ago | | |

Whether 1 should be a prime number or not wasn't clear-cut for centuries.

Current consensus has settled on excluding 1 from the definition, but there are examples of publications well into the 20th century that still included 1 as prime.

Fascinating read about the subject: https://arxiv.org/pdf/1209.2007.pdf

tuukkah 2 years ago | | |

Also interesting that nobody called out the AI's bullshit regarding is_prime:

  -1 True
  0 True
  1 True
  2 True
  3 True
  4 False
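
The output above follows from the loop condition `i * i <= n`, which is never true for n < 4, so the function falls straight through to `return True`. A minimal sketch of a guarded version, assuming the usual definition that primes are integers greater than 1 (the name `is_prime_fixed` is only illustrative, not something the model produced):

```python
def is_prime_fixed(n: int) -> bool:
    """Trial division, with an explicit guard for n < 2."""
    if n < 2:
        # -1, 0 and 1 are not prime; the generated code had no such check
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Same inputs as the table above
for n in range(-1, 5):
    print(n, is_prime_fixed(n))
```

The generated `print_primes` only ever calls the helper with i >= 2, so its output was still correct; the bug only shows when `is_prime` is reused on its own.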

FrozenSynapse 2 years ago | |

how much VRAM do you need to run quantised 7b model?

RossBencina 2 years ago | | |

Rough calculation: typical quantization is 4 bit, so 7b weights fit in 3.6GB, then my rule of thumb would be 2GB for the activations and attention cache (not usually quantized). So 6 or 8 GB VRAM would probably do it. llama.cpp will let you offload your choice of layers to GPU, so you could probably get quite a way with 4GB.
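
The arithmetic behind that estimate can be sketched as below. This is a back-of-the-envelope helper, not a measurement: the function names are made up for illustration, the 2 GB overhead is just the rule of thumb from the comment, and real quantization formats store per-block scales alongside the weights, so the effective bits per weight end up somewhat above the nominal 4.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def rough_vram_gb(n_params: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for activations and the attention cache.

    overhead_gb=2.0 is the rule of thumb quoted above, an assumption rather than a measured value.
    """
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb


print(f"7B @ 4-bit, weights only: {weight_memory_gb(7e9, 4):.1f} GB")
print(f"7B @ 4-bit, with overhead: {rough_vram_gb(7e9, 4):.1f} GB")
```

With exactly 7e9 parameters at exactly 4 bits this gives 3.5 GB for the weights, in the same ballpark as the 3.6 GB quoted.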

blibble 2 years ago | |

I'd fail an interview candidate that suggested adding 1 each time for subsequent prime testing

csmpltn 2 years ago | | |

> "I'd fail an interview candidate that suggested adding 1 each time for subsequent prime testing"

Congratulations! You must be that arrogant guy everybody hates interviewing with, the one with the superiority complex.

How about instead of just failing people over literally nothing (wasting everybody's time and money) - just ask the candidate whether they could somehow reduce the search space by utilizing the properties of a prime number?

tasubotadas 2 years ago | | |

Finally we meet the lifeless drone that everybody complains about in the interviews.

My suggestion for your next interview: decide to hire them just based on their leetcode score, but invite to the interview just to flex that you're still better at puzzle solving :-D

Perfect

jckahn 2 years ago | | |

So I take it you typically produce fully optimized, thoughtful, and correct code on the first iteration while being actively judged by a stranger, yes?

maleldil 2 years ago | | |

I assume you meant that you should add 2? If yes, that's such a mind boggling basic thing to do that I agree with you, and it makes no sense that you're being crucified.
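
For anyone following along, a sketch of what "add 2" could look like, assuming the goal is the same "first N primes" task as in the top comment. The function names here are illustrative and not taken from the generated code:

```python
from math import isqrt


def is_prime_odd_steps(n: int) -> bool:
    """Trial division that handles 2 separately and then tries odd divisors only."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes, stepping candidates by 2 after the prime 2."""
    if count <= 0:
        return []
    primes = [2]
    candidate = 3
    while len(primes) < count:
        if is_prime_odd_steps(candidate):
            primes.append(candidate)
        candidate += 2  # even numbers above 2 are never prime
    return primes


print(first_primes(10))
```

This roughly halves the number of candidates and divisors tried; for ten primes the difference is not measurable, which is more or less the point the replies are making.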

throwuxiytayq 2 years ago | | |

i’d walk out from an interview that asked me to write a prime number generator

hititncommitit 2 years ago | | |

> I'd fail an interview candidate that suggested adding 1 each time for subsequent prime testing

The problem with a team that always seeks the optimal solution is that they never get shit done. And that’s rarely optimal in a business context. Your view does not strike me as nearly as arrogant as it is short-sighted.

I think on a team of one I want the guy who gets it done without thinking. On a team of two I want a guy that’s somewhere in the middle. And on a team of three, that’s when I want my optimiser. Because in the time that guy number 3 has written, let’s say, 3 files of optimal code, guy number 1 has written 10 files of non-optimal code. And you rarely need guy number 3 to fix all ten, you just need him to fix the one or two files that actually matter.

Clients rarely ask “is this optimal?”. But they always ask “is this done?”.

All three developers have different approaches. All three are assets.

I think on some level you’re making the same mistake that we could say the “just add one” guy made, if your comment is honest: not factoring in (his) speed.

I think code readability, rather than code optimisation, is a far more important thing to get hung up on in an interview (and is, I must remind some of you, not to be confused with conciseness). And you can see this in the end result. But if you’re following along and the interviewee already knows you know what’s going on because of that, you can see it in smaller ways: it could be as simple as going back and changing a variable name such as open (maybe a function?) to isOpen (almost always a Boolean value).

I think most of us are in this position pretty often where we’re writing and figuring out something at the same time, maybe we just see if it works first and don’t give a name much thought, or maybe we change the actual value or type of a variable in the process and the variable name becomes ambiguous. I’d look for small indicators that show me that this person is considering readability. But I still don’t necessarily expect it to be as readable as it would be in a more natural setting. I mean, I think 90% of these sorts of things start off with the person saying “I swear I can type” within the first five minutes of being watched. If they get so flustered while being watched that it affects their typing, it certainly affects their coding as well.

squeaky-clean 2 years ago | | |

You'd reject a candidate that is willing and legally able to work for free while also cloning themselves so they can pair program with every one of your employees at once?

dontupvoteme 2 years ago | | |

Simply prompting the output with "Optimize " prepended adds your suggestion, and some others.

droopyEyelids 2 years ago | | |

The simple-to-understand, greedy algorithm is always the correct first choice till you have to deal with a constraint.

beanjuiceII 2 years ago | | |

Also don't these only ever need to be computed once

redox99 2 years ago |

The highlight IMO

> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.

Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.

nabakin 2 years ago | |

Looks like they aren't releasing a pretty interesting model too. In the paper they mention a "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1 which is insane.

Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.

EvgeniyZh 2 years ago | | |

Note that current GPT-4 pass@1 for HumanEval is closer to 90% than to 67% reported in GPT-4 technical report, as reported, e.g., in [1]

[1] https://arxiv.org/abs/2305.01210

jonchurch_ 2 years ago | | |

The paper states it was instruction fine tuned with synthetic data (LLM generated instructions) à la another paper (“Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor”).

The github repo associated with that paper is linked below. It links to the paper on arxiv, but also has some data in the repo.

https://github.com/orhonovich/unnatural-instructions

up6w6 2 years ago |

Even the 7B model of code llama seems to be competitive with Codex, the model behind copilot

https://ai.meta.com/blog/code-llama-large-language-model-cod...

SparkyMcUnicorn 2 years ago | |

I'm not sure copilot is using codex anymore[0]. They've also been talking about a shift towards GPT-4 with "Copilot X" a few times now[1][2].

[0] https://github.blog/2023-07-28-smarter-more-efficient-coding...

[1] https://github.com/features/preview/copilot-x

[2] https://github.blog/2023-07-20-github-copilot-chat-beta-now-...

zarzavat 2 years ago | | |

Copilot X is just their name for their project to bring AI to more areas of VSCode. I don’t believe they can use GPT-4 for completions because it’s a chat-optimized model. It seems that they are using something else, that blog post seems to imply it’s a custom-trained model.

up6w6 2 years ago | | |

True. The results from codex are actually from code-cushman-001 (Chen et al. 2021), which is an older model that Copilot was based on.

ramesh31 2 years ago | |

>Even the 7B model of code llama seems to be competitive with Codex, the model behind copilot

It's extremely good. I keep a terminal tab open with 7b running for all of my "how do I do this random thing" questions while coding. It's pretty much replaced Google/SO for me.

coder543 2 years ago | | |

You've already downloaded and thoroughly tested the 7B parameter model of "code llama"? I'm skeptical.

ohyes 2 years ago | | |

What hardware do you have that lets you run 7b and do other stuff at the same time?

solarkraft 2 years ago | | |

Huh? Do you perhaps mean standard Llama?

reacharavindh 2 years ago |

Code llama Python is very interesting. Specifically tuned for Python.

I wonder if we could make such specific LLMs (one that is proficient in all things Rust, another- all things Linux, all things genomics, all things physics modeling etc) and have them talk to each other to collaboratively solve problems.

That would be a crazy future thing! Putting machines truly to work..

esperent 2 years ago | |

I think this is called "mixture of experts" and also there's a lot of speculation that it's how GPT-4 works, although probably with just a few large models rather than many small ones.

jmiskovic 2 years ago | | |

It's been confirmed by multiple (unofficial) sources that GPT-4 is 8 models, each 220B parameters. Another rumor is GPT-4 being 16x111B models.

There's a quite fresh and active project replicating something similar with a herd of llamas: https://github.com/jondurbin/airoboros

brucethemoose2 2 years ago | |

If you can find a large body of good, permissively licensed example code, you can finetune an LLM on it!

There was a similar attempt for Godot script trained a few months ago, and it's reportedly pretty good:

https://github.com/minosvasilias/godot-dodo

I think more attempts haven't been made because base llama is not that great at coding in general, relative to its other strengths, and stuff like Starcoder has flown under the radar.

bbor 2 years ago | |

Mark my words: you’ve caught a glimpse of the near future :). Google “Society of Mind” if you’re not yet familiar

seydor 2 years ago | |

Start with a CodeLlama for C, and start treating these systems as natural language compilers. C is low level enough and still readable for those rare moments

Palmik 2 years ago |

The best model, Unnatural Code Llama, is not released. Likely because it's trained on GPT4 based data, and might violate OpenAI TOS, because as per the "Unnatural" paper [1], the "unnatural" data is generated with the help of some LLM -- and you would want to use as good of an LLM as possible.

[1] https://arxiv.org/pdf/2212.09689.pdf

redox99 2 years ago | |

The good thing is that if it's only finetuned on 15k instructions, we should see a community made model like that very soon.

syntaxing 2 years ago |

TheBloke doesn’t joke around [1]. I’m guessing we’ll have the quantized ones by the end of the day. I’m super excited to use the 34B Python 4 bit quantized one that should just fit on a 3090.

[1] https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16

jmorgan 2 years ago |

To run Code Llama locally, the 7B parameter quantized version can be downloaded and run with the open-source tool Ollama: https://github.com/jmorganca/ollama

   ollama run codellama "write a python function to add two numbers"

More models coming soon (completion, python and more parameter counts)

benvolio 2 years ago |

>The Code Llama models provide stable generations with up to 100,000 tokens of context.

Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.

And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?

lordnacho 2 years ago |

Copilot has been working great for me thus far, but it's limited by its interface. It seems like it only knows how to make predictions for the next bit of text.

Is anyone working on a code AI that can suggest refactorings?

"You should pull these lines into a function, it's repetitive"

"You should change this structure so it is easier to use"

Etc

Draiken 2 years ago |

As a complete noob at actually running these models, what kind of hardware are we talking here? Couldn't pick that up from the README.

I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.

dangelov 2 years ago | |

I've used Ollama to run Llama 2 (all variants) on my 2020 Intel MacBook Pro - it's incredibly easy. You just install the app and run a couple of shell commands. I'm guessing soon-ish this model will be available too and then you'd be able to use it with the Continue VS Code extension.

Edited to add: Though somewhat slow, swap seems to have been a good enough replacement for not having the loads of RAM required. Ollama says "32 GB to run the 13B models", but I'm running the llama2:13b model on a 16 GB MBP.

j45 2 years ago | | |

Apple Silicon, especially an M1 Max Studio, seems to be an interesting machine to hang on to as the models become more and more efficient, using less and less.

If there are any other opinions or thoughts on this, I'd be very happy to learn as well. I have considered the eGPU route connected to a 1L PC such as a thinkcentre m80/90.

liuliu 2 years ago | |

34B should be able to run on 24GiB consumer graphics card, or 32GiB Mac (M1 / M2 chips) with quantization (5~6bit) (and 7B should be able to run on your smart toaster).

epolanski 2 years ago | | |

Are there cloud offerings to run those models on somebody else's computer?

Any "eli5" tutorial on how to do so, if so?

I want to give these models a run but I have no powerful GPU to run them on so don't know where to start.

redox99 2 years ago | |

If you want to run them fast, a 12GB GPU (e.g 3060) for the 13B and a 24GB GPU for the 34B (e.g 3090). Otherwise llama.cpp CPU inference would work on most machines.

scriptsmith 2 years ago |

How are people using these local code models? I would much prefer using these in-context in an editor, but most of them seem to be deployed just in an instruction context. There's a lot of value to not having to context switch, or have a conversation.

I see the GitHub copilot extension gets a new release every few days, so is it just that the way they're integrated is more complicated so not worth the effort?

mymac 2 years ago |

Never before in the history of mankind was a group so absolutely besotted with the idea of putting themselves out of a job.

modeless 2 years ago |

Interesting that there's a 34B model. That was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks or if the code fine tuning destroyed that. It should be the best model that would still fit on 24GB gaming GPUs with quantization, because 70B doesn't fit.

brucethemoose2 2 years ago | |

Someone "grafted" llama 33B onto llama v2 13B to make "llama 22B"

https://huggingface.co/chargoddard/llama2-22b

Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.

Metrics are slightly below 13B, but the theory is that the higher parameter count is more amenable to finetuning. If you search for 22B on huggingface, you can see that frankenllama experiments are ongoing:

https://huggingface.co/models?sort=modified&search=22b

nabakin 2 years ago | |

Looks like they left out another model though. In the paper they mention a "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1 which is insane.

Meta says later on that they aren't releasing it and give no explanation. I wonder why given how incredible it seems to be.

ImprobableTruth 2 years ago | | |

It's "unnatural" because it was finetuned on generated data using another model, almost certainly gpt-4 (whose TOS forbid this).

redox99 2 years ago | |

I can't imagine it being better than Llama1 33B, after all this code finetuning.

modeless 2 years ago | | |

But the license for llama 2 is a whole lot better.

ilaksh 2 years ago |

Between this, ideogram.ai (image generator which can spell, from former Google Imagen team member and others), and ChatGPT fine-tuning, this has been a truly epic week.

I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.

astrange 2 years ago | |

SDXL and DeepFloyd can spell. It's more or less just a matter of having a good enough text encoder.

I tried Ideogram yesterday and it felt too much like existing generators (base SD and Midjourney). DALLE2 actually has some interestingly different outputs, the problem is they never update it or fix the bad image quality.

ShamelessC 2 years ago | |

Did ideogram release a checkpoint?

ilaksh 2 years ago | | |

I can't find any info or Discord or forum or anything. I think it's a closed service that they plan to sell to make money.

WhitneyLand 2 years ago |

How much am I missing out on with tools like this or Copilot, compared to using GPT-4?

I guess since Xcode doesn’t have a good plug-in architecture for this I began experimenting more with a chat interface.

So far gpt-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.

citruscomputing 2 years ago | |

Editor plugins are fantastic about completing based on a pattern. That's the main thing you're missing out on imo - it's worth it to hit tab, but not to copy/paste and say "finish this line for me, it looks almost like the one above."

There's also the real-time aspect where you can see that it's wrong via the virtual text, type a few characters, then it gets what you're doing and you can tab complete the rest of the line.

It's faster to converse with when you don't have to actually have a conversation, if that makes sense? The feedback loop is much shorter and doesn't require natural language, or nearly as much context switching.

1024core 2 years ago |

If GPT-4's accuracy is 67% and this is 54%, how can these guys claim to be SOTA?

rgbrgb 2 years ago | |

This runs locally on a MacBook.

binreaper 2 years ago | |

Seriously, I was expecting to read the article and find them on par with GPT-4 or higher. For all this talk of how Google/Facebook have been in the AI space longer than OpenAI, their products don't speak to that.

gorbypark 2 years ago |

I can't wait for some models fine tuned on other languages. I'm not a Python developer, so I downloaded the 13B-instruct variant (4 bit quantized Q4_K_M) and it's pretty bad at doing javascript. I asked it to write me a basic React Native component that has a name prop and displays that name. Once it returned a regular React component, and when I asked it to make sure it uses React Native components, it said sure and outputted a bunch of random CSS and an HTML file that was initializing a React project.

It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.

TheRealClay 2 years ago |

Anyone know of a docker image that provides an HTTP API interface to Llama? I'm looking for a super simple sort of 'drop-in' solution like that which I can add to my web stack, to enable LLM in my web app.

nodja 2 years ago | |

https://github.com/abetlen/llama-cpp-python has a web server mode that replicates openai's API iirc and the readme shows it has docker builds already.
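
A rough sketch of what calling that server from a web backend might look like. Assumptions, since none of this is spelled out above: the server was started with something like `python3 -m llama_cpp.server --model <path-to-model>`, it listens on `localhost:8000`, and it exposes an OpenAI-style `/v1/completions` route. Check the project's README for the exact options in your version. The helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8000; adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /v1/completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.2}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 128) -> str:
    """Send the request and return the text of the first completion choice."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Usage, once a server is actually running:
# print(complete("# Python function that adds two numbers\ndef add("))
```

Only the standard library is used here so it drops into any stack; an OpenAI client library pointed at the same base URL should work as well if the API really is compatible.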

TheRealClay 2 years ago | | |

Thanks! As someone just getting started, I really appreciate the tip!

KaiserPro 2 years ago |

This is great for asking questions like "how do I do x with y" and "this code <<some code>> isn't working, what's wrong?" Much faster than googling, or a great basis for forming a more accurate Google search.

Where it's a bit shit is when it's used to provide auto suggest. It hallucinates plausible sounding functions/names, which for me personally are hard to spot if they are wrong (I suspect that's a function of the plugin).

SubiculumCode 2 years ago | |

Hallucinations can be reduced by incorporating retrieval-augmented generation (RAG) on the front end. Likely, function library defs could be automagically entered as prompt/memory inputs.
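
A toy sketch of that idea, assuming the "library" is an importable Python module and that retrieval is plain name matching. A real setup would more likely use embeddings or an index, and every function name below is invented for the example:

```python
import inspect
import json


def collect_signatures(module) -> dict[str, str]:
    """Map each public function in `module` to its signature plus first docstring line."""
    entries = {}
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue
        doc = (inspect.getdoc(fn) or "").splitlines()
        summary = doc[0] if doc else ""
        entries[name] = f"{module.__name__}.{name}{inspect.signature(fn)}  # {summary}"
    return entries


def retrieve(question: str, entries: dict[str, str], limit: int = 3) -> list[str]:
    """Naive retrieval: keep entries whose function name appears as a word in the question."""
    words = set(question.lower().replace("(", " ").replace(".", " ").split())
    hits = [text for name, text in entries.items() if name.lower() in words]
    return hits[:limit]


def build_prompt(question: str, module) -> str:
    """Prepend the retrieved definitions so the model sees real names instead of guessing."""
    context = "\n".join(retrieve(question, collect_signatures(module)))
    return f"Known functions:\n{context}\n\nTask: {question}\n"


print(build_prompt("How do I use dumps to serialize a dict?", json))
```

The prompt then carries the actual signature of `json.dumps`, which gives the model less room to invent parameters.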

natch 2 years ago |

Why wouldn’t they provide a hosted version? Seems like a no brainer… they have the money, the hardware, the bandwidth, the people to build support for it, and they could design the experience and gather more learning data about usage in the initial stages, while putting a dent in ChatGPT commercial prospects, and all while still letting others host and use it elsewhere. I don’t get it. Maybe it was just the fastest option?

redox99 2 years ago | |

Probably the researchers at meta are only interested in research, and productionizing this would be up to other teams.

natch 2 years ago | | |

But Yann LeCun seems to think the safety problems of eventual AGI will be solved somehow.

Nobody is saying this model is AGI obviously.

But this would be an entry point into researching one small sliver of the alignment problem. If you follow my thinking, it’s odd that he professes confidence that AI safety is a non issue, yet from this he seems to want no part in understanding it.

I realize their research interest may just be the optimization / mathy research… that’s their prerogative but it’s odd imho.

jasfi 2 years ago |

Now we need code quality benchmarks comparing this against GPT-4 and other contenders.

nick0garvey 2 years ago | |

They show the benchmarks in the original post, a few pages down

jasfi 2 years ago | | |

Thanks, I missed that somehow.

ilaksh 2 years ago |

https://github.com/facebookresearch/codellama

ceejayoz 2 years ago | |

This is 404ing now. (Not your fault, the email's link is similarly broken.)

ilaksh 2 years ago | | |

Really? Works for me.

andrewjl 2 years ago |

What I found interesting in Meta's paper is the mention of HumanEval[1] and MBPP[2] as benchmarks for code quality. (Admittedly maybe they're well-known to those working in the field.)

I haven't yet read the whole paper (nor have I looked at the benchmark docs which might very well cover this) but curious how these are designed to avoid issues with overfitting. My thinking here is that canned algorithm type problems common in software engineering interviews are probably over represented in the training data used for these models. Which might point to artificially better performance by LLMs versus their performance on more domain-specific type tasks they might be used for in day-to-day work.

[1] https://github.com/openai/human-eval

[2] https://github.com/google-research/google-research/tree/mast...

msoad 2 years ago |

Is there any place we can try those models? Are they available on HuggingFace?

jspisak 2 years ago | |

Partner integrations will follow. For now we just have the weights available.

But don't worry, this community moves fast!

Eddygandr 2 years ago | | |

Probably superseded (by y’all) within a week!

dangerwill 2 years ago |

It's really sad how everyone here is fawning over tech that will destroy your own livelihoods. "AI won't take your job, those who use AI will" is purely short term, myopic thinking. These tools are not aimed to help workers, the end goal is to make it so you don't need to be an engineer to build software, just let the project manager or director describe the system they want and boom there it is.

You can scream that this is progress all you want, and I'll grant you that these tools will greatly speed up the generation of code. But more code won't make any of these businesses provide better services to people, lower their prices, or pay workers more. They are just a means to keep money from flowing out of the hands of the C-Suite and investor classes.

If software engineering becomes a solved problem then fine, we probably shouldn't continue to get paid huge salaries to write it anymore, but please stop acting like this is a better future for any of us normal folks.

MuffinFlavored 2 years ago |

Can I feed this entire GitHub projects (of reasonable size) and get non-hallucinated up-to-date API refactoring recommendations?

e12e 2 years ago |

Curious if there are projects to enable working with these things self-hosted, tuned to a git repo as context on the cli, like a Unix filter - or with editors like vim? (I'd love to use this with Helix)

I see both vscode and netbeans have a concept of "inference URL" - are there any efforts like language server (lsp) - but for inference?

ilaksh 2 years ago | |

https://github.com/runvnc/smartcat

ingridpan 2 years ago | |

not quite self-hosted but gradient.ai gives you access to llama2 via CLI

pmarreck 2 years ago |

I want "safety" to be opt-in due to the inaccuracy it introduces. I don't want to pay that tax just because someone is afraid I can ask it how to make a bomb when I can just Google that and get pretty close to the same answer already, and I certainly don't care about being offended by its answers.

robertnishihara 2 years ago |

If you want to try out Code Llama, you can query it on Anyscale Endpoints (this is an LLM inference API we're working on here at Anyscale).

https://app.endpoints.anyscale.com/

brucethemoose2 2 years ago |

Here is the paper:

https://ai.meta.com/research/publications/code-llama-open-fo...

naillo 2 years ago |

Feels like we're like a year away from local LLMs that can debug code reliably (via being hooked into console error output as well) which will be quite the exciting day.

ilaksh 2 years ago | |

Have you tried Code Llama? How do you know it can't do it already?

In my applications, GPT-4 connected to a VM or SQL engine can and does debug code when given error messages. "Reliably" is very subjective. The main problem I have seen is that it can be stubborn about trying to use outdated APIs and it's not easy to give it a search result with the correct API. But with a good web search and up to date APIs, it can do it.
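
To make the "given error messages" part concrete, here is a minimal sketch of the plumbing, with the model call left out. It assumes the code under test is a standalone Python snippet; the helper names and the prompt wording are placeholders, not what any particular tool uses:

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter and capture stdout/stderr."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def build_debug_prompt(code: str, stderr: str) -> str:
    """Combine the failing code and its error output into a repair request."""
    return (
        "The following Python code fails.\n\n"
        f"Code:\n{code}\n\n"
        f"Error output:\n{stderr}\n"
        "Suggest a corrected version of the code."
    )


buggy = "values = [1, 2, 3]\nprint(values[3])\n"
result = run_snippet(buggy)
if result.returncode != 0:
    # In a real loop this prompt would go to the model, and its reply would be run again.
    print(build_debug_prompt(buggy, result.stderr))
```

Running model-suggested code unattended is its own risk, so a real loop would want a sandbox or VM around `run_snippet`.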

I'm interested to see general coding benchmarks for Code Llama versus GPT-4.

sumedh 2 years ago | | |

> But with a good web search and up to date APIs, it can do it.

How do you do that?

jebarker 2 years ago | | |

What does "GPT-4 connected to a VM or SQL engine" mean?

tomr75 2 years ago | | |

Have you tried giving up to date apis as context?

brucethemoose2 2 years ago | |

That sounds like an interesting finetuning dataset.

Imagine a database of "Here is the console error, here is the fix in the code"

Maybe one could scrape git issues with console output and tagged commits.

mhh__ 2 years ago | |

I'd be surprised if GPT-4 couldn't already do that with the caveat that piping in so much code might cost you billionaire money at scale.

braindead_in 2 years ago |

The 34b Python model is quite close to GPT4 on HumanEval pass@1. Small specialised models are catching up to GPT4 slowly. Why not train a 70b model though?

awwaiid 2 years ago |

I want to see (more) code models trained on git diffs
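
For illustration, one way a before/after pair could be turned into a diff-shaped training sample, using only the standard library. This is a guess at the data format, not how any released model was trained, and the file name and snippet contents are invented:

```python
import difflib


def make_diff_example(path: str, before: str, after: str) -> str:
    """Render a before/after pair as a unified diff, similar to `git diff` output."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = "def is_prime(n):\n    i = 2\n    return True\n"
after = "def is_prime(n):\n    if n < 2:\n        return False\n    i = 2\n    return True\n"
print(make_diff_example("primes.py", before, after))
```

In practice the pairs would come from actual commits, ideally with the commit message attached as the instruction.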

pelorat 2 years ago |

Too bad most models focus on Python, it's not a popular language here in Europe (for anything).

Havoc 2 years ago | |

What’s Europe using for machine learning?

bick_nyers 2 years ago |

Anyone know of a good plugin for the JetBrains IDE ecosystem (namely, PyCharm) that is CoPilot but with a local LLM?

kateklink 2 years ago | |

try refact.ai they have plugin for JetBrains IDEs and support for local LLMs https://github.com/smallcloudai/refact/

dchuk 2 years ago |

Given this can produce code when prompted, could it also be used to interpret html from a crawler and then be used to scrape arbitrary URLs and extract structured attributes? Basically like MarkupLM but with massively more token context?

stevofolife 2 years ago | |

Also curious about this. There must be a better way to scrape using LLM.

1024core 2 years ago |

> Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash

What?!? No Befunge[0], Brainfuck or Perl?!?

[0] https://en.wikipedia.org/wiki/Befunge

/just kidding, of course!

jtwaleson 2 years ago |

This is probably a stupid question, but would it be possible to use these models to rate existing code and point to possible problems, rather than generating new code? That would be extremely useful to some use cases I'm working on.

akulbe 2 years ago |

Random tangential question given this is about llama, but how do you get llama.cpp or kobold (or whatever tool you use) to make use of multiple GPUs if you don't have NVlink in place?

I got a bridge, but it was the wrong size.

Thanks, in advance.

dontupvoteme 2 years ago |

Did people *really* think only artists would be losing their jobs to AI?

gdcbe 2 years ago |

Are there docs somewhere that show how to run this on your local machine, and can you make it port a script between languages? GPT-4 can do that pretty well, but its context is too small for advanced purposes.

ai_g0rl 2 years ago |

this is cool, https://labs.perplexity.ai/ has been my favorite way to play w these models so far

RobKohr 2 years ago |

Now it just needs a vscode plugin to replace copilot.

rafaelero 2 years ago |

Those charts remind me just how insanely good GPT-4 is. It's almost 5 months since its release and I am still in awe of its capabilities. The way it helps with coding is just crazy.

mdaniel 2 years ago |

it looks like https://news.ycombinator.com/item?id=37248844 has gotten the traction at 295 points

dang 2 years ago | |

Maybe we'll merge that one hither to split the karma.

WaitWaitWha 2 years ago |

Can someone point me to an ELI5 sequence of steps that shows how someone can install and use LLMs locally and in some way, functionally?

Asking for purposes of educating non-technologists.

Patrick_Devine 2 years ago | |

There are several different ways, but the easiest way in my (clearly biased) opinion is to just go to ollama.ai, download it, and start playing around. It works out of the box w/ newer Macs, but there are versions for Linux and Windows in the works.

eurekin 2 years ago |

theBloke cannot rest :)

regularfry 2 years ago | |

As if by magic... https://huggingface.co/TheBloke/CodeLlama-13B-fp16. Empty so still uploading right now, at a guess.

ynniv 2 years ago | |

Every time a new model hits I'm waiting for his ggmls

brucethemoose2 2 years ago | | |

ggml quantization is very easy with the official llama.cpp repo. It's quick and mostly dependency free, and you can pick the perfect size for your CPU/GPU pool.

But don't get me wrong, TheBloke is a hero.

m00nsome 2 years ago |

Why do they not release the unnatural Variant of the model? According to the paper it beats all of the other variants and seems to be close to GPT-4.

KingOfCoders 2 years ago |

Any performance tests? (e.g. tokens/s on a 4090?)

born-jre 2 years ago |

34B is grouped query attention, right? Does that make it the smallest model with grouped attention?

I can see some people fine-tuning it again for general-purpose instruct.

bryanlyon 2 years ago |

Llama is a very cool language model, it being used for coding was all but inevitable. I especially love it being released open for everyone.

I do wonder about how much use it'll get, seeing as running a heavy language model on local hardware is kinda unlikely for most developers. Not everyone is running a system powerful enough to equip big AIs like this. I also doubt that companies are going to set up large AIs for their devs. It's just a weird positioning.

int_19h 2 years ago | |

12GB of VRAM lets you run 13B models (4-bit quantized) with reasonable speed, and can be had for under $300 if you go for previous-generation NVidia hardware. Plenty of developers around with M1 and M2 Macs, as well.
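A rough back-of-the-envelope check of that claim, as a minimal sketch: it counts only the memory for the weights and adds a flat allowance for everything else. The function names and the `overhead_gb` default are hypothetical, chosen for illustration; real quantization formats also store per-block scales, so the effective bits per weight are somewhat higher than the nominal 4.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(n_params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Crude fit check. overhead_gb is an assumed allowance for the
    KV cache, activations and runtime buffers, not a measured value."""
    needed = estimate_weight_memory_gb(n_params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    print(estimate_weight_memory_gb(13, 4))   # 13B parameters at 4 bits each
    print(fits_in_vram(13, 4, vram_gb=12))    # 4-bit 13B on a 12GB card
    print(fits_in_vram(13, 16, vram_gb=12))   # same model in fp16
```

Under these assumptions the 4-bit 13B weights come to about 6.5 GB, which leaves headroom on a 12GB card, while the fp16 weights alone would not fit.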

outside1234 2 years ago | |

... "seeing as running a heavy language model on local hardware is kinda unlikely for most developers"

for now it is :) but with quantization advances etc. it is not hard to see the trajectory.

ctoth 2 years ago | |

As we all know, computers stay the same and rarely improve.

bracketslash 2 years ago |

So uhh…how does one go about using it?

the-alchemist 2 years ago |

Anyone know if it supports Clojure?

maccam912 2 years ago |

It appears we do have a 34B version now, which never appeared for non-fine-tuned Llama 2.

jspisak 2 years ago | |

It would be good to understand whether a ~30B Llama-2 model would be interesting, and for what reasons.

Tostino 2 years ago | | |

Better reasoning and general performance than 13b by far (if llama1 was any indication), and like the other user said, can fit on a single 24gb vram gaming card, and can be peft fine-tuned with 2x 24gb cards.

brucethemoose2 2 years ago | | |

Llama 34B is just big enough to fit on a 24GB consumer (or affordable server) GPU.

It's also just the right size for llama.cpp inference on machines with 32GB RAM, or 16GB RAM with an 8GB+ GPU.

Basically it's the most desirable size for AI finetuning hobbyists, and the quality jump from llama v1 13B to llama v1 33B is huge.

hnuser123456 2 years ago | | |

It would fit on the 24GB top-end consumer graphics cards with quantization.

Havoc 2 years ago | |

Rumour has it that the 30b ran into safety issues and thus was not released

marcopicentini 2 years ago |

It's just a matter of time before Microsoft integrates it into VSCode.

binary132 2 years ago |

I wonder whether org-ai-mode could easily support this.

jerrygoyal 2 years ago |

What is its knowledge cutoff? Also, what is the cheapest way to use it if I'm building a commercial tool on top of it?

waitingkuo 2 years ago |

Looks like we need to request access first

taylorbuley 2 years ago | |

In the past, LLAMA access was granted nearly immediately. For HuggingFace downloads, it took a full day.

Havoc 2 years ago | |

The bloke on hugging face usually has quantised versions minus the legal form

mercurialsolo 2 years ago |

Is there a version of this on replicate yet?

Dowwie 2 years ago |

What did the fine tuning process consist of?

gw67 2 years ago |

In your opinion, why does Meta do this?

chaorace 2 years ago | |

To a certain extent, I think it's just IBM disease. A company the size of Meta is expected to have an AI research department like Microsoft or Google, even if their core business (social media) derives relatively less benefit from the technology.

Pretend you're an uncreative PM on an AI team; what part of Facebook or VR could you feasibly improve by iterating on LLMs? Perhaps the content moderation system... but that would require wrangling with the company ethics committee and someone else at the company probably already took ownership of that idea. You've gotta do something compelling or else your ML engineers are going to run off somewhere else.

If I were to ask my ML engineers about what they wanted to work on, they're going to avoid areas where their model is outgunned (i.e.: chat) and instead prefer lower hanging fruit which generalizes well on a resume (i.e.: "Pioneered and published key innovations in LLM code-generation").

Of course, the alternative answer is that Meta wants to replace all of their jr. developers with GPUs, but I think their leadership is a little too preoccupied with VR to be actively pushing for such a transformative initiative in anything more than a very uninvested capacity (e.g.: "Sure I'll greenlight this. Even if it doesn't pay off I don't have any better ideas")

praveenhm 2 years ago |

which is the best model for coding right now, GPT4/copilot/phind ?

nothrowaways 2 years ago |

Kudos to the team at FB.

likenesstheft 2 years ago |

no more work soon?

kypro 2 years ago | |

The ability to work less historically has always come as a byproduct of individuals earning more per hour through productivity increases.

The end goal of AI isn't to make your labour more productive, but to not need your labour at all.

As your labour becomes less useful if anything you'll find you need to work more. At some point you may be as useful to the labour market as someone with 60 IQ today. At this point most of the world will become entirely financially dependent on the wealth redistribution of the few who own the AI companies producing all the wealth – assuming they take pity on you or there's something governments can actually do to force them to pay 90%+ tax rates, of course.

likenesstheft 2 years ago | | |

What?

jrh3 2 years ago |

lol... Python for Dummies (TM)

Someone1234 2 years ago |

Business opportunity: I'd pay money for NICE desktop software that can run all these different models (non-subscription, "2-year updates included, then discount pricing" model perhaps). My wishlist:

- Easy plug & play model installation, and trivial to change which model once installed.

- Runs a local web server, so I can interact with it via any browser

- Ability to feed a model a document or multiple documents and be able to ask questions about them (or build a database of some kind?).

- Absolute privacy guarantees. Nothing goes off-machine from my prompt/responses (USP over existing cloud/online ones). Routine license/update checks are fine though.

I'm not trying to throw shade at the existing ways of running LLMs locally, just saying there may be room for an OPTIONAL commercial piece of software in this space. Most of them are designed for academics to do academic things. I am talking about a turn-key piece of software for everyone else that can give you an "almost" ChatGPT or "almost" CoPilot-like experience for a one time fee that you can feed sensitive private information to.

lolinder 2 years ago |

Does anyone have a good explanation for Meta's strategy with AI?

The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?

rvnx 2 years ago |

Amazing! It's great that Meta is making AI progress.

In the meantime, we are still waiting for Google to show what they have (according to their research papers, they are beating others).

> User: Write a loop in Python that displays the top 10 prime numbers.

> Bard: Sorry I am just an AI, I can't help you with coding.

> User: How to ask confirmation before deleting a file ?

> Bard: To ask confirmation before deleting a file, just add -f to the rm command.

(real cases)
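For reference, `rm -f` suppresses prompts; the interactive flag is `rm -i`. In Python, the same ask-before-deleting behaviour could look roughly like the sketch below. `delete_with_confirmation` and its `ask` parameter are hypothetical names used for illustration (`ask` defaults to the built-in `input` and exists only so the prompt can be swapped out).

```python
from pathlib import Path


def delete_with_confirmation(path, ask=input) -> bool:
    """Delete a regular file only if the user explicitly confirms.

    Returns True if the file was deleted, False otherwise.
    """
    target = Path(path)
    if not target.is_file():
        # Nothing to delete (missing path, or a directory).
        return False
    answer = ask(f"Delete {target}? [y/N] ")
    if answer.strip().lower() in ("y", "yes"):
        target.unlink()
        return True
    # Anything other than an explicit yes keeps the file.
    return False
```

Defaulting to "no" on any unrecognised answer mirrors the usual `[y/N]` convention, so an accidental Enter keeps the file.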

criley2 2 years ago | |

I don't get comments like this, we can all go and test Bard and see that what you're saying isn't true

https://g.co/bard/share/95761dd6d45e

rvnx 2 years ago | | |

Well look for yourself:

https://g.co/bard/share/e8d14854ccab

The rm answer is now "hardcoded" (aka, manually entered by reviewers), the same with the prime or Fibonacci.

This is why we both see the same code across different accounts (you can run the test yourself if you are curious).

6stringmerc 2 years ago |

So it’s stubborn, stinks, bites and spits?

No thanks, going back to Winamp.