Open source LLM with 32k Context Length (blog.abacus.ai)
It looks like the first OSS 13B Llama-2-based model with a 32k-token context[2], but the first OSS, commercially usable 32k-token-context model was a 7B Llama-2-based model[3] from Together AI, who beat them by about a week[4].
[1]: https://twitter.com/bindureddy/status/1694126931174977906
[2]: https://huggingface.co/abacusai/Giraffe-v2-13b-32k
[3]: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instr...
[4]: https://twitter.com/togethercompute/status/16925744231638470...
32k context length sounds nice, of course, and it seems to be common to describe models that were only fine-tuned at that length this way. I think it is more of a marketing thing, and we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of context length.
Edit: No mention of it being open source in the linked article. Maybe the title here is just wrong? @dang
This hasn’t been proven in court, but it seems the most likely outcome.
https://ai.meta.com/llama/license/ https://ai.meta.com/llama/use-policy/
There are other LLMs that don't have such restrictions, and publish their training data.
Also, "don't do naughty things": is there a chart for that? How is that defined? Is it part of the nonexistent license?
i.e., all other things being equal, is an 8k model better at math than a 32k model?
OP is about a 32k sugar-coated Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Sorry for the random question, I've just been curious about this for a while and unable to find out and you seem knowledgeable about these extended models.
It’ll take a while for this mistake to be resolved in court, though.
What do you mean? https://github.com/facebookresearch/llama/blob/main/LICENSE
Edit: this is in fact a fairly interesting discussion, because LLMs are a new breed of digital product. Meta's terms are practical for limiting use in commercial applications, and they are designed to protect the general population. It's not the worn-out "protecting us from ourselves"; it's actually preventing Llama users from harming non-users. Yes, we can be jaded and say it's about protecting the brand and dissociating from bad actors. My point is that it's hard to apply the usual arguments for open source and freedom of computing when you're defending the rights of people who want to harm other people.
I think the context length is not a parameter of the model in the sense of being set to a particular value; it is just the size of the chunks you feed in during training. The model will only ever be able to learn relationships within that length. In that sense it is an implicit property of the model.
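To make the "chunk size" view concrete, here is a minimal sketch, assuming the simplest possible data pipeline: one tokenized stream cut into non-overlapping windows (real pre-training pipelines may pack, shuffle, or overlap documents differently). The function names `make_training_chunks` and `max_within_example_distance` are hypothetical. What it shows is that with a 4,096-token window, no training example ever relates two tokens more than 4,095 positions apart.

```python
from typing import List


def make_training_chunks(token_ids: List[int], context_len: int) -> List[List[int]]:
    """Split one long token stream into non-overlapping training examples.

    Hypothetical helper: each example holds at most `context_len` tokens,
    so two tokens in the same example are never more than
    context_len - 1 positions apart.
    """
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    return [token_ids[i:i + context_len] for i in range(0, len(token_ids), context_len)]


def max_within_example_distance(chunks: List[List[int]]) -> int:
    """Largest position gap between two tokens that share a training example."""
    return max((len(c) - 1 for c in chunks), default=0)


if __name__ == "__main__":
    stream = list(range(10_000))  # stand-in for a tokenized corpus
    chunks = make_training_chunks(stream, context_len=4096)
    print(len(chunks), max_within_example_distance(chunks))
```

Any relationship spanning more than 4,095 positions simply never appears in a training example, which is why the trained length acts as an implicit limit rather than an explicit setting.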
At inference time you can certainly query the model with chunks longer than those it was trained on, and it will answer without blinking. You just cannot expect the answers to contain meaningful information beyond the length the model was trained with.
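One reason the model still answers is that Llama 2 uses rotary position embeddings (RoPE), which derive position information from a formula rather than a learned lookup table, so any position is accepted. The sketch below assumes Llama-2-style defaults (head dimension 128, base 10000, 4,096-token pre-training length) and counts how many rotary frequency pairs get rotated into angles they never reached during training. The helper names and the "never reached" heuristic (a pair that never completes a full turn in training has only seen angles in [0, (L-1)·θ]) are illustrative assumptions, not how any particular model or library measures this.

```python
import math
from typing import List


def rope_thetas(head_dim: int = 128, base: float = 10000.0) -> List[float]:
    """Per-pair rotation frequencies of a standard rotary embedding:
    theta_i = base ** (-2i / head_dim) for i in 0 .. head_dim/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def pairs_outside_training_range(distance: int, train_len: int,
                                 head_dim: int = 128, base: float = 10000.0) -> int:
    """Count dimension pairs whose rotation at `distance` lies outside the
    angle range that pair covered during training (hypothetical heuristic)."""
    count = 0
    for theta in rope_thetas(head_dim, base):
        max_seen = (train_len - 1) * theta
        if max_seen >= 2 * math.pi:
            continue  # this pair already wrapped a full turn during training
        if (distance * theta) % (2 * math.pi) > max_seen:
            count += 1  # angle never observed for this (slow) pair
    return count


if __name__ == "__main__":
    for d in (4_000, 8_000, 32_000):
        print(d, pairs_outside_training_range(d, train_len=4096))
```

Distances below the training length yield 0; larger distances push the slowest-rotating pairs into angles the model never saw, which is one way to read "no meaningful information beyond the trained length".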