Open source LLM with 32k Context Length

Open source LLM with 32k Context Length(blog.abacus.ai)

115 points by shubham_saboo 2 years ago | 28 comments

alsodumb 2 years ago |

Abacus always seemed to me like a 'we got a lot of VC money with inflated claims, now we gotta show we do everything' company. I don't really understand what they do. They seem to offer everything, but I don't see anyone talking about using their offerings in the real world. Ever. The only time I see mentions of the company is when I am targeted with ads or promoted posts from the founder.

woadwarrior01 2 years ago | |

Their CEO made a post[1] on Twitter claiming to have invented "the world's first commercially usable 32K long-context open-source LLM", which IMO is pure hyperbole.

It looks like the first OSS 13B Llama 2 based 32k token context model[2], but the first OSS and commercially usable 32k token context model was a 7B Llama 2 based model[3] from Together AI, who beat them by about a week[4].

[1]: https://twitter.com/bindureddy/status/1694126931174977906

[2]: https://huggingface.co/abacusai/Giraffe-v2-13b-32k

[3]: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instr...

[4]: https://twitter.com/togethercompute/status/16925744231638470...

weinzierl 2 years ago |

This is just another fine-tuned LLaMA/Llama 2 model, and there are already several like it. I doubt that this will give seriously meaningful results for long-context inference.

32k context length sounds nice, of course, and it seems to be common to describe merely fine-tuned models that way. I think it is more of a marketing thing, and we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the former being the default meaning of context length.

kordlessagain 2 years ago | |

These 800 watt speakers are great. So loud.

smcleod 2 years ago | | |

“Better than lossless”

supermatt 2 years ago |

It seems this is built on LLaMA. Did Meta change the license to make it open source now? The repo still seems to say otherwise.

Edit: No mention of it being open source in the linked article. Maybe the title here is just wrong? @dang

sillysaurusx 2 years ago | |

It’s not possible to have a license over an ML model trained on other peoples’ works, since such models are uncopyrightable. They’re more like a phone book; a collection of facts trained by an entirely un-creative process. https://news.ycombinator.com/item?id=36691050

This hasn’t been proven in court, but it seems the most likely outcome.

nemoniac 2 years ago | | |

Not saying that this applies to LLMs but if you describe them as "a collection of facts [collected and] trained by an entirely un-creative process" then it begins to sound like one could argue for Database Right.

https://en.wikipedia.org/wiki/Database_right

supermatt 2 years ago | | |

And yet Meta is specifying a license, implying they do hold the copyright.

inciampati 2 years ago | | |

Keep working on it! A good court case and a landmark decision about this could change the landscape for the market for these models.

ImprobableTruth 2 years ago | |

Llama 2 is open source-ish. The weights are freely available and can be used commercially, but only if you have fewer than 700M users and agree to some "don't do naughty things" terms.

jmiskovic 2 years ago | | |

Nope. The limit is 700M monthly active users as of the Llama 2 release date, a weird catch-all clause aimed at a handful of Meta's competitors. The license doesn't satisfy OSS requirements, but it is quite reasonable.

https://ai.meta.com/llama/license/ https://ai.meta.com/llama/use-policy/

supermatt 2 years ago | | |

If the JSON license isn't considered open because it requires that "The Software shall be used for Good, not Evil.", then I don't see how tacking an additional financial threshold onto it makes it more open. I don't think Meta even released the training dataset, so you can't even replicate it (should you have the funds to do so).

There are other LLMs that don't have such restrictions, and publish their training data.

satvikpendem 2 years ago | | |

There is no open source-ish. It either protects the fundamental freedoms or it...doesn't.

keyle 2 years ago | | |

"open source-ish" sounds like the perfect recipe for massively profitable future litigation.

Also, "don't do naughty things": is there a chart for that? How is it defined? Is it part of the nonexistent license?

Havoc 2 years ago | |

From memory, the Llama 2 license does allow fine-tuned models with suitable credit and license inclusion. It restricts using it to train other models, though (a bit like how people use GPT-4 to generate question/answer pairs to train their models).

vekker 2 years ago |

It's probably too new for anyone to have integrated this into text-generation-webui / Gradio? I've been looking for a large context LLM (self-hosted or not) for a project, and as a European I unfortunately don't have access to Anthropic's Claude API yet.

qeternity 2 years ago | |

It's just Llama 2 w/ rotary encoding fine-tuned to 32k. It should work fine.
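For context on "rotary encoding": RoPE rotates each pair of query/key dimensions by an angle proportional to the token's position, so a query-key dot product depends only on the distance between the two tokens, not on where they sit. A minimal pure-Python sketch, assuming the interleaved pairing from the original RoPE paper (Hugging Face's Llama code pairs dimension i with i + d/2 instead, which has the same property); the helper names and vectors are illustrative, not taken from any model:

```python
import math

def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """Per-pair rotation frequencies theta_i = base^(-2i/dim)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def apply_rope(vec: list[float], pos: float, base: float = 10000.0) -> list[float]:
    """Rotate each consecutive (even, odd) pair of `vec` by pos * theta_i."""
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 8-dimensional query and key vectors.
q = [0.3, -1.2, 0.5, 0.8, 1.0, 0.1, -0.4, 0.9]
k = [1.1, 0.2, -0.7, 0.4, 0.3, -0.5, 0.6, 0.2]

# Same offset (5 tokens apart) at very different absolute positions.
s1 = dot(apply_rope(q, 10), apply_rope(k, 5))
s2 = dot(apply_rope(q, 1005), apply_rope(k, 1000))
print(f"{s1:.6f} {s2:.6f}")
```

Because only relative distance matters, extending a RoPE model's context is mostly about getting it to behave at offsets it never saw in training, which is what the fine-tuning targets.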

syntaxing 2 years ago | |

How much context do you need? There are a couple of 16K models out there now. Some people have their own 32K ones too, but the quality varies. It’s worth trying them on Hugging Face. The easiest way is to track TheBloke’s work to see any new models that come out.

Havoc 2 years ago |

Does anyone know if larger context lengths are inherently worse at other tasks?

i.e. all other things being equal, is an 8k model better at math than a 32k model?

syntaxing 2 years ago | |

There are a couple of models on Hugging Face that use NTK/linear RoPE scaling that you can play with. Vicuna and WizardLM both have 16K context models. The biggest issue is that at really high context, they sometimes produce these weird repetitions. But to be fair, I have only tried the quantized 13B models (the largest I can run locally). Not sure if the repetitions are an artifact of the RoPE scaling or the quantization or both.
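The two scaling tricks differ in which rotation frequencies they stretch. A rough sketch, assuming head dimension 128 (Llama 2's) and a scale factor of 8 (4k trained context stretched to 32k, chosen for illustration): linear scaling (position interpolation) divides every position by the scale, while the "NTK-aware" variant instead enlarges the RoPE base by scale^(d/(d-2)), leaving the highest frequency untouched and stretching only the lowest by the full factor. The function names are hypothetical:

```python
import math

def linear_scaled_angles(pos: float, dim: int, scale: float,
                         base: float = 10000.0) -> list[float]:
    """Position interpolation: every position is divided by `scale`."""
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]

def ntk_scaled_angles(pos: float, dim: int, scale: float,
                      base: float = 10000.0) -> list[float]:
    """NTK-aware scaling: enlarge the base instead of shrinking positions."""
    new_base = base * scale ** (dim / (dim - 2))
    return [pos * new_base ** (-2 * i / dim) for i in range(dim // 2)]

dim, scale, pos = 128, 8.0, 30000  # illustrative values
lin = linear_scaled_angles(pos, dim, scale)
ntk = ntk_scaled_angles(pos, dim, scale)

# Highest-frequency pair (i = 0): linear squeezes it 8x, NTK leaves it alone.
print(lin[0], ntk[0])
# Lowest-frequency pair: both are stretched by the full factor of 8.
print(lin[-1], ntk[-1], math.isclose(lin[-1], ntk[-1], rel_tol=1e-9))
```

The design difference is that linear scaling compresses fine-grained (high-frequency) position information, while NTK-aware scaling tries to preserve it. Whether either one causes the repetition described above is not something this arithmetic can show.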

weinzierl 2 years ago | |

They are more resource (time and memory) intensive in training and inference; that is their disadvantage. For a fair comparison you would have to compare an 8k to a 32k pre-trained model with otherwise similar hyper-parameters.

OP is about a 32k sugar-coated Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
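One concrete part of the memory cost is the KV cache, which grows linearly with the number of tokens actually held in context. A back-of-the-envelope sketch, assuming Llama 2 13B's shape (40 layers, 40 attention heads, head dimension 128) and fp16 storage; it ignores weights, activations, batching and any engine-specific overhead:

```python
GIB = 2 ** 30

def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Assumed Llama 2 13B shape, fp16 (2 bytes per element).
LLAMA2_13B = dict(n_layers=40, n_kv_heads=40, head_dim=128)

for n in (2048, 4096, 32768):
    gib = kv_cache_bytes(n, **LLAMA2_13B) / GIB
    print(f"{n:>6} tokens: {gib:.2f} GiB")
```

Under these assumptions that is 800 KiB per token, about 1.56 GiB at 2k tokens and 25 GiB at 32k, on top of the roughly 26 GB of fp16 weights.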

semi 2 years ago | | |

Is the increased resource usage inherent to the model or does it only happen when using the extra context? Like if your workflow currently fits in a 2k model would an 8k model be objectively worse and only worth using once you've filled the context up of a smaller model? Or would it be worth always using an 8k context model and just knowing it will get slower and more resource hungry as your context grows?

Sorry for the random question, I've just been curious about this for a while and unable to find out and you seem knowledgeable about these extended models.
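On whether the extra cost is paid up front: for a RoPE model like Llama 2, nothing in the forward pass is sized by the maximum context (apart from a small table of precomputed rotation angles in some implementations), so per-layer work depends on how many tokens are actually in the sequence. A rough FLOP-count sketch, assuming Llama 2 13B's widths (d_model 5120, MLP width 13824, SwiGLU with three weight matrices), counting a multiply-add as 2 FLOPs and ignoring softmax, norms and the LM head. Some inference servers preallocate the KV cache for the configured maximum length, so memory use can still depend on settings. The function names are hypothetical:

```python
D_MODEL, D_FF = 5120, 13824  # assumed Llama 2 13B widths

def attention_flops(n_tokens: int, d_model: int = D_MODEL) -> int:
    """FLOPs for QK^T plus scores @ V in one layer (quadratic in n_tokens)."""
    # (n x d) @ (d x n) and (n x n) @ (n x d): 2*n*n*d FLOPs each.
    return 4 * n_tokens * n_tokens * d_model

def dense_flops(n_tokens: int, d_model: int = D_MODEL, d_ff: int = D_FF) -> int:
    """FLOPs for Q/K/V/O projections and the SwiGLU MLP in one layer (linear in n_tokens)."""
    projections = 4 * 2 * n_tokens * d_model * d_model
    mlp = 3 * 2 * n_tokens * d_model * d_ff
    return projections + mlp

for n in (2048, 8192, 32768):
    attn, dense = attention_flops(n), dense_flops(n)
    share = attn / (attn + dense)
    print(f"{n:>6} tokens: attention is {share:.0%} of per-layer FLOPs")
```

The maximum context length never appears as an argument: a 2k-token prompt costs the same here whether the model was tuned for 8k or 32k. Output quality at short lengths is a separate question that this arithmetic does not answer.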