This could easily have gone in an "only orgs with billions can play" direction, with nobody even trying, in the learned-helplessness sense. Instead we're ending up in a hybrid "ok, maybe we can't quite train from scratch, but we can still tinker" space, which is a lot healthier.
If they want to double down on that then I applaud them
Facebook isn't doing this out of the kindness of their hearts; it just makes business sense for them to do it. AI seems to be in a smiling curve situation right now, where only Nvidia on the hardware side and the consumer-facing products using AI as a feature are making money. The companies training the models and trying to sell the API seem like they'll have a hard time not being replaceable commodities.
I think it's more of a case of "commoditize your competitors".
Personally, I don't care if they had a plan to profit from this move; as long as it brings value to the public, it's a win-win.
Their quality of open source has always been good for the things I’ve used.
I am genuinely excited about the positive impacts LLMs and their future derivatives can have in computing. We can now, for the first time, truly “program” a computer in natural language. It’s even intelligent enough to “fill in the gaps” using general intelligence. Just don’t rely on it for any niche topics without teaching it a thing or two, or you’ll get bull crap back.
Initially it was behind a consent form for research purposes, i.e. you gave some basic details and got access to the weights under a non-commercial license. FB shut that down after it got lots of attention.
That obviously got copy-pasted onto a torrent and grew legs from there. FB also hassled some people with DMCA takedowns, but that seemed pretty half-hearted and came too late at that stage.
[Sidequest: I believe the repo they used to distribute access also had a magnet link in it at one point, which further confused the narrative, but I'm not 100% sure of the precise details on this.]
Point is, at no stage was this 100% behind closed doors such that someone "leaked" it in the "stolen" sense, as you and I would understand the word.
Don’t forget what you’re dealing with here: the faceless, amoral, infinitely ravenous maw of the most efficient personal data succubus in history. Make no mistake, this is something like “goodwill capture” instead of “regulatory capture.”
I see no way that this diminishes Meta’s power in any way - arguably it strengthens it by making it easier to choose a Meta architecture instead of creating a competing FOSS architecture.
So arguably all this does is raise the FOSS bar technically and further entrench Meta, AND, most importantly, get thousands of developers to prime their data architectures for Meta models to eventually serve from a Meta account.
And once it’s widespread enough to lock you in, those commercial terms? Whoops, they changed!
It kicks Google, a competitor for advertising dollars, by allowing people to build various forms of Google competitor without doing the heavy lifting of creating the LLM. Some people feel Google is under existential threat from AI (trawling through search results full of spam and ads sucks when an AI can just tell you the answer).
It kicks OpenAI, too, though Microsoft is perhaps less obviously a competitor to Meta right now. But Microsoft has OpenAI, loads of money, loads of engineers, and lots of product lines, so they might leverage OpenAI's tech lead to _become_ more of a competitor to Meta. It's less of a risk to Meta if OpenAI doesn't have a tech lead anymore.
I already hate looking at the Instagram discover feed, but if it’s going to turn into the MJ Discord, I’m really done with it.
Of course, variety is good and I hope the UAE group continues to establish themselves as a credible model provider.
I'm surprised this opinion still persists. Royalty-based licenses have been used by major game engines [0] for a long time, so that's not unprecedented.
[0] https://www.unrealengine.com/en-US/license, https://www.cryengine.com/support/view/licensing
What the UAE did with Falcon was inspiring, well done! This is something that more governments could do.
https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-4...
If you are not at the finish line or in open source, you cannot win the race to zero.
Let's hope it plays out for AI as it did for operating systems with Linux.
If by "it" you mean LLaMA 1, then I don't think one can use it for commercial projects per the license. So it isn't really libre. That said, all indications are that LLaMA 2 will be fully FLOSS.
If you meant OPT, it isn't libre either: https://github.com/facebookresearch/metaseq/blob/main/projec...
So it's probably best not to muddy the waters.
Will it be able to run on AMD's new MI300X? I keep hoping that will put a "chip" in Nvidia's dominance since it seems more efficient.
[0]: https://www.theinformation.com/articles/meta-wants-companies...
I wouldn't mind crowdfunding one either, though, and having it be truly open and free.
I just don’t see how Meta could possibly turn into a success in the B2B software business. They are a great advertising company, but they’ve never been successful in their other ventures…
- Platforms where users can interact with AI agents
- Extensive user and content data that can be used to fine tune large foundation models
From what I understand, they have already benefited from the OSS work on quantization, and they see themselves as well positioned to benefit from a world where there's a bunch of specialized AI models/assistants.
In any event, bringing shitty practices from another industry into ML doesn't seem worth supporting.
Why is this still an issue for you?
All players have made licensing blunders in the past, and the fine folks behind Falcon seem to have learned from their mistake by releasing their weights under Apache 2.0, a well-understood and respected permissive license.
Many major open source projects started as proprietary software that eventually went open source. Why hold a grudge against this project specifically? Yes, they made a mistake and learned from it. What more do you want?
People relicense software all the time and the lawyers are usually fine with it (especially when the terms are more favorable). What am I missing here?
Not necessarily.
RWKV, for example, is a different architecture that wasn't based on Facebook's weights whatsoever. I don't know where BlinkDL (the author) got the training data, but they seem to have done everything mostly independently otherwise.
https://github.com/BlinkDL/RWKV-LM
disclaimer: I've been doing a lot of work lately on an implementation of CPU inference for this model, so I'm obviously somewhat biased since this is the model I have the most experience in.
(The reason it degrades is that a single internal state is updated in place per token, and the current models have only been trained with up to 8192 tokens of context, so once you get to roughly double that, the state starts to diverge from "sanity", with no known way to correct this. And priming a new instance of the model with 8192 or so tokens of the new context takes a really long time, because you can't compute the next token of an RNN until you also have the previous one!)
With some fine-tuning (even that is still out of reach for most people, unfortunately, but I digress) it can be turned into a pretty good chat model, generate story completions, generate boilerplate code, etc., and the base model is reasonably okay at most of these things already.
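To make the "you can't compute the next token until you have the previous one" point concrete, here is a toy recurrent loop in Python. It is a minimal sketch, assuming a made-up `step` update rule (hypothetical, and NOT RWKV's actual time-mixing math); it only shows that a running instance needs one cheap step per new token, while a fresh instance has to replay the whole context one token at a time.

```python
import math
from typing import List, Sequence


def step(state: List[float], token: int, decay: float = 0.9) -> List[float]:
    """One recurrent update; the new state depends on the previous state.

    Hypothetical toy cell (not RWKV's real formulas): every channel decays
    and then absorbs a simple feature of the incoming token.
    """
    return [decay * s + math.sin(token * (i + 1)) for i, s in enumerate(state)]


def prime(tokens: Sequence[int], width: int = 4) -> List[float]:
    """Feed a context through the model one token at a time.

    State t cannot be computed without state t-1, so priming a fresh
    instance costs len(tokens) strictly sequential steps.
    """
    state = [0.0] * width
    for tok in tokens:
        state = step(state, tok)
    return state


context = [3, 1, 4, 1, 5]
running = prime(context)              # what a live instance already holds
next_state = step(running, 9)         # continuing costs a single step
replayed = prime(context + [9])       # a new instance must redo all 6 steps
print(next_state == replayed)
```

The single fixed-size state is also why the model is cheap per token: everything it "remembers" has been folded into that one vector, which is exactly what starts drifting once the context runs well past the lengths seen in training.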
I think it's definitely a competitor in some areas, though I don't remember if there have already been benchmarks putting it up against the other models. I do know that it's better than the majority of other open-source models, including transformer-based ones, but this is probably more the fault of training data than architecture.
rwkv.cpp in particular depends on a project that would not have existed in its current form without LLaMA, even though the project itself isn't LLaMA-specific. However, there are enough other implementations of CPU inference (at least two?) that I think RWKV could still exist even if LLaMA never had.
Zuckerberg said it was during his Lex Fridman interview.
They wanted to distribute it. But they couldn't, politically. So it "leaked."
Perhaps you meant that they were nervous about companies using it commercially and either bringing them bad press or making money off their work? That's clearly why they only released it for researchers.
No legal issues per se. Hence the political qualifier. See: https://www.menendez.senate.gov/imo/media/doc/letter_to_meta...