"Finally, our work raises ethical and legal questions, including whether the open-source community should continue to advance progress by “stealing” what OpenAI and other companies have done, as well as what legal countermeasures companies can take to protect and license intellectual property."
Really???
I'm going to need verifiable proof this wasn't written by ChatGPT as propaganda.
Scribd has lots of PDFs of books that are copyrighted. The Washington Post article mentions there are several other places it downloaded and scraped PDFs of copyrighted textbooks, etc.
… with compression.
It's not good news for the open LLM ecosystem.
No, it's not. LLaMA would be cheaper and likely faster if you ran it at the same scale. There have actually been a few calculations showing that running LLaMA 65B at 100% usage is cheaper per token than GPT-3.5 Turbo. Also, comparing them for accuracy isn't a fair comparison: one is a foundation model, the other is an instruction-tuned model. Perhaps compare LLaMA 65B with GPT-3.
But one great thing about open source LLMs is that you can specialize them in various tasks with affordable LoRA training, enough to easily beat GPT-4 in a specific niche.
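For anyone who wants to redo that kind of per-token calculation themselves, here is a rough sketch of the arithmetic. Every number you feed it is your own assumption: the function names, the hourly hardware cost, the throughput, and the API price are placeholders for illustration, not measurements of LLaMA 65B or any real deployment.

```python
def self_hosted_cost_per_1k_tokens(hourly_cost_usd, tokens_per_second, utilization):
    """USD per 1,000 generated tokens on hardware billed by the hour.

    utilization is the fraction of each hour spent serving real requests;
    the "100% usage" case from the comment above is utilization=1.0.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / useful_tokens_per_hour * 1000


def break_even_utilization(hourly_cost_usd, tokens_per_second, api_price_per_1k_usd):
    """Utilization at which self-hosting costs the same per token as the API.

    A result above 1.0 means self-hosting never breaks even at that price.
    """
    cost_at_full_load = hourly_cost_usd / (tokens_per_second * 3600) * 1000
    return cost_at_full_load / api_price_per_1k_usd


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not benchmarks): plug in your own.
    hourly_cost = 4.0        # USD per hour for the serving hardware
    throughput = 1000.0      # aggregate tokens per second across the batch
    api_price = 0.002        # USD per 1K tokens for the hosted model you compare to

    for u in (1.0, 0.5, 0.1):
        cost = self_hosted_cost_per_1k_tokens(hourly_cost, throughput, u)
        print(f"utilization {u:.0%}: ${cost:.5f} per 1K tokens")
    print("break-even utilization:",
          round(break_even_utilization(hourly_cost, throughput, api_price), 3))
```

The point of the sketch is the utilization term: the "cheaper per token" claim only holds if the hardware is kept busy, and the cost scales inversely with how busy it is.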
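To show why LoRA training is affordable, here is a minimal pure-Python sketch of the idea: the original weight matrix W stays frozen and only two small matrices A and B are trained, with the layer computing W x + (alpha / r) * B (A x). The function names and the 4096-wide layer are illustrative assumptions, not taken from any particular library or model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base layer plus a scaled low-rank update.

    W: d_out x d_in (frozen), A: rank x d_in, B: d_out x rank (both trainable).
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, rank):
    """Number of trainable parameters in A and B for one adapted matrix."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative sizes only: one square 4096 x 4096 projection, rank 8.
    d, r = 4096, 8
    full = d * d
    lora = lora_trainable_params(d, d, r)
    print(f"full fine-tune: {full:,} params, LoRA: {lora:,} params "
          f"({lora / full:.2%} of the matrix)")
```

With B initialized to zeros, the adapted layer starts out identical to the base layer, so training only has to learn the task-specific delta.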
However, when conducting more targeted automatic evaluations, we found that the imitation models close little to none of the large gap between LLaMA and ChatGPT. In particular, we demonstrate that imitation models improve on evaluation tasks that are heavily supported in the imitation training data. On the other hand, the models do not improve (or even decline in accuracy) on evaluation datasets for which there is little support. For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.
This might not be the way to replicate ChatGPT's performance across all tasks, but it seems to work quite well on whichever tasks are covered by the imitation training data. That is still a big win.
Later on, this also works for factual correctness (leaving aside the argument about whether this is the right approach for factuality).
For example, training on 100k ChatGPT outputs from broad-coverage user inputs provides no benefits to Natural Questions accuracy (e.g., Figure 1, center), but training exclusively on ChatGPT responses for Natural-Questions-like queries drastically improves task accuracy.
A better title, knowing what we know now, might be "To outperform GPT-4, do more than imitate"
Particularly this statement seems relevant: "We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT."
There's still room for closing the gap, but ultimately it's only going to be a pale imitation when the underlying model's representations aren't as useful.
This is largely a pot calling the kettle black. The LLM game is not about not mimicking somebody else. It is about not being caught doing so :-)
Brilliant observation, Captain Obvious.
Or to put it differently: in SWE, LLMs seem very bad at building things. What they are good at, however, is helping us build things. I’m not sure I’ll ever need to write JSDoc again on anything that isn’t too sensitive to share, which is a significant efficiency and quality improvement on the work I do. I think of them as Swagger generators, but instead of being for an OpenAPI standard they are for everything. I imagine they’ll become very good at automating testing as another example, which will again be a further improvement on the work a single developer does.
In terms management might understand: I think you can view LLMs similarly to the way we’ve seen frameworks and tooling significantly reduce the team size needed to build an application over the previous 30 years. If you wanted to build a web portal for asset management in 1999, you’d need a large team to do what a single developer and a good PO can do today. Maybe we won’t see the same reduction in manpower, but instead an increase in quality.
Recently, John Schulman explained the issue with behavior cloning and it's a very typical ML problem.[1] Basically: what are we training the model to do? The model updates after finetuning in a holistic manner, based on the sum total of its content and capability. Suppose GPT-4 can correctly answer many requests because it knows correct answers, in the sense that it has something isomorphic to an internal knowledge graph and tools for querying it, and that graph contains sufficient data for its tools to derive an answer at inference. RLHF reinforces this behavior by constraining the distribution of outputs (essentially, steering the model away from applying inappropriate tools for respective inputs, e.g. employing fantasy-narrative or bad-yahoo-answers cognitive routines when asked something that looks like a straightforward factual question).
Now suppose you teach LLaMA-13B to imitate those responses by SFTing it on a dump of successful GPT-4 conversations. But LLaMA doesn't have internals that would have enabled it to find the same answers; so on the object level it shallowly memorizes specific items of the post-training dataset, and on the meta-level it learns the stylistic flourish of a high-powered model. But it starts to hallucinate confident nonsense whenever you step out of the training distribution, because it doesn't actually learn to query its own knowledge graph. A little anthropomorphism won't hurt: you create an incapable impostor this way, a wannabe nerd, a character who is used to guessing the teacher's password and being praised, instead of understanding the subject, and keeps raising its hand whenever a question is asked, but is painfully clueless.
Indeed, the early and cheap success of behavior cloning was a massive red flag unto itself. There's no way all the compute and data that went into training GPT-3/3.5/4 tier models can be substituted with gently demonstrating the attitude vector. If we had models that were markedly less capable but comparably honest, we would have reasons for hope that this line terminates in a genuine open-source peer competitor; instead, we have total fraud.
It is a nontrivial task to have a model generalize epistemic honesty from external examples, rather than a lower-order behavior like clamming up, kowtowing, or bullshitting: to train it to say "I don't know" whenever it actually does not, but only then.
There are clever approaches here, but they're not such a low-hanging fruit as what passes for open-source right now.
This definitely is a smoking gun. Keeping the crown jewels for themselves.
I think a lot of people believe exactly that. To take one example from the "We Have No Moat" essay:
"It doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage. Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and the best are already largely indistinguishable from ChatGPT." - https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...
> We were initially surprised by how much imitation models improve over their base models: they are far better at following instructions, and their outputs appear similar to ChatGPT’s. This was further supported by both human and GPT-4 evaluations, where the outputs of our best imitation model were rated as competitive with ChatGPT (e.g., Figure 1, left).
("Competitive" meaning that 70% of outputs seemed about as good.)
I remember one from very early in life. I postulated out loud that Jerusalem, being the birthplace of Jesus, must be the most peaceful place on earth. All those loving and caring religious people who work so hard to be good people. That their religions are slightly different shouldn't matter to Jesus' message?
That LLMs can consume such huge amounts of data doesn't mean they have matured beyond that rather infantile mindset.
In the video you linked he explained that training it to learn to say it doesn't know will trigger false negatives.
The correct formula I imagine (hah!) is to wonder if the question is of interest to the model and to ask someone else for answers or some help figuring out the question. The human will just have to wait.
What is completely hilarious to me is that we all have heads full of learned answers for which we have no idea "why it is so", or at the very least lack whatever would have led us to arrive at that solution ourselves. I get what Archimedes realized in the bathtub, but what I want is the mechanism by which he arrived at such wild ideas. Could it be that learning a lot of facts would be the exact opposite kind of conditioning?
My mind now says this must be why we humans expire so fast. You keep the calcified brains around for a while as data storage, but the focus of project humanity must be to create young ones. I will have to ponder this fact-free line of reasoning some more. Perhaps I will find ways to convince myself I know something.
It is a fun thought that people created AI, we really want to believe we did. If enough pretend it is true no one can take it away from us.
If you want people to think you are intelligent you tell them things they already know and hide your sources.
Most humans are going to outlast whatever they produced during their lives. If anything, human bodies are among the most durable "goods" in the economy. Only real estate, public infrastructure and recorded knowledge (including genes) last longer than a human lifetime. How many of the things you buy and own are going to outlast you?
That said, the rest of your comment is spot-on.
Paul G says this too: ChatGPT's expertise is the same as a journalist's expertise. Its output seems impressive until it is on a subject you know very well.
GPT-x is like a wide-eyed intern or junior team member who loves to shoot its mouth off because it has been told to be assertive and vocal. The good thing is that it is willing to learn.
Now, if this is true of GPT-x which is pretty much the benchmark against which every open source LLM is being measured, you can guess for yourself how much room these open source LLMs still have to cover.
The comments I see here are not about that. They are about small models succeeding at specific tasks, which is affirmed by this paper. Most applications of LLMs are not general purpose chat bots, so this is not bad news for most of the distill/fine tune community.
All the goods we produce are designed to last for a specific time. We can easily make them more durable and with some serious effort they could last longer than we can imagine. It would be expensive, it might be beneficial but who wants to pay for benefits 100 or 200 years into the future?
My comment is about generality, which is the remaining advantage of giant models.
Does OpenAI have the rights on all the texts they used to train their GPTs?
The only valid argument is whether their model or its output is itself legally protected.
It's only fair use for search purposes.
I don't like horrific government abuse of residents, and I would not mind throwing most billionaire CEOs into a pool of alligators and dissolving their corporations. I don't like Altman, I think he's a smart person with NOBUS-level reckless hubris who is softballing the magnitude of the dangers to wet. The status quo is not good and it's getting worse.
It doesn't matter. 5 people with launch-all-the-nukes buttons is better than 500 million.
The “AI is dangerous” premise has no basis whatsoever. No one can prove it. No one can present a great thought experiment. Just doomsaying coupled with volume.
It’s starting to come off like a hidden agenda.
Datacenter NVIDIA cards are already on the export control list for potential military use, and that was pre ChatGPT and GPT-4:
>On August 26, 2022, the U.S. government, or USG, informed NVIDIA Corporation, or the Company, that the USG has imposed a new license requirement, effective immediately, for any future export to China (including Hong Kong) and Russia of the Company’s A100 and forthcoming H100 integrated circuits. DGX or any other systems which incorporate A100 or H100 integrated circuits and the A100X are also covered by the new license requirement. The license requirement also includes any future NVIDIA integrated circuit achieving both peak performance and chip-to-chip I/O performance equal to or greater than thresholds that are roughly equivalent to the A100, as well as any system that includes those circuits. A license is required to export technology to support or develop covered products. The USG indicated that the new license requirement will address the risk that the covered products may be used in, or diverted to, a ‘military end use’ or ‘military end user’ in China and Russia. The Company does not sell products to customers in Russia.
https://www.sec.gov/Archives/edgar/data/1045810/000104581022...
> The “AI is dangerous” premise has no basis whatsoever. No one can prove it. No one can present a great thought experiment. Just doomsaying coupled with volume.
If you increase the number of persuasive Goebbels and hackers attacking infrastructure by 100,000x you do not come away with a better world.
> It’s starting to come off like a hidden agenda.
AI was used to fake the moon landing and hide bigfoot /s
It's correct that OpenAI isn't publishing any of the "stolen" content directly. But they "stole" it to make their service possible in the first place. Not distributing it themselves doesn't make much of a difference, then.