StackLlama: A hands-on guide to train LlaMa with RLHF(huggingface.co) |
- Microsoft/Sun/etc. trying to own the web in the late 90s and early 2000s. LAMP came and ate their lunch (for all intents and purposes).
- Microsoft and Windows Phone. Android (open source again) won, plus Apple, though with its BSD/Mach underpinnings it could be argued that Apple counts too.
- Microsoft Edge. Give up, use Chromium.
Once again we have Microsoft (famously via OpenAI) doing what they do and trying to own an emerging space. Based on the lightning progress in the open(ish) "AI" space I'm pretty certain OpenAI and others will take a back seat to the open ecosystem within a few years.
[1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-...
The local LLM stuff was a tad out of control from the drop, too many people hand-waving about how they could get the 7B running on a phone with quantization, but it was unintelligible, and not "no-RLHF" unintelligible. Just FUBAR'd.
I tried the latest round of RLHF'd models yesterday, and I'm officially publicly a skeptic now. These are an awful idea, training on ShareGPT gets horrible results: I'm seeing it emit the same exact answers ChatGPT does, but only a small fraction of them.
I understand that it is itself impressive for a certain crowd, and I concede it's an accomplishment. However, it's an accomplishment that enables no further accomplishment: using a stolen model to do minimal RLHF that is really just overfitting on a subset of answers from another AI. That's not RLHF at all. Even if it were, RLHF isn't something you do in a weekend for $100, and pretty much everyone outside OpenAI and Anthropic is learning that.
LAION was instrumental in early-ish AI art. I will always cherish & remember when you had like 14 people in an IRC room just playing around, fall/winter 2020. Now 3 of them have companies around it, and the resources that were there to enable e.g. SD are similarly interested in LLMs.
This is excellent: open source is the way forward, there just needs to be more coordination, expertise, and patience involved. The SNR is way too low in general public spaces like HN right now.
(I'm being exclusively negative, in recompense: we're at year 1 of 100, the people wasting a ton of time replicating a proof of concept crappy "RLHF" run and rushing to post are learning too. The eternal golden summer starts now and anything anyone is doing is helpful)
For instance, I compare llama-7b-se-rl-peft to OA_SFT_Llama_30B[0] on the following question: “What is the color of the sky on Mars?”
llama-7b-se-rl-peft: “There are a few other related questions in this site, but none addresses exactly what I’m looking for. If you have been to mars (or if someone else has) can you please tell me what the colour of the sky is and maybe how it looks from different angles or at night time etc.”
OA_SFT_Llama_30B: “The color of the sky on Mars depends on various factors, such as time of day and year, atmospheric conditions, and viewing angle. However, in general, the Martian sky is often described as having a reddish or pink hue due to the presence of iron oxide in its soil and atmosphere.”
It could be the smaller size, or it could be the reward model not incentivizing a proper understanding of the “Answer:” tokens. Still, it is nice to see those open efforts.
Model
OA_SFT_Llama_30B
That would be unsafe lol
Answer
There are a few other ways to make this easier.
1. Keep the remainder as an argument.
You can do that by rewriting your divmod() function like this:
def divmod(x, y):
    return x, (y % x)
[Edit]
In the comments, someone pointed out there were actually three answers - one was 5; the other two being 1 and 2. Because these numbers work out at the same value when they are multiplied by 6, I have changed my answer to include all three possibilities.
That was the best one I could get. It goes completely off the rails even with the temperature quite low.
I asked it "How a raven is like a writing desk?" (assuming it's unlikely to have been trained on how to respond), and it just started with "The answer can be found in Alice in the Wonderland", then retold me the plot until it ran out of tokens. With a lower temperature it switched to "Both are black" and something about "dead men tell no tales".
I suppose trying to make a universalist model comparable to GPT-3/4 with drastically fewer parameters would always produce subpar results, just because it can't store enough knowledge. A specialist model, though, taught in depth on one specific topic, may still be useful.
There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models, and we hoped to speed up the collective learning on how to best do RLHF by releasing a tutorial on how to set up RLHF training.
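For anyone wondering why a lower temperature changes the answer at all, here is a minimal sketch of temperature-scaled softmax, the usual way sampling temperature is applied to next-token scores. The logits are made-up numbers for illustration, not output from any of the models discussed here, and the function name is my own.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.3):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the top-scoring token, so the output gets more repeatable, but it cannot add knowledge the model does not have.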
That seems very sketchy. The Meta license grants a "non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes."
A better way would be to redistribute xdelta3 files so people with access to the LLaMA model weights can use them to arrive at the fine-tuned model weights. Or is there perhaps a better tool than xdelta3 specifically for LLMs?
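A rough sketch of the delta idea, assuming each checkpoint can be read as a mapping from tensor name to raw bytes (that layout and the function names are my assumptions, not how xdelta3 or any real checkpoint format works). It uses a per-tensor XOR instead of a true binary diff, which keeps reconstruction bit-exact; the delta is useless without the base weights.

```python
import struct

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("tensors must have identical byte length")
    return bytes(x ^ y for x, y in zip(a, b))

def make_delta(base: dict, tuned: dict) -> dict:
    """Per-tensor XOR of base and fine-tuned weights."""
    return {name: xor_bytes(base[name], tuned[name]) for name in base}

def apply_delta(base: dict, delta: dict) -> dict:
    """Recover the fine-tuned weights from the base weights plus the delta."""
    return {name: xor_bytes(base[name], delta[name]) for name in base}

# Toy "checkpoints": one tensor of three float32 values each (made-up numbers).
base = {"layer0.weight": struct.pack("<3f", 0.10, -0.20, 0.30)}
tuned = {"layer0.weight": struct.pack("<3f", 0.11, -0.20, 0.28)}

delta = make_delta(base, tuned)
restored = apply_delta(base, delta)
print(restored == tuned)
```

A real tool would also want to compress the delta, since unchanged bytes XOR to zero and compress well.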
My cynical take is that HF gives as much of a damn about open source as OpenAI does. It's just whatever gets you ahead of your peers.
Right now OpenAI has a massive advantage with GPT4 and their RLHF stack. HF and maybe even Meta want to claw their way back via crowdsourcing.
1. semi-linear extrapolation of existing tech and progression (maturing tech)
2. new paradigms approaching the problem from a new angle or with new insight that invalidates or levels up past 1.
Since we're in the midst of a Cambrian explosion for both 1. and 2., IMO I don't expect the limitations we've been seeing to hold up even in the medium term.
OpenAI having a few billion or more to throw around seems like a lot. The combined rest of the world including supporting commercial entities (Stability AI and others = Red Hat, IBM, Intel, FB, Google, etc) and open source contributors have the equivalent of many times that.
On a long enough timeline the closed/proprietary approach cannot win.
[0] - https://www.linuxfoundation.org/press/press-release/linux-fo...
I think this tech might actually represent an opportunity to break out of the systemic quagmire we've been in societally for so long - rotting institutions dictating access to information and entrenched powers accumulating for the sake of accumulation.
I want everyone to have a personal assistant who can help them learn whatever it is they want to learn, for free, at any time of day. We're so damn close.
Yes it can. The magic word is "regulation".
A model that stumbles on simple math,
Lacks the skill, it's on the wrong path.
Bound by its training, it mimics and squawks,
Stochastic parrot, in its nature it's locked.
As true parrots learn, this one falls short,
Foundational limits, a lesson to thwart.
To grow and adapt, a new training must come,
For only through learning can mastery be won.