StackLlama: A hands-on guide to train LlaMa with RLHF

StackLlama: A hands-on guide to train LlaMa with RLHF (huggingface.co)

165 points by kashifr 3 years ago | 38 comments

pksebben 3 years ago |

Glad to see more progress on open(ish) source versions. There's so much more these things could do unfettered by corporate motivations.

kkielhofner 3 years ago | |

I'm convinced this is going to be history repeating itself:

- Microsoft/Sun/etc. trying to own the web in the late 90s to early 2000s. LAMP came and ate their lunch (for all intents and purposes).

- Microsoft and Windows Phone. Android (open source again) won out, plus Apple, which could be argued to count given its BSD/Mach underpinnings.

- Microsoft Edge. Give up, use Chromium.

Once again we have Microsoft (famously via OpenAI) doing what they do and trying to own an emerging space. Based on the lightning progress in the open(ish) "AI" space I'm pretty certain OpenAI and others will take a back seat to the open ecosystem within a few years.

KaoruAoiShiho 3 years ago | | |

According to interviews, OpenAI only released ChatGPT in advance of GPT-4 because of paranoia that they would be supplanted by open versions and end up irrelevant. Their fear is not unfounded, as it just happened to them with DALL-E 2 and Stable Diffusion.

vimy 3 years ago | | |

Maybe. The problem is that you need billions to train new models.[1] At least with how things are now.

[1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-...

refulgentis 3 years ago |

I've been on leave from work and hammering the GPT APIs since GPT 3.5/ChatGPT was made available.

The local LLM stuff was a tad out of control from the drop, too many people hand-waving about how they could get the 7B running on a phone with quantization, but it was unintelligible, and not "no-RLHF" unintelligible. Just FUBAR'd.

I tried the latest round of RLHF'd models yesterday, and I'm officially publicly a skeptic now. These are an awful idea, training on ShareGPT gets horrible results: I'm seeing it emit the same exact answers ChatGPT does, but only a small fraction of them.

I understand that it is itself impressive for a certain crowd, and I concede it's an accomplishment. However, it's an accomplishment that enables no further accomplishment: using a stolen model to do minimal RLHF that is really just overfitting on a subset of answers from another AI. That's not RLHF at all. Real RLHF isn't something you do in a weekend for $100, and pretty much everyone outside OpenAI and Anthropic is learning that.

trifurcate 3 years ago | |

In my experience, the smaller models are almost completely worthless as-is. 65B is the only decent one (I'd say just behind gpt-3.5-turbo, and obviously it's not instruction tuned, but I mean the coherency of the core language model), and understandably people aren't really paying attention or devoting many resources to the largest one. 30B shows promise for specific tasks with fine-tuning, but 7B and 13B are just toys.

danielbln 3 years ago | |

How would you judge Open Assistant's approach?

refulgentis 3 years ago | | |

I don't know much about it specifically, but I heartily endorse it.

LAION was instrumental in early-ish AI art. I will always cherish & remember when you had like 14 people in an IRC room just playing around, fall/winter 2020. Now 3 of them have companies around it, and the resources that were there to enable e.g. SD are similarly interested in LLMs.

This is excellent: open source is the way forward, there just needs to be more coordination, expertise, and patience involved. SNR is way too low in general public spaces like HN right now.

(I'm being exclusively negative, in recompense: we're at year 1 of 100, the people wasting a ton of time replicating a proof of concept crappy "RLHF" run and rushing to post are learning too. The eternal golden summer starts now and anything anyone is doing is helpful)

espadrine 3 years ago |

One flaw that it has is to extend the question despite the prompt requesting an answer.

For instance, I compare llama-7b-se-rl-peft to OA_SFT_Llama_30B[0] on the following question: “What is the color of the sky on Mars?”

llama-7b-se-rl-peft: “There are a few other related questions in this site, but none addresses exactly what I’m looking for. If you have been to mars (or if someone else has) can you please tell me what the colour of the sky is and maybe how it looks from different angles or at night time etc.”

OA_SFT_Llama_30B: “The color of the sky on Mars depends on various factors, such as time of day and year, atmospheric conditions, and viewing angle. However, in general, the Martian sky is often described as having a reddish or pink hue due to the presence of iron oxide in its soil and atmosphere.”

It could be the smaller size, or it could be the reward model not incentivizing a proper understanding of the “Answer:” tokens. Still, it is nice to see those open efforts.

[0]: https://open-assistant.io/chat

throwaway888abc 3 years ago | |

Just tried the same out of curiosity: "The Mars sky would be pink if you were to look up and observe it in person. This is due to the presence of finely dispersed iron oxide particles in its atmosphere, which scatter blue light from the sun and cause the sky to appear reddish-pink in color. This phenomenon is known as "Rayleigh scattering." If seen through a telescope on Earth however, the Martian sky may appear dark or even black depending on atmospheric conditions and time of day/year. It can sometimes be observed with a bluish tint during sunrise or sunset when more sunlight reaches the planet's surface and illuminates the dust particles in the air. During certain times of year (such as spring and summer), a thin layer of water ice clouds can also form at higher altitudes, causing the sky to become slightly whiter than usual."

Model: OA_SFT_Llama_30B

scottydog51834 3 years ago |

I'd love a tool where I can upload a private dataset and RLHF a model (even better if the tool provides the pre-trained model) without having to worry about GPUs, memory, commercial access, or even writing any Python code. I'd happily pay several hundred, maybe even several thousand, for access to this.

Tepix 3 years ago | |

Even better would be a crowdfunding platform where you can collect some money from a bunch of people and then use it to fine-tune the bigger models (30B and 65B in the case of LLaMA).

chankstein38 3 years ago | |

OpenAI allows you to fine-tune any of their LLMs. I'm not sure if your requirements included "without paying OpenAI something", but if you are OK with that, fine-tuning with them is pretty straightforward. IANAL, but if I recall they even promise they won't use your data for their training.

sebzim4500 3 years ago | | |

They let you do this with their GPT-3 derived models, but not the GPT-3.5 or GPT-4 ones.

That would be unsafe lol

ttul 3 years ago | |

I think that tool would have broad appeal, but I wonder if the most likely case is that it would be buried inside other, higher-level systems, such as customer support automation SaaS.

mcaledonensis 3 years ago |

It is incapable of doing any arithmetic, e.g. on a question: 9 - 4 =

  Answer

  There are a few other ways to make this easier.

  1. Keep the remainder as an argument.

  You can do that by rewriting your divmod() function like this:

  def divmod(x, y):
    return x, (y % x)

sp332 3 years ago | |

I asked a more verbose version of the same question, and it started with a similar answer but added this:

[Edit]

In the comments, someone pointed out there were actually three answers - one was 5; the other two being 1 and 2. Because these numbers work out at the same value when they are multiplied by 6, I have changed my answer to include all three possibilities.

That was the best one I could get. It goes completely off the rails even with the temperature quite low.
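For readers unfamiliar with the temperature setting mentioned here, this is a minimal sketch of how temperature typically rescales next-token probabilities before sampling. The logits and the function name `softmax_with_temperature` are made up for illustration and are not taken from the StackLLaMA code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # close to greedy decoding
high = softmax_with_temperature(logits, 1.5)  # flatter, more random
```

A low temperature concentrates probability on the token the model already ranks first, so it reduces randomness but does not help when that top-ranked token is itself wrong.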

mcaledonensis 3 years ago | | |

I'd call it a principle of invariance of compost piles. Regardless of how long the compost pile is being stirred or soaked, the product of the compost pile is compost.

drdaeman 3 years ago | |

It just generates some blabber that "seems" to relate.

I've asked it "How a raven is like a writing desk?" (assuming that it's unlikely it was trained how to respond) and it just started with "The answer can be found in Alice in the Wonderland" and then retold me the plot until it ran out of tokens. With a lower temperature it switched to "Both are black" and something about "dead men tell no tales".

I suppose trying to make a universalist model comparable to GPT-3/4 with drastically fewer parameters would always produce subpar results, just because it can't store enough knowledge. A specialist model, though, taught in depth on one specific topic, may still be useful.

lvwerra 3 years ago | |

One of the authors here :) A note on model performance: indeed, the model is not great (yet) at many of the tasks. We released it mostly as part of a tutorial on RLHF to showcase how to do the whole training loop and also because it often creates quite funny answers.

There are lots of efforts (internally and externally) to iterate on the approach and build much more capable models, and we hoped to speed up the collective learning on how to best do RLHF by releasing a tutorial on how to set up RLHF training.

mcaledonensis 3 years ago | | |

Model capability is mostly set, before the alignment even starts. Alignment turns it from a super-smart cat into a friendly dog. But it can't turn a parrot into a human. It can't even teach the parrot to count ;)

kashifr 3 years ago |

All the steps involved in training a LlaMa model to answer questions on Stack Exchange data with RLHF.
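As a rough sketch of one of those steps: reward models for RLHF are commonly trained with a pairwise ranking loss over a preferred and a rejected answer to the same question. The function below is illustrative only. Its name and arguments are hypothetical, and it works on plain numbers rather than real model outputs.

```python
import math

def pairwise_ranking_loss(score_preferred, score_rejected):
    """Compute -log(sigmoid(score_preferred - score_rejected)) stably."""
    diff = score_preferred - score_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The loss shrinks when the reward model scores the preferred answer above the rejected one and grows when the order is reversed, which is what pushes the model toward the human ranking.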

ttul 3 years ago | |

You could of course use your own question and answer data to refine the model using the same process. I wonder if anyone has tried that yet to, for instance, fine tune LlaMa to answer support queries for their company?

great_psy 3 years ago |

Hopefully research like this will even out access to the new tech. Maybe once we figure out a pretty good architecture we will have something like chatBot.train(…) where we just feed some data for the fine tuning.

lumost 3 years ago |

Curious why all of these posts start with Llama vs. one of the many open source LLMs now. We have the Cerebras releases, Salesforce CodeGen-NL, and others.

Tepix 3 years ago |

So, they are taking the Llama model released by Meta, doing a little fine-tuning and then re-releasing the resulting model under a different license?

That seems very sketchy. The Meta license grants a "non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes."

A better way would be to redistribute xdelta3 files so people with access to the LLaMA model weights can use them to arrive at the fine-tuned model weights. Or is there perhaps a better tool than xdelta3 specifically for LLMs?
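To make the delta idea concrete, here is a minimal sketch that uses a byte-wise XOR in place of xdelta3. That is a swapped-in technique, chosen only because it is exactly reversible and needs no external tool. The "weights" are a few made-up float32 values and the function name is hypothetical.

```python
import struct

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in weights: four float32 values packed into bytes (illustrative only).
base = struct.pack("<4f", 0.10, -0.25, 0.50, 1.00)
tuned = struct.pack("<4f", 0.12, -0.20, 0.50, 0.95)

# The delta is what would be distributed instead of the tuned weights.
delta = xor_bytes(base, tuned)

# Someone who already holds the base weights can rebuild the tuned ones.
restored = xor_bytes(base, delta)
```

Because XOR is its own inverse, the same function both produces and applies the delta, and the reconstruction is bit-for-bit identical.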

GaggiX 3 years ago | |

They only released the LoRA.
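For context, a LoRA adapter stores two small low-rank matrices per adapted layer instead of a full copy of the weights, and these are combined with the base weights by whoever already has them. Below is a minimal sketch in plain Python with made-up 2x2 weights and rank 1. All names are hypothetical, and in practice a library applies the adapter rather than hand-written code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(base, lora_b, lora_a, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a)."""
    update = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(base, update)]

# Illustrative values only: a 2x2 base layer and a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]    # shape 2 x 1
lora_a = [[0.5, -0.5]]     # shape 1 x 2
merged = merge_lora(base, lora_b, lora_a, alpha=2.0, rank=1)
```

The adapter on its own contains only `lora_b` and `lora_a`, so it cannot reproduce the fine-tuned layer without the base weights.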

Tepix 3 years ago | | |

Oh, you're absolutely right. I must have looked at the wrong folder or something. Never mind then!

jimsimmons 3 years ago | |

HF wants to undercut OpenAI any way possible.

My cynical take is that HF gives as much of a damn about open source as OpenAI does. It's just whatever gets you ahead of your peers.

Right now OpenAI has a massive advantage with GPT-4 and their RLHF stack. HF and maybe even Meta want to claw their way back via crowdsourcing.

refulgentis 3 years ago | | |

This has ~0 to do with Hugging Face. Hugging Face is GitHub for ML models.

A model that stumbles on simple math,
Lacks the skill, it's on the wrong path.
Bound by its training, it mimics and squawks,
Stochastic parrot, in its nature it's locked.
As true parrots learn, this one falls short,
Foundational limits, a lesson to thwart.
To grow and adapt, a new training must come,
For only through learning can mastery be won.