Releasing 3B and 7B RedPajama

Releasing 3B and 7B RedPajama (together.xyz)

363 points by antimatter15 3 years ago | 106 comments

sphars 3 years ago |

Slightly off-topic, but as the parent of a toddler, I got a bit of a chuckle out of the name. It's based on the children's book series "Llama Llama Red Pajama".

petesergeant 3 years ago | |

It had put me in mind of the Ogden Nash poem:

    The one-l lama,
    He's a priest.
    The two-l llama,
    He's a beast.
    And I will bet
    A silk pajama
    There isn't any
    Three-l lllama.

dllthomas 3 years ago | | |

"*The author's attention has been called to a type of conflagration known as a three-alarmer. Pooh."

elkos 3 years ago | |

Thanks.

As a non-native English speaker (though also the parent of a toddler), I wasn't familiar with the book series.

Auracle 3 years ago | | |

As the father of an 18-month-old daughter who likes the book, I have it memorized.

dllthomas 3 years ago | |

I'm holding out for the MadAtMama model.

blurbleblurble 3 years ago | |

Not off topic at all

innagadadavida 3 years ago | |

The founder is ex-Apple (Siri search) and had a baby a couple of years ago. Not too surprising to me :)

rawrmaan 3 years ago |

There was a lot of detail and data in here, but it's not very useful to me because all of the comparisons are to things I have no experience with.

There's really only one thing I care about: How does this compare to GPT-4?

I have no use for models that aren't at that level. Even though this almost definitely isn't at that level, it's hard to know how close or far it is from the data presented.

Joeri 3 years ago | |

None of the 3B and 7B models are at ChatGPT’s level, let alone GPT-4. The 13B models start doing really interesting things, but you don’t get near ChatGPT results until you move up to the best 30B and 65B models, which require beefier hardware. Nothing out there right now approximates GPT-4.

The big story here for me is that the training set is what makes the difference in quality. There is no secret sauce; the open-source architectures do well, provided you give them a large and diverse enough training set. That would mean it is just a matter of pooling resources to train really capable open-source models. That makes what RedPajama is doing, compiling the best open dataset, very important for the future of high-quality open-source LLMs.

If you want to play around with this yourself, you can install oobabooga (text-generation-webui) and figure out which model fits your hardware from the r/LocalLLaMA wiki. The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM. I’ve had lots of fun talking to 7B and 13B Alpaca and Vicuna models running locally.

https://www.reddit.com/r/LocalLLaMA/wiki/models/

nullsense 3 years ago | | |

LLaVA 13B is a great multimodal model that has first-class support in oobabooga too.

It's really fun to enable both the whisper extension and the TTS extension and have two-way voice chats with your computer while being able to send it pictures as well. Truly mind bending.

Quantized 30B models run at acceptable speeds on decent hardware and are pretty capable. It's my understanding that the open source community is iterating extremely fast on small model sizes getting the most out of them by pushing the data quality higher and higher, and then they plan to scale up to at least 30B parameter models.

I really can't wait to see the results of that process. In the end you're going to have a 30B model that's totally uncensored and is a mix of Wizard + Vicuna. It's going to be a veryyyy capable model.

Semaphor 3 years ago | | |

> The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM.

Bigger ones as well, you just have to wait longer. Nothing for real time usage, but if you can wait 10-20 minutes, you can use them on CPU.

azinman2 3 years ago | | |

Do these RedPajama models work with llama.cpp?

quickthrower2 3 years ago | |

The bit I liked best was the response examples. Look at those. Clearly not as good as GPT-4, but good enough, I feel, that this would be a contender for scenarios where you care about privacy or data provenance.

For example: a therapist, a search bot for your diary, a company intranet help bot. Anything where the prompt contains something you don’t want to send to a third party.

rawrmaan 3 years ago | | |

That's a great point, I definitely overlooked these. They look pretty good, too, and I agree with your use cases.

Thanks!

blihp 3 years ago | |

Then you probably don't care about this (yet)

Assume a truly competitive model in the Open Source world is still a ways off. These teams and their infrastructure are still in their early days while OpenAI is more at the fine-tuning and polishing stage. The fact that these open teams are able to have something in the same universe in terms of functionality this fast is pretty amazing... but it will take time before there's an artifact that will be a strong competitor.

nullsense 3 years ago | | |

The pace of the progress the open source models are making is pretty astonishing. The smaller model sizes are cheap to train, so there is a lot of iteration by many different teams. People are also combining proven approaches. Then they're going to nail it and scale it. It will be very interesting to see where we are in three months' time.

noman-land 3 years ago | |

There's a nice chart in the leaked Google memos that compares some of the open models against ChatGPT and Bard, so you can get a sense of where these models land.

https://twitter.com/jelleprins/status/1654197282311491592

atleastoptimal 3 years ago | |

> How does this compare to GPT-4?

I'll give you the answer for every open source model over the next 2 years: It's far worse

MacsHeadroom 3 years ago | | |

If you'd said that about OpenAI's DALL-E 2 you'd have been wrong.

I suspect Open Source LLMs will outpace the release version of GPT-4 before the end of this year.

It's less likely they will outpace whatever version of GPT-4 is shipped later this year, but still very much possible.

detrites 3 years ago | | |

That seems way off the mark.

Open source models can already approximate GPT-3.5 for most tasks on common home hardware, right now.

fortyseven 3 years ago | | |

Okay, so "ignore my out of touch opinion of language models". Got it.

andy_xor_andrew 3 years ago |

This is beyond exciting. Welcome to the new reality!

On one hand, the resources required to run these models continue to fall dramatically, thanks to the techniques discovered by researchers: GPTQ quantizing down to 4, 3, 2, even 1 bit! Model pruning! Hybrid VRAM offloading! Better, more efficient architectures! 1-click fine-tuning on consumer hardware! Of course, the free lunches won't last forever, and this will level off, but it's still incredible.

And on the other side of the coin, the power of all computing devices continues its ever-upward exponential growth.

So you have a continuous lowering of requirements, combined with a continuous increase in available power... surely these two trends will collide, and I can only imagine what this stuff will be like at that intersection.
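To make "quantizing down to 4 bits" concrete, here is a minimal sketch of block-wise symmetric round-to-nearest quantization, assuming a flat list of float weights and one shared scale per block. This is not GPTQ (which additionally corrects rounding error using second-order information); it only illustrates what storing weights as small integers plus a scale means. It needs bits >= 2; 1-bit schemes typically keep only a sign per weight plus a scale, which this sketch doesn't cover.

```python
"""Block-wise symmetric round-to-nearest quantization (illustrative, not GPTQ)."""


def quantize_block(weights, bits=4):
    """Map floats to integers in [-qmax, qmax] with one float scale for the block."""
    if bits < 2:
        raise ValueError("this sketch needs bits >= 2")
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, so values lie in [-7, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize(weights, bits=4, block_size=32):
    """Split a flat weight list into fixed-size blocks, each with its own scale."""
    return [
        quantize_block(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Recover approximate floats: integer value times its block's scale."""
    out = []
    for q, scale in blocks:
        out.extend(v * scale for v in q)
    return out


if __name__ == "__main__":
    w = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.61, -0.88]
    blocks = quantize(w, bits=4, block_size=4)
    w_hat = dequantize(blocks)
    print(blocks)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, w_hat)))
```

Per block, the rounding error is at most half the scale, which is why smaller blocks (more scales to store) usually trade a little extra memory for accuracy.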

knaik94 3 years ago |

I have been really impressed with the uncensored WizardLM I was playing with. Having a truly open, uncensored model to work with is a really important research tool. Censoring the training data and results in such a heavy-handed way is not really possible without lowering the quality of all output.

As the resources required to train and fine-tune these models become consumer-hardware friendly, I think we'll see a shift towards a bunch of smaller models. Open models like these also mean the results of security and capability research are publicly available. Models like this one and the Replit code model will become the new base that all open-source models are built on. I am really looking forward to the GPT-J 4-bit, CUDA-optimized 7B models; the others I have tested run fast on a 2070 Max-Q with 16 GB of RAM, where I was getting ~7 tokens/second. LoRA can work directly with 4-bit quantized models. While ggml CPU models are very strong, I don't believe we're moving away from GPU-accelerated training and fine-tuning anytime soon.
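For readers unfamiliar with LoRA, a minimal plain-Python sketch of the idea: the base weight matrix W stays frozen, and only two small matrices A (rank × d_in) and B (d_out × rank) are trained, giving y = W x + (alpha / rank) · B (A x). Because W is never updated, it can in principle be kept in a compressed form; here it is an ordinary float matrix, and there is no training loop or autograd. The class and parameter names are illustrative, not any library's API.

```python
"""Minimal sketch of a LoRA-adapted linear layer (no autograd, no quantization)."""
import random


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight                   # frozen base weights, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.scaling = alpha / rank
        # A starts small and random, B starts at zero, so the adapter is a no-op at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)                  # frozen path
        delta = matvec(self.B, matvec(self.A, x))      # low-rank trainable path
        return [b + self.scaling * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B would be updated during fine-tuning."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    layer = LoRALinear(identity, rank=2, alpha=4)
    print(layer.forward([1.0, 2.0, 3.0]))   # equals the base output at init
    print(layer.trainable_params())
```

The savings show up at realistic sizes: for a 4096 × 4096 layer with rank 8, A and B hold 2 × 8 × 4096 = 65,536 trainable values versus 16,777,216 in W (about 0.4%).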

regularfry 3 years ago | |

The thing is that anything that benefits the bottom end also should reflect up and help the top end too, if they're paying attention.

practice9 3 years ago |

Models replicating LLaMA are cool, but they are all missing proper multilingual support, which GPT-3.5 is quite good at.

mirekrusin 3 years ago | |

IMHO multilingual support would just pollute the precious available real estate in those models. Why not use it in English and use another one for translation?

viraptor 3 years ago | | |

That would work if all information is available in English as the primary language. That's not the case though. You may be missing out on interesting information if you're skipping other languages.

espadrine 3 years ago | | |

It depends on your use.

LLaMA’s main issue is that its license prevents commercial use.

If you want to use an LLM inside of a product, you may need to internationalize it at some point, so multilingual support matters.

tyfon 3 years ago | |

LLaMA 65B is actually quite decent in other languages. I can only just fit it in memory with my 128 GB of RAM, though. Usually I run the 8-bit quantized version, which uses 80 GB, but even the 4-bit and 3-bit versions are OK compared to the fp16 30B version.
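The numbers above follow from simple arithmetic: weight memory is roughly parameters × bits per weight / 8 bytes. A sketch of that back-of-envelope estimate, assuming nothing beyond the weights themselves:

```python
"""Back-of-envelope estimate of the memory needed just for model weights."""

GIB = 1024 ** 3


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GiB: parameters * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    for bits in (16, 8, 4, 3):
        print(f"65B weights at {bits:>2}-bit: {weight_memory_gib(65e9, bits):6.1f} GiB")
```

This prints about 121.1, 60.5, 30.3 and 22.7 GiB. If "just barely fit" refers to the unquantized model, the 16-bit figure is consistent with it squeezing into 128 GB. The gap between the 8-bit estimate and the ~80 GB reported presumably comes from overhead the sketch ignores (quantized formats typically store a scale per block of weights, plus the context cache and runtime buffers); that breakdown is an assumption, not something the comment states.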

ftxbro 3 years ago |

With this one and MosaicML, we now have so many of these consumer-GPU-sized models!

wtarreau 3 years ago |

This is very interesting for performing basic tasks at reasonable speeds or for running on smaller systems. Unfortunately it's one of the many based on Python and transformers, so the resources saved by the compact model are wasted by the heavy engine and ecosystem, and even a 4 GB machine with 4 GB of swap goes OOM because the loaded data gets duplicated in memory via read() and malloc() :-(

Let's wait for someone to port it to a cheaper and more powerful C-based engine like llama.cpp.
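One common way to avoid that duplication is to memory-map the weight file instead of read()-ing it into a separately allocated buffer: the mapped pages come straight from the OS page cache and are loaded on demand. A minimal sketch with Python's mmap module, assuming a hypothetical toy "weights file" that is just a raw array of little-endian float32 values (real checkpoint formats have headers and metadata):

```python
"""Contrast read() (private copy) with mmap (shared, on-demand pages)."""
import mmap
import os
import struct
import tempfile


def write_float32_file(path, values):
    """Write a toy weights file: a raw array of little-endian float32 values."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def load_with_read(path):
    """Copies the whole file into a new bytes object (page cache + heap copy)."""
    with open(path, "rb") as f:
        return f.read()


def load_with_mmap(path):
    """Maps the file read-only; stays valid after the file object is closed."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def read_float32(buf, index):
    """Read one weight without copying the rest of the buffer."""
    return struct.unpack_from("<f", buf, index * 4)[0]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_float32_file(path, [i / 4 for i in range(1000)])
        weights = load_with_mmap(path)
        print(read_float32(weights, 10))  # 2.5
        weights.close()
    finally:
        os.remove(path)
```

With the mmap version, touching a weight only faults in the page that holds it, and several processes mapping the same file can share those pages.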

nico 3 years ago |

idea: linked parameters / models tree

build a model that can change the number of parameters in the vicinity of some meaning, effectively increasing the local resolution around that meaning

so parameter space becomes linked-parameter space, between models

links could be pruned based on activation frequency

another way of seeing the concept is a tree of models/LLMs

and one additional model/LLM whose only job is to manage the tree (i.e. build it as it goes, use it to infer, prune it, etc.)

Or is what I’m saying too dumb?
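One way to read the "tree of models" idea as a data structure, sketched under heavy assumptions: every name here is hypothetical, each node merely stands in for a model specialized on a topic (deeper nodes = finer "local resolution"), the manager is ordinary code rather than a model, and routing follows a given list of topics instead of a model's own decisions. It only shows the routing, activation counting, and frequency-based pruning the comment describes.

```python
"""Toy sketch of a managed tree of (stand-in) models with activation-based pruning."""
from dataclasses import dataclass, field


@dataclass
class Node:
    topic: str                                     # a real node would also hold a model
    children: dict = field(default_factory=dict)   # topic -> Node
    activations: int = 0

    def add(self, child):
        self.children[child.topic] = child
        return child


class TreeManager:
    def __init__(self, root):
        self.root = root

    def route(self, topics):
        """Descend as far as the tree allows along the topic path; return the deepest node."""
        node = self.root
        node.activations += 1
        for topic in topics:
            child = node.children.get(topic)
            if child is None:
                break
            child.activations += 1
            node = child
        return node

    def prune(self, min_activations, node=None):
        """Drop links whose child was used fewer than min_activations times."""
        if node is None:
            node = self.root
        for topic, child in list(node.children.items()):
            if child.activations < min_activations:
                del node.children[topic]
            else:
                self.prune(min_activations, child)


if __name__ == "__main__":
    root = Node("general")
    root.add(Node("code")).add(Node("rust"))
    root.add(Node("poetry"))
    manager = TreeManager(root)
    print(manager.route(["code", "rust"]).topic)   # rust
    manager.prune(min_activations=1)
    print(sorted(root.children))                   # ['code']
```

A real version would also need the "build it as it goes" part, i.e. adding a new specialized child when queries keep landing on a node that is too coarse.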

ftxbro 3 years ago |

So I tried RedPajama-INCITE-Instruct-7B-v0.1 and the AutoModelForCausalLM.from_pretrained(...) call takes two minutes every time. My GPU is big enough. I don't know why it's so slow. I feel like it's somehow precomputing stuff that can be used across queries, and I had hoped that this stuff would have already been precomputed on the disk and I could just load it up.
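If those two minutes are paid every time a new script starts, one workaround is to load once in a long-lived process and reuse the model for every query. A minimal sketch of that pattern with functools.lru_cache; `load_model` is a hypothetical stand-in that just sleeps, and the model name is only used as a cache key. In real code its body would be the from_pretrained(...) call; passing torch_dtype=torch.float16 there may also help, since it avoids materializing the weights in float32 first, though whether that explains this particular slowdown is an assumption.

```python
"""Pay the model-loading cost once per process, then reuse the loaded model."""
import functools
import time


@functools.lru_cache(maxsize=None)
def load_model(name):
    """Hypothetical stand-in for an expensive loader such as from_pretrained(name, ...)."""
    time.sleep(0.2)  # simulate slow loading
    return {"name": name, "loaded_at": time.time()}


def answer(prompt, model_name="RedPajama-INCITE-Instruct-7B-v0.1"):
    model = load_model(model_name)  # only the first call per name pays the loading cost
    return f"[{model['name']}] would answer: {prompt}"


if __name__ == "__main__":
    for prompt in ("first question", "second question"):
        start = time.perf_counter()
        answer(prompt)
        print(f"{prompt}: {time.perf_counter() - start:.2f}s")
```

In practice this means running the model behind a small server or a notebook kernel instead of re-running a script per query.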

born-jre 3 years ago |

I also wonder how powerful the 3B model will be. Can it act as a prompt router that makes API calls to ChatGPT or another specified model for the actual processing? It's probably possible to do this with LangChain, but I haven't tried it yet.
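The routing idea itself is small. A sketch under explicit assumptions: `small_model_classify` is a keyword heuristic standing in for a 3B model prompted to output a label, and both backends are placeholder functions rather than real API clients.

```python
"""Sketch of a prompt router: a small local model labels, a backend answers."""
from typing import Callable, Dict


def small_model_classify(prompt: str) -> str:
    """Hypothetical classifier; a real router would ask the small model for a label."""
    text = prompt.lower()
    if any(word in text for word in ("prove", "derive", "step by step")):
        return "hard"
    return "easy"


def local_backend(prompt: str) -> str:
    return f"local model answer to: {prompt}"


def remote_backend(prompt: str) -> str:
    # Placeholder: a real version would call a hosted model's API here.
    return f"remote model answer to: {prompt}"


BACKENDS: Dict[str, Callable[[str], str]] = {
    "easy": local_backend,
    "hard": remote_backend,
}


def route(prompt: str) -> str:
    label = small_model_classify(prompt)
    backend = BACKENDS.get(label, remote_backend)  # unknown labels go to the stronger model
    return backend(prompt)


if __name__ == "__main__":
    print(route("What's the capital of France?"))
    print(route("Prove that sqrt(2) is irrational"))
```

Defaulting unknown labels to the stronger backend is a deliberate choice: a misbehaving small router then costs money rather than answer quality.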

ibitto 3 years ago |

I am really interested in knowing what people are using these smaller models for. I have seen a lot of projects on top of GPT-3.5 / GPT-4, but I have yet to see any using these smaller models.

mirker 3 years ago |

Does anyone have experience using these open source models in production?

flatiron 3 years ago | |

Doubtful, since they were released yesterday. That being said, I will be deploying something to our lab this week to play with.

acapybara 3 years ago |

I've been following the RedPajama project closely and I must say, it's quite an impressive undertaking. The fact that it's all open-source, and the collaboration between various institutions, is nothing short of amazing. This shows the power of the open-source community in action, with a bunch of smart people coming together to build something truly remarkable.

The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.

As for the 7B model, it's great to see that it's already outperforming Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kind of improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.

One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.