Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI

Ggml.ai joins Hugging Face to ensure the long-term progress of Local AI (github.com)

839 points by lairv 132 days ago | 223 comments

mythz 132 days ago |

I consider HuggingFace more "Open AI" than OpenAI - one of the few quiet heroes (along with Chinese OSS) helping bring on-premise AI to the masses.

I'm old enough to remember when traffic was expensive, so I've no idea how they've managed to offer free hosting for so many models. Hopefully it's backed by a sustainable business model, as the ecosystem would be meaningfully worse without them.

We still need good value hardware to run Kimi/GLM in-house, but at least we've got the weights and distribution sorted.

data-ottawa 132 days ago | |

Can we toss in the work unsloth does too as an unsung hero?

They provide excellent documentation and they’re often very quick to get high quality quants up in major formats. They’re a very trustworthy brand.

disiplus 132 days ago | | |

Yeah, they're the good guys. I suspect the open source work is mostly advertisements for them to sell consulting and services to enterprises. Otherwise, the work they do doesn't make sense to offer for free.

swyx 132 days ago | | |

not that unsung! we've given them our biggest workshop spot every single year we've been able to and will do until they are tired of us https://www.youtube.com/@aiDotEngineer/search?query=unsloth

danielhanchen 132 days ago | | |

Oh thank you - appreciate it :)

cubie 132 days ago | | |

I'm a big fan of their work as well, good shout.

Tepix 132 days ago | |

It's insane how much traffic HF must be pushing out of the door. I routinely download models that are hundreds of gigabytes in size from them. A fantastic service to the sovereign AI community.

razster 132 days ago | | |

My fear is that these large "AI" companies will lobby to have these open source options removed or banned; it's a growing concern. I'm not sure how else to explain how much I enjoy using what HF provides: I religiously browse their site for new and exciting models to try.

Onavo 132 days ago | | |

Bandwidth is not that expensive. The Big 3 clouds just want to milk customers via egress. Look at Hetzner or CloudFlare R2 if you want to get an idea of commodity bandwidth costs.

vardalab 132 days ago | | |

Yup, I have downloaded probably a terabyte in the last week, especially with the Step 3.5 model being released and Minimax quants. I wonder what my ISP thinks. I hope they don't cut me off. They gave me a fast lane, they better let me use it, lol

zozbot234 132 days ago | |

> We still need good value hardware to run Kimi/GLM in-house

If you stream weights in from SSD storage and freely use swap to extend your KV cache it will be really slow (multiple seconds per token!) but run on basically anything. And that's still really good for stuff that can be computed overnight, perhaps even by batching many requests simultaneously. It gets progressively better as you add more compute, of course.
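
A rough back-of-the-envelope sketch of where "multiple seconds per token" comes from: if the active weights have to be read from SSD for every token, the drive's read bandwidth sets a floor on latency. The figures used here (32B active parameters, 4-bit weights, a 3 GB/s SSD, an 8-hour night) are illustrative assumptions, not measurements of any particular model or drive, and the batching estimate optimistically assumes one weight read can be shared by every request in the batch.

```python
def seconds_per_token(active_params: float, bits_per_weight: float,
                      ssd_bytes_per_s: float) -> float:
    """Lower bound on latency when active weights are re-read from SSD per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bytes_per_token / ssd_bytes_per_s


def tokens_overnight(sec_per_token: float, hours: float = 8.0, batch: int = 1) -> int:
    """Tokens produced in `hours`, assuming a batch shares each weight read."""
    steps = round(hours * 3600 / sec_per_token)
    return steps * batch


# Assumed figures: 32B active parameters, 4-bit weights, 3 GB/s SSD.
latency = seconds_per_token(32e9, 4, 3e9)
print(f"about {latency:.1f} s per token")
print(f"{tokens_overnight(latency)} tokens overnight for a single request")
print(f"{tokens_overnight(latency, batch=4)} tokens overnight with 4 batched requests")
```

Compute, KV cache swapping and page-cache hits are all ignored, so treat the result as an order-of-magnitude estimate only.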

Aurornis 132 days ago | | |

> it will be really slow (multiple seconds per token!)

This is fun for proving that it can be done, but that's 100X slower than hosted models and 1000X slower than GPT-Codex-Spark.

That's like going from real time conversation to e-mailing someone who only checks their inbox twice a day if you're lucky.

HPsquared 132 days ago | | |

At a certain point the energy starts to cost more than renting some GPUs.

sowbug 132 days ago | |

Why doesn't HF support BitTorrent? I know about hf-torrent and hf_transfer, but those aren't nearly as accessible as a link in the web UI.

embedding-shape 132 days ago | | |

> Why doesn't HF support BitTorrent?

Harder to track downloads then. Only when clients hit the tracker would they be able to get download stats, and forget about private repositories or the "gated" ones that Meta/Facebook does for their "open" models.

Still, if vanity metrics weren't so important, it'd be a great option. I've even thought of creating my own torrent mirror of HF to provide as a public service, as eventually access to models will be restricted, and it would be nice to be prepared for that moment a bit better.

Fin_Code 132 days ago | |

I still don't know why they are not running on torrent. It's the perfect use case.

heliumtera 132 days ago | | |

How can you be the man in the middle in a truly P2P environment?

freedomben 132 days ago | | |

That would shut out most people working for big corp, which is probably a huge percentage of the user base. It's dumb, but that's just the way corp IT is (no torrenting allowed).

simonw 132 days ago |

It's hard to overstate the impact Georgi Gerganov and llama.cpp have had on the local model space. He pretty much kicked off the revolution in March 2023, making LLaMA work on consumer laptops.

Here's that README from March 10th 2023 https://github.com/ggml-org/llama.cpp/blob/775328064e69db1eb...

> The main goal is to run the model using 4-bit quantization on a MacBook. [...] This was hacked in an evening - I have no idea if it works correctly.

Hugging Face have been a great open source steward of Transformers, I'm optimistic the same will be true for GGML.

I wrote a bit about this here: https://simonwillison.net/2026/Feb/20/ggmlai-joins-hugging-f...

ushakov 132 days ago | |

i am curious, why are your comments always pinned to the top?

carbocation 132 days ago | | |

Because many of us think simonw has discerning taste on this topic and like to read what he has to say about it, so we upvote his comments.

simonw 132 days ago | | |

At a guess that's because my comment attracted more up-votes than the other top-level comments in the thread.

I generally try to include something in a comment that's not information already under discussion - in this case that was the link and quote from the original README.

llm_nerd 132 days ago | | |

HN goes through phases. I remember when patio11 was the star of the hour on here. At another time it was that security guy (can't remember his name).

And for those who think it's just organic with all of the upvotes, HN absolutely does have a +/- comment bias for users, and it does automatically feature certain people and suppress others.

HanClinto 132 days ago |

I'm regularly amazed that HuggingFace is able to make money. It does so much good for the world.

How solid is its business model? Is it long-term viable? Will they ever "sell out"?

car 131 days ago |

So great to see my two favorite Open Source AI projects/companies joining forces.

Since I don't see it mentioned here, LlamaBarn is an awesome little—but mighty—MacOS menubar program, making access to llama.cpp's great web UI and downloading of tastefully curated models easy as pie. It automatically determines the available model- and context-sizes based on available RAM.

https://github.com/ggml-org/LlamaBarn

Downloaded models live in:

  ~/.llamabarn

Apart from running on localhost, the server address and port can be set via CLI:

  # bind to all interfaces (0.0.0.0)
  defaults write app.llamabarn.LlamaBarn exposeToNetwork -bool YES

  # or bind to a specific IP (e.g., for Tailscale)
  defaults write app.llamabarn.LlamaBarn exposeToNetwork -string "100.x.x.x"

  # disable (default)
  defaults delete app.llamabarn.LlamaBarn exposeToNetwork

noisy_boy 131 days ago | |

Github is showing me a unicorn - is there a Linux equivalent? I have an old Thinkpad with a puny Nvidia GPU, can I hope to find anything useful to run on that?

car 131 days ago | | |

Building Llama.cpp from source with CUDA enabled should get you pretty far. llama-server has a really good web UI, the latest version supports model switching.

As for models, plenty of GGUF-quantized ones (down to 2-bit) are available on HF and modelscope.
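
To get a feel for which quantization might fit a small GPU, here is a minimal sketch that estimates weight size as parameters times bits per weight. It is only an approximation: real GGUF files mix bit widths across tensors and carry metadata, and the 7B model, the 4 GB of VRAM and the 1 GB headroom for KV cache and buffers are hypothetical figures picked for illustration.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring metadata and mixed-precision tensors."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, memory_gb: float,
         headroom_gb: float = 1.0) -> bool:
    """True if the weights plus an assumed headroom fit in `memory_gb`."""
    return quantized_size_gb(n_params, bits_per_weight) + headroom_gb <= memory_gb


# Hypothetical setup: a 7B-parameter model on a GPU with 4 GB of VRAM.
for bits in (16, 8, 4, 2):
    size = quantized_size_gb(7e9, bits)
    verdict = "fits" if fits(7e9, bits, memory_gb=4.0) else "does not fit"
    print(f"{bits:>2}-bit: ~{size:.2f} GB, {verdict} in 4 GB")
```

Layers that do not fit on the GPU can still be left on the CPU, so "does not fit" here means slower rather than impossible.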

0xbadcafebee 132 days ago |

> The community will continue to operate fully autonomously and make technical and architectural decisions as usual. Hugging Face is providing the project with long-term sustainable resources, improving the chances of the project to grow and thrive. The project will continue to be 100% open-source and community driven as it is now.

I want this to be true, but business interests win out in the end. Llama.cpp is now the de-facto standard for local inference; more and more projects depend on it. If a company controls it, that means that company controls the local LLM ecosystem. And yeah, Hugging Face seems nice now... so did Google originally. If we all don't want to be locked in, we either need a llama.cpp competitor (with a universal abstraction), or it should be controlled by an independent nonprofit.

zozbot234 132 days ago | |

Llama.cpp is an open source project that anyone can fork as needed, so any "control" over it really only extends to facilitating development of certain features.

0xbadcafebee 132 days ago | | |

In practice, nobody does this, because you then have to keep the fork up to date with upstream plus your changes, and this is an endless amount of work.

mnewme 132 days ago |

Huggingface is the silent GOAT of the AI space, such a great community and platform

lairv 132 days ago | |

Truly amazing that they've managed to build an open and profitable platform without shady practices

al_borland 132 days ago | | |

It’s such a sad state of affairs when shady practices are so normal that finding a company without them is noteworthy.

jgrahamc 132 days ago |

This is great news. I've been sponsoring ggml/llama.cpp/Georgi since 2023 via Github. Glad to see this outcome. I hope you don't mind Georgi but I'm going to cancel my sponsorship now you and the code have found a home!

superkuh 132 days ago |

I'm glad the llama.cpp and the ggml backing are getting consistent reliable economic support. I'm glad that ggerganov is getting rewarded for making such excellent tools.

I am somewhat anxious about "integration with the Hugging Face transformers library" and possible python ecosystem entanglements that might cause. I know llama.cpp and ggml already have plenty of python tooling but it's not strictly required unless you're quantizing models yourself or other such things.

beoberha 132 days ago |

Seems like a great fit - kinda surprised it didn’t happen sooner. I think we are deep in the valley of local AI, but I’d be willing to bet it breaks out in the next 2-3 years. Here’s hoping!

breisa 132 days ago | |

I mean they already supported the project quite a bit. @ngxson and maybe others? from Huggingface are big contributors to llama.cpp.

tkp-415 132 days ago |

Can anyone point me in the direction of getting a model to run locally and efficiently inside something like a Docker container on a system with not so strong computing power (aka a Macbook M1 with 8gb of memory)?

Is my only option to invest in a system with more computing power? These local models look great, especially something like https://huggingface.co/AlicanKiraz0/Cybersecurity-BaronLLM_O... for assisting in penetration testing.

I've experimented with a variety of configurations on my local system, but in the end it turns into a makeshift heater.

dmezzetti 132 days ago |

This is really great news. I've been one of the strongest supporters of local AI dedicating thousands of hours towards building a framework to enable it. I'm looking forward to seeing what comes of it!

ukblewis 132 days ago |

Honestly I’m shocked to be the only one I see with this opinion: HuggingFace’s `accelerate`, `transformers` and `datasets` have been some of the worst open source Python libraries I have ever had to use. They break backwards compatibility constantly, even on APIs that are not underscore/dunder named and even in minor version releases, without documenting it. They refuse PRs fixing their missing `overload` type annotations, which breaks type checking on their libraries, and they just generally seem to have spaghetti code. I am not excited that another team is joining them and consolidating more engineering might in the hands of these people.

ukblewis 132 days ago | |

And I said all of that despite us continuing to use their platform and libraries extensively… We just don’t have a choice due to their dominance of open source ML

ukblewis 132 days ago | |

And clearly I say all of this in my name and not my employer's name

mhher 131 days ago |

It's great to see the ggml team getting proper backing. Keeping inference in bare-metal C/C++ without the Python bloat is the only way local AI is going to scale efficiently. Well deserved for Georgi, Johannes, Piotr, and the rest of the team.

ontouchstart 132 days ago |

I have played with both mlx-lm and llama.cpp after I bought a 24GB M5 MacBook Pro last year.

Then I fell down the rabbit holes of uv, rust and C++ and forgot about LLMs. Today, after I saw this announcement and answered someone’s question about how to set it up, I decided to play with llama.cpp again when I got home.

I was surprised and impressed:

https://ontouchstart.github.io/rabbit-holes/llama.cpp/

I am not going to use mlx-lm or lmstudio anymore. llama.cpp is so much fun.

the__alchemist 132 days ago |

Does anyone have a good comparison of HuggingFace/Candle to Burn? I am testing them concurrently, and Burn seems to have an easier-to-use API. (And can use Candle as a backend, which is confusing) When I ask on Reddit or Discord channels, people overwhelmingly recommend Burn, but provide no concrete reasons beyond "Candle is more for inference while Burn is training and inference". This doesn't track, as I've done training on Candle. So, if you've used both: Thoughts?

csunoser 132 days ago | |

I have used both (albeit 2 years ago, and things change really fast). At the time, Candle didn't have 2d conv backprop with strides properly implemented. And getting Burn running libtch backend was just a lot simpler.

I did use candle for wasm based inference for teaching purposes - that was reasonably painless and pretty nice.

mattfrommars 132 days ago |

I don’t know if this warrants a separate thread here but I have to ask…

How can I realistically get involved in the AI development space? I feel left out of what’s going on, living in a bubble where AI is something my employer forces us to make use of (GitHub Copilot). What is a realistic road map to slowly get into AI development, whatever that means?

My background is full stack development in Java and React, albeit development is slow.

I’ve only messed with AI on the application side, from creating a local chat bot for demo purposes to understand what RAG is about, to running models locally. But all of this is very superficial and I feel I’m not deep into what AI is about. I get that I’m too ‘late’ to be on the side of building the next frontier model and that it makes no sense, so what else can I do?

I know Python; is the next step maybe to do ‘LLM from scratch’? Or do I pick up the Google machine learning crash course certificate? Or do the recently released Nvidia certification?

I’m open for suggestions

fc417fc802 132 days ago | |

I'm not entirely clear what your goals are but roughly, just figure out an application that holds your interest and build a model for it from scratch. Probably don't start with an LLM though. Same as for anything else really. If you're interested in computer graphics then decide on a small scale project and go build it from scratch. Etc.

w10-1 132 days ago | |

The competition for root and branch AI models and infrastructure is intense and skilled.

But if you're adjacent to some leaf use-case for AI, you're likely already as good as anyone else at productizing it.

And that's who is getting hired: people who show they can deliver product-market fit.

breisa 132 days ago | |

Maybe look into model finetuning/distillation. Unsloth [1] has great guides and provides everything you need to get started on Google Colab for free. [1] https://unsloth.ai/

swyx 132 days ago | |

go thru workshops here https://www.youtube.com/@aiDotEngineer/

jimmydoe 132 days ago |

Amazing. I like the openness of both projects and am really excited for them.

Hopefully this does not mean consolidation because resources are drying up, but a true fusion of the best.

option 132 days ago |

Isn't HF banned in China? Also, how are many Chinese labs on Twitter all the time?

In either case - huge thanks to them for keeping AI open!

dragonwriter 132 days ago | |

> Isn't HF banned in China?

I think, for some definition of “banned”, that’s the case. It doesn’t stop the Chinese labs from having organization accounts on HF and distributing models there. ModelScope is apparently the HF-equivalent for reaching Chinese users.

disiplus 132 days ago | |

I think in the West we think everything is blocked. But for example, if you book an eSIM, when you visit you already get direct access to Western services because they route it to some other server. Hong Kong is totally different: they basically use WhatsApp and Google Maps, and everything worked when I was there.

embedding-shape 132 days ago | | |

But also yes, parent is right, HF is more or less inaccessible, and Modelscope is frequently cited as the mirror to use (although many Chinese labs seem to treat HF as the mirror, and Modelscope as the "real" origin).

woadwarrior01 132 days ago | |

HF is indeed banned in China. The Chinese equivalent of HF is ModelScope[1].

[1]: https://modelscope.cn/

androiddrew 132 days ago |

One of the few acquisitions I do support

am17an 131 days ago |

One often overlooked fact is that ggml, the tensor library that runs llama.cpp, is not based on pytorch but is just plain C++. In a world where pytorch dominates, it shows that alternatives are possible and worth pursuing.

kristianp 132 days ago |

> Towards seamless “single-click” integration with the transformers library

That's interesting. I thought they would be somewhat redundant. They do similar things after all, except training.

stephantul 132 days ago |

Georgi is such a legend. Glad to see this happening

segmondy 132 days ago |

Great news! I have always worried about ggml and long term prospect for them and wished for them to be rewarded for their effort.

fancy_pantser 132 days ago |

Was Georgi ever approached by Meta? I wonder what they offered (I'm glad they didn't succeed, just morbid curiosity).

sbinnee 132 days ago |

I am happy for ggml team. They did so much work for quantization and actually made it available to everyone. Thank you.

sheepscreek 132 days ago |

Curious about the financials behind this deal. Did they close above what they raised? What’s in it for HuggingFace?

dhruv3006 132 days ago |

Huggingface is actually something that's driving good in the world. Good to see this collab.

jpcompartir 131 days ago |

This is great, brings clear benefits to both sides and the rest of us.

Always rooting for Hugging Face

karmasimida 132 days ago |

Does local AI have a future? The models are getting ridiculously big, storage hardware is being hoarded by a few companies for the next 2 years, and nvidia has stopped making consumer GPUs for this year.

It seems to me there is no chance local ML gets anywhere beyond toy status compared to the closed source ones in the short term

rhdunn 132 days ago | |

Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.

karmasimida 132 days ago | | |

True, but anything remotely useful is 300B and above

dust42 132 days ago | |

I am now actually doing a good part of my dev work with Qwen3-Coder-Next on an M1 64GB with Qwen Code CLI (a fork of Gemini CLI). I very much like

  a) to have an idea how many tokens I use,
  b) to be independent of VC financed token machines, and
  c) that I can use it on a plane/train

Also I never have to wait in a queue, nor will I be told to wait for a few hours. And I get many answers in a second.

I don't do full vibe coding with a dozen agents though. I read all the code it produces and guide it where necessary.

Last but not least, at some point the VC funded party will be over, and when this happens one had better know how to be highly efficient in AI token use.

ttoinou 132 days ago | | |

How many tokens per second are you getting?

What's the advantage of qwen code cli over opencode?

moralestapia 132 days ago |

I hope Georgi gets a big fat check out of this, he deserves it 100%.

geooff_ 132 days ago |

As someone who's been in the "AI" space for a while, it's strange how Hugging Face went from one of the biggest names to not being a part of the discussion at all.

r_lee 132 days ago | |

I think that's because there's less local AI usage now since there's all kinds of image models by the big labs, so there's really no rush of people self hosting stable diffusion etc anymore

the space moved from Consumer to Enterprise pretty fast due to models getting bigger

zozbot234 132 days ago | | |

Today's free models are not really bigger when you account for the use of MoE (with ever increasing sparsity, meaning a smaller fraction of active parameters), and better ways of managing KV caching. You can do useful things with very little RAM/VRAM, it just gets slower and slower the more you try to squeeze it where it doesn't quite belong. But that's not a problem if you're willing to wait for every answer.
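
To make the sparsity point concrete, here is a small sketch comparing a sparse MoE model with a dense one by how many weights each token actually touches. Both models and their parameter counts are hypothetical, chosen only to illustrate the ratio; the calculation ignores attention/KV-cache traffic and routing overhead.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params: float   # all weights that must be stored somewhere
    active_params: float  # weights actually used to compute one token

    @property
    def active_fraction(self) -> float:
        return self.active_params / self.total_params

    def gb_touched_per_token(self, bits_per_weight: float) -> float:
        """Approximate GB of weights read per token at a given quantization."""
        return self.active_params * bits_per_weight / 8 / 1e9


# Hypothetical models, not real releases.
sparse_moe = Model("sparse-moe", total_params=1000e9, active_params=32e9)
dense = Model("dense", total_params=70e9, active_params=70e9)

for m in (sparse_moe, dense):
    print(f"{m.name}: {m.active_fraction:.1%} of weights active, "
          f"~{m.gb_touched_per_token(4):.0f} GB touched per token at 4-bit")
```

Under these assumptions the MoE model needs far more storage than the dense one but reads fewer weights per token, which is why it degrades into "slow" rather than "impossible" on small machines.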

segmondy 132 days ago | |

part of what discussion? anyone in the AI space knows and uses HF, but the public doesn't give a care and why should they? It's just an advanced site where nerds download AI stuff. HF is super valuable with their transformers library, their code, tutorials, smol-models, etc, but how does it translate to investor dollars?

LatencyKills 132 days ago | |

It isn't necessary to be part of the discussion if you are truly adding value (which HF continues to do). It's nice to see a company doing what it does best without constantly driving the hype train.

cyanydeez 132 days ago |

Is there a local webui that integrates with Hugging face?

Ollama and webui seem to be rapidly losing their charm. Ollama now includes cloud APIs, which makes no sense for a local tool.

lukebechtel 132 days ago |

Thank you Georgi <3

periodjet 132 days ago |

Prediction: Amazon will end up buying HuggingFace. Screenshot this.

forty 132 days ago |

Looks like someone tried to type "Gmail" while drunk...

rkomorn 132 days ago | |

Looks like Gargamel of Smurfs fame to me.

rvz 132 days ago |

This acquisition is almost the same as the acquisition of Bun by Anthropic.

Both $0 revenue "companies", but have created software that is essential to the wider ecosystem and has mindshare value; Bun for Javascript and Ggml for AI models.

But of course the VCs needed an exit sooner or later. That was inevitable.

andsoitis 132 days ago | |

I believe ggml.ai was funded by angel investors, not VC.