Stable Diffusion with Core ML on Apple Silicon

Stable Diffusion with Core ML on Apple Silicon (machinelearning.apple.com)

723 points by 2bit 3 years ago | 178 comments

sorenjan 3 years ago |

How come you always have to install some version of PyTorch or TensorFlow to run these ML models? When I'm only doing inference, shouldn't there be easier ways of doing that, with automatic hardware selection etc.? Why aren't models distributed in a standard format like ONNX, with inference on different platforms solved once per platform?

GeekyBear 3 years ago | |

>How come you always have to install some version of PyTorch or TensorFlow to run these ML models?

The repo is aimed at developers and has two parts. The first adapts the ML model to run on Apple Silicon (CPU, GPU, Neural Engine), and the second allows you to easily add Stable Diffusion functionality to your own app.

If you just want an end user app, those already exist, but now it will be easier to make ones that take advantage of Apple's dedicated ML hardware as well as the CPU and GPU.

>This repository comprises:

    python_coreml_stable_diffusion, a Python package for converting PyTorch models to Core ML format and performing image generation with Hugging Face diffusers in Python

    StableDiffusion, a Swift package that developers can add to their Xcode projects as a dependency to deploy image generation capabilities in their apps. The Swift package relies on the Core ML model files generated by python_coreml_stable_diffusion

https://github.com/apple/ml-stable-diffusion

m3at 3 years ago | |

That's done in professional contexts: when you only care about inference, onnxruntime does the job well (including for Core ML [1]).

I imagine that here Apple wants to highlight a more research/interactive use, for example to allow fine-tuning SD on a few samples from a particular domain (a popular customization).

[1] https://onnxruntime.ai/docs/execution-providers/CoreML-Execu...
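For readers who haven't used onnxruntime, here is a minimal sketch of what "inference only, with hardware selection" can look like. The helper `pick_providers` and the file name `model.onnx` are hypothetical names for illustration; `get_available_providers` and the `providers=` argument of `InferenceSession` are the parts that come from onnxruntime itself. It assumes an onnxruntime build that ships the Core ML execution provider; otherwise it falls back to CPU.

```python
def pick_providers(available,
                   preferred=("CoreMLExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred execution providers that are available, in order."""
    chosen = [p for p in preferred if p in available]
    # Keep a CPU fallback so a session can always be created.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


try:
    import onnxruntime as ort
except ImportError:
    ort = None  # onnxruntime is optional for this sketch

if ort is not None:
    providers = pick_providers(ort.get_available_providers())
    print("would create a session with:", providers)
    # "model.onnx" is a placeholder path, not a file from the thread:
    # session = ort.InferenceSession("model.onnx", providers=providers)
```

The point is that the application code only states a preference order; which hardware actually runs the model is decided by what the installed runtime supports.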

jeroenhd 3 years ago | |

Most models seem to be distributed by/for researchers and industry professionals. Stable Diffusion is state of the art technology, for example.

People who can't get the models to work by themselves given the source code aren't the target audience. There are other projects, though, that do distribute quick and easy scripts and tools to run these models.

Apple stepping in to get Stable Diffusion working on their platform is probably an attempt to get people to take their ML hardware more seriously. I read this more like "look, ma, no CUDA!" than "Mac users can easily use SD now". This module seems to be designed so that the upstream SD code can easily be ported back to macOS without special tricks.

LoganDark 3 years ago | |

Seconded, I wish for a way to work with ML models using native code rather than through some Python scripting interface. I believe TensorFlow is there with C++, but it works only with C++ and not through FFI.

c7DJTLrn 3 years ago | | |

It would increase my interest in experimenting with these models 1000% at the least. I really can't be bothered to spend hours fucking around with pip/pipenv/poetry/virtualenv/anaconda/god knows what other flavour of the month package manager is in use. I just want to clone it and run it, like a Go project. I don't want to download some files from a random website and move them into a special directory in the repo only created after running a script with special flags or some bullshit. I want to clone and run.

ggerganov 3 years ago | | |

It's one of the reasons I recently ported the Whisper model to plain C/C++. You just clone the repo, run `make [model]` and you are ready to go. No Python, no frameworks, no packages - plain and simple.

https://github.com/ggerganov/whisper.cpp

microtonal 3 years ago | | |

PyTorch has libtorch as its purely native library. There are also Rust bindings for libtorch:

https://github.com/LaurentMazare/tch-rs

I used this in the past to make a transformer-based syntax annotator. Fully in Rust, no Python required:

https://github.com/tensordot/syntaxdot

0x008 3 years ago | | |

If you are okay with using the Nvidia ecosystem, check out TensorRT.

zitterbewegung 3 years ago | |

Apple has their own mlmodel format, but they can’t distribute this model as a direct download due to the model's EULA. The first task is to translate the model.

EMIRELADERO 3 years ago | | |

What part of the SD license prohibits that?

0x008 3 years ago | |

In a professional context (apart from individual apps distributed by small creators / indie hackers), models are usually run in native code (typically C++) using standardized runtimes such as TensorRT (for Nvidia devices), onnxruntime (hardware-agnostic), etc.

pmarreck 3 years ago | |

DiffusionBee is an app that is completely self-contained and lets you play with this stuff completely trivially, no installs required.

https://diffusionbee.com/

janandonly 3 years ago | | |

But it's not optimised to work with Apple's Core ML (yet), is it?

kuwoze 3 years ago | |

If you want it and it doesn't exist, why not simply do it yourself? It's open source, no?

tosh 3 years ago |

Atila from Apple on the expected performance:

> For distilled StableDiffusion 2 which requires 1 to 4 iterations instead of 50, the same M2 device should generate an image in <<1 second

https://twitter.com/atiorh/status/1598399408160342039
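The step-count argument in the quote can be sketched as simple arithmetic. This assumes generation time is dominated by the denoising loop and scales linearly with the number of steps, ignoring fixed costs such as text encoding and image decoding. The rate of 3 iterations per second is only an illustrative figure (one commenter further down reports it for an M1 MacBook Air), not a number from the tweet.

```python
def seconds_per_image(steps, iterations_per_second):
    """Estimate denoising time, assuming cost scales linearly with steps."""
    if iterations_per_second <= 0:
        raise ValueError("iterations_per_second must be positive")
    return steps / iterations_per_second


rate = 3.0  # illustrative it/s, an assumption for this sketch
for steps in (50, 4, 1):
    print(f"{steps:>2} steps at {rate} it/s: {seconds_per_image(steps, rate):.2f} s")
```

Under that assumption, cutting 50 steps to 1 to 4 steps shrinks the loop time by the same factor, which is why a distilled model changes the picture so much even on unchanged hardware.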

syspec 3 years ago |

There's also https://draw.nnc.ai/ - which is an iOS / iPad app running Stable Diffusion.

The author has a detailed blogpost outlining how he modified the model to use Metal on iOS devices. https://liuliu.me/eyes/stretch-iphone-to-its-limit-a-2gib-mo...

antal 3 years ago | |

Yeah, that's what immediately came to mind for me as well. I don't know how similar/different the two solutions are, but it made me smile a bit that what Apple is showing off here has been already done by a single independent developer :)

cloogshicer 3 years ago |

I think it's sad that Apple doesn't even give attribution to any of the authors. If you copy the Bibtex from this site, the Author field is just empty. Their names are also not mentioned anywhere on this site.

This site is purely a marketing effort.

ubercow13 3 years ago | |

This is about an update to macOS and iOS. Are the 'authors' of macOS updates normally credited? Authors are credited on other papers published on this site that aren't just about OS updates.

MichaelZuo 3 years ago | |

Is it standard for Apple to attribute authors in the Bibtex? Or do they usually leave it empty?

rvz 3 years ago | |

> I think it's sad that Apple doesn't even give attribution to any of the authors.

Pretty much like Stable Diffusion and the grifters using it in general and they will never credit the artists and images that they stole to generate these images.

astrange 3 years ago | | |

This is sort of like if you learned English from reading a book and the author said they owned all your English sentences after that.

Of course you can see the original images (https://rom1504.github.io/clip-retrieval/), it was legal to collect them (they used robots.txt for consent just like Google Image Search) and it was legal to do this with them (but not using US legal principles since it's made in Germany).

"Crediting the artist" isn't a legal principle - it's more like some kind of social media standard which is enforced by random amateur artists yelling at you if you don't do it. It's both impossible (there are no original artists for a given output) and wouldn't do anything to help the main social issue (future artists having their jobs taken by AIs).

ClumsyPilot 3 years ago | | |

So your point is that Apple and those grifters are equally reputable?

Two wrongs don't make a right.

neonate 3 years ago |

https://github.com/apple/ml-stable-diffusion

christiangenco 3 years ago | |

Oh gosh that's an intimidating installation process. I'll be much more interested when I can just `brew install` a binary.

artimaeis 3 years ago | | |

A bit different take is DiffusionBee, if you're curious to try it out in a GUI form.

https://diffusionbee.com

artdigital 3 years ago | | |

Let's give it a few days and someone will have something semi-automatic ready

gedy 3 years ago | | |

> Oh gosh that's an intimidating installation process

I'm not seeing any installation instructions on either link - what am I missing?

thepasswordis 3 years ago | | |

Where are you seeing the installation process?

MuffinFlavored 3 years ago | | |

I could be wrong, but I think part of the issue is that this needs some large files for the trained model weights?

wilsongoode 3 years ago |

I’ve been using InvokeAI: https://github.com/invoke-ai/InvokeAI

Great support for M1, basically since the beginning. The install is painless.

Release video for InvokeAI 2.2: https://www.youtube.com/watch?v=hIYBfDtKaus

mark_l_watson 3 years ago |

Great stuff. I like that they give directions for both Swift and Python.

This gets you text descriptions to images.

I have seen models that, given a picture, generate similar pictures. I want this because while I have many pictures of my grandmothers, I only have a couple of pictures of my grandfathers and it would be nice to generate a few more.

Core ML is so well done. A year ago I wrote a book on Swift AI and used Core ML in several examples.

astrange 3 years ago | |

That’s DreamBooth. There are some services that will do it for you.

mark_l_watson 3 years ago | | |

Thanks!

zimpenfish 3 years ago |

Man, this takes a ton of room to do the CoreML conversions - ran out of space doing the unet conversion even though I started with 25GB free. Going on a delete spree to get it up to 50GB free before trying again.

password4321 3 years ago | |

All hail Grand Perspective back in the day, not sure who is carrying the "what's wasting my disk space" torch for free these days.

Edit: still alive! https://grandperspectiv.sourceforge.net/

zimpenfish 3 years ago | | |

I suspect it was virtual memory - the CoreML conversion progress was at 32Gi at one point and there's only 16GB in this laptop. That would explain why it was consuming 30Gi+ of disk space when the output CoreML models only totalled 2.5Gi.
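As a rough check on the virtual memory theory above, the numbers can be put into a few lines. This is a back-of-the-envelope sketch: it treats GB and Gi as the same unit and assumes everything above physical RAM spills to disk, which ignores memory compression and whatever else was running.

```python
def swap_estimate_gib(peak_gib, ram_gib):
    """Memory that cannot fit in RAM and must spill to disk, in GiB."""
    return max(0.0, float(peak_gib) - float(ram_gib))


def min_disk_use_gib(peak_gib, ram_gib, output_gib):
    """Lower bound on disk use: spilled memory plus the final model files."""
    return swap_estimate_gib(peak_gib, ram_gib) + output_gib


spill = swap_estimate_gib(32, 16)        # 32Gi peak on a 16GB laptop
total = min_disk_use_gib(32, 16, 2.5)    # plus 2.5Gi of output models
print(f"spill: {spill} GiB, lower bound on disk use: {total} GiB")
```

That covers a good part of the 30Gi+ observed. The rest would have to come from something else, such as intermediate files written during conversion, which is a guess.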

jtbayly 3 years ago | | |

Just used this again on 3 different computers, including mine. Works fantastically still.

Found a >100GB accidental “livestream” recording on one computer. Would have taken forever to find what was taking up all the room otherwise.

peddling-brink 3 years ago | | |

ncdu is the best in my book. TUI, supports deletion of files and folders, and very simple to understand.

GUI apps for this task like GP and the like are more visually complex than they need to be.

pyinstallwoes 3 years ago | |

How much space do you have and how much do you try to keep free? I get freaked out if I have less than 400gb free.

zimpenfish 3 years ago | | |

    /dev/disk3s5  926Gi  857Gi   52Gi    95% 8067489 540828800    1%   /System/Volumes/Data

It normally hovers around 30-35Gi free.

darkteflon 3 years ago |

For the uninitiated, which macOS GUI app is this library most likely to show up in first/best? DiffusionBee?

pksebben 3 years ago | |

automatic1111's webui typically gets the most frequent updates. Middling easy to install.

darkteflon 3 years ago | | |

Great, thank you. Looks like there’s already a GH issue: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issu...

tamersalama 3 years ago |

I can't fine-tune the model on Apple Silicon due to PyTorch support issues. I don't have high hopes that it will be supported.

https://github.com/pytorch/pytorch/issues/77794

https://github.com/pytorch/pytorch/issues/77764

pkage 3 years ago |

How does this compare with using the Hugging Face `diffusers` package with MPS acceleration through PyTorch Nightly? I was under the impression that that used CoreML under the hood as well to convert the models so they ran on the Neural Engine.

liuliu 3 years ago | |

It doesn't. MPS runs largely on the GPU. PyTorch's MPS implementation was also still incomplete as of a few weeks ago. This is about 3x faster.
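To make the distinction concrete: with PyTorch, MPS is just a device you move the model to, and nothing gets converted to Core ML. A minimal sketch of that device choice follows. `pick_torch_device` is a hypothetical helper name; `torch.backends.mps.is_available()` is the actual PyTorch check, and the sketch falls back to CPU when PyTorch is missing or too old to have an MPS backend.

```python
def pick_torch_device():
    """Return "mps" when PyTorch's Metal backend is usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed; nothing to accelerate
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_torch_device()
print("selected device:", device)
# With diffusers installed, the pipeline would then be moved to that device,
# e.g. StableDiffusionPipeline.from_pretrained(model_id).to(device),
# where model_id is a placeholder for whichever checkpoint you use.
```

The Core ML route in Apple's repo is a separate path: the model is converted ahead of time, and Core ML decides at run time how to use the CPU, GPU and Neural Engine.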

wincy 3 years ago | | |

Is it? I just ran it on my M1 MacBook Air and am getting 3 it/sec, same as I was using Stable Diffusion for M1. Maybe I'm doing something wrong?

joss82 3 years ago |

Would it be possible to run 2 SD instances in parallel on a single M1/M2 chip?

One on the GPU and another on the ML core?

noduerme 3 years ago |

Can anyone explain in relatively lay terms how Apple's neural cores differ from a GPU? If they can run stable diffusion so much faster, which normally runs on a GPU, why aren't they used to run shaders for AAA games?

Synaesthesia 3 years ago | |

They're designed to run ML-specific functions like matrix multiply and so on. Nvidia has a similar idea in "tensor cores". I think it's because they do low-bit operations, like 8 or 16 bit, which are faster but too low-precision for GPU work.
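A toy illustration of the low-precision idea, in plain Python. This is not how the Neural Engine is implemented; it only shows the general trick of storing values as 8-bit integers with one shared scale, doing the multiply-accumulate in integers, and rescaling once at the end. The function names are made up for this sketch.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dot_int8(a, b):
    """Approximate a dot product with integer multiply-accumulate."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    accumulator = sum(x * y for x, y in zip(qa, qb))  # integers only
    return accumulator * scale_a * scale_b            # rescale once


a = [0.5, -1.0, 0.25]
b = [1.0, 2.0, -4.0]
exact = sum(x * y for x, y in zip(a, b))
approx = dot_int8(a, b)
print(f"exact: {exact}, int8 approximation: {approx:.4f}")
```

The result is close but not exact. Neural networks tolerate that small error well, whereas graphics work often needs more precision, which is roughly the trade-off described above.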

behnamoh 3 years ago |

This may sound naive, but what are some use cases for running SD models locally? If free/cheap options exist (like running SD on powerful servers), then what's the advantage of this new method?

sofaygo 3 years ago | |

> There are a number of reasons why on-device deployment of Stable Diffusion in an app is preferable to a server-based approach. First, the privacy of the end user is protected because any data the user provided as input to the model stays on the user's device. Second, after initial download, users don’t require an internet connection to use the model. Finally, locally deploying this model enables developers to reduce or eliminate their server-related costs.

huggingmouth 3 years ago | | |

Stability! The main reason why I use it locally is because I don't want some random dev unilaterally deciding to change or "sunsetting" features I rely on.

Centralized services small and large are guilty of this and I'm sick of it.

yazaddaruvala 3 years ago | |

"Hey Siri, draw me a purple duck" and it all happens without an internet connection!

If you mean monetary usecases: Roughly something like Photoshop/Blender/UnrealEngine with ML plugins that are low latency, private, and $0 server hosting costs.

dustedcodes 3 years ago |

What are some good resources to get into working with this and learning the basics around ML to get some fundamental understanding of how this works?

videlov 3 years ago | |

I found the blog posts by Jay Alammar to be particularly good. Here are my starting suggestions (in this order) — https://jalammar.github.io/illustrated-word2vec/ https://jalammar.github.io/illustrated-transformer/ https://jalammar.github.io/illustrated-bert/ https://jalammar.github.io/illustrated-stable-diffusion/

siraben 3 years ago |

While running locally on an M1 Pro is nice, I've recently switched over to a Runpod[0] instance running Stable Diffusion instead. The main reasons are that high workloads on the laptop degrade the battery faster and that it takes ~40s to render a single image. On an A5000 it takes mere seconds to do 40 steps. The cost is around $0.2/hr.

[0] https://runpod.io

Joe_Boogz 3 years ago | |

Can't the battery problem be mitigated if you plug in your MacBook while running Stable Diffusion?

siraben 3 years ago | | |

The laptop body still heats up, and over long periods of time this can degrade the battery. I’ve measured a sharp drop in capacity from the device itself.

personjerry 3 years ago |

Can't wait to see this integrated into automatic1111 so I can use it as a normie

calrizien 3 years ago |

Where is the community for this project?

tomr75 3 years ago |

Anyone know how to link this to a GUI?

wellthisisgreat 3 years ago |

Macbook Air M1 / 16GB RAM took 3.56 to generate an image, this is pretty wild

zimpenfish 3 years ago | |

> 3.56 to generate an image

3.56 seconds?

wellthisisgreat 3 years ago | | |

ah 3.56 minutes, my mistake

Viluskaran 3 years ago |

8 gb ram

Synaesthesia 3 years ago | |

What about it?