Welcome to Waifu Labs v2: How Do AIs Create? (waifulabs.com)

455 points by Cixelyn 4 years ago | 226 comments

liuru 4 years ago |

Hey HN, one of the team members here!

I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted [1].

As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity so fascinating. On one hand, I know that mathematically an AI can only hallucinate images that fit within the distribution of things that it's seen. But from the artist's perspective, the model's ability to blend two existing styles into something so distinctly new is so incredible (not to mention commercially useful!)

Anyways, happy to answer any questions, thoughts, or concerns!

---

[1] https://news.ycombinator.com/item?id=20511459

wodenokoto 4 years ago | |

Naïvely I thought Waifu generator was just “some guy having a laugh” fine-tuning a model off of hugging face, but reading through the comments here, it is obviously a much, much bigger enterprise.

Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.

Cixelyn 4 years ago | | |

Right now it's a small team of 6 people, and we have a bit of funding + compute credits to train models. There's a bit of revenue from some past projects and AI-consulting, but we're mostly betting big on our new AI-powered mobile title Arrowmancer[1].

> I think the effort required for such an undertaking is vastly underestimated by readers.

Haha for sure. Hosting a real-time ML model for people to do sub 1-second inferences at HN-load scale is definitely nontrivial.

[1] https://arrowmancer.com

hansel_der 4 years ago | | |

> Naïvely I thought Waifu generator was just “some guy having a laugh”

same here. what's naive about it?

not to badmouth the undertaking, but wtf is this doing on HN?

2bitencryption 4 years ago | |

Firstly, amazing work.

My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?

My (super limited) understanding of GANs is that they slowly discover these features over time simply from observation in the data set, and not from any labels.

So how could you make e.g. a slider for head position, style, pose, etc? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?

You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"

And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?

liuru 4 years ago | | |

Yes, your understanding is correct!

You can think of it like coordinates on a many-dimensional vector grid.

We craft the functions that will illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.

And yes, when the model is retrained, we have to discover them again!
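One common way to find such a control vector, sketched here as an illustration and not as the Waifu Labs method: collect latent vectors whose outputs show a feature, collect ones that do not, and take the normalized difference of the two group means. Everything below (the 8-dimensional latent space, the "pose" living on axis 2, the helper names) is a made-up toy setup.

```python
import random

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def attribute_direction(with_attr, without_attr):
    """Unit vector pointing from the 'without' group mean to the 'with' group mean."""
    mu_pos = mean_vector(with_attr)
    mu_neg = mean_vector(without_attr)
    diff = [p - n for p, n in zip(mu_pos, mu_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def move_along(z, direction, amount):
    """Shift a latent vector along a direction; everything orthogonal to it is untouched."""
    return [zi + amount * di for zi, di in zip(z, direction)]

# Toy data (assumption): the pose is secretly encoded on latent axis 2.
rng = random.Random(0)
DIM = 8

def sample_latent(pose_offset):
    z = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    z[2] += pose_offset
    return z

turned_left = [sample_latent(+3.0) for _ in range(200)]
turned_right = [sample_latent(-3.0) for _ in range(200)]

pose_dir = attribute_direction(turned_left, turned_right)
character = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
same_character_new_pose = move_along(character, pose_dir, 2.0)
```

Because the direction is estimated from one particular model's latent space, it has to be re-estimated after retraining, which matches the answer above.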

flor1s 4 years ago | | |

Not affiliated with this project, but there are a gazillion different variations of GANs. Most just change the adversarial loss to improve the learning rate / quality, but others focus on architectural changes, such as StarGAN, Pix2pix (conditional GAN), CycleGAN, MUNIT, etc. It's really a fascinating field.

thyrox 4 years ago | |

Roughly speaking how much money did you invest into making this? Just curious if this is something an indie hacker can hope to do one day OR do you need some deep pockets to make a site like this?

ridaj 4 years ago | |

Fascinating... Thanks for sharing

A couple questions:

1) I didn't really understand how you went about identifying what vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling them one by one to see what each did to the outcome?

2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?

liuru 4 years ago | | |

1) It was mostly manual, though AIs were useful in certain filtering tasks.

2) Depends on the quality you are seeking. If you only want one run of a similar, off-the-shelf model, around the 1000s is enough. But at the number of iterations you have to run to build your own and improve results, you probably need about 100k.

To tackle this problem, we built our own supercomputer out of parts we bought on eBay, though I can't say I recommend that route, because it now lives in our living room.

dimgl 4 years ago | |

You mention it took two weeks to get to the point that we see in the article.

Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or maybe did you train the model for two weeks? That point just wasn't exactly clear for me.

liuru 4 years ago | | |

2 weeks to train the model!

Development took on-and-off roughly 2 years to achieve the quality you see today.

kouteiheika 4 years ago | |

What are the terms of use for the images generated through your website? I'm guessing any commercial use is forbidden? It would be nice if you could formally spell it out on the website.

JetAlone 4 years ago | | |

I don't think there's any powerful enough way to stop people from generating one and then tracing over it to create their own linework, and customize things like the colouring and shading. The more broadly AI is able to create, the more niche and obfuscated directions human co-creators could take its products in.

kregasaurusrex 4 years ago | |

I purchased a waifu from your vending machine (loved the blog post!) at Gen Con in 2019, but can't see the saved model in my account. Is there a way for me to get a v2 generation?

liuru 4 years ago | | |

Welcome back!

We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!

As for a V2 generation, sorry, because the models are different, you'll have to discover a similar image again, if you want a V2 version!

rackjack 4 years ago | |

I LOVE that "horror". Reminds me of some of the art I've seen on album/single covers. Any chance of letting people access that kind of intermediate step? (Though I know it's a niche as hell use case).

liuru 4 years ago | | |

Ah yes, the fine line between charming anime character and lovecraftian horror

There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!

Cthulhu_ 4 years ago | |

With the game you're building, are the character portraits generated once and that's it, or do you plan on making them dynamic or frequently updated?

I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.

Terry_Roll 4 years ago | |

I wonder what an AI trained to spot deepfake waifus will detect.

In humans, things like the pupil can be the giveaway.

https://www.newscientist.com/article/2289815-ai-can-detect-a...

yccs27 4 years ago | | |

This is a super interesting question, given that the generator model is trained to fool the discriminator, which is also an AI.

oneoff786 4 years ago | | |

Highlight the pixels with high sharp values. Should be doable.

hypertele-Xii 4 years ago | |

Why does stuff like this never come down from the web? I'd pay for a program I could download and use with my own image files.

Gigachad 4 years ago | | |

They tend to require specific hardware like an NVIDIA GPU, as well as an ever-evolving large model file which they will want to update frequently. Some tools certainly have had offline versions, but I guess not many people are interested in setting it all up and are happy with an instant web UI.

liuru 4 years ago | | |

While our model is not public, there are good resources online for playing with your own images!

Like this one by fast.ai!

https://docs.fast.ai/vision.gan.html

Afforess 4 years ago | | |

Same reason the Coca-Cola recipe is not published nor made freely available by the Coca-Cola corporation.

zozbot234 4 years ago | | |

You just need to code up your own model architecture and then train it on your data using some established ML framework. The first step is where well-chosen priors can make a real difference wrt. your end results.

simonebrunozzi 4 years ago | |

So neat! Where are you based? Boston, I assume?

Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)

Cixelyn 4 years ago | | |

San Francisco! Just sent over a ping!

GoblinSlayer 4 years ago | |

Would you try to create a new style? Train the discriminator on the score tag of danbooru dataset, then use it to rate the generator's style, this way it should be able to create a new style.

searchableguy 4 years ago | |

Do you plan to provide an API to generate waifu?

I think I could use this for a project.

liuru 4 years ago | | |

In the future, perhaps! This is a popular request, so we are thinking about ways we can do this.

YeGoblynQueenne 4 years ago | |

Hello and thank you for answering questions. The following is a quote from your article:

>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.

Can you explain what you mean by "mental" representation? Does your system have a mind?

Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?

xg15 4 years ago | | |

Not OP, but I wonder if the process would be in some way comparable to rigging a 3D model. There as well, you usually have some high-level input parameters, which influence joints on a predefined skeleton, which in turn determine the position of individual vertices in the 3D body. Finally, the 3D shape is used to render the actual pixels.

On each step, high-level parameters are combined with predefined weights to produce a more low-level output.

It seems a similar transformation is going on here, except that the weights and the structure are somehow learned on their own.

CuriousCosmic 4 years ago | |

Something I was wondering but couldn't find on the site: What is the license for the generated works through the project?

tedmcory77 4 years ago | |

Who would someone speak with about licensing things made using waifu? My email contact is in my profile...

darkengine 4 years ago | |

Is the code or any of the models available to the public? I'd love to mess with this on a local GPU cluster.

liuru 4 years ago | | |

Not at the moment! A similar project that I really admire is public, though!

https://www.thiswaifudoesnotexist.net/

unobatbayar 4 years ago | |

The quality and style are mindblowing! What data did you train on?

liuru 4 years ago | | |

The first iteration of our model was built off of this amazing public dataset:

https://www.gwern.net/Danbooru2020

Though now we have made our own :)

lynzrand 4 years ago | | |

As a rough guess, I think it might be trained on the Danbooru archive dataset, since it's the largest anime picture dataset we can get today.

https://www.gwern.net/Danbooru2020

Bombthecat 4 years ago | | |

The more interesting question: What about the sources the AI trains on? How are those artists paid? Do we need to pay them? Or if it is used by an AI as training data, do we just say: it's like a human learning?

ve55 4 years ago |

Links to related projects in anime art generation for those interested:

Waifu Labs v2, referenced in this post (generate amazing custom anime face images): https://waifulabs.com (write-up is the above link: https://waifulabs.com/blog/ai-creativity)

This Anime Does Not Exist (AI-generated anime-style artwork): https://thisanimedoesnotexist.ai (write-up https://www.gwern.net/Faces#extended-stylegan2-danbooru2019-... and https://nearcyan.com/this-anime-does-not-exist)

This Waifu Does Not Exist (AI-generated anime-style faces): https://thiswaifudoesnotexist.net (write-up: https://www.gwern.net/Faces#twdne)

There's also a lot of literature on e.g. automatic manga coloration, auto-translation, image superresolution, anime frame interpolation, and much more. Worth checking out some places like https://old.reddit.com/r/AnimeResearch/ if you're interested!

numpad0 4 years ago | |

This one too https://twitter.com/t_takasaka/status/1479784432697749513

the8472 4 years ago | |

Note that TADNE, being based on newer models, generates full images rather than only faces.

Gigachad 4 years ago |

This is extremely impressive. It’s the first GAN I have seen which lets you tweak the result in a meaningful way rather than being just random.

I think the speed at which GANs have come into the world has really shaken people up, and it’s hard to process what this all means and what it will result in. Especially the ones which generate based on real people.

But the feeling this gives me is: what happens to the future of art? Sure, this example is nowhere even close to replacing real artists, but it’s already generating images better than I can draw after a year of practice. It does give me a feeling of “what is the point”. Which might be an irrational feeling, but I’m sure others feel the same.

bmitc 4 years ago |

This isn't really creativity is it? I like to call stuff like this statistical copying, and indeed, the linked Wikipedia article on GANs says:

> Given a training set, this technique learns to generate new data with the same statistics as the training set.

There isn't a creative process here nor any creative introspection going on. While the technical results are impressive, this article does not address creativity even superficially, and just slaps the label on. There isn't any AI either. It's machine learning, i.e., statistical models and algorithms.

deft 4 years ago |

I made a waifu, then I pressed save. It wanted me to signup so I pressed back so I could just rightclick saveas but... I lost my waifu forever now :(

jimmygrapes 4 years ago | |

This is like the condensed story of humanity

keewee7 4 years ago | |

This is why we need local self-hosted AI. Keep the waifus safe.

ASalazarMX 4 years ago | |

I mourn for your lost perfect waifu, but if it helps, your comment likely saved other waifus because we clicked [Download] instead of [Save].

GoblinSlayer 4 years ago | |

Waifu is kept in memory, normies.

twic 4 years ago | |

It's okay, Step 43636 will come to console you in your dreams.

tsukikage 4 years ago | |

"Keep precious things inside you, or you will lose them"

jcun4128 4 years ago | |

Could have been an NFT

satronaut 4 years ago | |

comment of the year, and it's not even friday

tasha0663 4 years ago |

> Step 43636: During this phase, the training gets unstable at times, so we have snapshots of occasional horrors like this.

Ah, make that three things the public shouldn't see being made: sausage, legislation, and waifus.

jakey_bakey 4 years ago |

I wish I had the stones to call my company Waifu Labs

SavantIdiot 4 years ago | |

I didn't know what you meant so I googled it. The technology is cool, but the content is ... problematic.

From urban dictionary:

   "Waifu" is used to refer to a fictional girl or woman (usually in Anime, Manga, or video-games) that you have sexual attraction to, and you would even marry.

Huh.

harpersealtako 4 years ago | | |

A big part of it you're missing is that it's a joke. Anime fans definitely know they're weird, and are very passionate about the things they like (and are conscious of that), and thus a lot of humor in the community is self-deprecating and ironic (for example, calling your favorite fictional character your "waifu" or "husbando"). The fact that outside observers might think it's "problematic" is kind of the whole point.

kadokaelan 4 years ago | | |

you'd be surprised at the waifu market size

mensetmanusman 4 years ago |

I can’t wait for this technology to come to video.

Imagine a future where people can compile written scripts into Hollywood quality movies.

thewarrior 4 years ago | |

It's already begun https://twitter.com/somnai_dreams/status/1477411531037937664...

FeepingCreature 4 years ago | | |

I wonder what will happen when somebody combines a GAN with a feature recognizing network like the Tesla cars use, so it can use its own extrapolated map of the surroundings to stabilize its output as the camera moves around.

JetAlone 4 years ago | |

I think it will have its limits, but the possibilities for editing together, supplementing and modifying products from smaller AI modules should stretch out what you can do on a small budget.

kadokaelan 4 years ago |

The explainer video about gans is top-notch! Excited for Arrowmancer!

echelon 4 years ago |

This is mind-blowingly good. You keep pushing the state of the art further to the point of broad applicability. It won't be long until everyone can be an artist without putting in the ten thousand hours of drudgery of training their muscles, hand-eye coordination, structure of shape and perspective, etc. I can't wait!

Do you have a team page? How many of you are there? Do you work with gwern and nearcyan? Are you going to raise for this? (You should totally scale this!)

Great work, and keep it up!

Cixelyn 4 years ago | |

No team page yet! Friends with near, but haven't had a chance to meet gwern yet (maybe one day??)

Arrowmancer is really our first attempt at scaling it up; hoping to do even cooler generative AI-related things in that production.

echelon 4 years ago | | |

All the best! Super excited!

Kuinox 4 years ago |

It's funny how people complained about GitHub Copilot 'stealing' people's code, but nobody here complained about this AI 'stealing' artworks.

Don't get me wrong, I have nothing against this, but I think we should start discussing the morality of AI-generated content, even if it doesn't train on existing artworks/code.

syntheweave 4 years ago |

The image quality is good, but now I realize I'm experiencing "uncanny waifu". Authentic character designs bear two things in common:

1. Simplifications of reality (the actual artist training method would be traditional studies off life and photo reference followed by gradual reduction and symbolization to a style)

2. Symbolic meaning. Things like the style of eyes, clothing, etc are all meant to signal personality. This is stuff that current AI techniques don't really touch upon in any direct sense.

Since the ML method is built on interpolating off final results, it's going to lack in these qualities and produce something that is consistently an "average impression". Akin to asking the algorithm to generate mythical heroes by mashing up the various stories: you get a hero that is somehow the average of Icarus, Heracles and Achilles, which would be less of a character than the originals.

someone7x 4 years ago | |

Could it work backwards? E.g., to take a hero like Heracles and determine he's, say, 40% shared with Gilgamesh. Then we might see that even the originals aren't very unique.

Just a thought, I don't really know anything about ML.

w_t_payne 4 years ago |

I found the most interesting part was the evocative comment about the 'vast and parched' nature of the latent space.

I wonder if the OP's intuition regarding the sparseness of the latent space, and the relatively small area occupied by the 'useful' manifold(?) embedded within it, provides us any clues as to what symbol grounding might look like for some neuro-symbolic infrastructure that sits atop that latent space.

I.e. how should we be trying to represent concepts like 'male' and 'female' within that space?

Is it important to have these concepts represented as a low dimensional manifold?

Is it important that this manifold be easily described by some simple geometric form like a convex polytope?

Is it important that nuances and variations on the concept be separable within the bounds of the concept-specific manifold?

What other properties might be important?

aimor 4 years ago |

What methods are there to estimate how many unique characters a model can generate? The answer is not infinitely many, but determining when two images are of different 'characters' is fuzzy.

gwern 4 years ago | |

It's hard to say, but I think a useful measure would be to look at mode-dropping compared to the training data. Whatever the 'number of unique characters' is, it clearly ought to be at least as large as the number of characters you see in the original training data, right?

For TADNE, Arfafax ran Danbooru2019 and a few million TADNE samples through CLIP to get the image embeddings, and clustered them; when the two sets of clusters were graphed using t-SNE, you could see that the TADNE StyleGAN2-ext did a lot of mode-dropping in that many smaller outlying clusters of characters/franchises/topics simply did not appear in TADNE samples. The TADNE looked like a big galaxy, while Danbooru2019 looked more like it was surrounded by archipelagos. TADNE was extensively trained on them and was a very large model, but the GAN dynamics & StyleGAN architecture mean it didn't do a good job absorbing rarer/more idiosyncratic Danbooru2019 image-clusters.

I expect newer generative models which avoid GAN losses and which use more flexible (but expensive!) architectures, like DALL-E, would perform much better in terms of mode-dropping, so you'd see a lot more unique characters/images out of them. (I'm very excited about them. As good as TADNE or Waifu Labs v2 may be, I think they are still far behind what could be done with just existing data/arch/compute.)
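A rough way to put a number on the mode-dropping idea above, as a sketch only: it assumes image embeddings already exist and that the training embeddings have already been grouped into clusters. The function names, the 1% threshold, and the 2-D toy points are all invented for illustration; this is a simplified nearest-centroid count, not the CLIP + t-SNE analysis described in the comment.

```python
import math

def centroid(points):
    """Component-wise mean of a list of embedding vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def dropped_modes(train_clusters, generated, min_share=0.01):
    """Indices of training clusters that receive under `min_share` of generated samples."""
    centroids = [centroid(c) for c in train_clusters]
    counts = [0] * len(centroids)
    for g in generated:
        counts[nearest(g, centroids)] += 1
    total = len(generated)
    return [k for k, c in enumerate(counts) if c / total < min_share]

# Toy embeddings (assumption): one big "galaxy" cluster and two small outlying ones.
train_clusters = [
    [[0, 0], [1, 0], [0, 1], [1, 1]],   # mainstream cluster
    [[10, 0], [11, 0]],                 # small outlier A
    [[0, 10], [0, 11]],                 # small outlier B
]
# The pretend generator covers the mainstream cluster and outlier A, never outlier B.
generated = [[0.5, 0.5]] * 50 + [[10.2, 0.1]] * 5

missing = dropped_modes(train_clusters, generated)
```

The fraction `len(missing) / len(train_clusters)` then gives a crude coverage score; how the clusters are formed and which threshold counts as "dropped" are judgment calls.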

dirtyid 4 years ago |

I hope these generators expand into non waifu / pretty boy anime depictions. There's a lot of anime gaijin faces out there to explore.

aniforprez 4 years ago | |

I would like to see how it generates late-80s/early-90s style features. The current pool of anime art styles is very generic (aside from stylistic outliers) and I'd love to see Cowboy Bebop/Akira/Bubblegum Crisis type character designs

echelon 4 years ago | | |

Miyazaki's style of character design and fantasy would be amazing.

liuru 4 years ago | |

This one has some rough-looking ones! They're a bit rarer than most.

dmix 4 years ago |

So putting some Anime digital illustrators out of work? ....or I could see many simply use them and pretend they did it themselves?

Obviously there will be plenty of illustrators doing custom work that these can't (yet) replicate.

Also good for those countless anime avatar'd Twitter users.

Tade0 4 years ago |

Amazing work and progress. The previous version appears almost toyish in comparison.

FYI uBlock Origin complains about the registration link, because it is on "Peter Lowe’s Ad and tracking server list".

pgl 4 years ago | |

That's because it bounces the registration link through a tracker.

If you're OK with being tracked, you can permanently allow that domain.

jcun4128 4 years ago |

I would be interested to find one I like and use it for a desktop companion project, not an original idea but I am not an artist ha.

spywaregorilla 4 years ago |

Is there any way to approximate embeddings for a novel image?

novel meaning user provided, not generated by the model or in the training set.

liuru 4 years ago | |

Not at the moment in our tool, though this is an area of great curiosity and research for us!
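For readers wondering what approximating an embedding for a user-provided image usually involves: one widely used approach is to search for the latent vector whose generated output best matches the target, by gradient descent on the reconstruction error. The sketch below is not the Waifu Labs tool; it assumes a toy linear "generator" `W` so the gradient can be written by hand, whereas a real GAN generator is a nonlinear network and would need an autodiff framework.

```python
def generate(W, z):
    """Toy linear generator (assumption): output = W @ z."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def invert(W, target, steps=500, lr=0.05):
    """Find z minimizing 0.5 * ||generate(W, z) - target||^2 by gradient descent."""
    z = [0.0] * len(W[0])
    for _ in range(steps):
        residual = [o - t for o, t in zip(generate(W, z), target)]
        # Gradient of the squared error with respect to z is W^T @ residual.
        grad = [sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

# Hypothetical 2-D latent space mapped to a 4-"pixel" image.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [1.0, -1.0]]
true_z = [0.7, -1.2]
novel_image = generate(W, true_z)   # stands in for a user-provided image

recovered_z = invert(W, novel_image)
```

In this toy case the target lies exactly in the generator's range, so the latent is recovered almost exactly; a real user image generally does not, and the result is only the closest image the model can produce.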

spywaregorilla 4 years ago | | |

Does the discriminator model translate the images into an embedding space of its own? Could such a space be used to generate images themselves?

7373737373 4 years ago |

Have you considered applying similar models to VR avatar creation? That's a market in itself

akomtu 4 years ago |

This tool has many applications, and those that will make you rich aren't about anime.

sandos 4 years ago | |

Oh, I see what you are saying....

More things for, like, adults?

scollet 4 years ago | |

It needs to make money?

peoplefromibiza 4 years ago | |

step 1: generate random waifu

step 2: NFT all the things

step 3: profit

step 4: GOTO step 1

step 5: automate steps 1 to 4

ausbah 4 years ago |

any good info on manipulating the "control vectors" in the latent space?

easrng 4 years ago |

Cool, it's like an AI-generated picrew

rambojohnson 4 years ago |

project name is kinda incel cringe :(

ps91 4 years ago |

Amazing, now I can create my own anime girl!

KaoruAoiShiho 4 years ago |

I don't want to go all SJW on you guys, amazing work, but can you try to make sure there's an inclusive array of starting faces please? Talking about things like skin tones, thanks!

liuru 4 years ago | |

Indeed, we spent 2 years working on this!!!

It's an extremely hard research problem, because darker skin tones account for only about 0.3% of all anime art produced in the world.

We have employed an absolutely exhaustive array of art and data science tricks to give the model the ability to draw darker skin tones, though they are underrepresented. The results that you see today are the culmination of many months of careful tuning!

It's definitely not perfect, but from a data science perspective, this situation can't be rectified until the art world makes a shift.

Personally, I hope that more art representing dark skin tones will be created in the world!

numpad0 4 years ago | | |

I don't quite understand the need to mandate racial caricaturizations in every mode of communication. I would find it upsetting if every piece of text and every quote were prefaced with an "origin" indicator to aid forming prejudices. Why have it in manga?

pfisherman 4 years ago | | |

I can’t understand why the GP comment is flagged. If you can look past all the “culture war” stuff, this is pointing out some of the limits of algorithmic creativity.

It does not do well generating instances with features that are not well represented in the training dataset.

Compare this to human creativity. I suspect that fulfilling GPs request would be almost trivial for a human professional artist.

To be clear this is an amazing achievement, a creative use of the technology, and a positive contribution to the world. Pointing out limitations (i.e. areas with potential for future innovation) does not diminish it.

KaoruAoiShiho 4 years ago | | |

Thanks, happy to hear that you guys are on top of it.

greenn 4 years ago | |

Honestly asking, isn't this stuff usually Japanese characters? I dunno if there's even "data" they could use for other skin tones

numpad0 4 years ago | | |

Anything irrelevant to plot or story building, such as nationality, is usually left unspecified. Readers often project their own identity into characters, and some people seem to find it odd that they do not encounter traits that differ from their own.

Ethnicity is sometimes incorporated. For example, some distinctions would be necessary in a documentary manga about an Olympic match played by teams from multiple parts of the world; in that case American players might be given smaller eyes or extra wrinkles in the face, African players might be colored darker than other characters, Chinese players could be drawn with slightly different chin shapes, etc.

But the default is unspecified, or the averaged, most simplified shapes and forms that the author uses in their own cognition.

numlock86 4 years ago | | |

Well, there is (or rather was) "ganguro" at least, which appeared in manga and anime occasionally, too. Not sure how many people would be getting offended by that these days, though.

userbinator 4 years ago | | |

On the other hand, you get a full spectrum of eye and hair colour.

newobj 4 years ago | |

Anime has a pretty severe "representation of blackness" problem (https://www.youtube.com/watch?v=hi2_S6kBgIg is one video discussing this). I'm afraid to imagine what a model trained on that source content would generate.