Welcome to Waifu Labs v2: How Do AIs Create? (waifulabs.com)
It's an extremely hard research problem, because darker skin tones account for only about 0.3% of all anime art produced in the world.
We have employed an absolutely exhaustive array of art and data science tricks to give the model the ability to draw darker skin tones, though they are underrepresented. The results that you see today are the culmination of many months of careful tuning!
It's definitely not perfect, but from a data science perspective, this situation can't be rectified until the art world makes a shift.
Personally, I hope that more art representing dark skin tones will be created in the world!
It does not do well generating instances with features that are not well represented in the training dataset.
Compare this to human creativity. I suspect that fulfilling GP's request would be almost trivial for a human professional artist.
To be clear this is an amazing achievement, a creative use of the technology, and a positive contribution to the world. Pointing out limitations (i.e. areas with potential for future innovation) does not diminish it.
Ethnicity is sometimes incorporated; that is, some distinctions would be necessary if there were a documentary manga about a match in an Olympic Games played by teams from multiple parts of the world. In that case American players might be given smaller eyes or extra wrinkles in the face, African players might be colored darker than other characters, Chinese players could be drawn with slightly different chin shapes, etc.
But the default is unspecified, or an averaged, most simplified set of shapes and forms that the author uses in their own cognition.
No one here is mandating shit, GP asked a friendly question in a most respectful manner, which prompted an informative answer from OP even. If such interaction generates such an allergic reaction in you, the problem is not with the grandparent comment.
I really think this integration predominantly means systemic internalization of racism, with side effect of affirmative actions. When ethnicity is expressed with intent, a distinction is made, and distinction is synonymous to discrimination. Or classification, for that matter.
Yes. What's your point?
> [...] and not meant at all to represent darker-skinned races.
No one ever claimed that. The question to my post was simply where to possibly get data from for "darker skin tones". Ganguro seems like an option because ... well, darker skin tones? Care to elaborate why you bring up "representation of darker skinned races" up in this context?
If we understand the ancestor comments as asking for more diverse representation, then no, because ganguro and many other anime archetypes and their clothing and accessories are ultimately Japonicentric, in the same way that you might consider angels and demons contrasted against each other post-antiquity Eurocentric.
My opinion on the matter is that it's not necessarily a worthy goal, but there is a more distinct difference to representing people of different races even in anime faces than just skin tone.
I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted[1]
As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity so fascinating. On one hand, I know that mathematically AI only can hallucinate images that fit within the distribution of things that it's seen. But from the artist perspective, the model's ability to blend two existing styles into something so distinctly new is so incredible (and not to mention also commercially useful!)
Anyways, happy to answer any question, thoughts, or concerns!
---
Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.
> I think the effort required for such an undertaking is vastly underestimated by readers.
Haha for sure. Hosting a real-time ML model for people to do sub 1-second inferences at HN-load scale is definitely nontrivial.
same here. what's naive about it?
not to badmouth the undertaking, but wtf is this doing on HN?
My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?
My (super limited) understanding of GANs is that they slowly discover these features over time simply from observation in the data set, and not from any labels.
So how could you make e.g. a slider for head position, style, pose, etc? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?
You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"
And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?
You can think of it like coordinates on a many-dimensional vector grid.
We craft the functions that will illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.
And yes, when the model is retrained, we have to discover them again!
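To make the "coordinates on a grid" picture a bit more concrete, here is a minimal sketch of one common way to find a direction in latent space that tracks an attribute: take the difference between the mean latent of samples that have the attribute and the mean of those that don't. This is an assumption for illustration, not the method described above (which is only said to combine observation, architecture knowledge, and data layout). `LATENT_DIM`, `true_direction`, and the toy labels are all hypothetical stand-ins; no real generator is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 16  # hypothetical; real models use far more dimensions

# Hypothetical ground truth, used only to fabricate toy labels below.
true_direction = np.zeros(LATENT_DIM)
true_direction[3] = 1.0

# Toy latent codes, each labelled with a stand-in attribute
# (think "head turned left" as judged by a human or a classifier).
latents = rng.normal(size=(2000, LATENT_DIM))
has_attribute = latents @ true_direction > 0

def find_direction(latents, labels):
    """Difference of class means, normalised to unit length."""
    direction = latents[labels].mean(axis=0) - latents[~labels].mean(axis=0)
    return direction / np.linalg.norm(direction)

def shift(latent, direction, amount):
    """Move one latent code along the direction, leaving the rest of it alone."""
    return latent + amount * direction

direction = find_direction(latents, has_attribute)
character = latents[0]
turned = shift(character, direction, 3.0)  # "same character, different pose"
```

In a real pipeline `turned` would be fed back through the generator. Because the direction is estimated from one particular model's latents, it would have to be re-estimated after every retrain, which matches the answer above.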
A couple questions:
1) I didn't really understand how you went about identifying what vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling through them one by one to see what each did to the outcome?
2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?
2) Depends on the quality you are seeking. If you only want one run of a similar, off-the-shelf model, around the 1000s is enough. But at the number of iterations you have to run to build your own and improve results, you probably need about 100k.
To tackle this problem, we built our own supercomputer off of parts we bought off of ebay, though I can't say I recommend that route, because it now lives in our living room.
Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or maybe did you train the model for two weeks? That point just wasn't exactly clear for me.
Development took on-and-off roughly 2 years to achieve the quality you see today.
We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!
As for a V2 generation, sorry, because the models are different, you'll have to discover a similar image again, if you want a V2 version!
There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!
I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.
In humans, things like the pupil can be the give away.
https://www.newscientist.com/article/2289815-ai-can-detect-a...
Like this one by fast.ai!
Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)
I think I could use this for a project.
>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.
Can you explain what you mean by "mental" representation? Does your system have a mind?
Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?
On each step, high-level parameters are combined with predefined weights to produce a more low-level output.
It seems a similar transformation is going on here, except that the weights and the structure are somehow learned on their own.
https://www.gwern.net/Danbooru2020
Though now we have made our own :)
Waifu Labs v2, referenced in this post (generate amazing custom anime face images): https://waifulabs.com (write-up is the above link: https://waifulabs.com/blog/ai-creativity)
This Anime Does Not Exist (AI-generated anime-style artwork): https://thisanimedoesnotexist.ai (write-up https://www.gwern.net/Faces#extended-stylegan2-danbooru2019-... and https://nearcyan.com/this-anime-does-not-exist)
This Waifu Does Not Exist (AI-generated anime-style faces): https://thiswaifudoesnotexist.net (write-up: https://www.gwern.net/Faces#twdne)
There's also a lot of literature on e.g. automatic manga coloration, auto-translation, image superresolution, anime frame interpolation, and much more. Worth checking out some places like https://old.reddit.com/r/AnimeResearch/ if you're interested!
I think the speed that GANs have come in to the world has really shaken people up and it’s hard to process what this all means and what it will result in. Especially the ones which generate based on real people.
But the feeling this gives me, is what happens for the future of art. Sure, this example is nowhere even close to replacing real artists, but it’s already generating images better than I can draw after a year of practice. It does give me a feeling of “what is the point”. Which might be an irrational feeling, but I’m sure others feel the same.
> Given a training set, this technique learns to generate new data with the same statistics as the training set.
There isn't a creative process here nor any creative introspection going on. While the technical results are impressive, this article does not address creativity even superficially, and just slaps the label on. There isn't any AI either. It's machine learning, i.e., statistical models and algorithms.
Ah, make that three things the public shouldn't see being made: sausage, legislation, and waifus.
From urban dictionary:
"Waifu" is used to refer to a fictional girl or woman (usually in Anime, Manga, or video-games) that you have sexual attraction to, and you would even marry.
Huh. Imagine a future where people can compile written scripts into Hollywood quality movies.
Do you have a team page? How many of you are there? Do you work with gwern and nearcyan? Are you going to raise for this? (You should totally scale this!)
Great work, and keep it up!
Don't get me wrong, I have nothing against this, but I think we should start discussing the morality of AI generated content, even if it doesn't train on existing artworks/code.
1. Simplifications of reality(the actual artist training method would be traditional studies off life and photo reference followed by gradual reduction and symbolization to a style)
2. Symbolic meaning. Things like the style of eyes, clothing, etc are all meant to signal personality. This is stuff that current AI techniques don't really touch upon in any direct sense.
Since the ML method is built on interpolating off final results, it's going to lack in these qualities and produce something that is consistently an "average impression". Akin to asking the algorithm to generate mythical heroes by mashing up the various stories: you get a hero that is somehow the average of Icarus, Heracles and Achilles, which would be less of a character than the originals.
Just a thought, I don't really know anything about ML.
I wonder if the OP's intuition regarding the sparseness of the latent space, and the relatively small area occupied by the 'useful' manifold? embedded within it provide us any clues as to what symbol grounding might look like for some neuro-symbolic infrastructure that sits atop that latent space.
I.e. how should we be trying to represent concepts like 'male' and 'female' within that space?
Is it important to have these concepts represented as a low dimensional manifold?
Is it important that this manifold be easily described by some simple geometric form like a convex polytope?
Is it important that nuances and variations on the concept be separable within the bounds of the concept-specific manifold?
What other properties might be important?
For TADNE, Arfafax ran Danbooru2019 and a few million TADNE samples through CLIP to get the image embeddings, and clustered them; when the two sets of clusters were graphed using tsne, you could see that the TADNE StyleGAN2-ext did a lot of mode-dropping in that many smaller outlying clusters of characters/franchises/topics simply did not appear in TADNE samples. The TADNE looked like a big galaxy, while Danbooru2019 looked more like it was surrounded by archipelagos. TADNE was extensively trained on them and was a very large model, but the GAN dynamics & StyleGAN architecture mean it didn't do a good job absorbing rarer/more idiosyncratic Danbooru2019 image-clusters.
I expect newer generative models which avoid GAN losses and which use more flexible (but expensive!) architectures, like DALL-E, would perform much better in terms of mode-dropping, so you'd see a lot more unique characters/images out of them. (I'm very excited about them. As good as TADNE or Waifu Labs v2 may be, I think they are still far behind what could be done with just existing data/arch/compute.)
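The mode-dropping check described above can be sketched roughly as follows: cluster the embeddings of the real dataset, assign generated samples to the nearest cluster, and count the clusters that receive no generated samples at all. Everything here is a toy stand-in: the "embeddings" are synthetic Gaussian blobs rather than CLIP outputs, the clustering is a plain k-means written out by hand, and the t-SNE plotting step is left out. The names `kmeans`, `nearest`, `real`, and `generated` are hypothetical.

```python
import numpy as np

def nearest(points, centers):
    """Index of the closest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iterations):
        labels = nearest(points, centers)
        # Keep the old center if a cluster happens to end up empty.
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers

rng = np.random.default_rng(0)
DIM = 8
true_centers = 10.0 * np.eye(5, DIM)  # five well-separated toy "topics"
real = np.concatenate(
    [c + rng.normal(scale=0.5, size=(200, DIM)) for c in true_centers])
# The toy "generator" only ever covers the first three topics.
generated = np.concatenate(
    [c + rng.normal(scale=0.5, size=(300, DIM)) for c in true_centers[:3]])

centers = kmeans(real, k=5)
real_counts = np.bincount(nearest(real, centers), minlength=5)
gen_counts = np.bincount(nearest(generated, centers), minlength=5)
dropped = np.flatnonzero((real_counts > 0) & (gen_counts == 0))
print(len(dropped), "of", len(centers), "clusters have no generated samples")
```

With real embeddings the clusters would be far messier, so a threshold on the ratio of generated to real samples per cluster would probably be more useful than a strict zero count.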
Obviously there will be plenty of illustrators doing custom work that these can't (yet) replicate.
Also good for those countless anime avatar'd Twitter users.
FYI uBlock Origin complains about the registration link, because it is on "Peter Lowe’s Ad and tracking server list".
If you're OK with being tracked, you can permanently allow that domain.
novel meaning user provided, not generated by the model or in the training set.
More things for, like, adults?
step 2: NFT all the things
step 3: profit
step 4: GOTO step 1
step 5: automate steps 1 to 4
Though, the conclusion I've come to is that hand-drawn art will always be meaningful for humans, because it is born of the human experience.
An interesting example is the invention of photography, which at its time, was very good at doing the thing artists were doing back then (capturing likenesses)
But photography didn't replace art: instead, artists now use photographs to be more expressive, convincing, and make better art. In tandem, the widespread adoption of photography meant that more average folks could get their likenesses taken!
Personally, my skills as an artist have improved quite a bit after launching this product, purely because observing it offers some fascinating insights into how anime is created!
I hope that as an industry, we'll find better ways to create, and what we know to be the "best" art today will be even better in the future!
Comparing photography to hand drawn art is silly. They are two different mediums.
Your company could be the first to capture the market. I guess if you can sleep with the consequences of your work, who cares? I'm not judging, because if it's not you, it will be someone else.
Personally I think we as a society need to step back and press pause and really consider the consequences of this technology, and even existing technologies.
If you become rich, could you set up a charity for all the future starving artists, if that future comes to pass? I don't want to live in a world where there's no room for human creativity.
Not an artist, just a concerned human.
This is sort of the same thing on steroids. You can copy/remix previous art by feeding them into a ML model in training mode, and it will be massively utilized the same way ctrl-c ctrl-v is used, but it's a part of the toolset of art creation, not replacing it.
Then you need attributions for said previous arts, all of it, at least going by texts of laws.
There's a similar situation ongoing with fiction writing, by way of NovelAI. (And some competitors, but NovelAI is head and shoulders ahead of the pack. Thankfully; they seem to be the nicest of the lot.)
I'm a fairly prolific (fan-)fiction writer, and also AI enthusiast, so of course I jumped on that bandwagon as soon as I could. What I've found is...
- AI cannot write stories on its own. It just can't, full stop. Some people try, including me, but the results are nonsensical without significant tweaking. I expect that to change eventually, but not without a conceptual breakthrough or two.
- AI is immensely useful as a prosthetic imagination.
What I use it for isn't to write the story for me. It's to, in case I ever get stuck at some point, offer me suggestions for how the story can continue -- suggestions that I can accept or deny. Even if I deny it, it's useful as a way of illuminating my own ideas for the story. There's got to be a reason I don't like that continuation, and that is often enough to think of something I do like.
In other words, it's mostly eliminated writer's block.
It's also handy for expanding my vocabulary. English is my third language, and while I like to think I'm good enough for daily life -- I've lived in Ireland for over a decade, after all -- there's a big difference between 'good enough for daily life' and 'good enough to write good fiction'. Prior to using NovelAI, my writing was... dry. Conceptually heavy SF doesn't necessarily require high-end wordcrafting, but it helps.
The AI, especially when told to emulate Sheridan Le Fanu or any of the other great authors, is better than me at this. And since I can ask it to jump in at any point, it's become the most attentive, capable cowriter I've ever had. Perhaps noticing this, NovelAI now calls their default AI tuning 'Co-Writer'.
It's still likely to write something I can't immediately use, but that just means I need to absorb its ideas and make them my own. Repeat a hundred times per day, and I end up learning much, much faster than I ever did when I was writing on my own.
To summarize, I don't use AI to write my stories for me. I use it to get better at writing.
I think it should be possible to do the same for other forms of art.
It was not so long ago computers beat humans at chess, yet people still play.
Yes, people still play, but they no longer create.
With the exception of Adversarial attacks on particular algorithms, no human is creating new Chess theory, discovering new openings, for example.
As a game, challenge, competition, social activity, chess is alive and well.
As a creative endeavour, or vehicle for discovery, Chess is solved. It is no longer an art of its own.
We're part way through this transition now with Go as well. New opening theory, new joseki, new strategies are being played by robots, and at the highest professional levels we are playing catch-up to understand.
For art it feels a bit different since it’s not competitive and more a practicality thing. Perhaps art will shift from placing individual strokes on an image and move to making creative directions for AI to resolve in to an image or enable more people to create labor intensive works like animation.
"It cannot be creative because it's only bits/cogs/linear algebra/etc/etc." Well describe to me the way it is different to the processes of the human brain? "There is some magic sauce in the human brain we do not yet understand!", well then how do you know that this magic sauce does not exist within the statistical models inside the computer?
I find it very irritating that such shallow reasoning prevails amongst intelligent people.
For example, humans don't need to see millions of examples of waifu before they can draw their own.
Also, humans can draw in different styles, including novel styles that look nothing like styles they have seen before. Statistical models like GANs can only draw in styles similar to the ones in their training sets.
Statistical modelling can only represent the data in a training dataset and is incapable of novelty. Humans are capable of novelty.
>> Well describe to me the way it is different to the processes of the human brain?
We haven't created the human brain and it's very unlikely it uses a technology we understand, like linear algebra.
This is no less shallow reasoning. The question of whether the academic field of statistical modelling already contains the necessary ideas to produce strong AI is not decided, and won't be unless/until somebody makes a strong AI. People have different intuitions about what the answer will be and until it can be determined empirically I suggest treating them as what they are: intuitions.
Our ability to make the decision between following the rules and breaking the rules, when suitable. A computer could also break the rules, but in most cases it wouldn't make sense or look good, while a human could make a judgement about when to break the rule. Sure, we all learn by copying, but after a while, we start getting a feel for when to break the rule, and that's when unique art appears. Computers seem to not have learned this fact yet (or rather, haven't been taught that yet).
Using the tool that this submission offers, all the results will look similar and can be traced back to the training set you give it. Do something similar with a human (over similar amount of time that the machine got, in terms of human time) and eventually the results will look way different than the training set, as what we see with artists in real life.
For something like the original post, we do know what these things are. They're statistical models. Full stop. They show no indication of what we see in creative and intelligent behavior, that is the ability to self-adapt to both internal and external initiatives. This GAN in the post has no ability to step outside of the statistics in the training set unless the model is updated to prod it to do so. The model can be changed, but it is a forceful change. If you show me tens of thousands of images, I am not, at an emergent, top-level, system level, etc., bounded to the statistics of that image set. Is this GAN asked something or given a goal aside from an implicit "draw something like what we've given you"? Even if I do draw something like or akin to the given image set, I have full creative control over the image (assuming some drawing skill).
If the human brain (and really body) can be modeled via a statistical model (which is not yet known but is surmised as you imply), that doesn't necessarily explain high-level behaviors. More is different. You call it magic sauce, but others call it emergence. Our understanding of emergent behavior and complex systems at large is still in work.
In my view, metaphorical thinking, of which analogical thinking is a subset, is a likely kernel of human intelligence. While these statistical models are copying, which is similar in a way to analogy building, it's not quite there. The reason the things it generates look like other things is that it searched a parameter space for matching statistics. However, it cannot even explain that this is why it generated what it did. We explain for it. These things are no more artificially intelligent than things like thermodynamics are naturally intelligent.
Lastly, as I pointed out in my original comment, if this is indeed creative as someone like you implies, the article fails to make a convincing argument and bounces around a lot of buzzwords.
> I find it very irritating that such shallow reasoning prevails amongst intelligent people.
I was offended, but I suppose I agree. ;)
People keep forgetting that you really can only fit to data you have. Extrapolation exists as a concept but the requisite intuitive knowledge needed in order to create something new that can be successful is hard to understand even by humans at this point (how much of the business press is full of gimmicky blogposts about how to be successful, full of contradictory anecdata, opinions, advice), I don't even know how AI researchers would go about tackling that. I am not an AI person but as a plain old scientist I know extrapolation without intuition is almost always a fraught effort.
Sure it can, but then it's not considered anime anymore. I think this sentiment is confusing genre with training set constraints. A new model is not required for some artist to do something different from anime or even an anime artist to do something different. Humans can self-adjust all with approximately the same base model (whatever that is).
I always thought those were synonyms.
The very definition of intelligence has been under heavy reconsideration in recent decades, with the emergence of better knowledge of animal cognition, for example.
I just don't see anything intelligent yet. People somehow have gotten confused with the success of machine learning being treated as intelligence. We have a lot of statistical models of things that are very successful, but those aren't considered intelligent. For some reason, machine learning telling you something about data has suddenly been treated as AI. Machine learning can do some impressive things, but I think it's short-sighted to equate AI and machine learning.
Intelligence is really a tough thing. Watching a video of even a single-cell organism displays a sort of intelligence and behavior far beyond anything I've seen of machine learning. So why is it intelligent? Or is it intelligent? I'm not entirely sure, but my point is that machine learning is orders of magnitude incapable of describing (i.e., modeling) even the simplest self-directed and self-adapting behavior that we see in the real world.
/s?
Or, through decades of AI research, we're just now starting to better understand what actual creativity really "is"?
Thanks so much! It's done by our fantastic animator[1]!
GANs are quite interesting and we didn't see many approachable explainer videos targeted at lay people, so we decided to make one ourselves!
One thing I was confused by: the video says the discriminator "AI" is trained to detect true vs. generated results, with the hope the generator becomes good enough to fool the discriminator. But why is the discriminator useful, then? Couldn't you just tell generator "AI" whether the result it produced was true or not?
I think the answer is.. you don't want just a perfect recreation of the training data you gave to the generator, instead you want the generator to produce variations of that training data, so there's a "how would you know if it's 'a true result' / good enough?" problem. So the discriminator is useful because it's not a direct comparison, but rather a "this looks approximately good enough" comparison of the true vs. generated result.
This all makes me wonder: what sort of data set needs to be fed to the discriminator to train it? Is it some sort of "true image" and "true image w/bad alterations (e.g. lines, scratches, etc.) to it" data set?
Indeed, it contributes to the variations problem.
also: If the discriminator starts off perfect, then the generator can't learn to be better.
Sort of like a human learning to play chess: If you start off with top-tier opponents that crush you, then you don't have a gradient to learn from. Instead, you need players at your own level to grow your skills.
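The "no gradient to learn from" point can be checked with a little arithmetic. A minimal sketch, assuming the original minimax GAN loss, where the generator minimises log(1 - D) and D = sigmoid(logit) is the discriminator's belief that a generated sample is real: the derivative of that loss with respect to the logit works out to -D, so the more confidently the discriminator rejects a sample (D near 0), the smaller the learning signal gets. The function names are hypothetical, and there is no actual network here, only the loss on a single discriminator output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator_loss(logit):
    """Original minimax generator loss log(1 - D), with D = sigmoid(logit)."""
    return np.log(1.0 - sigmoid(logit))

def generator_grad(logit):
    """Derivative of generator_loss with respect to the logit, which is -D."""
    return -sigmoid(logit)

# A more negative logit means a discriminator that is more sure the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    d = sigmoid(logit)
    print(f"D={d:.5f}  gradient={generator_grad(logit):.5f}")
```

An undecided discriminator (D = 0.5) passes back a gradient of -0.5, while a near-perfect one passes back almost nothing, which is the chess analogy above in numbers.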
https://i.imgur.com/1V1wPMC.jpg
I rolled around 40 times on the first stage and chose a horror. I skipped the second stage and didn't roll on the third stage, just chose one of the presented details. I skipped the fourth stage because I rolled maybe ~200 times and only saw around 1-2 comparable horrors.
https://i.imgur.com/1vBeg1j.jpg
I rolled around 100 times on the first stage and saw about 3-4 horrors before choosing a horror. I rolled about 70 times on the second stage, didn't find anything interesting, just chose a normal color palette. I rolled maybe 80 times on the third stage and chose a horror, though the results were pretty consistently horrors. I rolled around 60 times on the fourth stage and saw about 1-2 other horrors before choosing a horror. Also, here's what it looked like after the third stage and before finishing the fourth stage:
https://i.imgur.com/ditm8nF.jpg
It's possible the third and fourth stages can produce horrors from normal faces, I didn't check.
In many ways Miyazaki's style is "nonstandard" for anime, possibly because he was partly inspired by European artists (think Moebius, whose influences can be seen in Nausicaa for example).
I once heard someone say people will never be able to understand God’s plan, just as a dog will never be able to really understand why their owners do things the way they do them. I feel like this is a similar threshold; a human’s mind is incredible, but not perfectly suited for thinking in advance, or calculating probabilities.
So I guess my point is, while this certainly feels a little scary, I think it’s just a consequence of the game, and probably okay.
That's not true though? Human chessplayers are constantly exploring new lines, and engines are a useful tool but by no means the bottom line - indeed lines that were neglected because engines gave them low evaluations have been a fruitful source of ideas lately, because humans play differently from engines and may not be able to find the refutation.
that is the exact opposite of creation.
I'd put it this way: this is the IKEA of drawing.
We like to think that AI is only statistical models, which may be true (is true?), but we have no idea that what happens in brains is anything else than a highly optimised, deconstructed, such model.
Current datasets need tens of thousands of pics of a cat while a kid only needs to see a cat twice? Good for the kid, but there's no hint there's a fundamental difference of process.
e: the argument for human artists is human inspirations are generalized and artists’ own interpretations are always added to it, and for ML it’s that their network weights are highly generalized and likewise not a copy; but IMO a more solid line has to be drawn and distinction has to be made between C-c C-v and generalization at some point forward.
Human artists use copy/paste as a tool all the time, and it doesn't kill their careers. The element you're missing here is that jychang wasn't talking about copying pieces verbatim from others' work without permission & attribution, but rather using it as a tool to manipulate your own art ("to copy/paste a layer"). Imagine being able to do the same but with an AI model working behind the scenes to adjust the object to fit the scene it's being pasted into.
Humans have the advantage of being trained for years on a far larger and more generalized dataset before they're asked to draw anything.
> Statistical models like GANs can only draw in styles similar to the ones in their training sets.
The intermediate states in the article's example seemed to contain a number of novel styles to me. The final results are filtered for conformance to a specific style due to the nature of the project, but that doesn't mean the GAN is incapable of drawing in other styles. (Of course, most of these alternate styles are not particularly appealing to humans, but the same is true for many of the experimental styles invented by human artists.)
> Statistical modelling can only represent the data in a training dataset and is incapable of novelty.
These AI models are generating images which never existed before, and which were not in their datasets. How is that not novelty?
That's a big assumption wrapped up in an over-wrought analogy. Humans don't "train" in the sense that statistical models, or neural nets, are trained. We don't have any clear supervision for example, no ground truth. And we don't need examples of exactly the things we learn, to learn them. For instance, nobody ever saw an example of a manga character before the first manga character was drawn. And yet, someone drew it.
>> These AI models are generating images which never existed before, and which were not in their datasets. How is that not novelty?
How I like to think of it, which is a bit of a fudge, is that neural nets learn to convert each of their input images into a connect-the-dot puzzle (the "dots" are the data points in a very high-dimensional space that encompasses the pixels of all their training images; like I say, it's a bit of a fudge). Every new training image gets its own connect-the-dot puzzle superimposed on those of all previous images. Once training is done, you can ask the trained model to generate new images and it basically puts its pen down on a dot, and starts drawing a line. What dot comes next depends on timey-wimey model-probabilities. Obviously, in that way, it can't draw a line to a dot outside the big network of superimposed connect-the-dot puzzles it has put together. Such outside-context dots don't exist for the model, in any real sense. So it can only create images that exist within that puzzle.
In truth, the puzzle, i.e. the trained model, is a dense region of cartesian space (a manifold). What comes out of the model must already exist in that manifold, so it must be a variation, or combination, of the training images used to construct the manifold.
Which means, it can't innovate. So for instance, you can't expect to train it on images of manga characters and find that it now draws you in the style of Michelangelo. That's what I mean. Of course you'll see images that are not exactly the images you put in, but you won't see images that are very different from the ones you put in. It is, in a very concrete sense, a very limited ability to generate new images.
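To make the manifold picture above a bit more concrete, here is a toy sketch. It is not a GAN: `decode` is a made-up fixed linear map standing in for a trained generator, and the weights, the 2-D latent codes and the 3-value "images" are all assumptions chosen for illustration. It shows both halves of the argument: a latent point between two known codes decodes to an output that matches neither of them, yet every output still obeys the one constraint baked into the decoder.

```python
# Toy illustration only: a fixed linear "decoder" standing in for a trained
# generator. The weights are hypothetical; a real GAN generator is a deep,
# nonlinear network.

def decode(z):
    """Map a 2-D latent code to a 3-value 'image' using made-up fixed weights."""
    weights = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
    return [a * z[0] + b * z[1] for a, b in weights]

def lerp(z0, z1, t):
    """Linearly interpolate between two latent codes."""
    return [(1 - t) * p + t * q for p, q in zip(z0, z1)]

def on_manifold(img, tol=1e-9):
    """Every decoder output satisfies img[2] == 0.5 * (img[0] + img[1])."""
    return abs(img[2] - 0.5 * (img[0] + img[1])) < tol

z_a, z_b = [1.0, 0.0], [0.0, 1.0]      # latent codes of two "training" images
img_a, img_b = decode(z_a), decode(z_b)
img_mid = decode(lerp(z_a, z_b, 0.5))  # a point between them

print(img_mid != img_a and img_mid != img_b)  # new output, not a copy
print(on_manifold(img_mid))                   # but still inside the decoder's range
print(on_manifold([1.0, 1.0, 0.0]))           # an "image" no latent code can produce
```

The linear toy exaggerates how rigid the restriction is. With a nonlinear generator the reachable set is far richer, and how much of "style space" it actually covers is the point under dispute here.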
GANs are a form of unsupervised learning. They don't have "ground truth" either, just lots of existing images which they learn to imitate and to distinguish from other kinds of images not present in the training set. Similarly, humans learn to distinguish natural images from unnatural ones starting from birth, and use that learned feedback to filter the images produced by our imaginations: a natural example of a GAN. Our input is less… focused, and includes non-visual elements, and there are of course other aspects to general intelligence besides visual processing and imagination, but in this area at least we operate on the same basic principles.
> So for instance, you can't expect to train it on images of manga characters and find that it now draws you in the style of Michelangelo. That's what I mean.
Are we talking about GANs here, or humans? A human trained exclusively on manga wouldn't suddenly develop the ability to imitate Michelangelo either. On the other hand, a GAN trained on manga may sometimes produce images which are not recognizably part of the manga style—which could be seen as an entirely new style. (It would help the process along if you included non-manga images in the training set, as a human would have access to those as well. Then different styles of the same scene just become one more dimension in your "manifold" of all possible images.)
Inventing and learning to draw in a new style isn't something that comes spontaneously to humans. It takes a lot of practice both learning what makes the style distinctive and learning to create art in the new style. A GAN has most of the basic elements required to do the same, but we generally don't use it that way. An interesting experiment might be to permute the discriminator to favor specific elements which were not common in the training set and then train the generator to satisfy the altered discriminator.
> Which means, it can't innovate.
What exactly do you mean by "innovate"? To me the word implies intent, which is clearly out of scope for a mere GAN. Intentional behavior would put it in the domain of an artificial general intelligence or AGI. However, generating images which aren't in the training set is just a matter of choosing a point on the "manifold" which doesn't correspond to any of the input images. Though expecting the GAN to spontaneously invent a distinctive and consistent new style which appeals to humans, without being one itself or otherwise being trained in what humans might find appealing, is a bit much IMHO.
The biggest difference remains the fact that this GAN only has manga for its input, which limits its ability to produce anything outside that context. Its whole life is manga and nothing else. Humans have the same issue with creating things completely unrelated to any prior experience, but they have a much larger and more varied pool of experiences to draw from. (And even then humans can easily get stuck in one particular style and find it difficult to change.)
Those who have the courage necessary to become artists, and renounce the vulgarity of the world, will continue to do so.
Those who delude themselves into thinking they're creating anything while being employed in commerce, will be managed out.
The deep crevice where the two meet and manage to find compromise, will continue to be filled by wealthy, independent patrons.
Asking others to think and do as we wish is silly.
Ironically, if it's that important to you, why don't you start giving monetary support directly to artists? Changing one's own actions is more impactful than trying to change those of the many (and the former is more likely to lead to the latter, than if one were to focus solely on the latter).
Your definition is what I would call "culture defining art", which is art that some part of the culture identifies with (or more specifically, a person's way of communicating that they can identify with). The currency here is tribalism, i.e. it creates a way for two or more people to bond together through what they feel and think.
>Those who have the courage necessary to become artists, and renounce the vulgarity of the world, will continue to do so.
Courage is trumped by needs. If they need to pay rent, buy food, support a family, pay for a car, etc., then no matter what, they are sacrificing some part of their time in order to obtain those things. Thus any artist who can make money off their work would have more time for their work, and possibly grow faster.
>Those who delude themselves into thinking they're creating anything while being employed in commerce, will be managed out.
Seems like you're too attached to the idea of what an artist is and isn't?
>Ironically, if it's that important to you, why don't you start giving monetary support directly to artists?
Because I'm not building something that is taking away from their dreams (e.g. living off their work/passion).
<side thoughts> I wonder if people are aware of the consequences of automating creativity? IMO humans need human input in order to stay human. The less and less we come into contact with humanity, the less human we'll become.. and at the very end of that long path is a bag of chemical reactions that's forgotten the meaning of "how are you?" [1]
Which made me think, perhaps it's the inefficiencies of life that make us human.
[1] This is because some company/companies will realize/have realized that tuning the machine to become most efficient at creating what the masses want will be the most profitable path.
I don't define art. It's a sense, not a logical box you can put things in.
Receiving money for your art is one thing; going out of your way to use it as a means of living is another. The work immediately becomes tainted, and is no longer art.
It could be an amazing piece, but if your line of work is receiving money for what you create, you're an artisan, not an artist.
A character drawn for an RPG is not art. It is not a work of art. It is a graphic designed for utility. That is all it will ever be.
The sublime nature of art is there because it transcends everyday vulgarity. One transcends mere personage and becomes an artist by being in the world, but not of it.
The more money an artisan makes, the more his craft suffers. He almost always improves his technical ability through this process (otherwise, he would not make money), but loses his soul, and will never be an artist. He does not have the fiber in his heart that allows one to suffer through all manner of anguish, and material poverty, to dedicate oneself towards something above oneself; so he settles for being an artisan.
I can understand not being educated on these matters. But the amount of misplaced confidence you carry, writing on things you know nothing about is detestable.
If your inquiries into the nature of humanity and what it means to be are genuine (and not mis-attributed self-importance), my recommendation is to read and listen more, and talk less.
Matthew B. Crawford's works are a decent bridge into all that, for the modern middle crust who feels something stirring in his soul, and needs a direction.
If you feel like your assessment of your own abilities is honest, then I would completely skip anything modern, and begin with Burke's A philosophical enquiry into the origin of our ideas of the sublime and beautiful. I will even buy you an unabridged copy and have it shipped to you, if you're a starving artist that cannot afford it (and my respect for you would increase, all the same).
i sincerely applaud the creativity of establishing such a business
But I completely agree, it is just a suspicion.
Personally I'm inclined to see things like GPT-3 as qualitatively different to human creativity, but I don't claim to know this for a fact.
I just think there's a lot of people arguing "well this can't be creativity" out of some sort of anti-AI signalling sentiment, where they've become so used to disagreeing with unwarranted AI optimism that they undervalue AI progress routinely. In other words, "this can't be creativity, because we don't understand creativity" or "this can't be creativity because AI is very far from human-level" or such.
(Personally, I believe GPT-3 and CLIP are already superhuman; that is, superior to the circuitry in the brain doing the equivalent job. They only look weak because we're trying to make them do all of the job in one step, when we're comparing them to the brain's highly heterogeneous self-correcting approach.)
Server processors get you more bang for the buck... iff you're planning to run the hardware flat out for literally years. You save on power, but the up-front cost takes up most of that, so for a system that's mostly idle you wouldn't use them. On the other hand, any CPU with fewer PCIe lanes than a TR won't be able to run multiple GPUs optimally, and TRs are relatively cheap enough to make the reduction in PSUs/chassis worth it.
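The "flat out for literally years" trade-off is just a break-even calculation, sketched below. Every figure is a hypothetical placeholder (the extra purchase price, the watts saved and the electricity price are not from this thread), and the sketch assumes the power saving only accrues while the machine is under load.

```python
def break_even_hours(extra_upfront_cost, watts_saved, price_per_kwh):
    """Hours under load before the power savings repay the extra purchase price."""
    savings_per_hour = (watts_saved / 1000.0) * price_per_kwh
    return extra_upfront_cost / savings_per_hour

# Hypothetical inputs: 600 extra up front, 100 W saved under load, 0.20 per kWh.
hours = break_even_hours(extra_upfront_cost=600.0, watts_saved=100.0,
                         price_per_kwh=0.20)

HOURS_PER_YEAR = 24 * 365
years_flat_out = hours / HOURS_PER_YEAR        # machine loaded 100% of the time
years_at_10_percent = years_flat_out / 0.10    # machine loaded 10% of the time

print(f"{hours:.0f} h, {years_flat_out:.1f} years flat out, "
      f"{years_at_10_percent:.0f} years at 10% load")
```

With these made-up numbers the pricier part pays for itself after roughly 30,000 loaded hours, which is a few years of continuous use and decades for a mostly idle box.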
Not to mention that there are some approaches to training you can only use if you have multiple GPUs on the same motherboard, aka. sharding a single model across GPUs without communications overhead killing any benefit of that.
But the tl;dr is a lot of scavenged EPYCs and nvidia GPUs all in a large sound-proof rack.
I'm not entirely sure how much legal weight a ToS on the website would have on what the users do with the output. As I understand it, you could e.g. forbid explicitly using the service/generator for commercial purposes (e.g. during game development), but if someone generates a cool character playing around with no particular commercial objective and then decides post facto to build a media megafranchise out of that character, absent any copyright claim over the image, I don't think there's anything stopping them. They wouldn't even need to trace over it, though if they want new artwork in different poses, they couldn't keep using the AI for that with explicit commercial intent; they'd have to get humans to re-draw it.
Alternatively, a pessimistic view of the interaction between copyright and AI would be that the model is a derivative work of all the training input, and its output also is, and then good luck building a non copyright infringing AI.
IANAL and all that, but it would definitely be legally risky to assume that as the provider of an AI generator you have any control over what users do with the output.
This is a general rule about neural networks, as we have them today: they learn to reproduce their training set. Nothing more, and nothing less.
Humans, now, don't need to see examples of a thing before we can make one. If that were the case, we would never have created all the technology we have, of which there was no previous example. For instance, at some point in our history someone figured out how to carve a hand axe for the first time, ever. That person didn't have any examples to go by. There were no such objects in nature, before that time. Certainly that person had some idea of concepts such as "sharp" or "pointy" or who knows what else, but they had no blueprint for a hand axe. This is what I mean by "innovation".
"Inventing and learning to draw in a new style" is absolutely something that comes spontaneously to humans! That's the entire history of human art: people inventing new ways to express themselves through various art forms. Art would be way too dull if nobody could come up with new things.
But I certainly agree that it's unfair to expect the same kind of innovation from GANs or from other neural nets. However, I think that's the case because neural nets are nothing like humans. But if I understand correctly, you're claiming that how the human mind works and how neural networks work are very similar, so I'm a bit confused, because in that case you should expect them to have the same abilities as humans do. Sorry if I misunderstand you, but could you clarify? If human creativity is statistical modelling and GANs do statistical modelling (they kiiind of do) then we should expect GANs to be able to do everything that humans can do, no?
I can't help but feel that this would be a better fit for the fad of NFTs as well, as opposed to ugly monkeys or other asset flips that were pretty obvious cash grabs.
Either way, good luck!
The question is, would people buy, since there is a near-endless supply, and a huge difference is that these don't have a story attached to them...
Oh, not even that, since while some data ends up on the blockchain, i find the whole ownership concept a bit nebulous at best, much like how people mock the whole "oh, just right click the image to save it" thing.
What i was actually referring to was more along the lines of the collector economy idea itself - people willing to exchange their monetary resources for something that they enjoy. Having it be out of a sense of endearment and aesthetic enjoyment (waifus and husbandos) rather than the desire to flip those things (possibly a number of other NFTs or collectibles) seems to be less morally questionable to me!
While i'm not really familiar with the whole NFT space, to me it seems that projects like CryptoKitties are more agreeable and thus viable in this space, as would AI generated images of cute characters! Actually, that seems like a really good fit to me, regardless of how the actual "uniqueness" or "ownership" aspect would play out.
My views on the matter have nothing to do with being "anti-AI" or "signalling" to anyone, and the latter suggestion is mildly offensive. Like many here, while being no sort of expert I've thought quite a lot about these questions and have fairly detailed views about them. I suspect strong AI is possible and will not require new fundamental physics or anything like that. However, I suspect that if it is achieved it will be of a very different nature to GPT-3 etc and to ML in general. I don't think the behaviour demonstrated by these systems so far resembles human creativity, understanding or reasoning at all, although they are capable of impressive and fascinating things. I'd go so far as to say the differences seem, to me at least, rather obvious, if difficult to define. However, I'm open to alternative views and to the possibility I may be wrong.
I greatly enjoy discussing these sorts of things, but adversarial exchanges on hn filled with ad-hominems aren't the way I like to do it.
People were discussing having a "no machine learning clause" back when Copilot was being heavily scrutinized. I wouldn't be surprised if some artists allow republishing but not machine learning use. Plenty of artists already have a clause that prohibits any kind of republishing, and Danbooru is known to rehost some of those artists' content anyway until the artist notices and requests that it be taken down, if ever (for a time, they even allowed paid rewards from Patreon and other subscription services to be republished).
The original dataset from Danbooru probably contained some percentage of content that would not have been there if the original artist had noticed in time.
Why would it, if there are no copyright issues? No one is obligated to accept a license unless they require permission to do something which copyright would restrict. Of course redistributing the original image as part of a public dataset may be problematic, but simply using it to train an AI model—essentially the equivalent of studying it while teaching yourself to draw—is arguably not among the things covered by copyright, so you don't need a license for that and any clauses in such a license would be irrelevant.
This is also basically educational in nature, even if it's a machine rather than a human being "educated", and educational use is often exempt from copyright in some degree or another to begin with. If the dataset is restricted to non-commercial research and educational use in the right jurisdictions then even redistribution may not be an issue.
An AI could do the same.
I too like to collect things, my latest obsession was lingerie. But I always have the image of extreme hoarders in the back of my mind, which quite often scares me out of collecting too much.
I'm not scared of having too much stuff, I'm scared of not being able to notice it myself.
NFTs could ease that problem. But sadly I'm one of those people that prefer analog radio over digital. Wired over wireless.
Not, uh, that NFTs for images like this aren’t silly and largely pointless, but at least for some of them, the “there’s a server hosting the image” isn’t that much of an issue (at least provided that the person who “owns it” keeps the file locally and backed up).
Now that's a fun business idea (ethically questionable gambling related aspects aside, guess it's about how one markets that and how honest they are).