Dall-E 2

Dall-E 2 (openai.com)

1820 points by yigitdemirag 4 years ago | 485 comments

Imnimo 4 years ago |

I'm only part way through the paper, but what struck me as interesting so far is this:

In other text-to-image algorithms I'm familiar with (the ones you'll typically see passed around as colab notebooks that people post outputs from on Twitter), the basic idea is to encode the text, and then try to make an image that maximally matches that text encoding. But this maximization often leads to artifacts - if you ask for an image of a sunset, you'll often get multiple suns, because that's even more sunset-like. There's a lot of tricks and hacks to regularize the process so that it's not so aggressive, but it's always an uphill battle.

Here, they instead take the text embedding, use a trained model (what they call the 'prior') to predict the corresponding image embedding - this removes the dangerous maximization. Then, another trained model (the 'decoder') produces images from the predicted embedding.

This feels like a much more sensible approach, but one that is only really possible with access to the giant CLIP dataset and computational resources that OpenAI has.

recuter 4 years ago | |

What always bothers me with this stuff is, well, you say one approach is more sensible than the other because the images happen to come out more pleasing.

But there's no real rhyme or reason, it is a sort of alchemy.

Is text encoding strictly worse or is it an artifact of the implementation? And if it is strictly worse, which is probably the case, why specifically? What is actually going on here?

I can't argue that their results are not visually pleasing. But I'm not sure what one can really infer from all of this once the excitement washes over you.

Blending photos together in a scene in photoshop is not a difficult task. It is nuanced and tedious but not hard, any pixel slinger will tell you.

An app that accepts a smattering of photos and stitches them together nicely can be coded up any number of ways. This is a fantastic and time saving photoshop plugin.

But what do we have really?

"Koala dunking basketball" needs to "understand" the separate items and select from the image library hoops and a koala where the angles and shadows roughly match.

Very interesting, potentially useful. But if it doesn't spit out exactly what you want, you can't edit it further.

I think the next step has got to be that it conjures up a 3d scene in Unreal or blender so you can zoom in and around convincingly for further tweaks. Not a flat image.

mrandish 4 years ago | | |

> This is a fantastic and time saving photoshop plugin. But what do we have really?

Stock photography sales are in the many billions of dollars per year and custom commissioned photography is larger still. That's a pretty seriously sized ready-made market.

> But if it doesn't spit out exactly what you want, you can't edit it further.

I suspect there's a big startup opportunity in pioneering an easy-to-use interface allowing users to provide fast iterative feedback to the model - including positional and relational constraints ("put this thing over there"). Perhaps even more valuable would be easy yet granular ways to unconstrain the model. For example, "keep the basketball hoop like that but make the basketball an unexpected color and have the panda's right paw doing something pandas don't do that human hands often do."

Imnimo 4 years ago | | |

Yeah, I mean you're right that ultimately the proof is in the pudding.

But I do think we could have guessed that this sort of approach would be better (at least at a high level - I'm not claiming I could have predicted all the technical details!). The previous approaches were sort of the best that people could do without access to the training data and resources - you had a pretrained CLIP encoder that could tell you how well a text caption and an image matched, and you had a pretrained image generator (GAN, diffusion model, whatever), and it was just a matter of trying to force the generator to output something that CLIP thought looked like the caption. You'd basically do gradient ascent to make the image look more and more and more like the text prompt (all the while trying to balance the need to still look like a realistic image). Just from an algorithm aesthetics perspective, it was very much a duct tape and chicken wire approach.

The analogy I would give is if you gave a three-year-old some paints, and they made an image and showed it to you, and you had to say, "this looks a little like a sunset" or "this looks a lot like a sunset". They would keep going back and adjusting their painting, and you'd keep giving feedback, and eventually you'd get something that looks like a sunset. But it'd be better, if you could manage it, to just teach the three-year-old how to paint, rather than have this brute force process.

Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.

qq66 4 years ago | | |

I think deep learning is better thought of as "science" than "engineering." Right now we're in the stage of the Greeks and Arabs where we know "if we do this then that happens." It will be a while before we have a coherent model of it, and I don't think we will ever solve all of its mysteries.

rileyphone 4 years ago | | |

It would be interesting to see more attempts to “reverse engineer” ML models like in https://distill.pub/2020/circuits/curve-circuits - maybe even with a ML model of its own?

ehsankia 4 years ago | | |

> But there's no real rhyme or reason, it is a sort of alchemy.

Is there a rhyme or reason as to why Picasso decided to paint like that? Yes, these networks are hard to reason about, but so are real human brains.

moyix 4 years ago | | |

> But if it doesn't spit out exactly what you want, you can't edit it further.

Why? You can tweak the prompt, change parameters, or even use the actual "edit" capability that they demo in the post.

woko 4 years ago | | |

> Very interesting, potentially useful. But if it doesn't spit out exactly what you want, you can't edit it further.

DALL-E 2 spits out as many outputs as you want. Then you choose the one you prefer.

mahastore 4 years ago | | |

I wish there was something available in open source that has similar functions, i.e., a sensible amalgamation of pictures based on some text.

krick 4 years ago | |

While the whole narrative of your comment totally makes sense, I don't really see the difference between the two approaches, not on a conceptual level. You still needed to train this so called "prior" at some point (so, I'm also not sure if it's fair to call it a "prior"). I mean, the difference between your two descriptions seems to be the difference between descriptions (i.e., how you chose to name individual parts of the system), not the systems.

I'm not sure if I'm speaking clearly, I just don't understand what the difference is between training "text encoding to an image" vs "text embedding to image embedding". In both cases you have some kind of "sunset" (even though it's obviously just a dot in a multi-dimensional space, not the letters) on the left, and you try to maximize it when training the model to get either an image embedding or an image straight away.

Imnimo 4 years ago | | |

Yeah, my comment didn't really do a good job of making clear that distinction. Obviously the details are pretty technical, but maybe I can give a high-level explanation.

The previous systems I was talking about work something like this: "Try to find me the image that looks like it most matches 'a picture of a sunset'. Do this by repeatedly updating your image to make it look more and more like a sunset." Well, what looks more like a sunset? Two sunsets! Three sunsets! But this is not normally the way images are produced - if you hire an artist to make you a picture of a bear, they don't endeavor to create the most "bear" image possible.

Instead, what an artist might do is envision a bear in their head (this is loosely the job of the 'prior' - a name I agree is confusing), and then draw that particular bear image.

But why is this any different? Who cares if the vector I'm trying to draw is a 'text encoding' or an 'image encoding'? Like you say, it's all just vectors. Take this answer with a big grain of salt, because this is just my personal intuitive understanding, but here's what I think: These encodings are produced by CLIP. CLIP has a text encoder and an image encoder. During training, you give it a text caption and a corresponding image, it encodes both, and tries to make the two encodings close. But there are many images which might accompany the caption "a picture of a bear". And conversely there are many captions which might accompany any given picture.

So the text encoding of "a picture of a bear" isn't really a good target - it sort of represents an amalgamation of all the possible bear pictures. It's better to pick one bear picture (i.e. generate one image embedding that we think matches the text embedding), and then just to try to draw that. Doing it this way, we aren't just trying to find the maximum bear picture - which probably doesn't even look like a realistic natural image.

Like I said, this is just my personal intuition, and may very well be a load of crap.
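To make that intuition a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the 2D "embeddings", the helper names (`make_bear_embeddings`, `centroid`, `nearest_distance`), and the idea of treating the text encoding as a plain average are hypothetical stand-ins, not how CLIP or the Dall-E 2 prior actually work. It only shows why an "amalgamation" of many valid targets can sit far away from every real one, while picking a single target does not.

```python
import math
import random


def make_bear_embeddings(n=200, seed=0):
    """Hypothetical image embeddings for many different bear pictures.

    They are spread around the unit circle to mimic very different
    but equally valid images for the same caption.
    """
    rng = random.Random(seed)
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(math.cos(a), math.sin(a)) for a in angles]


def centroid(points):
    """Average of all points: a stand-in for the 'amalgamated' target."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_distance(target, points):
    """Distance from the target to the closest real embedding."""
    return min(math.dist(target, p) for p in points)


bears = make_bear_embeddings()
amalgam = centroid(bears)                   # "all bears at once"
one_bear = random.Random(1).choice(bears)   # "one particular bear"

print("amalgam  -> nearest real bear:", nearest_distance(amalgam, bears))
print("one bear -> nearest real bear:", nearest_distance(one_bear, bears))
```

In this toy setup the averaged target ends up near the middle of the circle, far from any actual bear embedding, whereas the single chosen bear is by construction a real one. A trained prior would of course predict an embedding rather than pick from a list; the random choice here is only a placeholder for that step.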

duxup 4 years ago | |

This isn't something I'm knowledgeable on, so forgive my simplification, but is this like a sort of microservices for AI? Each AI takes its turn handling some aspect, and another sort of mediates among them?

Imnimo 4 years ago | | |

I'd say Dall-E 2 is a little more unified - they do have multiple networks, but they're trained to work together. The previous approaches I was talking about are a lot more like the microservices analogy. Someone published a model (called CLIP) that can say "how much does this image look like a sunset". Someone else published a totally different model (e.g. VQGAN) that can generate images (but with no way to provide text prompts). A third person figures out a clever way to link the two up - have the VQGAN make an image, ask CLIP how much it looks like a sunset, and use backpropagation to adjust the image a little, repeat until you have a sunset. Each component is its own thing, and VQGAN and CLIP don't know anything about one another.
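The "make an image, score it, adjust a little, repeat" loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the "image" is just a short vector, plain cosine similarity stands in for CLIP's score, there is no generator like VQGAN at all, and the gradient is written out by hand instead of using backpropagation. The function names are made up for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Stand-in for 'how much does this image look like the prompt'."""
    return dot(a, b) / (norm(a) * norm(b))


def cosine_grad(x, t):
    """Gradient of cosine(x, t) with respect to x, derived by hand."""
    nx, nt = norm(x), norm(t)
    d = dot(x, t)
    return [ti / (nx * nt) - d * xi / (nx ** 3 * nt) for xi, ti in zip(x, t)]


def guided_ascent(x, t, steps=200, lr=0.5):
    """Repeatedly nudge x so that its score against t goes up."""
    history = [cosine(x, t)]
    for _ in range(steps):
        g = cosine_grad(x, t)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
        history.append(cosine(x, t))
    return x, history


text_target = [0.2, 0.9, 0.4]   # hypothetical "sunset" text encoding
start_image = [1.0, 0.0, 0.0]   # hypothetical starting "image"
final_image, history = guided_ascent(start_image, text_target)

print("score before:", history[0])
print("score after: ", history[-1])
```

Nothing in this loop knows what a realistic image is; it only pushes the score up, which is the part of the older approach that needed extra regularization.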

swalsh 4 years ago | |

Do you think some of these techniques could be slightly modified, and applied to DNA sequences?

snek_case 4 years ago | | |

Maybe very very short (single-gene) sequences. The thing with DNA is it's the product of evolution. The DNA guides the synthesis of proteins, then the proteins fold into a 3D shape, and they interact with chemicals in their environment based on their shape.

In the context of a living being, different genes interact with each other as well. For example, you have certain cells that secrete hormones (many genes needed to do that), then you have genes that encode for hormone receptors, and those receptors trigger other actions encoded by other genes. There's probably too much complexity to ask an AI system to synthesize the entire genetic code for a living being. That would be kind of like if I asked you to draw the exact blueprints for a fighter jet, and write all the code, and synthesize all the hardware all at once, and you only get one shot. You would likely fail to predict some of the interactions and the resulting system wouldn't work. You could only achieve this through an iterative process that would involve years of extensive testing.

Could you use a deep learning system to synthesize genetic code? Maybe just single genes that do fairly basic things, and you would need a massive dataset. Hard to say what that would look like. Is it really enough to textually describe what a gene does?

dekhn 4 years ago | | |

Probabilistic generative models have been applied to DNA and protein sequences for decades (my undergrad thesis from ~30 years ago did this and it wasn't even new at that point). The real question is what question you want to answer, and what is this system going to do better enough to justify the time investment to prove it out?
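For readers who have not seen one, a very small probabilistic generative model for DNA might look like the Python sketch below. It is a first-order Markov chain, picked here only because it is about the simplest model of this kind; it is not claimed to be what the thesis mentioned above used. The training sequences and the function names are invented for the example.

```python
import random
from collections import defaultdict


def fit_markov(sequences):
    """Count how often each base is followed by each other base."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, start, length, seed=0):
    """Generate a sequence by sampling each base given the previous one."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        successors = counts.get(out[-1])
        if not successors:
            # Base never seen with a successor: fall back to a uniform pick.
            out.append(rng.choice("ACGT"))
            continue
        bases = list(successors)
        weights = [successors[b] for b in bases]
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)


# Made-up training data, purely for illustration.
model = fit_markov(["ACGTACGT", "ACGGACGT"])
generated = sample(model, start="A", length=20)
print(generated)
```

In this made-up training data "A" is always followed by "C", so the generated sequence keeps that pattern; real models of this family are fitted to far larger datasets and usually condition on much more context.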

j7ake 4 years ago | | |

The problem is that with DNA sequences you don't have good training sets.

With text and images you can leverage “ground truth” data (verified by humans) to train your model.

For DNA sequences I would look for methods that don’t require good ground truth data.

falcor84 4 years ago |

>We’ve limited the ability for DALL·E 2 to generate ... adult images.

I think that using something like this for porn could potentially offer the biggest benefit to society. So much has been said about how this industry exploits young and vulnerable models. Cheap autogenerated images (and in the future videos) would pretty much remove the demand for human models and eliminate the related suffering, no?

EDIT: typo

minimaxir 4 years ago |

A few comments by someone who's spent way too much time in the AI-generated space:

* I recommend reading the Risks and Limitations section that came with it because it's very thorough: https://github.com/openai/dalle-2-preview/blob/main/system-c...

* Unlike GPT-3, my read of this announcement is that OpenAI does not intend to commercialize it, and that access to the waitlist is indeed more for testing its limits (and as noted, commercializing it would make it much more likely to lead to interesting legal precedent). Per the docs, access is very explicitly limited: (https://github.com/openai/dalle-2-preview/blob/main/system-c... )

* A few months ago, OpenAI released GLIDE ( https://github.com/openai/glide-text2im ) which uses a similar approach to AI image generation, but suspiciously never received a fun blog post like this one. The reason for that in retrospect may be "because we made it obsolete."

* The images in the announcement are still cherry-picked, which is a good reason why they compared DALL-E 1 vs. DALL-E 2, presumably on non-cherry-picked images.

* Cherry-picking is relevant because AI image generation is still slow unless you do real shenanigans that likely compromise image quality, although OpenAI likely has better infra to handle large models, as they have demonstrated with GPT-3.

* It appears DALL-E 2 has a fun endpoint that links back to the site for examples with attribution: https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu

andybak 4 years ago |

Some freely available models

GLID-3: https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...

and a new Latent Diffusion notebook: https://colab.research.google.com/github/multimodalart/laten...

have both appeared recently and are getting remarkably close to the original Dall-E (maybe better as I can't test the real thing...)

So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...

fbanon 4 years ago |

A friend of mine was studying graphic design, but became disillusioned and decided to switch to frontend programming after he graduated. His thesis advisor said he should be cautious, because automation/AI will soon take the jobs of programmers, implying that graphic design is a safer bet in this regard. Looks like his advisor is a few years from being proven horribly wrong.

6gvONxR4sf7o 4 years ago |

This is a niche complaint, but I get frustrated at how imprecise OpenAI's papers are. When they describe the model architecture, it's never precise enough to reproduce exactly what they did. I mean, it pretty much never is in ML papers[0], but OpenAI's bigger products are worse than average with it. And it makes sense, since they're trying to be concise and still spend time on all the other important stuff besides methods, but it still frustrates me quite a bit.

[0] Which is why releasing your code is so beneficial.

gwern 4 years ago | |

They've added some more details to the paper.

aChrisSmith 4 years ago |

I can see how this has the potential to disrupt the games industry. If you work on a AAA title, there is a small army of artists making 19 different types of leather armor. Or 87 images of car hubcaps.

Using something like this could really help automate or at least kickstart the more mundane parts of content creation. (At least when you are using high resolution, true color imagery.)

killerstorm 4 years ago | |

This thing can't do 3D models.

There are some 3D image generation techniques, but they aren't based on polygonal modeling, so 3D artists are safe for now.

pwillia7 4 years ago | | |

You could train a model on texture image data though, no?

Or what about even generating images you could then photogrammetry into models?

refulgentis 4 years ago | | |

Colab notebooks can do mesh models using this method, I'm certain OpenAI isn't far away

andybak 4 years ago |

  Preventing Harmful Generations

  We’ve limited the ability for DALL·E 2 to generate violent, 
  hate, or adult images. By removing the most explicit content 
  from the training data, we minimized DALL·E 2’s exposure to 
  these concepts. We also used advanced techniques to prevent 
  photorealistic generations of real individuals’ faces, 
  including those of public figures.

"And we've also closed off a huge range of potentially interesting work as a result"

I can't help but feel a lot of the safeguarding is more about preventing bad PR than anything. I wish I could have a version with the training wheels taken off. And there's enough other models out there without restriction that the stories about "misuse of AI" will still circulate.

(side note - I've been on HN for years and I still can't figure out how to format text as a quote.)

agloeregrets 4 years ago |

The most interesting item to me is the variations on the garden shop and bathroom sink idea. The realism of these exposes the AI's lack of intuition about the requirements. This makes for a number of nonsensical designs that look right at first, like this sink, which lacks sensible faucets: https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...

This doorway is downright impossible https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...

momojo 4 years ago | |

Great point. When I saw the shadows and reflections, I thought it had developed a primitive understanding of physical logic. Now I'm not so sure.

At this point, it still seems like it's pushing pixels around until it's "good enough" when you squint at it.

Spinnaker_ 4 years ago | |

"Doorway in the style of Escher"

dqpb 4 years ago | |

It looks to me like the faucet sprays water sideways toward the bowl, which is genius, because then you aren’t bumping up against it when you’re washing your hands!

KevinGlass 4 years ago |

Something about this makes me nauseous. Perhaps it is the fact that soon the market value for creatives is going to fall to a hair above zero for all but the most famous. We will be all the poorer for it when 95% of images you see are AI generated. There will be niches of course but in a few short years it'll be over for a huge swathe of creative professionals who are already struggling.

Some of the images also hit me with a creep factor, like the bears on the corgis in the art gallery, but that may be only because I know it's AI generated.

nope96 4 years ago |

Is there an 'explain it like I'm 15' for how this works? It seems like black magic. I've been a computer hobbyist since the late 1980's and this is the first time I cannot explain how a computer does what it does. Absolutely the most amazing thing I've ever seen, and I have zero clue how it works.

smusamashah 4 years ago |

This is mind blowing. I was not expecting the sketch style images to actually look like sketches. Style transfer based sketches never look like sketches.

This and the current AI generated art scene make it look like artwork is now a "solved" problem. See AI generated art on twitter etc.

There is a strong relation between the prompt and the generated images but just like GPT-3, it fails to fully understand what was being asked. If you take the prompt out of the equation and see the generated artwork on its own, it's up to your interpretation just like any artwork.

andreyk 4 years ago | |

I would caution that artwork is only 'solved' with relatively simple text prompts. To create a novel painting with a precise mix of elements that would take a paragraph or more to explain is still tough, though DALL-E 2 does seem like a big step towards that.

sillysaurusx 4 years ago | | |

Sam seems to be demoing something fairly close on twitter. https://twitter.com/sama/status/1511724264629678084

The solar powered ship with a propeller sailing under the golden gate bridge during sunset with dolphins jumping around was pretty impressive. https://twitter.com/sama/status/1511731259319349251

I think it's only missing the dolphins.

nahuel0x 4 years ago | | |

Also note you can make an image out of many spatially localized prompts combined, in an iterative AI-human process.

randomsearch 4 years ago | |

I'm blown away by these results, but one caveat here: the AI is great at creating illustrations, not art.

Creating great _art_ that Grayson Perry (for example) would recognise as such is probably AGI-complete, because it requires a deep understanding of the human condition, society, and a lot of reasoning skills.

A great artist could certainly use Dall-E 2 as part of their method, though.

smusamashah 4 years ago | | |

If I showed you a generated piece and didn't tell you what prompt generated it, you would find it just as meaningful as any other piece of art made by hand.

This is why we are blown away by some pieces of text generated by GPT-3 as if it has its own mind. Even most abstract art has meaning for anyone who is looking for it.

What I am saying is if a generated artwork is indistinguishable from what a human can make, then that's all that was needed.

zackmorris 4 years ago |

Apologies for an open-ended question but: does anyone know if there is a term for something like Turing-completeness within AI, where a certain level of intelligence can simulate any other type of intelligence like our brains do?

For example, using DeMorgan's theorem, we can build any logic circuit out of all NAND or NOR gates:

https://www.electronics-tutorials.ws/boolean/demorgan.html

https://en.wikipedia.org/wiki/NAND_logic

https://en.wikipedia.org/wiki/NOR_logic
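As a quick illustration of the NAND claim, here is a short Python sketch that builds NOT, AND, OR, and XOR from a single `nand` function and checks them against Python's own boolean operators. The helper names are made up for this example; the constructions themselves are the standard ones.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    """The only primitive gate used below."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    # De Morgan: a OR b == NOT (NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor_(a: bool, b: bool) -> bool:
    # Classic four-NAND XOR construction.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


# Check every input combination against the built-in operators.
for a, b in product([False, True], repeat=2):
    assert and_(a, b) == (a and b)
    assert or_(a, b) == (a or b)
    assert xor_(a, b) == (a != b)
    assert not_(a) == (not a)
```

The same exercise works with NOR as the only primitive, which is the sense in which either gate is "universal" for logic circuits.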

Dall-E 2's level of associative comprehension is so far beyond the old psychology bots in the console pretending to be people, that I can't help but wonder if it's reached a level where it can make any association.

For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.

I guess fundamentally my question is, when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2? Personally I give it less than 10 years of actual work, maybe less. I just mean that to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton to free humans from labor. So how can we adapt these AI experiments to get real work done?

croddin 4 years ago |

This reminds me of the holodeck in Star Trek. Someone could walk into the Holodeck and say “make a table in the center of the room. Make it look old.” It seemed amazing to me that the computer could make anything and customize it with voice. We are pretty close to star trek technology now in computer ability (ship’s computer, not Commander Data). I guess to really be like the holodeck it needs to be able to do 3d and be in real time but that seems a lot closer now. It will be cool when this could be in VR and we can say make an astronaut riding a horse, then we can jump on the back of the horse and ride to a secret moon base.

marviel 4 years ago |

It's becoming clear that efficient work in the future will hinge upon one's ability to accurately describe what one wants. Unpacking that -- a large piece is the ability to understand all the possible "pitfalls" and "misunderstandings" that could happen on the way to a shared understanding.

While technical work will always have a place -- I think that much creative work will become more like the management of a team of highly-skilled, niche workers -- with all the frustrations, joys, and surprises that entails.

armchairhacker 4 years ago | |

Programming, art, music, is just “describing what you want” in a very specific way. This is describing what you want in a much more vague way.

The upside is that it’s more “intuitive” and requires much less detail and technique, as the AI infers the detail and technique. The downside is that it’s really hard to know what the AI will generate or get it to generate something really specific.

I believe the future will combine the heuristics of AI-generation with the specificity of traditional techniques. For example, artists may start with a rough outline of whatever they want to draw as a blob of colors (like in some AI image-generation papers). Then they can fill in details using AI prompts, but targeting localized regions/changes and adding constraints, shifting the image until it’s almost exactly what they imagined in their head.

killerstorm 4 years ago | |

No... These models are trained to predict.

You can definitely make them incremental. You can give it a task like "make a more accurate description from initial description and clarification". Even GPT-3-based models available today can do these tasks.

Once this is properly productionized it would be possible to implement stuff just talking with a computer.

golergka 4 years ago | |

> accurately describe what one wants

Isn't that essentially what programming already is?

apexalpha 4 years ago |

I would probably pay good money to have an OLED painting in my house that I can just tell what kind of painting to generate each day.

Imagine waking up and telling your (preferably locally hosted) voice assistant that today really feels like a Rembrandt day and the AI just generates new paintings for you.

zitterbewegung 4 years ago |

I don’t want to dismiss this new model and its achievements, but I feel like the divide we saw between open source and closed source systems is now forming again for ML models, between open and closed ones. I think that larger and larger models will either come with disclaimers restricting you from using them commercially (a great deal of academic and NVIDIA models are doing this), or, as with OpenAI, just be put behind an API with rules:

Curbing Misuse: Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.

asxd 4 years ago | |

They're pretty strict about usage:

- https://github.com/openai/dalle-2-preview/blob/main/system-c...

jdrc 4 years ago | |

It should be possible to create open source versions, researchers will find a way if something is cool enough

impostervt 4 years ago |

Very cool stuff. For me, the most interesting was the ability to take a piece of art and generate variations of it.

Have a favorite painter? Here's 10,000 new paintings like theirs.

photochemsyn 4 years ago | |

Well, one of my favorite painters is Henri Rousseau, and one of his great paintings is War, 1894:

https://www.henrirousseau.net/war.jsp

However, this painting has themes of violence and politics plus some nude dead bodies, so it violates the content policy: "Our content policy does not allow users to generate violent, adult, or political content, among other categories."

So what you'd get is some kind of sanitized watered-down tepid version of Rousseau, the kind of boring drivel suitable for corporate lobbies everywhere, guaranteed not to offend or disturb anyone. It's difficult to find words... horrific? dystopian? atrocious? No, just no.

corysama 4 years ago | | |

They are being rightly cautious. It’s going to take time to figure out good practice with these tools. Everyone calling out basic caution as “dystopian” is really over the top.

I’ve been using tools like this for over a year now. Even with filtered dataset and filtered interface, they can make images that would make the Fangoria crowd blush if you put the slightest effort into it.

It’s one thing to be able to make brain-wrenching images with a lot of photoshop effort (or digging hard enough in the dark corners of the internet). It’s another thing entirely to give anyone the ability to spew out thousands of them trivially.

cwillu 4 years ago | | |

“Criticize?! It is meant to draw blood! It is Art! Art!”

pingeroo 4 years ago | |

That was also my favourite concept, especially with OpenAI Jukebox (https://openai.com/blog/jukebox/). The idea of having new music in the style of your favourite artist is amazing.

However the fidelity of their music AI kinda sucks at this point, but I'm sure we'll get pitch perfect versions of this concept as the singularity gets closer :)

throwaway675309 4 years ago | |

I was just thinking the same thing, how awesome would it be to be able to use this in conjunction with the Samsung frame in art gallery mode and have it just generate novel paintings in the style of your favorite painters.

bakztfuture 4 years ago |

I made a YouTube series last summer on the massive potential future of DALL-E and multimodal AI models.

Imagine not just DALL-E 2 but a single model which could be trained on different kinds of media and generate music, images, video and more.

The series talks about:

- essential lessons for AI creatives of the future

- shares details on how to compete creatively in the future

- talks about how to make money through Multimodal AI

- makes predictions about AI’s effects on society

- at a very basic level, discusses the ethics of multimodal AI and the philosophy of creativity itself

By my understanding, it's the most comprehensive set of videos on this topic.

The series is free to watch entirely on YouTube: GPT-X, DALL-E, and our Multimodal Future https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...

axg11 4 years ago |

This is incredible work.

From the paper:

> Limitations

> Although conditioning image generation on CLIP embeddings improves diversity, this choice does come with certain limitations. In particular, unCLIP [Dall-E 2] is worse at binding attributes to objects than a corresponding GLIDE model.

The binding problem is interesting. It appears that the way Dall-E 2 / CLIP embeds text leads to the concepts within the text being jumbled together. In their example "a red cube on top of a blue cube" becomes jumbled and the resulting images are essentially: "cubes, red, blue, on top". Opens a clear avenue for improvement.
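A deliberately extreme toy version of this jumbling is a bag-of-words encoding, sketched below in Python. To be clear about the assumptions: this is not how CLIP encodes text, and the function names are invented for the example. It only shows what it means for an encoding to keep the concepts ("cubes, red, blue, on top") while losing which attribute belongs to which object.

```python
from collections import Counter


def bag_of_words(prompt: str) -> Counter:
    """Order-free encoding: only which words occur, and how often."""
    return Counter(prompt.lower().split())


def ordered_tokens(prompt: str) -> tuple:
    """Order-preserving encoding, for contrast."""
    return tuple(prompt.lower().split())


prompt_a = "a red cube on top of a blue cube"
prompt_b = "a blue cube on top of a red cube"

# Same words, same counts: the bag cannot tell the two scenes apart.
print(bag_of_words(prompt_a) == bag_of_words(prompt_b))

# Keeping word order is enough to distinguish them.
print(ordered_tokens(prompt_a) == ordered_tokens(prompt_b))
```

A real text encoder sits somewhere between these two extremes, and the limitation quoted from the paper suggests that in practice some of the attribute-to-object binding is lost along the way.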

duren 4 years ago |

I've been playing around with it today and have been super impressed with its ability to generate pretty artful digital paintings. Could have big implications for designers and artists if and when they allow you to use custom palettes, etc.

Here's an example from my prompt ("a group of farmers picking lettuce in a field digital painting"): https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G

Sateeshm 4 years ago | |

Honestly, that painting is nonsensical. It's great at a glance. But when you look at it for a few seconds, it's just impressionist type blob painting without any features that make impressionist paintings great.

pingeroo 4 years ago | |

Neat! Were you part of the initial testing batch or granted access via waitlist?

duren 4 years ago | | |

I was invited

d--b 4 years ago |

Am I the only one to think that the AI world is divided into 2 groups:

1. DeepMind, who solved Go and protein folding, and who seem to be really onto something.

2. Everyone else, spending billions to build machines that draw astronauts on unicorns, and smartish bot toys.

Dig1t 4 years ago |

Most of the conversation around this model seems to be about its direct uses.

This seems to me like a big step towards AGI; a key component of consciousness seems (in my opinion) to be the ability to take words and create a mental picture of what's being described. Is that the long term goal WRT researching a model like this?

nahuel0x 4 years ago |

"Any sufficiently advanced technology is indistinguishable from magic"

7373737373 4 years ago | |

"Any sufficiently advanced hyperreality is indistinguishable from real life"

uses 4 years ago |

Is anyone looking into what it means when we can generate infinite amounts of human-like work without effort or cost?

> Curbing Misuse [...]

That's great, nowadays the big AI is controlled by mostly benevolent entities. How about when someone real nasty gets a hold of it? In a decade the models anyone can download will make today's GPT-3 etc look like pong right?

Recommender systems etc are already shaping society and culture with all kinds of unintended effects. What happens when mindless optimizing models start generating the content itself?

turdnagel 4 years ago |

I'm genuinely curious to hear Sam Altman's (and/or the OpenAI team's) perspective on why these products need to be waitlisted. If it's a compute issue, why not build a queuing system? If it's something else (safety related? hype related?) I'd love to understand the thinking behind the decision. More often than not, I sign up for waitlists for things like this and either (1) never get in to the beta or (2) forget about it when I eventually do get in.

minimaxir 4 years ago | |

For GPT-3 it was a combination of both compute and safety. Given the notes in the System Card (https://github.com/openai/dalle-2-preview/blob/main/system-c... ), OpenAI is likely doubling-down on safety here.

krick 4 years ago |

Regardless of how much cherry-picking there was, some of these pictures are just beautiful.

hwers 4 years ago |

The correct response here from the artists' point of view should be a widespread coming together against their art being used as training data for ML models. With a quickly spread new license on most major art submission sites that explicitly forbids AI algorithms from using their work, artists would effectively stop OpenAI and others from using the artists' own works to put them out of a job.

w-m 4 years ago | |

The license should forbid competing artists from using the artist’s work as well. In fact, no human should come in contact with the produced art, otherwise they might be accidentally inspired by it, thus stealing from the original creator.

nonbirithm 4 years ago | |

There has been precedent for such a movement. In 2011, an "art collective" sourced user-submitted artwork without the artists' consent for an installation where visitors were instructed to step all over printouts of the art on the floor. The artists complained that their work was being used inappropriately. A large number of those artists left for other art websites en masse.[0]

There doesn't seem to be an equivalent movement with AI-generated art, probably because the understanding of how the models are trained from large datasets is not mainstream yet. I would imagine thousands of those same artists/consumers would be up in arms if they had a basic understanding of ML and millions of average people were beginning to feed the models their own keywords.

This I think ties in with the "responsibility" principles that OpenAI outlines. Once the generation technique has been reverse-engineered and can be used without limits, there is no way to uninvent it. It can be made illegal, but humans can always find a way around laws if they want something badly enough. This could have drastic consequences if enough artists believe that the training violates their respect or other intangible humanistic qualities. With technological advancement that can never be put back in the bottle and spreads to occupy the entire consciousness of the Internet, their options for recourse will be far different than being able to tell a single fringe art group siphoning others' content to pack up and leave.

[0] https://en.wikipedia.org/wiki/Pixiv#Chaos_Lounge

eganist 4 years ago |

The timing of the Dall-E 2 launch an hour ago seems to correspond with a recent piece of investigative journalism by Buzzfeed News about one of Sam Altman's other ventures, published 15 hours ago and discussed elsewhere actively on HN right now:

https://news.ycombinator.com/item?id=30931614

I point this out because while Dall-E 2 seems interesting (I'm out of my depth, so delegating to the conversation taking place here), the timing of its release as well as accompanying press blasts within the last hour from sites like TheVerge—verified via wayback machine queries and time-restricted googling—seems both noteworthy and worth a deeper conversation given what was just published about Worldcoin.

To be clear, it's worth asking if Dall-E 2 was published ahead of schedule without an actual product release (only a waitlist) to potentially move the spotlight away from Worldcoin.

rvz 4 years ago |

At this point with WaveNet, GPT-3, Codex, DeepFakes and Dall-E 2, you cannot believe anything you see, hear, watch, read on the internet anymore as an AI can easily generate nearly anything that can be quickly believable by millions.

The internet's own proverb has never been more important to keep in mind. A dose of skepticism is a must.

Traster 4 years ago |

To be honest the Girl with a Pearl Earring "variations" look a little bit like a crime against art. It's like the person who built this has no idea why the Girl with a Pearl Earring is good art. "Here's the Girl with a Pearl Earring" - "OK, well here's some girls with turbans"

Art is truth.

eks391 4 years ago | |

> It's like the person who built this has no idea why the Girl with a Pearl Earring is good art.

The people didn't program Dall-E how to make art. They taught it to recognize patterns and create something by extrapolating from those patterns, all on its own. So the AI isn't a projection of what they think is good art; it's projecting what it thinks is good art, based on a prompt. The output is its best effort at a feeling, even if the feeling had to be input by a living person. So it's still art that's as good as the feeling it came from, with fleeting feelings being lower quality than those that required more time and thought.

sillysaurusx 4 years ago | |

Maybe. https://cdn.openai.com/dall-e-2/demos/variations/modified/gi... was pretty impressive.

I think the results are being poisoned by the fact that most old paintings have deteriorated colors, so the training data looks nothing like the originals. It's certainly a lot yellower than https://cdn.openai.com/dall-e-2/demos/variations/originals/g...

bbbobbb 4 years ago | |

To be honest, it's hard for me to imagine an alternate reality where the 'original' is swapped with one of the 'variations' and the same comment doesn't appear underneath. Why is the 'original' good art?

whateveracct 4 years ago | |

Yeah Dall-E makes stuff that qualifies as art to the eye of a philistine. Makes sense it was built with VC money.

cm2012 4 years ago |

Sam's Twitter thread today was more impressive than the website.

https://twitter.com/sama/status/1511724264629678084?s=20&t=6...

bradgessler 4 years ago |

Could somebody build this for SVG icons? I’d invest in it.

applgo443 4 years ago | |

What do you want?

bradgessler 4 years ago | | |

A library like https://icons8.com/icons where you can just tell it what icon you want and the style (e.g. Material, outline, solid, iOS). It would do its thing and spit it out.

dang 4 years ago |

Related and kind of fun:

Sam Altman demonstrates Dall-E 2 using twitter suggestions - https://news.ycombinator.com/item?id=30933478 - April 2022 (3 comments)

_nateraw 4 years ago |

If you're interested in generative models, Hugging Face is putting on an event around generative models right now called the HugGAN sprint, where they're giving away free access to compute to train models like this.

You can join it by following the steps in the guide here: https://github.com/huggingface/community-events/tree/main/hu...

There will also be talks from awesome folks at EleutherAI, Google, and DeepMind.

zone411 4 years ago |

Some more examples: https://twitter.com/sama/status/1511724264629678084

jdrc 4 years ago | |

there are some masterpieces there. this is the end of clipart and stock images, and the beginning of awesome illustrations in every article.

frakkingcylons 4 years ago |

Impressive results no doubt, but I’m reserving judgment until beta access is available. These are probably the best images that it can generate, but what I’m most interested in is the average case.

awinter-py 4 years ago |

They're using training set restriction and prompt engineering to control its output

> By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts

> We won’t generate images if our filters identify text prompts and image uploads that may violate our policies

The 'how to prevent superintelligences from eating us' crowd should be taking note: this may be how we regulate creatures larger than ourselves in the future

And even how we regulate the ethics of non-conscious group minds like big companies

albertzeyer 4 years ago |

Some initial video by Yannic Kilcher: https://www.youtube.com/watch?v=gGPv_SYVDC8

marcodiego 4 years ago |

Cartoonists, say good-bye to your job.

Imnimo 4 years ago | |

Maybe one day there will be a job for people who are masters of the art of prompt hacking - they know all the special phrases and terms to get Dall-E to output the most aesthetically pleasing images. They guard their magic words like a medieval alchemist guards his formulas. Corporations will pay top-dollar for an expertly-crafted, custom-tailored prompt for their advertising campaign.

andybak 4 years ago | |

The goalposts are definitely being moved. But tastes adapt accordingly.

I suspect trends in design will move towards those areas that AI struggles with (assuming there are any left!)

rvz 4 years ago | |

NFTs using Dall-E 2 variations incoming.

loufe 4 years ago | | |

Not that it's impossible to hide the provenance of an image, but it is explicitly forbidden in the TOS of DALL-E to sell the images as NFTs or otherwise.

criddell 4 years ago | |

Randall Munroe should quit now. Soon anybody will be able to create XKCD-type comics.

jedberg 4 years ago |

This reminds me of a discussion I had with the high school band teacher in the 90s. I was telling him that one day computers would play music and you won't be able to tell the difference. He got mad at me and told me that a computer could never play as well as a human with feelings, who can feel the piece and interpret it.

I think we passed that point a while ago, but seeing this makes me think we aren't too far off from computers composing pieces that actually sound good too.

billconan 4 years ago |

I'm curious, is this feasible to train (and run inference with) on a consumer-level machine, or is this something that can only be done by institutes?

tnzk 4 years ago |

In the thread where Sam Altman is giving a demo of this [*], I see multiple people trying to query "solar panels" or "rabbit". Are they some meme in the context of AI-generated art?

[*] https://twitter.com/sama/status/1511724264629678084

greyhair 4 years ago |

Interesting, yes, but I went to the link and browsed the 'generated artwork', and all of it was subjectively inferior to the original it was generated from. Every single piece. So I am not sure what the 'value' in it is, at this stage.

As for the text-driven generation, I would have to try some prompts that aren't pre-canned to see how useful it is.

mario143 4 years ago |

Yeah, I mean you're right that ultimately the proof is in the pudding.

Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.

blingbleng 4 years ago |

I am disappointed to hear it wasn't released, but what disappointed me more is that people actually approve this decision. Seriously? We shouldn't teach people how to write because that can be abused, can be used to transfer malicious ideas. Sounds absurd? So does limiting people's access to AI tools.

mouzogu 4 years ago |

So what does the future of human creativity look like when an AI can generate possibly infinite variations of an idea?

andreyk 4 years ago | |

AI becomes a tool for artists to use - generative art has been around for a long time, now that particular genre of art will presumably become much more prominent.

For anyone pondering such questions, I would recommend reading "The Past, Present, and Future of AI Art" - https://thegradient.pub/the-past-present-and-future-of-ai-ar...

pingeroo 4 years ago | | |

Wouldn't it be more like, "AI becomes an artist for people to use"? Will we have people distinguished as "artists" if the ability to make awesome art becomes available to everybody?

keiferski 4 years ago | |

I think you’ll see more of a focus on the artist themselves. These images are nice, but they have basically zero narrative value.

This is really already the case, actually. Most artworks have “value” because they have a compelling narrative, not because they look pretty. So I think we can expect future artists to really emphasize their background, life story, process of making the art, etc. All things that cannot be done by a machine.

6gvONxR4sf7o 4 years ago | |

I expect that interactive art will be huge. Game design gets fascinating, for example.

tomrod 4 years ago | |

I seem to recall an XKCD that I cannot find, but the premise goes like:

When you have a digital display of pixels, if you randomly color pixels at 24 fps then you will eventually display every movie that can be or will ever be made, powerset notwithstanding. This can also be tied to digital audio.

In short, while mind-blowingly large, the space of display through digital means is finite.
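A back-of-the-envelope sketch of just how large that finite space is. The resolution (1920x1080), 24-bit color depth, and two-hour runtime are assumptions picked for illustration; only the 24 fps figure comes from the comment above.

```python
import math


def decimal_digits_of_power_of_two(bits: int) -> int:
    """Number of decimal digits in 2**bits, computed without building the number."""
    return math.floor(bits * math.log10(2)) + 1


# Assumed display parameters (illustrative, not from the thread).
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 24          # 8 bits each for red, green, blue
FPS = 24                     # frame rate mentioned in the comment
MOVIE_SECONDS = 2 * 60 * 60  # assumed two-hour film

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
frames_per_movie = FPS * MOVIE_SECONDS

# Every distinct frame is one of 2**bits_per_frame bit patterns;
# every distinct silent movie is one of 2**(bits_per_frame * frames_per_movie).
frame_digits = decimal_digits_of_power_of_two(bits_per_frame)
movie_digits = decimal_digits_of_power_of_two(bits_per_frame * frames_per_movie)

print(f"possible frames: a number with {frame_digits:,} decimal digits")
print(f"possible movies: a number with {movie_digits:,} decimal digits")
```

Finite, then, but the count of single frames alone already has millions of digits, so "eventually" is doing a lot of work in the premise.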

mouzogu 4 years ago | | |

Sounds a bit like The Library of Babel by Jorge Luis Borges. I imagine most of the videos would be complete random nonsense.

I think an AI-infused future is going to become increasingly absurd and surreal; it will lead to a kind of creative and cultural nihilism, if that's the right term.

Like the value of originality will become meaningless.

mycroftiv 4 years ago |

I tried to comment here previously, but I don't see it posted. It was about the meaning of 'open' and whether the question of suffering and freedom of the AIs was being taken into ethical consideration, not just the ability of humans to use them as tools for their own possibly paper-clippy purposes.

visarga 4 years ago | |

This is a tool AI, not an agent AI. Agent AIs can explore and change their environments; this model does not, and it has no sense of time passing.

bprasanna 4 years ago |

The NFT world is mostly filled with modern art forms which are never seen. If Dall-E can make such images out of the box in seconds, then it looks like AIs can take the NFT world by storm. Maybe it's already happening, and I just didn't know!

mrfusion 4 years ago |

Is this bringing us closer to combining image and language understanding within one model?

beernet 4 years ago | |

Check out MAGMA for that: https://news.ycombinator.com/item?id=30699776

mycroftiv 4 years ago |

My main question is - is this really 'open' meaningfully? And are concepts of kindness and freedom being applied to the minds inside the boxes? I don't know where the 'openai' brand is at on these things personally.

krageon 4 years ago |

This is really cool, but before you may use it you must give out your name and a phone number. I was almost taken in by it, but OpenAI is and probably always will be invasive and overbearing. It's really a shame.

latexr 4 years ago |

What confusing pricing[1]:

> Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.

Further down, in the FAQ[2]:

> For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.

> To learn more about how tokens work and estimate your usage…

> Experiment with our interactive Tokenizer tool.

And it goes on. When most questions in your FAQ are about understanding pricing—to the point you need to offer a specialised tool—perhaps consider a different model?

[1]: https://openai.com/api/pricing/

[2]: https://openai.com/api/pricing/#faq-token
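For what it's worth, the quoted rules of thumb reduce to a couple of divisions. A minimal sketch, assuming only the ratios quoted above (1 token is about 4 characters or 0.75 words); the function names and the price used in the example are made up for illustration and are not OpenAI's actual API or rates.

```python
WORDS_PER_TOKEN = 0.75   # from the quoted FAQ
CHARS_PER_TOKEN = 4      # from the quoted FAQ


def tokens_from_words(word_count: int) -> float:
    """Rough token estimate for English text from a word count."""
    return word_count / WORDS_PER_TOKEN


def tokens_from_chars(char_count: int) -> float:
    """Rough token estimate for English text from a character count."""
    return char_count / CHARS_PER_TOKEN


def estimated_cost(tokens: float, price_per_1k_tokens: float) -> float:
    """Cost when prices are quoted per 1,000 tokens."""
    return tokens / 1000 * price_per_1k_tokens


# The FAQ's own reference point: about 900,000 words of Shakespeare.
shakespeare_tokens = tokens_from_words(900_000)

# HYPOTHETICAL price, chosen only to show the arithmetic.
MADE_UP_PRICE_PER_1K = 0.02
shakespeare_cost = estimated_cost(shakespeare_tokens, MADE_UP_PRICE_PER_1K)

print(f"{shakespeare_tokens:,.0f} tokens, about ${shakespeare_cost:.2f} at the made-up rate")
```

The estimate lands on the FAQ's 1.2M tokens for Shakespeare, so the two quoted statements are at least consistent with each other.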

pingeroo 4 years ago | |

This is for their GPT models, not Dall-E. I don't think they have released any pricing information for Dall-E yet, as it is still in waitlist mode.

belval 4 years ago | |

Haven't read the paper, but they are probably using something like sentencepiece with sub-word splitting and then charging by the number of resulting tokens.

https://github.com/google/sentencepiece

narrator 4 years ago |

While we're being distracted by endless social media and meaningless news, AI technology is advancing at a mind blowing pace. I'd keep my eye on that ball instead of "the current thing."

The_rationalist 4 years ago | |

Thank you narrative voice

imperio59 4 years ago |

What happens when they train this thing to make videos? We're about to be dealing with a flood of AI-generated visual/video content. We already have to deal with text bots everywhere... wow.

eks391 4 years ago | |

I'm excited for when that happens. I didn't think of the malicious uses, which now that you brought it up I can think of many, but I still think the pros are worth the cons

thisistheend123 4 years ago |

This is what magic looks like.

Great work.

Looking forward to when they start creating movies from scripts.

arecurrence 4 years ago |

Is there a geometric model related to this? E.g. "corgi near the fireplace", but the output is a 3D model of the corgi and fireplace with shaders rather than an image.

majidmir 4 years ago | |

Wait until you see the same concept combined with the NeRF idea. The output won’t be 3D shapes but another model that can generate realistic and geometrically consistent images of a scene viewed from different angles.

cerol 4 years ago |

Maybe this will be what finally puts an end to the whole art NFT shenanigans. A piece of art isn't so unique if there are infinite slight variations on the market.

gallerdude 4 years ago |

This is extremely interesting. We’ve had some amazing AI models come out in the past few days. We’re getting closer and closer to AI becoming a facet of everyday life.

rndphs 4 years ago |

This is going to be mostly a rant on OpenAI's "safer than thou" approach to safety, but let me start by saying that I think this technology is really cool, amazing, powerful stuff. Dall-E (and Dall-E 2) is an incredible advance over GANs, and no doubt will have many positive applications. It's simply brilliant. I am someone who has been interested in and has followed the progress of ML generated images for nearly a decade. Almost unimaginable progress has been made in the last five years in this field.

Now the rant:

I think if OpenAI genuinely cared about the ethical consequences of the technology, they would realise that any algorithm they release will be replicated in implementation by other people within some short period of time (a year or two). At that point, the cat is out of the bag and there is nothing they can do to prevent abuse. So really all they are doing is delaying abuse, and in no way stopping it.

I think their strong "safety" stance has three functions:

1. Legal protection

2. PR

3. Keeping their researchers' consciences clear

I think number 3 is dangerous because researchers are put under the false belief that their technology can or will be made safe. This way they can continue to harness bright minds that no doubt have ethical leanings to create things that they otherwise wouldn't have.

I think OpenAI are trying to have the cake and eat it too. They are accelerating the development of potentially very destructive algorithms (and profiting from it in the process!), while trying to absolve themselves of the responsibility. Putting bandaids on a tumour is not going to matter in the long run. I'm not necessarily saying that these algorithms will be widely destructive, but they certainly have the potential to be.

The safety approach of OpenAI ultimately boils down to gatekeeping compute power. This is just gatekeeping via capital. Anyone with sufficient money can replicate their models easily and bypass every single one of their safety constraints. Basically they are only preventing poor bad actors, and only for a limited time at that.

These models cannot be made safe as long as they are replicable.

To produce scientific research requires making your results replicable.

Therefore, there is no ability to develop abusable technology in a safe way. As a researcher, you will have blood on your hands if things go wrong.

If you choose to continue research knowing this, that is your decision. But don't pretend that you can make the algorithms safer by sanitizing models.

visarga 4 years ago | |

OpenAI is not the only AI shop. If they didn't make DALL-E someone else would, and control its release as they see fit.

kovek 4 years ago |

One of my teachers once said “An art piece is never done”. So I wonder what it could mean for the model to keep making improvements to the piece.

chronolitus 4 years ago | |

IIRC that's how it works! it starts from a first image, and improves it until 'satisfied' that the result fits the prompt

Ftuuky 4 years ago |

What jobs will be there in 5~10 years when we consider all the progress done with Dall-E, GPT-3, Codex/GitHub Copilot, Alpha* and so on?

6gvONxR4sf7o 4 years ago | |

Things that require understanding of causation will be safe longer. Progress like this is driven by massive datasets. Meanwhile, real world action-taking applications require different paradigms to take causation into account[0][1], and especially to learn safely (e.g. learning to drive without crashing during the beginner stages).

There's certainly research happening around this, and RL in games is a great test bed, but people choosing actions will be safe from automation longer than people not choosing actions, if that makes sense. It's the person who decides "hire this person" vs the person who decides "I'll use this particular shade of gray."

[0] The best example is when X causes Y and X also causes Z, but your data only includes Y and Z. Without actually manipulating Y, you can't see that Y doesn't cause Z, even if it's a strong predictor.

[1] Another example is the datasets. You need two different labels depending on what happens if you take action A or B, which you can't have simultaneously outside of simulations.
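Footnote [0] can be sketched as a small simulation. This is a toy under stated assumptions: the linear-Gaussian setup, the noise levels, and the sample size are all made up for illustration. X causes both Y and Z; in observational data Y predicts Z well, but once Y is set by hand (an intervention), the relationship disappears.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hidden common cause X drives both Y and Z (assumed toy model).
x = [rng.gauss(0, 1) for _ in range(n)]
y_observed = [xi + rng.gauss(0, 0.5) for xi in x]
z = [xi + rng.gauss(0, 0.5) for xi in x]

# Intervention: we choose Y ourselves, ignoring X. Because Y does not
# cause Z, Z is generated exactly as before.
y_intervened = [rng.gauss(0, 1) for _ in range(n)]

observational = pearson(y_observed, z)
interventional = pearson(y_intervened, z)

print(f"corr(Y, Z) in observed data:   {observational:.2f}")
print(f"corr(Y, Z) after setting Y:    {interventional:.2f}")
```

A dataset containing only Y and Z would show the first number and never the second, which is the point being made about action-taking applications.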

phphphphp 4 years ago | |

Most creative output is duplicated effort: consider how much code each person on HN has written that has been written before. Consider how, a decade ago, we were all writing html and styling it, element by element, and then Twitter bootstrap came along and revolutionised front-end development in what is, ultimately, a very small and low technology way. All it really did was reduce duplicate effort.

Nowadays there are lots of great low/no-code platforms, like Retool, that represent a far greater threat to the amount of code that needs to be produced than AI ever will.

To use a cliche: code is a bug, not a feature. Abstracting away the need for code is the future, not having a machine churn out the same code we need today.

beders 4 years ago | |

The ones undoing the damage caused by dumb pattern recognizers and generators? ;)

whywhywhywhy 4 years ago |

I never actually found a way to use Dall-E 1, did they ever Open that up to people outside their building?

hdjjhhvvhga 4 years ago |

It's very nice, just the name of the parent org is wrong - it should be called ClosedAI.

zdefz 4 years ago |

Can this also be used the other way around to create Alternative texts for screen-reader users?

aya96 4 years ago |

Do you think some of these techniques could be slightly modified, and applied to DNA sequences?

jelliclesfarm 4 years ago |

“Preventing Harmful Generations”? = Fail.

Caravaggio is probably chortling from wherever he is ..

EZ-Cheeze 4 years ago |

"Computer, render Bella and Gigi Hadid playing tennis in bikinis"

tintor 4 years ago |

This would be great for generating Minecraft levels from voice commands.

gotaquestion 4 years ago |

The big question to me is, "What does Dall-E like the most?"

visarga 4 years ago | |

It likes to lower the training loss, you could say this is what made it what it is.

gotaquestion 4 years ago | | |

That's pretty funny. I never thought of it that way.

skybrian 4 years ago |

Sam Altman took some user requests on Twitter: https://twitter.com/sama/status/1511724264629678084

lalopalota 4 years ago |

One step closer to combining Scribblenauts with emoticons!

Apofis 4 years ago |

So I can't do Teddy Bears Riding a Horse?

qualudeheart 4 years ago |

Deep Learning plows through yet another wall.

fhe 4 years ago |

perhaps in the not so distant future, we can simply feed a movie script to the program and out comes a feature film.

victor_e 4 years ago |

Wow - mindblowing and kinda scary really.

husamia 4 years ago |

This and NFTs go hand in hand.

agumonkey 4 years ago |

Has OpenAI tried their ideas on music?

hemreldop 4 years ago |

So two Indians and two Chinese authors. The new world is incredible.

sydthrowaway 4 years ago |

gamechanger

ordu 4 years ago |

Dall-E 2 seems to be incapable of capturing the essence of the art. I'm not really surprised by that; I'd be very surprised if it could. But nevertheless: if you look into the eyes of the Girl With A Pearl Earring[1], you are forced to stop and wonder what she has on her mind right now. Or maybe you have some other question in mind, but it really makes people stop and think. None of the Dall-E interpretations have this quality. Works inspired by Girl With A Pearl Earring sometimes have at least part of that power, like Girl With a Bamboo Earring[2]. But none of the Dall-E interpretations have such power.

And this observation may have great consequences for the visual arts. I had a lot of fun looking at the different Dall-E interpretations to find the flaw that keeps each one from being a piece of art of equal value to the original. It is a ready-made tool for searching for explanations of the Power of Art. It cannot say which detail makes a picture an artwork, but it lets you see multiple data points and narrow the hypothesis space. My main conclusion is that the pearl earring has nothing to do with the power of the art. It is something in the eyes, and probably the slightly opened mouth. (Somehow Dall-E pictured all the interpretations with closed lips, so it seems to be an important thing, but I need more variation along this axis to be sure.)

[1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring

[2] https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...

hcks 4 years ago | |

On the meta level, we are now at the point where the dubious comments downplaying the AI start arguing on the plane of art criticism

whateveracct 4 years ago | | |

Well that's mostly because the entire conversation in this thread has been calling the generated images Art

Veedrac 4 years ago | |

Initial Outputs from New AI Model Not As Good at Nuance as Historic Artwork, Approach Deemed Hopeless

ordu 4 years ago | | |

Oh... Not hopeless. The very fact that I spent some minutes looking at interpretations of Girl With a Pearl Earring is enough evidence that it is not hopeless. I praise the work that was done. Moreover, I hoped that people would take it as inspiration to do even more.

joshcryer 4 years ago | |

What do you think of the third to last image of the Girl With A Pearl Earring that DALL-E 2 created? I find it more compelling than the original with how her face is deeply cast in shadow. There's still that original 'essence' of the glint in her eye. But her earring is a bell. As if the AI is sending a message that what if the bell were to ring?

ordu 4 years ago | | |

I'm not sure that I can express myself in English, which is not my native language, and this needs some very nuanced control over the tiniest shades of meaning, but I'll try nevertheless, just for the fun of it at least.

The original girl is more open, more independent and mindless. The interpretation's girl is more self-controlled, assertive and not really interested, just going through all the motions of regular communication between people. Maybe it's just me, but what I really value on such occasions is mindlessness, the ability of people to not mind themselves, to let their selves dissolve into the environment. Sometimes I cannot hold back tears when I watch a performer playing Chopin or Paganini, because what I see in their movements is the complete dissolution of a person into a piece of music, into a piece of art and skill. A performer just does what they do with their full attention on it, and with all their motivation focused on it. There is nothing else there for them, just them and their actions.

There is not a single thought devoted to how people around me would react to what I do and how I do it. I just do what I do and I do not care about the people around me, and if it somehow makes people happy... I don't really care. I mean, I know that afterwards I'd feel proud of myself, but just for now I don't really care.

I know this feeling. I like to sing, and I'm good at it (above average), and I know what it feels like to dissolve into the song and let the song rule. I play piano and I know what it is like to dissolve into the piece I'm playing, to stop existing, to let the music take the lead. And the original painting makes me believe that the girl is in this state of mind. I do not know the history or the rest of the story, I do not know if she got into this state for a second, or if she never leaves it (it may be a sad experience, don't you think?), but somehow I know that right now she is in this state. I want to watch this moment of hers for an eternity.

Thinking about it, I'd confess that the Interpretation Girl does trigger the same thing, but on a smaller scale. I feel my mind trying to find a coherent state behind her gaze, but this feeling stops in tens of microseconds, not hundreds of them.

edit: want->watch. Stupid mistake ruining the meaning of the sentence.

jdrc 4 years ago | |

art criticism should be off topic here. This is more like chopping off the visual cortex and some association cortex from a brain and stimulating it. there is no person signaling to us, nor can we attribute any striking images that may come up to a person with agency.

But it's like a giant database of decent clipart for anything we can imagine

ordu 4 years ago | | |

> This is more like chopping off the visual cortex and some association cortex from a brain and stimulating it.

We do not know exactly what part of our perception of reality can be attributed to "the visual cortex and some association cortex". But now we can feel it. We can test it. We can compare ourselves with the cold calculating machine. I believe that it is a priceless opportunity that we shouldn't miss. At least I personally can't. I'm going to figure out whether it is possible for me to have such a companion as Dall-E in my wanderings through the sea of information on the Internet, and if it is, then to get one.

> But it's like a giant database of decent clipart for anything we can imagine

And this also. Yes. Though I'm not interested in clipart.