Alias-Free GAN

minimaxir 5 years ago |

The first two demo videos are interesting examples of using StyleCLIP's global directions, as noted in the paper, to guide an image toward a target like "smiling face" with smooth interpolation: https://github.com/orpatashnik/StyleCLIP

I ran a few chaotic experiments with StyleCLIP a few months ago that would work very well with smooth interpolation: https://minimaxir.com/2021/04/styleclip/

Chilinot 5 years ago | |

That first picture of Mark Zuckerberg smiling is just straight up cursed. Interesting write-up though.

Doxin 5 years ago | | |

I audibly went "GAH!" when that scrolled into view. Impressive work.

Lichtso 5 years ago |

The previous approaches learned screen-space textures for different features and a feature mask to compose them.

Now it seems to actually learn the topology lines of the human face [0], as 3D artists would learn them [1] when they study anatomy. It also uses quad grids and even places the edge loops and poles in similar places.

[0] https://nvlabs-fi-cdn.nvidia.com/_web/alias-free-gan/img/ali...
[1] https://i.pinimg.com/originals/6b/9a/0c/6b9a0c2d108b2be75bf7...

eru 5 years ago | |

Yes. It's interesting that imposing what are essentially 2D invariance constraints leads the network to learn what we regard as 3D concepts.

pvillano 5 years ago | | |

There are some interesting 2D things our eyes do for 3D. If something is on the ground, half is above the horizon and half is below. Parallax is a 2D phenomenon.

goldemerald 5 years ago |

After StyleGAN2 came out, I couldn't imagine what improvements could be made over it. This work is truly impressive.

The comparisons are illuminating: StyleGAN2's mapping of texture to specific pixel locations looks very similar to poorly implemented video-game textures. Perhaps future GAN improvements could come from tricks used in non-AI graphics development.

tyingq 5 years ago | |

>I couldn't imagine what improvements could be made over it

It still has the telltale sign of mismatched ears and/or earrings. This seems to be the most reliable way to recognize them. Well, that and the nondescript background.

sbierwagen 5 years ago | | |

Teeth too. Partially occluded objects in 3D space have been hard for GANs to figure out (see also: hands).

I wonder what dataset you could even use to tell a GAN about human internals. 3D renders of a skull with various layers removed?

cout 5 years ago | | |

I've noticed the same thing with ESRGAN: teeth are always awful. I'm looking forward to the day when someone figures out how to fix that; I have a few sentimental images taken with a cell phone that I would love to see upscaled and cleaned up.

mzs 5 years ago | | |

Mismatched reflections across the eyes are the dead giveaway for me.

isoprophlex 5 years ago |

If ReLU-introduced high-frequency components are indeed the culprit, wouldn't using a "softened" ReLU (one without a discontinuity in the derivative at 0) everywhere solve the problem too?
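One way to probe this numerically: feed a pure sinusoid through ReLU and through a smooth stand-in (softplus here, which is an assumption; any smooth ReLU variant would do, and `beta=2.0` is arbitrary), then measure how much spectral energy lands at four or more times the input frequency. This is a toy sketch, not the paper's analysis. The ReLU tail should come out much larger than the softplus tail, but the softplus tail is still nonzero, which suggests smoothing shrinks the new high frequencies rather than removing them. As I understand the paper, that is why it instead applies the nonlinearity at a temporarily higher sampling rate and low-pass filters before downsampling.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x, beta=2.0):
    """Smooth ReLU: (1/beta) * log(1 + exp(beta * x)); approaches ReLU as beta grows."""
    return np.logaddexp(0.0, beta * x) / beta

def high_freq_fraction(signal, cutoff_bin):
    """Share of non-DC spectral energy at or above `cutoff_bin`."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    power[0] = 0.0  # ignore the constant offset
    return float(power[cutoff_bin:].sum() / power.sum())

N = 4096  # samples over one period
k0 = 8    # input frequency in cycles per period, far below Nyquist (N / 2)
t = np.arange(N) / N
x = np.sin(2 * np.pi * k0 * t)

cutoff = 4 * k0  # everything at 4x the input frequency or above counts as "new" high frequencies
frac_relu = high_freq_fraction(relu(x), cutoff)
frac_soft = high_freq_fraction(softplus(x), cutoff)
print(f"ReLU:     {frac_relu:.1e}")
print(f"softplus: {frac_soft:.1e}")
```

Raising `beta` makes softplus closer to ReLU and grows its high-frequency tail accordingly.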

Imnimo 5 years ago |

I wonder if you could make the noise inputs work again by using the same process as for the latent code - generate the noise in the frequency domain, and apply the same shift and careful downsampling. If you apply the same shift to the noise as to the latent code, then maybe the whole thing will still be equivariant? In other words, it seems like the problem with the per-pixel noise inputs is that they stay stationary while the latent is shifted, so just shift them also!
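A minimal 1D sketch of that idea, assuming the noise is defined as a continuous field (a sum of random sinusoids below the grid's Nyquist frequency) and sampled at whatever shifted coordinates the latent/features use. The `FourierNoise` class and its parameters are hypothetical, not taken from the Alias-Free GAN code. Because the field exists between pixels, a shift of 1.25 pixels is just a different set of sample points, so the noise moves with the content instead of staying glued to the pixel grid. Whether this restores the benefits of noise inputs in the real generator is untested here.

```python
import numpy as np

class FourierNoise:
    """Hypothetical band-limited 1D noise field: a sum of random sinusoids."""

    def __init__(self, num_waves: int, max_freq: int, period: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Integer frequencies (cycles per `period`) keep the field periodic; keeping
        # max_freq below period / 2 keeps it below the pixel grid's Nyquist limit.
        self.freqs = rng.integers(1, max_freq + 1, size=num_waves)
        self.phases = rng.uniform(0.0, 2.0 * np.pi, size=num_waves)
        self.amps = rng.standard_normal(num_waves) / np.sqrt(num_waves)
        self.period = period

    def sample(self, num_pixels: int, shift: float = 0.0) -> np.ndarray:
        """Noise at pixel centres after translating the whole field by `shift` pixels."""
        t = np.arange(num_pixels) - shift  # shifted sample coordinates
        angles = 2.0 * np.pi * np.outer(t, self.freqs) / self.period + self.phases
        return np.cos(angles) @ self.amps

noise = FourierNoise(num_waves=32, max_freq=16, period=64)
still = noise.sample(64, shift=0.25)
moved = noise.sample(64, shift=1.25)  # same field, moved one more pixel
# The moved noise is the old noise displaced by one pixel, just like the shifted content.
print(np.allclose(moved[1:], still[:-1]))  # prints True
```

In 2D the same approach applies per axis, with the shift taken from the same transform that is applied to the latent's Fourier features.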

evo 5 years ago |

I wonder if there are learnings from this that could be transposed into the 1-D domain for audio; as far as I know, aliasing is a frequent challenge when using deep learning methods for audio (e.g. simulating non-linear circuits for guitar amps).
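For audio, the analogous trick is a standard one in DSP: run the nonlinearity at an oversampled rate, then low-pass filter and decimate, which, as I read the paper, is essentially the upsample, nonlinearity, filter, downsample pattern it applies in 2D. Below is a toy sketch under simplifying assumptions: the signal is periodic so FFT-based "ideal" filters can be used, and `tanh` with a hypothetical `drive` gain stands in for an amp model (a real-time implementation would use FIR or polyphase filters instead). With a 50-cycle tone on a 256-sample grid, every harmonic of the clipper lies above Nyquist, so anything other than the fundamental in the output is aliasing.

```python
import numpy as np

def upsample(x, factor):
    """Band-limited upsampling of a periodic signal by zero-padding its spectrum."""
    return np.fft.irfft(np.fft.rfft(x), n=x.size * factor) * factor

def lowpass_downsample(y, factor):
    """Ideal low-pass filter (drop bins above the target Nyquist), then decimate."""
    n = y.size // factor
    spectrum = np.fft.rfft(y)[: n // 2 + 1].copy()
    spectrum[-1] = 0.0  # discard the ambiguous Nyquist bin
    return np.fft.irfft(spectrum, n=n) / factor

def distort(x, drive=3.0):
    """Memoryless soft clipper (hypothetical stand-in for a circuit model)."""
    return np.tanh(drive * x)

def alias_fraction(signal, k0):
    """Share of spectral energy outside the fundamental bin k0."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    return float((power.sum() - power[k0]) / power.sum())

N, k0, L = 256, 50, 8  # samples per period, input frequency bin, oversampling factor
x = np.sin(2 * np.pi * k0 * np.arange(N) / N)

naive = distort(x)  # harmonics at 150, 250, ... fold back below Nyquist (128)
oversampled = lowpass_downsample(distort(upsample(x, L)), L)

print(f"naive:       {alias_fraction(naive, k0):.1e}")
print(f"oversampled: {alias_fraction(oversampled, k0):.1e}")
```

Oversampling does not make the clipper band-limited; it just leaves room for its harmonics to decay before they can fold back into the audible band.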

fogof 5 years ago |

You can see what they're saying about fixed-in-place features with the beards in the first video, but StyleGAN gets the teeth symmetry right, whereas this work seems to have trouble with it. Why don't the teeth in StyleGAN slide around like the beard does?

minimaxir 5 years ago | |

That's likely the GANSpace/SeFa part of the manipulation.

> In a further test we created two example cinemagraphs that mimic small-scale head movement and facial animation in FFHQ. The geometric head motion was generated as a random latent space walk along hand-picked directions from GANSpace [24] and SeFa [50]. The changes in expression were realized by applying the “global directions” method of StyleCLIP [45], using the prompts “angry face”, “laughing face”, “kissing face”, “sad face”, “singing face”, and “surprised face”. The differences between StyleGAN2 and Alias-Free GAN are again very prominent, with the former displaying jarring sticking of facial hair and skin texture, even under subtle movements

Geee 5 years ago | |

In video 9 teeth are sliding.

jerf 5 years ago |

That's starting to be high enough quality that you could consider using it for some Hollywood-grade special effects. That beach morph stuff is pretty impressive. Faces perhaps aren't quite there yet, because we're biologically hyper-focused on them, but you could make one heck of a drug-trip scene or a Doctor Strange-esque scene with much less effort using some of those techniques, perhaps even getting down to the range of YouTuber videos in the near future.

eru 5 years ago | |

Compare https://news.ycombinator.com/item?id=27559106

jerf 5 years ago | | |

First, that's not the same technique and it's not being used for the same purpose.

Second, Hollywood doesn't care about that problem. They will take the best application of the technique, and they don't care if they have to apply a few manual touchups on the result. As long as there is one way of using the system to do the sort of thing they showed in the sample, it won't matter to them that they can't embed a full video game into the neural network itself. They only care about the happy path of the tech.

Someone's probably already starting the company now to use this in special effects, or putting someone on research in an existing company.

eru 5 years ago | | |

> Second, Hollywood doesn't care about that problem.

Hmm, I wasn't trying to nay-say anything here. I mostly agree with your original comment.

See also how, in GAN Theft Auto, they are sort of getting the light reflection for free without having to explicitly teach the network about that part of physics.

ansk 5 years ago |

This group of researchers consistently demonstrates a degree of empirical rigor that is unmatched across any other ML lab in industry or academia - remarkable empirical results as always, reproducible experiments, open-source and well-engineered codebase, and valuable insights about low-level learning dynamics and high-level emergent artifacts. Applied ML wouldn't have such a bad rap if more researchers held themselves to similar standards.

benrbray 5 years ago |

Interesting that this method makes use of equivariant neural networks. Taco Cohen recently published his PhD thesis [1], which combines a dozen or so papers he authored on the topic.

[1]: https://pure.uva.nl/ws/files/60770359/Thesis.pdf
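Equivariance here means that translating the input and then applying a layer gives the same result as applying the layer and then translating the output. A minimal 1D sketch (all helper names are hypothetical, and sub-pixel translation uses the Fourier shift theorem, which assumes a periodic, band-limited signal): a circular convolution commutes with a half-pixel shift up to float rounding, while a pointwise ReLU commutes with whole-pixel shifts but not with the half-pixel one, because its output is no longer band-limited. That gap between integer-grid and continuous equivariance is, roughly, the problem the paper addresses.

```python
import numpy as np

def fourier_shift(x: np.ndarray, shift: float) -> np.ndarray:
    """Translate a periodic signal by `shift` samples via the Fourier shift theorem.
    Exact only for band-limited signals with no energy at the Nyquist bin."""
    n = x.size
    k = np.arange(n // 2 + 1)
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * k * shift / n), n=n)

def circular_conv(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular 1D convolution: a linear layer that commutes with translation."""
    padded = np.zeros_like(x)
    padded[: kernel.size] = kernel
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(padded), n=x.size)

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

# Band-limited periodic test signal: random content in bins 0..15 of a 64-sample grid.
rng = np.random.default_rng(0)
n, cutoff = 64, 16
spectrum = np.zeros(n // 2 + 1, dtype=complex)
spectrum[:cutoff] = rng.standard_normal(cutoff) + 1j * rng.standard_normal(cutoff)
spectrum[0] = spectrum[0].real  # the DC term of a real signal is real
x = np.fft.irfft(spectrum, n=n)
x /= x.std()

kernel = np.array([0.25, 0.5, 0.25])  # hypothetical blur filter
s = 0.5  # half-pixel translation

conv_error = np.max(np.abs(circular_conv(fourier_shift(x, s), kernel)
                           - fourier_shift(circular_conv(x, kernel), s)))
relu_error = np.max(np.abs(relu(fourier_shift(x, s)) - fourier_shift(relu(x), s)))
relu_int_error = np.max(np.abs(relu(np.roll(x, 1)) - np.roll(relu(x), 1)))

print(f"conv, half-pixel shift: {conv_error:.1e}")
print(f"ReLU, half-pixel shift: {relu_error:.1e}")
print(f"ReLU, one-pixel shift:  {relu_int_error:.1e}")
```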

datameta 5 years ago |

Wow! The rate of progress is truly stunning. I wonder what Refik Anadol could create with this technique.

eru 5 years ago |

Their examples look much better in some objective sense. Especially if you want to create something that looks realistic in animation.

But I do appreciate the artefacts of StyleGAN2 as an artistic choice, too.

Bjartr 5 years ago |

That beach interpolation is begging for a music video

ipunchghosts 5 years ago |

How does this differ from Richard Zhang's work?

ChuckNorris89 5 years ago |

I expect this work will feed back into removing the aliasing artifacts you sometimes get when using DLSS in games.

l_d_s 5 years ago |

The internal representations (Video 8) look suspiciously like The Lawnmower Man ...

forgotpwd16 5 years ago |

Why weren't the same pictures used for StyleGAN2 and Alias-Free GAN?

dannyw 5 years ago | |

Because the latent-space-to-picture pipeline is considerably different. The two models' weights aren't compatible, so the same latent won't produce the same output.

If you ask StyleGAN to generate a specific image, that's possible, but then you are no longer looking at how well these models generate images.

russdpale 5 years ago |

Great work!

Gimpei 5 years ago |

Those are some creepy pictures! It's like a photo of the demon inside.