AI Can Transform Anyone Into a Professional Dancer (news.developer.nvidia.com)
> the team based their algorithm on the pix2pixHD architecture developed by NVIDIA researchers
Is it me, or is NVIDIA trying very hard to take credit for this UC Berkeley paper? (They're almost taking credit for PyTorch as well.) Sure, this kind of work wouldn't be possible without their hardware, but in that case Intel could probably take credit for most of science in the last few decades.
It also seems as though UC Berkeley and NVIDIA collaborated on pix2pixHD, judging by the paper.
They are, see quote above. They're also going out of their way to mention that PyTorch is using cuDNN, which is true but off-topic.
> This work was supported, in part, by NSF grant IIS-1633310 and research gifts from Adobe, eBay, and Google
The fact that people are thinking "it's on the nvidia site, they must have participated somehow" is precisely the reason I wanted to bring this up.
Aha, it was when you said this:
And just like auto-tuned voices, it will come off as janky and fake.
It is similar to special effects. People complain about how bad and fake they look, but this is only for the effects that are bad enough to be noticeable. People don't realise the sheer number of special effects used in scenes where they never notice them.
Stylus noise, fret noise and mp3 compression artefacts are other "mistakes" deliberately introduced.
If done properly, it's not really janky or fake sounding at all.
It can't really help with a live performance, and for a good singer, recording multiple takes is going to be faster and more economical. Punching in/out is so easy, and modern DAWs like ProTools keep all your takes for you by default anyway, so there's no need to waste another track on your 24-track tape or tape over the previous one.
Here's another viewpoint:
https://www.quora.com/Do-all-most-singers-use-pitch-correcti...
Fantastic vocal performances were captured all throughout the last century without the parachute of pitch correction/autotune. I'd rather listen to an imperfect take with flaws than to a machine assisted correction any day. Each to their own I guess.
And the same way some song makers have used autotune to adjust synthetic voices like Hatsune Miku’s, would this have any use as an external filter to smooth out synthesized videos?
(I guess it might take a few years for the performance to get there)
Is any tech that is published on arXiv just fair game immediately? Seems unfair to the researchers.
The real consequence of this is that video footage is no longer a reliable source.
I'm a bit disappointed though that they didn't also include results for a synthetic source video with "impossible" poses (e.g. joints bending backwards, stretching, separating from the body or performing full rotations). That would have been pretty interesting (though perhaps a bit unsettling) to see.
Using AI to transform anyone into a professional dancer might include using AI to process live video (webcam) of someone dancing and then giving them some feedback for improvement. In a word: coaching.
However this is using AI to produce composite videos of people dancing.
It's not good enough to produce professional dancers, but it has definitely improved my dancing as someone who just dances for fun.
Meanwhile composite videos really blend in with all the augmented reality phone apps that teenagers use nowadays.
I'm half surprised there isn't already something like this for smartphones (with inferior quality).
Maybe I'm in the minority, but I think if we take this idea and walk with it, it has the potential to trivialize actual accomplishment. Maybe I'm overthinking it.
We're not going to see The Running Man style/quality fake videos any time soon, and the media kinda runs with this and exaggerates, making people wonder if camera footage may one day no longer be considered evidence.
We're far from that. At the most, the quality of the transfers here is about the same as what you'd see with Deep Fakes (celebrities imposed on top of pornographic models using computer vision and AI algorithms).
But, with that said, this work is dependent on having a "source" that the user is using as an input for pose detection. The actual accomplishment must still be performed and recorded, though I suppose this opens the door to the dance equivalent of "lip syncing" even beyond what might be done today with a body double.
They said the same thing with any new artistic medium. Digital cameras, photoshop, Instagram filters, MIDI music instruments, etc.
- Mimic a target's body motions (this link)
- Mimic a target's facial expressions (deepfakes)
- Mimic a target's voice (lyrebird AI, etc)
Related video: digital animation puppeteering
https://www.youtube.com/watch?v=YiOByO8J7xg&t=2s&list=LLI462...
It's not perfect by any means, but we're seeing a new age of CGI. Once perfected, I wonder how the entertainment industry will change as a result (faster rendering times, less time to make scenes, puppeteering, not needing expensive famous actors or stunt doubles, digital identity copyrights, etc.).
We're heading into a world where it would not be very hard to bombard the public with a large number of highly convincing long-form videos of anyone in the world ranting on any topic and acting out anything they want, and we would have borderline no idea whether any of it was legitimate.
Combining that with our media climate and already runaway problem with monetary and political incentives for fabricated stories seems really dangerous.
You could make a video of Neil Armstrong and NASA execs talking about how they faked the moon landing, or even much more nefarious fake content confirming conspiracy theories for political ends.
What will we use as a scalable filter to know what is actually going on, and how will we keep that content from manipulating public discussion?
I appreciate that the detected poses and motions create clear pictures of what different parts of the body are doing. Particularly for ballet, if I had access to this technology (in a way that was user friendly), I'd love to see the difference between ballet styles (Vaganova, Cecchetti, ABT, etc.). I think it would be much clearer, from a student's perspective, to see the stylistic difference in lines, shapes and movement.
This AI reminds me of Happy Feet, where they took Savion Glover's movement and choreography and applied it to the animated penguin. It doesn't seem too far-fetched. And lastly, for those who say this seems unnatural--dancing is unnatural to the body, hence the training and years put into it. So having an AI applied to it will only make it look more unnatural.
Artistically, this can be debated (as it has been), but in search for 'real life application,' I'd love to get my hands on this as a teaching tool.
sorry for the long post--this is my first time on this site--my boyfriend sent this to me & warned me that if i blabbed too long, this post would not be successful.
"(...) allows anyone to portray themselves as a world-class ballerina (...)"
Moreover, after AlphaGo took away Go from us, I started to wonder "what is left" for humans, and I believe that we are centuries away from having machines that achieve world-class dancing level. My reasoning is that in things like Go, image or speech recognition, it is easier to "encode" the information for the ML to actually learn. On the other hand, encoding the movements of professional dancers is already quite difficult. Consider, for example, that in the video linked here the whole human body is mapped into ~20 points. Sure, this may be enough to portray someone as a dancer. But good luck making a dancing robot.
So maybe I should quit my programming career to become a dancer; it is less likely to be a job that the machines will take away ;-)
edit: grammar
Yeah, it doesn't matter if machines can't dance if I can't either. Still no job for me. :)
Like competitive sports, art is all about display of human ability under constraints. This is why even in the age of photographs, we still value hand-painted canvases. Such techniques are simply going to make people more discerning between real effort and automated means of generating the same outcome.
Rather than thinking AI-assisted style transfers are the end of art, we should think that these are new tools for artists to do even more interesting stuff. See this upcoming tool for example: https://runwayml.com/
And more recently with AlphaGo. Now that humans have no chance of ever beating AI again in the game of go, what will change?
I'm a go player so I'm more interested in this question. Professional go players said that AlphaGo is positive for go, that they will be able to learn from it and reach new levels of play.
Of course, their livelihood depends on the popularity of go, so it would be bad press for them to say the opposite.
Maybe AI isn't able to copy human technique well enough yet but whether it succeeds or fails will have little to do with whether or not it creates work that resonates emotionally like classic art, because that's no longer the purpose of the vast majority of art that people encounter.
And I would argue that human beings, for the most part, copy other human beings anyway. Working within a "genre" and using cultural references and even recognizable techniques are all essentially copying or at least adapting what came before.
https://www.theguardian.com/politics/2018/aug/30/theresa-may...
I think the opposite. I believe that this will kill blackmail. Why care if someone has a leaked sex tape featuring you in an age where anyone can fake them? Simply say it's fake. In a few years, I bet there will simply be apps where you can point to a person's social network accounts and have the app generate whatever you want. Blackmail will die once everyone has access to those videos with a few clicks.
This idea dates back to way before bitcoin.
I wonder if seeing yourself dance like this might speed up learning to actually dance like this...
plus this is just a very complicated thing, in that it’s gluing together multiple new techniques to do various things.
some of the pieces that went into this work (like GANs) have lots of tutorials online and might be a more manageable, and budget-friendly, place to start. you could do something interesting on Google Colab with free GPU time.
I mean, who cares what you look like in some video? When you actually meet people, they'll know that it's bullshit.
Now, if you could manage it in meatspace, that would be cool!
Everyone who watches television, movies, YouTube, etc. I know that's only a few people, but hey, it's a start.
And the focus here is "anyone", not professionals.
I think you’re onto something here as well about how it may well get adopted as a ‘style’ in someone’s music video.
But yes, there's going to be a big market for tracking fake data sources in the future. We're already seeing tools to track fake Twitter accounts, fake Instagram followers, and fake Amazon purchase reviews; this is an ongoing trend.
(this should almost always be feasible, and is commonly done for non-IP-related reasons e.g. someone might make a PyTorch version of something when the original version was done in TensorFlow.)
I'm not a lawyer but I would have assumed "probably", unless there's a patent. I mean, if it's not, this would suggest it's illegal to attempt to replicate scientific experiments.
also, in many cases even for the code provided by the researchers there is an actual LICENSE file included, and it's often BSD or MIT. (Which sort of makes sense -- these two permissive free software licenses are named after the universities they came from. they reflect the academic CS culture around stuff like this.)
They started with a single GPU. As the waiting queue was getting longer, they made a paid feature that would let you get your results faster. Then they rented more GPUs with that money.
--
[1] AutoDance™ - Like AutoTune. I hereby claim it as a term. ;)
Literally anything apart from someone's ability to dance. Instead of needing someone who can sing, act, look good, and dance, you now only need someone who can sing, act, and look good. And AI will eventually replace all the other talents too I guess.
This is why some people worry about the effect of AI on jobs.
Just Google "worst auto-tune" for examples. Because it's so common now, most people don't notice it. Here's a list of some examples:
http://www.hometracked.com/2008/02/05/auto-tune-abuse-in-pop...
Also, if you're not really musical you might have a hard time picking it out. I found that as soon as I learnt an instrument, songs suddenly became layered and I could pick out and distinguish different instruments, and notice mistakes.
It seems to be more about recognizing the small glitches, rather than something involving overtones, which I would really have liked to learn how to recognize. I have long found that autotuned singing sounds a bit more "metallic" to me.
Thanks anyway, the sound samples are useful in recognizing it better.
Autotuned songs can sound good, just as non-autotuned songs can sound bad. I'm as much a music elitist as anybody, but thinking that you have the one true objective idea of what music sounds good or bad is just childish.
I wonder if in 20 years time it will sound incredibly dated, much like 80s synths do today.
I'm not particularly against auto tune and ultimately most people don't even notice it being used, but I do sort of understand where the GP is coming from.
If your idol can't actually sing, and you saw them live and it's rubbish, wouldn't you feel cheated?
I personally dislike it too, but I'm not going to say it's not artistic, because to someone else it might be. People get so caught up and passionate with their art of choice that they sometimes forget that it's all just entertainment in the end. Whether it be Hilary Hahn or some teenager making beats in Ableton and throwing in the stock vocal effect.
About 80s synths, I think a lot of us, or at least I personally, associate that cheesy sound with mass-produced digital synths put out by Yamaha, Roland etc. It's all variety and I love it. My dad still has a DX7 and a D-50 that I'm hoping to inherit.
Well just look at the status quo of popular rap music then - tons of artists are aiming for the obvious autotune style and tons of people are listening to it (so I assume they think it sounds good). I'm not sure what you mean by "artistic", but it's definitely a sound people try to achieve on purpose in their music, so I think it does deserve some recognition as "art".
https://www.youtube.com/playlist?list=PLN61gg9VNXPomdZu0UY_w...
Like most people wouldn't even realise there's a bass line vs a guitar.
As for 80s synths, in my opinion there are a lot of otherwise great songs that are ruined by those travesties. I totally appreciate they were breaking new ground, but remastering some of those tracks with modern synths would make the songs sound a lot better.
Besides, techniques for identifying fakes are likely to lag closely behind the techniques for producing them.
The first time we talked about "DeepFakes", someone I know pointed out that there is nothing inherently new in this issue. Media has been faked to manipulate the truth as long as there has been media.
Whether you trust the medium, a person, or a blockchain, trust is only as good as the information you base it upon – and there are always ways to circumvent it, or otherwise deceive you.
Also: There are some pretty big privacy issues (from what I understand) with what you describe.
The courts should know that videos can be doctored, just as images can be photoshopped.
As the burden of proof lies with the accuser, it is they who would need a bulletproof argument that you did in fact do what the video claims you did – and video alone is not, in and of itself, enough to establish truth.
As for the data gathering: the existence of the proposed data presents a threat for the subjects (and society) in and of itself.
> perform a mini-transaction every 30 seconds
I guess if you don't want to be blackmailed, you better have money.
Extreme surveillance, for what? To give evidence that you weren't in a fake dancing video?
I can't see this happening any time soon ...
Known Knowns - done for the clear effect, which is audible by everyone (as popularised, but not invented, by Cher, T-Pain, etc.).
Known Unknowns - ones where most people wouldn't notice, but if you've listened to autotune a lot you'd swear it's been done (sustained notes with vibrato added seem to be clear contenders here to me, such as one in 'Angels' by Robbie Williams).
Unknown Unknowns - There are lots of recordings I've worked on for people (mixing, mostly) where the vocal sounds perfect, and it's actually been autotuned/processed. Subtly, but still processed. I only know this because the person who's done it has confirmed it. In the context of a mix there's little evidence, if any at all. And if you listen to backing vocals of the last decade versus BVs from say 30 years ago, you really hear the difference - not because modern ones necessarily sound like they've been worked on, but because the old ones sound just a little bit out of tune in comparison.
I've done work on singers' recordings where I've fixed the pitch and they haven't even noticed it on their own voice, in isolation (which really is the sound everyone knows best). If done well (and appropriately), it doesn't turn it into awful processed rubbish, it can tweak an otherwise brilliant performance and make it near-perfect. I say this as a recording engineer who loves the sound of a band playing together, and would always sacrifice separation / absolute recording quality for the communication and feel that you get with a live band playing together how they normally do - I wouldn't sacrifice personality for pitch, but I think you can improve on nearly everyone's performance in some places.
Having said that, the flip side of this is that people think 'they can fix that in the mix' when they've not given a great performance... and that's never the case!
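For anyone curious what "fixing the pitch" boils down to arithmetically, here's a minimal sketch of the snap-to-nearest-semitone step, assuming 12-tone equal temperament with A4 = 440 Hz. This is not how Auto-Tune or any particular product is implemented (real tools also have to detect the pitch, preserve formants, control retune speed and so on), and the function names and the `strength` parameter are made up for illustration. A `strength` of 1.0 is the hard snap from the "Known Knowns" case; smaller values only nudge the note, closer to the subtle fixes described above.

```python
import math

A4_HZ = 440.0  # assumed reference pitch; other tunings exist


def cents_off(freq_hz: float) -> float:
    """Signed distance in cents from freq_hz to the nearest semitone."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    return 100 * (semitones - round(semitones))


def nearest_semitone_hz(freq_hz: float) -> float:
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** (round(semitones) / 12)


def correct_pitch(freq_hz: float, strength: float = 1.0) -> float:
    """Move freq_hz toward the nearest semitone (hypothetical helper).

    strength=1.0 snaps fully; strength=0.0 leaves the note untouched.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    target = nearest_semitone_hz(freq_hz)
    # Interpolate in log-frequency so the shift is proportional in cents.
    return freq_hz * (target / freq_hz) ** strength


if __name__ == "__main__":
    sung = 450.0  # a slightly sharp A4
    print(f"{cents_off(sung):.1f} cents sharp")
    print(f"hard snap:  {correct_pitch(sung, 1.0):.2f} Hz")
    print(f"half nudge: {correct_pitch(sung, 0.5):.2f} Hz")
```

The interpolation is done on the frequency ratio rather than on raw Hz because pitch perception is roughly logarithmic, so a half-strength correction removes half the error in cents.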
That's what I'm getting at - "perfect" is something that the human voice does so infrequently on its own that when we hear the "perfect" performance, we know it's been doctored.
As a point of reference, I look back at something like the Boswell sisters. Some of their early stuff was recorded in mono direct to wax disc but they were so skilled, well rehearsed and sang as a unit so closely bonded that the resulting sound is fantastic despite the primitive technology. The three individual voices aren't even recorded to separate tracks since multitracking wasn't invented then.
Granted you can't really draw a direct comparison between that and Robbie Williams, Cher or T-Pain, but I know which I'd rather listen to any day. There are a lot of singers I've seen live and talked to over the years who wouldn't be caught dead touching up their stuff, or letting an engineer/producer do it behind their back - even if it meant missing out on a commercial recording contract.
I do appreciate where you're coming from though. I'm a little old fashioned I guess in that sometimes the mistakes that made it through improve the performance in my mind. It reminds you that the performers are human.
Autotune is orthogonal to creating that perfection. If someone has it, Autotune won't take it away, no matter how obviously it's used.
If someone doesn't have it, Autotune won't give it to them.
Modern production tools have mostly eliminated the need to work with a band. I can create music that I want to create with complete artistic freedom. Auto-tune is just another step in that direction.
I guess I'd rather listen back to my crappy vocals and think hard about what I didn't like about them though, and work on improving that rather than try to make it something it's not with pitch correction. But everyone has a different way of working that works for them I suppose, which is why I ended my original comment with "each to their own".
Of course musicians have been able to splice their parts together since multitrack came into wide usage, e.g. the guitar solo in Stairway to Heaven. But I kind of feel like use of digital correction is a bit overdone right now.
I appreciate I'm stepping into mostly subjective territory here though. It's like if somebody said to Hendrix, "hey Jimi, tune your guitar man". Or told Knopfler to stop playing dead notes, or Slash to stop missing notes altogether while not jumping off a drum riser. They could do those things but then to me it wouldn't feel like them.
I don't think I'm the only person who feels this way, but it's definitely personal & subjective.
I definitely understand your point about Hendrix, Knopfler and Slash. There's something beautiful about virtuoso playing that still sounds "raw", you know what I mean? I grew up obsessed with shred guitar, and there was a lot of that in the 80s/90s. There are younger guitar players on YouTube these days who are miles ahead in terms of technical proficiency and production/effects, but some of that authenticity is lost.
I recently saw a video of Ritchie Blackmore talking about Satriani's playing and how it's almost too technically perfect and loses some of the soul. I tend to agree. I have great respect for both of them though.