Weird A.I. Yankovic: a cursed deep dive into the world of voice cloning

Weird A.I. Yankovic: a cursed deep dive into the world of voice cloning (waxy.org)

328 points by waxpancake 2 years ago | 198 comments

mecredis 2 years ago |

It's kind of wild that these tools just transfer a copy of these models every time they're spun up (whether it's to a Google Colab notebook or a local machine).

This must mean Hugging Face's bandwidth bill is crazy, or am I missing something (maybe they have a peering agreement, or cache things heavily)?

satertek 2 years ago | |

Their Python module caches downloads and checks the cache before downloading them again... but you're probably not wrong about the crazy bandwidth bill. Looks like they have crazy VC money though, considering the current climate.
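The check-the-cache-before-downloading behaviour described here can be sketched in a few lines. This is only an illustration of the general pattern, not the actual Hugging Face implementation: the function name `cached_fetch`, the hash-of-URL file naming, and the injected `fetch` callable are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: key each file by a hash of its URL.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        # Cache miss: this is the only branch that costs bandwidth.
        target.write_bytes(fetch(url))
    return target
```

A second call with the same URL and cache directory returns the stored file without calling `fetch` again. A session that starts with an empty cache directory misses on every file.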

minimaxir 2 years ago | | |

The Colab notebooks are a fresh and independent session with no caching.

civilitty 2 years ago | |

Unmetered 10+ gigabit connections were on the order of $1/Mbit/mo wholesale over a decade ago when I priced out a custom CDN, so for the cost of 100 TB of data transfer out of AWS you could get a 24/7 sustained 10 Gbit/s (>3 PB per month at 100% utilization).
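The arithmetic behind that comparison can be checked with a short sketch. The $1/Mbit/mo figure comes from the comment; the per-GB metered rate is left as a parameter because cloud egress pricing is tiered and changes over time, and the $0.09/GB used in the note below is an assumed round number, not a quoted price.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def monthly_transfer_pb(gbit_per_s: float, utilization: float = 1.0) -> float:
    """Decimal petabytes moved in one month at a sustained line rate."""
    bits = gbit_per_s * 1e9 * SECONDS_PER_MONTH * utilization
    return bits / 8 / 1e15


def wholesale_cost_usd(gbit_per_s: float, usd_per_mbit_month: float = 1.0) -> float:
    """Monthly cost of a flat-rate pipe priced per Mbit/s."""
    return gbit_per_s * 1000 * usd_per_mbit_month


def metered_cost_usd(terabytes: float, usd_per_gb: float) -> float:
    """Monthly cost of per-GB metered egress (rate is caller-supplied)."""
    return terabytes * 1000 * usd_per_gb
```

At 100% utilization, 10 Gbit/s comes to about 3.24 PB over a 30-day month for $10,000 at $1/Mbit/mo, while 100 TB at an assumed $0.09/GB comes to about $9,000.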

Bandwidth has always been crazy cheap.

hotnfresh 2 years ago | | |

Not all connections are created equal. Even some big providers clearly have iffy peering agreements upstream that’ll manifest as terrible performance if you have a widely-geographically-distributed bandwidth-heavy load.

colechristensen 2 years ago | | |

Indeed. If you're not using a cloud provider, bandwidth is extremely cheap.

In fact, locally I can get an unmetered 10 Gbps home internet connection for $300/mo.

I'm not sure how they'd react if I transferred 1 PB/mo though :)

morkalork 2 years ago | | |

If you host copies of your data with a few big providers, could you do something smart like detect requests coming from AWS and redirect them to an S3 bucket, so you don't pay for bandwidth leaving the provider?
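One way the detect-and-redirect idea could look, as a minimal sketch: match the client IP against each provider's address ranges and answer with a redirect to the mirror hosted inside that provider. The CIDR blocks below are placeholders from the reserved documentation ranges and the mirror URLs are made up; a real deployment would load the ranges the provider publishes (AWS, for one, publishes its ranges as a JSON file). Whether this actually avoids egress charges depends on the provider's pricing and regions, which the sketch doesn't model.

```python
import ipaddress
from typing import Optional

# Placeholder ranges (reserved documentation blocks), NOT real provider ranges.
PROVIDER_RANGES = {
    "aws": ["203.0.113.0/24"],
    "gcp": ["198.51.100.0/24"],
}

# Hypothetical mirror locations, one per provider.
MIRRORS = {
    "aws": "https://aws-mirror.example.invalid",
    "gcp": "https://gcp-mirror.example.invalid",
}
DEFAULT_ORIGIN = "https://origin.example.invalid"


def provider_for(ip: str) -> Optional[str]:
    """Return the provider whose ranges contain `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for provider, cidrs in PROVIDER_RANGES.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
            return provider
    return None


def redirect_target(ip: str, path: str) -> str:
    """URL the client should be redirected to for `path`."""
    base = MIRRORS.get(provider_for(ip) or "", DEFAULT_ORIGIN)
    return base + "/" + path.lstrip("/")
```

A request from inside a known range gets sent to the in-provider mirror; everything else falls through to the default origin.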

anonylizard 2 years ago | |

Hugging Face has a strategic partnership with AWS.

1. AWS is far behind Azure and GCP in AI, so they gotta partner up to gain credibility.

2. Hugging Face probably does face insane bills compared to GitHub. But AWS can probably develop some optimizations to save bandwidth costs. There's 100% some sort of generalized differential storage method being developed for AI models.
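To make "differential storage" concrete, here is a toy sketch of one simple approach, content-addressed chunk deduplication: split each file into fixed-size chunks, store every distinct chunk once, and keep a per-file list of chunk hashes. This is purely illustrative and not a description of anything Hugging Face or AWS has built; the class name and the tiny chunk size are assumptions for the example.

```python
import hashlib
from typing import Dict, List, Tuple


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size  # tiny on purpose, for demonstration
        self.chunks: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> Tuple[List[str], int]:
        """Store `blob`; return (manifest of chunk hashes, new bytes stored)."""
        manifest: List[str] = []
        new_bytes = 0
        for i in range(0, len(blob), self.chunk_size):
            chunk = blob[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            manifest.append(digest)
        return manifest, new_bytes

    def get(self, manifest: List[str]) -> bytes:
        """Reassemble a blob from its manifest."""
        return b"".join(self.chunks[d] for d in manifest)
```

Storing a second version of a file that differs in only one chunk adds just that chunk; the same manifest comparison tells a client which chunks it still needs to download.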

fomine3 2 years ago | | |

AWS's egress traffic charges are just outrageous, so they can easily offer a huge discount without any improvement.

jandrese 2 years ago | | |

One doesn't usually opt for AWS when their goal is to reduce transfer costs.

toddmorey 2 years ago | |

Is Hugging Face just a model repository like GitHub is a code repository? Seems you can rent compute, both CPU & GPU, but you are right that most models seem to be run elsewhere.

fragmede 2 years ago | | |

Yes, exactly.

pdntspa 2 years ago | |

I really wish I could configure this crap to cache somewhere other than my C: drive

Or better yet, how about asking me where I want to store my models?

thulle 2 years ago | | |

On Linux there's the XDG_CACHE_HOME env variable for pip, but strangely enough there doesn't seem to be a Windows equivalent.
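For the model cache specifically, the Hugging Face libraries read environment variables to decide where downloads go, on Windows as well as Linux. The sketch below mirrors the lookup order as I understand it from the `huggingface_hub` documentation (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then a default under the home directory); treat that order as an assumption and check the docs for the version you have installed. The helper name `resolve_hub_cache` is made up for this example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hub_cache(env: Optional[Mapping[str, str]] = None) -> Path:
    """Best-guess location of the model cache for a given environment."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # Most specific setting: the hub cache directory itself.
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        # Root for Hugging Face data; the hub cache sits in a subfolder.
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"
```

So setting `HF_HOME` to a folder on another drive before launching the tool should move the downloads off C:, provided the tool doesn't override the cache path itself.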

callalex 2 years ago | | |

I haven’t used Windows in a while but I thought it supported some form of cross-volume symlink? Or at least mounting an image stored on another volume to an arbitrary path.

jonluca 2 years ago | |

You can do a lot of these fully locally with things like RVC web ui or https://tryreplay.io/

echelon 2 years ago | | |

https://fakeyou.com has unlimited free RVC without an account. The UI needs work, though.

tr33house 2 years ago | | |

wish they had something for Linux

minimaxir 2 years ago |

This article only covers the musical aspects of AI voice cloning, but there's another dynamic to AI voice cloning that's more complicated: replacing general voice actors in movies/video games/anime (example: https://www.axios.com/2023/07/24/ai-voice-actors-victoria-at... )

Unlike musicians who can't be replaced without significant postprocessing, have enough money to not be impacted by competition, and have legal muscle, voice over artists:

- Can be reproduced with good-enough results from out-of-the-box voice cloning settings on ElevenLabs or an open source equivalent (Bark, VALL-E X)

- Are already underpaid for their work as-is

- Have no legal ownership of their voice since they are contractors, and their voicework is owned by their clients, who may not be as incentivised to protect the VO.

I want to write a blog post about it but I suspect most people on Hacker News won't be interested in a treatise on the cultural impacts of the voicework in Persona 5 and Genshin Impact.

RecycledEle 2 years ago |

Wow. I just realized any one of us could redo Weird Al's songs with his lyrics, but with the original singer's voice. We could be listening to Michael Jackson singing "Just Eat It" by lunchtime.

I am constantly amazed at how the new AI tech can be used.

Of course this would be illegal under most countries' copyright laws.

unnah 2 years ago | |

There's also a Weird Al piece, "I Think I'm a Clone Now", for which an AI clone voice performance would definitely be fitting. (The original song was "I Think We're Alone Now" by Tommy James and the Shondells, but it seems Weird Al was parodying the cover by Tiffany in the 1980s.)

While Weird Al himself asks for permission, it's well established that parody is not copyright infringement. There should be room for parody performances by AI voices as well, especially if argued by a good lawyer.

mbg721 2 years ago | | |

Al is very self-aware (that second character is a lower-case ell); he's less concerned with legal entities than with his relationships with musicians.

greenhearth 2 years ago | |

How would this be amazing? It just sounds stupid and a waste of time.

RecycledEle 2 years ago | |

And...they already did it.

mckirk 2 years ago |

My absolute favorite application of this tech so far is The Beach Boys singing 'Hurt'. It's the first time I seriously didn't notice any artifacts, and it just works so well even though it really shouldn't.

Enjoy: https://youtu.be/gmNSFqyg_Z8

distantsounds 2 years ago |

The sampled voices sound neither like Michael Jackson nor Weird Al. A good effort, but a professional impersonator could likely do better on either front.

nemo44x 2 years ago | |

It sounds like Weird Al trying to be Michael Jackson trying to be Weird Al.

Reventlov 2 years ago | | |

As a non native speaker, it does sound a bit like Michael Jackson imo…

hinkley 2 years ago | |

The best Michael Jackson interpreter in a town of 50,000 could do better than this. It’s… this is bad.

code_runner 2 years ago | |

I know what you mean. It's more noticeable (imo) on the Michael one... but it's definitely in there. I think the pitch correction is to blame for a bit of the weirdness.

causi 2 years ago |

AI song covers are incredible, from Goku singing "Don't Stop Me Now" to the cast of Spongebob singing "Ocean Man".

ssalka 2 years ago | |

My favorite is the Mr. Krabs cover of "Billie Jean"

https://www.youtube.com/watch?v=CkQ-44PvTs8

civilitty 2 years ago | | |

Mr Krabs rapping Lose Yourself by Eminem [1] is all the evidence I've ever needed that Clancy Brown should have been a rapper.

[1] https://www.youtube.com/watch?v=d7N6jOziN4E

all2 2 years ago | | |

This is actually good. Hysterically so.

shepherdjerred 2 years ago | |

Let me share my favorite: Plankton - Beggin'

https://www.youtube.com/watch?v=tJjhObngcxI

lostlogin 2 years ago | |

https://m.youtube.com/watch?v=XzqbhDqAEtw

cm2012 2 years ago | | |

Would have strongly preferred DBZA Goku :)

simonw 2 years ago |

I did not know about this: "The center of the A.I. cover songs community is a massive 500,000+ member Discord called A.I. Hub, where members trade new tips, tools, techniques, and links to their original and cover songs."

smath 2 years ago |

Related article from 1 year ago on Darth Vader’s voice being AI generated going forward:

https://arstechnica.com/information-technology/2022/09/james...

mito88 2 years ago |

"celebrity voices impersonated"

Watch Light My Fire on YouTube Music https://music.youtube.com/watch?v=lN3v3EfA6_A&si=_hcG3Wjakxd...

ddmf 2 years ago |

The most recent episode of Tacoma FD covered something similar to this mixed with a messed up Christmas Carol.

dreamcompiler 2 years ago |

> ... Tom Waits, LeBron James, Knuckles, and, uh, Adolf Hitler.

I can't figure out if this is an example of Godwin's Law or not.

satvikpendem 2 years ago |

What's the best open source text to speech? Eleven Labs and others are interesting but closed source. I want to use them mainly for audiobooks as I have a lot of ePubs and I'm just using the basic Google text to speech voices on my Android, via Moon+ Reader. It works fine but it's still more robotic than state of the art.

hinkley 2 years ago |

> Artifacts aside, it sounds like Michael Jackson doing a Weird Al impression?! Every line has a distinctly “white and nerdy” vibe: it loses any seriousness and edge, exaggerating words for comic effect and enunciating lyrics really clearly so the punchlines can be heard.

No, it sounds like someone doing an impression of Weird Al doing an impression of Michael Jackson. Someone whose mom told them they were special and they believed it.

These examples are standing on a ridge line, surveying the uncanny valley and looking for the best way to cross.

blagie 2 years ago | |

... they're good enough.

I have an accent. If not for that, I'd be a great presenter.

If I could translate my voice into a poor Neil deGrasse Tyson, a poor Patrick Stewart, a poor Carl Sagan, a poor Morgan Freeman, etc., my presentations would be... better.

hinkley 2 years ago | | |

If it makes you more comfortable and confident, that is helping you.

This isn't autotune for the spoken word, though. It's not fixing pacing or vocabulary, and in the audio above it isn't even fixing intonation. Yes, a thick German accent will give you away as being of German extraction. But you're also using the word 'since' when Brits and Americans would use 'for', and it's not going to fix that, any more than it'll fix my French when I make the exact same mistake going the other direction (for=duration vs for=purpose vs for=interval). If I hear 'since one month' you're likely German or Indian. If you ask how long I've been in Marseille you'll know I'm American in about half that time.

totetsu 2 years ago | | |

Finally, a way to not have to fix society's prejudices: just give everybody the tools to emulate the ideal of perfection, no matter what color their skin or what their accent sounds like.

Calamitous 2 years ago |

Key takeaway:

> No current artificial intelligence is powerful enough to hide the weirdness of Weird Al.