AudioPaLM: A Large Language Model That Can Speak and Listen (google-research.github.io)
Direct link to demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/... (see website for more examples)
I wonder though how much the text in the video was editorialized. For example, I doubt that the model would have correctly capitalized PaLM.
Gold in the mouth is something popularized by rappers (grills) back in the 90's, so that doesn't translate well at all for me.
When I read about things like AudioPaLM, my first thought is of all the people in these call centers who seem to uniformly have pretty hard Indian accents and very American-sounding names (George Bush called me the other day!). Their days of working in a call center are numbered and their replacement is going to be a machine that is way cheaper to employ and better at the job.
But actually, what is interesting to think about is that the desire to learn English will likely start to diminish because of this. If there is little to gain from learning it, because the computer will just take your job anyway, would you still bother?
I mean some will remain interested, but many won't.
The phone company will change your number if you want. The FCC will let you report these - one call at a time.
I actually thought about making an app to let me submit a report with a single click. If I started submitting 40-80 reports a week, would that get anybody’s attention? Would somebody at the FCC contact T-Mobile on my behalf and ask them to actually help me with this? Probably not.
Google previously showed you could get the full-sized 540B-parameter PaLM-1 model down to "a low-batch-size latency of 29ms per token during generation (with int8 weight quantization)": https://arxiv.org/abs/2211.05102#google. How many tokens per 1000ms do humans speak? I'm guessing fewer than 34. The real question is who wants to pay for it.
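For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. Only the 29 ms per token figure comes from the quote above; the speaking rate (150 words per minute) and the tokens-per-word ratio (1.3) are placeholder assumptions of mine, not measured values, so swap in your own numbers.

```python
def tokens_per_second(latency_ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / latency_ms_per_token


def speech_tokens_per_second(words_per_minute: float, tokens_per_word: float) -> float:
    """Estimate how many text tokens per second a speaker produces."""
    return words_per_minute / 60.0 * tokens_per_word


# 29 ms per token is the figure quoted from the paper abstract.
model_rate = tokens_per_second(29.0)

# ASSUMPTIONS (illustrative only): 150 words/minute, 1.3 tokens/word.
human_rate = speech_tokens_per_second(words_per_minute=150.0, tokens_per_word=1.3)

print(f"model: {model_rate:.1f} tokens/s")
print(f"human: {human_rate:.2f} tokens/s (under the assumed rates)")
print(f"model keeps up with speech: {model_rate > human_rate}")
```

Under those assumptions the model generates text tokens well faster than a person speaks them. Note that this only compares text tokens; audio tokens in a speech model may be emitted at a very different rate.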
Curious, do you speak more than one language?
Edit: I just had a look at your comment history. Do you realize you're, like, incredibly pro-LLM? Do you just scour HN looking for LLM articles and comment on them in a positive way? Not having a poke, it's just interesting how keen you are.
Over here people speak multiple languages. I doubt we'll run out of people that speak multiple languages just because there's a language model that can do great translations.
If you're going to rag on a product's capabilities on x, you'd think the least you could do is use it for x first.
Are you spying on everyone?
Google Translate screws up for me really, really hard sometimes when I'm speaking Korean, but I'm already a pretty strong speaker (native), so I know how to work with the screw-ups... and laugh about the really bad ones. I'm not going to go into a meeting and blast off with an auto-translator without understanding what I'm saying, or without having someone make sure I'm saying the right thing by talking with them first.
I personally wouldn't feel comfortable using something like this for anything of real significance; a really good translator can ensure the message gets delivered.
I'd never just go to somewhere exotic and rely on it for anything significant based on my existing experience with these technologies.