- https://huggingface.co/spaces/Xenova/whisper-web
- https://huggingface.co/spaces/Xenova/whisper-webgpu
- https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu
- https://huggingface.co/spaces/webml-community/moonshine-web
I made https://app.readaloudto.me/ as a hobby project, and now it could be enhanced with a local TTS option!
(I get the joke that for some definition of real-time this is real-time).
I use an API because time to first byte is the most important metric in the apps I'm working on.
That aside, kudos for the great work and I'm sure one day the latency on this will be super low as well.
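For anyone who wants to compare a local model against an API on that metric, here is a rough sketch of how time to first audio chunk could be measured. Everything in it is an assumption for illustration: `time_to_first_chunk` and `fake_tts_stream` are hypothetical names, and the fake stream just sleeps to simulate synthesis or network latency. Swap in whatever iterator your real TTS source exposes.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first non-empty chunk, iterator over all chunks).

    If the source ends without yielding any non-empty chunk, the elapsed
    time is simply the time until the source was exhausted.
    """
    start = time.perf_counter()
    it = iter(chunks)
    buffered: List[bytes] = []
    for chunk in it:
        buffered.append(chunk)
        if chunk:  # first chunk that actually carries bytes
            break
    elapsed = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        # Hand back what was already consumed, then the rest of the stream.
        yield from buffered
        yield from it

    return elapsed, replay()


def fake_tts_stream(delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS source (local model or API)."""
    for i in range(n_chunks):
        time.sleep(delay_s)  # simulated per-chunk synthesis / network latency
        yield bytes([i]) * 4


if __name__ == "__main__":
    elapsed, stream = time_to_first_chunk(fake_tts_stream())
    print(f"time to first chunk: {elapsed * 1000:.1f} ms")
    total_bytes = sum(len(c) for c in stream)
    print(f"total bytes received: {total_bytes}")
```

The point of returning the replay iterator is that measuring doesn't eat the first chunk, so playback can start from the same stream that was timed.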
Sounds great on Chrome with an Nvidia 1650Ti.
Sounds great on Chrome on a Pixel 6.
Sounds like it's being bitcrushed. Maybe a 64- vs 32-bit error? Solid results when working.
Edit: Sorry, it was a problem of my specific audio setup, it works equally well on Chromium.
Is there source anywhere? Seems the assets/ folder is bundled js. In my opinion, there's a ton of opportunity for private, progressive web apps with this while WebGPU is still relatively newly implemented.
Would love to collaborate in some way if others are also interested in this.
[0] https://github.com/C-Loftus/QuickPiperAudiobook/
[1] https://github.com/rhasspy/piper/issues/352
But, in a more serious tone: the story that I hear about AMD GPUs is that they are, in fact, shittier because AMD themselves give fewer shits. GIGO
this is astounding
https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_...
Quality sounded good compared to a lot of other small TTS models I've tried.
How can I understand what's in the compiled JS though? Is there some source for that?
Here I'm talking about the model shared in this thread, which is text-to-speech (reading out loud content from the web)
You could do text to speech on a 1 MHz Apple //e using the 1-bit speaker back in the 80s (Software Automatic Mouth), and MacinTalk was built into the Mac in 1984. I know it’s built into both Mac and iOS devices and runs offline.
But I do see how cross platform browsers like Firefox would want a built in solution that doesn’t depend on the vendor.
Firefox on Windows is one such application that still uses SAPI. I don't know what it uses on other operating systems. On Android, I imagine it uses whatever the built-in OS TTS API is, which likely goes through Google Cloud.
But anything that sounds at all natural, from any of the OS or browser vendors, is going through some cloud TTS API now.