> Can I use Kokoro TTS offline?
> Kokoro TTS is a cloud-based service that requires an internet connection to access our advanced text to speech technology. This ensures you always have access to the latest improvements and don't need to worry about local hardware requirements or model installations.
I would happily take on the worrying for offline instead of them having to worry about my worries.
Every popular machine learning paper has a fake website associated with it, for some reason. Can anyone figure out why? Another example: someone created the website https://imagen3.org, which is NOT Imagen3 by Google. However, it currently ranks #2 for the model name.
Notice in this case that each testimonial avatar links to an image asset whose name differs from the purported person's name. Notice also that the user in the thread who's pushing this 'product' has a post history that makes it obvious they're an LLM slop bot...
And in the FAQ:
> What's included in the Kokoro TTS free trial?
> New users can try Kokoro TTS's full capabilities with our free trial. This allows you to experience our professional-grade text to speech technology firsthand, including access to all voices and both American and British English options.
So this is the "free trial"? And since it's a cloud-based service, I don't understand what the trial is actually supposed to be.
From the privacy policy:
> We collect certain personal data, including but not limited to your name, email address, and payment information (if applicable) to enhance the Service and improve user experience.
This is the first time I've seen payment info collected to improve user experience.
It is very fast and very passable.
The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24 kHz audio output.
We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.
Tech stack: StyleTTS 2 architecture, ONNX Runtime, Next.js for the web interface.
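For anyone who'd rather run this locally and just dump the result to disk: the only hard number quoted here is the 24 kHz output rate, so here's a minimal sketch of writing 24 kHz mono audio to a WAV using nothing but the Python standard library. It assumes the model hands back float samples in [-1, 1], which is my assumption and not something the post states. `float_to_wav_bytes` and `placeholder_tone` are hypothetical names, and the sine tone only stands in for real model output, since the actual inference call isn't described here.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, the output rate quoted in the post


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Pack float samples in [-1.0, 1.0] into a mono 16-bit PCM WAV.

    Hypothetical helper: the float range is an assumption about what a
    TTS model returns, not something taken from the Kokoro code.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)
        # Clip to [-1, 1] so out-of-range values cannot overflow int16.
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
        wav.writeframes(pcm)
    return buf.getvalue()


def placeholder_tone(seconds=0.5, freq=440.0, sample_rate=SAMPLE_RATE):
    """Stand-in for model output: a quiet sine tone (not real speech)."""
    n = int(seconds * sample_rate)
    return [0.3 * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


wav_bytes = float_to_wav_bytes(placeholder_tone())
```

Swap `placeholder_tone()` for whatever array the real inference step returns, then write `wav_bytes` to a file opened in binary mode.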
"There currently isn't a release date scheduled for the other voices"
[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...
- Apache 2.0 weights in this repository
- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2
- GPLv3 dependency in espeak-ng