I built cbx: a single-binary CLI for local/offline text-to-speech. It wraps Resemble AI’s Chatterbox ONNX models behind a Rust CLI (ONNX Runtime under the hood).

Goal: “I want good TTS in a shell script” without Python envs / pip / venv juggling.

Quick start:

  cbx speak --text "Hello from cbx." --voice-wav ./your-voice.wav --out-wav ./output.wav

First run downloads the model files (~1–2GB depending on variant). After that it runs locally.

If you’re doing repeated runs with the same reference voice, you can cache the voice encoding once:

  cbx voice add --name myvoice --voice-wav ./your-voice.wav
  cbx speak --voice myvoice --text "Much faster now." --out-wav ./output.wav

What it does (intentionally small surface area):

- single binary, cross-platform CLI
- built-in model download/list/clean commands
- voice profile caching (avoid re-encoding the reference clip every run)

What it doesn’t do:

- it’s not the full Chatterbox project (multilingual, fine-tuning, etc.). It’s a packaging + UX layer for basic TTS.

Slightly counterintuitive perf note: on an M1 MacBook Pro, CPU ended up faster than CoreML for this model due to accelerator partitioning overhead; numbers are in the README.

If you try it, I’m especially interested in feedback on: install/packaging trust, cache layout, and what you’d want from a “tiny model / fast mode”.
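
For the "TTS in a shell script" use case, here is a minimal sketch of a batch loop that caches a voice once and then writes one WAV per line of a text file. It uses only the two subcommands and the flags shown in the quick start. Everything else is an assumption made for illustration: the function names, the voice name "narrator", the line-NNN.wav naming scheme, and the CBX variable for pointing at (or stubbing out) the binary. It also assumes nothing about how cbx treats stdin or what happens when a voice name is added twice.

```bash
#!/usr/bin/env bash
# Sketch only: helper names, file names and the CBX variable are hypothetical.
set -euo pipefail

# Binary to invoke; override CBX to use another path or a stub for a dry run.
CBX="${CBX:-cbx}"

# Cache the reference voice once so later runs skip re-encoding the clip.
# Whether re-adding an existing name is an error is not assumed here,
# so treat this as first-time setup.
cache_voice() {
  local name="$1" ref_wav="$2"
  "$CBX" voice add --name "$name" --voice-wav "$ref_wav"
}

# Speak each non-empty line of a text file into out_dir/line-001.wav, ...
speak_lines() {
  local name="$1" script_file="$2" out_dir="$3"
  local n=0 line
  mkdir -p "$out_dir"
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    n=$((n + 1))
    # stdin is redirected so the command cannot consume the rest of the file.
    "$CBX" speak --voice "$name" --text "$line" \
      --out-wav "$(printf '%s/line-%03d.wav' "$out_dir" "$n")" < /dev/null
  done < "$script_file"
}

# Example usage (uncomment to run):
#   cache_voice narrator ./your-voice.wav
#   speak_lines narrator ./script.txt ./out
```

Keeping the voice setup separate from the loop means the per-line calls only pay for synthesis, which is the point of the voice cache.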