Show HN: Kitten TTS Based Low-Latency Streaming Voice Assistant on CPU

3 points by gauravvij137 83 days ago | 0 comments

We asked Neo AI to build a small voice assistant pipeline that runs with low latency on CPU instead of requiring a GPU.

The goal was to see how responsive an LLM → speech system can be on ordinary laptops or edge devices.

It includes:

- Voice Activity Detection
- CPU-friendly LLM + TTS streaming
- Async pipeline to reduce latency
- Modular LLM backend
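To illustrate the async idea, here is a minimal sketch of one way such a pipeline can be wired. It is not taken from the repo. The names `fake_llm_tokens`, `chunk_sentences`, `fake_tts`, and `run_pipeline` are hypothetical stand-ins, and the LLM and TTS stages are simulated. The point is that TTS can start on the first complete sentence while the LLM is still generating the rest.

```python
import asyncio

SENTENCE_END = (".", "!", "?")


async def fake_llm_tokens(prompt):
    """Hypothetical stand-in for a streaming LLM backend."""
    for tok in ["Hello", " there", ".", " How", " can", " I", " help", "?"]:
        await asyncio.sleep(0)  # yield control, as a real token stream would
        yield tok


async def chunk_sentences(tokens, out_queue):
    """Group tokens into sentences and pass each one on as soon as it is complete."""
    buf = ""
    async for tok in tokens:
        buf += tok
        if buf.rstrip().endswith(SENTENCE_END):
            await out_queue.put(buf.strip())
            buf = ""
    if buf.strip():
        await out_queue.put(buf.strip())  # flush a trailing partial sentence
    await out_queue.put(None)  # sentinel: no more text


async def fake_tts(in_queue, spoken):
    """Hypothetical stand-in for a TTS stage; records what it would speak."""
    while True:
        sentence = await in_queue.get()
        if sentence is None:
            break
        await asyncio.sleep(0)  # a real stage would synthesize and play audio here
        spoken.append(sentence)


async def run_pipeline(prompt):
    # A bounded queue gives backpressure if TTS falls behind the LLM.
    queue = asyncio.Queue(maxsize=4)
    spoken = []
    await asyncio.gather(
        chunk_sentences(fake_llm_tokens(prompt), queue),
        fake_tts(queue, spoken),
    )
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("hi")))
```

Swapping `fake_llm_tokens` for another async generator is also one simple way to keep the LLM backend modular.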

Useful for local assistants, robotics prototypes, privacy-first setups, or benchmarking STT/LLM/TTS latency.
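For the benchmarking use case, the number that usually matters most for perceived responsiveness is the time until the first chunk arrives, not the total time. Below is a rough sketch of measuring both for any streaming stage. `measure_stream` and `fake_tts_chunks` are hypothetical helpers for illustration, not part of the repo.

```python
import time


def measure_stream(stream):
    """Consume an iterator and record time-to-first-item and total time in seconds."""
    start = time.perf_counter()
    first = None
    items = []
    for item in stream:
        if first is None:
            first = time.perf_counter() - start
        items.append(item)
    total = time.perf_counter() - start
    return {"first_item_s": first, "total_s": total, "items": items}


def fake_tts_chunks():
    """Hypothetical stand-in for a TTS stage that yields audio chunks."""
    for i in range(3):
        time.sleep(0.01)  # simulated synthesis time per chunk
        yield f"chunk-{i}"


if __name__ == "__main__":
    stats = measure_stream(fake_tts_chunks())
    print(f"first chunk: {stats['first_item_s']:.3f}s, total: {stats['total_s']:.3f}s")
```

The same wrapper can be put around the STT, LLM, or TTS stage to see which one dominates latency.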

We’ve been experimenting with similar CPU-first pipelines inside NEO workflows for on-device agents, and this repo is a minimal standalone version.

Would love suggestions on lightweight STT/TTS models or latency tricks people have used on CPU.
