We asked Neo AI to build a small voice assistant pipeline that runs with low latency on a CPU, without requiring a GPU. The goal was to see how responsive an LLM → speech system can be on ordinary laptops or edge devices.

It includes:
- Voice Activity Detection
- CPU-friendly LLM + TTS streaming
- Async pipeline to reduce latency
- Modular LLM backend

It's useful for local assistants, robotics prototypes, privacy-first setups, or benchmarking STT/LLM/TTS latency.

We’ve been experimenting with similar CPU-first pipelines inside NEO workflows for on-device agents, and this repo is a minimal standalone version.

Would love suggestions on lightweight STT/TTS models or latency tricks people have used on CPU.
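For anyone wondering how "LLM + TTS streaming" over an async pipeline cuts latency, here is a minimal sketch of the general idea. It is not the repo's code. The LLM streams tokens, a buffer cuts them into sentences, and a TTS worker starts synthesizing the first sentence while the LLM is still decoding the rest, so time-to-first-audio depends on the first sentence rather than on the whole reply. `fake_llm_stream`, `fake_tts`, `llm_to_sentences`, `tts_worker` and `respond` are hypothetical names. The two `fake_*` functions only sleep to simulate work and should be swapped for a real CPU LLM and TTS backend.

```python
import asyncio
import re
from typing import AsyncIterator, List

# Hypothetical stand-in for a CPU LLM backend that streams tokens one by one.
async def fake_llm_stream(prompt: str) -> AsyncIterator[str]:
    reply = "Sure. The weather is mild today. Anything else?"
    for token in reply.split(" "):
        await asyncio.sleep(0.01)  # simulate per-token decode time
        yield token + " "

# Hypothetical stand-in for a TTS engine: returns "audio" for one sentence.
async def fake_tts(sentence: str) -> bytes:
    await asyncio.sleep(0.02)  # simulate synthesis time
    return sentence.encode("utf-8")

# A buffer ending in ., ! or ? (plus optional whitespace) is a full sentence.
SENTENCE_END = re.compile(r"[.!?]\s*$")

async def llm_to_sentences(prompt: str, queue: asyncio.Queue) -> None:
    """Buffer streamed tokens and push each complete sentence to the queue."""
    buf = ""
    async for token in fake_llm_stream(prompt):
        buf += token
        if SENTENCE_END.search(buf):
            await queue.put(buf.strip())
            buf = ""
    if buf.strip():  # flush any trailing text without end punctuation
        await queue.put(buf.strip())
    await queue.put(None)  # sentinel: no more text is coming

async def tts_worker(queue: asyncio.Queue, out: List[bytes]) -> None:
    """Synthesize sentences as they arrive, overlapping with LLM decoding."""
    while True:
        sentence = await queue.get()
        if sentence is None:
            break
        out.append(await fake_tts(sentence))

async def respond(prompt: str) -> List[bytes]:
    """Run the LLM producer and the TTS consumer concurrently."""
    queue: asyncio.Queue = asyncio.Queue()
    audio_chunks: List[bytes] = []
    await asyncio.gather(
        llm_to_sentences(prompt, queue),
        tts_worker(queue, audio_chunks),
    )
    return audio_chunks

if __name__ == "__main__":
    for chunk in asyncio.run(respond("What's the weather?")):
        print(chunk.decode())
```

One caveat with real engines: CPU-bound LLM or TTS calls block the event loop. In practice you would wrap them in `asyncio.to_thread` (Python 3.9+) or a process pool, so that decoding and synthesis actually overlap.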