I built speech-swift, which does on-device ASR, TTS, and VAD on Apple Silicon, in the same local-first spirit as Arietta. It also includes speaker diarization and noise suppression, which makes it a better fit for fuller voice-assistant pipelines.
https://github.com/soniqo/speech-swift