I live in a rural farming neighborhood in Japan. Day-to-day Japanese is fine for me, but neighborhood meetings were on a completely different level. People speak fast. There's local dialect. Someone references a flood from 1987, a land boundary dispute from 1994, and three people I've never met but everyone else knows. I would walk out feeling like I understood maybe 5% of what happened.

So I built a tool for myself to help follow those conversations.

Live Kaiwa listens to Japanese speech and, in real time, shows:

* Japanese transcription
* English translation
* a running summary of what's being discussed
* suggested responses you can say back

The idea is to help you stay oriented in complex conversations.

You can try it here: https://livekaiwa.com

---

How it works

When you start a session, the browser microphone captures the conversation and streams audio. The pipeline looks roughly like this:

1. Audio streaming - Browser microphone → WebRTC → server
2. Speech to text - Kotoba Whisper runs a fast first-pass transcription.
3. Multi-pass correction - Buffered audio is re-transcribed with higher accuracy, and the result replaces the earlier text.
4. LLM processing - Each batch of transcript is sent to an LLM that generates English translations, summary bullets, and suggested replies (with TTS).
5. Live UI updates - Everything streams back to the browser in (mostly) real time.

Session data stays in the browser; nothing is stored server-side.

Why I built it, in short: even if you speak Japanese reasonably well, fast, multi-person discussions can become overwhelming. Seeing the conversation transcribed and summarized helps.
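
To make the multi-pass correction step of the pipeline more concrete, here is a minimal Python sketch of how a transcript buffer could let a slower, more accurate pass overwrite earlier fast-pass text. It is not Live Kaiwa's actual code: the `Segment` and `Transcript` names, the rule of replacing every segment that falls fully inside the corrected time window, and the sample text are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float         # seconds from session start
    end: float
    text: str
    final: bool = False  # True once a slower, more accurate pass produced it


@dataclass
class Transcript:
    """Hypothetical transcript buffer; names and rules are illustrative only."""
    segments: list[Segment] = field(default_factory=list)

    def add_fast(self, start: float, end: float, text: str) -> None:
        # Fast first pass: append provisional text so the UI can show it at once.
        self.segments.append(Segment(start, end, text))

    def apply_correction(self, start: float, end: float, text: str) -> None:
        # Slower pass over buffered audio: drop every segment that lies fully
        # inside [start, end] and put one corrected segment in its place.
        # Assumes correction windows line up with segment boundaries.
        kept = [s for s in self.segments if s.start < start or s.end > end]
        kept.append(Segment(start, end, text, final=True))
        kept.sort(key=lambda s: s.start)
        self.segments = kept

    def text(self) -> str:
        # Japanese has no spaces between words, so segments are joined directly.
        return "".join(s.text for s in self.segments)


t = Transcript()
t.add_fast(0.0, 1.5, "きょうは")
t.add_fast(1.5, 3.0, "すいろのはなし")
t.apply_correction(0.0, 3.0, "今日は水路の話")  # replaces both provisional segments
t.add_fast(3.0, 4.0, "です")
print(t.text())  # 今日は水路の話です
```

Keeping a `final` flag on each segment is one way the UI could render provisional text differently (for example, greyed out) until a corrected version arrives, and it gives a natural cutoff for deciding which text is stable enough to batch up for the LLM step.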