I edit a podcast, and every episode starts with 10-15 minutes of manually aligning tracks. Each person records locally, but everyone hits record at a different time, so before editing you have to line all the tracks up against a master recording by ear.

I've wanted to automate this since 2019, after first hearing it discussed on the popular Accidental Tech Podcast. I figured I'd write it in Kotlin first, since that's my language of choice, but JVM audio processing wasn't there (or, more fairly, it needed way more work than I realized). With AI's help, of course, I took another shot at it recently and finally built it in Rust.

"PodSync" takes a master track and the individual participant tracks, finds the time offset for each using VAD (voice activity detection), MFCC fingerprinting, and cross-correlation, then outputs aligned WAV files. Drop them into your DAW at 0:00 and they line up!

There's an accompanying blog post with a visual on the mechanics: https://kau.sh/blog/podsync/

Would love to hear feedback!
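
For anyone curious what the cross-correlation step looks like in isolation, here is a minimal sketch. It is not PodSync's actual code: it correlates raw mono samples by brute force and skips the VAD and MFCC stages entirely. The names `estimate_offset` and `apply_offset` and the `max_lag` and `min_overlap` parameters are hypothetical, chosen only for illustration.

```rust
/// Estimate the lag (in samples) at which `track` best matches `master`.
///
/// A lag of `d` means `track[i]` lines up with `master[i + d]`:
///   d > 0: the participant hit record `d` samples after the master started
///   d < 0: the participant hit record `|d|` samples before the master started
///
/// Every lag in `-max_lag..=max_lag` is scored with normalized
/// cross-correlation. Lags whose overlap is shorter than `min_overlap`
/// samples are skipped, because tiny overlaps produce misleadingly high
/// scores. Returns `None` if no lag had a usable overlap.
pub fn estimate_offset(
    master: &[f32],
    track: &[f32],
    max_lag: usize,
    min_overlap: usize,
) -> Option<isize> {
    let max = max_lag as isize;
    let mut best: Option<(isize, f64)> = None;

    for lag in -max..=max {
        let shift = lag.unsigned_abs();
        // Slice both signals so that index 0 of each refers to the same instant.
        let (m, t) = if lag >= 0 {
            (master.get(shift..).unwrap_or(&[]), track)
        } else {
            (master, track.get(shift..).unwrap_or(&[]))
        };
        if m.len().min(t.len()) < min_overlap {
            continue;
        }

        // Dot product and energies over the overlapping region only.
        let (mut dot, mut energy_m, mut energy_t) = (0.0f64, 0.0f64, 0.0f64);
        for (&a, &b) in m.iter().zip(t) {
            let (a, b) = (a as f64, b as f64);
            dot += a * b;
            energy_m += a * a;
            energy_t += b * b;
        }
        if energy_m == 0.0 || energy_t == 0.0 {
            continue; // pure silence carries no alignment information
        }

        let score = dot / (energy_m * energy_t).sqrt();
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((lag, score));
        }
    }
    best.map(|(lag, _)| lag)
}

/// Shift `track` so that it lines up with the master at sample 0:
/// pad with leading silence for a positive lag, trim the start for a negative one.
pub fn apply_offset(track: &[f32], lag: isize) -> Vec<f32> {
    let shift = lag.unsigned_abs();
    if lag >= 0 {
        let mut out = vec![0.0; shift];
        out.extend_from_slice(track);
        out
    } else {
        track.get(shift..).unwrap_or(&[]).to_vec()
    }
}
```

To turn the result into seconds, divide the lag by the sample rate. This brute-force loop costs roughly `max_lag × track length` operations, so on full-length recordings you would want to correlate a downsampled signal or per-frame features instead, or use an FFT-based correlation.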