Automating Video Subtitles with Whisper and Go

Had a bunch of video files that needed subtitles. Did not want to do it manually. OpenAI's Whisper API does speech-to-text. So I wrote a Go tool that extracts audio from a video, sends it to Whisper, gets the transcript back, generates SRT subtitle files, and embeds them into the video. One command.

What it does

Point it at a video file. It extracts the audio track, sends it to OpenAI's Whisper API for transcription, converts the timestamped response into SRT format, and uses FFmpeg to embed the subtitles back into the video. It also processes multiple files in one batch.
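To make the two FFmpeg steps concrete, here is a minimal sketch of how the audio extraction and subtitle embedding could be driven from Go. It is not the tool's actual code. The function names (extractAudioArgs, embedSubtitleArgs, sidecar, run), the mono 16 kHz MP3 settings, the output file naming, and the choice of a soft mov_text subtitle track in an MP4 container are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// extractAudioArgs builds FFmpeg arguments that drop the video stream (-vn)
// and write mono 16 kHz audio. The settings are an assumption chosen to keep
// the upload small.
func extractAudioArgs(video, audio string) []string {
	return []string{"-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", audio}
}

// embedSubtitleArgs builds FFmpeg arguments that copy the existing streams
// and add the SRT file as a soft subtitle track. mov_text assumes MP4 output.
func embedSubtitleArgs(video, srt, out string) []string {
	return []string{"-y", "-i", video, "-i", srt, "-c", "copy", "-c:s", "mov_text", out}
}

// sidecar returns the video path with its extension replaced by suffix,
// so "clip.mp4" with ".srt" becomes "clip.srt". Hypothetical helper.
func sidecar(video, suffix string) string {
	return strings.TrimSuffix(video, filepath.Ext(video)) + suffix
}

// run executes ffmpeg with the given arguments and forwards its log output.
func run(args []string) error {
	cmd := exec.Command("ffmpeg", args...)
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Batch mode: every command-line argument is treated as a video file.
	for _, video := range os.Args[1:] {
		audio := sidecar(video, ".mp3")
		if err := run(extractAudioArgs(video, audio)); err != nil {
			fmt.Fprintln(os.Stderr, "extract failed:", video, err)
			continue
		}
		// Transcription and SRT generation would happen here. This sketch
		// assumes the .srt file already exists next to the video.
		srt := sidecar(video, ".srt")
		out := sidecar(video, ".subbed.mp4")
		if err := run(embedSubtitleArgs(video, srt, out)); err != nil {
			fmt.Fprintln(os.Stderr, "embed failed:", video, err)
		}
	}
}
```

Keeping the argument lists in their own functions means they can be checked without FFmpeg installed. Burning the subtitles into the picture instead of adding a soft track would need a video filter and a re-encode rather than a stream copy.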

Why I built it

I had jousting footage and training videos that needed captioning. Doing it by hand is tedious. Whisper's accuracy is good enough that the output only needs minor corrections, and for social media content it works straight out of the box.

How it works

Go orchestrates the pipeline. FFmpeg handles audio extraction and subtitle embedding. The OpenAI API does the transcription. The interesting bit was getting the timestamp alignment right: Whisper can return word-level timestamps, but SRT needs phrase-level segments with clean line breaks, so the words have to be grouped into cues before the file is written.
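One way to turn word-level timestamps into SRT cues is sketched below. It is not the tool's actual algorithm. The Word and Cue types, the 42-character budget, and the 0.8 second pause threshold are assumptions made for illustration. A new cue starts whenever the next word would overflow the budget or follows a long pause.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode/utf8"
)

// Word is one transcribed word with start and end offsets in seconds.
// The shape is an assumption modelled on word-level timestamp output.
type Word struct {
	Text       string
	Start, End float64
}

// Cue is one SRT entry: a time range plus the text shown during it.
type Cue struct {
	Start, End float64
	Text       string
}

const (
	maxChars = 42  // assumed character budget per cue
	maxGap   = 0.8 // assumed pause in seconds that forces a new cue
)

// groupWords appends each word to the current cue while it fits the
// character budget and follows the previous word closely enough.
// Otherwise it starts a new cue.
func groupWords(words []Word) []Cue {
	var cues []Cue
	for _, w := range words {
		text := strings.TrimSpace(w.Text)
		if text == "" {
			continue
		}
		if n := len(cues); n > 0 {
			last := &cues[n-1]
			size := utf8.RuneCountInString(last.Text) + 1 + utf8.RuneCountInString(text)
			if size <= maxChars && w.Start-last.End <= maxGap {
				last.Text += " " + text
				last.End = w.End
				continue
			}
		}
		cues = append(cues, Cue{Start: w.Start, End: w.End, Text: text})
	}
	return cues
}

// srtTime formats seconds as the HH:MM:SS,mmm timestamp used by SRT.
func srtTime(seconds float64) string {
	ms := int64(math.Round(seconds * 1000))
	return fmt.Sprintf("%02d:%02d:%02d,%03d", ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// formatSRT renders cues as numbered SRT blocks separated by blank lines.
func formatSRT(cues []Cue) string {
	var b strings.Builder
	for i, c := range cues {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n", i+1, srtTime(c.Start), srtTime(c.End), c.Text)
	}
	return b.String()
}

func main() {
	words := []Word{
		{"Lances", 0.0, 0.4}, {"ready", 0.5, 0.9},
		{"and", 2.5, 2.7}, {"charge", 2.8, 3.3},
	}
	fmt.Print(formatSRT(groupWords(words)))
}
```

With the sample words in main, the pause before "and" is longer than the threshold, so the output is two cues rather than one. Both thresholds would need tuning against real footage. A fuller version might also break on punctuation or split long cues across two lines.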

View on GitHub