Transcribing audio

Remotion provides several built-in options for transcribing audio to generate captions:

- `@remotion/install-whisper-cpp` - Transcribe audio locally on a server using Whisper.cpp
- `@remotion/whisper-web` - Transcribe audio in the browser using WebAssembly
- `@remotion/openai-whisper` - Use the OpenAI Whisper API for cloud-based transcription
- `@remotion/elevenlabs` - Use the ElevenLabs Speech to Text API for cloud-based transcription

Comparison

| | `@remotion/install-whisper-cpp` | `@remotion/whisper-web` | `@remotion/openai-whisper` | `@remotion/elevenlabs` |
|---|---|---|---|---|
| Environment | Server (Node.js) | Client (browser) | Cloud (API) | Cloud (API) |
| Speed | Fast (depends on hardware) | Slow (WASM overhead) | Fast | Fast |
| Cost | Free | Free | Paid (OpenAI API pricing) | Paid (ElevenLabs API pricing) |
| Offline support | ✅ | ✅ | ❌ | ❌ |
| No server needed | ❌ | ✅ | ✅ | ✅ |
| Convert function | `toCaptions()` | `toCaptions()` | `openaiWhisperApiToCaptions()` | `elevenLabsTranscriptToCaptions()` |
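Each convert function in the last row maps a provider-specific transcript onto the shared `Caption` shape. To illustrate what such a mapping involves, the sketch below converts a made-up word-level format with times in seconds. `HypotheticalWord` and `hypotheticalWordsToCaptions()` are hypothetical names that do not exist in any Remotion package, and the local `Caption` type is assumed to mirror the one exported by `@remotion/captions`; it is declared inline only so the sketch runs without dependencies.

```typescript
// Assumed to mirror the `Caption` type from @remotion/captions.
// Declared locally so this sketch is self-contained.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical word-level transcript format (times in seconds).
// This is NOT the output format of any package listed above.
type HypotheticalWord = {
  word: string;
  start: number; // seconds
  end: number; // seconds
  probability?: number; // optional per-word confidence, 0..1
};

// Hypothetical converter: one Caption per word, seconds -> milliseconds.
function hypotheticalWordsToCaptions(words: HypotheticalWord[]): Caption[] {
  return words.map((w) => ({
    text: w.word,
    // Round to whole milliseconds to avoid floating-point noise (e.g. 0.42 * 1000).
    startMs: Math.round(w.start * 1000),
    endMs: Math.round(w.end * 1000),
    // This made-up format has no "most prominent moment", so leave it unset.
    timestampMs: null,
    // Missing probability becomes null, matching the nullable field.
    confidence: w.probability ?? null,
  }));
}
```

In practice you would call the package's own convert function from the table; a hand-written mapping like this is only needed for a transcript source the built-in options don't cover.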

All of these options can output captions in the `Caption` format, which is the recommended format for use with Remotion. This format:

- Enables usage of the APIs in `@remotion/captions`, such as `createTikTokStyleCaptions()`
- Matches the format used in the Remotion Editor Starter
- Is compatible with the Animated Captions package
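Because every option produces the same shape, code that consumes captions only has to be written once. As a minimal sketch, here is a hypothetical helper, `getActiveCaption()`, that finds the caption being spoken at a given frame. It is not an API of `@remotion/captions`; the local `Caption` type is again assumed to mirror the exported one. In a Remotion component, `frame` and `fps` would typically come from `useCurrentFrame()` and `useVideoConfig()`.

```typescript
// Assumed to mirror the `Caption` type from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: returns the caption active at `frame`, or null.
function getActiveCaption(
  captions: Caption[],
  frame: number,
  fps: number,
): Caption | null {
  // Convert the frame index to a time in milliseconds.
  const timeMs = (frame / fps) * 1000;
  // Treat each caption as the half-open interval [startMs, endMs),
  // so back-to-back captions never both match the same instant.
  return captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;
}
```

A linear scan keeps the sketch simple; for very long transcripts, a binary search over captions sorted by `startMs` would avoid scanning the whole array on every frame.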

You can also define your own caption format instead of relying on the `Caption` type; this page covers only the built-in options.