Transcribing audio
Remotion provides several built-in options for transcribing audio to generate captions:
- @remotion/install-whisper-cpp: Transcribe audio locally on a server using Whisper.cpp
- @remotion/whisper-web: Transcribe audio in the browser using WebAssembly
- @remotion/openai-whisper: Use the OpenAI Whisper API for cloud-based transcription
- @remotion/elevenlabs: Use the ElevenLabs Speech to Text API for cloud-based transcription
Comparison
| | @remotion/install-whisper-cpp | @remotion/whisper-web | @remotion/openai-whisper | @remotion/elevenlabs |
|---|---|---|---|---|
| Environment | Server (Node.js) | Client (Browser) | Cloud (API) | Cloud (API) |
| Speed | Fast (depends on hardware) | Slow (WASM overhead) | Fast | Fast |
| Cost | Free | Free | Paid (OpenAI API pricing) | Paid (ElevenLabs API pricing) |
| Offline support | ✅ | ✅ | ❌ | ❌ |
| No server needed | ❌ | ✅ | ✅ | ✅ |
| Convert function | toCaptions() | toCaptions() | openaiWhisperApiToCaptions() | elevenLabsTranscriptToCaptions() |
The Caption type
All of these options can output captions in the Caption type format, which is recommended for use with Remotion. This format:
- Enables usage of the APIs in @remotion/captions, such as createTikTokStyleCaptions()
- Matches the format used in the Remotion Editor Starter
- Is compatible with the Animated Captions package
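Because every option above can be converted to the same Caption shape, code that renders captions only has to be written once. The sketch below mirrors the Caption fields locally so it runs on its own (in a real project you would import the type from @remotion/captions instead). The helper getCaptionAtFrame() is hypothetical and not part of any Remotion package; it simply shows how startMs and endMs map onto video frames at a given fps.

```typescript
// Local mirror of the Caption shape from @remotion/captions, so this sketch
// runs standalone. In a project: import type { Caption } from "@remotion/captions";
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: returns the caption visible at a given frame, or null.
// The frame is converted to milliseconds using the composition's fps.
function getCaptionAtFrame(
  captions: Caption[],
  frame: number,
  fps: number,
): Caption | null {
  const timeMs = (frame / fps) * 1000;
  // Treat a caption as active from startMs (inclusive) to endMs (exclusive).
  const active = captions.find(
    (c) => timeMs >= c.startMs && timeMs < c.endMs,
  );
  return active ?? null;
}

const captions: Caption[] = [
  { text: "Hello", startMs: 0, endMs: 500, timestampMs: 250, confidence: 0.98 },
  { text: " world", startMs: 500, endMs: 1200, timestampMs: 850, confidence: 0.95 },
];

// Frame 20 at 30 fps is about 666.7 ms, which falls inside the second caption.
console.log(getCaptionAtFrame(captions, 20, 30)?.text); // → " world"
```

In a composition, the frame and fps would typically come from Remotion's useCurrentFrame() and useVideoConfig() hooks.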
Alternatives
You can also define your own caption format instead of using the Caption type. This page covers only the built-in options.
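If you keep your own format but later want to use the @remotion/captions utilities, one option is a small adapter that maps your entries to the Caption shape. In the sketch below, MySubtitle and toCaptionFormat() are hypothetical names, and the Caption fields are mirrored locally so the block runs standalone. Since the custom format has no per-word timestamp or confidence score, those fields are set to null.

```typescript
// Hypothetical custom format: times in seconds, one entry per phrase.
type MySubtitle = {
  startSeconds: number;
  endSeconds: number;
  line: string;
};

// Local mirror of the Caption shape from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Converts seconds to whole milliseconds and fills the fields the custom
// format does not provide with null.
function toCaptionFormat(subtitles: MySubtitle[]): Caption[] {
  return subtitles.map((s) => ({
    text: s.line,
    startMs: Math.round(s.startSeconds * 1000),
    endMs: Math.round(s.endSeconds * 1000),
    timestampMs: null,
    confidence: null,
  }));
}

const converted = toCaptionFormat([
  { startSeconds: 0, endSeconds: 1.5, line: "Welcome back" },
  { startSeconds: 1.5, endSeconds: 3.25, line: "to the channel" },
]);

console.log(converted[1].startMs, converted[1].endMs); // → 1500 3250
```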
See also
- Caption: The caption data structure
- @remotion/captions: Caption manipulation utilities