
Voice-over workflow with ElevenLabs

TL;DR: Record your scratch VO directly in Premiere while the video plays. Select that clip in modelBridge, choose an ElevenLabs model, generate a polished voice, preview it, then import to timeline. Your timing is already locked — no re-editing needed.


Most editors record a scratch VO to lock timing — then spend hours finding a voice artist, sending files back and forth, and re-syncing the final audio. With ElevenLabs models in modelBridge, the scratch VO becomes the input. Your timing stays. The voice changes.


Record directly in your timeline while the video plays. No external recorder needed.

How to record:

  1. In the timeline, find an empty audio track — A2, A3, or add a new one
  2. Click the microphone icon on the track header to arm it for recording
  3. Place your playhead where the VO should start
  4. Click the microphone icon again — Premiere counts down 3 seconds, then “Recording…” appears in red, the timeline rolls, and the video plays
  5. Speak your VO in time with the video
  6. Press Space to stop when done

Premiere places the recorded clip on the armed track at the exact timecode where you started. Your timing is locked to the picture.


Step 2 — Select your ElevenLabs model in modelBridge


Search “elevenlabs” in the modelBridge Browse tab. For VO replacement, two models are most relevant:

ElevenLabs Voice Changer — provide your scratch VO clip, the model transforms it into a different voice. Your exact timing and pacing carry over. Primary tool for scratch VO replacement.

ElevenLabs TTS v3, Turbo v2.5, or Multilingual v2 — write a script, the model speaks it. Use when generating from written copy rather than replacing a recording.


Click your recorded VO clip on the Premiere timeline. modelBridge detects it automatically, and the clip info card shows its format and duration.

Supported formats: mp3, wav, ogg, flac, aac, m4a, wma.

Keep source clips under 2–3 minutes for reliable extraction — the extraction timeout is 60 seconds.
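The format and length guidance above can be checked before you select a clip. The following is a minimal sketch, not part of modelBridge: the function name, the thresholds (2 minutes as the cautious limit, 3 minutes as the upper limit), and the warning text are all assumptions made for illustration.

```python
from pathlib import Path

# Formats listed in this guide as supported for clip extraction.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a", ".wma"}

# Assumption: the "2-3 minutes" guidance is treated as a cautious
# limit (120 s) and an upper limit (180 s).
SAFE_DURATION_SECONDS = 120
MAX_DURATION_SECONDS = 180


def check_source_clip(path: str, duration_seconds: float) -> list[str]:
    """Return warnings for a source clip; an empty list means no issues found."""
    warnings = []
    extension = Path(path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unsupported format: {extension or 'no extension'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        warnings.append("longer than 3 minutes: extraction may time out")
    elif duration_seconds > SAFE_DURATION_SECONDS:
        warnings.append("between 2 and 3 minutes: consider splitting the clip")
    return warnings


if __name__ == "__main__":
    print(check_source_clip("scratch_vo.WAV", 95))    # supported, short enough
    print(check_source_clip("scratch_vo.aiff", 200))  # two warnings
```

If a long read fails the check, splitting it into shorter clips before selecting them is the simplest workaround.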


Click ⓘ on any field for a plain-language explanation.

Voice Changer fields:

  • Voice / speaker — dropdown for the target voice
  • Similarity — how closely the output matches the source voice character
  • Style — emotional tone

TTS v3 fields:

  • Text / prompt — the script to be spoken
  • Voice — 20+ built-in voices across narration, conversational, character, and broadcast styles
  • Inline emotion tags — supported directly in the text field for emotional nuance and pacing

The cost badge above Generate shows the estimated charge. Approximate pricing:

  • TTS v3, Multilingual v2, Text-to-Dialogue — ~$0.10/1k chars
  • Turbo v2.5 — ~$0.05/1k chars
  • Voice Changer — ~$0.30/min
  • Music generation — ~$0.80/min
  • Sound Effects v2 — ~$0.002/sec
  • Audio Isolation — ~$0.10/min
  • Dubbing — ~$0.90/min
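
To budget a job before opening the panel, the approximate rates above can be turned into a rough estimate. This is only a sketch: the rates are copied from the list above, while the function name and the rounding to four decimal places are assumptions. The cost badge in modelBridge remains the figure to trust, and it may be calculated differently.

```python
from decimal import Decimal

# Approximate rates from the list above: (billing unit, USD per unit).
# "1k_chars" = per 1,000 characters, "min" = per minute, "sec" = per second.
APPROX_RATES = {
    "TTS v3": ("1k_chars", Decimal("0.10")),
    "Multilingual v2": ("1k_chars", Decimal("0.10")),
    "Text-to-Dialogue": ("1k_chars", Decimal("0.10")),
    "Turbo v2.5": ("1k_chars", Decimal("0.05")),
    "Voice Changer": ("min", Decimal("0.30")),
    "Music generation": ("min", Decimal("0.80")),
    "Sound Effects v2": ("sec", Decimal("0.002")),
    "Audio Isolation": ("min", Decimal("0.10")),
    "Dubbing": ("min", Decimal("0.90")),
}


def estimate_cost(model: str, *, chars: int = 0, seconds: float = 0) -> Decimal:
    """Rough USD estimate for one generation (hypothetical helper)."""
    unit, rate = APPROX_RATES[model]
    if unit == "1k_chars":
        quantity = Decimal(chars) / 1000
    elif unit == "min":
        quantity = Decimal(str(seconds)) / 60
    else:  # billed per second
        quantity = Decimal(str(seconds))
    return (quantity * rate).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    print(estimate_cost("TTS v3", chars=1500))        # 1,500-character script
    print(estimate_cost("Voice Changer", seconds=90)) # 90-second scratch VO
```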

Click Generate. Generation typically takes 10–30 seconds.


Generated audio appears in the modelBridge preview panel with a full audio player — play/pause, scrub bar, and time display. Listen to the full result here before anything touches the timeline.


Click Import to Timeline. The generated audio lands at the current playhead position on the first available audio track. All video tracks are locked during import to prevent ripple.

Important: Audio always inserts — it does not replace your original VO clip. Your scratch recording stays on the timeline as a safety net.


After import, both clips sit on separate tracks at the same timecode.

  1. Click the M (Mute Track) button on the scratch VO track header to mute it
  2. Play through — listen to the generated voice against picture
  3. If the generated audio drifts, right-click the clip and choose Speed/Duration to stretch or shorten it
  4. When satisfied — delete or disable the scratch VO track
  5. If not satisfied — unmute the scratch, go back to modelBridge, adjust and regenerate
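
When the generated clip runs longer or shorter than the scratch VO, the value to enter in Premiere's Speed/Duration dialog is a speed percentage: above 100% shortens the clip, below 100% lengthens it. The arithmetic is sketched below with a hypothetical helper name; the durations are read off the two clips on the timeline.

```python
def speed_percent_to_fit(generated_seconds: float, target_seconds: float) -> float:
    """Speed percentage that makes the generated clip last target_seconds.

    Playing a clip at S% changes its length to original * 100 / S,
    so S = generated / target * 100.
    """
    if generated_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    return round(generated_seconds / target_seconds * 100, 2)


if __name__ == "__main__":
    # Generated voice is 31.5 s, scratch VO is 30 s: speed the clip up.
    print(speed_percent_to_fit(31.5, 30.0))
    # Generated voice is 28.5 s, scratch VO is 30 s: slow the clip down.
    print(speed_percent_to_fit(28.5, 30.0))
```

Small corrections of a few percent are usually inaudible. For larger gaps, regenerating with adjusted settings is likely to sound better than stretching.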

All available ElevenLabs models on fal.ai are searchable in modelBridge Browse and link directly to your timeline.

Speech

  • TTS v3 — natural intonation, emotional nuance, accurate pacing. 20+ built-in voices. Supports inline emotion tags
  • Turbo v2.5 — under 75ms latency for fast-turnaround applications
  • Multilingual v2 — TTS in 29 languages
  • Text-to-Dialogue — multi-speaker conversation generation

Voice transformation

  • Voice Changer — transform any voice input to a different voice while preserving timing and pacing
  • Audio Isolation — strip background noise from a recording. Run your scratch VO through this first if the recording environment was noisy

Music and sound

  • Music generation — generate full music tracks from a text description. Describe mood, tempo, and genre — the track imports directly to your audio timeline. Duration: 10 seconds to 5 minutes
  • Sound Effects v2 — generate sound effects from a text prompt

Dubbing

  • Dubbing — provide a video clip with dialogue, get back a dubbed version in a different voice or language. The output is a video file that imports directly to your timeline

For the full model list: fal.ai/elevenlabs

Commercial use: Content generated through the fal.ai API can be used in commercial projects. See fal.ai’s terms of service for details.


Tag the generation to a client or project using the picker in the Generate tab before you click Generate. Export an HTML report from the Billing tab when the project wraps.


Limitation                | What it means
--------------------------|---------------------------------------------------------------------
No Replace on Timeline    | Audio always inserts; the original clip stays until deleted manually
No duration matching      | Generated length depends on model and input, not source timecode
Extraction timeout        | 60 seconds; keep source clips under 2–3 minutes
Output format             | Always .wav
Model availability varies | Search “elevenlabs” in Browse; available models change

Explainer video, client delivery: Record scratch VO while watching the edit — timing is natural and picture-locked. Run through Voice Changer in modelBridge, preview the polished voice, import, mute the scratch.

Multiple voice options: Generate two or three versions by switching voice settings. Each imports to a separate track. Mute and compare with M, keep the best, delete the rest.

Last-minute script change: Record just the changed line as a scratch and run it through Voice Changer, or type the new line into TTS v3, using the same voice settings as before. Import at the right timecode. Done in under 5 minutes.

Soundtrack from scratch: Describe mood, tempo, and genre in Music Generation. Generate a full track, preview, import directly to the audio timeline.

International delivery: Generate your VO in English, then run through Dubbing in modelBridge for a dubbed version in the target language.


Common failure modes — if generation fails or audio quality isn’t right.

Workflow recipes by category — voice-led sequence recipe.