Voice-over workflow with ElevenLabs

TL;DR: Record your scratch VO directly in Premiere while the video plays. Select that clip in modelBridge, choose an ElevenLabs model, generate a polished voice, preview it, then import to timeline. Your timing is already locked — no re-editing needed.

Why this workflow exists

Most editors record a scratch VO to lock timing — then spend hours finding a voice artist, sending files back and forth, and re-syncing the final audio. With ElevenLabs models in modelBridge, the scratch VO becomes the input. Your timing stays. The voice changes.

Step 1 — Record your VO in Premiere

Record directly in your timeline while the video plays. No external recorder needed.

How to record:

In the timeline, find an empty audio track — A2, A3, or add a new one
Click the microphone icon on the track header to arm it for recording
Place your playhead where the VO should start
Click the microphone icon again — Premiere counts down 3 seconds, then “Recording…” appears in red, the timeline rolls, and the video plays
Speak your VO in time with the video
Press Space to stop when done

Premiere places the recorded clip on the armed track at the exact timecode where you started. Your timing is locked to the picture.

Step 2 — Select your ElevenLabs model in modelBridge

Search “elevenlabs” in the modelBridge Browse tab. For VO replacement, two models are most relevant:

ElevenLabs Voice Changer — provide your scratch VO clip, the model transforms it into a different voice. Your exact timing and pacing carry over. Primary tool for scratch VO replacement.

ElevenLabs TTS v3, Turbo v2.5, or Multilingual v2 — write a script, the model speaks it. Use when generating from written copy rather than replacing a recording.

Step 3 — Select your recorded clip

Click your recorded VO clip on the Premiere timeline. modelBridge detects it automatically, and the clip info card shows its format and duration.

Supported formats: mp3, wav, ogg, flac, aac, m4a, wma.

Keep source clips under 2–3 minutes for reliable extraction. The extraction step times out after 60 seconds, and longer clips may not finish extracting in time.
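The format and length guidance above can be checked before you open modelBridge. The following is a minimal Python sketch, not part of modelBridge: the function names are hypothetical, the 150-second threshold is an assumed midpoint of the 2–3 minute guidance, and the duration helper only reads uncompressed PCM .wav files through the standard-library wave module.

```python
import wave
from pathlib import Path

# Formats listed in this article as supported by modelBridge.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a", ".wma"}

# Assumption: 150 seconds is the midpoint of the "2-3 minutes" guidance.
MAX_RECOMMENDED_SECONDS = 150.0


def is_supported_format(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(path: str) -> float:
    """Read the duration of an uncompressed PCM .wav file from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_recommended_length(duration_seconds: float) -> bool:
    """Return True if the clip is short enough for reliable extraction."""
    return duration_seconds <= MAX_RECOMMENDED_SECONDS
```

For other formats, read the duration from the clip info card or from Premiere instead; this sketch does not decode compressed audio.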

Step 4 — Configure the model

Click ⓘ on any field for a plain-language explanation.

Voice Changer fields:

Voice / speaker — dropdown for the target voice
Similarity — how closely the output matches the source voice character
Style — emotional tone

TTS v3 fields:

Text / prompt — the script to be spoken
Voice — 20+ built-in voices across narration, conversational, character, and broadcast styles
Inline emotion tags — supported directly in the text field for emotional nuance and pacing

Step 5 — Check cost and generate

The cost badge above Generate shows the estimated charge. Approximate pricing:

TTS v3, Multilingual v2, Text-to-Dialogue — ~$0.10/1k chars
Turbo v2.5 — ~$0.05/1k chars
Voice Changer — ~$0.30/min
Music generation — ~$0.80/min
Sound Effects v2 — ~$0.002/sec
Audio Isolation — ~$0.10/min
Dubbing — ~$0.90/min
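To sanity-check the cost badge, the approximate rates above can be turned into a quick estimate. This is a rough Python sketch under stated assumptions: the dictionary keys and function names are hypothetical labels, the rates are the approximate figures from the list, and actual billing details (rounding, minimum charges, whether inline tags count as characters) are not covered here. The cost badge remains the figure to trust.

```python
# Approximate USD rates copied from the pricing list in this article.
RATE_PER_1K_CHARS = {
    "tts_v3": 0.10,
    "multilingual_v2": 0.10,
    "text_to_dialogue": 0.10,
    "turbo_v2_5": 0.05,
}
RATE_PER_MINUTE = {
    "voice_changer": 0.30,
    "music_generation": 0.80,
    "audio_isolation": 0.10,
    "dubbing": 0.90,
}
RATE_PER_SECOND = {
    "sound_effects_v2": 0.002,
}


def estimate_text_cost(model: str, script: str) -> float:
    """Estimate the cost of a text-driven model from the script length."""
    return len(script) / 1000 * RATE_PER_1K_CHARS[model]


def estimate_audio_cost(model: str, duration_seconds: float) -> float:
    """Estimate the cost of a duration-priced model from the clip length."""
    if model in RATE_PER_SECOND:
        return duration_seconds * RATE_PER_SECOND[model]
    return duration_seconds / 60 * RATE_PER_MINUTE[model]
```

For example, a 90-second scratch VO through Voice Changer comes to about 1.5 × $0.30 = $0.45, and a 1,500-character script through TTS v3 to about $0.15.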

Click Generate. Generation typically takes 10–30 seconds.

Step 6 — Preview before import

Generated audio appears in the modelBridge preview panel with a full audio player — play/pause, scrub bar, and time display. Listen to the full result here before anything touches the timeline.

Step 7 — Import to timeline

Click Import to Timeline. The generated audio lands at the current playhead position on the first available audio track. All video tracks are locked during import to prevent ripple.

Important: Audio always inserts — it does not replace your original VO clip. Your scratch recording stays on the timeline as a safety net.

Step 8 — Swap the original VO

After import, both clips sit on separate tracks at the same timecode.

Click the M (Mute Track) button on the scratch VO track header to mute it
Play through — listen to the generated voice against picture
If the generated audio drifts, right-click the generated clip and choose Speed/Duration to stretch or compress it, or trim the clip edges on the timeline
When satisfied — delete or disable the scratch VO track
If not satisfied — unmute the scratch, go back to modelBridge, adjust and regenerate
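When the generated clip runs longer or shorter than the scratch, the speed value to enter in Premiere's Speed/Duration dialog follows from the two durations. The sketch below shows the arithmetic in Python; the function names are hypothetical, and it assumes a uniform speed change across the whole clip, which will not fix drift that varies from line to line.

```python
def drift_seconds(generated_seconds: float, scratch_seconds: float) -> float:
    """Positive when the generated clip runs longer than the scratch."""
    return generated_seconds - scratch_seconds


def speed_percent_to_fit(generated_seconds: float, scratch_seconds: float) -> float:
    """Speed percentage that makes the generated clip last as long as the scratch.

    Playing a clip at S percent speed changes its duration to
    original_duration * 100 / S, so S = generated / scratch * 100.
    """
    if generated_seconds <= 0 or scratch_seconds <= 0:
        raise ValueError("durations must be positive")
    return generated_seconds / scratch_seconds * 100.0
```

A 31.5-second generated clip that has to fit a 30-second scratch needs roughly 105% speed; a 27-second clip needs roughly 90%. Large corrections tend to be audible, so regenerating may be the better option when the gap is big.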

The full ElevenLabs suite in modelBridge

All ElevenLabs models available on fal.ai are searchable in modelBridge Browse, and their output imports directly to your timeline.

Speech

TTS v3 — natural intonation, emotional nuance, accurate pacing. 20+ built-in voices. Supports inline emotion tags
Turbo v2.5 — under 75ms latency for fast-turnaround applications
Multilingual v2 — TTS in 29 languages
Text-to-Dialogue — multi-speaker conversation generation

Voice transformation

Voice Changer — transform any voice input to a different voice while preserving timing and pacing
Audio Isolation — strip background noise from a recording. Run your scratch VO through this first if the recording environment was noisy

Music and sound

Music generation — generate full music tracks from a text description. Describe mood, tempo, and genre — the track imports directly to your audio timeline. Duration: 10 seconds to 5 minutes
Sound Effects v2 — generate sound effects from a text prompt

Dubbing

Dubbing — provide a video clip with dialogue, get back a dubbed version in a different voice or language. The output is a video file that imports directly to your timeline

For the full model list: fal.ai/elevenlabs

Commercial use: Content generated through the fal.ai API can be used in commercial projects. See fal.ai’s terms of service for details.

Cost and tracking

Tag each generation to a client or project using the picker in the Generate tab before generating. Export an HTML report from the Billing tab when the project wraps.

Limitations

Limitation	What it means
No Replace on Timeline	Audio always inserts — original clip stays until deleted manually
No duration matching	Generated length depends on model and input, not source timecode
Extraction timeout	60 seconds — keep source clips under 2–3 minutes
Output format	Always .wav
Model availability varies	Search “elevenlabs” in Browse — available models change

In real projects

Explainer video, client delivery: Record scratch VO while watching the edit — timing is natural and picture-locked. Run through Voice Changer in modelBridge, preview the polished voice, import, mute the scratch.

Multiple voice options: Generate two or three versions by switching voice settings. Each imports to a separate track. Toggle each track's M (Mute Track) button to compare, keep the best, and delete the rest.

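Last-minute script change: Record just the changed line as a scratch and run it through Voice Changer with the same voice settings, or, if the original VO was generated from text, type the new line into TTS v3 with the same voice. Import at the right timecode. Done in under 5 minutes.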

Soundtrack from scratch: Describe mood, tempo, and genre in Music Generation. Generate a full track, preview, import directly to the audio timeline.

International delivery: Generate your VO in English, then run through Dubbing in modelBridge for a dubbed version in the target language.

Common failure modes — if generation fails or audio quality isn’t right.

Workflow recipes by category — voice-led sequence recipe.