Audio & Voice Generation

modelBridge supports a full audio pipeline — text-to-speech, text-to-audio, voice conversion, and more. Generate voiceover, sound effects, or music without leaving Premiere Pro. Results import directly to the correct audio track on your timeline.

What You Can Generate

Category	What It Does	Examples
Text to Speech	Convert text to spoken audio	Voiceover narration, dialogue, placeholder VO
Text to Audio	Generate sound from a description	Sound effects, ambient audio, music
Audio to Audio	Transform existing audio	Voice conversion, enhancement, noise removal
Speech to Speech	Transform spoken audio	Change a voice while preserving timing
Video to Audio	Generate audio from video	Lip sync, soundtrack generation

All audio models use the same schema-driven interface as video and image models: the plugin reads each model’s API schema and builds the controls from it automatically.
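As a rough illustration of what "schema-driven" means, the sketch below maps a model's input properties to UI controls. The `SchemaProperty` and `Control` shapes and the `buildControls` function are hypothetical names invented for this example. They are not modelBridge's actual internals, and the real schema format may differ.

```typescript
// Hypothetical schema shape, for illustration only.
type SchemaProperty =
  | { type: "string"; enum?: string[] }
  | { type: "number"; minimum?: number; maximum?: number }
  | { type: "boolean" };

// Hypothetical control descriptions a UI layer could render.
type Control =
  | { kind: "text"; name: string }
  | { kind: "dropdown"; name: string; options: string[] }
  | { kind: "slider"; name: string; min: number; max: number }
  | { kind: "number"; name: string }
  | { kind: "checkbox"; name: string };

// Pick one control per schema property, based on its type and constraints.
function buildControls(properties: Record<string, SchemaProperty>): Control[] {
  return Object.entries(properties).map(([name, prop]): Control => {
    switch (prop.type) {
      case "string":
        // A fixed set of allowed values becomes a dropdown.
        return prop.enum
          ? { kind: "dropdown", name, options: prop.enum }
          : { kind: "text", name };
      case "number":
        // A bounded range becomes a slider; otherwise a plain number field.
        return prop.minimum !== undefined && prop.maximum !== undefined
          ? { kind: "slider", name, min: prop.minimum, max: prop.maximum }
          : { kind: "number", name };
      case "boolean":
        return { kind: "checkbox", name };
    }
  });
}
```

Under these assumptions, a TTS model that declares a `voice` enum and a bounded `speed` would get a dropdown and a slider without any model-specific UI code.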

ElevenLabs Models

Every ElevenLabs model on fal.ai is available inside modelBridge — and new ones show up automatically as they launch.

Text to Speech

Model	What it does
elevenlabs/tts/eleven-v3	Their best model — with emotion tags for expressive delivery
elevenlabs/tts/turbo-v2.5	Ultra-fast generation for quick iterations
elevenlabs/tts/multilingual-v2	29 languages, one model
elevenlabs/text-to-dialogue/eleven-v3	Multiple speakers in a single generation

Voice Conversion

Model	What it does
elevenlabs/voice-changer	Record in your voice, output in any other

Sound Effects & Music

Model	What it does
elevenlabs/sound-effects/v2	Describe a sound, get the audio
elevenlabs/music	Generate music from a text prompt

Transcription

Model	What it does
elevenlabs/speech-to-text/scribe-v2	High-accuracy speech to text
elevenlabs/speech-to-text	Standard transcription

Dubbing

Model	What it does
elevenlabs/dubbing	Dub your video into another language

Text-to-Speech Workflow

Select a TTS model — search for “tts”, “speech”, or “elevenlabs”
Write your script in the prompt field
Adjust voice parameters — voice selection, speed, emotion, language (varies by model)
Check the cost estimate — updates live as you adjust
Click Generate — the audio imports to your project

Audio Lands on the Right Track

When you click Import to Timeline, audio goes to the first available audio track at the playhead. No manual routing needed.
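One plausible reading of "first available audio track" can be sketched as follows. It assumes a track is available when it is unlocked and has no clip overlapping the span the new audio would occupy. The `AudioTrack` and `Clip` types and `firstAvailableTrack` are hypothetical, and the plugin's real rule may differ.

```typescript
// Hypothetical timeline model; times are in seconds.
interface Clip {
  start: number;
  end: number;
}

interface AudioTrack {
  index: number;
  locked: boolean;
  clips: Clip[];
}

// Return the first unlocked track with free space from the playhead
// to playhead + duration, or undefined if every track is occupied.
function firstAvailableTrack(
  tracks: AudioTrack[],
  playhead: number,
  duration: number,
): AudioTrack | undefined {
  const end = playhead + duration;
  return tracks.find(
    (track) =>
      !track.locked &&
      // A clip is out of the way if it ends before the playhead
      // or starts after the new audio would finish.
      track.clips.every((clip) => clip.end <= playhead || clip.start >= end),
  );
}
```

With this rule, a 3-second result at a playhead that sits inside a clip on the first track would fall through to the next free track.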

Preview Before Importing

An inline audio player in the result card lets you listen before committing. It has play/pause controls and a progress bar, and only one preview plays at a time.
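The "one preview at a time" behavior is a common pattern: a small coordinator remembers the active player and pauses it before starting another. The sketch below shows the idea. `Playable` and `PreviewManager` are hypothetical names, not part of modelBridge.

```typescript
// Minimal player contract (hypothetical); a real player would wrap an audio element.
interface Playable {
  play(): void;
  pause(): void;
}

class PreviewManager {
  private current: Playable | null = null;

  // Start a preview, pausing whichever other preview is currently active.
  play(player: Playable): void {
    if (this.current !== null && this.current !== player) {
      this.current.pause();
    }
    this.current = player;
    player.play();
  }

  // Pause a preview and forget it if it was the active one.
  pause(player: Playable): void {
    player.pause();
    if (this.current === player) {
      this.current = null;
    }
  }
}
```

Each result card would route its play button through the same manager instance, so starting a second preview silences the first.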

ElevenLabs Emotion Tags

ElevenLabs Eleven v3 supports expressive tags for emotion, accent, and delivery. modelBridge surfaces these through a tag bar above the prompt field:

Emotion chips: click to insert [excited], [whispers], [laughs], [sad], or [angry] at the cursor
Accent dropdown: choose British, American, Australian, Indian, and more to insert a tag such as [accent: British]
Collapsible: hide the tag bar when you don’t need it

You can also type custom tags directly: [cheerful], [slowly], [in a deep voice]. The tag bar just makes common tags one-click accessible.
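Tags are plain bracketed text inside the script, so inserting one at the cursor is simple string handling. The sketch below is one way a tag chip could work. `insertTag` is a hypothetical helper, and the spacing rule (add a space wherever the tag would otherwise touch neighboring text) is an assumption, not documented behavior.

```typescript
// Insert "[tag]" at the cursor and return the new text plus the new cursor position.
function insertTag(
  script: string,
  cursor: number,
  tag: string,
): { text: string; cursor: number } {
  // Clamp the cursor so out-of-range positions cannot corrupt the script.
  const pos = Math.max(0, Math.min(cursor, script.length));
  const before = script.slice(0, pos);
  const after = script.slice(pos);
  const token = `[${tag}]`;
  // Pad with a space on a side only when the adjacent character is not whitespace.
  const lead = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trail = after.length > 0 && !/^\s/.test(after) ? " " : "";
  const inserted = lead + token + trail;
  return { text: before + inserted + after, cursor: pos + inserted.length };
}

// Example: a chip click with the cursor at the start of the script.
const example = insertTag("Hello there", 0, "excited");
// example.text is "[excited] Hello there"
```

The same helper covers the accent dropdown by passing a tag such as `accent: British`.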

Voice-Over Workflow

Record your voice. Select a model from ElevenLabs. Hit Generate and import to timeline. No roundtripping. No file management. No leaving Premiere.

Record a rough VO for timing on your timeline
Select the clip and choose a voice conversion or TTS model
Click Generate — the transformed voice imports directly to the first available audio track

For agencies: don’t like it? Pick a different voice, tweak the emotion, regenerate. Want to hear two options? Dual Mode runs both side by side from the same script.

Sound Design

Text-to-audio models generate sound effects and music from descriptions:

“Thunderstorm with distant sirens”
“Footsteps on gravel, slow and deliberate”
“Upbeat electronic music, 120 BPM, no vocals”

Same workflow — describe the sound, generate, and it imports to your project.

Dual Mode for Audio

Dual Mode works with audio models the same way it works with video. Generate the same script with two TTS models, preview both, and import the one you prefer. Useful for comparing voices or finding the right tone.

Cost Tracking

Audio generations are tracked in the Billing tab alongside video and image, with the same five confidence tiers (Billed, Estimated, Learned, From, No price).