Audio & Voice Generation
modelBridge supports a full audio pipeline — text-to-speech, text-to-audio, voice conversion, and more. Generate voiceover, sound effects, or music without leaving Premiere Pro. Results import directly to the correct audio track on your timeline.
What You Can Generate
| Category | What It Does | Examples |
|---|---|---|
| Text to Speech | Convert text to spoken audio | Voiceover narration, dialogue, placeholder VO |
| Text to Audio | Generate sound from a description | Sound effects, ambient audio, music |
| Audio to Audio | Transform existing audio | Voice conversion, enhancement, noise removal |
| Speech to Speech | Transform spoken audio | Change a voice while preserving timing |
| Video to Audio | Generate audio from video | Lip sync, soundtrack generation |
All audio models use the same schema-driven interface as video and image models — the plugin reads each model’s API and builds the controls automatically.
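To make "schema-driven" concrete, here is a minimal TypeScript sketch of how a parameter schema could be turned into UI controls. The `ParamSchema` shape, the `Control` type, and `buildControls` are hypothetical names invented for illustration. They are not modelBridge internals, and real model schemas are likely richer than this.

```typescript
// Hypothetical, simplified parameter schema; real model schemas may differ.
type ParamSchema =
  | { type: "string"; enum?: string[]; default?: string }
  | { type: "number"; minimum?: number; maximum?: number; default?: number }
  | { type: "boolean"; default?: boolean };

// Hypothetical control descriptors a UI layer could render.
type Control =
  | { kind: "text"; name: string; value: string }
  | { kind: "dropdown"; name: string; options: string[]; value: string }
  | { kind: "slider"; name: string; min: number; max: number; value: number }
  | { kind: "number"; name: string; value: number }
  | { kind: "checkbox"; name: string; value: boolean };

// Map each schema property to the control that best fits its type.
function buildControls(properties: Record<string, ParamSchema>): Control[] {
  return Object.entries(properties).map(([name, p]): Control => {
    switch (p.type) {
      case "string":
        // An enum becomes a dropdown; free text becomes a text field.
        return p.enum && p.enum.length > 0
          ? { kind: "dropdown", name, options: p.enum, value: p.default ?? p.enum[0] }
          : { kind: "text", name, value: p.default ?? "" };
      case "number":
        // A bounded number becomes a slider; otherwise a plain number input.
        return p.minimum !== undefined && p.maximum !== undefined
          ? { kind: "slider", name, min: p.minimum, max: p.maximum, value: p.default ?? p.minimum }
          : { kind: "number", name, value: p.default ?? 0 };
      case "boolean":
        return { kind: "checkbox", name, value: p.default ?? false };
    }
  });
}

// Placeholder parameter names and values, not taken from any real model.
const controls = buildControls({
  text: { type: "string" },
  voice: { type: "string", enum: ["voice-a", "voice-b"], default: "voice-b" },
  speed: { type: "number", minimum: 0.5, maximum: 2, default: 1 },
});
```

Under this approach a newly launched model needs no plugin update: its schema alone determines which controls appear.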
ElevenLabs Models
Every ElevenLabs model on fal.ai is available inside modelBridge, and new ones show up automatically as they launch.
Text to Speech
| Model | What it does |
|---|---|
| elevenlabs/tts/eleven-v3 | Their best model — with emotion tags for expressive delivery |
| elevenlabs/tts/turbo-v2.5 | Ultra-fast generation for quick iterations |
| elevenlabs/tts/multilingual-v2 | 29 languages, one model |
| elevenlabs/text-to-dialogue/eleven-v3 | Multiple speakers in a single generation |
Voice Conversion
| Model | What it does |
|---|---|
| elevenlabs/voice-changer | Record in your voice, output in any other |
Sound Effects & Music
| Model | What it does |
|---|---|
| elevenlabs/sound-effects/v2 | Describe a sound, get the audio |
| elevenlabs/music | Generate music from a text prompt |
Transcription
| Model | What it does |
|---|---|
| elevenlabs/speech-to-text/scribe-v2 | High-accuracy speech to text |
| elevenlabs/speech-to-text | Standard transcription |
Dubbing
| Model | What it does |
|---|---|
| elevenlabs/dubbing | Dub your video into another language |
Text-to-Speech Workflow
- Select a TTS model: search for “tts”, “speech”, or “elevenlabs”
- Write your script in the prompt field
- Adjust voice parameters such as voice selection, speed, emotion, and language (available options vary by model)
- Check the cost estimate, which updates live as you adjust
- Click Generate, and the audio imports to your project
Audio Lands on the Right Track
When you click Import to Timeline, audio goes to the first available audio track at the playhead. No manual routing needed.
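One plausible reading of "first available audio track at the playhead" is the first track with no clip overlapping the span the new audio would occupy. The sketch below implements that reading over a hypothetical, minimal timeline model. `Clip`, `AudioTrack`, and `firstAvailableTrack` are illustrative names, not the Premiere Pro API or modelBridge's actual routing logic.

```typescript
// Hypothetical minimal timeline model; times are in seconds, end is exclusive.
interface Clip { start: number; end: number }
interface AudioTrack { name: string; clips: Clip[] }

// Returns the index of the first track with no clip overlapping
// [playhead, playhead + duration), or -1 if every track is occupied there.
function firstAvailableTrack(tracks: AudioTrack[], playhead: number, duration: number): number {
  const end = playhead + duration;
  return tracks.findIndex((t) =>
    // A clip is out of the way if it ends before the span or starts after it.
    t.clips.every((c) => c.end <= playhead || c.start >= end),
  );
}

const tracks: AudioTrack[] = [
  { name: "A1", clips: [{ start: 0, end: 10 }] },
  { name: "A2", clips: [{ start: 3, end: 6 }] },
  { name: "A3", clips: [{ start: 12, end: 20 }] },
];

// A 5-second clip at playhead 4 collides with A1 and A2 but fits on A3.
const target = firstAvailableTrack(tracks, 4, 5); // → 2
```

A return value of -1 would be the point at which a plugin might create a new track or ask the user. How modelBridge handles that case is not stated here.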
Preview Before Importing
An inline audio player in the result card lets you listen before committing. It has play/pause controls and a progress bar, and only one preview plays at a time.
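The "only one preview at a time" rule can be sketched as a small coordinator that pauses the active player before starting another. This is an illustrative pattern under assumed names (`Playable`, `PreviewCoordinator`), not a description of how modelBridge implements its player.

```typescript
// Anything with play() and pause(); the names are assumptions for this sketch.
interface Playable {
  play(): void;
  pause(): void;
}

class PreviewCoordinator {
  private current: Playable | null = null;

  // Start a preview, pausing whichever other preview was playing.
  play(player: Playable): void {
    if (this.current && this.current !== player) {
      this.current.pause();
    }
    this.current = player;
    player.play();
  }

  // Pause a preview and clear the active slot if it was the active one.
  pause(player: Playable): void {
    player.pause();
    if (this.current === player) {
      this.current = null;
    }
  }
}
```

Each result card would route its play button through one shared coordinator instead of calling its own player directly.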
ElevenLabs Emotion Tags
ElevenLabs Eleven v3 supports expressive tags for emotion, accent, and delivery. modelBridge surfaces these through a tag bar above the prompt field:
- Emotion chips: click to insert [excited], [whispers], [laughs], [sad], or [angry] at the cursor
- Accent dropdown: British, American, Australian, Indian, and more; inserts a tag such as [accent: British]
- Collapsible: hide the tag bar when you don’t need it
You can also type custom tags directly: [cheerful], [slowly], [in a deep voice]. The tag bar just makes common tags one-click accessible.
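Inserting a tag at the cursor amounts to a string splice. The helper below is a minimal sketch of that operation. `insertTag` is a hypothetical function written for illustration, and it assumes tags use the bracket syntax shown above.

```typescript
// Inserts `[tag]` at the cursor offset and returns the new text plus the
// cursor position just after the inserted tag.
function insertTag(
  text: string,
  cursor: number,
  tag: string,
): { text: string; cursor: number } {
  // Clamp the cursor so out-of-range offsets cannot corrupt the script.
  const pos = Math.max(0, Math.min(cursor, text.length));
  const token = `[${tag}]`;
  return {
    text: text.slice(0, pos) + token + text.slice(pos),
    cursor: pos + token.length,
  };
}

const result = insertTag("We won the pitch!", 0, "excited");
// result.text   → "[excited]We won the pitch!"
// result.cursor → 9
```

The same helper covers accents: passing `"accent: British"` as the tag yields `[accent: British]`.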
Voice-Over Workflow
Record your voice. Select a model from ElevenLabs. Hit Generate and import to timeline. No roundtripping. No file management. No leaving Premiere.
- Record a rough VO for timing on your timeline
- Select the clip and choose a voice conversion or TTS model
- Click Generate — the transformed voice imports directly to the first available audio track
For agencies: don’t like it? Pick a different voice, tweak the emotion, regenerate. Want to hear two options? Dual Mode runs both side by side from the same script.
Sound Design
Text-to-audio models generate sound effects and music from descriptions:
- “Thunderstorm with distant sirens”
- “Footsteps on gravel, slow and deliberate”
- “Upbeat electronic music, 120 BPM, no vocals”
Same workflow — describe the sound, generate, and it imports to your project.
Dual Mode for Audio
Dual Mode works with audio models the same way it works with video. Generate the same script with two TTS models, preview both, and import the one you prefer. Useful for comparing voices or finding the right tone.
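Conceptually, a dual run sends one script to two models and collects both results for comparison. The sketch below shows that shape with the generation call injected as a parameter. `GenerateFn`, `DualResult`, and `runDual` are hypothetical names, and whether modelBridge actually issues the two requests concurrently is an assumption.

```typescript
// Placeholder for whatever actually calls a model; injected so the sketch
// stays independent of any real client library.
type GenerateFn = (modelId: string, script: string) => Promise<{ audioUrl: string }>;

interface DualResult {
  modelId: string;
  audioUrl: string;
}

// Sends the same script to two models and returns both results in order.
async function runDual(
  generate: GenerateFn,
  script: string,
  modelA: string,
  modelB: string,
): Promise<[DualResult, DualResult]> {
  // Both calls are started before awaiting, so they run concurrently.
  const [a, b] = await Promise.all([
    generate(modelA, script),
    generate(modelB, script),
  ]);
  return [
    { modelId: modelA, audioUrl: a.audioUrl },
    { modelId: modelB, audioUrl: b.audioUrl },
  ];
}
```

Note that `Promise.all` rejects as soon as either call fails. A production version might prefer to keep the successful result and report the failure separately.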
Cost Tracking
Audio generations are tracked in the Billing tab alongside video and image, with the same five confidence tiers (Billed, Computed, Learned, Estimated, From).