Audio & Voice Generation

modelBridge supports a full audio pipeline: text-to-speech, text-to-audio, voice conversion, and more. Generate voiceover, sound effects, or music without leaving Premiere Pro, then send the results straight to an audio track on your timeline.

| Category | What It Does | Examples |
|---|---|---|
| Text to Speech | Convert text to spoken audio | Voiceover narration, dialogue, placeholder VO |
| Text to Audio | Generate sound from a description | Sound effects, ambient audio, music |
| Audio to Audio | Transform existing audio | Voice conversion, enhancement, noise removal |
| Speech to Speech | Transform spoken audio | Change a voice while preserving timing |
| Video to Audio | Generate audio from video | Lip sync, soundtrack generation |

All audio models use the same schema-driven interface as video and image models — the plugin reads each model’s API and builds the controls automatically.
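To illustrate the schema-driven idea, here is a minimal Python sketch that turns a model's input schema into control descriptors. It assumes the schema uses JSON Schema-style keys (`properties`, `type`, `enum`, `minimum`, `maximum`, `default`); `Control`, `build_controls`, and the example schema values are hypothetical, not modelBridge's actual code, and the mapping rules are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Control:
    """Hypothetical UI control descriptor built from one schema property."""
    name: str
    kind: str  # "dropdown", "checkbox", "slider", "number", or "text"
    options: list = field(default_factory=list)  # enum values or [min, max]
    default: Any = None


def build_controls(schema: dict[str, Any]) -> list[Control]:
    """Map JSON Schema-style input properties to control descriptors."""
    controls = []
    for name, prop in schema.get("properties", {}).items():
        default = prop.get("default")
        if "enum" in prop:
            # A fixed set of allowed values becomes a dropdown.
            controls.append(Control(name, "dropdown", list(prop["enum"]), default))
        elif prop.get("type") == "boolean":
            controls.append(Control(name, "checkbox", default=default))
        elif prop.get("type") in ("number", "integer"):
            # Bounded numbers get a slider; unbounded ones a plain number field.
            if "minimum" in prop and "maximum" in prop:
                bounds = [prop["minimum"], prop["maximum"]]
                controls.append(Control(name, "slider", bounds, default))
            else:
                controls.append(Control(name, "number", default=default))
        else:
            # Everything else (including the prompt) falls back to a text field.
            controls.append(Control(name, "text", default=default))
    return controls


# Hypothetical schema for a TTS model.
schema = {
    "properties": {
        "text": {"type": "string"},
        "voice": {"type": "string", "enum": ["voice_a", "voice_b"], "default": "voice_a"},
        "speed": {"type": "number", "minimum": 0.5, "maximum": 2.0, "default": 1.0},
    }
}
for control in build_controls(schema):
    print(control.name, control.kind)
```

With this approach, a model that adds a parameter or a new allowed value gets a matching control without any model-specific UI code.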

Every ElevenLabs model on fal.ai is available inside modelBridge — and new ones show up automatically as they launch.

Text to speech

| Model | What it does |
|---|---|
| elevenlabs/tts/eleven-v3 | ElevenLabs' flagship model, with emotion tags for expressive delivery |
| elevenlabs/tts/turbo-v2.5 | Ultra-fast generation for quick iterations |
| elevenlabs/tts/multilingual-v2 | 29 languages, one model |
| elevenlabs/text-to-dialogue/eleven-v3 | Multiple speakers in a single generation |

Voice changer

| Model | What it does |
|---|---|
| elevenlabs/voice-changer | Record in your voice, output in any other |

Sound effects and music

| Model | What it does |
|---|---|
| elevenlabs/sound-effects/v2 | Describe a sound, get the audio |
| elevenlabs/music | Generate music from a text prompt |

Speech to text

| Model | What it does |
|---|---|
| elevenlabs/speech-to-text/scribe-v2 | High-accuracy transcription |
| elevenlabs/speech-to-text | Standard transcription |

Dubbing

| Model | What it does |
|---|---|
| elevenlabs/dubbing | Dub your video into another language |
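The model IDs above share a `provider/category[/variant]` shape. As a hypothetical sketch of how a catalog listing could be sorted into groups like these, the following splits each ID and groups one provider's models by category. `ModelId`, `parse_model_id`, and `group_by_category` are illustrative names, and the sketch assumes every ID has at least two segments.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    provider: str
    category: str
    variant: Optional[str]  # None for two-segment IDs such as "elevenlabs/music"


def parse_model_id(model_id: str) -> ModelId:
    """Split 'provider/category[/variant]' into its parts."""
    provider, category, *rest = model_id.split("/")
    return ModelId(provider, category, "/".join(rest) or None)


def group_by_category(model_ids: list[str], provider: str = "elevenlabs") -> dict[str, list[str]]:
    """Group one provider's IDs by category, keeping catalog order.

    Re-running this on a fresh catalog listing picks up newly added
    models without a hard-coded model list.
    """
    groups = defaultdict(list)
    for model_id in model_ids:
        parsed = parse_model_id(model_id)
        if parsed.provider == provider:
            groups[parsed.category].append(model_id)
    return dict(groups)


catalog = [
    "elevenlabs/tts/eleven-v3",
    "elevenlabs/tts/turbo-v2.5",
    "elevenlabs/voice-changer",
    "elevenlabs/music",
]
print(group_by_category(catalog))
```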
To generate speech:

  1. Select a TTS model: search for “tts”, “speech”, or “elevenlabs”
  2. Write your script in the prompt field
  3. Adjust voice parameters such as voice, speed, emotion, and language (options vary by model)
  4. Check the cost estimate, which updates live as you change settings
  5. Click Generate; the audio imports into your project
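The model search in step 1 can be sketched as case-insensitive substring matching over each model's ID and description. The catalog shape (a dict of model ID to description) and the `search_models` name are assumptions for illustration; the plugin's real matching and ranking may differ.

```python
def search_models(catalog: dict[str, str], query: str) -> list[str]:
    """Return model IDs whose ID or description contains every query term.

    `catalog` maps model ID to a short description (a hypothetical shape).
    Matching is case-insensitive; an empty query matches everything.
    """
    terms = query.lower().split()
    matches = []
    for model_id, description in catalog.items():
        haystack = f"{model_id} {description}".lower()
        if all(term in haystack for term in terms):
            matches.append(model_id)
    return matches


catalog = {
    "elevenlabs/tts/eleven-v3": "Expressive text to speech with emotion tags",
    "elevenlabs/speech-to-text": "Standard transcription",
    "elevenlabs/music": "Generate music from a text prompt",
}
print(search_models(catalog, "speech"))
```

In this sketch a broad term like “speech” also matches the speech-to-text model, so a narrower term such as “tts” is the quicker way to reach voice generation models.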

When you click Import to Timeline, audio goes to the first available audio track at the playhead. No manual routing needed.
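One way to pick the first available audio track is sketched below, assuming "available" means no existing clip overlaps the new clip's span starting at the playhead. Tracks are modeled as hypothetical lists of `(start, end)` times in seconds, and `first_free_track` is an illustrative name; the sketch returns `None` when every track is occupied, since what happens in that case isn't covered here.

```python
from typing import Optional

Clip = tuple[float, float]  # (start, end) in seconds; end is exclusive


def first_free_track(tracks: list[list[Clip]], playhead: float, duration: float) -> Optional[int]:
    """Return the index of the first track where [playhead, playhead + duration)
    overlaps no existing clip, or None if every track is occupied."""
    end = playhead + duration
    for index, clips in enumerate(tracks):
        # Half-open intervals overlap when each one starts before the other ends.
        if not any(start < end and playhead < clip_end for start, clip_end in clips):
            return index
    return None


# A1 is busy at 5s, A2 is free there, A3 is empty.
tracks = [[(0.0, 10.0)], [(12.0, 20.0)], []]
print(first_free_track(tracks, playhead=5.0, duration=3.0))  # 1
```

Treating clip ends as exclusive lets a new clip butt up against an existing one without counting as an overlap.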

Each result card includes an inline audio player with play/pause controls and a progress bar, so you can listen before committing. Only one preview plays at a time.
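The one-preview-at-a-time rule amounts to a small coordinator that pauses whichever player is active before starting another. `PreviewPlayer` and `PreviewManager` are hypothetical stand-ins used only to show the idea, not the plugin's classes.

```python
from typing import Optional


class PreviewPlayer:
    """Hypothetical stand-in for one result card's inline player."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.playing = False

    def pause(self) -> None:
        self.playing = False


class PreviewManager:
    """Coordinates result-card players so at most one plays at a time."""

    def __init__(self) -> None:
        self.current: Optional[PreviewPlayer] = None

    def play(self, player: PreviewPlayer) -> None:
        # Pause whatever else is playing before starting the new preview.
        if self.current is not None and self.current is not player:
            self.current.pause()
        player.playing = True
        self.current = player

    def toggle(self, player: PreviewPlayer) -> None:
        """Handler for a card's play/pause button."""
        if player.playing:
            player.pause()
        else:
            self.play(player)


manager = PreviewManager()
take_a, take_b = PreviewPlayer("take A"), PreviewPlayer("take B")
manager.toggle(take_a)
manager.toggle(take_b)  # starting B pauses A
print(take_a.playing, take_b.playing)  # False True
```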

ElevenLabs Eleven v3 supports expressive tags for emotion, accent, and delivery. modelBridge surfaces these through a tag bar above the prompt field:

  • Emotion chips — click to insert [excited], [whispers], [laughs], [sad], [angry] at the cursor
  • Accent dropdown: pick British, American, Australian, Indian, and more to insert a tag such as [accent: British]
  • Collapsible — hide the tag bar when you don’t need it

You can also type custom tags directly: [cheerful], [slowly], [in a deep voice]. The tag bar just makes common tags one-click accessible.
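Inserting a tag at the cursor is a simple string splice. This sketch assumes tags should not fuse with adjacent words, so it pads with a space where needed; that spacing rule and the `insert_tag` and `accent_tag` helper names are assumptions, not documented modelBridge behavior.

```python
def insert_tag(text: str, cursor: int, tag: str) -> tuple[str, int]:
    """Insert "[tag]" at `cursor`; return the new text and cursor position.

    Pads with a space on either side where the neighboring character is not
    whitespace, so the tag doesn't fuse with adjacent words (an assumed rule).
    """
    before, after = text[:cursor], text[cursor:]
    token = f"[{tag}]"
    if before and not before[-1].isspace():
        token = " " + token
    if after and not after[0].isspace():
        token += " "
    return before + token + after, cursor + len(token)


def accent_tag(accent: str) -> str:
    """Tag body for an accent chosen from the dropdown, e.g. "accent: British"."""
    return f"accent: {accent}"


script, cursor = "Welcome back.", 0
script, cursor = insert_tag(script, cursor, "excited")
print(script)  # [excited] Welcome back.
```

Returning the updated cursor lets several chips be clicked in a row, each tag landing after the previous one.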

Record your voice. Select an ElevenLabs model. Click Generate and import to the timeline. No round-tripping. No file management. No leaving Premiere.

  1. Record a rough VO for timing on your timeline
  2. Select the clip and choose a voice conversion or TTS model
  3. Click Generate — the transformed voice imports directly to the first available audio track

For agencies: don’t like it? Pick a different voice, tweak the emotion, regenerate. Want to hear two options? Dual Mode runs both side by side from the same script.

Text-to-audio models generate sound effects and music from descriptions:

  • “Thunderstorm with distant sirens”
  • “Footsteps on gravel, slow and deliberate”
  • “Upbeat electronic music, 120 BPM, no vocals”

Same workflow: describe the sound, click Generate, and the result imports into your project.

Dual Mode works with audio models the same way it works with video. Generate the same script with two TTS models, preview both, and import the one you prefer. Useful for comparing voices or finding the right tone.
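Conceptually, Dual Mode sends one script to two models in parallel and keeps both outputs for preview. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`; `dual_generate` is an illustrative name, `generate` is a hypothetical callable standing in for whatever actually calls the model, and the sketch requires two different model IDs because results are keyed by ID.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def dual_generate(
    generate: Callable[[str, str], bytes], model_a: str, model_b: str, script: str
) -> dict[str, bytes]:
    """Send one script to two models concurrently and return both outputs.

    `generate(model_id, script)` is a hypothetical callable returning audio bytes.
    """
    if model_a == model_b:
        raise ValueError("this sketch expects two different model IDs")
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {model: pool.submit(generate, model, script) for model in (model_a, model_b)}
        # .result() blocks until each generation finishes and re-raises its errors.
        return {model: future.result() for model, future in futures.items()}


def fake_generate(model_id: str, script: str) -> bytes:
    # Stand-in for a real model call, used only for this example.
    return f"{model_id}: {script}".encode()


takes = dual_generate(fake_generate, "elevenlabs/tts/eleven-v3", "elevenlabs/tts/turbo-v2.5", "Hello")
print(list(takes))
```

Running the two requests concurrently means the comparison takes roughly as long as the slower model rather than the sum of both.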

Audio generations are tracked in the Billing tab alongside video and image generations, with the same five confidence tiers (Billed, Computed, Learned, Estimated, From).
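If the five tiers are read as running from most to least certain (an assumption based on their listed order, not a documented rule), choosing which figure to show for a generation reduces to taking the highest-confidence tier available. `CostTier` and `best_cost` are hypothetical names, and the dollar amounts in the example are made up.

```python
from enum import IntEnum
from typing import Optional


class CostTier(IntEnum):
    """Billing confidence tiers, assumed ordered from most to least certain."""
    BILLED = 0
    COMPUTED = 1
    LEARNED = 2
    ESTIMATED = 3
    FROM = 4


def best_cost(figures: dict[CostTier, float]) -> Optional[tuple[CostTier, float]]:
    """Return the most confident (tier, cost) pair available, or None."""
    if not figures:
        return None
    tier = min(figures)  # lowest value = most certain tier
    return tier, figures[tier]


print(best_cost({CostTier.ESTIMATED: 0.12, CostTier.COMPUTED: 0.10}))
```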