Parameter reference
This reference covers every parameter you’ll encounter in modelBridge — organized by category. Each section explains what the parameter does, how different values affect your output, and what settings to start with.
You can also access this information directly in the plugin: click the ⓘ icon next to any input field for a quick explanation and a direct link back to the relevant section here.
With over 700 curated explanations across 1,000+ models and over 100 dedicated parameter sections, this is the most comprehensive AI parameter reference built specifically for video editors and motion designers.
Parameters are grouped by theme. Use the sidebar or your browser’s find (Ctrl/Cmd+F) to jump to any parameter.
Prompt & guidance
Guidance scale
In short
Controls how literally the AI follows your prompt — too high causes artifacts.
What it does
Sets how closely the generated output matches your text description. Higher values force stricter adherence to the prompt.
How to think about it
Like directing an actor — low values give creative freedom, high values make them follow the script word-for-word. Too much direction and they become stiff.
Recommended settings
- Low (1–3): Loose, creative interpretation — good for abstract or experimental work
- Default (3–7): Balanced — sweet spot for most models and use cases
- High (10+): Very rigid, often oversaturated and artifact-prone
Common mistakes
- Cranking it to 15+ thinking “more accurate = better” — quality usually drops beyond 7
- Using the same value across different model families — Flux works best at 3–5, SDXL at 7–10
Also called
cfg_scale, cfg, guidance, text_guidance_scale
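In an API-style request this is a single numeric field. A minimal sketch, assuming a JSON-like payload (the request shape and values are illustrative, not modelBridge’s actual API):

```python
# Illustrative payloads only: "guidance_scale" follows the aliases listed above,
# and the per-family starting values come from this section's guidance.
flux_request = {
    "prompt": "aerial shot of a coastline at golden hour",
    "guidance_scale": 4.0,   # Flux-family sweet spot: 3-5
}
sdxl_request = {
    "prompt": "aerial shot of a coastline at golden hour",
    "guidance_scale": 8.0,   # SDXL-family sweet spot: 7-10
}
```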
Negative prompt
In short
Tells the AI what to avoid in the output.
What it does
Defines concepts, qualities, or artifacts the model should steer away from during generation. The opposite of your main prompt.
How to think about it
Like telling a colorist “no crushed blacks, no blown highlights” — you’re defining boundaries, not directions.
Recommended settings
- Standard starting point: blurry, watermark, text, low quality, distorted
- For video: Add flickering, jittery, frame drops
- For faces: Add extra fingers, deformed face, cross-eyed
Common mistakes
- Writing full sentences — comma-separated keywords work best
- Overloading with 50+ terms — the model loses focus after ~20 keywords
Also called
negative_prompt
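A quick sketch of how the recommended keyword groups combine in practice, assuming a comma-separated negative_prompt field (the payload shape is illustrative):

```python
# Build the negative prompt from the keyword groups recommended above.
base = ["blurry", "watermark", "text", "low quality", "distorted"]
face_extras = ["extra fingers", "deformed face", "cross-eyed"]

request = {
    "prompt": "close-up portrait, soft window light",
    # Comma-separated keywords, not full sentences:
    "negative_prompt": ", ".join(base + face_extras),
}
```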
Prompt expansion
In short
Lets the model auto-enhance your prompt before generating.
What it does
The AI automatically elaborates on your prompt, adding style cues, lighting descriptions, and quality keywords it thinks will improve the result.
How to think about it
Like auto-correct for creative briefs — the AI fills in gaps you didn’t specify. Helpful when your prompt is short, counterproductive when it’s already detailed.
Recommended settings
- Off: Better when your prompt is already detailed and specific — expansion can override your intent
- On: Good when your prompt is short or vague — the model fills in gaps
- When to use: Short prompts, quick iterations, exploring styles
Common mistakes
- Leaving it on with a carefully crafted prompt, then wondering why the output doesn’t match
- Not checking the model’s expanded version when results surprise you
Also called
enable_prompt_expansion, expand_prompt, enhance_prompt, prompt_optimizer
Thinking type
In short
Extra processing time for prompt optimization before generating.
What it does
Controls whether the model spends additional time analyzing and optimizing your prompt before starting the generation process. Takes longer and costs more.
How to think about it
Like a pre-production meeting — the model “thinks” about the best approach before calling action. More prep doesn’t always mean a better take.
Recommended settings
- Off: Faster, cheaper — good for iterating quickly
- On: Slower, costs more — try it when default results disappoint
- When to use: Complex scenes, multi-subject compositions, when defaults fall short
Common mistakes
- Leaving it enabled for every generation — the quality improvement is inconsistent
- Assuming “thinking = better” without A/B testing against non-thinking results
Also called
thinking_type
Generation quality
Inference steps
In short
Number of processing passes — more steps means more refined output.
What it does
Sets how many times the AI refines the output. Each step adds detail and coherence, but with diminishing returns past a threshold.
How to think about it
Like render quality in After Effects — more passes means more refined, but past a point you’re wasting render time for invisible improvement.
Recommended settings
- Low (5–10): Fast but rough — good for quick previews and iteration
- Default (20–30): Best quality-to-speed ratio — start here
- High (50+): Marginally better, 2–3x slower, rarely worth it
Common mistakes
- Setting steps to 80+ thinking “more = better” — quality plateaus around 30 for most models
- Not adjusting steps when switching models — some models are optimized for fewer steps
Also called
num_inference_steps, steps, num_steps, number_of_steps
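A common pattern is to iterate at a low step count and only raise it for the final render. A minimal sketch, with an assumed payload shape:

```python
# Preview fast, then re-render near the quality plateau for the final.
preview_request = {"prompt": "neon-lit alley in the rain", "num_inference_steps": 8}
final_request = dict(preview_request, num_inference_steps=28)  # plateau is ~30 for most models
```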
Seed
In short
Locks randomness — same seed plus same settings equals same result.
What it does
A number that controls the random starting point of the generation. Using the same seed with identical settings reproduces the exact same output.
How to think about it
Like a take number on set — if Take 7 was great, you can ask for Take 7 again and get the exact same performance.
Recommended settings
- -1 or 0: Random seed (default) — every generation is different
- Any specific number: Locks the output — use when you found a result you like
- When to use: Lock the seed to iterate on prompt/settings while keeping composition stable
Common mistakes
- Not saving the seed when you get a good result — note it before changing settings
- Expecting the same seed to produce identical results across different models — it won’t
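A sketch of the lock-the-seed workflow described above, assuming the request accepts a seed field:

```python
import random

# Explore with random seeds, then pin the one that worked.
request = {"prompt": "misty forest road at dawn", "seed": random.randint(1, 2**31 - 1)}
good_seed = request["seed"]  # note it before touching anything else

# Same seed, refined prompt: composition stays stable while details change.
request = {"prompt": "misty forest road at dawn, volumetric light", "seed": good_seed}
```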
Strength
In short
How much the input changes — 0 keeps it, 1 replaces it entirely.
What it does
Controls the degree of transformation applied to your input image or video. At 0 the output is identical to input, at 1 the input is completely ignored.
How to think about it
Like opacity on an adjustment layer — low values make subtle tweaks to your source, high values ignore it and start fresh.
Recommended settings
- Low (0.1–0.3): Subtle refinement — keeps your original composition intact
- Default (0.4–0.6): Balanced transformation — changes style while preserving structure
- High (0.7–1.0): Major changes — your input becomes a rough suggestion, not a guide
Common mistakes
- Setting it to 1.0 for image-to-image and wondering why output ignores your source — that’s by design
- Using the same strength across different model types — video models often need lower values than image models
Also called
image_strength
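A sketch of the three ranges applied to image-to-image, with an assumed payload shape and a placeholder input path:

```python
# Same source frame, three levels of transformation (values from the ranges above).
subtle_cleanup = {"input_image": "frame_0042.png", "strength": 0.2}  # keeps composition intact
restyle = {"input_image": "frame_0042.png", "strength": 0.5}         # new style, same structure
near_rewrite = {"input_image": "frame_0042.png", "strength": 0.9}    # source becomes a rough hint
```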
End image strength
In short
How strongly the end frame pulls a video transition toward it.
What it does
Controls how firmly the final frame of a video transition or interpolation matches your provided end image. Higher values ensure a precise landing.
How to think about it
Like easing in an animation — higher values pull the motion firmly toward the final keyframe, lower values let it drift and arrive naturally.
Recommended settings
- Low (0.3–0.5): Gentle arrival at the end frame — more creative, organic motion
- Default (0.6–0.8): Balanced — recognizable end frame with natural transition
- High (0.8–1.0): Precise landing — the last frame closely matches your end image
Common mistakes
- Setting it too low when you need an exact match — the transition will “miss” the target
- Not using it at all for interpolation — the end frame may not resemble your input
Also called
end_image_strength
Scheduler
In short
The algorithm that controls how the AI refines output step by step.
What it does
Selects the mathematical method used to progressively denoise and refine the image or video during generation. Different schedulers produce slightly different quality/speed tradeoffs.
How to think about it
Like choosing a render engine — each produces slightly different results. Most users never need to change this from the default.
Recommended settings
- Euler / Euler A: Fast, good general purpose — common default
- DPM++ 2M: High quality, slightly slower — good for detailed images
- DDIM: Deterministic — same seed gives truly identical results, good for consistency
Common mistakes
- Switching schedulers to “fix” a bad generation — the problem is almost always prompt or guidance scale
- Spending time comparing schedulers before optimizing more impactful settings first
Also called
sampler
Acceleration
In short
Speed-versus-quality tradeoff — faster generation, lower quality.
What it does
Reduces generation time by taking computational shortcuts. Higher acceleration means faster output but with potential quality loss.
How to think about it
Like proxy editing in Premiere — faster to work with, but you sacrifice some quality. Use proxies for iteration, full-res for finals.
Recommended settings
- Low/None: Full quality — use for final renders
- Default (Medium): Faster with minimal quality loss — good for previewing
- High/Turbo: Fastest — noticeable quality reduction, use for rapid iteration only
Common mistakes
- Leaving acceleration on high for final output — always switch back for the version the client sees
- Not realizing some models label this differently (turbo, fast, lightning)
Temperature
In short
Randomness in audio/speech — higher means more varied, less predictable.
What it does
Controls how much variation the model introduces in text-to-speech and audio generation. Low values produce consistent, predictable output; high values introduce more natural variation.
How to think about it
Like an actor’s improvisation dial — low temperature reads the script exactly, high temperature ad-libs and adds personality.
Recommended settings
- Low (0.3–0.7): Predictable, consistent — good for narration and voiceovers
- Default (0.8–1.0): Natural variation — sounds more human
- High (1.2+): Unpredictable — may produce interesting results or garbled output
Common mistakes
- Setting it above 1.5 for speech — output often becomes incoherent
- Confusing this with guidance scale — temperature affects randomness, guidance affects prompt adherence
Top P
In short
Limits AI choices to the most probable options — lower means more focused.
What it does
Restricts the model’s selection pool to tokens whose cumulative probability reaches P. Lower values make output more conservative and predictable.
How to think about it
Like restricting an editor to their top 10 B-roll picks instead of the full library — fewer choices, but each one is strong.
Recommended settings
- Low (0.5–0.7): Focused, conservative output
- Default (0.9–1.0): Full creative range
- When to use: Lower it for consistent narration, raise it for creative/experimental audio
Common mistakes
- Setting it very low (0.3) and getting repetitive, monotonous audio — the model needs some freedom
- Adjusting Top P and Top K simultaneously without testing each independently
Top K
In short
Number of top candidates considered at each generation step.
What it does
At each step, limits the model to choosing from only the K most likely next tokens. Lower K means less variety but more coherence.
How to think about it
Like a shortlist for casting — instead of auditioning every actor in town, you only see the top K candidates. Smaller shortlist, more predictable result.
Recommended settings
- Low (10–30): Very focused — can sound robotic in speech
- Default (50–250): Good balance of variety and coherence
- High (500+): Maximum variety — may reduce quality
Common mistakes
- Setting both Top K and Top P very low simultaneously — the model gets so restricted it produces flat, repetitive output
- Changing Top K without understanding that it interacts with Temperature and Top P
Repetition penalty
In short
Penalizes repeated sounds or words in audio/speech output.
What it does
Applies a penalty score when the model tries to repeat the same tokens, words, or patterns. Forces more variety in the output.
How to think about it
Like telling an editor “don’t use the same transition twice in a row” — it forces variety, but too strict and the choices become awkward.
Recommended settings
- Low (1.0): No penalty (default) — some repetition is natural
- Default (1.2–1.3): Mild penalty — reduces obvious repetition
- High (1.5+): Strong penalty — may cause unnatural word choices
Common mistakes
- Setting it too high for speech — the model starts avoiding common words, making sentences awkward
- Using it for music generation where repetition (chorus, rhythm) is intentional
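Temperature, Top P, Top K, and repetition penalty interact, so it helps to think of them as one sampling config. An illustrative narration preset built from the recommendations above (the dict shape is an assumption; the names mirror these sections):

```python
# Conservative text-to-speech sampling for voiceover work.
narration_config = {
    "temperature": 0.6,         # low range: consistent, predictable delivery
    "top_p": 0.9,               # default range: keeps the candidate pool sane
    "top_k": 100,               # mid-range shortlist; avoid pairing very low top_k with very low top_p
    "repetition_penalty": 1.2,  # mild: trims obvious repeats without awkward word avoidance
}
```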
LoRA & style
LoRA scale
In short
How strongly a LoRA style add-on influences the output.
What it does
Controls the blending weight of a LoRA (Low-Rank Adaptation) — a small style or concept model layered on top of the base model. At 0 it has no effect, at 1.0 it’s full strength.
How to think about it
Like the opacity of a LUT in Premiere — at 0% no effect, at 100% full strength. But unlike a LUT, going above 100% often causes visual artifacts.
Recommended settings
- Low (0.3–0.5): Subtle influence — good for blending styles
- Default (0.7): Strong but clean — best starting point
- High (1.0+): Maximum effect — often introduces artifacts, distortion, or pattern repetition
Common mistakes
- Setting it to 1.0+ thinking “full strength = best result” — most LoRAs work best at 0.7–0.8
- Stacking multiple LoRAs at high scale — effects compound and quality drops fast
Also called
lora_scale
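A sketch of how a LoRA and its weight might travel together in a request; the list-of-dicts shape and the file name are assumptions:

```python
request = {
    "prompt": "product shot on marble, ink-wash style",
    "loras": [
        # Start at 0.7: most LoRAs degrade above 0.8, per the guidance above.
        {"path": "ink_wash_style.safetensors", "scale": 0.7},
    ],
}
```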
Camera LoRA scale
In short
Intensity of camera movement from a camera motion LoRA.
What it does
Controls how dramatic the camera movement is when using a camera LoRA (zoom, pan, tilt, orbit). Higher values create more pronounced motion.
How to think about it
Like keyframe velocity in Premiere — low values give gentle, subtle camera drift; high values give dramatic sweeping moves.
Recommended settings
- Low (0.3–0.5): Subtle camera drift — good for adding life to static shots
- Default (0.7–0.8): Noticeable but natural camera movement
- High (1.0+): Dramatic motion — can look unnatural if overdone
Common mistakes
- Combining high camera LoRA scale with a fast-moving subject — double motion creates disorienting output
- Using camera LoRA on very short clips where the motion has no time to develop
Also called
camera_lora_scale
Embeddings
In short
Custom-trained files that teach the model new visual concepts or styles.
What it does
Loads externally trained style or concept data into the model — specific characters, objects, or art styles packaged as reusable files. Works alongside your prompt.
How to think about it
Like custom presets in Premiere — someone trained the AI to recognize a specific concept (a character’s face, a brand’s visual style) and packaged it as a reusable file.
Recommended settings
- Single embedding: Best results — load one and reference its trigger word in your prompt
- Multiple embeddings: Quality drops with more than 2–3 combined
- When to use: Character consistency, brand style enforcement, specific art styles
Common mistakes
- Using an embedding without including its trigger word in the prompt — the model loads the style but doesn’t know when to apply it
- Combining too many embeddings — effects conflict and quality degrades
Also called
embeddings
Style
In short
Selects a predefined visual aesthetic from the model’s built-in options.
What it does
Applies a preset visual style to the generation. Available styles vary by model — each has been tuned to produce specific aesthetics.
How to think about it
Like choosing a LUT package — each style preset applies a consistent look across your output. Pick one that matches your project’s mood.
Recommended settings
- Browse the dropdown: Names describe the look — “cinematic,” “anime,” “photorealistic”
- Match style to prompt: A cinematic style with a cartoon prompt creates interesting but unpredictable blends
- When to use: When you want a consistent aesthetic without crafting a complex prompt
Common mistakes
- Fighting the style with your prompt — selecting “anime” but prompting for “photorealistic” produces inconsistent results
- Assuming all models support styles — many don’t have this option
Also called
style
Video & animation
Section titled “Video & animation”In short
Section titled “In short”Frames per second of the generated video — match your timeline.
What it does
Section titled “What it does”Sets the frame rate of the generated video output. This determines motion smoothness and should match your Premiere timeline settings.
How to think about it
Section titled “How to think about it”Exactly like FPS in Premiere — it controls how smooth the motion looks. Mismatched frame rates between generation and timeline create artifacts.
Recommended settings
Section titled “Recommended settings”- 24 fps: Film look — standard for cinematic content
- 30 fps: Smooth motion — standard for web and social media
- 60 fps: Very smooth — good for slow-motion, not all models support this
Common mistakes
Section titled “Common mistakes”- Generating at a different FPS than your Premiere timeline — 24fps in a 30fps timeline creates awkward frame blending
- Choosing 60fps when the model doesn’t support it — output may default to a lower rate silently
Also called
Section titled “Also called”frame_rate, frames_per_second
Interpolated frames
In short
AI-generated frames inserted between keyframes for smoother motion.
What it does
Creates new in-between frames that didn’t exist in the original, smoothing out motion. More interpolated frames means smoother playback but longer generation time.
How to think about it
Like Premiere’s Optical Flow — the AI synthesizes new frames between existing ones. Each interpolated frame is a full AI generation, so more frames means proportionally more work.
Recommended settings
- Low (2–3): Slight smoothing — fast to generate
- Default (4–6): Noticeably smoother — good balance
- High (8+): Very smooth, approaching slow-motion — much slower to generate
Common mistakes
- Setting it very high and expecting instant results — 8 frames between every pair means 8x the generation work
- Using interpolation on footage that’s already smooth — adds processing time with no visible improvement
Also called
num_interpolated_frames
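The cost scales with the number of frame pairs, which is worth working out before committing. A small worked example:

```python
def interpolated_total(source_frames: int, per_pair: int) -> int:
    """Frames after inserting per_pair synthesized frames between each consecutive pair."""
    return source_frames + (source_frames - 1) * per_pair

# 25 source frames with 4 in-betweens per pair -> 121 frames, i.e. 96 extra generations.
print(interpolated_total(25, 4))
```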
Interpolator model
In short
Which AI model creates the in-between frames during interpolation.
What it does
Selects the specific algorithm used to generate interpolated frames. Different models handle different types of motion better.
How to think about it
Like choosing between Optical Flow and Frame Blending in Premiere — different algorithms, different motion quality. Some handle fast motion better, others are smoother with slow movement.
Recommended settings
- Default: Use unless you see artifacts — the default is chosen for broad compatibility
- Alternative models: Try if you see ghosting or blurry motion in fast-moving scenes
- None: Skips interpolation entirely
Common mistakes
- Switching interpolator models to fix blurry output that’s actually caused by too few inference steps or low resolution
- Trying every interpolator before addressing more impactful settings like strength or steps
Also called
interpolator_model
Temporal style consistency
In short
How consistent the visual style stays between frames in a video.
What it does
Enforces that every frame maintains the same visual style, preventing frame-to-frame style drift or flicker in generated video.
How to think about it
Like color consistency across a multi-day shoot — higher values enforce that every frame looks like it belongs to the same visual world.
Recommended settings
- Low (0.0–0.3): Each frame can drift stylistically — artistic but potentially flickery
- Default (0.5–0.7): Consistent look with natural variation
- High (0.8–1.0): Very uniform — can look static if overdone
Common mistakes
- Setting it to 0 and getting distracting style flicker every few frames
- Setting it to 1.0 and getting output that looks frozen or lacks natural motion variation
Also called
temporal_adain_factor
Image quality
Upscale factor
In short
Multiplies output resolution with AI-enhanced detail — 2x or 4x.
What it does
Scales the output image by the given factor while using AI to add detail that wasn’t in the original. A 2x factor turns 512x512 into 1024x1024.
How to think about it
Like Premiere’s “Scale to Frame Size” but with AI enhancement — it doesn’t just stretch pixels, it synthesizes new detail.
Recommended settings
- 2x: Standard upscale — fast, reliable, good for most uses
- 4x: Maximum detail — takes longer, costs more, use for hero shots or print
- When to use: When your generation resolution is lower than your delivery format requires
Common mistakes
- Upscaling an already-large image by 4x — 2048x2048 at 4x creates 8192x8192 with diminishing returns
- Expecting upscaling to fix a bad generation — it enhances detail, it doesn’t fix composition
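The pixel-count math behind the 4x warning above, as a small worked example:

```python
def upscaled(width: int, height: int, factor: int) -> tuple[int, int]:
    return width * factor, height * factor

print(upscaled(512, 512, 2))     # (1024, 1024): the standard 2x case
w, h = upscaled(2048, 2048, 4)   # (8192, 8192)
print(w * h)                     # 67,108,864 pixels: mostly diminishing returns
```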
Second-stage guidance
In short
Guidance scale for the refinement pass after initial generation.
What it does
A separate guidance scale applied during a second processing pass. Controls how aggressively the refinement follows your prompt after the initial generation establishes the base.
How to think about it
Like a second round of color grading — the first pass establishes the look, the second fine-tunes it. This controls how much the second pass changes.
Recommended settings
- Low (1–3): Light refinement — preserves what the first pass created
- Default (3–7): Balanced — same principles as main guidance scale
- High (7+): Aggressive refinement — sharpens details but risks artifacts
Common mistakes
- Setting it much higher than first-stage guidance — creates an inconsistent look with a soft base and harsh refinement
- Ignoring it entirely — the default is usually fine, but tuning it can noticeably improve detail
Also called
guidance_scale_2
Guidance rescale
In short
Reduces color oversaturation caused by high guidance scale values.
What it does
Applies a correction factor that pulls back the color saturation boost that high guidance scale values introduce. Keeps colors natural when guidance is cranked up.
How to think about it
Like a saturation limiter on a color grade — when guidance pushes colors too hard, rescale pulls them back to natural levels.
Recommended settings
- Low (0.0): No rescaling — colors may oversaturate at high guidance
- Default (0.3–0.5): Mild correction — good when using guidance above 7
- High (0.7): Strong correction — useful for very high guidance values
Common mistakes
- Using it when guidance scale is already low (3–5) — rescaling at low guidance washes out colors unnecessarily
- Not using it when guidance is above 10 — you’re likely getting oversaturated output
Also called
guidance_rescale
Tone map compression
In short
Controls dynamic range — lower is punchy, higher is flat with more detail.
What it does
Adjusts the dynamic range compression of the output, controlling how highlights and shadows are balanced relative to midtones.
How to think about it
Like the tone curve in Lumetri Color — low compression keeps punchy contrast, high compression flattens the range for more recoverable detail in extremes.
Recommended settings
- Low (1.0): High contrast — punchy, cinematic look
- Default (1.5–2.0): Balanced — good all-around
- High (3.0+): Flat, compressed — more detail in highlights/shadows but can look dull
Common mistakes
- Cranking compression high then adding contrast back in post — you’re losing quality in both conversions
- Not considering your delivery format — flat output needs a grade, punchy output is closer to final
Also called
tone_map_compression_ratio
First pass steps
In short
Processing steps for the initial generation pass in multi-pass pipelines.
What it does
Sets how many refinement steps run during the first generation pass, which establishes composition, shapes, and major details before the refinement pass.
How to think about it
Like a rough cut — the first pass gets the structure right. More steps here means a stronger foundation for the refinement pass to build on.
Recommended settings
- Low (10–15): Quick rough pass — sufficient when the refinement pass is strong
- Default (15–25): Good balance — solid foundation without over-investing
- High (30+): Very refined first pass — diminishing returns, especially if second pass is also high
Common mistakes
- Over-investing steps in the first pass at the expense of the second — balance both for best results
- Using single-pass step counts (30+) for the first pass — multi-pass pipelines need fewer per stage
Also called
first_pass_num_inference_steps
Second pass steps
In short
Processing steps for the refinement pass that adds detail and sharpness.
What it does
Sets how many steps run during the second refinement pass, which polishes textures, sharpens details, and brings the output to final quality.
How to think about it
Like the fine cut and color grade — this is where details get polished and the output reaches final quality. It builds on existing work, so fewer steps go further.
Recommended settings
- Low (5–10): Light polish — fast, preserves first pass character
- Default (10–20): Good refinement — adds meaningful detail
- High (25+): Heavy refinement — diminishing returns kick in fast since it’s enhancing, not creating
Common mistakes
- Setting it to 50+ expecting dramatic improvement — past ~25 steps, you’re paying for invisible changes
- Setting it higher than first pass steps — refinement builds on existing work and needs fewer steps
Also called
second_pass_num_inference_steps
Second pass skip steps
In short
Skips early refinement steps to preserve first-pass structure.
What it does
Tells the refinement pass to skip its first N steps, preserving the composition and structure from the first pass while only adding fine detail in later steps.
How to think about it
Like starting your color grade at a later node — you skip the broad strokes (which the first pass already handled) and go straight to fine adjustments.
Recommended settings
- Low (3–5): Refinement can still make structural changes — more creative freedom
- Default (8–12): Keeps composition locked, focuses on texture and detail
- High (15+): Minimal refinement — almost no change from first pass
Common mistakes
- Setting skip steps too high and wondering why the second pass doesn’t seem to do anything — if you skip most steps, there’s nothing left to refine
- Setting it to 0 and getting structural changes you didn’t want — skip a few to lock composition
Also called
second_pass_skip_initial_steps
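Taken together, the three two-pass parameters form one configuration. An illustrative balance built from the defaults above (the parameter names come from the “Also called” entries; the dict shape is an assumption):

```python
two_pass = {
    "first_pass_num_inference_steps": 20,   # establishes composition and structure
    "second_pass_num_inference_steps": 15,  # refinement: fewer steps than the first pass
    "second_pass_skip_initial_steps": 10,   # locks composition, refines texture only
}
```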
Audio strength
In short
How much audio input influences the video generation.
What it does
Controls the degree to which an audio signal drives the visual output. Higher values make the video react more strongly to the audio’s rhythm, beat, and energy.
How to think about it
Like audio-driven keyframes in After Effects — higher values make the video pulse with the beat, lower values keep visuals independent of the soundtrack.
Recommended settings
- Low (0.3–0.5): Subtle audio influence — visual changes are gentle
- Default (0.6–0.8): Noticeable sync between audio and visuals
- High (0.9–1.0): Strong audio-reactive output — motion matches the beat closely
Common mistakes
- Setting it to 1.0 with aggressive music — the video becomes too reactive, looking more like a visualizer than a video
- Using high audio strength with spoken word — voice dynamics create jarring visual changes
Also called
audio_strength
Audio guidance scale
In short
How closely generated audio follows your text description.
What it does
Same concept as visual guidance scale but for audio generation. Controls the balance between creative freedom and strict adherence to your audio description.
How to think about it
Same as visual guidance — low values let the model improvise, high values make it follow your description literally. Too high and it sounds forced.
Recommended settings
- Low (1–3): Loose interpretation — good for ambient or experimental audio
- Default (3–5): Balanced — follows your description naturally
- High (7+): Very literal — may sound forced or unnatural
Common mistakes
- Cranking it up thinking “higher = better quality” — it means more rigid adherence, not better audio
- Using the same value for music and speech — speech usually needs lower guidance than sound effects
Also called
audio_guidance_scale
Voice
In short
Selects the voice character for text-to-speech generation.
What it does
Chooses which synthesized voice “actor” speaks your text. Each voice has its own tone, pitch, pacing, and personality. Available voices vary by model.
How to think about it
Like casting a voiceover artist — each option is a different performer. Some sound warm and conversational, others formal and authoritative.
Recommended settings
- Preview first: Test with a short phrase before committing to a long generation
- Match to content: Narration, dialogue, and announcements each suit different voice characters
- Custom voices: Some models support voice cloning via audio upload
Common mistakes
- Choosing a voice without testing it on your specific content — a voice great for narration may sound wrong for dialogue
- Assuming voice names are consistent across models — “alloy” in one model may not exist in another
Text input
In short
The literal text the AI will speak or sing in audio generation.
What it does
Provides the script content for text-to-speech or text-to-singing models. Punctuation directly affects pacing and intonation in the output.
How to think about it
Like script copy for a voiceover session — these are the literal words the AI will perform. Formatting matters just like it does on a teleprompter.
Recommended settings
- Natural punctuation: Periods create pauses, commas create brief breaks
- Spell out numbers and abbreviations: “twenty-five” not “25”, “doctor” not “Dr.”
- For emphasis: Some models respond to ALL CAPS or asterisks
Common mistakes
- Writing text without punctuation — the AI reads it as one continuous stream with no natural pauses
- Using abbreviations the model can’t interpret — it may pronounce “Dr.” as “dee-arr”
Language
In short
Sets the language for speech generation or transcription.
What it does
Tells the model which language rules to follow for pronunciation, rhythm, and intonation. Uses standard language codes (en, es, fr, de, ja, etc.).
How to think about it
Like setting the language track on a timeline — it determines which phonetic rules the AI follows. Wrong language code means wrong pronunciation.
Recommended settings
- Match your text: Always set the language code to match your text content
- Explicit over auto-detect: Some models auto-detect, but specifying is more reliable
- Single language per generation: Multilingual text in one generation produces inconsistent results
Common mistakes
- Leaving language set to English when text is in another language — the AI pronounces foreign words with English phonetics
- Mixing languages in one generation expecting the model to switch seamlessly
System
Sync mode
In short
Wait for the result or generate in the background while you work.
What it does
When ON, the plugin blocks until the full result is ready. When OFF, generation runs in the background and you’re notified when it’s done.
How to think about it
Like rendering in Premiere — sync mode is “render and wait,” async mode is “add to render queue and keep editing.”
Recommended settings
- Off (default): Best for most workflows — keep working while the AI generates
- On: Useful for scripted workflows or when you need the result immediately
- When to use: Leave OFF unless you have a specific reason to wait
Common mistakes
- Turning sync mode ON and wondering why the plugin feels slow — it’s not slow, it’s waiting
- Leaving it ON out of habit — async mode lets you queue multiple generations
Safety checker
In short
Filters generated output for inappropriate or harmful content.
What it does
Runs the model’s output through a content moderation filter before delivering it. Blocks results that may contain inappropriate content, protecting against unexpected output.
How to think about it
Like a standards-and-practices review — the AI checks its own work before handing it over. Protects you from surprises, but occasionally blocks legitimate creative work.
Recommended settings
- On (default): Recommended for client work, team environments, and any production where unexpected content is unacceptable
- Off: Use with caution — only disable when you’re certain the content is appropriate
- When to use: Keep ON unless the filter is consistently blocking content you’ve verified is appropriate
Common mistakes
- Disabling it for all generations because “it blocks too much” — examine your prompt first, the filter usually reacts to something specific
- Assuming it catches everything — it’s a filter, not a guarantee
Safety tolerance
In short
Controls how strict the content filter is, on a scale from 1 (strictest) to 6 (most permissive).
What it does
Adjusts the sensitivity threshold of the model’s built-in safety checker. Lower values block more aggressively — a setting of 1 may flag even mildly suggestive content. Higher values allow more creative freedom but increase the risk of unexpected output.
How to think about it
Like adjusting the rating on a content filter — 1 is “G-rated only,” 6 is “allow almost everything.” Most professional work sits at 2–3: strict enough to avoid surprises, permissive enough to not block legitimate creative content.
Recommended settings
- 2 (default on most models): Good balance for client work and team environments
- 1: Maximum filtering — use for children’s content or highly regulated industries
- 4–6: Use with caution — only when the default filter is consistently blocking content you’ve verified is appropriate
- When to adjust: If generations keep getting blocked and your prompt is clean, try moving up by 1
Common mistakes
- Setting it to 6 “just to be safe from blocking” — this removes most safety filtering, which is the opposite of safe
- Changing it without testing — always preview a generation after adjusting
Multi-prompt
In short
Lets you write separate prompts for different scenes or segments within a single generation.
What it does
Instead of one prompt describing the entire output, multi-prompt lets you define distinct descriptions for different parts — for example, different scenes in a video or different sections of an audio track. The model transitions between them automatically.
How to think about it
Like writing scene descriptions on a shot list — each prompt controls one segment, and the AI handles the transitions between them.
Recommended settings
- When to use: Multi-scene videos, music with distinct sections, or any generation where you want different content at different points
- Format: Varies by model — some use numbered prompts, others use separator tokens. Check the model’s description for the expected format
Common mistakes
- Writing one long prompt and expecting scene breaks — you need to explicitly separate scenes
- Using too many segments for a short duration — each scene needs enough time to develop
Auto fix
In short
Lets the model automatically correct input issues before generating.
What it does
When enabled, the model attempts to fix problems with your input rather than rejecting it outright — for example, reformatting a prompt that triggers a policy filter, adjusting an image that’s slightly outside accepted dimensions, or converting an unsupported format.
How to think about it
Like auto-correct for your generation inputs. It tries to make things work rather than throwing an error, but the “fix” may not always match your intent.
Recommended settings
- On (default where available): Good for exploratory work — fewer errors, more results
- Off: Use when you need precise control over exactly what the model receives — the auto-fix may silently change your input in ways you don’t expect
Common mistakes
- Leaving it on and not noticing the model changed your prompt — if results look off, check whether auto-fix modified your input
- Turning it off and then getting errors that auto-fix would have handled — re-enable if you’re hitting repeated validation failures
ControlNet
ControlNet conditioning scale
In short
How strongly the control image guides the AI output.
What it does
Sets the influence weight of a ControlNet control image (edge map, depth map, pose skeleton) on the generation. Higher values mean the output follows the control signal more closely.
How to think about it
Like rotoscoping constraints — higher values lock the output to your guide, lower values let the AI take creative liberties with the structure.
Recommended settings
- Low (0.3–0.5): Soft guidance — the AI follows the general structure but improvises detail
- Default (0.7–1.0): Strong guidance — output closely matches the control image
- High (1.2+): Very rigid — can produce artifacts if the control signal is noisy
Common mistakes
- Setting it to 1.5+ and getting blocky artifacts — control maps aren’t meant to be followed that literally
- Using a low-quality control image at high scale — garbage in, garbage out amplified
Also called
controlnet_conditioning_scale, control_scale
Control timing
In short
When the control image starts and stops influencing the generation.
What it does
control_start and control_end set what fraction of the generation process uses control guidance. The values range from 0 (beginning) to 1 (end). Guidance only applies between these two points.
How to think about it
Like setting in/out points on a reference layer — the AI only “looks at” the control image during this window. Early guidance locks composition, late guidance refines detail.
Recommended settings
- Full range (0.0–1.0): Maximum control — the guide influences every step
- Early only (0.0–0.5): Locks composition but lets detail evolve freely — often the best balance
- Late only (0.5–1.0): Lets the AI establish its own composition, then steers detail — unusual but useful for texture control
Common mistakes
- Setting start and end to the same value — zero-width window means the control has no effect
- Using full range on a noisy control image — early-only gives better results when your control signal isn’t clean
Also called
control_start, control_end
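A sketch of the window presets described above, using the control_start/control_end fractions (the surrounding request shape is assumed):

```python
early_only = {"control_start": 0.0, "control_end": 0.5}  # lock composition, let detail evolve
full_range = {"control_start": 0.0, "control_end": 1.0}  # guide influences every step
# Note: control_start == control_end is a zero-width window, so the control does nothing.
```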
ControlNet guess mode
In short
Lets ControlNet work without a text prompt — experimental.
What it does
When enabled, the ControlNet generates output guided only by the control image, without any text prompt influence. The model “guesses” what to produce based solely on the structural input.
How to think about it
Like giving your editor footage and no brief — they interpret the structure entirely on their own. Results are unpredictable but can be surprisingly creative.
Recommended settings
- Off (default): Use a prompt alongside the control image — more predictable results
- On: Experimental — try when you want the AI to interpret your control image freely
Common mistakes
- Enabling guess mode and expecting precise results — without a prompt, the model has no creative direction
- Forgetting it’s enabled and wondering why your prompt seems to be ignored
Also called
controlnet_guess_mode
Preprocessor
In short
How the input image is prepared before being used as a control signal.
What it does
Selects the preprocessing method applied to your input image before it’s fed to the ControlNet. Different preprocessors extract different structural information: edges, depth, pose, segmentation.
How to think about it
Like choosing which analysis to run on your footage — edge detection gives you outlines, depth gives you spatial structure, pose gives you body positions. Pick the one that matches what you want to control.
Recommended settings
- Canny: Edge detection — good for preserving outlines and shapes
- Depth: Spatial structure — good for maintaining scene composition
- OpenPose: Body pose — good for matching character positions
- None: When your input is already a processed control map
Common mistakes
- Using the wrong preprocessor for your intent — depth won’t help if you want to match exact outlines
- Using “none” when your input is a regular photo — the model expects a processed control map
Also called
preprocessor
IP Adapter
IP Adapter
In short
Uses a reference image to guide the style or subject of the output.
What it does
IP Adapter (Image Prompt Adapter) takes a reference image and uses it to influence the generation — transferring visual style, subject appearance, or composition without needing to describe it in text.
How to think about it
Like giving your colorist a reference frame from another film — “make it look like this.” The AI picks up on visual qualities from the reference and applies them to your generation.
Recommended settings
- Single reference: Best results — one strong reference image gives clear direction
- Multiple references: Some models support multiple IP adapter inputs — results blend between them
- When to use: Style transfer, maintaining character consistency, matching a specific visual tone across generations
Common mistakes
- Using a busy, complex reference image — simpler references with clear visual identity transfer better
- Expecting exact reproduction — IP Adapter captures style and mood, not pixel-perfect copies
Also called
ip_adapters, ip_adapter
Pose guidance scale
In short
How strongly a pose reference controls the character’s position in the output.
What it does
Sets the influence weight of a pose reference (skeleton, keypoints) on the generated character. Higher values lock the character’s pose more tightly to the reference.
How to think about it
Like motion capture fidelity — low values give the AI freedom to adjust the pose naturally, high values force an exact match to the reference skeleton.
Recommended settings
- Low (0.3–0.5): Soft pose suggestion — natural but approximate
- Default (0.7–1.0): Strong pose match — good for matching specific body positions
- High (1.2+): Very rigid — can cause unnatural limb positions if the reference has artifacts
Common mistakes
- Setting it very high with a low-quality pose extraction — the model follows the noise too
- Using it without checking the extracted pose first — verify the skeleton matches your intent
Also called
pose_guidance_scale
Mixing image prompt and inpaint
In short
Blends IP Adapter style transfer with inpainting for creative fills.
What it does
Combines the style influence of an image prompt (via IP Adapter) with inpainting — the AI fills masked regions using both your text prompt and the visual style from the reference image.
How to think about it
Like doing a content-aware fill in Photoshop but with a specific style guide — the AI fills the gap in a way that matches both the surrounding content and your reference image’s aesthetic.
Recommended settings
- When to use: Creative retouching where you want fills to match a specific visual style
- Adjust strength: Lower values lean toward the text prompt, higher values lean toward the image reference
Common mistakes
- Using a reference image that clashes with the surrounding content — the fill will look inconsistent
- Not providing a clear mask — the blending works best with well-defined inpaint regions
Video processing
Video write mode
In short
Encoding quality versus speed tradeoff for the output video file.
What it does
Selects the encoding strategy for the generated video. Typically a choice between faster encoding with lower quality or slower encoding with better compression and quality.
How to think about it
Like choosing between “Export as fast as possible” and “Match source — high bitrate” in Premiere’s export settings. Faster encoding is fine for previews, but use higher quality for finals.
Recommended settings
- Fast/Default: Good for iteration and previewing — saves time during creative exploration
- Quality: Use for final output — better compression, fewer artifacts
Common mistakes
- Using the fast mode for final deliverables — the quality difference is visible on close inspection
- Always using quality mode during iteration — it slows down your creative loop for no benefit
Also called
video_write_mode
Multi-scale generation
In short
Generates at multiple resolutions for better quality — usually worth enabling.
What it does
Runs the generation process at multiple resolution scales, progressively refining detail. The model first generates a low-resolution version, then enhances it at higher scales.
How to think about it
Like progressive rendering in After Effects — starting with a rough pass and refining. Each scale adds detail that a single-pass generation would miss.
Recommended settings
- On (recommended): Better quality with modest speed cost — default for most models
- Off: Faster single-pass generation — use when speed matters more than quality
Common mistakes
- Turning it off to save time and not noticing the quality drop until final review
- Expecting it to fix low-resolution input — it improves generation quality, not source quality
Also called
use_multiscale
Temporal downsample factor
In short
Skips input video frames — higher values use less of the source motion.
What it does
Reduces the frame rate of the input video before processing by skipping frames. A factor of 2 uses every other frame, 4 uses every fourth frame, and so on.
How to think about it
Like dropping every other frame from a reference clip — the model sees the key poses but not every micro-movement. Good for speeding up processing on long clips.
Recommended settings
- 1 (no skip): Full frame rate — best quality, slowest processing
- 2: Every other frame — good balance for most video-to-video work
- 4+: Heavy skipping — only key poses survive, fine for style transfer but loses subtle motion
Common mistakes
- Setting it too high on motion-critical content — fast movements become jerky without enough intermediate frames
- Forgetting it’s enabled and wondering why the output feels “jumpy” compared to the source
Also called
temporal_downsample_factor
Motion bucket ID
In short
Controls the amount of motion in generated video — higher means more movement.
What it does
Sets the motion intensity for video generation models. Higher values produce more dynamic, energetic output with more camera and subject movement. Lower values produce calmer, more static shots.
How to think about it
Like choosing between a locked-off tripod shot and a handheld action sequence — this parameter sets the energy level of the motion the AI generates.
Recommended settings
- Low (50–100): Calm, minimal motion — good for beauty shots, landscapes
- Default (127): Moderate motion — balanced for most content
- High (200+): Energetic, dynamic motion — action scenes, music videos
Common mistakes
- Setting it very high for talking-head content — the face warps and distorts with too much motion
- Not adjusting it when switching between content types — landscapes and action scenes need different values
Also called
motion_bucket_id
Context frames
Section titled “Context frames”In short
Section titled “In short”Existing frames that guide video extension — more means smoother continuation.
What it does
Section titled “What it does”Sets how many existing frames the model “sees” when extending or continuing a video. More context frames give the model a better understanding of the current motion, style, and content.
How to think about it
Section titled “How to think about it”Like giving an editor more handles on a clip — the more preceding footage they see, the better they can match the cut. Same with AI: more context means smoother continuation.
Recommended settings
Section titled “Recommended settings”- Low (2–4): Minimal context — faster but may drift from the source style
- Default (8–16): Good balance — enough context for consistent continuation
- High (32+): Maximum context — best consistency but slower processing
Common mistakes
Section titled “Common mistakes”- Using too few context frames and getting jarring transitions where the extended video doesn’t match the source
- Using too many on short clips — if your clip is 24 frames, 32 context frames doesn’t make sense
Also called
Section titled “Also called”num_context_frames
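A quick sketch of the idea in Python (the helper is illustrative, not part of the plugin). Note how the context is capped by the clip length, which is why 32 context frames on a 24-frame clip makes no sense:

```python
# Illustrative only: the model "sees" at most the last N existing frames.
def context_window(frames, num_context_frames):
    n = min(num_context_frames, len(frames))  # can't use more frames than exist
    return frames[-n:]

clip = list(range(24))                  # 24 existing frames
print(len(context_window(clip, 16)))    # 16
print(len(context_window(clip, 32)))    # 24, capped at the clip length
```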
Video segments
In short
Number of segments generated — more segments means longer output video.
What it does
Divides the generation into separate segments (typically ~5 seconds each), which are processed individually and stitched together. More segments produce a longer total video.
How to think about it
Like setting the total duration by choosing how many “scenes” to generate end to end. Each segment is a self-contained generation pass.
Recommended settings
- 1: Single segment (~5s) — fast, good for short clips
- 2–3: Medium length (10–15s) — standard for most use cases
- 5+: Long output — be aware that quality may drift across many segments
Common mistakes
- Setting many segments for content that doesn’t need it — each segment costs time and money
- Expecting perfect continuity across 10+ segments — some style drift is inevitable in very long generations
Also called
num_segments
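The duration math is simple enough to sanity-check before you spend credits. A sketch, assuming the ~5-second segment length mentioned above (actual segment length varies by model):

```python
# Illustrative only: estimating total output length from the segment count.
SECONDS_PER_SEGMENT = 5  # approximate; check your model's documentation

def estimated_duration(num_segments):
    return num_segments * SECONDS_PER_SEGMENT

for n in (1, 3, 5):
    print(f"{n} segment(s) -> ~{estimated_duration(n)} s of output")
```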
Constant rate factor
In short
Video compression quality — lower means higher quality, larger file.
What it does
Standard video encoding quality parameter (CRF). Scale is 0 (lossless, huge file) to 51 (worst quality, tiny file). Controls the tradeoff between output file size and visual quality.
How to think about it
The same CRF scale used by ffmpeg and HandBrake for H.264/HEVC encoding — lower numbers mean bigger files with fewer compression artifacts. Premiere’s own exporter works in target bitrates rather than CRF, but the quality-versus-size tradeoff is the same.
Recommended settings
- Low (15–18): Near-lossless — use for hero shots and final delivery
- Default (23): Good balance — standard for web delivery
- High (28+): Small files, visible compression — fine for previews
Common mistakes
- Setting CRF to 0 thinking “I always want the best” — lossless files are enormous and unnecessary for AI-generated content
- Not adjusting for delivery format — social media doesn’t need CRF 15
Also called
constant_rate_factor
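Because this is the standard CRF scale, you can preview the tradeoff locally with ffmpeg before committing to a setting (requires ffmpeg on your PATH; filenames are placeholders):

```python
# Re-encode a clip at several CRF values and compare file sizes by eye.
import subprocess

for crf in (15, 23, 28):
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-c:v", "libx264", "-crf", str(crf),
         f"output_crf{crf}.mp4"],
        check=True,
    )
```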
Auto downsample
In short
Automatically reduces input resolution for faster processing.
What it does
When enabled, the model automatically scales down your input if it exceeds the model’s recommended resolution. Saves processing time without requiring manual resize.
How to think about it
Like Premiere’s proxy workflow — the model uses a smaller version of your source for processing. The output resolution is controlled separately.
Recommended settings
- On (default): Let the model handle it — saves time on high-res sources
- Off: Use when you specifically need the model to process at full input resolution
Common mistakes
- Turning it off with a 4K source and wondering why generation takes forever — the model doesn’t need 4K input to produce good output
- Not checking the minimum FPS setting — auto-downsample with a very low minimum can strip too much motion
Also called
enable_auto_downsample
Auto downsample minimum FPS
In short
Lowest framerate the model will keep when auto-downsampling your input.
What it does
Sets the floor for frame rate reduction during auto-downsample. The model won’t drop the input below this FPS, even if doing so would speed up processing.
How to think about it
Like setting a minimum proxy resolution — you want speed, but not at the cost of making the source unusable. This prevents the auto-downsample from stripping too much temporal information.
Recommended settings
- 8–12 fps: Good for style transfer where exact motion matching isn’t critical
- Default (16–24): Preserves enough motion for most use cases
- Match source: Set equal to your source FPS to prevent any frame dropping
Common mistakes
- Setting it too low for motion-critical content — 8fps input means jerky, choppy output
- Not realizing this only applies when auto-downsample is enabled
Also called
auto_downsample_min_fps
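A sketch of how the floor interacts with frame skipping (our own toy logic, not the model’s actual implementation):

```python
# Illustrative only: back off the skip factor until the FPS floor is respected.
def effective_fps(source_fps, desired_skip, min_fps):
    skip = desired_skip
    while skip > 1 and source_fps / skip < min_fps:
        skip -= 1
    return source_fps / skip, skip

print(effective_fps(24, 4, 16))   # (24.0, 1): any skipping would fall below 16 fps
print(effective_fps(60, 4, 12))   # (15.0, 4): 60/4 = 15 fps stays above the floor
```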
Interpolation
RIFE interpolation
In short
Adds AI frame interpolation for smoother video output.
What it does
Enables RIFE (Real-Time Intermediate Flow Estimation) — an AI algorithm that generates new in-between frames to increase the effective frame rate of the output.
How to think about it
Like Premiere’s Optical Flow but happening during generation — the AI creates smooth intermediate frames that didn’t exist in the original generation, producing fluid motion.
Recommended settings
- On: Smoother output — good for slow-motion or when the base frame rate feels choppy
- Off: Raw generation frames only — faster, and some content looks better without interpolation
Common mistakes
- Enabling it on content that’s already smooth — adds processing time with no visible benefit
- Expecting it to fix fundamentally broken motion — RIFE smooths transitions, it doesn’t fix bad compositions
Also called
use_rife
Adjust FPS for interpolation
In short
Automatically adjusts the output frame rate to account for interpolated frames.
What it does
When frame interpolation adds new frames, this option recalculates the output FPS so the video plays at the correct speed. Without it, interpolated frames extend the duration instead.
How to think about it
Like choosing between slow motion and frame rate conversion — with this ON, you get the same duration at higher FPS. With this OFF, you get longer, slower footage.
Recommended settings
- On: Same duration, smoother playback — matches your timeline FPS
- Off: Longer, slow-motion-style output — use intentionally for slow-mo effects
Common mistakes
- Leaving it off and wondering why the generated clip is longer than expected — interpolated frames are extending duration
- Turning it on when you actually wanted slow motion — the extra frames get absorbed into the same duration
Also called
adjust_fps_for_interpolation
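The math behind the two behaviors, as a sketch (toy function, assuming a clean 2x interpolation):

```python
# Illustrative only: frame count, FPS, and duration after 2x interpolation.
def after_interpolation(frames, fps, multiplier, adjust_fps):
    new_frames = frames * multiplier
    if adjust_fps:
        return new_frames, fps * multiplier, frames / fps   # same duration
    return new_frames, fps, new_frames / fps                # slow motion

print(after_interpolation(120, 24, 2, adjust_fps=True))    # (240, 48, 5.0)
print(after_interpolation(120, 24, 2, adjust_fps=False))   # (240, 24, 10.0)
```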
Transparency mode
In short
Controls how first and last frame edges blend with transparency.
What it does
Sets how the model handles the transition at the very beginning and end of an interpolated sequence. Controls whether edges fade to transparent, hard-cut, or blend.
How to think about it
Like choosing a dissolve type for the first and last frames of a transition — the edges can be sharp, soft, or fade to nothing.
Recommended settings
- Default/Auto: Let the model choose — works well for most use cases
- Transparent: Use when layering the output over other footage in Premiere
- Opaque: Use when the clip stands alone — no edge artifacts
Common mistakes
- Using transparent mode and placing the clip on V1 with nothing below — you’ll see black edges where transparency was
- Not matching the mode to your compositing needs — check your timeline layer setup first
Also called
transparency_mode
Movement amplitude
In short
Controls motion intensity — auto lets the model decide.
What it does
Sets how much motion the model adds to the output. Auto mode analyzes the input and chooses an appropriate amount. Manual values override the model’s judgment.
How to think about it
Like setting the amount of parallax or drift in a Ken Burns effect — higher amplitude means more visible motion, lower means subtler movement.
Recommended settings
- Auto (default): Best for most cases — the model adapts to your input
- Low: Minimal motion — good for subtle background animation
- High: Strong motion — use for dynamic, energetic content
Common mistakes
- Overriding auto with a high value on static content — too much motion on a still image looks unnatural
- Setting it very low and expecting completely static output — even minimum values add some motion
Also called
movement_amplitude
Audio & music parameters
Chunk overlap
In short
Overlap between audio processing chunks — more overlap means smoother blending.
What it does
When generating long audio, the model processes it in chunks. This parameter controls how much adjacent chunks overlap, which affects the smoothness of transitions between them.
How to think about it
Like crossfade length between audio clips in Premiere — more overlap means smoother transitions between sections, less overlap means faster processing but potentially audible seams.
Recommended settings
- Low: Faster processing, possible audible transitions between chunks
- Default: Good balance — smooth enough for most content
- High: Seamless blending — use for music or content where any transition artifact is unacceptable
Common mistakes
- Setting it to zero and getting audible clicks or gaps between chunks in long audio
- Maximizing overlap for short audio that fits in a single chunk — no benefit, just slower processing
Also called
chunk_overlap
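Conceptually, the blending works like a linear crossfade. A sketch with NumPy (toy code; real models typically blend internally rather than on raw samples):

```python
# Illustrative only: crossfading two adjacent audio chunks over their overlap.
import numpy as np

def stitch(chunk_a, chunk_b, overlap):
    if overlap == 0:
        return np.concatenate([chunk_a, chunk_b])   # hard seam: may click
    fade = np.linspace(0.0, 1.0, overlap)
    blended = chunk_a[-overlap:] * (1 - fade) + chunk_b[:overlap] * fade
    return np.concatenate([chunk_a[:-overlap], blended, chunk_b[overlap:]])

a, b = np.ones(1000), np.zeros(1000)
print(stitch(a, b, 200).shape)   # (1800,): the overlap region is shared
```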
Reranking candidates
In short
How many alternatives the model generates internally before picking the best one.
What it does
The model generates multiple candidate outputs internally, ranks them by quality, and returns the best one. More candidates means better quality selection but proportionally higher cost and time.
How to think about it
Like doing multiple takes in a voiceover session and picking the best one — more takes means a better final selection, but each take costs studio time.
Recommended settings
- Low (1–3): Fast and cheap — you get what you get
- Default (5): Good quality with reasonable cost
- High (10+): Best selection quality — but cost scales linearly with candidate count
Common mistakes
- Setting it to 20+ for quick iterations — you’re paying for quality you won’t evaluate anyway
- Setting it to 1 for final output — one candidate gives no selection benefit
Also called
reranking_candidates
Advanced generation
Turbo mode
In short
Faster generation at the cost of some quality.
What it does
Enables optimized generation paths that trade quality for speed. The model takes computational shortcuts to produce results faster.
How to think about it
Like switching to draft quality in After Effects RAM preview — you see the result faster, but the fine detail isn’t there. Use turbo for iteration, switch it off for finals.
Recommended settings
- On: Quick iteration, previewing ideas, rapid prototyping
- Off: Final output, client-facing deliverables, quality-critical work
Common mistakes
- Leaving turbo on for final renders — the quality difference is real and visible
- Never using turbo during exploration — you’re wasting time rendering details you’ll change anyway
Also called
turbo_mode
Preprocess
In short
How the input image is prepared before generation — crop, resize, pad, etc.
What it does
Selects the method used to fit your input image to the model’s expected dimensions. Options typically include crop (cut edges), resize (stretch/squash), pad (add borders), or none.
How to think about it
Like choosing between “Scale to Fill” and “Scale to Fit” when placing footage in a Premiere sequence — each method handles the size mismatch differently.
Recommended settings
- Crop: Fills the frame completely — may lose edges
- Resize: Stretches to fit — may distort aspect ratio
- Pad: Adds borders — preserves everything but adds empty space
- None: Send as-is — model handles it
Common mistakes
- Using resize on content where aspect ratio matters — faces and text will distort
- Using crop without checking which edges are lost — important content may be cut off
Also called
preprocess
Zoom factor
In short
Camera zoom applied during video generation — 0 means no zoom.
What it does
Adds a progressive zoom effect to the generated video. Positive values zoom in, negative values (where supported) zoom out.
How to think about it
Like keyframing a scale change on a clip in Premiere — the “camera” progressively moves closer or farther during the clip.
Recommended settings
- 0: No zoom — static framing
- Low (0.1–0.3): Subtle push-in — adds cinematic energy without being obvious
- High (0.5+): Dramatic zoom — use sparingly, can feel artificial
Common mistakes
- Combining high zoom with high motion — the effects compound and create disorienting output
- Using zoom on very short clips — there isn’t enough duration for the zoom to develop naturally
Also called
zoom_factor
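A sketch of what a progressive zoom amounts to, expressed as a per-frame scale ramp (illustrative only; the model applies the equivalent internally during generation):

```python
# Illustrative only: scale ramps from 1.0 to 1.0 + zoom_factor across the clip.
def zoom_schedule(num_frames, zoom_factor):
    return [1.0 + zoom_factor * i / (num_frames - 1) for i in range(num_frames)]

scales = zoom_schedule(num_frames=120, zoom_factor=0.2)
print(scales[0], scales[-1])   # 1.0 ... 1.2, a subtle push-in over the clip
```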
Noise scale
In short
Intensity of noise in the generation process — affects randomness.
What it does
Controls how much random noise is injected into the diffusion process. Higher values create more variation between frames or generations; lower values produce more deterministic results.
How to think about it
Like film grain intensity — more noise means more organic variation, less noise means cleaner but potentially more “digital” looking output.
Recommended settings
- Low: More deterministic, consistent output — good for reproducibility
- Default: Standard variation — balanced for most use cases
- High: More creative randomness — each generation diverges more from the baseline
Common mistakes
- Setting it too high and losing coherence between frames in video — the noise overwhelms the model’s consistency
- Setting it to zero expecting perfect determinism — seed controls reproducibility more reliably
Also called
noise_scale
Eta
In short
Sampler noise parameter — 0 means fully deterministic generation.
What it does
Controls the stochastic (random) component of the sampling process. At 0, the sampler is fully deterministic — same seed always gives the same result. Higher values introduce controlled randomness.
How to think about it
Like adding controlled improvisation to a scripted performance — at 0, every take is identical. Higher eta lets the model make small creative decisions that vary between runs.
Recommended settings
- 0: Fully deterministic — perfect reproducibility
- Default (0.5–1.0): Some variation — natural-looking results
- High (close to 1.0 and above, where supported): More randomness — each generation is more unique
Common mistakes
- Setting eta to 0 and expecting the output to be identical across different models — eta controls randomness within one model only
- Cranking it high for consistency testing — that’s the opposite of what you want
Also called
eta
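A toy sketch of the principle (not any specific sampler): eta scales the random term added at each step, so at eta = 0 the step itself is fully repeatable and the noise source never matters:

```python
# Toy model only: eta scales the stochastic part of a sampling step.
import numpy as np

def sampler_step(x, eta, rng):
    deterministic = 0.9 * x   # stand-in for the model's predicted update
    return deterministic + eta * rng.standard_normal(x.shape)

x = np.ones(4)
a = sampler_step(x, eta=0.0, rng=np.random.default_rng(42))
b = sampler_step(x, eta=0.0, rng=np.random.default_rng(7))
print(np.allclose(a, b))   # True: with eta = 0, different seeds give the same step
```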
CLIP skip
In short
Skips CLIP text encoder layers — changes how the model interprets your prompt.
What it does
Skips the last N layers of the CLIP text encoder. This changes how deeply the model analyzes your prompt, affecting the style and interpretation of the output.
How to think about it
Like the difference between reading a brief carefully versus skimming it — skipping layers means the model interprets your prompt more loosely, often producing a distinct aesthetic.
Recommended settings
- 1 (no skip): Full text analysis — most literal prompt interpretation
- 2: Common for anime and stylized content — slightly looser interpretation
- 3+: Very loose — the model barely reads your prompt, mostly relies on learned aesthetics
Common mistakes
- Using CLIP skip 2 on models that aren’t designed for it — not all architectures benefit from skipping
- Setting it to 4+ and wondering why the prompt has no effect — the model can barely “read” it at that point
Also called
clip_skip
Conditional augmentation
In short
Adds noise to the input reference for more variation in the output.
What it does
Adds controlled noise to the conditioning (input) image before generation. Higher values mean the model treats your input more loosely, producing more diverse but less faithful outputs.
How to think about it
Like intentionally degrading your reference footage before handing it to the VFX team — they’ll get the general idea but fill in details differently each time.
Recommended settings
- Low (0.0–0.1): Very faithful to input — minimal deviation
- Default (0.02–0.05): Slight variation — natural-looking diversity
- High (0.1+): Significant departure from input — creative but less predictable
Common mistakes
- Setting it too high and losing the key features of your input image — the model ignores your reference
- Setting it to exactly 0 and getting overly rigid output that looks “copied” rather than generated
Also called
cond_aug
Granularity scale
In short
Detail control — higher values can reduce artifacts in some models.
What it does
Adjusts the level of fine detail in the generation process. Some models use this to control artifact suppression — higher values smooth out micro-artifacts at the cost of some detail.
How to think about it
Like the detail slider in noise reduction — higher values clean up artifacts but may soften fine detail. It’s a tradeoff between cleanliness and crispness.
Recommended settings
- Low: Maximum detail — may include some artifacts
- Default: Balanced — clean output with preserved detail
- High: Smooth, artifact-free — but may look slightly soft
Common mistakes
- Cranking it up to eliminate artifacts when the real problem is low inference steps — fix the root cause first
- Setting it to 0 expecting maximum sharpness — some models need a minimum value to produce coherent output
Also called
granularity_scale
Refiner switch
In short
When to switch from the base model to the refiner model — 0.4–0.8 typical.
What it does
In dual-model pipelines (like SDXL), this controls at what percentage of generation the system hands off from the base model to the refiner. The base handles composition, the refiner handles detail.
How to think about it
Like the handoff between rough cut and fine cut editors — the first handles structure and story, the second polishes visuals and pacing. This value sets when the handoff happens.
Recommended settings
- Low (0.3–0.4): Early switch — refiner has more influence, softer overall look
- Default (0.5–0.6): Balanced — base establishes, refiner polishes
- High (0.7–0.8): Late switch — base dominates, refiner only touches up final details
Common mistakes
- Setting it too low and getting output that looks overly smooth — the refiner eliminates the base model’s character
- Setting it to 1.0 (no switch) — you’re bypassing the refiner entirely, missing its detail enhancement
Also called
refiner_switch
Sharpness
In short
Output sharpness — higher is sharper but risks visual artifacts.
What it does
Controls the sharpening applied to the generated output. Higher values produce crisper edges and more defined detail, but can introduce halos and edge artifacts.
How to think about it
Like the Unsharp Mask in Premiere — a little sharpening makes footage pop, too much creates ugly halos around every edge.
Recommended settings
- Low: Soft, natural look — good for organic content
- Default: Balanced — most models default to a good sharpness level
- High: Crisp, detailed — may show halos on high-contrast edges
Common mistakes
- Maxing out sharpness thinking it improves quality — it creates visible artifacts on every edge
- Not adjusting per content type — text and architecture need less sharpening than detailed textures
Also called
sharpness
Performance preset
In short
Speed/quality tradeoff preset — from extreme speed to maximum quality.
What it does
Selects a pre-configured balance between generation speed and output quality. Higher performance settings reduce inference steps, use faster schedulers, or apply other optimizations.
How to think about it
Like Premiere’s playback resolution — “Full” for final review, “1/4” for editing. Each step trades quality for speed.
Recommended settings
- Quality: Best output — use for final deliverables
- Speed: Good balance — fast enough for iteration with decent quality
- Extreme Speed: Fastest — noticeably reduced quality, use for rapid prototyping only
Common mistakes
- Using Extreme Speed for client deliverables — the quality difference is visible
- Always using Quality mode during exploration — you’re waiting for details you’ll change anyway
Also called
performance
Reference control
Reference timing
In short
When the reference image starts and stops guiding the generation.
What it does
reference_start and reference_end control what portion of the generation process uses the reference image for guidance. Values range from 0 (beginning) to 1 (end).
How to think about it
Like setting the influence window for a reference LUT — during this window the model tries to match your reference, outside it the model works independently.
Recommended settings
- Full range (0.0–1.0): Maximum reference influence — output closely matches reference throughout
- Early only (0.0–0.5): Sets the composition and style, then lets the model elaborate freely
- Narrow window (0.2–0.6): Reference influences the middle phase — avoids rigid start/end frames
Common mistakes
- Setting start after end — the reference has no influence window
- Using full range with a very strong reference — output may look like a copy rather than a generation
Also called
reference_start, reference_end
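A sketch of how the window gates guidance during sampling (illustrative logic, assuming progress is measured as a 0–1 fraction of total steps):

```python
# Illustrative only: is the reference active at a given denoising step?
def reference_active(step, total_steps, reference_start, reference_end):
    progress = step / total_steps      # 0.0 at the start, 1.0 at the end
    return reference_start <= progress <= reference_end

total = 30
active = [s for s in range(total) if reference_active(s, total, 0.0, 0.5)]
print(len(active))   # 16 of 30 steps: the reference shapes the first half
```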
Image editing
Avatar
In short
Character and appearance presets for AI-generated people.
What it does
Selects a predefined character template — body type, clothing, hairstyle, or full identity — that the model uses as a base for generating human subjects. Some models offer single avatars, others support multi-character setups where you define several people in one scene.
How to think about it
Like casting from a stock talent roster. Instead of describing every detail of a person’s appearance in your prompt, you pick a pre-built character and the AI fills in the rest. Multi-character mode is like setting up a group scene with assigned roles.
Recommended settings
- Single avatar: Best for headshots, portraits, and single-subject content — clearest results
- Multi-character: Use when the scene requires distinct people interacting — quality drops with more than 3–4 characters
- When to use: Character consistency across multiple generations, branded content with recurring “talent”
Common mistakes
- Combining an avatar preset with a prompt that describes a completely different person — the model gets conflicting instructions and produces inconsistent results
- Using multi-character mode for a single subject — it adds complexity the model doesn’t need
Also called
avatar, character, multi_character
Background removal
In short
Removes or replaces the background behind subjects in an image.
What it does
Detects the foreground subject and separates it from the background. Depending on the model, the background can be made transparent, replaced with a solid color, swapped for a new scene, or processed with adjustable opacity and style.
How to think about it
Like using Ultra Key or the Roto Brush in Adobe — the AI identifies what’s “in front” and what’s “behind,” then lets you choose what happens to the background. The difference is that AI does it in one pass without manual masking.
Recommended settings
- Remove (transparent): Best for compositing over other footage in your timeline — gives you a clean alpha channel
- Replace (solid/scene): Use when you need a finished shot without further compositing
- Threshold controls: Lower thresholds keep more edge detail (hair, fur) but may leave background artifacts; higher thresholds give cleaner cuts but may clip fine edges
Common mistakes
- Using background removal on subjects with translucent elements (veils, glass, smoke) — the AI treats them as background and removes them
- Setting the threshold too aggressively and losing hair detail — start with the default and adjust gradually
Also called
background_mode, background_opacity, background_removal, remove_background, transparent_background, background_style, background_threshold, background_tolerance, bg_th, remove_background_noise
Color grading
In short
Post-processing effects like contrast, grain, blur, vignette, and color shifts applied to the generated output.
What it does
Applies visual adjustments to the AI’s output — brightness, contrast, saturation, film grain, lens blur, vignetting, sharpening, tinting, and other photographic effects. These run after generation, modifying the final image before delivery.
How to think about it
Like applying a Lumetri Color grade plus creative effects in Premiere, but baked into the generation. The difference: these are applied before the file reaches your timeline, so you can’t undo them in post. Use them for quick stylistic finishes, but keep effects subtle if you plan to grade further in Premiere.
Recommended settings
- Subtle (low values): Best when you plan to do your own color grade in Premiere — gives you room to work
- Moderate: Good for social media or quick-turnaround content where the generation is the final product
- Heavy: Use for deliberate stylistic effects (heavy grain, strong vignette) — but understand these are baked in
Common mistakes
- Applying heavy grain and contrast during generation, then adding more in Premiere — the effects stack and look overprocessed
- Enabling every effect at once (grain + blur + vignette + tint) — the output looks like an Instagram filter from 2012
Also called
brightness, contrast, saturation, gamma, grain, grain_intensity, grain_scale, grain_style, blur_radius, blur_sigma, blur_type, vignette_strength, sharpen, cas_amount, tint_mode, tint_strength, enable_chromatic, enable_grain, enable_blur, enable_vignette, enable_sharpen, enable_solarize, enable_tint, enable_glow, enable_dodge_burn, enable_desaturate, enable_dissolve, enable_parabolize, enable_color_correction
Color processing
In short
Detects, corrects, and manipulates colors — from automatic color fixing to palette extraction.
What it does
Handles color-specific operations: automatic color detection, color correction, palette limiting (reducing an image to a set number of colors), and targeted color changes like hair color or text color. Some models use this to enforce a specific color palette or fix color casts.
How to think about it
Like the color correction tools in Photoshop or Premiere’s color wheels, but automated. The AI identifies dominant colors, fixes casts, or restricts the palette to a set number of hues. Useful for creating stylized looks (poster art, pixel art) or correcting color problems in the source.
Recommended settings
- Auto detect on: Let the model identify and fix color issues automatically — good starting point
- Max colors (low): Creates flat, poster-style images with limited palettes — use for graphic design or pixel art
- Max colors (high/unlimited): Preserves full color range — use for photorealistic output
- Color fix: Enable when the source has obvious color casts or white balance issues
Common mistakes
- Setting max colors too low for photorealistic content — faces look banded and unnatural with fewer than 64 colors
- Using color fix on content that’s intentionally color-graded — the AI “corrects” your creative choices
Also called
color, color_fix, color_fix_type, colormap, colormode, max_colors, fill_color, font_color, hair_color, highlight_color, auto_color_detect, color_precision, dominant_color_threshold, fix_colors, txt_color
Crop and resize
In short
Controls how the output is cropped, padded, or resized to fit target dimensions.
What it does
Adjusts the output framing after generation. Options include cropping to a bounding box (face or subject detection), padding with a specified color, resizing to original input dimensions, or targeting specific width/height values. Some models crop to fill, others pad to fit.
How to think about it
Like the “Scale to Fill” vs “Scale to Fit” vs “Crop” options when placing footage into a Premiere sequence. Each approach handles the dimension mismatch differently — crop loses edges, pad adds borders, resize stretches or squashes.
Recommended settings
- Crop to fill: Best when you need exact dimensions and can afford to lose some edges
- Resize to original: Use when you want the output to match your input’s exact dimensions — common for image-to-image workflows
- Pad: Use when you need exact dimensions but can’t lose any content — the borders can be trimmed in Premiere
- Target dimensions: Set explicitly when your delivery format requires specific pixel counts (1920x1080, 1080x1080, etc.)
Common mistakes
- Cropping without checking what’s lost at the edges — important content (hands, props, text) may be cut off
- Resizing non-square output to square dimensions — subjects get stretched and distorted
Also called
crop_size, crop_to_bbox, crop_to_fill, crop_duration, dimensions, pad_color, padding_values, resize_to_original, selection_crop, target_height, target_long_side, target_width
Denoising
In short
Reduces visual noise and grain in the generated output.
What it does
Applies noise reduction during or after generation. Some models offer separate controls for high-resolution and low-resolution denoising passes, letting you clean up different types of noise independently.
How to think about it
Like Neat Video or Premiere’s built-in noise reduction — it smooths out grain and speckle. The tradeoff is always the same: more denoising means cleaner images but softer fine detail. The sweet spot depends on whether you’d rather see grain or softness.
Recommended settings
- Low: Preserves fine texture and detail — some grain remains, but edges stay sharp
- Default: Balanced cleanup — good for most content
- High: Very clean, smooth output — but small details like skin texture, fabric weave, and hair strands may disappear
Common mistakes
- Maxing out denoising on content with important fine detail (fabric patterns, text, hair) — those details get smoothed away along with the noise
- Applying denoising in the AI model AND again in Premiere — double denoising creates a plastic, artificial look
Also called
denoise, highres_denoise, lowres_denoise, noise_reduction
Depth estimation
In short
Generates a depth map showing how far each part of the image is from the camera.
What it does
Analyzes an image and produces a grayscale depth map — bright areas are close to the camera, dark areas are far away. Used as input for other models (ControlNet, relighting, 3D effects) or as a standalone analysis tool.
How to think about it
Like a LiDAR scan from your iPhone, but generated from a flat image. The AI infers depth from visual cues — perspective lines, object size, blur, and occlusion. The output is a grayscale map you can use for parallax effects, depth-of-field simulation, or as a control signal for other AI models.
Recommended settings
- Ensemble size (1): Single pass — fast but may have inconsistencies in complex scenes
- Ensemble size (5–10): Multiple passes averaged together — more accurate depth, especially at edges and occlusion boundaries
- Processing resolution: Higher means more accurate depth estimation but slower processing — match to your output needs, not your source resolution
Common mistakes
- Using a low ensemble size on complex scenes with many overlapping objects — depth edges become noisy and inaccurate
- Treating the depth map as ground truth — it’s an estimate, not a measurement. Fine details and transparent objects often get wrong depth values
Also called
depth_and_normal, depth_scale, ensemble_size, include_raw_depths, preprocess_depth, processing_res
Face animation
In short
Controls facial expressions, eye movement, head rotation, and lip sync on generated or modified faces.
What it does
Manipulates specific facial features on a generated or uploaded face: mouth shapes (open/closed vowel positions), eye blinks and winks, smiles, eyebrow raises, head pitch/yaw/roll, and pupil direction. Some models support lip sync driven by audio input and face enhancement for cleaner results.
How to think about it
Like a virtual puppet rig in After Effects — each slider controls one aspect of the face. Mouth shapes work like phoneme targets in lip sync animation: aaa opens the mouth wide, eee stretches it horizontally, woo rounds the lips. Head rotation works like a 3-axis gimbal: pitch (nod), yaw (shake), roll (tilt).
Recommended settings
- Expression scale (0.5–0.8): Natural range — keeps expressions believable
- Expression scale (1.0+): Exaggerated — useful for animation or caricature, but faces start looking uncanny on real portraits
- Still mode on: Reduces head motion for talking-head content — keeps the face stable while expressions change
- Paste back on: Composites the animated face back onto the original image — essential for natural results
Common mistakes
- Cranking multiple expression sliders to their max simultaneously — the face distorts into something unnatural
- Forgetting to enable face enhancement when the source image is low resolution — the animation amplifies every pixel
Also called
aaa, blink, eee, woo, wink, smile, expression, expression_scale, eyebrow, rotate_pitch, rotate_roll, rotate_yaw, pupil_x, pupil_y, face_enhancement, face_enhancer, flag_do_crop, flag_lip_retargeting, paste_back, still_mode, vx_ratio, vy_ratio
Inpainting
In short
Edits specific regions of an image by painting a mask over the area you want to change.
What it does
You provide an image and a mask (the area to modify), and the AI regenerates only the masked region while keeping everything else untouched. The strength parameter controls how much the masked area changes — low strength makes subtle edits, high strength replaces the region entirely.
How to think about it
Like Content-Aware Fill in Photoshop, but guided by a text prompt. You mask the area you want to change, describe what should go there, and the AI fills it in while matching the surrounding context. The mask is your “selection” — everything outside it stays exactly as it is.
Recommended settings
- Strength (0.3–0.5): Subtle edits — change color, texture, or small details while preserving the original structure
- Strength (0.6–0.8): Moderate changes — replace objects or alter significant features
- Strength (0.9–1.0): Full replacement — the masked area is generated from scratch based on your prompt
- Erode/dilate: Shrink or expand the mask edges for cleaner boundaries — positive values expand, negative values shrink
Common mistakes
- Using a mask that’s too tight around the subject — leave some margin so the AI can blend the edges naturally
- Setting strength too high for small corrections — a color change doesn’t need 1.0 strength
Also called
inpaint_mode, inpaint_strength, inpaint_mask_only, inpaint_engine, inpaint_erode_or_dilate, inpaint_respective_field, override_inpaint_options, draw_mode, outpaint_selections
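The core compositing rule is worth seeing once. A NumPy sketch (toy code; real inpainting blends inside the model rather than on final pixels):

```python
# Illustrative only: the mask gates the edit; everything outside is untouched.
import numpy as np

def inpaint_composite(original, generated, mask):
    """mask is 1.0 where the edit applies, 0.0 elsewhere."""
    return mask * generated + (1.0 - mask) * original

orig = np.zeros((4, 4))
gen = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0   # a 2x2 region to regenerate
print(inpaint_composite(orig, gen, mask))
```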
Lighting
In short
Controls the direction, type, and style of lighting in the generated image.
What it does
Sets the virtual light source for the scene — direction (top, left, front, etc.), type (ambient, directional, point), and overall lighting style. Some models support full relighting, which re-renders an existing image under completely new lighting conditions.
How to think about it
Like moving a key light on a film set. Direction controls where shadows fall, type controls how hard or soft the light is, and style presets are like choosing between a studio setup and natural golden-hour light. Relighting is the AI equivalent of reshooting under different lighting without going back to set.
Recommended settings
- Front/top: Flat, even lighting — safe for product shots and portraits
- Side (left/right): Dramatic shadows — good for cinematic and editorial content
- Rim/back: Silhouette and edge lighting — use for atmospheric or dramatic shots
- Relighting: Enable when you need to completely change the mood of an existing image
Common mistakes
- Using strong directional lighting on subjects with complex geometry (jewelry, wrinkled fabric) — the AI-generated shadows may not follow the real shape correctly
- Applying relighting to images that already have strong directional light — the original shadows conflict with the new lighting direction
Also called
light_direction, light_type, lighting_style, relight_parameters
Masking
In short
Creates, adjusts, and inverts masks that define which areas of an image are processed.
What it does
Generates or modifies masks used by inpainting, outpainting, and other region-specific operations. Controls include binarization (converting soft masks to hard black/white), inversion (swapping which area is selected), clamping (limiting mask intensity range), and type selection (auto-detected, manual, or segmentation-based).
How to think about it
Like working with masks and mattes in Premiere or After Effects. A white area means “process this,” a black area means “leave this alone.” Binarization is like increasing matte contrast until there are no gray areas. Inversion flips your selection. Clamping limits how strong the mask effect can be, like limiting an adjustment layer’s opacity range.
Recommended settings
- Binarize on: Clean, hard-edged masks — best for inpainting where you want a clear boundary between edited and untouched areas
- Binarize off: Soft, feathered masks — better for blending and gradual transitions
- Invert: Flip when you’ve masked the wrong side — easier than repainting
- Clamp (lower 0.3, upper 0.7): Limits mask intensity range — useful for partial-strength edits that blend more naturally
Common mistakes
- Forgetting to invert the mask after auto-detection selects the background instead of the subject
- Using a binarized mask for subtle blending work — the hard edges create visible seams
Also called
binarize_mask, invert_mask, mask_type, mask_only, mask_binarization_threshold, mask_clamp_lower, mask_clamp_upper, mask_start, mask_end, min_mask_region_area, revert_mask, mask_away_clip
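The three operations are simple pixel math. A NumPy sketch on a soft 0–1 mask (the threshold and clamp values here are examples, not model defaults):

```python
# Illustrative only: binarize, invert, and clamp a soft mask.
import numpy as np

mask = np.array([0.1, 0.4, 0.6, 0.9])

binarized = (mask >= 0.5).astype(float)   # hard black/white at threshold 0.5
inverted = 1.0 - mask                     # swap selected and unselected areas
clamped = np.clip(mask, 0.3, 0.7)         # limit the mask's intensity range

print(binarized, inverted, clamped)
```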
Outpainting
In short
Extends an image beyond its original borders — the AI fills in what’s outside the frame.
What it does
Expands the canvas in any direction (top, bottom, left, right) and generates new content that seamlessly continues the existing image. The AI analyzes the edge content and extends it naturally — continuing backgrounds, landscapes, or patterns.
How to think about it
Like Premiere’s reframing tools, but instead of cropping to fit a new aspect ratio, the AI generates new content to fill the extra space. Need a 16:9 image from a 1:1 source? Outpaint the sides. Need more headroom above a subject? Outpaint the top.
Recommended settings
- Small expansion (10–25%): Safest — the AI only needs to extend a small area, so consistency is high
- Medium expansion (25–50%): Good for aspect ratio conversion — enough room for meaningful new content
- Large expansion (50%+): Risky — the AI is inventing a lot of new content, and quality drops as you move further from the original edges
- Blur mask on: Feathers the boundary between original and generated content for seamless blending
Common mistakes
- Expanding by 100%+ in one direction and expecting the new content to look as good as the original — quality degrades with distance from the source
- Outpainting in all four directions simultaneously — the corners have almost no context to work from
Also called
blur_mask, expand_bottom, expand_left, expand_mask, expand_ratio, expand_right, expand_top
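A sketch of the canvas math (illustrative only; the padded border is what the model then fills with generated content):

```python
# Illustrative only: grow the canvas by a ratio on every side before filling.
import numpy as np

def expand_canvas(image, expand_ratio):
    h, w = image.shape[:2]
    pad_h, pad_w = int(h * expand_ratio), int(w * expand_ratio)
    return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

img = np.ones((1080, 1080))
print(expand_canvas(img, 0.25).shape)   # (1620, 1620): 25% added on each side
```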
Scene detection
In short
Detects scene changes in video and applies different processing to each scene.
What it does
Analyzes video input to identify where scenes change (cuts, transitions, significant visual shifts), then processes each scene segment independently. This ensures that style transfer, color correction, or other effects are applied consistently within each scene rather than averaging across cuts.
How to think about it
Like Premiere’s scene edit detection, but built into the AI pipeline. Without scene detection, a style transfer model might blend the look of two completely different scenes at the cut point, creating an ugly transition. With it enabled, each scene gets its own processing pass.
Recommended settings
- On: Recommended for any multi-scene video input — prevents cross-scene contamination
- Off: Use only for single-shot clips where there are no scene changes
- Threshold: Lower values detect more subtle scene changes (dissolves, slow fades); higher values only detect hard cuts
Common mistakes
- Leaving scene detection off on a multi-cut montage — the AI blends styles across cuts, creating inconsistent looks at every edit point
- Setting the threshold too low on footage with lots of camera motion — the model mistakes fast pans for scene changes
Also called
scene, scene_description, scene_threshold, use_scene_detection
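Threshold-based cut detection can be sketched in a few lines (a deliberately crude toy; production detectors use smarter metrics than mean pixel difference):

```python
# Toy detector: flag a cut where adjacent frames differ more than the threshold.
import numpy as np

def detect_cuts(frames, threshold):
    diffs = [float(np.abs(b - a).mean()) for a, b in zip(frames, frames[1:])]
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

frames = [np.zeros((8, 8))] * 5 + [np.ones((8, 8))] * 5   # one hard cut
print(detect_cuts(frames, threshold=0.5))                 # [5]
```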
Tiling
In short
Processes large images in smaller tiles to avoid running out of memory.
What it does
Splits a large image into overlapping tiles, processes each tile individually, then stitches them back together. This allows models to handle images much larger than their native resolution without crashing. The overlap between tiles (the tile size minus the stride) ensures seamless blending at boundaries.
How to think about it
Like rendering a massive After Effects composition in sections — each section renders independently, then they’re composited together. The overlap is like the feather on a split-screen wipe: it ensures you can’t see the seam where tiles meet.
Recommended settings
- Tile size (512–1024): Standard — matches most models’ native processing resolution
- Stride (256–512): Spacing between tile origins — higher stride means less overlap, faster processing but more visible seams. Half the tile size is a safe default
- When to use: Enable when working with images above 2048x2048 or when you see out-of-memory errors
Common mistakes
- Using very small tiles on a large image — too many tiles means dramatically more processing time and more seam boundaries to blend
- Setting stride equal to tile size (zero overlap) — you’ll see visible grid lines in the output where tiles meet
Also called
tile_diffusion, tile_diffusion_size, tile_diffusion_stride, tile_size, tile_stride, tile_vae, tile_vae_decoder_size, tile_vae_encoder_size, tiling_mode
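A sketch of how tile origins are laid out, and why the stride controls the overlap (illustrative helper, not the plugin’s API):

```python
# Illustrative only: tile start positions along one axis; overlap = tile - stride.
def tile_origins(length, tile, stride):
    xs = list(range(0, max(length - tile, 0) + 1, stride))
    if xs[-1] + tile < length:       # ensure the final tile reaches the edge
        xs.append(length - tile)
    return xs

# A 4096 px axis with 1024 px tiles and a 512 px stride (50% overlap):
print(tile_origins(4096, 1024, 512))   # [0, 512, 1024, ..., 3072]
```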
Translation
In short
Translates audio or video dialogue from one language to another.
What it does
Converts spoken content from a source language to a target language — either as a dubbed audio track or as part of a full video translation pipeline. Some models auto-detect the source language, others require you to specify it.
How to think about it
Like sending your timeline to a dubbing house, but the AI handles both the translation and the voice performance. The result is a new audio track (or video with baked-in audio) in the target language, often preserving the original speaker’s voice characteristics.
Recommended settings
- Source language: Set explicitly when you know it — auto-detect works for common languages but may fail on dialects or code-switching
- Target language: The language you want the output in — use standard language codes (en, es, fr, de, ja, etc.)
- When to use: Localizing content for international audiences, creating multilingual versions of the same video
Common mistakes
- Relying on auto-detect for niche languages or accented speech — specify the source language explicitly for better results
- Expecting perfect lip sync in the translated version — AI dubbing matches timing approximately, not frame-perfectly
Also called
output_language, source_lang, target_lang, target_language
Video chunking
In short
Splits video processing into manageable segments for memory efficiency.
What it does
When processing video, the model breaks the input into chunks of frames, processes each chunk, then assembles the output. Controls include chunk size (how many frames per batch), overlap between chunks (for smooth transitions), and decode chunk size (how many frames are decoded at once from the compressed format).
How to think about it
Like rendering a long sequence in Premiere using “Use Previews” — instead of processing the entire timeline at once (which might crash), the system works through it in sections. The overlap between chunks is like having handles on each clip — it ensures smooth continuity where chunks meet.
Recommended settings
- Batch frames (4–8): Fewer frames per batch uses less memory but processes slower
- Batch frames (16–32): More frames per batch is faster but requires more memory — reduce if you see errors
- Overlap (2–4 frames): Enough overlap for smooth chunk boundaries — increase if you see flicker at chunk edges
- Sample stride (1): Process every frame — highest quality. Higher stride skips frames for speed
Common mistakes
- Setting batch frames too high and running into memory errors — start low and increase until you hit the limit
- Setting overlap to zero and getting visible “jumps” every N frames where chunks were stitched together
Also called
batch_frames, decode_chunk_size, overlap, overlapping_tiles, sample_stride
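A sketch of how chunk boundaries share frames (illustrative only; the model does the equivalent internally):

```python
# Illustrative only: frame ranges per chunk; neighbors share `overlap` frames.
def chunk_ranges(total_frames, batch_frames, overlap):
    step = batch_frames - overlap
    return [(start, min(start + batch_frames, total_frames))
            for start in range(0, total_frames, step)]

# 100 frames, 16 frames per batch, 4 shared frames between neighbors:
print(chunk_ranges(100, 16, 4))   # [(0, 16), (12, 28), (24, 40), ...]
```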
Camera control
Section titled “Camera control”In short
Section titled “In short”Moves the virtual camera during video generation — pan, tilt, zoom, rotate, and dolly.
What it does
Section titled “What it does”Controls virtual camera movement during AI video generation. You can set horizontal and vertical angles (pan and tilt), zoom level, forward/backward movement (dolly), rotation, and even wide-angle lens distortion. Some models offer preset camera motions, others give you manual axis-by-axis control.
How to think about it
Section titled “How to think about it”Like programming a camera move on a motorized head or gimbal. Horizontal angle is pan (left/right), vertical angle is tilt (up/down), move forward is dolly (push in), and rotate is roll. The wide-angle lens option is like switching from a 50mm to a 14mm — it adds barrel distortion and a wider field of view.
Recommended settings
Section titled “Recommended settings”- Subtle motion (low values): Natural-feeling camera drift — good for adding life to static AI-generated scenes
- Moderate motion: Deliberate camera moves — pan to reveal, tilt to follow action, push in for emphasis
- Strong motion: Dramatic camera work — use sparingly, as aggressive camera moves combined with AI generation can produce artifacts
- Zoom (negative): Pull out / zoom out — good for reveal shots. Positive values push in
Common mistakes
Section titled “Common mistakes”- Combining multiple strong camera moves simultaneously (pan + zoom + tilt) — the AI struggles to maintain consistency with too many motion axes active at once
- Using aggressive camera control on short clips — the motion has no time to develop and looks like a jarring jump instead of a smooth move
Also called
Section titled “Also called”advanced_camera_control, camera, camera_angle, camera_control, horizontal_angle, move_forward, rotate_right_left, vertical_angle, wide_angle_lens, zoom, zoom_out_percentage
Other parameters
Text input
In short
Secondary text fields that give the AI additional context beyond your main prompt.
What it does
These are specialized text inputs that supplement your primary prompt. Editing models use source/target prompt pairs to understand what to change. Rendering models use gen_text to place visible words in the output. Detection models use a detection prompt to know what to look for.
How to think about it
Like giving different notes to different departments on set. Your main prompt is the director’s vision. The source prompt is the script supervisor’s continuity note (“this is what we have”). The target prompt is the revision (“this is what we want”). Gen_text is the prop department’s signage order.
Recommended settings
- Source + target prompts: Be specific about the difference — “a red car” to “a blue car” works better than vague descriptions (see the sketch after this list)
- Gen_text: Keep it short — AI text rendering degrades quickly past 5–6 words
- Detection prompt: Use simple, direct language — “a person wearing a hat” not “someone who appears to be wearing headwear”
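As a concrete illustration, here is a hypothetical edit request using a source/target pair. The parameter names follow the aliases under "Also called"; the exact schema depends on the model.

```python
# Hypothetical edit request: source/target prompts always travel as a pair.
edit_request = {
    "prompt": "a vintage car parked on a rainy street at night",
    "source_prompt": "a red car",   # what the model should locate in the input
    "target_prompt": "a blue car",  # what that element should become
}
```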
Common mistakes
- Writing the same thing in the main prompt and additional_prompt — they stack, so you’re doubling the emphasis and may get exaggerated results
- Using source_prompt without target_prompt (or vice versa) on editing models — they work as a pair
Also called
additional_prompt, source_prompt, target_prompt, gen_text, new_text, detection_prompt, original_vgl, new_vgl
Output settings
In short
Controls the file format, codec, and encoding quality of the delivered output.
What it does
Determines what kind of file the model delivers and how it is compressed. You can specify video codec (H.264, HEVC), file format (mp4, webm, gif, png), quality level, target bitrate, and write mode. These settings affect file size, compatibility, and visual fidelity.
How to think about it
Like the Export Settings dialog in Premiere. Codec and CRF control the compression tradeoff, output type is your container format, and bitrate sets the data rate ceiling. Getting these right means the AI output drops into your timeline without a re-encode.
Recommended settings
- H.264 + mp4: Maximum compatibility — plays everywhere, imports cleanly into Premiere (a settings sketch follows this list)
- CRF 18–23: Good quality-to-size ratio for AI-generated content — lower for hero shots, higher for drafts
- Match your timeline: If your sequence is ProRes, consider webm or high-bitrate mp4 to minimize generation artifacts before transcode
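A hypothetical settings dict for a maximum-compatibility delivery might look like the following; parameter names follow the aliases under "Also called," and the defaults vary by model.

```python
# Hypothetical export settings; names follow the aliases below.
output = {
    "output_type": "mp4",
    "codec": "h264",
    "crf": 20,               # lower CRF means higher quality and larger files
    "output_bitrate": None,  # leave unset so CRF drives quality, not a fixed rate
}
```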
Common mistakes
- Choosing gif for video output longer than 3 seconds — file sizes explode and color depth drops to 256 colors
- Setting output bitrate very low to save space — AI-generated content has lots of fine detail that compresses poorly at low bitrates
Also called
codec, crf, output_quality, output_type, output_bitrate, output_write_mode, H264_output
Video frame settings
In short
Controls how many frames are generated or extracted and at what intervals.
What it does
Sets the frame count, sampling interval, and extraction behavior for video generation and processing models. You can cap the total frames, request an exact count, control how many frames each clip segment contains, or limit processing to just the first few seconds of a long input.
How to think about it
Like setting in/out points and frame handles on a Premiere timeline. Max frames is your out point — it caps how long the generation runs. Frame interval is like setting a poster frame frequency — it controls how densely the model samples your input. First_n_seconds is a quick way to preview a long clip without processing the whole thing.
Recommended settings
- Max frames: Set to match your timeline gap — 120 frames at 24fps gives you a 5-second clip (the arithmetic is worked through after this list)
- Frame interval (1): Every frame — best quality. Higher values skip frames for faster processing of long inputs
- First_n_seconds (3–5): Good for previewing how a model handles your footage before committing to a full-length generation
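The frames-to-duration arithmetic is worth keeping at hand; this is plain Python, independent of any model API.

```python
# Frame-count arithmetic for matching a generation to a timeline gap.
def frames_for(duration_s: float, fps: float) -> int:
    return round(duration_s * fps)

frames_for(5, 24)   # 120 frames: a 5-second clip at 24fps
48 / 24             # 2.0 seconds: 48 frames at 24fps
48 / 12             # 4.0 seconds: the same 48 frames at 12fps lasts twice as long
```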
Common mistakes
- Setting number_of_frames without considering your FPS — 48 frames at 24fps is 2 seconds, but at 12fps it’s 4 seconds
- Using a high frame_interval on footage with fast motion — skipped frames mean the model misses key movements
Also called
max_frame_num, max_frames, number_of_frames, frames_per_clip, frame_interval, frame_type, frame_index, first_n_seconds
Quantity and iteration
In short
How many outputs to generate and how many processing passes to run.
What it does
Controls batch size (how many separate results you get per generation), processing depth (how many layers or passes refine the output), and recursive operations (like interpolation passes that compound with each run). More outputs and more passes mean better selection and quality, but proportionally higher cost and time.
How to think about it
Like shooting multiple takes and choosing the best one. Number of images is your take count — generate 4 and pick the winner. Layers and iterations are like additional polish passes in color or audio mixing. Recursive interpolation is like running Optical Flow twice — each pass doubles your frame count.
Recommended settings
- Number of images (2–4): Good for picking the best result without excessive cost — especially useful for hero shots
- Max iterations: Start with the default. Only increase if you see quality improve visibly between passes
- Recursive interpolation (1–2): One pass doubles frames, two passes quadruple them — you rarely need more than 2 (see the arithmetic after this list)
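The growth compounds quickly. Assuming each pass doubles the frame count, as described above:

```python
# Frame growth under recursive interpolation, assuming each pass doubles
# the count (per the description above).
frames = 24
for passes in range(1, 4):
    frames *= 2
    print(passes, frames)
# 1 48
# 2 96
# 3 192: exponential growth is why 3+ passes gets expensive fast
```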
Common mistakes
- Generating 10+ images per prompt during exploration — you’re paying for outputs you won’t even look at carefully
- Setting recursive interpolation passes to 3+ — each pass doubles the frame count, so totals grow exponentially, and quality gains plateau after 2
Also called
num_clips, num_results, num_layers, number_of_images, series_amount, max_iterations, recursive_interpolation_passes
Style and mode
In short
Preset visual styles, effects, and intensity controls that shape the overall look.
What it does
Applies predefined visual aesthetics to the generation — from text-based style descriptions to reference textures, specific visual effects (blur, sketch, pixelate), and intensity sliders that control how strongly the style is applied. Some models offer curated effect presets; others accept freeform style prompts.
How to think about it
Like applying a LUT plus creative effects in Premiere, but at generation time. The style prompt is your creative brief to the colorist. Effect type is like choosing a specific filter. Intensity is the opacity slider — 0 means no effect, 1 means full strength. Photo shot presets are like telling a camera operator “give me a close-up” versus “wide establishing shot.”
Recommended settings
- Style prompt: Be specific — “warm cinematic with shallow depth of field” works better than “nice looking”
- Intensity (0.5–0.7): Good starting point — strong enough to see the effect, subtle enough to look natural (see the sketch after this list)
- Effect type: Match to your project — sketch effects for storyboard work, blur for dream sequences, pixelate for retro aesthetics
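If intensity behaves like an opacity slider, one plausible reading (an assumption for intuition, not a documented formula) is a linear blend between the untouched and fully stylized pixel values:

```python
# Linear-blend reading of the intensity slider. This is an assumption;
# individual models may apply intensity nonlinearly or at another stage.
def apply_intensity(original: float, stylized: float, intensity: float) -> float:
    return (1 - intensity) * original + intensity * stylized

apply_intensity(0.2, 0.8, 0.5)  # 0.5: halfway between the two looks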
Common mistakes
- Setting intensity to 1.0 on every generation — full-strength effects often look heavy-handed and artificial
- Combining a strong style prompt with a conflicting effect type — the model gets mixed signals and produces inconsistent results
Also called
style_prompt, style_description, target_style, target_texture, effect_type, intensity, pikaffect, photo_shot
Virtual try-on
In short
AI clothing fitting — places garments onto people in images.
What it does
Takes a photo of a person and a photo of a garment, then generates a realistic composite showing the person wearing that clothing. You specify the garment category (upper body, lower body, full body), the type of garment photo you’re providing, and whether the model should auto-segment the garment or use the full image.
How to think about it
Like a digital fitting room for e-commerce or costume design. Instead of physically trying on clothes, the AI composites the garment onto the person while accounting for body shape, pose, and lighting. The garment image is like a swatch — the AI adapts it to fit the person’s body in the photo.
Recommended settings
- Category: Match to your garment — upper body for shirts and jackets, lower body for pants and skirts, full body for dresses and jumpsuits (a request sketch follows this list)
- Garment photo type (flat lay): Best results — a clean, unobstructed view of the garment gives the AI the most information
- Segmentation free off (default): Let the model segment the garment for more precise fitting around edges
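Put together, a hypothetical try-on request might look like this; the names follow the aliases under "Also called," and the accepted values vary by model. The file names are placeholders.

```python
# Hypothetical try-on request; names follow the aliases below.
tryon = {
    "person_image": "talent_photo.png",    # placeholder file names
    "garment_image": "jacket_flatlay.png",
    "category": "upper_body",              # shirts and jackets
    "garment_photo_type": "flat_lay",      # clean, unobstructed garment view
    "segmentation_free": False,            # let the model segment the garment
}
```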
Common mistakes
- Using a garment photo with a busy background — the model may pick up background elements as part of the clothing
- Choosing the wrong category — fitting a full-body dress as “upper body” cuts off the bottom half of the garment
Also called
category, cloth_type, garment_type, garment_photo_type, segmentation_free
Vectorization
In short
Converts raster images to clean SVG vector paths for logos and illustrations.
What it does
Traces the edges and color regions of a pixel image and converts them into scalable vector paths (SVG format). Controls include path accuracy, corner detection, noise filtering, and layer organization. The result is a resolution-independent file that scales to any size without pixelation.
How to think about it
Like the Image Trace function in Adobe Illustrator. Path precision is your fidelity slider — higher values follow every pixel edge, lower values smooth and simplify. Filter speckle is like a minimum area threshold — it removes tiny noise artifacts that would become unnecessary paths. The output is meant for print, web, or motion graphics where you need infinitely scalable artwork.
Recommended settings
- Path precision (high): Use for detailed illustrations where accuracy matters — logos, technical drawings (a settings sketch follows this list)
- Path precision (low): Use for stylized, simplified results — poster art, icons
- Filter speckle (3–5): Removes small noise without losing intentional detail
- Snap grid on: Produces cleaner geometry — good for UI icons and geometric designs
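A hypothetical settings dict for a clean logo trace could look like the following; the names follow the aliases under "Also called," but scales and units are model-specific, so the values are illustrative only.

```python
# Hypothetical vectorization settings for a logo trace; values illustrative.
vectorize = {
    "path_precision": 8,     # higher follows edges more faithfully (scale varies)
    "filter_speckle": 4,     # drop tiny noise regions before they become paths
    "corner_threshold": 60,  # how aggressively corners are kept sharp vs. smoothed
    "snap_grid": True,       # align points for cleaner icon geometry
}
```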
Common mistakes
- Vectorizing a photograph and expecting clean results — vectorization works best on images with clear edges and flat color regions, not continuous-tone photos
- Setting path precision too high on a noisy source image — every speckle becomes a vector path, creating enormous files
Also called
path_precision, corner_threshold, filter_speckle, splice_threshold, snap_grid, cleanup_jaggy, cleanup_morph, layer_difference, hierarchical
Sampling advanced
In short
Low-level controls for the AI diffusion process — expert-only, leave at defaults unless troubleshooting.
What it does
Fine-tunes the mathematical noise schedule and guidance behavior during generation. These parameters control how the model distributes processing effort across steps, when guidance influence starts and stops, and how statistical averaging is applied across multiple samples. Changing these affects the fundamental character of the generation process.
How to think about it
Like tweaking render engine internals in After Effects or Nuke — these are the parameters that the engineers set, not the artists. Schedule_mu shapes the noise curve (similar to adjusting gamma on a levels control, but for the generation process). The cfg turn-off point is like removing your reference monitor partway through a grade and trusting your eye for the final touches.
Recommended settings
- Leave at defaults: These are tuned per model by the developers — changing them without understanding the math usually makes things worse
- Schedule_mu: Only adjust if you see banding or sudden quality changes partway through generation
- Turn_off_cfg_start_si: Advanced trick — removing guidance in late steps can produce more natural detail, but results vary by model
Common mistakes
- Changing multiple sampling parameters at once — if results improve or degrade, you won’t know which parameter caused it
- Copying advanced sampling settings between different models — these values are model-specific and rarely transfer well
Also called
schedule_mu, perturbation, last_scale_temp, t_min, t_max, smooth_start_si, turn_off_cfg_start_si, n_avg, n_min, n_max
Motion and physics
In short
Controls physical motion simulation, trajectories, and motion intensity in video generation.
What it does
Governs how objects and cameras move through space in generated video. Includes overall motion intensity scoring, physics simulation forces (gravity, projectile arcs), predefined camera or object trajectories, subject tracking, adaptive motion that responds to content, and shape preservation that prevents objects from warping during movement.
How to think about it
Like combining a motion-control rig with a physics simulation in After Effects. Motion score is your overall energy dial — low for a locked-off interview, high for an action sequence. Trajectories are like preset camera moves on a dolly or crane. Shape preservation is like enabling “Preserve Rigid Bodies” in a physics sim — it stops solid objects from bending like rubber.
Recommended settings
- Motion score (low): Calm, controlled movement — good for product shots, portraits, and talking heads (two contrasting sketches follow this list)
- Motion score (high): Dynamic, energetic — good for action, sports, and music video content
- Shape preservation (high): Use when objects must maintain their form — architecture, vehicles, rigid products
- Adapt motion on: Let the model decide motion intensity based on the content — good default for mixed scenes
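Two hypothetical setups make the contrast concrete; the scales are illustrative, so check each model's documented range for motion_score before copying values.

```python
# Hypothetical motion setups; names follow the aliases below, and the
# numeric scales are illustrative rather than documented ranges.
product_shot = {"motion_score": 2, "adapt_motion": False, "shape_preservation": 0.9}
action_beat  = {"motion_score": 8, "adapt_motion": True,  "shape_preservation": 0.9}
# shape_preservation stays high in both: rigid objects should hold their
# form whether the scene is calm or energetic.
```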
Common mistakes
- Setting motion score high on content with fine detail (text, faces, architecture) — high motion warps and distorts these elements
- Using physics forces without understanding that they simulate real-world physics — projectile force doesn’t mean “dramatic movement,” it means parabolic arcs with gravity
Also called
motion_score, goal_force, projectile_force, trajectory, trajectories, track, adapt_motion, shape_preservation
Music production
In short
Controls for AI music generation, stem separation, and audio editing.
What it does
Configures AI music generation: tempo (BPM), genre tags, song structure, target duration, stem separation (isolating vocals, drums, bass), instrumental-only mode, and audio editing operations like extending, trimming, or remixing existing tracks.
How to think about it
Like setting up a project in a DAW (GarageBand, Logic, Pro Tools). BPM is your session tempo — match it to your timeline’s beat markers for sync. Genres are like selecting instrument presets and style templates. Stems are like soloing individual tracks in a mix. Composition plan is your song’s arrangement chart — intro, verse, chorus, bridge, outro.
Recommended settings
- BPM: Match your timeline’s tempo — 120 BPM is standard pop/dance, 80–90 for hip-hop, 60–70 for ballads (the bar-to-seconds arithmetic follows this list)
- Duration: Set to match your timeline gap exactly — AI music that’s too short or too long means extra editing
- Stems (vocals): Extract to create karaoke versions or isolate dialogue from music beds
- Force instrumental on: Use for background music and underscore where vocals would compete with dialogue
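To size a request to a timeline gap, the bar-to-seconds arithmetic is simple; this is plain Python, with no modelBridge API involved.

```python
# Convert musical bars to seconds so the requested duration matches a gap.
def bars_to_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    return bars * beats_per_bar * 60 / bpm

bars_to_seconds(8, 120)  # 16.0: an 8-bar phrase at 120 BPM fills 16 seconds
bars_to_seconds(8, 90)   # ~21.3: the same phrase at 90 BPM runs longer
```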
Common mistakes
- Not setting BPM when your timeline has beat-synced edits — the generated music drifts out of sync with your cuts
- Using extend_duration on music with a clear ending — the AI continues past the natural conclusion, creating an awkward loop
Also called
bpm, genres, composition_plan, music_duration, music_length_ms, stems, instrumental, force_instrumental, edit_mode, extend_duration
Detection and analysis
In short
AI-powered object detection, image segmentation, and visual analysis parameters.
What it does
Configures models that identify and label elements in images rather than generating new content. You specify what to look for (detection prompt or object class), the type of analysis (detection, segmentation, classification, captioning), analysis density and confidence thresholds, and whether to overlay results visually on the output.
How to think about it
Like using Premiere’s auto-tagging or After Effects’ Roto Brush in analysis mode — the AI examines your image and reports back what it found, where things are, and how confident it is. Points per side is like the resolution of the analysis grid — more points means finer segmentation but slower processing. Confidence thresholds are like setting a minimum match quality for auto-keying.
Recommended settings
- Detection prompt: Be specific about what you want found — “red car” works better than “vehicle”
- Confidence threshold (0.5–0.7): Good balance between catching real objects and filtering false positives (a filtering sketch follows this list)
- Points per side (16–32): Standard analysis density — increase for precise mask edges, decrease for speed
- Show visualization on: Useful for verifying detections before using the data downstream
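Here is an illustrative post-filter of detections by confidence. The result structure is hypothetical, since actual response schemas vary by model, but the thresholding logic is the general idea.

```python
# Illustrative confidence filtering; the detection dicts are hypothetical.
detections = [
    {"label": "person", "confidence": 0.91, "box": [120, 40, 380, 620]},
    {"label": "person", "confidence": 0.42, "box": [700, 55, 880, 590]},
    {"label": "hat",    "confidence": 0.77, "box": [150, 20, 300, 130]},
]
threshold = 0.6
kept = [d for d in detections if d["confidence"] >= threshold]
# keeps the 0.91 person and the 0.77 hat; drops the 0.42 false positive
```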
Common mistakes
- Setting the confidence threshold too low and getting dozens of false-positive detections cluttering the output
- Running detailed analysis on every frame of a video when you only need a few keyframes — analysis is per-frame, and costs add up
Also called
detection_prompt, object, object_name, task_type, points_per_side, pred_iou_thresh, stability_score_thresh, show_visualization, detailed_analysis
Miscellaneous
In short
One-off parameters that appear on individual models and don’t fit other categories.
What it does
Covers model-specific settings that are unique to one or a handful of models. These parameters don’t have enough commonality across models to warrant their own section, but they still affect the output in meaningful ways.
How to think about it
Like custom effect controls on a third-party Premiere plugin — each plugin has its own unique settings that don’t map to any standard control. The parameter name usually hints at what it does, and the default value is almost always a safe starting point.
Recommended settings
- Start with defaults: Unfamiliar parameters almost always have sensible defaults — change one at a time and compare results
- Check the help icon: Click the ⓘ icon in modelBridge for context on any unfamiliar parameter
- Small changes first: Adjust by 10–20% from the default, compare, then adjust further if needed
Common mistakes
- Changing multiple unfamiliar parameters at once — if results change, you won’t know which setting was responsible
- Ignoring model-specific parameters entirely — they exist because they meaningfully affect the output for that model