Parameter reference
This reference covers every parameter you’ll encounter in modelBridge — organized by category. Each section explains what the parameter does, how different values affect your output, and what settings to start with.
You can also access this information directly in the plugin: click the ⓘ icon next to any input field for a quick explanation and a direct link back to the relevant section here.
With over 700 curated explanations across 1,000+ models and over 100 dedicated parameter sections, this is the most comprehensive AI parameter reference built specifically for video editors and motion designers.
Parameters are grouped by theme. Use the sidebar or your browser’s find (Ctrl/Cmd+F) to jump to any parameter.
Prompt & guidance
Guidance scale
In short
Controls how literally the AI follows your prompt — too high causes artifacts.
What it does
Sets how closely the generated output matches your text description. Higher values force stricter adherence to the prompt.
How to think about it
Like directing an actor — low values give creative freedom, high values make them follow the script word-for-word. Too much direction and they become stiff.
Recommended settings
- Low (1–3): Loose, creative interpretation — good for abstract or experimental work
- Default (3–7): Balanced — sweet spot for most models and use cases
- High (10+): Very rigid, often oversaturated and artifact-prone
Common mistakes
- Cranking it to 15+ thinking “more accurate = better” — quality usually drops beyond 7
- Using the same value across different model families — Flux works best at 3–5, SDXL at 7–10
Also called
cfg_scale, cfg, guidance, text_guidance_scale
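In an API-style request this is a single numeric field. A minimal sketch, assuming a JSON-like payload (the request shape and values are illustrative, not modelBridge’s actual API):

```python
# Illustrative payloads only: "guidance_scale" follows the aliases listed above,
# and the per-family starting values come from this section's guidance.
flux_request = {
    "prompt": "aerial shot of a coastline at golden hour",
    "guidance_scale": 4.0,   # Flux-family sweet spot: 3-5
}
sdxl_request = {
    "prompt": "aerial shot of a coastline at golden hour",
    "guidance_scale": 8.0,   # SDXL-family sweet spot: 7-10
}
```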
Negative prompt
In short
Tells the AI what to avoid in the output.
What it does
Defines concepts, qualities, or artifacts the model should steer away from during generation. The opposite of your main prompt.
How to think about it
Like telling a colorist “no crushed blacks, no blown highlights” — you’re defining boundaries, not directions.
Recommended settings
- Standard starting point: blurry, watermark, text, low quality, distorted
- For video: Add flickering, jittery, frame drops
- For faces: Add extra fingers, deformed face, cross-eyed
Common mistakes
- Writing full sentences — comma-separated keywords work best
- Overloading with 50+ terms — the model loses focus after ~20 keywords
Also called
negative_prompt
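A quick sketch of how the recommended keyword groups combine in practice, assuming a comma-separated negative_prompt field (the payload shape is illustrative):

```python
# Build the negative prompt from the keyword groups recommended above.
base = ["blurry", "watermark", "text", "low quality", "distorted"]
face_extras = ["extra fingers", "deformed face", "cross-eyed"]

request = {
    "prompt": "close-up portrait, soft window light",
    # Comma-separated keywords, not full sentences:
    "negative_prompt": ", ".join(base + face_extras),
}
```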
Prompt expansion
In short
Lets the model auto-enhance your prompt before generating.
What it does
The AI automatically elaborates on your prompt, adding style cues, lighting descriptions, and quality keywords it thinks will improve the result.
How to think about it
Like auto-correct for creative briefs — the AI fills in gaps you didn’t specify. Helpful when your prompt is short, counterproductive when it’s already detailed.
Recommended settings
- Off: Better when your prompt is already detailed and specific — expansion can override your intent
- On: Good when your prompt is short or vague — the model fills in gaps
- When to use: Short prompts, quick iterations, exploring styles
Common mistakes
- Leaving it on with a carefully crafted prompt, then wondering why the output doesn’t match
- Not checking the model’s expanded version when results surprise you
Also called
enable_prompt_expansion, expand_prompt, enhance_prompt, prompt_optimizer
Thinking type
In short
Extra processing time for prompt optimization before generating.
What it does
Controls whether the model spends additional time analyzing and optimizing your prompt before starting the generation process. Takes longer and costs more.
How to think about it
Like a pre-production meeting — the model “thinks” about the best approach before calling action. More prep doesn’t always mean a better take.
Recommended settings
- Off: Faster, cheaper — good for iterating quickly
- On: Slower, costs more — try it when default results disappoint
- When to use: Complex scenes, multi-subject compositions, when defaults fall short
Common mistakes
- Leaving it enabled for every generation — the quality improvement is inconsistent
- Assuming “thinking = better” without A/B testing against non-thinking results
Also called
thinking_type
Generation quality
Inference steps
In short
Number of processing passes — more steps means more refined output.
What it does
Sets how many times the AI refines the output. Each step adds detail and coherence, but with diminishing returns past a threshold.
How to think about it
Like render quality in After Effects — more passes means more refined, but past a point you’re wasting render time for invisible improvement.
Recommended settings
- Low (5–10): Fast but rough — good for quick previews and iteration
- Default (20–30): Best quality-to-speed ratio — start here
- High (50+): Marginally better, 2–3x slower, rarely worth it
Common mistakes
- Setting steps to 80+ thinking “more = better” — quality plateaus around 30 for most models
- Not adjusting steps when switching models — some models are optimized for fewer steps
Also called
num_inference_steps, steps, num_steps, number_of_steps
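A common pattern is to iterate at a low step count and only raise it for the final render. A minimal sketch, with an assumed payload shape:

```python
# Preview fast, then re-render near the quality plateau for the final.
preview_request = {"prompt": "neon-lit alley in the rain", "num_inference_steps": 8}
final_request = dict(preview_request, num_inference_steps=28)  # plateau is ~30 for most models
```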
Seed
In short
Locks randomness — same seed plus same settings equals same result.
What it does
A number that controls the random starting point of the generation. Using the same seed with identical settings reproduces the exact same output.
How to think about it
Like a take number on set — if Take 7 was great, you can ask for Take 7 again and get the exact same performance.
Recommended settings
- -1 or 0: Random seed (default) — every generation is different
- Any specific number: Locks the output — use when you found a result you like
- When to use: Lock the seed to iterate on prompt/settings while keeping composition stable
Common mistakes
- Not saving the seed when you get a good result — note it before changing settings
- Expecting the same seed to produce identical results across different models — it won’t
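A sketch of the lock-the-seed workflow described above, assuming the request accepts a seed field:

```python
import random

# Explore with random seeds, then pin the one that worked.
request = {"prompt": "misty forest road at dawn", "seed": random.randint(1, 2**31 - 1)}
good_seed = request["seed"]  # note it before touching anything else

# Same seed, refined prompt: composition stays stable while details change.
request = {"prompt": "misty forest road at dawn, volumetric light", "seed": good_seed}
```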
Strength
In short
How much the input changes — 0 keeps it, 1 replaces it entirely.
What it does
Controls the degree of transformation applied to your input image or video. At 0 the output is identical to input, at 1 the input is completely ignored.
How to think about it
Like opacity on an adjustment layer — low values make subtle tweaks to your source, high values ignore it and start fresh.
Recommended settings
- Low (0.1–0.3): Subtle refinement — keeps your original composition intact
- Default (0.4–0.6): Balanced transformation — changes style while preserving structure
- High (0.7–1.0): Major changes — your input becomes a rough suggestion, not a guide
Common mistakes
- Setting it to 1.0 for image-to-image and wondering why output ignores your source — that’s by design
- Using the same strength across different model types — video models often need lower values than image models
Also called
image_strength
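A sketch of the three ranges applied to image-to-image, with an assumed payload shape and a placeholder input path:

```python
# Same source frame, three levels of transformation (values from the ranges above).
subtle_cleanup = {"input_image": "frame_0042.png", "strength": 0.2}  # keeps composition intact
restyle = {"input_image": "frame_0042.png", "strength": 0.5}         # new style, same structure
near_rewrite = {"input_image": "frame_0042.png", "strength": 0.9}    # source becomes a rough hint
```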
End image strength
In short
How strongly the end frame pulls a video transition toward it.
What it does
Controls how firmly the final frame of a video transition or interpolation matches your provided end image. Higher values ensure a precise landing.
How to think about it
Like easing in an animation — higher values pull the motion firmly toward the final keyframe, lower values let it drift and arrive naturally.
Recommended settings
- Low (0.3–0.5): Gentle arrival at the end frame — more creative, organic motion
- Default (0.6–0.8): Balanced — recognizable end frame with natural transition
- High (0.8–1.0): Precise landing — the last frame closely matches your end image
Common mistakes
- Setting it too low when you need an exact match — the transition will “miss” the target
- Not using it at all for interpolation — the end frame may not resemble your input
Also called
end_image_strength
Scheduler
In short
The algorithm that controls how the AI refines output step by step.
What it does
Selects the mathematical method used to progressively denoise and refine the image or video during generation. Different schedulers produce slightly different quality/speed tradeoffs.
How to think about it
Like choosing a render engine — each produces slightly different results. Most users never need to change this from the default.
Recommended settings
- Euler / Euler A: Fast, good general purpose — common default
- DPM++ 2M: High quality, slightly slower — good for detailed images
- DDIM: Deterministic — same seed gives truly identical results, good for consistency
Common mistakes
- Switching schedulers to “fix” a bad generation — the problem is almost always prompt or guidance scale
- Spending time comparing schedulers before optimizing more impactful settings first
Also called
sampler
Acceleration
In short
Speed-versus-quality tradeoff — faster generation, lower quality.
What it does
Reduces generation time by taking computational shortcuts. Higher acceleration means faster output but with potential quality loss.
How to think about it
Like proxy editing in Premiere — faster to work with, but you sacrifice some quality. Use proxies for iteration, full-res for finals.
Recommended settings
- Low/None: Full quality — use for final renders
- Default (Medium): Faster with minimal quality loss — good for previewing
- High/Turbo: Fastest — noticeable quality reduction, use for rapid iteration only
Common mistakes
- Leaving acceleration on high for final output — always switch back for the version the client sees
- Not realizing some models label this differently (turbo, fast, lightning)
Temperature
In short
Randomness in audio/speech — higher means more varied, less predictable.
What it does
Controls how much variation the model introduces in text-to-speech and audio generation. Low values produce consistent, predictable output; high values introduce more natural variation.
How to think about it
Like an actor’s improvisation dial — low temperature reads the script exactly, high temperature ad-libs and adds personality.
Recommended settings
- Low (0.3–0.7): Predictable, consistent — good for narration and voiceovers
- Default (0.8–1.0): Natural variation — sounds more human
- High (1.2+): Unpredictable — may produce interesting results or garbled output
Common mistakes
- Setting it above 1.5 for speech — output often becomes incoherent
- Confusing this with guidance scale — temperature affects randomness, guidance affects prompt adherence
Top P
In short
Limits AI choices to the most probable options — lower means more focused.
What it does
Restricts the model’s selection pool to tokens whose cumulative probability reaches P. Lower values make output more conservative and predictable.
How to think about it
Like restricting an editor to their top 10 B-roll picks instead of the full library — fewer choices, but each one is strong.
Recommended settings
- Low (0.5–0.7): Focused, conservative output
- Default (0.9–1.0): Full creative range
- When to use: Lower it for consistent narration, raise it for creative/experimental audio
Common mistakes
- Setting it very low (0.3) and getting repetitive, monotonous audio — the model needs some freedom
- Adjusting Top P and Top K simultaneously without testing each independently
Top K
In short
Number of top candidates considered at each generation step.
What it does
At each step, limits the model to choosing from only the K most likely next tokens. Lower K means less variety but more coherence.
How to think about it
Like a shortlist for casting — instead of auditioning every actor in town, you only see the top K candidates. Smaller shortlist, more predictable result.
Recommended settings
- Low (10–30): Very focused — can sound robotic in speech
- Default (50–250): Good balance of variety and coherence
- High (500+): Maximum variety — may reduce quality
Common mistakes
- Setting both Top K and Top P very low simultaneously — the model gets so restricted it produces flat, repetitive output
- Changing Top K without understanding that it interacts with Temperature and Top P
Repetition penalty
In short
Penalizes repeated sounds or words in audio/speech output.
What it does
Applies a penalty score when the model tries to repeat the same tokens, words, or patterns. Forces more variety in the output.
How to think about it
Like telling an editor “don’t use the same transition twice in a row” — it forces variety, but too strict and the choices become awkward.
Recommended settings
- Low (1.0): No penalty (default) — some repetition is natural
- Default (1.2–1.3): Mild penalty — reduces obvious repetition
- High (1.5+): Strong penalty — may cause unnatural word choices
Common mistakes
- Setting it too high for speech — the model starts avoiding common words, making sentences awkward
- Using it for music generation where repetition (chorus, rhythm) is intentional
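Temperature, Top P, Top K, and repetition penalty interact, so it helps to think of them as one sampling config. An illustrative narration preset built from the recommendations above (the dict shape is an assumption; the names mirror these sections):

```python
# Conservative text-to-speech sampling for voiceover work.
narration_config = {
    "temperature": 0.6,         # low range: consistent, predictable delivery
    "top_p": 0.9,               # default range: keeps the candidate pool sane
    "top_k": 100,               # mid-range shortlist; avoid pairing very low top_k with very low top_p
    "repetition_penalty": 1.2,  # mild: trims obvious repeats without awkward word avoidance
}
```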
LoRA & style
LoRA scale
In short
How strongly a LoRA style add-on influences the output.
What it does
Controls the blending weight of a LoRA (Low-Rank Adaptation) — a small style or concept model layered on top of the base model. At 0 it has no effect, at 1.0 it’s full strength.
How to think about it
Like the opacity of a LUT in Premiere — at 0% no effect, at 100% full strength. But unlike a LUT, going above 100% often causes visual artifacts.
Recommended settings
- Low (0.3–0.5): Subtle influence — good for blending styles
- Default (0.7): Strong but clean — best starting point
- High (1.0+): Maximum effect — often introduces artifacts, distortion, or pattern repetition
Common mistakes
- Setting it to 1.0+ thinking “full strength = best result” — most LoRAs work best at 0.7–0.8
- Stacking multiple LoRAs at high scale — effects compound and quality drops fast
Also called
lora_scale
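A sketch of how a LoRA and its weight might travel together in a request; the list-of-dicts shape and the file name are assumptions:

```python
request = {
    "prompt": "product shot on marble, ink-wash style",
    "loras": [
        # Start at 0.7: most LoRAs degrade above 0.8, per the guidance above.
        {"path": "ink_wash_style.safetensors", "scale": 0.7},
    ],
}
```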
Camera LoRA scale
In short
Intensity of camera movement from a camera motion LoRA.
What it does
Controls how dramatic the camera movement is when using a camera LoRA (zoom, pan, tilt, orbit). Higher values create more pronounced motion.
How to think about it
Like keyframe velocity in Premiere — low values give gentle, subtle camera drift; high values give dramatic sweeping moves.
Recommended settings
- Low (0.3–0.5): Subtle camera drift — good for adding life to static shots
- Default (0.7–0.8): Noticeable but natural camera movement
- High (1.0+): Dramatic motion — can look unnatural if overdone
Common mistakes
- Combining high camera LoRA scale with a fast-moving subject — double motion creates disorienting output
- Using camera LoRA on very short clips where the motion has no time to develop
Also called
camera_lora_scale
Embeddings
In short
Custom-trained files that teach the model new visual concepts or styles.
What it does
Loads externally trained style or concept data into the model — specific characters, objects, or art styles packaged as reusable files. Works alongside your prompt.
How to think about it
Like custom presets in Premiere — someone trained the AI to recognize a specific concept (a character’s face, a brand’s visual style) and packaged it as a reusable file.
Recommended settings
- Single embedding: Best results — load one and reference its trigger word in your prompt
- Multiple embeddings: Quality drops with more than 2–3 combined
- When to use: Character consistency, brand style enforcement, specific art styles
Common mistakes
- Using an embedding without including its trigger word in the prompt — the model loads the style but doesn’t know when to apply it
- Combining too many embeddings — effects conflict and quality degrades
Also called
embeddings
Style
In short
Selects a predefined visual aesthetic from the model’s built-in options.
What it does
Applies a preset visual style to the generation. Available styles vary by model — each has been tuned to produce specific aesthetics.
How to think about it
Like choosing a LUT package — each style preset applies a consistent look across your output. Pick one that matches your project’s mood.
Recommended settings
- Browse the dropdown: Names describe the look — “cinematic,” “anime,” “photorealistic”
- Match style to prompt: A cinematic style with a cartoon prompt creates interesting but unpredictable blends
- When to use: When you want a consistent aesthetic without crafting a complex prompt
Common mistakes
- Fighting the style with your prompt — selecting “anime” but prompting for “photorealistic” produces inconsistent results
- Assuming all models support styles — many don’t have this option
Also called
style
Video & animation
Section titled “Video & animation”In short
Section titled “In short”Frames per second of the generated video — match your timeline.
What it does
Section titled “What it does”Sets the frame rate of the generated video output. This determines motion smoothness and should match your Premiere timeline settings.
How to think about it
Section titled “How to think about it”Exactly like FPS in Premiere — it controls how smooth the motion looks. Mismatched frame rates between generation and timeline create artifacts.
Recommended settings
Section titled “Recommended settings”- 24 fps: Film look — standard for cinematic content
- 30 fps: Smooth motion — standard for web and social media
- 60 fps: Very smooth — good for slow-motion, not all models support this
Common mistakes
Section titled “Common mistakes”- Generating at a different FPS than your Premiere timeline — 24fps in a 30fps timeline creates awkward frame blending
- Choosing 60fps when the model doesn’t support it — output may default to a lower rate silently
Also called
Section titled “Also called”frame_rate, frames_per_second
Interpolated frames
In short
AI-generated frames inserted between keyframes for smoother motion.
What it does
Creates new in-between frames that didn’t exist in the original, smoothing out motion. More interpolated frames means smoother playback but longer generation time.
How to think about it
Like Premiere’s Optical Flow — the AI synthesizes new frames between existing ones. Each interpolated frame is a full AI generation, so more frames means proportionally more work.
Recommended settings
- Low (2–3): Slight smoothing — fast to generate
- Default (4–6): Noticeably smoother — good balance
- High (8+): Very smooth, approaching slow-motion — much slower to generate
Common mistakes
- Setting it very high and expecting instant results — 8 frames between every pair means 8x the generation work
- Using interpolation on footage that’s already smooth — adds processing time with no visible improvement
Also called
num_interpolated_frames
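The cost scales with the number of frame pairs, which is worth working out before committing. A small worked example:

```python
def interpolated_total(source_frames: int, per_pair: int) -> int:
    """Frames after inserting per_pair synthesized frames between each consecutive pair."""
    return source_frames + (source_frames - 1) * per_pair

# 25 source frames with 4 in-betweens per pair -> 121 frames, i.e. 96 extra generations.
print(interpolated_total(25, 4))
```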
Interpolator model
In short
Which AI model creates the in-between frames during interpolation.
What it does
Selects the specific algorithm used to generate interpolated frames. Different models handle different types of motion better.
How to think about it
Like choosing between Optical Flow and Frame Blending in Premiere — different algorithms, different motion quality. Some handle fast motion better, others are smoother with slow movement.
Recommended settings
- Default: Use unless you see artifacts — the default is chosen for broad compatibility
- Alternative models: Try if you see ghosting or blurry motion in fast-moving scenes
- None: Skips interpolation entirely
Common mistakes
- Switching interpolator models to fix blurry output that’s actually caused by too few inference steps or low resolution
- Trying every interpolator before addressing more impactful settings like strength or steps
Also called
interpolator_model
Temporal style consistency
In short
How consistent the visual style stays between frames in a video.
What it does
Enforces that every frame maintains the same visual style, preventing frame-to-frame style drift or flicker in generated video.
How to think about it
Like color consistency across a multi-day shoot — higher values enforce that every frame looks like it belongs to the same visual world.
Recommended settings
- Low (0.0–0.3): Each frame can drift stylistically — artistic but potentially flickery
- Default (0.5–0.7): Consistent look with natural variation
- High (0.8–1.0): Very uniform — can look static if overdone
Common mistakes
- Setting it to 0 and getting distracting style flicker every few frames
- Setting it to 1.0 and getting output that looks frozen or lacks natural motion variation
Also called
temporal_adain_factor
Image quality
Upscale factor
In short
Multiplies output resolution with AI-enhanced detail — 2x or 4x.
What it does
Scales the output image by the given factor while using AI to add detail that wasn’t in the original. A 2x factor turns 512x512 into 1024x1024.
How to think about it
Like Premiere’s “Scale to Frame Size” but with AI enhancement — it doesn’t just stretch pixels, it synthesizes new detail.
Recommended settings
- 2x: Standard upscale — fast, reliable, good for most uses
- 4x: Maximum detail — takes longer, costs more, use for hero shots or print
- When to use: When your generation resolution is lower than your delivery format requires
Common mistakes
- Upscaling an already-large image by 4x — 2048x2048 at 4x creates 8192x8192 with diminishing returns
- Expecting upscaling to fix a bad generation — it enhances detail, it doesn’t fix composition
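The pixel-count math behind the 4x warning above, as a small worked example:

```python
def upscaled(width: int, height: int, factor: int) -> tuple[int, int]:
    return width * factor, height * factor

print(upscaled(512, 512, 2))     # (1024, 1024): the standard 2x case
w, h = upscaled(2048, 2048, 4)   # (8192, 8192)
print(w * h)                     # 67,108,864 pixels: mostly diminishing returns
```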
Second-stage guidance
In short
Guidance scale for the refinement pass after initial generation.
What it does
A separate guidance scale applied during a second processing pass. Controls how aggressively the refinement follows your prompt after the initial generation establishes the base.
How to think about it
Like a second round of color grading — the first pass establishes the look, the second fine-tunes it. This controls how much the second pass changes.
Recommended settings
- Low (1–3): Light refinement — preserves what the first pass created
- Default (3–7): Balanced — same principles as main guidance scale
- High (7+): Aggressive refinement — sharpens details but risks artifacts
Common mistakes
- Setting it much higher than first-stage guidance — creates an inconsistent look with a soft base and harsh refinement
- Ignoring it entirely — the default is usually fine, but tuning it can noticeably improve detail
Also called
guidance_scale_2
Guidance rescale
In short
Reduces color oversaturation caused by high guidance scale values.
What it does
Applies a correction factor that pulls back the color saturation boost that high guidance scale values introduce. Keeps colors natural when guidance is cranked up.
How to think about it
Like a saturation limiter on a color grade — when guidance pushes colors too hard, rescale pulls them back to natural levels.
Recommended settings
- Low (0.0): No rescaling — colors may oversaturate at high guidance
- Default (0.3–0.5): Mild correction — good when using guidance above 7
- High (0.7): Strong correction — useful for very high guidance values
Common mistakes
- Using it when guidance scale is already low (3–5) — rescaling at low guidance washes out colors unnecessarily
- Not using it when guidance is above 10 — you’re likely getting oversaturated output
Also called
guidance_rescale
Tone map compression
In short
Controls dynamic range — lower is punchy, higher is flat with more detail.
What it does
Adjusts the dynamic range compression of the output, controlling how highlights and shadows are balanced relative to midtones.
How to think about it
Like the tone curve in Lumetri Color — low compression keeps punchy contrast, high compression flattens the range for more recoverable detail in extremes.
Recommended settings
- Low (1.0): High contrast — punchy, cinematic look
- Default (1.5–2.0): Balanced — good all-around
- High (3.0+): Flat, compressed — more detail in highlights/shadows but can look dull
Common mistakes
- Cranking compression high then adding contrast back in post — you’re losing quality in both conversions
- Not considering your delivery format — flat output needs a grade, punchy output is closer to final
Also called
tone_map_compression_ratio
First pass steps
In short
Processing steps for the initial generation pass in multi-pass pipelines.
What it does
Sets how many refinement steps run during the first generation pass, which establishes composition, shapes, and major details before the refinement pass.
How to think about it
Like a rough cut — the first pass gets the structure right. More steps here means a stronger foundation for the refinement pass to build on.
Recommended settings
- Low (10–15): Quick rough pass — sufficient when the refinement pass is strong
- Default (15–25): Good balance — solid foundation without over-investing
- High (30+): Very refined first pass — diminishing returns, especially if second pass is also high
Common mistakes
- Over-investing steps in the first pass at the expense of the second — balance both for best results
- Using single-pass step counts (30+) for the first pass — multi-pass pipelines need fewer per stage
Also called
first_pass_num_inference_steps
Second pass steps
In short
Processing steps for the refinement pass that adds detail and sharpness.
What it does
Sets how many steps run during the second refinement pass, which polishes textures, sharpens details, and brings the output to final quality.
How to think about it
Like the fine cut and color grade — this is where details get polished and the output reaches final quality. It builds on existing work, so fewer steps go further.
Recommended settings
- Low (5–10): Light polish — fast, preserves first pass character
- Default (10–20): Good refinement — adds meaningful detail
- High (25+): Heavy refinement — diminishing returns kick in fast since it’s enhancing, not creating
Common mistakes
- Setting it to 50+ expecting dramatic improvement — past ~25 steps, you’re paying for invisible changes
- Setting it higher than first pass steps — refinement builds on existing work and needs fewer steps
Also called
second_pass_num_inference_steps
Second pass skip steps
In short
Skips early refinement steps to preserve first-pass structure.
What it does
Tells the refinement pass to skip its first N steps, preserving the composition and structure from the first pass while only adding fine detail in later steps.
How to think about it
Like starting your color grade at a later node — you skip the broad strokes (which the first pass already handled) and go straight to fine adjustments.
Recommended settings
- Low (3–5): Refinement can still make structural changes — more creative freedom
- Default (8–12): Keeps composition locked, focuses on texture and detail
- High (15+): Minimal refinement — almost no change from first pass
Common mistakes
- Setting skip steps too high and wondering why the second pass doesn’t seem to do anything — if you skip most steps, there’s nothing left to refine
- Setting it to 0 and getting structural changes you didn’t want — skip a few to lock composition
Also called
second_pass_skip_initial_steps
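Taken together, the three two-pass parameters form one configuration. An illustrative balance built from the defaults above (the parameter names come from the “Also called” entries; the dict shape is an assumption):

```python
two_pass = {
    "first_pass_num_inference_steps": 20,   # establishes composition and structure
    "second_pass_num_inference_steps": 15,  # refinement: fewer steps than the first pass
    "second_pass_skip_initial_steps": 10,   # locks composition, refines texture only
}
```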
Audio strength
In short
How much audio input influences the video generation.
What it does
Controls the degree to which an audio signal drives the visual output. Higher values make the video react more strongly to the audio’s rhythm, beat, and energy.
How to think about it
Like audio-driven keyframes in After Effects — higher values make the video pulse with the beat, lower values keep visuals independent of the soundtrack.
Recommended settings
- Low (0.3–0.5): Subtle audio influence — visual changes are gentle
- Default (0.6–0.8): Noticeable sync between audio and visuals
- High (0.9–1.0): Strong audio-reactive output — motion matches the beat closely
Common mistakes
- Setting it to 1.0 with aggressive music — the video becomes too reactive, looking more like a visualizer than a video
- Using high audio strength with spoken word — voice dynamics create jarring visual changes
Also called
audio_strength
Audio guidance scale
In short
How closely generated audio follows your text description.
What it does
Same concept as visual guidance scale but for audio generation. Controls the balance between creative freedom and strict adherence to your audio description.
How to think about it
Same as visual guidance — low values let the model improvise, high values make it follow your description literally. Too high and it sounds forced.
Recommended settings
- Low (1–3): Loose interpretation — good for ambient or experimental audio
- Default (3–5): Balanced — follows your description naturally
- High (7+): Very literal — may sound forced or unnatural
Common mistakes
- Cranking it up thinking “higher = better quality” — it means more rigid adherence, not better audio
- Using the same value for music and speech — speech usually needs lower guidance than sound effects
Also called
audio_guidance_scale
Voice
In short
Selects the voice character for text-to-speech generation.
What it does
Chooses which synthesized voice “actor” speaks your text. Each voice has its own tone, pitch, pacing, and personality. Available voices vary by model.
How to think about it
Like casting a voiceover artist — each option is a different performer. Some sound warm and conversational, others formal and authoritative.
Recommended settings
- Preview first: Test with a short phrase before committing to a long generation
- Match to content: Narration, dialogue, and announcements each suit different voice characters
- Custom voices: Some models support voice cloning via audio upload
Common mistakes
- Choosing a voice without testing it on your specific content — a voice great for narration may sound wrong for dialogue
- Assuming voice names are consistent across models — “alloy” in one model may not exist in another
Text input
In short
The literal text the AI will speak or sing in audio generation.
What it does
Provides the script content for text-to-speech or text-to-singing models. Punctuation directly affects pacing and intonation in the output.
How to think about it
Like script copy for a voiceover session — these are the literal words the AI will perform. Formatting matters just like it does on a teleprompter.
Recommended settings
- Natural punctuation: Periods create pauses, commas create brief breaks
- Spell out numbers and abbreviations: “twenty-five” not “25”, “doctor” not “Dr.”
- For emphasis: Some models respond to ALL CAPS or asterisks
Common mistakes
- Writing text without punctuation — the AI reads it as one continuous stream with no natural pauses
- Using abbreviations the model can’t interpret — it may pronounce “Dr.” as “dee-arr”
Language
In short
Sets the language for speech generation or transcription.
What it does
Tells the model which language rules to follow for pronunciation, rhythm, and intonation. Uses standard language codes (en, es, fr, de, ja, etc.).
How to think about it
Like setting the language track on a timeline — it determines which phonetic rules the AI follows. Wrong language code means wrong pronunciation.
Recommended settings
- Match your text: Always set the language code to match your text content
- Explicit over auto-detect: Some models auto-detect, but specifying is more reliable
- Single language per generation: Multilingual text in one generation produces inconsistent results
Common mistakes
- Leaving language set to English when text is in another language — the AI pronounces foreign words with English phonetics
- Mixing languages in one generation expecting the model to switch seamlessly
System
Sync mode
In short
Wait for the result or generate in the background while you work.
What it does
When ON, the plugin blocks until the full result is ready. When OFF, generation runs in the background and you’re notified when it’s done.
How to think about it
Like rendering in Premiere — sync mode is “render and wait,” async mode is “add to render queue and keep editing.”
Recommended settings
- Off (default): Best for most workflows — keep working while the AI generates
- On: Useful for scripted workflows or when you need the result immediately
- When to use: Leave OFF unless you have a specific reason to wait
Common mistakes
- Turning sync mode ON and wondering why the plugin feels slow — it’s not slow, it’s waiting
- Leaving it ON out of habit — async mode lets you queue multiple generations
Safety checker
In short
Filters generated output for inappropriate or harmful content.
What it does
Runs the model’s output through a content moderation filter before delivering it. Blocks results that may contain inappropriate content, protecting against unexpected output.
How to think about it
Like a standards-and-practices review — the AI checks its own work before handing it over. Protects you from surprises, but occasionally blocks legitimate creative work.
Recommended settings
- On (default): Recommended for client work, team environments, and any production where unexpected content is unacceptable
- Off: Use with caution — only disable when you’re certain the content is appropriate
- When to use: Keep ON unless the filter is consistently blocking content you’ve verified is appropriate
Common mistakes
- Disabling it for all generations because “it blocks too much” — examine your prompt first, the filter usually reacts to something specific
- Assuming it catches everything — it’s a filter, not a guarantee
Safety tolerance
In short
Controls how strict the content filter is, on a scale from 1 (strictest) to 6 (most permissive).
What it does
Adjusts the sensitivity threshold of the model’s built-in safety checker. Lower values block more aggressively — a setting of 1 may flag even mildly suggestive content. Higher values allow more creative freedom but increase the risk of unexpected output.
How to think about it
Like adjusting the rating on a content filter — 1 is “G-rated only,” 6 is “allow almost everything.” Most professional work sits at 2–3: strict enough to avoid surprises, permissive enough to not block legitimate creative content.
Recommended settings
- 2 (default on most models): Good balance for client work and team environments
- 1: Maximum filtering — use for children’s content or highly regulated industries
- 4–6: Use with caution — only when the default filter is consistently blocking content you’ve verified is appropriate
- When to adjust: If generations keep getting blocked and your prompt is clean, try moving up by 1
Common mistakes
- Setting it to 6 “just to be safe from blocking” — this removes most safety filtering, which is the opposite of safe
- Changing it without testing — always preview a generation after adjusting
Multi-prompt
In short
Lets you write separate prompts for different scenes or segments within a single generation.
What it does
Instead of one prompt describing the entire output, multi-prompt lets you define distinct descriptions for different parts — for example, different scenes in a video or different sections of an audio track. The model transitions between them automatically.
How to think about it
Like writing scene descriptions on a shot list — each prompt controls one segment, and the AI handles the transitions between them.
Recommended settings
- When to use: Multi-scene videos, music with distinct sections, or any generation where you want different content at different points
- Format: Varies by model — some use numbered prompts, others use separator tokens. Check the model’s description for the expected format
Common mistakes
- Writing one long prompt and expecting scene breaks — you need to explicitly separate scenes
- Using too many segments for a short duration — each scene needs enough time to develop
Auto fix
In short
Lets the model automatically correct input issues before generating.
What it does
When enabled, the model attempts to fix problems with your input rather than rejecting it outright — for example, reformatting a prompt that triggers a policy filter, adjusting an image that’s slightly outside accepted dimensions, or converting an unsupported format.
How to think about it
Like auto-correct for your generation inputs. It tries to make things work rather than throwing an error, but the “fix” may not always match your intent.
Recommended settings
- On (default where available): Good for exploratory work — fewer errors, more results
- Off: Use when you need precise control over exactly what the model receives — the auto-fix may silently change your input in ways you don’t expect
Common mistakes
- Leaving it on and not noticing the model changed your prompt — if results look off, check whether auto-fix modified your input
- Turning it off and then getting errors that auto-fix would have handled — re-enable if you’re hitting repeated validation failures
ControlNet
ControlNet conditioning scale
In short
How strongly the control image guides the AI output.
What it does
Sets the influence weight of a ControlNet control image (edge map, depth map, pose skeleton) on the generation. Higher values mean the output follows the control signal more closely.
How to think about it
Like rotoscoping constraints — higher values lock the output to your guide, lower values let the AI take creative liberties with the structure.
Recommended settings
- Low (0.3–0.5): Soft guidance — the AI follows the general structure but improvises detail
- Default (0.7–1.0): Strong guidance — output closely matches the control image
- High (1.2+): Very rigid — can produce artifacts if the control signal is noisy
Common mistakes
- Setting it to 1.5+ and getting blocky artifacts — control maps aren’t meant to be followed that literally
- Using a low-quality control image at high scale — garbage in, garbage out amplified
Also called
controlnet_conditioning_scale, control_scale
Control timing
In short
When the control image starts and stops influencing the generation.
What it does
control_start and control_end set what fraction of the generation process uses control guidance. The values range from 0 (beginning) to 1 (end). Guidance only applies between these two points.
How to think about it
Like setting in/out points on a reference layer — the AI only “looks at” the control image during this window. Early guidance locks composition, late guidance refines detail.
Recommended settings
- Full range (0.0–1.0): Maximum control — the guide influences every step
- Early only (0.0–0.5): Locks composition but lets detail evolve freely — often the best balance
- Late only (0.5–1.0): Lets the AI establish its own composition, then steers detail — unusual but useful for texture control
Common mistakes
- Setting start and end to the same value — zero-width window means the control has no effect
- Using full range on a noisy control image — early-only gives better results when your control signal isn’t clean
Also called
control_start, control_end
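A sketch of the window presets described above, using the control_start/control_end fractions (the surrounding request shape is assumed):

```python
early_only = {"control_start": 0.0, "control_end": 0.5}  # lock composition, let detail evolve
full_range = {"control_start": 0.0, "control_end": 1.0}  # guide influences every step
# Note: control_start == control_end is a zero-width window, so the control does nothing.
```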
ControlNet guess mode
In short
Lets ControlNet work without a text prompt — experimental.
What it does
When enabled, the ControlNet generates output guided only by the control image, without any text prompt influence. The model “guesses” what to produce based solely on the structural input.
How to think about it
Like giving your editor footage and no brief — they interpret the structure entirely on their own. Results are unpredictable but can be surprisingly creative.
Recommended settings
- Off (default): Use a prompt alongside the control image — more predictable results
- On: Experimental — try when you want the AI to interpret your control image freely
Common mistakes
- Enabling guess mode and expecting precise results — without a prompt, the model has no creative direction
- Forgetting it’s enabled and wondering why your prompt seems to be ignored
Also called
controlnet_guess_mode
Preprocessor
In short
How the input image is prepared before being used as a control signal.
What it does
Selects the preprocessing method applied to your input image before it’s fed to the ControlNet. Different preprocessors extract different structural information: edges, depth, pose, segmentation.
How to think about it
Like choosing which analysis to run on your footage — edge detection gives you outlines, depth gives you spatial structure, pose gives you body positions. Pick the one that matches what you want to control.
Recommended settings
- Canny: Edge detection — good for preserving outlines and shapes
- Depth: Spatial structure — good for maintaining scene composition
- OpenPose: Body pose — good for matching character positions
- None: When your input is already a processed control map
Common mistakes
- Using the wrong preprocessor for your intent — depth won’t help if you want to match exact outlines
- Using “none” when your input is a regular photo — the model expects a processed control map
Also called
preprocessor
IP Adapter
IP Adapter
In short
Uses a reference image to guide the style or subject of the output.
What it does
IP Adapter (Image Prompt Adapter) takes a reference image and uses it to influence the generation — transferring visual style, subject appearance, or composition without needing to describe it in text.
How to think about it
Like giving your colorist a reference frame from another film — “make it look like this.” The AI picks up on visual qualities from the reference and applies them to your generation.
Recommended settings
- Single reference: Best results — one strong reference image gives clear direction
- Multiple references: Some models support multiple IP adapter inputs — results blend between them
- When to use: Style transfer, maintaining character consistency, matching a specific visual tone across generations
Common mistakes
- Using a busy, complex reference image — simpler references with clear visual identity transfer better
- Expecting exact reproduction — IP Adapter captures style and mood, not pixel-perfect copies
Also called
ip_adapters, ip_adapter
Pose guidance scale
In short
How strongly a pose reference controls the character’s position in the output.
What it does
Sets the influence weight of a pose reference (skeleton, keypoints) on the generated character. Higher values lock the character’s pose more tightly to the reference.
How to think about it
Like motion capture fidelity — low values give the AI freedom to adjust the pose naturally, high values force an exact match to the reference skeleton.
Recommended settings
- Low (0.3–0.5): Soft pose suggestion — natural but approximate
- Default (0.7–1.0): Strong pose match — good for matching specific body positions
- High (1.2+): Very rigid — can cause unnatural limb positions if the reference has artifacts
Common mistakes
- Setting it very high with a low-quality pose extraction — the model follows the noise too
- Using it without checking the extracted pose first — verify the skeleton matches your intent
Also called
pose_guidance_scale
Mixing image prompt and inpaint
In short
Blends IP Adapter style transfer with inpainting for creative fills.
What it does
Combines the style influence of an image prompt (via IP Adapter) with inpainting — the AI fills masked regions using both your text prompt and the visual style from the reference image.
How to think about it
Like doing a content-aware fill in Photoshop but with a specific style guide — the AI fills the gap in a way that matches both the surrounding content and your reference image’s aesthetic.
Recommended settings
- When to use: Creative retouching where you want fills to match a specific visual style
- Adjust strength: Lower values lean toward the text prompt, higher values lean toward the image reference
Common mistakes
- Using a reference image that clashes with the surrounding content — the fill will look inconsistent
- Not providing a clear mask — the blending works best with well-defined inpaint regions
Video processing
Video write mode
In short
Encoding quality versus speed tradeoff for the output video file.
What it does
Selects the encoding strategy for the generated video. Typically a choice between faster encoding with lower quality or slower encoding with better compression and quality.
How to think about it
Like choosing between “Export as fast as possible” and “Match source — high bitrate” in Premiere’s export settings. Faster encoding is fine for previews, but use higher quality for finals.
Recommended settings
- Fast/Default: Good for iteration and previewing — saves time during creative exploration
- Quality: Use for final output — better compression, fewer artifacts
Common mistakes
- Using the fast mode for final deliverables — the quality difference is visible on close inspection
- Always using quality mode during iteration — it slows down your creative loop for no benefit
Also called
video_write_mode
Multi-scale generation
In short
Generates at multiple resolutions for better quality — usually worth enabling.
What it does
Runs the generation process at multiple resolution scales, progressively refining detail. The model first generates a low-resolution version, then enhances it at higher scales.
How to think about it
Like progressive rendering in After Effects — starting with a rough pass and refining. Each scale adds detail that a single-pass generation would miss.
Recommended settings
- On (recommended): Better quality with modest speed cost — default for most models
- Off: Faster single-pass generation — use when speed matters more than quality
Common mistakes
- Turning it off to save time and not noticing the quality drop until final review
- Expecting it to fix low-resolution input — it improves generation quality, not source quality
Also called
use_multiscale
Temporal downsample factor
In short
Skips input video frames — higher values use less of the source motion.
What it does
Reduces the frame rate of the input video before processing by skipping frames. A factor of 2 uses every other frame, 4 uses every fourth frame, and so on.
How to think about it
Like dropping every other frame from a reference clip — the model sees the key poses but not every micro-movement. Good for speeding up processing on long clips.
Recommended settings
- 1 (no skip): Full frame rate — best quality, slowest processing
- 2: Every other frame — good balance for most video-to-video work
- 4+: Heavy skipping — only key poses survive, fine for style transfer but loses subtle motion
Common mistakes
- Setting it too high on motion-critical content — fast movements become jerky without enough intermediate frames
- Forgetting it’s enabled and wondering why the output feels “jumpy” compared to the source
Also called
temporal_downsample_factor
Motion bucket ID
In short
Controls the amount of motion in generated video — higher means more movement.
What it does
Sets the motion intensity for video generation models. Higher values produce more dynamic, energetic output with more camera and subject movement. Lower values produce calmer, more static shots.
How to think about it
Like choosing between a locked-off tripod shot and a handheld action sequence — this parameter sets the energy level of the motion the AI generates.
Recommended settings
- Low (50–100): Calm, minimal motion — good for beauty shots, landscapes
- Default (127): Moderate motion — balanced for most content
- High (200+): Energetic, dynamic motion — action scenes, music videos
Common mistakes
- Setting it very high for talking-head content — the face warps and distorts with too much motion
- Not adjusting it when switching between content types — landscapes and action scenes need different values
Also called
motion_bucket_id
Context frames
Section titled “Context frames”In short
Section titled “In short”Existing frames that guide video extension — more means smoother continuation.
What it does
Section titled “What it does”Sets how many existing frames the model “sees” when extending or continuing a video. More context frames give the model a better understanding of the current motion, style, and content.
How to think about it
Section titled “How to think about it”Like giving an editor more handles on a clip — the more preceding footage they see, the better they can match the cut. Same with AI: more context means smoother continuation.
Recommended settings
Section titled “Recommended settings”- Low (2–4): Minimal context — faster but may drift from the source style
- Default (8–16): Good balance — enough context for consistent continuation
- High (32+): Maximum context — best consistency but slower processing
Common mistakes
Section titled “Common mistakes”- Using too few context frames and getting jarring transitions where the extended video doesn’t match the source
- Using too many on short clips — if your clip is 24 frames, 32 context frames doesn’t make sense
Also called
Section titled “Also called”num_context_frames
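A quick sketch of the idea in Python (the helper is illustrative, not part of the plugin). Note how the context is capped by the clip length, which is why 32 context frames on a 24-frame clip makes no sense:

```python
# Illustrative only: the model "sees" at most the last N existing frames.
def context_window(frames, num_context_frames):
    n = min(num_context_frames, len(frames))  # can't use more frames than exist
    return frames[-n:]

clip = list(range(24))                  # 24 existing frames
print(len(context_window(clip, 16)))    # 16
print(len(context_window(clip, 32)))    # 24, capped at the clip length
```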
Video segments
In short
Number of segments generated — more segments means longer output video.
What it does
Divides the generation into separate segments (typically ~5 seconds each), which are processed individually and stitched together. More segments produce a longer total video.
How to think about it
Like setting the total duration by choosing how many “scenes” to generate end to end. Each segment is a self-contained generation pass.
Recommended settings
- 1: Single segment (~5s) — fast, good for short clips
- 2–3: Medium length (10–15s) — standard for most use cases
- 5+: Long output — be aware that quality may drift across many segments
Common mistakes
- Setting many segments for content that doesn’t need it — each segment costs time and money
- Expecting perfect continuity across 10+ segments — some style drift is inevitable in very long generations
Also called
num_segments
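The duration math is simple enough to sanity-check before you spend credits. A sketch, assuming the ~5-second segment length mentioned above (actual segment length varies by model):

```python
# Illustrative only: estimating total output length from the segment count.
SECONDS_PER_SEGMENT = 5  # approximate; check your model's documentation

def estimated_duration(num_segments):
    return num_segments * SECONDS_PER_SEGMENT

for n in (1, 3, 5):
    print(f"{n} segment(s) -> ~{estimated_duration(n)} s of output")
```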
Constant rate factor
In short
Video compression quality — lower means higher quality, larger file.
What it does
Standard video encoding quality parameter (CRF). Scale is 0 (lossless, huge file) to 51 (worst quality, tiny file). Controls the tradeoff between output file size and visual quality.
How to think about it
The same CRF scale used by ffmpeg and HandBrake for H.264/HEVC encoding — lower numbers mean bigger files with fewer compression artifacts. Premiere’s own exporter works in target bitrates rather than CRF, but the quality-versus-size tradeoff is the same.
Recommended settings
- Low (15–18): Near-lossless — use for hero shots and final delivery
- Default (23): Good balance — standard for web delivery
- High (28+): Small files, visible compression — fine for previews
Common mistakes
- Setting CRF to 0 thinking “I always want the best” — lossless files are enormous and unnecessary for AI-generated content
- Not adjusting for delivery format — social media doesn’t need CRF 15
Also called
constant_rate_factor
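Because this is the standard CRF scale, you can preview the tradeoff locally with ffmpeg before committing to a setting (requires ffmpeg on your PATH; filenames are placeholders):

```python
# Re-encode a clip at several CRF values and compare file sizes by eye.
import subprocess

for crf in (15, 23, 28):
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-c:v", "libx264", "-crf", str(crf),
         f"output_crf{crf}.mp4"],
        check=True,
    )
```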
Auto downsample
In short
Automatically reduces input resolution for faster processing.
What it does
When enabled, the model automatically scales down your input if it exceeds the model’s recommended resolution. Saves processing time without requiring manual resize.
How to think about it
Like Premiere’s proxy workflow — the model uses a smaller version of your source for processing. The output resolution is controlled separately.
Recommended settings
- On (default): Let the model handle it — saves time on high-res sources
- Off: Use when you specifically need the model to process at full input resolution
Common mistakes
- Turning it off with a 4K source and wondering why generation takes forever — the model doesn’t need 4K input to produce good output
- Not checking the minimum FPS setting — auto-downsample with a very low minimum can strip too much motion
Also called
enable_auto_downsample
Auto downsample minimum FPS
In short
Lowest framerate the model will keep when auto-downsampling your input.
What it does
Sets the floor for frame rate reduction during auto-downsample. The model won’t drop the input below this FPS, even if doing so would speed up processing.
How to think about it
Like setting a minimum proxy resolution — you want speed, but not at the cost of making the source unusable. This prevents the auto-downsample from stripping too much temporal information.
Recommended settings
- 8–12 fps: Good for style transfer where exact motion matching isn’t critical
- Default (16–24): Preserves enough motion for most use cases
- Match source: Set equal to your source FPS to prevent any frame dropping
Common mistakes
- Setting it too low for motion-critical content — 8fps input means jerky, choppy output
- Not realizing this only applies when auto-downsample is enabled
Also called
auto_downsample_min_fps
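A sketch of how the floor interacts with frame skipping (our own toy logic, not the model’s actual implementation):

```python
# Illustrative only: back off the skip factor until the FPS floor is respected.
def effective_fps(source_fps, desired_skip, min_fps):
    skip = desired_skip
    while skip > 1 and source_fps / skip < min_fps:
        skip -= 1
    return source_fps / skip, skip

print(effective_fps(24, 4, 16))   # (24.0, 1): any skipping would fall below 16 fps
print(effective_fps(60, 4, 12))   # (15.0, 4): 60/4 = 15 fps stays above the floor
```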
Interpolation
RIFE interpolation
In short
Adds AI frame interpolation for smoother video output.
What it does
Enables RIFE (Real-Time Intermediate Flow Estimation) — an AI algorithm that generates new in-between frames to increase the effective frame rate of the output.
How to think about it
Like Premiere’s Optical Flow but happening during generation — the AI creates smooth intermediate frames that didn’t exist in the original generation, producing fluid motion.
Recommended settings
- On: Smoother output — good for slow-motion or when the base frame rate feels choppy
- Off: Raw generation frames only — faster, and some content looks better without interpolation
Common mistakes
- Enabling it on content that’s already smooth — adds processing time with no visible benefit
- Expecting it to fix fundamentally broken motion — RIFE smooths transitions, it doesn’t fix bad compositions
Also called
use_rife
Adjust FPS for interpolation
In short
Automatically adjusts the output frame rate to account for interpolated frames.
What it does
When frame interpolation adds new frames, this option recalculates the output FPS so the video plays at the correct speed. Without it, interpolated frames extend the duration instead.
How to think about it
Like choosing between slow motion and frame rate conversion — with this ON, you get the same duration at higher FPS. With this OFF, you get longer, slower footage.
Recommended settings
- On: Same duration, smoother playback — matches your timeline FPS
- Off: Longer, slow-motion-style output — use intentionally for slow-mo effects
Common mistakes
- Leaving it off and wondering why the generated clip is longer than expected — interpolated frames are extending duration
- Turning it on when you actually wanted slow motion — the extra frames get absorbed into the same duration
Also called
adjust_fps_for_interpolation
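The math behind the two behaviors, as a sketch (toy function, assuming a clean 2x interpolation):

```python
# Illustrative only: frame count, FPS, and duration after 2x interpolation.
def after_interpolation(frames, fps, multiplier, adjust_fps):
    new_frames = frames * multiplier
    if adjust_fps:
        return new_frames, fps * multiplier, frames / fps   # same duration
    return new_frames, fps, new_frames / fps                # slow motion

print(after_interpolation(120, 24, 2, adjust_fps=True))    # (240, 48, 5.0)
print(after_interpolation(120, 24, 2, adjust_fps=False))   # (240, 24, 10.0)
```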
Transparency mode
In short
Controls how first and last frame edges blend with transparency.
What it does
Sets how the model handles the transition at the very beginning and end of an interpolated sequence. Controls whether edges fade to transparent, hard-cut, or blend.
How to think about it
Like choosing a dissolve type for the first and last frames of a transition — the edges can be sharp, soft, or fade to nothing.
Recommended settings
- Default/Auto: Let the model choose — works well for most use cases
- Transparent: Use when layering the output over other footage in Premiere
- Opaque: Use when the clip stands alone — no edge artifacts
Common mistakes
- Using transparent mode and placing the clip on V1 with nothing below — you’ll see black edges where transparency was
- Not matching the mode to your compositing needs — check your timeline layer setup first
Also called
transparency_mode
Movement amplitude
In short
Controls motion intensity — auto lets the model decide.
What it does
Sets how much motion the model adds to the output. Auto mode analyzes the input and chooses an appropriate amount. Manual values override the model’s judgment.
How to think about it
Like setting the amount of parallax or drift in a Ken Burns effect — higher amplitude means more visible motion, lower means subtler movement.
Recommended settings
- Auto (default): Best for most cases — the model adapts to your input
- Low: Minimal motion — good for subtle background animation
- High: Strong motion — use for dynamic, energetic content
Common mistakes
- Overriding auto with a high value on static content — too much motion on a still image looks unnatural
- Setting it very low and expecting completely static output — even minimum values add some motion
Also called
movement_amplitude
Audio & music parameters
Chunk overlap
In short
Overlap between audio processing chunks — more overlap means smoother blending.
What it does
When generating long audio, the model processes it in chunks. This parameter controls how much adjacent chunks overlap, which affects the smoothness of transitions between them.
How to think about it
Like crossfade length between audio clips in Premiere — more overlap means smoother transitions between sections, less overlap means faster processing but potentially audible seams.
Recommended settings
- Low: Faster processing, possible audible transitions between chunks
- Default: Good balance — smooth enough for most content
- High: Seamless blending — use for music or content where any transition artifact is unacceptable
Common mistakes
- Setting it to zero and getting audible clicks or gaps between chunks in long audio
- Maximizing overlap for short audio that fits in a single chunk — no benefit, just slower processing
Also called
chunk_overlap
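Conceptually, the blending works like a linear crossfade. A sketch with NumPy (toy code; real models typically blend internally rather than on raw samples):

```python
# Illustrative only: crossfading two adjacent audio chunks over their overlap.
import numpy as np

def stitch(chunk_a, chunk_b, overlap):
    if overlap == 0:
        return np.concatenate([chunk_a, chunk_b])   # hard seam: may click
    fade = np.linspace(0.0, 1.0, overlap)
    blended = chunk_a[-overlap:] * (1 - fade) + chunk_b[:overlap] * fade
    return np.concatenate([chunk_a[:-overlap], blended, chunk_b[overlap:]])

a, b = np.ones(1000), np.zeros(1000)
print(stitch(a, b, 200).shape)   # (1800,): the overlap region is shared
```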
Reranking candidates
In short
How many alternatives the model generates internally before picking the best one.
What it does
The model generates multiple candidate outputs internally, ranks them by quality, and returns the best one. More candidates means better quality selection but proportionally higher cost and time.
How to think about it
Like doing multiple takes in a voiceover session and picking the best one — more takes means a better final selection, but each take costs studio time.
Recommended settings
- Low (1–3): Fast and cheap — you get what you get
- Default (5): Good quality with reasonable cost
- High (10+): Best selection quality — but cost scales linearly with candidate count
Common mistakes
- Setting it to 20+ for quick iterations — you’re paying for quality you won’t evaluate anyway
- Setting it to 1 for final output — one candidate gives no selection benefit
Also called
reranking_candidates
Advanced generation
Turbo mode
In short
Faster generation at the cost of some quality.
What it does
Enables optimized generation paths that trade quality for speed. The model takes computational shortcuts to produce results faster.
How to think about it
Like switching to draft quality in After Effects RAM preview — you see the result faster, but the fine detail isn’t there. Use turbo for iteration, switch it off for finals.
Recommended settings
- On: Quick iteration, previewing ideas, rapid prototyping
- Off: Final output, client-facing deliverables, quality-critical work
Common mistakes
- Leaving turbo on for final renders — the quality difference is real and visible
- Never using turbo during exploration — you’re wasting time rendering details you’ll change anyway
Also called
turbo_mode
Preprocess
In short
How the input image is prepared before generation — crop, resize, pad, etc.
What it does
Selects the method used to fit your input image to the model’s expected dimensions. Options typically include crop (cut edges), resize (stretch/squash), pad (add borders), or none.
How to think about it
Like choosing between “Scale to Fill” and “Scale to Fit” when placing footage in a Premiere sequence — each method handles the size mismatch differently.
Recommended settings
- Crop: Fills the frame completely — may lose edges
- Resize: Stretches to fit — may distort aspect ratio
- Pad: Adds borders — preserves everything but adds empty space
- None: Send as-is — model handles it
Common mistakes
- Using resize on content where aspect ratio matters — faces and text will distort
- Using crop without checking which edges are lost — important content may be cut off
Also called
preprocess
Zoom factor
In short
Camera zoom applied during video generation — 0 means no zoom.
What it does
Adds a progressive zoom effect to the generated video. Positive values zoom in, negative values (where supported) zoom out.
How to think about it
Like keyframing a scale change on a clip in Premiere — the “camera” progressively moves closer or farther during the clip.
Recommended settings
- 0: No zoom — static framing
- Low (0.1–0.3): Subtle push-in — adds cinematic energy without being obvious
- High (0.5+): Dramatic zoom — use sparingly, can feel artificial
Common mistakes
- Combining high zoom with high motion — the effects compound and create disorienting output
- Using zoom on very short clips — there isn’t enough duration for the zoom to develop naturally
Also called
zoom_factor
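A sketch of what a progressive zoom amounts to, expressed as a per-frame scale ramp (illustrative only; the model applies the equivalent internally during generation):

```python
# Illustrative only: scale ramps from 1.0 to 1.0 + zoom_factor across the clip.
def zoom_schedule(num_frames, zoom_factor):
    return [1.0 + zoom_factor * i / (num_frames - 1) for i in range(num_frames)]

scales = zoom_schedule(num_frames=120, zoom_factor=0.2)
print(scales[0], scales[-1])   # 1.0 ... 1.2, a subtle push-in over the clip
```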
Noise scale
In short
Intensity of noise in the generation process — affects randomness.
What it does
Controls how much random noise is injected into the diffusion process. Higher values create more variation between frames or generations; lower values produce more deterministic results.
How to think about it
Like film grain intensity — more noise means more organic variation, less noise means cleaner but potentially more “digital” looking output.
Recommended settings
- Low: More deterministic, consistent output — good for reproducibility
- Default: Standard variation — balanced for most use cases
- High: More creative randomness — each generation diverges more from the baseline
Common mistakes
- Setting it too high and losing coherence between frames in video — the noise overwhelms the model’s consistency
- Setting it to zero expecting perfect determinism — seed controls reproducibility more reliably
Also called
noise_scale
Eta
In short
Sampler noise parameter — 0 means fully deterministic generation.
What it does
Controls the stochastic (random) component of the sampling process. At 0, the sampler is fully deterministic — same seed always gives the same result. Higher values introduce controlled randomness.
How to think about it
Like adding controlled improvisation to a scripted performance — at 0, every take is identical. Higher eta lets the model make small creative decisions that vary between runs.
Recommended settings
- 0: Fully deterministic — perfect reproducibility
- Default (0.5–1.0): Some variation — natural-looking results
- High (close to 1.0 and above, where supported): More randomness — each generation is more unique
Common mistakes
- Setting eta to 0 and expecting the output to be identical across different models — eta controls randomness within one model only
- Cranking it high for consistency testing — that’s the opposite of what you want
Also called
eta
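A toy sketch of the principle (not any specific sampler): eta scales the random term added at each step, so at eta = 0 the step itself is fully repeatable and the noise source never matters:

```python
# Toy model only: eta scales the stochastic part of a sampling step.
import numpy as np

def sampler_step(x, eta, rng):
    deterministic = 0.9 * x   # stand-in for the model's predicted update
    return deterministic + eta * rng.standard_normal(x.shape)

x = np.ones(4)
a = sampler_step(x, eta=0.0, rng=np.random.default_rng(42))
b = sampler_step(x, eta=0.0, rng=np.random.default_rng(7))
print(np.allclose(a, b))   # True: with eta = 0, different seeds give the same step
```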
CLIP skip
In short
Skips CLIP text encoder layers — changes how the model interprets your prompt.
What it does
Skips the last N layers of the CLIP text encoder. This changes how deeply the model analyzes your prompt, affecting the style and interpretation of the output.
How to think about it
Like the difference between reading a brief carefully versus skimming it — skipping layers means the model interprets your prompt more loosely, often producing a distinct aesthetic.
Recommended settings
- 1 (no skip): Full text analysis — most literal prompt interpretation
- 2: Common for anime and stylized content — slightly looser interpretation
- 3+: Very loose — the model barely reads your prompt, mostly relies on learned aesthetics
Common mistakes
- Using CLIP skip 2 on models that aren’t designed for it — not all architectures benefit from skipping
- Setting it to 4+ and wondering why the prompt has no effect — the model can barely “read” it at that point
Also called
clip_skip
Conditional augmentation
In short
Adds noise to the input reference for more variation in the output.
What it does
Adds controlled noise to the conditioning (input) image before generation. Higher values mean the model treats your input more loosely, producing more diverse but less faithful outputs.
How to think about it
Like intentionally degrading your reference footage before handing it to the VFX team — they’ll get the general idea but fill in details differently each time.
Recommended settings
- Low (0.0–0.1): Very faithful to input — minimal deviation
- Default (0.02–0.05): Slight variation — natural-looking diversity
- High (0.1+): Significant departure from input — creative but less predictable
Common mistakes
- Setting it too high and losing the key features of your input image — the model ignores your reference
- Setting it to exactly 0 and getting overly rigid output that looks “copied” rather than generated
Also called
cond_aug
Granularity scale
In short
Detail control — higher values can reduce artifacts in some models.
What it does
Adjusts the level of fine detail in the generation process. Some models use this to control artifact suppression — higher values smooth out micro-artifacts at the cost of some detail.
How to think about it
Like the detail slider in noise reduction — higher values clean up artifacts but may soften fine detail. It’s a tradeoff between cleanliness and crispness.
Recommended settings
- Low: Maximum detail — may include some artifacts
- Default: Balanced — clean output with preserved detail
- High: Smooth, artifact-free — but may look slightly soft
Common mistakes
- Cranking it up to eliminate artifacts when the real problem is low inference steps — fix the root cause first
- Setting it to 0 expecting maximum sharpness — some models need a minimum value to produce coherent output
Also called
granularity_scale
Refiner switch
In short
When to switch from the base model to the refiner model — 0.4–0.8 typical.
What it does
In dual-model pipelines (like SDXL), this controls at what percentage of generation the system hands off from the base model to the refiner. The base handles composition, the refiner handles detail.
How to think about it
Like the handoff between rough cut and fine cut editors — the first handles structure and story, the second polishes visuals and pacing. This value sets when the handoff happens.
Recommended settings
- Low (0.3–0.4): Early switch — refiner has more influence, softer overall look
- Default (0.5–0.6): Balanced — base establishes, refiner polishes
- High (0.7–0.8): Late switch — base dominates, refiner only touches up final details
Common mistakes
- Setting it too low and getting output that looks overly smooth — the refiner eliminates the base model’s character
- Setting it to 1.0 (no switch) — you’re bypassing the refiner entirely, missing its detail enhancement
Also called
refiner_switch
Sharpness
In short
Output sharpness — higher is sharper but risks visual artifacts.
What it does
Controls the sharpening applied to the generated output. Higher values produce crisper edges and more defined detail, but can introduce halos and edge artifacts.
How to think about it
Like the Unsharp Mask in Premiere — a little sharpening makes footage pop, too much creates ugly halos around every edge.
Recommended settings
- Low: Soft, natural look — good for organic content
- Default: Balanced — most models default to a good sharpness level
- High: Crisp, detailed — may show halos on high-contrast edges
Common mistakes
- Maxing out sharpness thinking it improves quality — it creates visible artifacts on every edge
- Not adjusting per content type — text and architecture need less sharpening than detailed textures
Also called
sharpness
Performance preset
In short
Speed/quality tradeoff preset — from extreme speed to maximum quality.
What it does
Selects a pre-configured balance between generation speed and output quality. Higher performance settings reduce inference steps, use faster schedulers, or apply other optimizations.
How to think about it
Like Premiere’s playback resolution — “Full” for final review, “1/4” for editing. Each step trades quality for speed.
Recommended settings
- Quality: Best output — use for final deliverables
- Speed: Good balance — fast enough for iteration with decent quality
- Extreme Speed: Fastest — noticeably reduced quality, use for rapid prototyping only
Common mistakes
- Using Extreme Speed for client deliverables — the quality difference is visible
- Always using Quality mode during exploration — you’re waiting for details you’ll change anyway
Also called
performance
Reference control
Reference timing
In short
When the reference image starts and stops guiding the generation.
What it does
reference_start and reference_end control what portion of the generation process uses the reference image for guidance. Values range from 0 (beginning) to 1 (end).
How to think about it
Like setting the influence window for a reference LUT — during this window the model tries to match your reference, outside it the model works independently.
Recommended settings
- Full range (0.0–1.0): Maximum reference influence — output closely matches reference throughout
- Early only (0.0–0.5): Sets the composition and style, then lets the model elaborate freely
- Narrow window (0.2–0.6): Reference influences the middle phase — avoids rigid start/end frames
Common mistakes
- Setting start after end — the reference has no influence window
- Using full range with a very strong reference — output may look like a copy rather than a generation
Also called
reference_start, reference_end
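A sketch of how the window gates guidance during sampling (illustrative logic, assuming progress is measured as a 0–1 fraction of total steps):

```python
# Illustrative only: is the reference active at a given denoising step?
def reference_active(step, total_steps, reference_start, reference_end):
    progress = step / total_steps      # 0.0 at the start, 1.0 at the end
    return reference_start <= progress <= reference_end

total = 30
active = [s for s in range(total) if reference_active(s, total, 0.0, 0.5)]
print(len(active))   # 16 of 30 steps: the reference shapes the first half
```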
Image editing
Avatar
In short
Character and appearance presets for AI-generated people.
What it does
Selects a predefined character template — body type, clothing, hairstyle, or full identity — that the model uses as a base for generating human subjects. Some models offer single avatars, others support multi-character setups where you define several people in one scene.
How to think about it
Like casting from a stock talent roster. Instead of describing every detail of a person’s appearance in your prompt, you pick a pre-built character and the AI fills in the rest. Multi-character mode is like setting up a group scene with assigned roles.
Recommended settings
- Single avatar: Best for headshots, portraits, and single-subject content — clearest results
- Multi-character: Use when the scene requires distinct people interacting — quality drops with more than 3–4 characters
- When to use: Character consistency across multiple generations, branded content with recurring “talent”
Common mistakes
- Combining an avatar preset with a prompt that describes a completely different person — the model gets conflicting instructions and produces inconsistent results
- Using multi-character mode for a single subject — it adds complexity the model doesn’t need
Also called
avatar, character, multi_character
Background removal
In short
Removes or replaces the background behind subjects in an image.
What it does
Detects the foreground subject and separates it from the background. Depending on the model, the background can be made transparent, replaced with a solid color, swapped for a new scene, or processed with adjustable opacity and style.
How to think about it
Like using Ultra Key or the Roto Brush in Adobe — the AI identifies what’s “in front” and what’s “behind,” then lets you choose what happens to the background. The difference is that AI does it in one pass without manual masking.
Recommended settings
- Remove (transparent): Best for compositing over other footage in your timeline — gives you a clean alpha channel
- Replace (solid/scene): Use when you need a finished shot without further compositing
- Threshold controls: Lower thresholds keep more edge detail (hair, fur) but may leave background artifacts; higher thresholds give cleaner cuts but may clip fine edges
Common mistakes
- Using background removal on subjects with translucent elements (veils, glass, smoke) — the AI treats them as background and removes them
- Setting the threshold too aggressively and losing hair detail — start with the default and adjust gradually
Also called
background_mode, background_opacity, background_removal, remove_background, transparent_background, background_style, background_threshold, background_tolerance, bg_th, remove_background_noise
Color grading
In short
Post-processing effects like contrast, grain, blur, vignette, and color shifts applied to the generated output.
What it does
Applies visual adjustments to the AI’s output — brightness, contrast, saturation, film grain, lens blur, vignetting, sharpening, tinting, and other photographic effects. These run after generation, modifying the final image before delivery.
How to think about it
Like applying a Lumetri Color grade plus creative effects in Premiere, but baked into the generation. The difference: these are applied before the file reaches your timeline, so you can’t undo them in post. Use them for quick stylistic finishes, but keep effects subtle if you plan to grade further in Premiere.
Recommended settings
- Subtle (low values): Best when you plan to do your own color grade in Premiere — gives you room to work
- Moderate: Good for social media or quick-turnaround content where the generation is the final product
- Heavy: Use for deliberate stylistic effects (heavy grain, strong vignette) — but understand these are baked in
Common mistakes
- Applying heavy grain and contrast during generation, then adding more in Premiere — the effects stack and look overprocessed
- Enabling every effect at once (grain + blur + vignette + tint) — the output looks like an Instagram filter from 2012
Also called
brightness, contrast, saturation, gamma, grain, grain_intensity, grain_scale, grain_style, blur_radius, blur_sigma, blur_type, vignette_strength, sharpen, cas_amount, tint_mode, tint_strength, enable_chromatic, enable_grain, enable_blur, enable_vignette, enable_sharpen, enable_solarize, enable_tint, enable_glow, enable_dodge_burn, enable_desaturate, enable_dissolve, enable_parabolize, enable_color_correction
Color processing
In short
Detects, corrects, and manipulates colors — from automatic color fixing to palette extraction.
What it does
Handles color-specific operations: automatic color detection, color correction, palette limiting (reducing an image to a set number of colors), and targeted color changes like hair color or text color. Some models use this to enforce a specific color palette or fix color casts.
How to think about it
Like the color correction tools in Photoshop or Premiere’s color wheels, but automated. The AI identifies dominant colors, fixes casts, or restricts the palette to a set number of hues. Useful for creating stylized looks (poster art, pixel art) or correcting color problems in the source.
Recommended settings
- Auto detect on: Let the model identify and fix color issues automatically — good starting point
- Max colors (low): Creates flat, poster-style images with limited palettes — use for graphic design or pixel art
- Max colors (high/unlimited): Preserves full color range — use for photorealistic output
- Color fix: Enable when the source has obvious color casts or white balance issues
Common mistakes
- Setting max colors too low for photorealistic content — faces look banded and unnatural with fewer than 64 colors
- Using color fix on content that’s intentionally color-graded — the AI “corrects” your creative choices
Also called
color, color_fix, color_fix_type, colormap, colormode, max_colors, fill_color, font_color, hair_color, highlight_color, auto_color_detect, color_precision, dominant_color_threshold, fix_colors, txt_color
Crop and resize
In short
Controls how the output is cropped, padded, or resized to fit target dimensions.
What it does
Adjusts the output framing after generation. Options include cropping to a bounding box (face or subject detection), padding with a specified color, resizing to original input dimensions, or targeting specific width/height values. Some models crop to fill, others pad to fit.
How to think about it
Like the “Scale to Fill” vs “Scale to Fit” vs “Crop” options when placing footage into a Premiere sequence. Each approach handles the dimension mismatch differently — crop loses edges, pad adds borders, resize stretches or squashes.
Recommended settings
- Crop to fill: Best when you need exact dimensions and can afford to lose some edges
- Resize to original: Use when you want the output to match your input’s exact dimensions — common for image-to-image workflows
- Pad: Use when you need exact dimensions but can’t lose any content — the borders can be trimmed in Premiere
- Target dimensions: Set explicitly when your delivery format requires specific pixel counts (1920x1080, 1080x1080, etc.)
Common mistakes
- Cropping without checking what’s lost at the edges — important content (hands, props, text) may be cut off
- Resizing non-square output to square dimensions — subjects get stretched and distorted
Also called
crop_size, crop_to_bbox, crop_to_fill, crop_duration, dimensions, pad_color, padding_values, resize_to_original, selection_crop, target_height, target_long_side, target_width
Denoising
In short
Reduces visual noise and grain in the generated output.
What it does
Applies noise reduction during or after generation. Some models offer separate controls for high-resolution and low-resolution denoising passes, letting you clean up different types of noise independently.
How to think about it
Like Neat Video or Premiere’s built-in noise reduction — it smooths out grain and speckle. The tradeoff is always the same: more denoising means cleaner images but softer fine detail. The sweet spot depends on whether you’d rather see grain or softness.
Recommended settings
- Low: Preserves fine texture and detail — some grain remains, but edges stay sharp
- Default: Balanced cleanup — good for most content
- High: Very clean, smooth output — but small details like skin texture, fabric weave, and hair strands may disappear
Common mistakes
- Maxing out denoising on content with important fine detail (fabric patterns, text, hair) — those details get smoothed away along with the noise
- Applying denoising in the AI model AND again in Premiere — double denoising creates a plastic, artificial look
Also called
denoise, highres_denoise, lowres_denoise, noise_reduction
Depth estimation
In short
Generates a depth map showing how far each part of the image is from the camera.
What it does
Analyzes an image and produces a grayscale depth map — bright areas are close to the camera, dark areas are far away. Used as input for other models (ControlNet, relighting, 3D effects) or as a standalone analysis tool.
How to think about it
Like a LiDAR scan from your iPhone, but generated from a flat image. The AI infers depth from visual cues — perspective lines, object size, blur, and occlusion. The output is a grayscale map you can use for parallax effects, depth-of-field simulation, or as a control signal for other AI models.
Recommended settings
- Ensemble size (1): Single pass — fast but may have inconsistencies in complex scenes
- Ensemble size (5–10): Multiple passes averaged together — more accurate depth, especially at edges and occlusion boundaries
- Processing resolution: Higher means more accurate depth estimation but slower processing — match to your output needs, not your source resolution
Common mistakes
- Using a low ensemble size on complex scenes with many overlapping objects — depth edges become noisy and inaccurate
- Treating the depth map as ground truth — it’s an estimate, not a measurement. Fine details and transparent objects often get wrong depth values
Also called
depth_and_normal, depth_scale, ensemble_size, include_raw_depths, preprocess_depth, processing_res
Face animation
In short
Controls facial expressions, eye movement, head rotation, and lip sync on generated or modified faces.
What it does
Manipulates specific facial features on a generated or uploaded face: mouth shapes (open/closed vowel positions), eye blinks and winks, smiles, eyebrow raises, head pitch/yaw/roll, and pupil direction. Some models support lip sync driven by audio input and face enhancement for cleaner results.
How to think about it
Like a virtual puppet rig in After Effects — each slider controls one aspect of the face. Mouth shapes work like phoneme targets in lip sync animation: aaa opens the mouth wide, eee stretches it horizontally, woo rounds the lips. Head rotation works like a 3-axis gimbal: pitch (nod), yaw (shake), roll (tilt).
Recommended settings
- Expression scale (0.5–0.8): Natural range — keeps expressions believable
- Expression scale (1.0+): Exaggerated — useful for animation or caricature, but faces start looking uncanny on real portraits
- Still mode on: Reduces head motion for talking-head content — keeps the face stable while expressions change
- Paste back on: Composites the animated face back onto the original image — essential for natural results
Common mistakes
- Cranking multiple expression sliders to their max simultaneously — the face distorts into something unnatural
- Forgetting to enable face enhancement when the source image is low resolution — the animation amplifies every pixel
Also called
aaa, blink, eee, woo, wink, smile, expression, expression_scale, eyebrow, rotate_pitch, rotate_roll, rotate_yaw, pupil_x, pupil_y, face_enhancement, face_enhancer, flag_do_crop, flag_lip_retargeting, paste_back, still_mode, vx_ratio, vy_ratio
Inpainting
In short
Edits specific regions of an image by painting a mask over the area you want to change.
What it does
You provide an image and a mask (the area to modify), and the AI regenerates only the masked region while keeping everything else untouched. The strength parameter controls how much the masked area changes — low strength makes subtle edits, high strength replaces the region entirely.
How to think about it
Like Content-Aware Fill in Photoshop, but guided by a text prompt. You mask the area you want to change, describe what should go there, and the AI fills it in while matching the surrounding context. The mask is your “selection” — everything outside it stays exactly as it is.
Recommended settings
- Strength (0.3–0.5): Subtle edits — change color, texture, or small details while preserving the original structure
- Strength (0.6–0.8): Moderate changes — replace objects or alter significant features
- Strength (0.9–1.0): Full replacement — the masked area is generated from scratch based on your prompt
- Erode/dilate: Shrink or expand the mask edges for cleaner boundaries — positive values expand, negative values shrink
Common mistakes
- Using a mask that’s too tight around the subject — leave some margin so the AI can blend the edges naturally
- Setting strength too high for small corrections — a color change doesn’t need 1.0 strength
Also called
inpaint_mode, inpaint_strength, inpaint_mask_only, inpaint_engine, inpaint_erode_or_dilate, inpaint_respective_field, override_inpaint_options, draw_mode, outpaint_selections
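The core compositing rule is worth seeing once. A NumPy sketch (toy code; real inpainting blends inside the model rather than on final pixels):

```python
# Illustrative only: the mask gates the edit; everything outside is untouched.
import numpy as np

def inpaint_composite(original, generated, mask):
    """mask is 1.0 where the edit applies, 0.0 elsewhere."""
    return mask * generated + (1.0 - mask) * original

orig = np.zeros((4, 4))
gen = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0   # a 2x2 region to regenerate
print(inpaint_composite(orig, gen, mask))
```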
Lighting
In short
Controls the direction, type, and style of lighting in the generated image.
What it does
Sets the virtual light source for the scene — direction (top, left, front, etc.), type (ambient, directional, point), and overall lighting style. Some models support full relighting, which re-renders an existing image under completely new lighting conditions.
How to think about it
Like moving a key light on a film set. Direction controls where shadows fall, type controls how hard or soft the light is, and style presets are like choosing between a studio setup and natural golden-hour light. Relighting is the AI equivalent of reshooting under different lighting without going back to set.
Recommended settings
- Front/top: Flat, even lighting — safe for product shots and portraits
- Side (left/right): Dramatic shadows — good for cinematic and editorial content
- Rim/back: Silhouette and edge lighting — use for atmospheric or dramatic shots
- Relighting: Enable when you need to completely change the mood of an existing image
Common mistakes
- Using strong directional lighting on subjects with complex geometry (jewelry, wrinkled fabric) — the AI-generated shadows may not follow the real shape correctly
- Applying relighting to images that already have strong directional light — the original shadows conflict with the new lighting direction
Also called
light_direction, light_type, lighting_style, relight_parameters
Masking
In short
Creates, adjusts, and inverts masks that define which areas of an image are processed.
What it does
Generates or modifies masks used by inpainting, outpainting, and other region-specific operations. Controls include binarization (converting soft masks to hard black/white), inversion (swapping which area is selected), clamping (limiting mask intensity range), and type selection (auto-detected, manual, or segmentation-based).
How to think about it
Like working with masks and mattes in Premiere or After Effects. A white area means “process this,” a black area means “leave this alone.” Binarization is like increasing matte contrast until there are no gray areas. Inversion flips your selection. Clamping limits how strong the mask effect can be, like limiting an adjustment layer’s opacity range.
Recommended settings
- Binarize on: Clean, hard-edged masks — best for inpainting where you want a clear boundary between edited and untouched areas
- Binarize off: Soft, feathered masks — better for blending and gradual transitions
- Invert: Flip when you’ve masked the wrong side — easier than repainting
- Clamp (lower 0.3, upper 0.7): Limits mask intensity range — useful for partial-strength edits that blend more naturally
Common mistakes
- Forgetting to invert the mask after auto-detection selects the background instead of the subject
- Using a binarized mask for subtle blending work — the hard edges create visible seams
Also called
binarize_mask, invert_mask, mask_type, mask_only, mask_binarization_threshold, mask_clamp_lower, mask_clamp_upper, mask_start, mask_end, min_mask_region_area, revert_mask, mask_away_clip
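The three operations are simple pixel math. A NumPy sketch on a soft 0–1 mask (the threshold and clamp values here are examples, not model defaults):

```python
# Illustrative only: binarize, invert, and clamp a soft mask.
import numpy as np

mask = np.array([0.1, 0.4, 0.6, 0.9])

binarized = (mask >= 0.5).astype(float)   # hard black/white at threshold 0.5
inverted = 1.0 - mask                     # swap selected and unselected areas
clamped = np.clip(mask, 0.3, 0.7)         # limit the mask's intensity range

print(binarized, inverted, clamped)
```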
Outpainting
In short
Extends an image beyond its original borders — the AI fills in what’s outside the frame.
What it does
Expands the canvas in any direction (top, bottom, left, right) and generates new content that seamlessly continues the existing image. The AI analyzes the edge content and extends it naturally — continuing backgrounds, landscapes, or patterns.
How to think about it
Like Premiere’s reframing tools, but instead of cropping to fit a new aspect ratio, the AI generates new content to fill the extra space. Need a 16:9 image from a 1:1 source? Outpaint the sides. Need more headroom above a subject? Outpaint the top.
Recommended settings
- Small expansion (10–25%): Safest — the AI only needs to extend a small area, so consistency is high
- Medium expansion (25–50%): Good for aspect ratio conversion — enough room for meaningful new content
- Large expansion (50%+): Risky — the AI is inventing a lot of new content, and quality drops as you move further from the original edges
- Blur mask on: Feathers the boundary between original and generated content for seamless blending
Common mistakes
- Expanding by 100%+ in one direction and expecting the new content to look as good as the original — quality degrades with distance from the source
- Outpainting in all four directions simultaneously — the corners have almost no context to work from
Also called
blur_mask, expand_bottom, expand_left, expand_mask, expand_ratio, expand_right, expand_top
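A sketch of the canvas math (illustrative only; the padded border is what the model then fills with generated content):

```python
# Illustrative only: grow the canvas by a ratio on every side before filling.
import numpy as np

def expand_canvas(image, expand_ratio):
    h, w = image.shape[:2]
    pad_h, pad_w = int(h * expand_ratio), int(w * expand_ratio)
    return np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

img = np.ones((1080, 1080))
print(expand_canvas(img, 0.25).shape)   # (1620, 1620): 25% added on each side
```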
Scene detection
In short
Detects scene changes in video and applies different processing to each scene.
What it does
Analyzes video input to identify where scenes change (cuts, transitions, significant visual shifts), then processes each scene segment independently. This ensures that style transfer, color correction, or other effects are applied consistently within each scene rather than averaging across cuts.
How to think about it
Like Premiere’s scene edit detection, but built into the AI pipeline. Without scene detection, a style transfer model might blend the look of two completely different scenes at the cut point, creating an ugly transition. With it enabled, each scene gets its own processing pass.
Recommended settings
- On: Recommended for any multi-scene video input — prevents cross-scene contamination
- Off: Use only for single-shot clips where there are no scene changes
- Threshold: Lower values detect more subtle scene changes (dissolves, slow fades); higher values only detect hard cuts
Common mistakes
- Leaving scene detection off on a multi-cut montage — the AI blends styles across cuts, creating inconsistent looks at every edit point
- Setting the threshold too low on footage with lots of camera motion — the model mistakes fast pans for scene changes
Also called
scene, scene_description, scene_threshold, use_scene_detection
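Threshold-based cut detection can be sketched in a few lines (a deliberately crude toy; production detectors use smarter metrics than mean pixel difference):

```python
# Toy detector: flag a cut where adjacent frames differ more than the threshold.
import numpy as np

def detect_cuts(frames, threshold):
    diffs = [float(np.abs(b - a).mean()) for a, b in zip(frames, frames[1:])]
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

frames = [np.zeros((8, 8))] * 5 + [np.ones((8, 8))] * 5   # one hard cut
print(detect_cuts(frames, threshold=0.5))                 # [5]
```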
Tiling
In short
Processes large images in smaller tiles to avoid running out of memory.
What it does
Splits a large image into overlapping tiles, processes each tile individually, then stitches them back together. This allows models to handle images much larger than their native resolution without crashing. The overlap between tiles (the tile size minus the stride) ensures seamless blending at boundaries.
How to think about it
Like rendering a massive After Effects composition in sections — each section renders independently, then they’re composited together. The overlap is like the feather on a split-screen wipe: it ensures you can’t see the seam where tiles meet.
Recommended settings
- Tile size (512–1024): Standard — matches most models’ native processing resolution
- Stride (256–512): Spacing between tile origins — higher stride means less overlap, faster processing but more visible seams. Half the tile size is a safe default
- When to use: Enable when working with images above 2048x2048 or when you see out-of-memory errors
Common mistakes
- Using very small tiles on a large image — too many tiles means dramatically more processing time and more seam boundaries to blend
- Setting stride equal to tile size (zero overlap) — you’ll see visible grid lines in the output where tiles meet
Also called
tile_diffusion, tile_diffusion_size, tile_diffusion_stride, tile_size, tile_stride, tile_vae, tile_vae_decoder_size, tile_vae_encoder_size, tiling_mode
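A sketch of how tile origins are laid out, and why the stride controls the overlap (illustrative helper, not the plugin’s API):

```python
# Illustrative only: tile start positions along one axis; overlap = tile - stride.
def tile_origins(length, tile, stride):
    xs = list(range(0, max(length - tile, 0) + 1, stride))
    if xs[-1] + tile < length:       # ensure the final tile reaches the edge
        xs.append(length - tile)
    return xs

# A 4096 px axis with 1024 px tiles and a 512 px stride (50% overlap):
print(tile_origins(4096, 1024, 512))   # [0, 512, 1024, ..., 3072]
```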
Translation
In short
Translates audio or video dialogue from one language to another.
What it does
Converts spoken content from a source language to a target language — either as a dubbed audio track or as part of a full video translation pipeline. Some models auto-detect the source language, others require you to specify it.
How to think about it
Like sending your timeline to a dubbing house, but the AI handles both the translation and the voice performance. The result is a new audio track (or video with baked-in audio) in the target language, often preserving the original speaker’s voice characteristics.
Recommended settings
- Source language: Set explicitly when you know it — auto-detect works for common languages but may fail on dialects or code-switching
- Target language: The language you want the output in — use standard language codes (en, es, fr, de, ja, etc.)
- When to use: Localizing content for international audiences, creating multilingual versions of the same video
Common mistakes
- Relying on auto-detect for niche languages or accented speech — specify the source language explicitly for better results
- Expecting perfect lip sync in the translated version — AI dubbing matches timing approximately, not frame-perfectly
Also called
output_language, source_lang, target_lang, target_language
Video chunking
In short
Splits video processing into manageable segments for memory efficiency.
What it does
When processing video, the model breaks the input into chunks of frames, processes each chunk, then assembles the output. Controls include chunk size (how many frames per batch), overlap between chunks (for smooth transitions), and decode chunk size (how many frames are decoded at once from the compressed format).
How to think about it
Like rendering a long sequence in Premiere using “Use Previews” — instead of processing the entire timeline at once (which might crash), the system works through it in sections. The overlap between chunks is like having handles on each clip — it ensures smooth continuity where chunks meet.
Recommended settings
- Batch frames (4–8): Fewer frames per batch uses less memory but processes slower
- Batch frames (16–32): More frames per batch is faster but requires more memory — reduce if you see errors
- Overlap (2–4 frames): Enough overlap for smooth chunk boundaries — increase if you see flicker at chunk edges
- Sample stride (1): Process every frame — highest quality. Higher stride skips frames for speed
Common mistakes
- Setting batch frames too high and running into memory errors — start low and increase until you hit the limit
- Setting overlap to zero and getting visible “jumps” every N frames where chunks were stitched together
Also called
batch_frames, decode_chunk_size, overlap, overlapping_tiles, sample_stride
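A sketch of how chunk boundaries share frames (illustrative only; the model does the equivalent internally):

```python
# Illustrative only: frame ranges per chunk; neighbors share `overlap` frames.
def chunk_ranges(total_frames, batch_frames, overlap):
    step = batch_frames - overlap
    return [(start, min(start + batch_frames, total_frames))
            for start in range(0, total_frames, step)]

# 100 frames, 16 frames per batch, 4 shared frames between neighbors:
print(chunk_ranges(100, 16, 4))   # [(0, 16), (12, 28), (24, 40), ...]
```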
Camera control
Section titled “Camera control”In short
Section titled “In short”Moves the virtual camera during video generation — pan, tilt, zoom, rotate, and dolly.
What it does
Section titled “What it does”Controls virtual camera movement during AI video generation. You can set horizontal and vertical angles (pan and tilt), zoom level, forward/backward movement (dolly), rotation, and even wide-angle lens distortion. Some models offer preset camera motions, others give you manual axis-by-axis control.
How to think about it
Section titled “How to think about it”Like programming a camera move on a motorized head or gimbal. Horizontal angle is pan (left/right), vertical angle is tilt (up/down), move forward is dolly (push in), and rotate is roll. The wide-angle lens option is like switching from a 50mm to a 14mm — it adds barrel distortion and a wider field of view.
Recommended settings
Section titled “Recommended settings”- Subtle motion (low values): Natural-feeling camera drift — good for adding life to static AI-generated scenes
- Moderate motion: Deliberate camera moves — pan to reveal, tilt to follow action, push in for emphasis
- Strong motion: Dramatic camera work — use sparingly, as aggressive camera moves combined with AI generation can produce artifacts
- Zoom (negative): Pull out / zoom out — good for reveal shots. Positive values push in
Common mistakes
Section titled “Common mistakes”- Combining multiple strong camera moves simultaneously (pan + zoom + tilt) — the AI struggles to maintain consistency with too many motion axes active at once
- Using aggressive camera control on short clips — the motion has no time to develop and looks like a jarring jump instead of a smooth move
Also called
Section titled “Also called”advanced_camera_control, camera, camera_angle, camera_control, horizontal_angle, move_forward, rotate_right_left, vertical_angle, wide_angle_lens, zoom, zoom_out_percentage
Other parameters
Text input
In short
Secondary text fields that give the AI additional context beyond your main prompt.
What it does
These are specialized text inputs that supplement your primary prompt. Editing models use source/target prompt pairs to understand what to change. Rendering models use gen_text to place visible words in the output. Detection models use a detection prompt to know what to look for.
How to think about it
Like giving different notes to different departments on set. Your main prompt is the director’s vision. The source prompt is the script supervisor’s continuity note (“this is what we have”). The target prompt is the revision (“this is what we want”). Gen_text is the prop department’s signage order.
Recommended settings
- Source + target prompts: Be specific about the difference — “a red car” to “a blue car” works better than vague descriptions (see the sketch after this list)
- Gen_text: Keep it short — AI text rendering degrades quickly past 5–6 words
- Detection prompt: Use simple, direct language — “a person wearing a hat” not “someone who appears to be wearing headwear”
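As a concrete illustration, here is a hypothetical edit request using a source/target pair. The parameter names follow the aliases under "Also called"; the exact schema depends on the model.

```python
# Hypothetical edit request: source/target prompts always travel as a pair.
edit_request = {
    "prompt": "a vintage car parked on a rainy street at night",
    "source_prompt": "a red car",   # what the model should locate in the input
    "target_prompt": "a blue car",  # what that element should become
}
```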
Common mistakes
- Writing the same thing in the main prompt and additional_prompt — they stack, so you’re doubling the emphasis and may get exaggerated results
- Using source_prompt without target_prompt (or vice versa) on editing models — they work as a pair
Also called
additional_prompt, source_prompt, target_prompt, gen_text, new_text, detection_prompt, original_vgl, new_vgl
Output settings
In short
Controls the file format, codec, and encoding quality of the delivered output.
What it does
Determines what kind of file the model delivers and how it is compressed. You can specify video codec (H.264, HEVC), file format (mp4, webm, gif, png), quality level, target bitrate, and write mode. These settings affect file size, compatibility, and visual fidelity.
How to think about it
Like the Export Settings dialog in Premiere. Codec and CRF control the compression tradeoff, output type is your container format, and bitrate sets the data rate ceiling. Getting these right means the AI output drops into your timeline without a re-encode.
Recommended settings
- H.264 + mp4: Maximum compatibility — plays everywhere, imports cleanly into Premiere (a settings sketch follows this list)
- CRF 18–23: Good quality-to-size ratio for AI-generated content — lower for hero shots, higher for drafts
- Match your timeline: If your sequence is ProRes, consider webm or high-bitrate mp4 to minimize generation artifacts before transcode
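A hypothetical settings dict for a maximum-compatibility delivery might look like the following; parameter names follow the aliases under "Also called," and the defaults vary by model.

```python
# Hypothetical export settings; names follow the aliases below.
output = {
    "output_type": "mp4",
    "codec": "h264",
    "crf": 20,               # lower CRF means higher quality and larger files
    "output_bitrate": None,  # leave unset so CRF drives quality, not a fixed rate
}
```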
Common mistakes
- Choosing gif for video output longer than 3 seconds — file sizes explode and color depth drops to 256 colors
- Setting output bitrate very low to save space — AI-generated content has lots of fine detail that compresses poorly at low bitrates
Also called
codec, crf, output_quality, output_type, output_bitrate, output_write_mode, H264_output
Video frame settings
In short
Controls how many frames are generated or extracted and at what intervals.
What it does
Sets the frame count, sampling interval, and extraction behavior for video generation and processing models. You can cap the total frames, request an exact count, control how many frames each clip segment contains, or limit processing to just the first few seconds of a long input.
How to think about it
Like setting in/out points and frame handles on a Premiere timeline. Max frames is your out point — it caps how long the generation runs. Frame interval is like setting a poster frame frequency — it controls how densely the model samples your input. First_n_seconds is a quick way to preview a long clip without processing the whole thing.
Recommended settings
- Max frames: Set to match your timeline gap — 120 frames at 24fps gives you a 5-second clip (the arithmetic is worked through after this list)
- Frame interval (1): Every frame — best quality. Higher values skip frames for faster processing of long inputs
- First_n_seconds (3–5): Good for previewing how a model handles your footage before committing to a full-length generation
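The frames-to-duration arithmetic is worth keeping at hand; this is plain Python, independent of any model API.

```python
# Frame-count arithmetic for matching a generation to a timeline gap.
def frames_for(duration_s: float, fps: float) -> int:
    return round(duration_s * fps)

frames_for(5, 24)   # 120 frames: a 5-second clip at 24fps
48 / 24             # 2.0 seconds: 48 frames at 24fps
48 / 12             # 4.0 seconds: the same 48 frames at 12fps lasts twice as long
```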
Common mistakes
- Setting number_of_frames without considering your FPS — 48 frames at 24fps is 2 seconds, but at 12fps it’s 4 seconds
- Using a high frame_interval on footage with fast motion — skipped frames mean the model misses key movements
Also called
max_frame_num, max_frames, number_of_frames, frames_per_clip, frame_interval, frame_type, frame_index, first_n_seconds
Quantity and iteration
In short
How many outputs to generate and how many processing passes to run.
What it does
Controls batch size (how many separate results you get per generation), processing depth (how many layers or passes refine the output), and recursive operations (like interpolation passes that compound with each run). More outputs and more passes mean better selection and quality, but proportionally higher cost and time.
How to think about it
Like shooting multiple takes and choosing the best one. Number of images is your take count — generate 4 and pick the winner. Layers and iterations are like additional polish passes in color or audio mixing. Recursive interpolation is like running Optical Flow twice — each pass doubles your frame count.
Recommended settings
- Number of images (2–4): Good for picking the best result without excessive cost — especially useful for hero shots
- Max iterations: Start with the default. Only increase if you see quality improve visibly between passes
- Recursive interpolation (1–2): One pass doubles frames, two passes quadruple them — you rarely need more than 2 (see the arithmetic after this list)
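The growth compounds quickly. Assuming each pass doubles the frame count, as described above:

```python
# Frame growth under recursive interpolation, assuming each pass doubles
# the count (per the description above).
frames = 24
for passes in range(1, 4):
    frames *= 2
    print(passes, frames)
# 1 48
# 2 96
# 3 192: exponential growth is why 3+ passes gets expensive fast
```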
Common mistakes
- Generating 10+ images per prompt during exploration — you’re paying for outputs you won’t even look at carefully
- Setting recursive interpolation passes to 3+ — each pass doubles the frame count, so totals grow exponentially, and quality gains plateau after 2
Also called
num_clips, num_results, num_layers, number_of_images, series_amount, max_iterations, recursive_interpolation_passes
Style and mode
In short
Preset visual styles, effects, and intensity controls that shape the overall look.
What it does
Applies predefined visual aesthetics to the generation — from text-based style descriptions to reference textures, specific visual effects (blur, sketch, pixelate), and intensity sliders that control how strongly the style is applied. Some models offer curated effect presets; others accept freeform style prompts.
How to think about it
Like applying a LUT plus creative effects in Premiere, but at generation time. The style prompt is your creative brief to the colorist. Effect type is like choosing a specific filter. Intensity is the opacity slider — 0 means no effect, 1 means full strength. Photo shot presets are like telling a camera operator “give me a close-up” versus “wide establishing shot.”
Recommended settings
- Style prompt: Be specific — “warm cinematic with shallow depth of field” works better than “nice looking”
- Intensity (0.5–0.7): Good starting point — strong enough to see the effect, subtle enough to look natural (see the sketch after this list)
- Effect type: Match to your project — sketch effects for storyboard work, blur for dream sequences, pixelate for retro aesthetics
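If intensity behaves like an opacity slider, one plausible reading (an assumption for intuition, not a documented formula) is a linear blend between the untouched and fully stylized pixel values:

```python
# Linear-blend reading of the intensity slider. This is an assumption;
# individual models may apply intensity nonlinearly or at another stage.
def apply_intensity(original: float, stylized: float, intensity: float) -> float:
    return (1 - intensity) * original + intensity * stylized

apply_intensity(0.2, 0.8, 0.5)  # 0.5: halfway between the two looks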
Common mistakes
- Setting intensity to 1.0 on every generation — full-strength effects often look heavy-handed and artificial
- Combining a strong style prompt with a conflicting effect type — the model gets mixed signals and produces inconsistent results
Also called
style_prompt, style_description, target_style, target_texture, effect_type, intensity, pikaffect, photo_shot
Virtual try-on
In short
AI clothing fitting — places garments onto people in images.
What it does
Takes a photo of a person and a photo of a garment, then generates a realistic composite showing the person wearing that clothing. You specify the garment category (upper body, lower body, full body), the type of garment photo you’re providing, and whether the model should auto-segment the garment or use the full image.
How to think about it
Like a digital fitting room for e-commerce or costume design. Instead of physically trying on clothes, the AI composites the garment onto the person while accounting for body shape, pose, and lighting. The garment image is like a swatch — the AI adapts it to fit the person’s body in the photo.
Recommended settings
- Category: Match to your garment — upper body for shirts and jackets, lower body for pants and skirts, full body for dresses and jumpsuits (a request sketch follows this list)
- Garment photo type (flat lay): Best results — a clean, unobstructed view of the garment gives the AI the most information
- Segmentation free off (default): Let the model segment the garment for more precise fitting around edges
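Put together, a hypothetical try-on request might look like this; the names follow the aliases under "Also called," and the accepted values vary by model. The file names are placeholders.

```python
# Hypothetical try-on request; names follow the aliases below.
tryon = {
    "person_image": "talent_photo.png",    # placeholder file names
    "garment_image": "jacket_flatlay.png",
    "category": "upper_body",              # shirts and jackets
    "garment_photo_type": "flat_lay",      # clean, unobstructed garment view
    "segmentation_free": False,            # let the model segment the garment
}
```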
Common mistakes
- Using a garment photo with a busy background — the model may pick up background elements as part of the clothing
- Choosing the wrong category — fitting a full-body dress as “upper body” cuts off the bottom half of the garment
Also called
category, cloth_type, garment_type, garment_photo_type, segmentation_free
Vectorization
In short
Converts raster images to clean SVG vector paths for logos and illustrations.
What it does
Traces the edges and color regions of a pixel image and converts them into scalable vector paths (SVG format). Controls include path accuracy, corner detection, noise filtering, and layer organization. The result is a resolution-independent file that scales to any size without pixelation.
How to think about it
Like the Image Trace function in Adobe Illustrator. Path precision is your fidelity slider — higher values follow every pixel edge, lower values smooth and simplify. Filter speckle is like a minimum area threshold — it removes tiny noise artifacts that would become unnecessary paths. The output is meant for print, web, or motion graphics where you need infinitely scalable artwork.
Recommended settings
- Path precision (high): Use for detailed illustrations where accuracy matters — logos, technical drawings (a settings sketch follows this list)
- Path precision (low): Use for stylized, simplified results — poster art, icons
- Filter speckle (3–5): Removes small noise without losing intentional detail
- Snap grid on: Produces cleaner geometry — good for UI icons and geometric designs
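A hypothetical settings dict for a clean logo trace could look like the following; the names follow the aliases under "Also called," but scales and units are model-specific, so the values are illustrative only.

```python
# Hypothetical vectorization settings for a logo trace; values illustrative.
vectorize = {
    "path_precision": 8,     # higher follows edges more faithfully (scale varies)
    "filter_speckle": 4,     # drop tiny noise regions before they become paths
    "corner_threshold": 60,  # how aggressively corners are kept sharp vs. smoothed
    "snap_grid": True,       # align points for cleaner icon geometry
}
```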
Common mistakes
- Vectorizing a photograph and expecting clean results — vectorization works best on images with clear edges and flat color regions, not continuous-tone photos
- Setting path precision too high on a noisy source image — every speckle becomes a vector path, creating enormous files
Also called
path_precision, corner_threshold, filter_speckle, splice_threshold, snap_grid, cleanup_jaggy, cleanup_morph, layer_difference, hierarchical
Sampling advanced
In short
Low-level controls for the AI diffusion process — expert-only, leave at defaults unless troubleshooting.
What it does
Fine-tunes the mathematical noise schedule and guidance behavior during generation. These parameters control how the model distributes processing effort across steps, when guidance influence starts and stops, and how statistical averaging is applied across multiple samples. Changing these affects the fundamental character of the generation process.
How to think about it
Like tweaking render engine internals in After Effects or Nuke — these are the parameters that the engineers set, not the artists. Schedule_mu shapes the noise curve (similar to adjusting gamma on a levels control, but for the generation process). The cfg turn-off point is like removing your reference monitor partway through a grade and trusting your eye for the final touches.
Recommended settings
- Leave at defaults: These are tuned per model by the developers — changing them without understanding the math usually makes things worse
- Schedule_mu: Only adjust if you see banding or sudden quality changes partway through generation
- Turn_off_cfg_start_si: Advanced trick — removing guidance in late steps can produce more natural detail, but results vary by model
Common mistakes
- Changing multiple sampling parameters at once — if results improve or degrade, you won’t know which parameter caused it
- Copying advanced sampling settings between different models — these values are model-specific and rarely transfer well
Also called
schedule_mu, perturbation, last_scale_temp, t_min, t_max, smooth_start_si, turn_off_cfg_start_si, n_avg, n_min, n_max
Motion and physics
In short
Controls physical motion simulation, trajectories, and motion intensity in video generation.
What it does
Governs how objects and cameras move through space in generated video. Includes overall motion intensity scoring, physics simulation forces (gravity, projectile arcs), predefined camera or object trajectories, subject tracking, adaptive motion that responds to content, and shape preservation that prevents objects from warping during movement.
How to think about it
Like combining a motion-control rig with a physics simulation in After Effects. Motion score is your overall energy dial — low for a locked-off interview, high for an action sequence. Trajectories are like preset camera moves on a dolly or crane. Shape preservation is like enabling “Preserve Rigid Bodies” in a physics sim — it stops solid objects from bending like rubber.
Recommended settings
- Motion score (low): Calm, controlled movement — good for product shots, portraits, and talking heads (two contrasting sketches follow this list)
- Motion score (high): Dynamic, energetic — good for action, sports, and music video content
- Shape preservation (high): Use when objects must maintain their form — architecture, vehicles, rigid products
- Adapt motion on: Let the model decide motion intensity based on the content — good default for mixed scenes
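Two hypothetical setups make the contrast concrete; the scales are illustrative, so check each model's documented range for motion_score before copying values.

```python
# Hypothetical motion setups; names follow the aliases below, and the
# numeric scales are illustrative rather than documented ranges.
product_shot = {"motion_score": 2, "adapt_motion": False, "shape_preservation": 0.9}
action_beat  = {"motion_score": 8, "adapt_motion": True,  "shape_preservation": 0.9}
# shape_preservation stays high in both: rigid objects should hold their
# form whether the scene is calm or energetic.
```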
Common mistakes
- Setting motion score high on content with fine detail (text, faces, architecture) — high motion warps and distorts these elements
- Using physics forces without understanding that they simulate real-world physics — projectile force doesn’t mean “dramatic movement,” it means parabolic arcs with gravity
Also called
motion_score, goal_force, projectile_force, trajectory, trajectories, track, adapt_motion, shape_preservation
Music production
In short
Controls for AI music generation, stem separation, and audio editing.
What it does
Configures AI music generation: tempo (BPM), genre tags, song structure, target duration, stem separation (isolating vocals, drums, bass), instrumental-only mode, and audio editing operations like extending, trimming, or remixing existing tracks.
How to think about it
Like setting up a project in a DAW (GarageBand, Logic, Pro Tools). BPM is your session tempo — match it to your timeline’s beat markers for sync. Genres are like selecting instrument presets and style templates. Stems are like soloing individual tracks in a mix. Composition plan is your song’s arrangement chart — intro, verse, chorus, bridge, outro.
Recommended settings
- BPM: Match your timeline’s tempo — 120 BPM is standard pop/dance, 80–90 for hip-hop, 60–70 for ballads (the bar-to-seconds arithmetic follows this list)
- Duration: Set to match your timeline gap exactly — AI music that’s too short or too long means extra editing
- Stems (vocals): Extract to create karaoke versions or isolate dialogue from music beds
- Force instrumental on: Use for background music and underscore where vocals would compete with dialogue
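To size a request to a timeline gap, the bar-to-seconds arithmetic is simple; this is plain Python, with no modelBridge API involved.

```python
# Convert musical bars to seconds so the requested duration matches a gap.
def bars_to_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    return bars * beats_per_bar * 60 / bpm

bars_to_seconds(8, 120)  # 16.0: an 8-bar phrase at 120 BPM fills 16 seconds
bars_to_seconds(8, 90)   # ~21.3: the same phrase at 90 BPM runs longer
```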
Common mistakes
- Not setting BPM when your timeline has beat-synced edits — the generated music drifts out of sync with your cuts
- Using extend_duration on music with a clear ending — the AI continues past the natural conclusion, creating an awkward loop
Also called
bpm, genres, composition_plan, music_duration, music_length_ms, stems, instrumental, force_instrumental, edit_mode, extend_duration
Detection and analysis
In short
AI-powered object detection, image segmentation, and visual analysis parameters.
What it does
Configures models that identify and label elements in images rather than generating new content. You specify what to look for (detection prompt or object class), the type of analysis (detection, segmentation, classification, captioning), analysis density and confidence thresholds, and whether to overlay results visually on the output.
How to think about it
Like using Premiere’s auto-tagging or After Effects’ Roto Brush in analysis mode — the AI examines your image and reports back what it found, where things are, and how confident it is. Points per side is like the resolution of the analysis grid — more points means finer segmentation but slower processing. Confidence thresholds are like setting a minimum match quality for auto-keying.
Recommended settings
- Detection prompt: Be specific about what you want found — “red car” works better than “vehicle”
- Confidence threshold (0.5–0.7): Good balance between catching real objects and filtering false positives (a filtering sketch follows this list)
- Points per side (16–32): Standard analysis density — increase for precise mask edges, decrease for speed
- Show visualization on: Useful for verifying detections before using the data downstream
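Here is an illustrative post-filter of detections by confidence. The result structure is hypothetical, since actual response schemas vary by model, but the thresholding logic is the general idea.

```python
# Illustrative confidence filtering; the detection dicts are hypothetical.
detections = [
    {"label": "person", "confidence": 0.91, "box": [120, 40, 380, 620]},
    {"label": "person", "confidence": 0.42, "box": [700, 55, 880, 590]},
    {"label": "hat",    "confidence": 0.77, "box": [150, 20, 300, 130]},
]
threshold = 0.6
kept = [d for d in detections if d["confidence"] >= threshold]
# keeps the 0.91 person and the 0.77 hat; drops the 0.42 false positive
```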
Common mistakes
- Setting the confidence threshold too low and getting dozens of false-positive detections cluttering the output
- Running detailed analysis on every frame of a video when you only need a few keyframes — analysis is per-frame, and costs add up
Also called
detection_prompt, object, object_name, task_type, points_per_side, pred_iou_thresh, stability_score_thresh, show_visualization, detailed_analysis
Miscellaneous
In short
One-off parameters that appear on individual models and don’t fit other categories.
What it does
Covers model-specific settings that are unique to one or a handful of models. These parameters don’t have enough commonality across models to warrant their own section, but they still affect the output in meaningful ways.
How to think about it
Like custom effect controls on a third-party Premiere plugin — each plugin has its own unique settings that don’t map to any standard control. The parameter name usually hints at what it does, and the default value is almost always a safe starting point.
Recommended settings
- Start with defaults: Unfamiliar parameters almost always have sensible defaults — change one at a time and compare results
- Check the help icon: Click the ⓘ icon in modelBridge for context on any unfamiliar parameter
- Small changes first: Adjust by 10–20% from the default, compare, then adjust further if needed
Common mistakes
- Changing multiple unfamiliar parameters at once — if results change, you won’t know which setting was responsible
- Ignoring model-specific parameters entirely — they exist because they meaningfully affect the output for that model