Seedance 2.0 Prompt Guide: The Complete Image-to-Video Workflow for Ad Creative (2026)
Wanderson Jackson
Quick answer: Seedance 2.0 is ByteDance's quad-modal AI video model that accepts text, image, video, and audio in a single generation. For performance ads, the workflow that wins is image-to-video: generate a hero frame in Nano Banana 2, upload it to Seedance 2.0 as @Image1, and prompt only what should move. Keep prompts between 30 and 100 words using the structure Subject, Action, Camera, Style. Never use negative prompts. Timestamps like [00:00-00:05] act as hard editorial cuts.
What is Seedance 2.0?
Seedance 2.0 is ByteDance's flagship AI video generation model, released in early 2026. It is the first publicly available model to accept four input modalities (text, image, video, and audio) in a single generation, and the first to produce video with native, lip-synced audio in the same pass.
The hard specs:
Output length: 4 to 15 seconds, up to 1080p, 24 or 30 fps
Aspect ratios: 16:9, 2.35:1, 9:16, 1:1
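Reference inputs per generation: up to 9 images, 3 videos (15 seconds combined), and 3 audio files (15 seconds combined), capped at 12 reference files in total. The per-type limits add up to 15, so you cannot max out all three types in the same generation.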
Universal Reference system: every uploaded asset receives an @ tag (@Image1, @Video2, @Audio1) that the prompt calls out by name
Native audio: dialogue with lip-sync, ambient sound, and music are generated alongside the video, not added in post
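A quick way to sanity-check a reference set before uploading is to encode the limits from the spec list above. This is a minimal sketch, not part of any Seedance API: the `Reference` class and `check_references` function are hypothetical names, and the code assumes "15 seconds combined" means the summed duration per media type.

```python
from dataclasses import dataclass

# Limits copied from the spec list above. All names here are illustrative.
MAX_PER_KIND = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL_FILES = 12
MAX_COMBINED_SECONDS = 15.0  # assumed to apply to video and audio separately


@dataclass
class Reference:
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # duration; ignored for images


def check_references(refs):
    """Return a list of limit violations (an empty list means the set is OK)."""
    counts = {kind: 0 for kind in MAX_PER_KIND}
    seconds = {"video": 0.0, "audio": 0.0}
    problems = []
    for ref in refs:
        if ref.kind not in counts:
            problems.append(f"unknown kind: {ref.kind}")
            continue
        counts[ref.kind] += 1
        if ref.kind in seconds:
            seconds[ref.kind] += ref.seconds
    for kind, limit in MAX_PER_KIND.items():
        if counts[kind] > limit:
            problems.append(f"{counts[kind]} {kind} files exceeds the limit of {limit}")
    for kind, total in seconds.items():
        if total > MAX_COMBINED_SECONDS:
            problems.append(f"{kind} references total {total:g}s, over {MAX_COMBINED_SECONDS:g}s")
    total_files = sum(counts.values())
    if total_files > MAX_TOTAL_FILES:
        problems.append(f"{total_files} files exceeds the {MAX_TOTAL_FILES}-file total")
    return problems
```

Nine images plus three 5-second videos passes cleanly. Adding a single audio file to that set trips the 12-file total even though every per-type limit is respected.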
The multi-asset reference system is the reason this model matters for ad work. The same person uploaded as @Image1 keeps the same face across generations. The same hero bottle keeps the same label and proportions. That consistency is the difference between a one-off clip and a campaign of 30 ad variants.
Seedance 2.0 vs Veo 3, Sora 2, and Kling 2.1
| Capability | Seedance 2.0 | Veo 3 | Sora 2 | Kling 2.1 |
| --- | --- | --- | --- | --- |
| Max clip length | 15s | 8s | 20s | 10s |
| Max resolution | 1080p | 1080p | 1080p | 1080p |
| Native audio | Yes (dialogue, SFX, music) | Yes (limited) | Yes | No |
| Reference inputs | 9 image + 3 video + 3 audio | 1 image | Text + 1 image | 1 image |
| Character consistency | Strong via @ tags | Moderate | Strong | Weak |
| Prompt language | Plain editorial English | Plain English | Plain English | Cinematography jargon |
| Best for | Multi-shot campaigns, ad creative | Single hero shots | Narrative scenes | Stylized motion |
The decisive row is reference inputs. Nine reference images plus three reference videos let an ad team lock a character, lock a product, and lock a motion style across every scene in a campaign. None of the other three models in this comparison accepts more than one reference image.
How do you write a Seedance 2.0 prompt?
Every Seedance 2.0 prompt that performs well follows the same four-block structure:
Subject. Who or what is in the scene, described visually.
Action. Exactly what they do, with verbs that carry motion intensity.
Camera. Shot type, lens behavior, and movement.
Style. Color, lighting, lens, atmosphere.
Length target: 30 to 100 words. Past 100, the model deprioritizes later instructions. Below 30, motion comes out stiff because the prompt gives Seedance too little detail about how intense the motion should be.
A working example:
A chef in a white linen apron plates a glossy short rib over polenta, fingertips precise, garnishing with microgreens. Slow push-in from medium shot to extreme close-up on the plate as steam rises. Warm golden-hour kitchen light through a single window, shallow depth of field, anamorphic flare, Sony Venice cinematic look.
The subject is anchored in sentence one. Action carries motion verbs (plates, garnishing). The camera move is named (slow push-in). Style is concrete: named camera body, named lighting condition, specific lens behavior.
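The four-block structure and the 30 to 100 word target are easy to enforce in a small helper. The sketch below is illustrative only: `build_prompt` and `length_status` are hypothetical names, and counting whitespace-separated words is an assumption about how length should be measured.

```python
def build_prompt(subject, action, camera, style):
    """Join the four blocks in order, one sentence-terminated block each."""
    blocks = [subject, action, camera, style]
    sentences = [b.strip().rstrip(".") + "." for b in blocks if b.strip()]
    return " ".join(sentences)


def length_status(prompt, low=30, high=100):
    """Classify a prompt against the 30 to 100 word target."""
    words = len(prompt.split())
    if words < low:
        return "too short"
    if words > high:
        return "too long"
    return "ok"
```

Keeping the blocks as separate arguments makes it harder to drop one by accident, and the length check can run before every generation.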
Three rules no competing guide documents
These come from internal testing and the official ByteDance prompting notes that never made it into the public docs.
Timestamps are editorial cuts, not labels. Seedance reads [00:00-00:05] style blocks as hard cut instructions. Use them when you want a multi-shot sequence in a single generation.
Named camera bodies shift the learned aesthetic. "Sony Venice," "Arri Alexa," "Sony A7S3," and "Phantom Flex" each produce a recognizably different look. Use them like style anchors.
"Fast" is a quality tax. The single word that degrades output more than any other is "fast." If you need pace, pick one fast element (cuts OR camera OR subject motion) and hold the rest steady. Three fast elements at once produces jitter and artifacts.
How do you use reference images in Seedance 2.0?
Upload up to 9 images. Each gets an automatic @ tag in the order uploaded: @Image1, @Image2, and so on. The prompt then calls out what each asset contributes.
There are three patterns that cover 90% of ad workflows.
1. Character or product anchor. Upload one image. Reference it throughout the prompt.
@Image1 sits at the bar, lifts the glass slowly to her lips, eyes locked on the camera.
2. Style transfer. Upload an image with the exact lighting and color grade you want.
A man walks through a city alley at night. Match the lighting and color grade of @Image2.
3. Multi-asset composition. Character in @Image1, location in @Image2, product in @Image3.
@Image1 walks through @Image2, holding @Image3. Tracking shot left to right.
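Because tags are assigned by upload order, it is worth checking that a prompt only calls out tags that actually exist. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and the tag format (`@Image`, `@Video`, or `@Audio` followed by a number) is inferred from the examples in this guide rather than from an official grammar.

```python
import re

# Tag shape inferred from the examples above, e.g. @Image1, @Video2, @Audio1.
TAG_PATTERN = re.compile(r"@(?:Image|Video|Audio)\d+")


def assign_tags(filenames, kind="Image"):
    """Map tags to files in upload order: first file gets @Image1, and so on."""
    return {f"@{kind}{i}": name for i, name in enumerate(filenames, start=1)}


def unknown_tags(prompt, tags):
    """Tags the prompt mentions that were never uploaded."""
    return sorted(set(TAG_PATTERN.findall(prompt)) - set(tags))


def unused_tags(prompt, tags):
    """Uploaded tags the prompt never mentions."""
    return sorted(set(tags) - set(TAG_PATTERN.findall(prompt)))
```

With two uploaded images, the multi-asset prompt above would report `@Image3` as unknown, which is a cheap way to catch a missed upload before spending a generation.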
The image-to-video workflow that beats text-to-video for ads
Text-to-video makes Seedance imagine everything. Image-to-video makes it execute a frame you already approved. For ads, this distinction is decisive.
Text-to-video drifts. Even with detailed prompts, output varies: slightly different face, slightly different bottle, slightly different room. For one-off content, drift is fine. For a campaign of 20 ads featuring the same character or product, drift is fatal.
Image-to-video locks the visual contract. The reference image freezes brand identity. Seedance only animates. Brand stays consistent across the campaign. Motion varies per scene.
What camera movements does Seedance 2.0 understand?
Seedance responds to plain editorial English rather than dense cinematography jargon. The terms below produce reliable camera behavior.
| Movement | Use when |
| --- | --- |
| Slow push-in | Building intimacy or revealing detail on a still subject |
| Dolly out | Revealing scale, opening a scene |
| Tracking shot | Following a moving subject laterally |
| Crane down / crane up | Vertical reveal, scale transition |
| Whip pan | Hard transition between two subjects |
| Orbit | Product reveal, 360 degrees around the subject |
| Handheld | Documentary, UGC, urgency |
| Static / locked off | Atmosphere, dialogue, dramatic stillness |
| Drone push | Establishing shots, exterior scale |
| Aerial pull-back | Closing shot, reveal of context |
The critical rule: describe camera movement and subject movement separately, in different sentences.
Correct: "The dancer spins slowly. Camera holds fixed framing."
Wrong: "Spinning camera around a dancing person."
Mixing the two is the single most common cause of jittery, unusable output.
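One way to make the separation hard to forget is to keep subject motion and camera motion in separate fields and only join them at the end. This is a small illustrative sketch; `motion_sentences` is a hypothetical helper, and the check for the word "camera" is a crude heuristic rather than a real parser.

```python
def motion_sentences(subject_motion, camera_motion):
    """Emit subject motion and camera motion as two separate sentences."""
    if "camera" in subject_motion.lower():
        # Heuristic only: the subject sentence should not describe the camera.
        raise ValueError("subject motion should not mention the camera")
    subject = subject_motion.strip().rstrip(".") + "."
    camera = "Camera " + camera_motion.strip().rstrip(".") + "."
    return f"{subject} {camera}"
```

Called with `"The dancer spins slowly"` and `"holds fixed framing"`, it reproduces the correct example above.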
How do you keep characters and products consistent across shots?
The @ reference system is the answer. Three rules govern it.
Rule 1: One asset, one role. @Image1 should represent one thing: your character OR your product OR your environment. Never a composite frame where the model has to guess which element to lock.
Rule 2: Reference by tag in every action sentence. "@Hero turns toward the door" outperforms "She turns toward the door" even when @Hero is the obvious subject. The model needs explicit anchoring on every instruction.
Rule 3: Re-anchor after cuts. Spatial continuity breaks every time Seedance cuts. After a cut, restate position and facing direction.
Cut to wide shot. @Hero now standing at the window, facing left, sunlight on her face.
Without re-anchoring, the second shot will place the character somewhere else with no continuity.
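Since timestamp blocks are read as hard cuts, a multi-shot prompt can be assembled shot by shot, with a check that every shot re-anchors on a tag. The sketch below is an assumption-laden illustration: `multi_shot_prompt` is a hypothetical helper, shot durations are whole seconds, the `[MM:SS-MM:SS]` format is copied from the examples in this guide, and "contains any @ tag" is only a stand-in for real re-anchoring.

```python
import re

TAG = re.compile(r"@\w+")  # any @ tag, e.g. @Hero or @Image1


def clock(seconds):
    """Format whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def multi_shot_prompt(shots, max_seconds=15):
    """Build timestamped blocks from (duration_seconds, description) pairs."""
    blocks, start = [], 0
    for duration, description in shots:
        if duration <= 0:
            raise ValueError("each shot needs a positive duration")
        if not TAG.search(description):
            raise ValueError(f"shot at {clock(start)} names no @ tag to re-anchor on")
        end = start + duration
        blocks.append(f"[{clock(start)}-{clock(end)}] {description.strip()}")
        start = end
    if start > max_seconds:
        raise ValueError(f"{start}s total exceeds the {max_seconds}s clip limit")
    return " ".join(blocks)
```

The tag check does not verify position or facing direction, so the restated anchor ("now standing at the window, facing left") still has to be written by hand in each description.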
5 production-ready Seedance 2.0 prompt templates
Copy, paste, replace the variables, generate.
1. Product hero (ecommerce, 9:16)
@Product1 rotating slowly on a matte black turntable. Macro lens crawls around the product, catching every label detail and texture transition. Soft top light with a single rim light creating clean specular highlights. Deep black background, no environment, studio commercial style, shallow depth of field, Phase One look.
2. UGC creator testimonial (9:16, talking head)
@Creator1 holds @Product1 up to camera, smiling, speaks one sentence about why they use it. Handheld phone selfie aesthetic, natural window light, slight micro-shake, casual bedroom or kitchen background out of focus. Direct eye contact with lens, iPhone front camera look. Audio: creator delivers the line in their natural cadence, no background music.
3. Founder testimonial (16:9, dialogue)
@Founder1 sits at a wooden desk, speaks to camera. Slow push-in from medium shot to medium close-up over 8 seconds. Warm three-point lighting, soft key from camera left, gentle fill, hair light. Books and a laptop in the background, shallow depth of field, Sony A7S3 aesthetic. Audio: founder delivers dialogue with natural cadence, ambient room tone.
4. Cinematic brand spot (2.35:1, multi-shot)
[00:00-00:05] Wide aerial shot, @Location1 at golden hour, slow drone push toward the building. [00:05-00:10] Hard cut to interior medium shot, @Character1 walking through the entrance, tracking shot left to right. [00:10-00:15] Cut to extreme close-up on @Character1's eyes, jaw setting, slight micro-expression of recognition, static frame. Warm cinematic color grade, anamorphic flare, Arri Alexa look throughout.
5. Before / after demo (16:9, locked-off)
@Product1 sits unused on a kitchen counter. Static medium shot. Hard cut. Same counter, same framing, now @Product1 in use, hands cropped at frame edge demonstrating the action. Bright natural daylight, clean white kitchen, commercial product video style, locked-off camera both shots.
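If you reuse these templates across many products, the variable swap can be scripted. The following is a minimal sketch using Python's standard `string.Formatter`; the `{product}` and `{creator}` placeholders and the helper names are hypothetical conventions for this example, and the template text is shortened from template 5 above.

```python
from string import Formatter

# Shortened from template 5, with the tag replaced by a named placeholder.
BEFORE_AFTER = (
    "{product} sits unused on a kitchen counter. Static medium shot. Hard cut. "
    "Same counter, same framing, now {product} in use."
)


def template_fields(template):
    """Names of the placeholders a template expects."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


def fill(template, **tags):
    """Substitute @ tags into a template, refusing missing or malformed tags."""
    missing = [f for f in template_fields(template) if f not in tags]
    if missing:
        raise KeyError(f"missing tags for: {', '.join(missing)}")
    for name, tag in tags.items():
        if not tag.startswith("@"):
            raise ValueError(f"{name} should be an @ tag, got {tag!r}")
    return template.format(**tags)
```

For example, `fill(BEFORE_AFTER, product="@Image1")` swaps the uploaded hero frame into both halves of the before and after shot.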
The full 33-scene library covering narrative, sports, ASMR, comedy, sci-fi, and fantasy is published as a companion guide. Link at the bottom of this page.
Common Seedance 2.0 mistakes and how to fix them
| Mistake | Why it fails | Fix |
| --- | --- | --- |
| Using negative prompts ("not blurry, not low quality") | The model does not parse negatives | Use positive constraints: "ultra sharp, anatomically accurate, natural motion" |
| Mixing camera and subject movement in one sentence | Output becomes shaky and incoherent | Describe each in a separate sentence |
| Using "fast" everywhere | Degrades quality of cuts, motion, and the scene | Choose one fast element, hold the rest steady |
| Off-screen state references ("the door is now open") | The model cannot infer state it has not rendered | Show state changes on camera before referencing them |
|  |  | Describe the physics: "her jaw clenches, fingers tap against her glass" |
| Prompts over 100 words | Late instructions get deprioritized | Tighten to 30 to 100 words, move detail to references |
| Exit and re-entry in the same continuous shot | Implicit cut breaks continuity | Treat exit-frame as a hard cut, re-anchor after |
| Describing age ("a young woman, a teen boy") | Age markers introduce safety filters and inconsistent renders | Use functional labels: "a figure in a wool cloak," "the rider" |
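Several of these mistakes can be caught mechanically before a prompt is submitted. The linter below is a rough heuristic sketch, not an official tool: `lint_prompt` is a hypothetical name, the word lists are assumptions, and bare "no" is deliberately left out of the negation check to limit false positives on descriptive phrases.

```python
import re

# Heuristic patterns; both word lists are assumptions for illustration.
NEGATION = re.compile(r"\b(?:not|never|don't)\b", re.IGNORECASE)
FAST = re.compile(r"\bfast(?:er|est)?\b", re.IGNORECASE)


def lint_prompt(prompt, low=30, high=100):
    """Return warnings for length, negative phrasing, and overuse of 'fast'."""
    warnings = []
    words = len(prompt.split())
    if words < low:
        warnings.append(f"{words} words: under the {low}-word floor")
    elif words > high:
        warnings.append(f"{words} words: over the {high}-word ceiling")
    if NEGATION.search(prompt):
        warnings.append("negative phrasing: restate as a positive constraint")
    if len(FAST.findall(prompt)) > 1:
        warnings.append('"fast" appears more than once: keep one fast element')
    return warnings
```

A clean result does not mean the prompt is good; it only means it avoids the three patterns checked here. Camera and subject mixing, off-screen state, and age markers still need a human read.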
The Seedance 2.0 + Nano Banana 2 pipeline
For performance ad creative, image-to-video beats text-to-video on every metric that matters: brand consistency, iteration speed, and revision cost. Here is the pipeline that produces shippable ads end to end.
Step 1. Hero frame in Nano Banana 2. Prompt Nano Banana 2 with your brand colors, lighting, composition, and product or character. Iterate until the frame is on-brand at a single-image level. This is your visual contract.
Step 2. Frame approval. Generate 4 to 8 variations of the hero frame. Pick the one that holds up under a brand QA lens. This is the only stage where brand approval lives.
Step 3. Upload to Seedance 2.0. The hero frame becomes @Image1. Optionally upload a style reference as @Image2 or a motion reference video as @Video1 if you have a target motion style from a prior shoot.
Step 4. Motion prompt. Write the Seedance prompt focused only on motion: what moves, how it moves, camera behavior. Static elements live in the reference image, not the prompt.
Step 5. Generate. Iterate motion only. The brand is locked. You are now art-directing the shot, not designing it.
Step 6. Chain shots for sequences over 15 seconds. Generate shot 1. Use the last frame as @Image1 for shot 2. Continue. This is how teams produce 30 to 60 second pieces from a 15 second model.
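The chaining in Step 6 comes down to simple arithmetic: split the target runtime into clips that each fit the 4 to 15 second window, then feed each shot the last frame of the one before it. This is a planning sketch only, with hypothetical names, whole-second durations, and an even split chosen for illustration. It does not call Seedance or extract frames.

```python
import math


def plan_chain(total_seconds, min_clip=4, max_clip=15):
    """Split a runtime into evenly sized clips and note each shot's @Image1 source."""
    if total_seconds < min_clip:
        raise ValueError(f"runtime must be at least {min_clip} seconds")
    shots = math.ceil(total_seconds / max_clip)      # fewest clips that fit
    base, extra = divmod(total_seconds, shots)       # spread the remainder
    durations = [base + 1] * extra + [base] * (shots - extra)
    plan = []
    for index, duration in enumerate(durations, start=1):
        source = "hero frame" if index == 1 else f"last frame of shot {index - 1}"
        plan.append((index, duration, source))
    return plan
```

A 45-second piece becomes three 15-second generations, while a 40-second piece becomes clips of 14, 13, and 13 seconds, each within the model's range.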
This is the workflow Avocado AI users run for performance creative every day. Storyboards holds every Nano Banana 2 frame and every Seedance generation in sequence so you build a campaign instead of orphan clips.
How to run Seedance 2.0 on Avocado AI
Avocado AI gives you the full Seedance 2.0 toolkit inside a multiplayer creative canvas. You upload your reference image, prompt the motion, and generate. Storyboards holds every shot in sequence so a campaign comes together as one artifact, not 40 disconnected MP4s.
The Avocado workflow:
Open a new Storyboard.
Drop your Nano Banana 2 hero frame onto the canvas.
Draw a connection from the frame to a video node.
Write the motion prompt in the video node.
Generate. The output renders inline, ready to chain.
Pricing starts at €19 per month on Intro. Most performance creative teams operate on Growth at €99 per month, which covers enough credits for 40 to 80 short-form ad variants. Pro at €249 unlocks the full quad-modal pipeline plus team seats.
What is the best structure for a Seedance 2.0 prompt?
Subject, Action, Camera, Style. Keep it between 30 and 100 words. Lead with who or what is in the scene, then what they do, then how it is filmed, then visual treatment. Use plain editorial English, not cinematography jargon.
Can Seedance 2.0 generate video longer than 15 seconds?
Not in a single generation. To extend, generate 15 seconds, use the last frame as @Image1 in a new generation, and continue. This frame-chaining technique is how teams produce 30 to 60 second pieces.
Does Seedance 2.0 support negative prompts?
No. Use positive constraints. Instead of "not blurry," write "ultra sharp." Instead of "no extra fingers," write "anatomically accurate hands." The model does not parse what to avoid.
How many reference files can I upload to Seedance 2.0?
Up to 12 reference files per generation, drawn from per-type limits of 9 images, 3 videos (15 seconds combined maximum), and 3 audio files (15 seconds combined maximum). Each receives an automatic @ tag in upload order.
What is the difference between Seedance 1.5 and Seedance 2.0?
Seedance 2.0 adds quad-modal input (video and audio references in addition to text and image), the @ reference tag system, native lip-synced audio, and roughly 3 times the maximum output length. It is a different category of model, not an incremental update.
How does Seedance 2.0 compare to Veo 3?
Seedance generates longer clips (15 seconds vs 8 seconds), accepts far more reference assets (12 vs 1), and produces native synced audio with dialogue. Veo 3 has slightly tighter physics on some motion types. For ad workflows that require character or product consistency across a campaign, Seedance wins by a wide margin.
Why do my Seedance 2.0 outputs look stiff or low energy?
Two causes. Either the prompt is under 30 words and the model has too little motion guidance, or the action verbs are weak. Replace "moves" with named actions: "lunges," "pivots," "drops to one knee." Replace "looks" with "locks eyes on," "scans the room," "narrows her gaze."
Is Seedance 2.0 free to use?
ByteDance offers limited free generations through Doubao. For production volume and the full multi-asset reference workflow, third-party platforms like Avocado AI bundle credits at predictable monthly pricing starting at €19 per month.
Author: Wanderson Jackson is the founder of Avocado AI, a multiplayer AI creative workspace built for performance ad teams and creative agencies.