From Images to Videos: A Deep Dive into the Wan2.1-I2V Workflow

ComfyUI.org
2025-04-01 14:20:43

Unlock AI-powered video generation with Alibaba's Wan2.1 model! Learn how to create stunning videos from static images using this workflow guide.

Use Case: Video
Best For: Video
Models: Wan2.1
VRAM: Low VRAM (≤8GB)
Reading Time: 3 min



Required Models

  • Wan2.1

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: Low VRAM (≤8GB), although the VRAM note under Notes lists ≥16GB for the 14B model.

1. Workflow Overview


This workflow uses Alibaba's Wan2.1 model to generate videos from static images (image-to-video, I2V). Key features:

  • Extracts image features with a CLIP vision encoder

  • Processes multilingual prompts with a T5-family text encoder (UMT5-XXL)

  • Generates video latents with the 14B-parameter Wan2.1-I2V model

  • Outputs animated WEBP/MP4 files


2. Core Models

Model Name | Function | File
Wan2.1-I2V-14B | Main video generator (480P) | Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors
UMT5-XXL Text Encoder | Handles multilingual prompts | umt5-xxl-enc-fp8_e4m3fn.safetensors
OpenCLIP Vision Encoder | Extracts image semantics | open-clip-xlm-roberta-large-vit-huge-14_visual_fp16.safetensors
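Before opening the workflow, it can help to confirm that the three files listed above are actually in place. The sketch below is a minimal check that assumes the models/wanvideo/ folder given in the Notes, sitting under a ComfyUI root folder named `ComfyUI`; adjust the path to your install. The helper name `missing_models` is hypothetical, and the VAE is not checked because its filename is not listed in this table.

```python
from pathlib import Path
from typing import List, Union

# Filenames copied from the Core Models table.
REQUIRED_MODELS = [
    "Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors",
    "umt5-xxl-enc-fp8_e4m3fn.safetensors",
    "open-clip-xlm-roberta-large-vit-huge-14_visual_fp16.safetensors",
]


def missing_models(model_dir: Union[str, Path]) -> List[str]:
    """Return the required filenames that are not regular files in model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_MODELS if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumed layout: <ComfyUI root>/models/wanvideo/ (see Notes); adjust as needed.
    missing = missing_models(Path("ComfyUI") / "models" / "wanvideo")
    if missing:
        print("Missing model files:")
        for name in missing:
            print("  -", name)
    else:
        print("All listed model files found.")
```

Returning the list of missing names (rather than a single True/False) makes it easy to see exactly which download is still needed.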


3. Key Nodes

Node Name | Function | Installation | Dependencies
WanVideoSampler | Controls video sampling (frames/CFG) | Requires WanVideo plugin | Main model + VAE
WanVideoImageClipEncode | Encodes input image to latent | Requires WanVideo plugin | CLIP vision model
VHS_VideoCombine | Combines frames (supports audio) | Install ComfyUI-VideoHelperSuite | FFmpeg required
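Since VHS_VideoCombine lists FFmpeg as a dependency, a quick pre-flight check can save a failed run. This is a minimal sketch using only the Python standard library; it only looks for an `ffmpeg` executable on PATH, and VideoHelperSuite may have other ways of locating FFmpeg, so treat a negative result as a hint rather than a verdict. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of `ffmpeg` if it is on PATH, else None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path: str) -> str:
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    ffmpeg = find_ffmpeg()
    if ffmpeg is None:
        print("ffmpeg not found on PATH")
    else:
        print(ffmpeg_version_line(ffmpeg))
```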


4. Workflow Structure

  • Group 1: Input Processing

    • LoadImage: Loads input image (e.g., 576x1024)

    • WanVideoTextEncode: Processes prompts (e.g., "A smiling ancient beauty")

  • Group 2: Model Loading

    • LoadWanVideoT5TextEncoder: Loads T5 encoder

    • WanVideoModelLoader: Loads 14B video model

  • Group 3: Video Generation

    • WanVideoSampler: Generates the video latent (30 frames, CFG=6)

    • WanVideoDecode: Decodes the latent into an image sequence with the VAE
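To get a feel for what the sampler works on, here is a rough latent-shape calculation. It assumes the Wan2.1 VAE's published compression (4× in time with the first frame encoded on its own, 8× in height and width, 16 latent channels); the function names are illustrative, and the WanVideo nodes may round frame counts differently. Under these assumptions, frame counts of the form 4k+1 map cleanly, so the 30-frame setting above would come back as 29 decoded frames.

```python
from typing import NamedTuple

# Assumed Wan2.1 VAE factors: 4x temporal, 8x spatial compression, 16 channels.
TEMPORAL_STRIDE = 4
SPATIAL_STRIDE = 8
LATENT_CHANNELS = 16


class LatentShape(NamedTuple):
    channels: int
    frames: int
    height: int
    width: int


def latent_shape(num_frames: int, height: int, width: int) -> LatentShape:
    """Estimate the latent tensor shape for num_frames at height x width."""
    if num_frames < 1:
        raise ValueError("num_frames must be >= 1")
    if height % SPATIAL_STRIDE or width % SPATIAL_STRIDE:
        raise ValueError("height and width must be multiples of 8")
    # First frame stands alone; each later group of 4 frames -> 1 latent frame.
    latent_frames = 1 + (num_frames - 1) // TEMPORAL_STRIDE
    return LatentShape(LATENT_CHANNELS, latent_frames,
                       height // SPATIAL_STRIDE, width // SPATIAL_STRIDE)


def decoded_frames(latent_frames: int) -> int:
    """Number of pixel frames the VAE would produce from latent_frames."""
    return 1 + (latent_frames - 1) * TEMPORAL_STRIDE


if __name__ == "__main__":
    shape = latent_shape(30, 272, 272)  # 30 frames at 272x272, as in this article
    print(shape)  # LatentShape(channels=16, frames=8, height=34, width=34)
    print(decoded_frames(shape.frames))  # 29
```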


5. Inputs & Outputs

  • Required Inputs:

    • Image file (PNG/JPG)

    • Positive prompt (e.g., style description)

    • Negative prompt (e.g., "low quality, static")

  • Outputs:

    • Animated WEBP (default) or MP4

    • Resolution: 272x272 (adjustable)
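Because the output resolution is adjustable while the input image may have a different aspect ratio (the 576x1024 example is portrait, the 272x272 default is square), a small helper can pick an output size that keeps the input's shape within a given pixel budget. This sketch assumes both sides should be multiples of 16 (8× VAE downsampling times Wan2.1's 2×2 spatial patches); that constraint and the name `fit_resolution` are assumptions, not settings taken from the workflow.

```python
import math
from typing import Tuple

# Assumption: Wan2.1 needs sides divisible by 16 (8x VAE downsampling x 2x2 patches).
MULTIPLE = 16


def fit_resolution(src_w: int, src_h: int, max_area: int,
                   multiple: int = MULTIPLE) -> Tuple[int, int]:
    """Scale (src_w, src_h) to about max_area pixels, keeping the aspect
    ratio, then round each side down to a multiple of `multiple`."""
    if min(src_w, src_h, max_area, multiple) <= 0:
        raise ValueError("all arguments must be positive")
    scale = math.sqrt(max_area / (src_w * src_h))
    w = max(multiple, int(src_w * scale) // multiple * multiple)
    h = max(multiple, int(src_h * scale) // multiple * multiple)
    return w, h


if __name__ == "__main__":
    # Fit the 576x1024 example input into the 272x272 default pixel budget.
    print(fit_resolution(576, 1024, 272 * 272))  # (192, 352)
```

Rounding down (rather than to the nearest multiple) keeps the result from exceeding the pixel budget, which matters when VRAM is tight.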


6. Notes

⚠️ Troubleshooting:

  1. VRAM: The 14B model requires a GPU with ≥16GB of VRAM; enable bf16 precision.

  2. Plugin: Manual installation is required (clone into ComfyUI/custom_nodes/):

    git clone https://github.com/AI-ModelScope/comfyui-wanvideo-plugin
  3. Models: Place all .safetensors files in models/wanvideo/.
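The VRAM figure is easier to reason about with a back-of-envelope estimate. The sketch below computes the memory for the raw weights of a nominal 14B-parameter model at a given storage dtype; it ignores activations, the text and vision encoders, and the VAE, so real usage is higher. It also models storage dtype only: a "precision" option in the loader may refer to the compute dtype instead, which this sketch does not cover. The table and function names are hypothetical.

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {"fp8_e4m3fn": 1, "fp16": 2, "bf16": 2, "fp32": 4}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory for the raw weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp8_e4m3fn", "bf16"):
        print(f"{dtype}: {weight_gib(14e9, dtype):.1f} GiB")
    # fp8_e4m3fn: 13.0 GiB
    # bf16: 26.1 GiB
```

The fp8 estimate (about 13 GiB for weights alone) is consistent with the ≥16GB guidance once the encoders and working memory are added.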

FAQ