Discover the Ultimate Video Transformation Workflow: Wan2.1 VACE Unleashed

ComfyUI.org
2025-04-25 09:40:35

Transform videos into stylized animations with Wan2.1 VACE, Pose Control, and Depth Control. Discover how to leverage AI models for stunning visual effects and learn how to use this workflow to elevate your video editing skills.

Use Case: Video
Best For: Video
VRAM: Medium VRAM (12–16GB)
Reading Time: 4 min

Content type: Workflow

Primary intent: Download

Required Models

  • Flux
  • Wan2.1

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: Medium VRAM (12–16GB).

1. Workflow Overview

  • Purpose:
    This workflow transforms input videos into stylized animations using Wan2.1 VACE with:

    • Pose Control (OpenPose) and Depth Control (Depth Map)

    • Frame interpolation (FILM VFI) and video upscaling

    • Auto-prompt generation via Florence2

  • Core Models:

    • Wan2.1 VACE: Main video generation model for style transfer

    • Florence2: Image captioning model for auto-prompts

    • DepthAnything V2: Depth map generator for structural control

    • FILM VFI: Frame interpolation model (16FPS → 32FPS)
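The 16FPS → 32FPS step is easiest to reason about as frame arithmetic. The sketch below assumes the interpolator keeps every source frame and inserts (multiplier - 1) synthetic frames between each consecutive pair; the function names are hypothetical, and whether the FILM VFI node adds or drops a trailing frame should be checked against its own settings.

```python
def interpolated_frame_count(num_frames: int, multiplier: int) -> int:
    """Frame count after inserting (multiplier - 1) frames between each consecutive pair.

    Assumption: original frames are kept and nothing is added after the last one.
    """
    if num_frames < 1 or multiplier < 1:
        raise ValueError("num_frames and multiplier must be positive")
    return (num_frames - 1) * multiplier + 1


def duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with the given frame count and frame rate."""
    return num_frames / fps


if __name__ == "__main__":
    source_frames = 81  # hypothetical clip length at 16FPS
    output_frames = interpolated_frame_count(source_frames, 2)
    print(output_frames)                         # 161
    print(duration_seconds(source_frames, 16))   # 5.0625
    print(duration_seconds(output_frames, 32))   # 5.03125
```

Doubling the frame count while doubling the frame rate keeps the clip length essentially unchanged; the motion simply looks smoother.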


2. Key Nodes

  • WanVideoModelLoader: Loads the Wan2.1 model
    • Installation: ComfyUI-WanVideoWrapper
    • Dependencies: Download the models from HuggingFace

  • DepthAnything_V2: Generates depth maps
    • Installation: ComfyUI-DepthAnythingV2
    • Dependencies: Requires depth_anything_v2_vitl_fp16.safetensors

  • Florence2Run: Auto-generates prompts
    • Installation: ComfyUI-Florence2
    • Dependencies: Load the Florence-2-Flux-Large model

  • FILM VFI: Frame interpolation
    • Installation: ComfyUI-Frame-Interpolation (a custom node pack, not part of core ComfyUI)
    • Dependencies: Download film_net_fp32.pt

  • VHS_VideoCombine: Video rendering/export
    • Installation: ComfyUI-VideoHelperSuite
    • Dependencies: Requires FFmpeg
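Before loading the workflow it can help to confirm that the weight files named in the Key Nodes section are present and that FFmpeg is reachable. This is a minimal sketch, not part of the workflow: it assumes a local install in a folder called `ComfyUI` (a hypothetical path) and searches the whole tree, since custom node packs may keep their checkpoints in their own subfolders.

```python
import os
import shutil
from pathlib import Path

# Weight files named in the Key Nodes section; extend the list for your setup.
REQUIRED_FILES = [
    "depth_anything_v2_vitl_fp16.safetensors",
    "film_net_fp32.pt",
]


def find_missing_files(search_root: str, filenames: list[str]) -> list[str]:
    """Return the names from `filenames` that appear nowhere under `search_root`."""
    found = set()
    for _dirpath, _dirnames, files in os.walk(search_root):
        found.update(files)
    return [name for name in filenames if name not in found]


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable is on PATH (VHS_VideoCombine requires FFmpeg)."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    comfy_root = Path("ComfyUI")  # hypothetical install location
    missing = find_missing_files(str(comfy_root), REQUIRED_FILES)
    print("missing model files:", missing or "none")
    print("ffmpeg on PATH:", ffmpeg_available())
```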


3. Workflow Structure

Group 1: Input Setup

  • Inputs: Video file, reference image, seed, resolution cap (e.g., 1280x720)

  • Outputs: Preprocessed frames
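One common reading of a resolution cap is "shrink the frame so its longer side does not exceed the cap, keeping the aspect ratio". The sketch below implements that reading; the longer-side interpretation and the rounding down to a multiple of 16 are assumptions (Wan-based pipelines generally expect dimensions divisible by 16), so check the resize node the workflow actually uses. `fit_to_cap` is a hypothetical helper.

```python
def fit_to_cap(width: int, height: int, cap: int = 1280, multiple: int = 16) -> tuple[int, int]:
    """Shrink (width, height) so the longer side is at most `cap`, keeping aspect ratio.

    Both sides are then rounded down to a multiple of `multiple` (an assumed constraint).
    Integer arithmetic avoids float rounding surprises.
    """
    if width <= 0 or height <= 0 or cap <= 0:
        raise ValueError("width, height and cap must be positive")
    longest = max(width, height)
    if longest > cap:
        width = width * cap // longest
        height = height * cap // longest
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height


if __name__ == "__main__":
    print(fit_to_cap(1920, 1080))  # (1280, 720)
    print(fit_to_cap(1080, 1920))  # (720, 1280)
```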

Group 2: Control Generation

  • Pose Control: OpenPose keypoints via DWPreprocessor

  • Depth Control: Depth maps via DepthAnything_V2

  • Prompts: Manual input or auto-generated by Florence2
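The choice between a manual prompt and a Florence2 caption is simple fallback logic. The sketch assumes the manual prompt wins whenever it is non-empty and that an optional style prefix (for example "Night scene") is prepended to whichever text is used; `choose_prompt` and `style_prefix` are hypothetical names, not inputs of the Florence2Run node.

```python
from typing import Optional


def choose_prompt(manual_prompt: Optional[str], auto_caption: Optional[str],
                  style_prefix: str = "") -> str:
    """Prefer a non-empty manual prompt; otherwise fall back to the Florence2 caption."""
    manual = (manual_prompt or "").strip()
    caption = (auto_caption or "").strip()
    body = manual or caption
    if not body:
        raise ValueError("no prompt: enter one manually or enable Florence2 captioning")
    prefix = style_prefix.strip()
    # Prepend the optional style text to whichever prompt was chosen.
    return f"{prefix}, {body}" if prefix else body


if __name__ == "__main__":
    caption = "a woman dancing in a studio"  # stand-in for a Florence2 caption
    print(choose_prompt("", caption, style_prefix="Night scene"))
```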

Group 3: Video Generation

  • Wan2.1 Model: Generates latent video frames

  • VACE Encoding: Encodes frames for model processing
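To see what the Wan2.1 model actually denoises, it helps to work out the latent shape of a clip. The sketch assumes the Wan2.1 VAE's compression factors (4× in time with the first frame encoded on its own, 8× in each spatial dimension, 16 latent channels); treat these factors and the helper names as assumptions to verify against the model you load.

```python
def snap_frame_count(num_frames: int, t_stride: int = 4) -> int:
    """Round a frame count down to the nearest value of the form t_stride * k + 1."""
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    return (num_frames - 1) // t_stride * t_stride + 1


def wan_latent_shape(num_frames: int, height: int, width: int,
                     t_stride: int = 4, s_stride: int = 8, channels: int = 16):
    """Return (channels, latent_frames, latent_height, latent_width) for a pixel-space clip."""
    if (num_frames - 1) % t_stride:
        raise ValueError(f"num_frames must be {t_stride}*k + 1, got {num_frames}")
    if height % s_stride or width % s_stride:
        raise ValueError(f"height and width must be divisible by {s_stride}")
    # The first frame maps to one latent frame; each further group of t_stride frames adds one.
    latent_frames = (num_frames - 1) // t_stride + 1
    return channels, latent_frames, height // s_stride, width // s_stride


if __name__ == "__main__":
    frames = snap_frame_count(100)
    print(frames, wan_latent_shape(frames, 720, 1280))  # 97 (16, 25, 90, 160)
```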

Group 4: Post-Processing

  • Frame Interpolation: Interpolates from 16FPS to 32FPS with FILM VFI

  • Video Export: Combines frames into MP4
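Outside ComfyUI, the export step can be approximated with a plain FFmpeg command that encodes a numbered image sequence into an H.264 MP4. This is a sketch, not what VHS_VideoCombine runs internally; the frame pattern, output name, and CRF value are illustrative assumptions.

```python
import shlex


def build_ffmpeg_cmd(frame_pattern: str, fps: int, out_path: str, crf: int = 19) -> list[str]:
    """Build an ffmpeg argument list that encodes numbered frames into an H.264 MP4."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-framerate", str(fps),      # input frame rate, e.g. 32 after interpolation
        "-i", frame_pattern,         # e.g. frames/%05d.png
        "-c:v", "libx264",           # H.264 video codec
        "-pix_fmt", "yuv420p",       # widely compatible pixel format
        "-crf", str(crf),            # quality: lower means better quality and larger files
        out_path,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_cmd("frames/%05d.png", 32, "output/Video/result.mp4")
    print(shlex.join(cmd))
    # To encode for real: import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems when the command is passed to `subprocess.run`.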


4. Inputs & Outputs

  • Required Inputs:

    • Video file (MP4)

    • Reference image (e.g., Girl_85_Highres.png)

    • Positive prompt (e.g., "Night scene, a dancing girl")

    • Resolution cap (default: 1280)

  • Output:

    • Final video (saved to output/Video)

    • Intermediate results (depth maps, pose keypoints)
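If you drive the workflow from a script, the inputs above map naturally onto a small config object. This is a hypothetical sketch: `WorkflowInputs` and its field names are not part of the workflow, the 1280 default and the `output/Video` folder mirror the values listed above, the accepted image extensions are an assumption, and the checks only look at file extensions.

```python
from dataclasses import dataclass
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed accepted formats


@dataclass
class WorkflowInputs:
    """Hypothetical container for the workflow's required inputs."""
    video: Path                            # source clip (MP4)
    reference_image: Path                  # style reference, e.g. Girl_85_Highres.png
    positive_prompt: str                   # e.g. "Night scene, a dancing girl"
    resolution_cap: int = 1280             # default cap listed above
    output_dir: Path = Path("output/Video")

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the inputs look usable."""
        problems = []
        if self.video.suffix.lower() != ".mp4":
            problems.append(f"expected an MP4 video, got {self.video.name}")
        if self.reference_image.suffix.lower() not in IMAGE_EXTENSIONS:
            problems.append(f"unsupported reference image: {self.reference_image.name}")
        if self.resolution_cap <= 0:
            problems.append("resolution cap must be a positive number of pixels")
        return problems


if __name__ == "__main__":
    inputs = WorkflowInputs(Path("dance.mp4"), Path("Girl_85_Highres.png"),
                            "Night scene, a dancing girl")
    print(inputs.validate())  # []
```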


5. Notes

  1. Hardware:

    • At least 12GB VRAM; with less, use WanVideoBlockSwap to offload model blocks

    • Enable Triton/SageAttention for a roughly 20–50% speedup

  2. Troubleshooting:

    • Download missing models via ComfyUI Manager

    • Depth control is more stable than pose control

  3. Optimization:

    • Set blocks_to_swap in WanVideoBlockSwap to 30–40; swapping more blocks lowers VRAM use but slows generation
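The effect of blocks_to_swap can be estimated with simple proportions. The sketch assumes all transformer blocks are the same size and that swapped blocks sit in system RAM; it ignores activations, the text encoder, and the VAE, so the result is only the block-weight share of VRAM, not the total. The function name and the model size in the example are hypothetical.

```python
def resident_block_gb(total_block_gb: float, num_blocks: int, blocks_to_swap: int) -> float:
    """Estimate transformer-block weights left on the GPU after block swapping.

    Assumes equally sized blocks; swapped blocks are held in system RAM instead.
    Activations, text encoder and VAE memory are not included.
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    if not 0 <= blocks_to_swap <= num_blocks:
        raise ValueError("blocks_to_swap must be between 0 and num_blocks")
    return total_block_gb * (num_blocks - blocks_to_swap) / num_blocks


if __name__ == "__main__":
    # Hypothetical model: 40 blocks totalling 16 GB of weights.
    for swap in (0, 30, 40):
        print(swap, resident_block_gb(16.0, 40, swap))
```

The saving is linear in blocks_to_swap, while the cost (moving blocks between CPU and GPU on every step) grows with it too, which is why the setting is a speed/VRAM trade-off.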

FAQ