Transform Your Videos into Stylized Animations with Advanced AI Technology

ComfyUI.org
2025-04-08 12:55:45

This workflow transforms input videos into stylized animations using the Wan2.1 model, with AnimeLineArt and DepthAnything providing control inputs. It combines ControlNet, T5 text encoding, and frame interpolation to produce dynamic content.

Use Case: Video
Best For: Video
Key Nodes: ControlNet
VRAM: Low VRAM (≤8GB)
Reading Time: 4 min


Required Models

  • Flux
  • Wan2.1
  • ControlNet

Required Nodes

  • ControlNet

Setup Notes

  • Install the required models before opening the workflow template.
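  • Recommended hardware: the page tags this workflow as Low VRAM (≤8GB), but the Notes section lists a 16GB minimum (24GB+ recommended) for Wan2.1.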
  • Use the download button above to import the workflow JSON into ComfyUI.

1. Workflow Overview

  • Purpose: Transforms input videos into stylized animations using the Wan2.1 model, with dual control from line art (AnimeLineArt) and depth maps (DepthAnything).

  • Key Tech: Combines ControlNet, T5 text encoding, and frame interpolation for dynamic content.

2. Core Models

| Model Name | Function |
|---|---|
| Wan2.1-Fun-Control-14B | Main model for video generation (FP8 optimized). |
| AnimeLineArtPreprocessor | Extracts line art from input video for style control. |
| DepthAnythingPreprocessor | Generates depth maps for spatial consistency. |
| Florence2-Flux-Large | Auto-generates captions for video frames. |

3. Key Nodes & Installation

| Node Name | Function | Installation |
|---|---|---|
| WanVideoWrapper | Core nodes for video generation (model loading, sampling, encoding). | GitHub: ComfyUI-WanVideoWrapper |
| ControlNet Aux | Preprocessors for line art and depth maps. | ComfyUI Manager: comfyui-controlnet-aux |
| Video Helper Suite | Video loading/combining tools. | ComfyUI Manager: comfyui-videohelpersuite |
| Florence2 | Image captioning. | GitHub: comfyui-florence2 |

Required Models:

  • Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors (main model).

  • umt5-xxl-enc-bf16.safetensors (T5 encoder).
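Before opening the workflow, it can help to confirm that both files are actually in place. The following is a minimal sketch, not part of the workflow itself: the file names come from the list above, but the subfolders `models/diffusion_models` and `models/text_encoders` and the install path `ComfyUI` are assumptions about a typical layout, so adjust them to match your setup.

```python
from pathlib import Path

# File names are taken from the list above.
# The subfolders are ASSUMED defaults and may differ in your install.
REQUIRED_MODELS = {
    "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors": "models/diffusion_models",
    "umt5-xxl-enc-bf16.safetensors": "models/text_encoders",
}


def find_missing_models(comfy_root, required=REQUIRED_MODELS):
    """Return the required file names that are not present under comfy_root."""
    root = Path(comfy_root)
    return [
        name
        for name, subdir in required.items()
        if not (root / subdir / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is a hypothetical install path used only for illustration.
    missing = find_missing_models("ComfyUI")
    print("Missing:", missing or "none")
```

An empty result means both files were found where the script expected them; a non-empty list names the files still to download or move.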

4. Workflow Structure

  1. Input Group (上传视频及参考图, "Upload video and reference image"):

    • Inputs: Raw video (VHS_LoadVideo), reference image (LoadImage).

    • Process:

      • Frame extraction → Line art + depth map generation.

      • Caption generation via Florence2Run.

    • Outputs: Preprocessed images + text prompts.

  2. Model Loading (wan模型, "Wan model"):

    • Loads Wan2.1, the T5 encoder, and the VAE, then configures optimizations (TorchCompile, BlockSwap).

  3. Generation Group (采样生成, "Sampling and generation"):

    • Inputs: Preprocessed images, text prompts, control args.

    • Process:

      • Text encoding (WanVideoTextEncode) → Image encoding (WanVideoImageToVideoEncode) → Sampling (WanVideoSampler).

    • Outputs: Latent video representation.

  4. Output Group:

    • Decodes latent to images (WanVideoDecode) → Combines video (VHS_VideoCombine).
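The four groups above form a dependency graph, and ComfyUI runs a node only after everything feeding it has finished. The sketch below models that ordering with Python's standard `graphlib` module. The node names are the ones listed above, but the exact wiring between them is a simplified assumption for illustration and may not match the real workflow JSON.

```python
from graphlib import TopologicalSorter

# Maps each node to the set of upstream nodes it depends on.
# The wiring is an ASSUMED simplification of the groups described above.
DEPENDS_ON = {
    "AnimeLineArtPreprocessor": {"VHS_LoadVideo"},
    "DepthAnythingPreprocessor": {"VHS_LoadVideo"},
    "Florence2Run": {"VHS_LoadVideo"},
    "WanVideoTextEncode": {"Florence2Run"},
    "WanVideoImageToVideoEncode": {
        "LoadImage",
        "AnimeLineArtPreprocessor",
        "DepthAnythingPreprocessor",
    },
    "WanVideoSampler": {"WanVideoTextEncode", "WanVideoImageToVideoEncode"},
    "WanVideoDecode": {"WanVideoSampler"},
    "VHS_VideoCombine": {"WanVideoDecode"},
}


def execution_order(graph=DEPENDS_ON):
    """Return one valid order in which the nodes could run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order()
print(" -> ".join(order))
```

Several orders are valid because the preprocessors and the captioner do not depend on each other. What stays fixed is that sampling follows both encoders and that `VHS_VideoCombine` runs last.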

5. Inputs & Outputs

  • Inputs:

    • Video (MP4), reference image (PNG).

    • Resolution: 768x768 (adjusted via ImageResizeKJ).

    • Prompts: Auto-generated (Florence2) or manual (example includes positive/negative prompts).

  • Output:

    • Stylized video (H.264 MP4, 16fps).
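The output frame rate and the 81-frame load cap mentioned in the Notes together determine clip length. The sketch below works through that arithmetic. The helper names are hypothetical, and the rule that frame counts should have the form 4k + 1 is an assumption about Wan2.1 (81 fits that pattern), not something this page states.

```python
def clip_seconds(frame_count, fps=16):
    """Duration in seconds of a clip with frame_count frames at fps."""
    return frame_count / fps


def frames_for(seconds, fps=16, cap=81):
    """Largest frame count of the form 4k + 1 that fits in `seconds`
    without exceeding `cap`. The 4k + 1 rule is an ASSUMPTION."""
    wanted = max(1, min(int(seconds * fps), cap))
    return ((wanted - 1) // 4) * 4 + 1


print(clip_seconds(81))  # 5.0625 seconds at 16 fps
print(frames_for(5))     # 77
print(frames_for(10))    # 81, limited by the cap
```

With these defaults, the 81-frame cap corresponds to just over five seconds of output, so longer source videos are truncated unless the cap is raised.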

6. Notes

  • VRAM: Minimum 16GB (recommended 24GB+ due to Wan2.1 size).

  • Common Errors:

    • Frame limit exceeded: Adjust frame_load_cap (set to 81 frames in this workflow).

    • Line art failure: Ensure the input video contains motion.

  • Optimization:

    • Enable fp8 mode for lower VRAM usage.

    • Tweak BlockSwap for memory management.
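A rough back-of-the-envelope estimate shows why fp8 and BlockSwap matter for a 14B-parameter model. This is a sketch under stated assumptions: it counts model weights only (no activations, T5 encoder, or VAE), it assumes the weights are spread evenly over 40 transformer blocks, and the names `weight_gib`, `resident_gib`, and `blocks_to_swap` are hypothetical helpers rather than the wrapper's actual API.

```python
def weight_gib(params_billion, bytes_per_param):
    """Approximate weight size in GiB for a model with the given parameter count."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def resident_gib(total_gib, blocks_to_swap, total_blocks=40):
    """Weights left on the GPU after offloading some blocks.
    ASSUMES weights are evenly split across total_blocks (40 is a guess)."""
    if not 0 <= blocks_to_swap <= total_blocks:
        raise ValueError("blocks_to_swap must be between 0 and total_blocks")
    return total_gib * (total_blocks - blocks_to_swap) / total_blocks


fp8 = weight_gib(14, 1)   # 1 byte per parameter, about 13 GiB
bf16 = weight_gib(14, 2)  # 2 bytes per parameter, about 26 GiB

print(f"fp8 weights:  {fp8:.2f} GiB")
print(f"bf16 weights: {bf16:.2f} GiB")
print(f"fp8 with 20 blocks swapped: {resident_gib(fp8, 20):.2f} GiB on GPU")
```

Under these assumptions the fp8 weights alone come close to filling a 16GB card, which is consistent with the 16GB minimum above, and swapping half the blocks to system memory roughly halves the resident weight footprint at the cost of transfer time.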

FAQ