Transform Your Videos into Stylized Animations with Advanced AI Technology
Unlock the power of video stylization with our workflow! Transform input videos into stunning animations using the Wan2.1 model, AnimeLineArt, and DepthAnything. Discover how to harness ControlNet, T5 text encoding, and frame interpolation for dynamic content. Learn more and get started now!
- Use Case
- Video
- Best For
- Video
- Models
- Flux, Wan2.1, Controlnet
- Key Nodes
- Controlnet
- VRAM
- Low VRAM (≤8GB)
- Reading Time
- 4 min
Workflow Overview
Content type: Workflow
Primary intent: Download
Required Models
- Flux
- Wan2.1
- Controlnet
Required Nodes
- Controlnet
Setup Notes
- Install the required models before opening the workflow template.
- Recommended hardware: at least 16GB VRAM, 24GB+ recommended (see Notes).
- Use the download button above to import the workflow JSON into ComfyUI.
1. Workflow Overview

Purpose: Transforms input videos into stylized animations using the Wan2.1 model, with dual control from line art (AnimeLineArt) and depth maps (DepthAnything).
Key Tech: Combines ControlNet, T5 text encoding, and frame interpolation for dynamic content.
2. Core Models
| Model Name | Function |
|---|---|
| Wan2.1-Fun-Control-14B | Main video generation model (FP8-optimized). |
| AnimeLineArtPreprocessor | Extracts line art from the input video for style control. |
| DepthAnythingPreprocessor | Generates depth maps for spatial consistency. |
| Florence2-Flux-Large | Auto-generates captions for video frames. |
3. Key Nodes & Installation
| Node Name | Function | Installation |
|---|---|---|
| WanVideoWrapper | Core nodes for video generation (model loading, sampling, encoding). | GitHub: |
| ControlNet Aux | Preprocessors for line art and depth maps. | ComfyUI Manager: |
| Video Helper Suite | Video loading and combining tools. | ComfyUI Manager: |
| Florence2 | Image captioning. | GitHub: |
Required Models:
- Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors (main video model)
- umt5-xxl-enc-bf16.safetensors (T5 encoder)
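Before opening the workflow, it can help to confirm the two files are where ComfyUI will look for them. The sketch below assumes the diffusion model goes in `models/diffusion_models` and the T5 encoder in `models/text_encoders` under your ComfyUI folder; those sub-folder names are an assumption, so adjust `REQUIRED_MODELS` if your install or loader nodes expect other locations.

```python
from pathlib import Path

# Assumed sub-folder for each required file (relative to <ComfyUI>/models).
REQUIRED_MODELS: dict[str, list[str]] = {
    "diffusion_models": ["Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
}


def missing_models(comfy_root: str,
                   required: dict[str, list[str]] = REQUIRED_MODELS) -> list[str]:
    """Return 'subfolder/filename' for every required file not found on disk."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, filenames in required.items():
        for name in filenames:
            if not (models_dir / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing
```

For example, `print(missing_models("/path/to/ComfyUI"))` prints an empty list once both files are in place.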
4. Workflow Structure
Input Group (上传视频及参考图, "Upload Video and Reference Image"):
- Inputs: raw video (VHS_LoadVideo) and reference image (LoadImage).
- Process: frame extraction → line art + depth map generation; caption generation via Florence2Run.
- Outputs: preprocessed images + text prompts.
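Frame extraction is controlled by the loader's frame settings. As a rough model of how `frame_load_cap`, `skip_first_frames`, and `select_every_nth` on VHS_LoadVideo interact, the sketch below lists which source frame indices would be kept. It is an approximation written for illustration, not the node's actual code; treat edge-case behaviour as an assumption.

```python
def selected_frame_indices(total_frames: int,
                           frame_load_cap: int = 81,
                           skip_first_frames: int = 0,
                           select_every_nth: int = 1) -> list[int]:
    """Approximate which source frames are loaded.

    Skips the first `skip_first_frames`, keeps every `select_every_nth` frame,
    then stops after `frame_load_cap` frames (0 is treated as "no cap").
    """
    if select_every_nth < 1:
        raise ValueError("select_every_nth must be >= 1")
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:
        indices = indices[:frame_load_cap]
    return indices
```

With the default cap of 81, a 300-frame clip keeps frames 0 to 80; skipping 10 frames and taking every 2nd frame spreads the same 81 frames over source frames 10 to 170.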
Model Loading (wan模型, "Wan Model"): Loads Wan2.1, the T5 encoder, and the VAE, and configures optimizations (TorchCompile, BlockSwap).
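BlockSwap keeps some transformer blocks in CPU RAM and moves them to the GPU only when needed, trading speed for VRAM. The back-of-the-envelope sketch below estimates how much transformer weight stays on the GPU. It assumes roughly 14e9 parameters split evenly across 40 transformer blocks (the block count is an assumption) and ignores activations, the T5 encoder, and the VAE, so real usage will be higher.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8_e4m3fn": 1.0}


def resident_weight_gb(total_params: float, dtype: str,
                       num_blocks: int, blocks_to_swap: int) -> float:
    """Rough GPU-resident transformer weight size in GB.

    Assumes every parameter lives in one of `num_blocks` equal blocks and that
    `blocks_to_swap` of them are offloaded to CPU RAM.
    """
    if not 0 <= blocks_to_swap <= num_blocks:
        raise ValueError("blocks_to_swap must be between 0 and num_blocks")
    total_bytes = total_params * BYTES_PER_PARAM[dtype]
    on_gpu_fraction = (num_blocks - blocks_to_swap) / num_blocks
    return total_bytes * on_gpu_fraction / 1e9
```

Under these assumptions the FP8 weights alone take about 14 GB with no swapping (about 28 GB in bf16), and swapping half of the blocks brings that down to about 7 GB.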
Generation Group (采样生成, "Sampling & Generation"):
- Inputs: preprocessed images, text prompts, control arguments.
- Process: text encoding (WanVideoTextEncode) → image encoding (WanVideoImageToVideoEncode) → sampling (WanVideoSampler).
- Outputs: latent video representation.
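To get a feel for how large the latent video is, the sketch below works out its shape and the number of tokens the transformer attends over. It assumes a Wan2.1-style VAE with 16 latent channels, 4x temporal compression (the first frame encoded on its own) and 8x spatial compression, plus a (1, 2, 2) patch size in the transformer. These are stated assumptions, not values read from the workflow.

```python
def wan_latent_shape(num_frames: int, height: int, width: int,
                     channels: int = 16, t_stride: int = 4,
                     s_stride: int = 8) -> tuple[int, int, int, int]:
    """Latent (C, T, H, W) for an assumed 4x-temporal / 8x-spatial video VAE."""
    if (num_frames - 1) % t_stride:
        raise ValueError(f"num_frames should be {t_stride}k + 1, got {num_frames}")
    if height % s_stride or width % s_stride:
        raise ValueError("height and width must be multiples of the spatial stride")
    return (channels, (num_frames - 1) // t_stride + 1,
            height // s_stride, width // s_stride)


def transformer_tokens(latent_shape: tuple[int, int, int, int],
                       patch: tuple[int, int, int] = (1, 2, 2)) -> int:
    """Number of patch tokens for the given latent shape and patch size."""
    _, t, h, w = latent_shape
    return (t // patch[0]) * (h // patch[1]) * (w // patch[2])
```

For this workflow's defaults (81 frames at 768x768), that gives a latent of shape (16, 21, 96, 96) and 21 x 48 x 48 = 48,384 tokens, which is why frame count and resolution dominate memory use.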
Output Group: Decodes latents to images (WanVideoDecode) → combines the frames into a video (VHS_VideoCombine).
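VHS_VideoCombine handles encoding inside ComfyUI. If you instead save the decoded frames as numbered PNGs and want to re-encode them yourself, the sketch below builds a comparable ffmpeg command. The frame naming pattern and the CRF value of 19 are illustrative assumptions; the flags themselves are standard ffmpeg/libx264 options.

```python
def h264_mp4_command(frame_pattern: str, output: str,
                     fps: int = 16, crf: int = 19) -> list[str]:
    """Build an ffmpeg command that encodes a numbered image sequence as H.264 MP4."""
    return [
        "ffmpeg", "-y",               # overwrite the output if it exists
        "-framerate", str(fps),       # playback rate of the image sequence
        "-i", frame_pattern,          # e.g. "frames/frame_%05d.png"
        "-c:v", "libx264",            # H.264 encoder
        "-pix_fmt", "yuv420p",        # broadly compatible pixel format
        "-crf", str(crf),             # quality: lower means better and larger
        output,
    ]
```

Run it with `subprocess.run(h264_mp4_command("frames/frame_%05d.png", "out.mp4"), check=True)`. At 16 fps, the default 81 frames play for about 5 seconds (81 / 16 ≈ 5.06 s).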
5. Inputs & Outputs
Inputs:
- Video (MP4) and reference image (PNG).
- Resolution: 768x768 (adjusted via ImageResizeKJ).
- Prompts: auto-generated (Florence2) or manual (the example includes positive and negative prompts).
Output:
- Stylized video (H.264 MP4, 16 fps).
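Non-square clips need a resolution that keeps their aspect ratio while staying friendly to the model. The sketch below scales the longer side to 768 and rounds both sides down to a multiple of 16. The multiple of 16 is an assumption (8x VAE downsampling times a 2x2 transformer patch), and this approximates a keep-proportion resize rather than reproducing ImageResizeKJ's exact options.

```python
def fit_resolution(width: int, height: int, target: int = 768,
                   divisible_by: int = 16) -> tuple[int, int]:
    """Scale so the longer side equals `target`, keep aspect ratio,
    then round each side down to a multiple of `divisible_by`."""
    longest = max(width, height)
    new_w = width * target // longest   # integer math avoids float rounding surprises
    new_h = height * target // longest
    new_w -= new_w % divisible_by
    new_h -= new_h % divisible_by
    return max(new_w, divisible_by), max(new_h, divisible_by)
```

A 1920x1080 landscape clip becomes 768x432, and a 720x1280 portrait clip becomes 432x768.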
6. Notes
VRAM: Minimum 16GB (24GB+ recommended, given the size of the 14B Wan2.1 model).
Common Errors:
- Frame limit exceeded: adjust frame_load_cap (currently 81 frames).
- Line art failure: ensure the input video has motion.
Optimization:
- Enable fp8 mode for lower VRAM usage.
- Tune BlockSwap to offload more transformer blocks to CPU RAM when VRAM is tight.
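When lowering `frame_load_cap` to save memory, it is safest to keep the count in the form 4k + 1, as the default 81 (4 x 20 + 1) already is. This assumes the Wan2.1 VAE compresses time by 4x with the first frame handled separately. The small helper below is a hypothetical convenience, not part of the workflow; it snaps a requested count down to the nearest such value.

```python
def snap_frame_count(requested: int, stride: int = 4) -> int:
    """Largest frame count <= requested of the form stride * k + 1."""
    if requested < 1:
        raise ValueError("need at least one frame")
    return (requested - 1) // stride * stride + 1
```

For example, a request for 100 frames becomes 97, and 84 becomes 81.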