What models does this workflow require?

This workflow transforms input videos into stylized animations using the Wan2.1 model, guided by line art (AnimeLineArt) and depth maps (DepthAnything). It combines ControlNet-style control, T5 text encoding, and frame interpolation, and this page covers the models, nodes, and settings needed to run it.

Use Case: Video
Best For: Video
Models: Flux, Wan2.1, ControlNet
Key Nodes: ControlNet
VRAM: 16GB minimum (24GB+ recommended)
Reading Time: 4 min

Required Models

Flux
Wan2.1
ControlNet

Required Nodes

ControlNet

Setup Notes

Install the required models before opening the workflow template.
Recommended hardware: 16GB VRAM minimum, 24GB+ recommended (see Notes).
Use the download button above to import the workflow JSON into ComfyUI.

1. Workflow Overview

Purpose: Transforms input videos into stylized animations using the Wan2.1 model, with dual control via line art (AnimeLineArt) and depth maps (DepthAnything).
Key Tech: Combines ControlNet, T5 text encoding, and frame interpolation for dynamic content.

2. Core Models

Model Name	Function
Wan2.1-Fun-Control-14B	Main model for video generation (FP8 optimized).
AnimeLineArtPreprocessor	Extracts line art from input video for style control.
DepthAnythingPreprocessor	Generates depth maps for spatial consistency.
Florence2-Flux-Large	Auto-generates captions for video frames.

3. Key Nodes & Installation

Node Name	Function	Installation
WanVideoWrapper	Core nodes for video generation (model loading, sampling, encoding).	GitHub: `ComfyUI-WanVideoWrapper`
ControlNet Aux	Preprocessors for line art and depth maps.	ComfyUI Manager: `comfyui-controlnet-aux`
Video Helper Suite	Video loading/combining tools.	ComfyUI Manager: `comfyui-videohelpersuite`
Florence2	Image captioning.	GitHub: `comfyui-florence2`

Required Models:

Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors (main video model, FP8)
umt5-xxl-enc-bf16.safetensors (T5 text encoder)
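
A quick pre-flight check can confirm that both files are in place before the workflow is opened. The sketch below is a minimal example, not part of the workflow itself. The subfolder names (`diffusion_models`, `text_encoders`) and the helper name `find_missing_models` are assumptions; adjust them to wherever your ComfyUI install and the WanVideoWrapper loader nodes actually look for models.

```python
from pathlib import Path

# File names come from the list above; the subfolders are ASSUMED and may
# differ in your ComfyUI installation.
REQUIRED_MODELS = {
    "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors": "diffusion_models",
    "umt5-xxl-enc-bf16.safetensors": "text_encoders",
}


def find_missing_models(models_root, required=REQUIRED_MODELS):
    """Return 'subfolder/filename' for every required file that is absent."""
    root = Path(models_root)
    missing = []
    for filename, subfolder in required.items():
        if not (root / subfolder / filename).is_file():
            missing.append(f"{subfolder}/{filename}")
    return missing
```

Calling `find_missing_models("ComfyUI/models")` (path hypothetical) returns an empty list when both files are present.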

4. Workflow Structure

Input Group (上传视频及参考图, "Upload video and reference image"):
- Inputs: Raw video (VHS_LoadVideo), reference image (LoadImage).
- Process:
  - Frame extraction → Line art + depth map generation.
  - Caption generation via Florence2Run.
- Outputs: Preprocessed images + text prompts.
Model Loading (wan模型, "Wan model"):
- Loads the Wan2.1 model, T5 encoder, and VAE, and configures optimizations (TorchCompile, BlockSwap).
Generation Group (采样生成, "Sampling and generation"):
- Inputs: Preprocessed images, text prompts, control args.
- Process:
  - Text encoding (WanVideoTextEncode) → Image encoding (WanVideoImageToVideoEncode) → Sampling (WanVideoSampler).
- Outputs: Latent video representation.
Output Group:
- Decodes latent to images (WanVideoDecode) → Combines video (VHS_VideoCombine).
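
After importing the workflow JSON, it can help to confirm that the node types named in the groups above are actually present, since a missing custom node usually means one of the node packs is not installed. The following is a minimal sketch under stated assumptions: it assumes the file is either a ComfyUI UI export (a top-level `nodes` list whose entries carry a `type` field) or an API-format export (a mapping of node id to an object with `class_type`). The helper names are hypothetical.

```python
import json

# Node types named in the workflow structure above.
EXPECTED_NODE_TYPES = {
    "VHS_LoadVideo",
    "LoadImage",
    "Florence2Run",
    "WanVideoTextEncode",
    "WanVideoImageToVideoEncode",
    "WanVideoSampler",
    "WanVideoDecode",
    "VHS_VideoCombine",
}


def node_types(workflow):
    """Collect node type names from a UI-format or API-format workflow dict."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):
        # UI export: {"nodes": [{"id": 1, "type": "..."}, ...]}
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    # API export: {"1": {"class_type": "...", "inputs": {...}}, ...}
    return {
        v["class_type"]
        for v in workflow.values()
        if isinstance(v, dict) and "class_type" in v
    }


def missing_node_types(workflow, expected=EXPECTED_NODE_TYPES):
    """Return the expected node types that the workflow does not contain."""
    return sorted(expected - node_types(workflow))


def load_workflow(path):
    """Read a workflow JSON file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, `missing_node_types(load_workflow("workflow.json"))` (file name hypothetical) lists any of the expected nodes that the file does not reference.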

5. Inputs & Outputs

Inputs:
- Video (MP4), reference image (PNG).
- Resolution: 768x768 (adjusted via ImageResizeKJ).
- Prompts: Auto-generated (Florence2) or manual (example includes positive/negative prompts).
Output:
- Stylized video (H.264 MP4, 16fps).
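
The output frame rate and the number of loaded frames together fix the clip length. As a small worked example, the sketch below computes the duration for the 81-frame cap mentioned in the notes at 16fps; the function name is illustrative only.

```python
def clip_duration_seconds(frame_count, fps):
    """Length of the output clip in seconds for a given frame count and rate."""
    if frame_count < 0:
        raise ValueError("frame_count must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


# 81 frames at 16 fps is just over five seconds of video.
print(clip_duration_seconds(81, 16))  # 5.0625
```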

6. Notes

VRAM: Minimum 16GB (recommended 24GB+ due to Wan2.1 size).
Common Errors:
- Frame limit exceeded: Adjust frame_load_cap (currently 81 frames).
- Line art failure: Ensure input video has motion.
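
When changing frame_load_cap, note that the current value of 81 has the form 4n+1 (81 = 4×20+1), which is the frame-count pattern commonly used with Wan2.1. Treating that pattern as a requirement is an assumption here, not something this workflow states. Under that assumption, a hypothetical helper can round a desired cap down to the nearest valid value:

```python
def snap_frame_cap(desired_frames):
    """Round down to the nearest frame count of the form 4n + 1 (ASSUMED pattern)."""
    if desired_frames < 1:
        raise ValueError("desired_frames must be at least 1")
    return ((desired_frames - 1) // 4) * 4 + 1
```

For instance, a request for 100 frames would be snapped to 97, while 81 is left unchanged.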
Optimization:
- Enable fp8 mode for lower VRAM usage.
- Tweak BlockSwap for memory management.
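
A rough back-of-the-envelope estimate shows why fp8 and BlockSwap matter for a 14B-parameter model. The sketch below counts weight memory only (activations, the VAE, and the text encoder need additional room). The block count of 40 and the even split of weights across blocks are simplifying assumptions, and `swapped_blocks` is an illustrative name rather than the node's actual parameter.

```python
GIB = 2 ** 30


def weights_gib(num_params, bytes_per_param):
    """Memory taken by the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def resident_weights_gib(total_gib, swapped_blocks, num_blocks=40):
    """Weights left on the GPU if `swapped_blocks` blocks are offloaded to CPU.

    ASSUMES weights are spread evenly over `num_blocks` transformer blocks.
    """
    if not 0 <= swapped_blocks <= num_blocks:
        raise ValueError("swapped_blocks must be between 0 and num_blocks")
    return total_gib * (num_blocks - swapped_blocks) / num_blocks


fp8 = weights_gib(14e9, 1)    # 1 byte per parameter, about 13 GiB
bf16 = weights_gib(14e9, 2)   # 2 bytes per parameter, about 26 GiB
print(f"fp8: {fp8:.1f} GiB, bf16: {bf16:.1f} GiB")
print(f"fp8 with half the blocks swapped: {resident_weights_gib(fp8, 20):.1f} GiB")
```

Under these assumptions, the fp8 weights alone come close to filling a 16GB card, which is consistent with the 24GB+ recommendation; swapping more blocks trades GPU memory for slower sampling.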

Related Workflows

Related by Use Case

Mastering Video-to-Video Translation: A Deep Dive into Wan2.1 VACE Model and ComfyUI

Unlock AI-powered video translation with Wan2.1 VACE Model! Discover a workflow that enhances each frame, controls depth, and optimizes generation. Learn how to leverage this innovative technology and transform your video content today!

Unlock Advanced Video Depth Control with Wan Model-Based Workflow

Unlock AI-powered video depth control with our Wan model-based workflow. Discover how to extract depth maps, stylize videos with text guidance, and more. Dive into the details now!

Unleash AI-Powered Video Character Redraw: Transforming Videos with Style

Unlock AI-powered video character redrawing with Wan2.1Fun! Discover how this workflow leverages Stable Diffusion, GroundingDino, and Openpose to transform characters into stylized images and videos. Learn more and elevate your video editing skills!

From Pose to Playback: Mastering Video Generation with Tongyi Wanxiang's Fun-ControlNet

Tongyi Wanxiang-WAN2.1-Fun ControlNet Video Generation: Create dynamic videos with pose/depth control & style control. Learn how this workflow generates videos, controls content, and upscales resolution.
