Unlocking Realistic Motion Retargeting: A Deep Dive into Wan2.1-Fun-Control

ComfyUI.org
2025-05-27 08:24:56

Discover how to retarget motion from a source video to a target character using the Wan2.1-Fun-Control model, a powerful tool for creating realistic character animations. Learn the workflow, key technologies, and core models involved in this innovative process.

Use Case: Video
Best For: Video
Models: Wan2.1
VRAM: Low VRAM (≤8GB)
Reading Time: 4 min

Required Models

  • Wan2.1

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: this page is tagged Low VRAM (≤8GB), but the Notes section recommends 16GB+ VRAM for the 14B model.

1. Workflow Overview

  • Purpose: Retarget motion from a source video onto a target character using the Wan2.1-Fun-Control model.

  • Key Tech:

    • Pose Extraction: DWPreprocessor detects keypoints from input video.

    • Multimodal Control: CLIP vision features, T5 text embeddings, and depth maps from DepthAnythingPreprocessor.

    • Temporal Coherence: WanFunControlToVideo generates frame-consistent videos.

2. Core Models

| Model Name | Function |
|---|---|
| Wan2.1-Fun-Control-14B | Base motion control model (14B parameters, FP8 optimized). |
| umt5-xxl_fp8_e4m3fn_scaled | Text encoder for prompts, including negative prompts that steer generation away from unwanted artifacts. |
| depth_anything_vitl14 | Depth preprocessor for spatial consistency. |
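
To see why FP8 matters for the 14B model, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It assumes 1 byte per parameter for FP8 and 2 bytes for FP16, and it ignores activations, the text encoder, and the VAE, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Parameter count read off the model name; treated as exact for illustration only.
PARAMS_14B = 14e9

fp16_gib = weight_memory_gib(PARAMS_14B, 2)  # assumed 2 bytes per parameter
fp8_gib = weight_memory_gib(PARAMS_14B, 1)   # assumed 1 byte per parameter

print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.1f} GiB")
```

Under these assumptions the FP8 weights take half the space of FP16 weights, which is consistent with the hardware note recommending FP8 for lower VRAM usage.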

3. Key Nodes

3.1 Input Processing

  • VHS_LoadVideo:

    • Loads the input video (e.g., 5月12日 0.8.mp4) and extracts its frames (25 FPS by default).

  • LoadImage:

    • Loads target character image (e.g., 00088-3677135724.png).
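
As an illustration of what resampling a clip to 25 FPS and capping it at a fixed frame count involves, the sketch below picks which source frames to keep. It is not the VHS_LoadVideo implementation: the function name, the nearest-earlier-frame strategy, and the default cap of 81 frames are assumptions made for this example.

```python
def select_frame_indices(source_fps: float, source_frame_count: int,
                         target_fps: float = 25.0, max_frames: int = 81) -> list[int]:
    """Pick source frame indices so that playback at target_fps keeps the original timing.

    Hypothetical helper for illustration; the real loader may sample differently.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    indices = []
    for out_idx in range(max_frames):
        # Source frame on screen at the moment output frame out_idx is shown.
        src_idx = int(out_idx * source_fps / target_fps)
        if src_idx >= source_frame_count:
            break  # ran out of source footage
        indices.append(src_idx)
    return indices


# A 50 FPS clip resampled to 25 FPS keeps every second frame.
print(select_frame_indices(50, 10))
```

A source clip already at 25 FPS passes through unchanged until the frame cap is reached.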

3.2 Motion Analysis

  • DWPreprocessor:

    • Extracts pose keypoints (using yolox_l.onnx and dw-ll_ucoco_384).

  • DepthAnythingPreprocessor:

    • Generates depth maps for background alignment.

3.3 Video Generation

  • WanFunControlToVideo:

    • Key params: 832x480 output, 81 frames (~3.24 s at 25 FPS), CFG=1.0.

    • Inputs: Pose keypoints + CLIP features + text conditioning.

  • KSampler:

    • Settings: 20 steps, Euler sampler, fixed seed (198).
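
The frame count and resolution above determine both the clip length and the size of the latent the sampler works on. The sketch below reproduces the ~3.24 s figure and estimates the latent shape. The compression factors (8x spatial, 4x temporal) and the 16 latent channels are assumptions about the Wan2.1 video VAE, and both function names are hypothetical.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with num_frames frames at the given frame rate."""
    return num_frames / fps


def latent_shape(width: int, height: int, num_frames: int,
                 spatial_factor: int = 8, temporal_factor: int = 4,
                 channels: int = 16) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumes the first frame is encoded on its own and every further group of
    temporal_factor frames collapses into one latent frame.
    """
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return (channels, latent_frames, height // spatial_factor, width // spatial_factor)


print(clip_duration_seconds(81, 25))
print(latent_shape(832, 480, 81))
```

Under these assumptions, frame counts of the form 4k + 1 (such as 81) map onto whole latent frames, which is one plausible reason the workflow uses 81 rather than a round number.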

3.4 Post-Processing

  • SkipLayerGuidanceWanVideo:

    • Skips transformer blocks 9 and 10 at 0.2 strength to balance detail against motion fluency.

  • WanVideoEnhanceAVideoKJ:

    • Reduces flickering (strength=0.2).

4. Workflow Structure

| Stage | Key Nodes | Function |
|---|---|---|
| Input Prep | VHS_LoadVideo + LoadImage | Loads video and target image. |
| Motion Extract | DWPreprocessor → DepthAnything | Extracts poses and depth maps. |
| Conditioning | CLIPTextEncode + CLIPVisionEncode | Encodes text/visual conditions. |
| Video Gen | WanFunControlToVideo → KSampler | Renders motion-retargeted frames. |
| Output Export | VHS_VideoCombine | Final video (H.264, CRF=15). |
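
The stage order can be sketched as a small dependency graph. The edges below are an assumed reading of the stages listed in this section, not an export of the actual workflow JSON, and nodes not named here (such as model loaders or a VAE decode step) are left out.

```python
from graphlib import TopologicalSorter

# Map each node to the nodes it depends on (assumed edges, for illustration only).
dependencies = {
    "DWPreprocessor": {"VHS_LoadVideo"},
    "DepthAnythingPreprocessor": {"VHS_LoadVideo"},
    "CLIPVisionEncode": {"LoadImage"},
    "CLIPTextEncode": set(),
    "WanFunControlToVideo": {
        "DWPreprocessor",
        "DepthAnythingPreprocessor",
        "CLIPVisionEncode",
        "CLIPTextEncode",
    },
    "KSampler": {"WanFunControlToVideo"},
    "VHS_VideoCombine": {"KSampler"},
}

# static_order() yields every node after all of its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
for step, node in enumerate(execution_order, start=1):
    print(step, node)
```

Any valid ordering places the two loaders before the preprocessors and ends with VHS_VideoCombine; the relative order of independent branches may vary.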

5. Inputs & Outputs

  • Inputs:

    • Source video (MP4, 25FPS recommended).

    • Target character image (PNG/JPG, transparent background preferred).

    • Optional text prompts (style control).

  • Output:

    • Motion-retargeted video (default 832x480, 25FPS).

6. Notes

  1. Hardware:

    • 16GB+ VRAM (RTX 4080+ recommended for 14B model).

    • Enable FP8 optimization (fp8_e4m3fn) for lower VRAM usage.

  2. Dependencies:

    • Download Wan2.1-Fun-Control-14B and depth_anything_vitl14.pth manually.

  3. Troubleshooting:

    • Reduce flickering: Increase KSampler steps (20→30) or lower SkipLayerGuidance strength (0.2→0.1).

    • Resolution errors: Make sure the source video and target image share the same aspect ratio (e.g., both 512x512).
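
One way to avoid aspect ratio mismatches is to center-crop each input to the output aspect ratio before resizing. The sketch below computes such a crop box. The function name is hypothetical, and the 832x480 default target is taken from the workflow's stated output size.

```python
def center_crop_box(src_width: int, src_height: int,
                    target_width: int = 832, target_height: int = 480) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop
    whose aspect ratio matches target_width:target_height."""
    if min(src_width, src_height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integer arithmetic.
    if src_width * target_height > src_height * target_width:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_height
        crop_w = src_height * target_width // target_height
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
        crop_w = src_width
        crop_h = src_width * target_height // target_width
    left = (src_width - crop_w) // 2
    top = (src_height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_box(1920, 1080))  # 16:9 source video
print(center_crop_box(512, 512))    # square character image
```

The returned tuple follows the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so it can be passed there directly before resizing to the output resolution.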

FAQ