Unleash AI-Powered Video Character Redraw: Transforming Videos with Style

ComfyUI.org
2025-04-10 20:49:19

Unlock AI-powered video character redrawing with Wan2.1Fun! Discover how this workflow leverages Stable Diffusion, GroundingDino, and Openpose to transform characters into stylized images and videos. Learn more and elevate your video editing skills!

Use Case: Video
Best For: Video
Key Nodes: ControlNet
VRAM: Medium (12–16GB)
Reading Time: 4 min

Required Models

  • Wan2.1
  • ControlNet
  • Stable Diffusion (SD)

Required Nodes

  • ControlNet

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: Medium VRAM (12–16GB).

1. Workflow Overview

This workflow, named “wan2.1Fun_Video Character Redraw”, converts characters in a video into stylized images or videos using AI models. Key technologies include:

  • Frame Extraction: Extracts key frames from input video.

  • Segmentation & Pose Detection: Uses GroundingDino+SAM for person segmentation and Openpose for pose keypoints.

  • Text/Image-Guided Generation: Generates new content with the Wan2.1-Fun-Control video diffusion model, guided by text or image prompts.

  • Video Synthesis: Combines frames into a final video.
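
As a rough illustration of the frame extraction step, the sketch below computes which source frames would be kept when a loader skips some leading frames, keeps every n-th frame, and caps the total. The parameter names follow the options commonly exposed by VHS_LoadVideo (skip_first_frames, select_every_nth, frame_load_cap), but the exact ordering of these steps inside the node is an assumption here, and the clip length and cap are hypothetical values.

```python
def selected_frame_indices(total_frames, skip_first_frames=0,
                           select_every_nth=1, frame_load_cap=0):
    """Return the source-frame indices kept by a simple frame selector.

    Assumed order (a sketch, not the node's confirmed behavior):
    skip leading frames, then keep every n-th frame, then apply the cap.
    frame_load_cap=0 means "no cap".
    """
    if select_every_nth < 1:
        raise ValueError("select_every_nth must be >= 1")
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:
        indices = indices[:frame_load_cap]
    return indices


# Hypothetical clip: 10 seconds at 25 fps = 250 frames.
frames = selected_frame_indices(250, skip_first_frames=0,
                                select_every_nth=2, frame_load_cap=81)
print(len(frames), frames[:5], frames[-1])
```

Keeping every second frame halves the effective frame rate, so the output video frame rate would need to be lowered to match if the original playback speed should be preserved.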

2. Core Models

  1. Wan2.1-Fun-Control-14B (video diffusion model)

    • Purpose: Generates high-quality images/videos from text/image prompts.

    • Model File: Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors.

  2. GroundingDino + SAM

    • Purpose: Detects and segments characters from a text label (e.g., "man").

    • Model Files: GroundingDINO_SwinT_OGC, sam_vit_b_01ec64.pth.

  3. ControlNet (Openpose)

    • Purpose: Preserves original pose structure.

    • Model File: control_v11p_sd15_openpose.pth.

  4. Florence2

    • Purpose: Auto-generates image captions (prompt inversion).

    • Model File: Florence-2-large.
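
To give a feel for why the main model dominates memory use, here is a back-of-the-envelope estimate of the weight size. It assumes "14B" means roughly 14 billion parameters and that every weight is stored at the stated precision; real checkpoints often keep some layers at higher precision, so treat the numbers as approximate.

```python
GIB = 1024 ** 3


def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param // 8


# Assumption: "14B" in the file name means about 14 billion parameters.
params = 14_000_000_000

for name, bits in [("fp8", 8), ("fp16", 16)]:
    size_gib = weight_bytes(params, bits) / GIB
    print(f"{name}: about {size_gib:.1f} GiB of weights")
```

Under these assumptions the fp8 weights alone come to about 13 GiB, before activations, the text encoder, and the VAE are counted.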

3. Key Nodes

  • Video Input:

    • VHS_LoadVideo: Loads video files (e.g., 2795746-uhd_2160_3840_25fps.mp4).

  • Character Processing:

    • GroundingDinoSAMSegment: Segments characters and generates masks.

    • OpenposePreprocessor: Extracts pose keypoints.

  • Generation Control:

    • WanVideoTextEncode: Processes text prompts (e.g., "futuristic robot").

    • WanVideoSampler: Controls sampling (steps=25, CFG=8).

  • Output Synthesis:

    • VHS_VideoCombine: Combines frames into MP4 (H.264).
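
For readers unfamiliar with the CFG setting, the sketch below shows the standard classifier-free guidance combination on a toy vector. Whether WanVideoSampler applies exactly this form internally is an assumption; the numbers are made up, and only the CFG value of 8 comes from the settings listed above.

```python
def cfg_combine(uncond, cond, cfg_scale):
    """Standard classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned output."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy model outputs for one denoising step (illustrative values only).
uncond = [0.0, -1.0, 2.0]   # prediction with an empty prompt
cond = [1.0, -0.5, 2.0]     # prediction with the text prompt

guided = cfg_combine(uncond, cond, cfg_scale=8.0)
print(guided)
```

With a scale of 1 the result equals the conditional prediction; larger values amplify the difference between the two predictions, which generally strengthens prompt adherence at the cost of more artifacts.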

4. Workflow Structure (Groups)

  1. Frame Redraw (Text-Based)

    • Input: Video + text prompts.

    • Output: Redrawn first frame.

  2. Wan2.1 Character Conversion

    • Input: Masks + pose data.

    • Output: Stylized video.

  3. Prompt Inversion (Florence2)

    • Input: Reference image.

    • Output: Auto-generated detailed caption.

5. Inputs & Outputs

  • Inputs:

    • Video file (MP4).

    • Optional text prompts.

    • Generation parameters (512×910 resolution, Euler sampler).

  • Output:

    • Generated video (e.g., AnimateDiff_00027.mp4).
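
The 512×910 generation size is what you get when a portrait source is scaled to a width of 512 while keeping its aspect ratio. The sketch below works through that arithmetic, assuming the example file name (uhd_2160_3840) indicates a 2160×3840 source; that reading of the file name is an assumption.

```python
def scaled_height(src_width, src_height, target_width):
    """Height that preserves the source aspect ratio at target_width."""
    return round(src_height * target_width / src_width)


# Assumed source size, read from the example file name: 2160 wide, 3840 tall.
height = scaled_height(2160, 3840, 512)
print(f"512x{height}")
```

For a different source video, the same calculation gives a matching height for whatever target width is chosen.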

6. Notes

  1. Dependencies:

    • Install via ComfyUI Manager:

      • ComfyUI-WanVideoWrapper (video generation).

      • comfyui_controlnet_aux (pose extraction).

      • comfyui-florence2 (prompt inversion).

  2. Hardware:

    • Recommended VRAM: ≥12GB (the Wan2.1 model is large).

  3. Troubleshooting:

    • Model path errors: Verify .safetensors file locations.

    • Video encoding issues: Adjust CRF in VHS_VideoCombine (default=19).
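
When chasing model path errors, a small script can confirm whether the model files named in this guide exist anywhere under your models directory. This is a minimal sketch: the root path "ComfyUI/models" is an assumption about your install, and finding a file does not prove it sits in the subfolder a particular loader node expects.

```python
from pathlib import Path

# File names taken from the Core Models section of this guide.
REQUIRED_FILES = [
    "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors",
    "sam_vit_b_01ec64.pth",
    "control_v11p_sd15_openpose.pth",
]


def locate_model_files(models_root, filenames):
    """Map each file name to the first matching path under models_root,
    or to None if no file with that name is found."""
    found = {name: None for name in filenames}
    root = Path(models_root)
    if not root.is_dir():
        return found
    for path in root.rglob("*"):
        if path.is_file() and path.name in found and found[path.name] is None:
            found[path.name] = path
    return found


if __name__ == "__main__":
    # Assumed location; adjust to your own ComfyUI install.
    for name, path in locate_model_files("ComfyUI/models", REQUIRED_FILES).items():
        status = "OK     " if path else "MISSING"
        print(status, name, path or "")
```

Any file reported as missing should be downloaded or moved before reloading the workflow.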
