Unlock the Power of Text-to-Video Generation with Alibaba's Wanx-8G Model

ComfyUI.org
2025-04-01 14:28:28

Unlock AI-powered video creation with Alibaba's Wanx-8G model! Learn how to generate stunning videos from text prompts using this beginner-friendly workflow. Discover advanced features like LoRA fine-tuning & tiled decoding. Get started now!

Use Case: Video
Best For: Video
Models: LoRA
VRAM: Low VRAM (≤8GB)
Difficulty: Beginner-friendly
Reading Time: 3 min

Content type: Workflow

Primary intent: Download

Required Models

  • LoRA

Setup Notes

  • Install the required models before opening the workflow template.
  • Hardware: designed to run on low-VRAM GPUs (≤8GB).
  • Expected skill level: Beginner-friendly.

1. Workflow Overview

This workflow leverages Alibaba's Wanx-8G model for text-to-video generation, featuring:

  • Beginner-friendly: Pre-configured parameters

  • Advanced control: Supports optional LoRA weights & tiled VAE decoding

  • Multi-format output: Direct MP4 (H.264) or animated image export


2. Core Models

| Model Name | Function | Key Parameters |
|---|---|---|
| UMT5-XXL Text Encoder | Handles multilingual prompts | umt5_xxl_fp8_e4m3fn_scaled.safetensors |
| Wanx-8G UNET | Video latent generation | Default loading (no explicit file) |
| Tiled VAE Decoder | VRAM-optimized decoding | Tile size: 128x32 |
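
To give a feel for why tiled decoding lowers peak VRAM, here is a minimal sketch that counts how many overlapping tiles cover one 832x480 frame. It assumes that "128x32" means a 128-pixel tile with a 32-pixel overlap, and it uses a generic sliding-window layout. The actual ComfyUI decoder works on latents and may split and blend tiles differently, so treat the numbers as illustrative only.

```python
import math


def tiles_along_axis(length: int, tile: int, overlap: int) -> int:
    """Number of overlapping tiles needed to cover `length` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    if length <= tile:
        return 1
    stride = tile - overlap  # each extra tile adds `stride` new pixels
    return math.ceil((length - overlap) / stride)


def tile_grid(width: int, height: int, tile: int = 128, overlap: int = 32):
    """Return (columns, rows) of tiles for one frame."""
    return (
        tiles_along_axis(width, tile, overlap),
        tiles_along_axis(height, tile, overlap),
    )


cols, rows = tile_grid(832, 480)
print(cols, rows, cols * rows)  # 9 5 45
```

Each tile is decoded on its own, so the decoder only needs memory for one small tile at a time instead of the full frame, at the cost of more decode passes.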


3. Key Nodes

| Node Name | Function | Installation |
|---|---|---|
| EmptyHunyuanLatentVideo | Initializes video latent (832x480, 33 frames) | Requires Hunyuan plugin |
| VAEDecodeTiled | Reduces VRAM usage via tiling | Built-in |
| VHS_VideoCombine | Video compositing (H.264/MP4) | Install ComfyUI-VideoHelperSuite |


4. Workflow Structure

  • Group 1: Text Input

    • CLIPTextEncode: Processes positive (e.g., "A fox in snowy scenery") and negative prompts

  • Group 2: Model Loading

    • UNETLoader: Loads Wanx-8G main model

    • LoraLoaderModelOnly: Optional LoRA (default strength=0.8)

  • Group 3: Video Generation

    • KSampler: Uses UniPC sampler (30 steps, CFG=6)

    • VAEDecodeTiled: Decodes latent with tiling
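
The three groups above can be sketched as a ComfyUI-style API graph, where each node is a dictionary entry and a link is written as [source_node_id, output_index]. This is a rough sketch under several assumptions: the CLIPLoader and VAELoader nodes, the "wan" encoder type, the "simple" scheduler, the seed, and both PLACEHOLDER file names are not given in this article, and the input names reflect my understanding of the core nodes. Compare against a workflow exported from your own ComfyUI install before relying on it.

```python
import json


def build_prompt_graph(positive, negative, lora_name=None, lora_strength=0.8):
    """Build a dict describing the text-to-video graph (assumed input names)."""
    graph = {
        # Assumed loaders for the text encoder and VAE (not listed in the groups).
        "1": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
                         "type": "wan"}},
        "2": {"class_type": "VAELoader",
              "inputs": {"vae_name": "PLACEHOLDER_vae.safetensors"}},
        # Group 2: model loading (file name is a placeholder).
        "3": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "PLACEHOLDER_unet.safetensors",
                         "weight_dtype": "default"}},
        # Group 1: positive and negative prompts.
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 0]}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 0]}},
        # Group 3: empty video latent, 832x480 with 33 frames.
        "6": {"class_type": "EmptyHunyuanLatentVideo",
              "inputs": {"width": 832, "height": 480, "length": 33,
                         "batch_size": 1}},
    }
    model_link = ["3", 0]
    if lora_name:  # optional LoRA, default strength 0.8
        graph["7"] = {"class_type": "LoraLoaderModelOnly",
                      "inputs": {"model": ["3", 0], "lora_name": lora_name,
                                 "strength_model": lora_strength}}
        model_link = ["7", 0]
    graph["8"] = {"class_type": "KSampler",
                  "inputs": {"model": model_link, "seed": 0, "steps": 30,
                             "cfg": 6.0, "sampler_name": "uni_pc",
                             "scheduler": "simple",  # assumption
                             "positive": ["4", 0], "negative": ["5", 0],
                             "latent_image": ["6", 0], "denoise": 1.0}}
    graph["9"] = {"class_type": "VAEDecodeTiled",
                  "inputs": {"samples": ["8", 0], "vae": ["2", 0],
                             "tile_size": 128, "overlap": 32}}  # assumed reading
    return graph


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                missing.append((node_id, name))
    return missing


graph = build_prompt_graph("A fox in snowy scenery", "blurry, low quality")
assert not dangling_links(graph)
print(json.dumps(graph["8"], indent=2))
```

Passing a lora_name inserts the LoraLoaderModelOnly node between the UNET loader and the sampler; leaving it out wires the sampler directly to the base model.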


5. Inputs & Outputs

  • Required Inputs:

    • Positive prompt (English/Chinese)

    • Negative prompt (pre-set quality filters)

    • Frame count (default=33)

  • Outputs:

    • MP4 video (16 FPS, H.264)

    • Resolution: 832x480 (adjustable)
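
The default frame count and output frame rate together determine clip length. As a quick worked example (the function name is just for illustration), 33 frames played back at 16 FPS last a little over two seconds:

```python
def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Playback length of a clip, in seconds."""
    if frame_count <= 0 or fps <= 0:
        raise ValueError("frame_count and fps must be positive")
    return frame_count / fps


print(clip_duration_seconds(33, 16))  # 2.0625
```

To get a longer clip, raise the frame count; keep in mind that more frames also means a larger latent and more VRAM during sampling.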


6. Notes

⚠️ Critical Configs:

  1. VRAM & Speed Optimization:

    • Enable VAEDecodeTiled for 8GB GPUs

    • Use uni_pc sampler for faster generation

  2. Plugin:

    • Clone the Hunyuan plugin into ComfyUI's custom_nodes directory:

      git clone https://github.com/AI-ModelScope/comfyui-hunyuan-plugin

  3. Video Quality:

    • Adjust crf in VHS_VideoCombine (recommended range 18-28; lower values give higher quality and larger files)
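
To make the crf setting more concrete, the sketch below builds an equivalent standalone ffmpeg command for H.264 encoding. It is not how VHS_VideoCombine is implemented (I have not verified its internals); it only shows where a CRF value ends up in a typical libx264 encode. The frame pattern and output path are hypothetical.

```python
def h264_encode_args(frames_pattern: str, output_path: str,
                     crf: int = 19, fps: int = 16) -> list:
    """Return an ffmpeg argument list for a CRF-based H.264 encode."""
    if not 0 <= crf <= 51:  # valid CRF range for 8-bit libx264
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate
        "-i", frames_pattern,     # e.g. numbered PNG frames
        "-c:v", "libx264",
        "-crf", str(crf),         # lower = higher quality, larger file
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        output_path,
    ]


args = h264_encode_args("frames/%05d.png", "fox.mp4", crf=18)
print(" ".join(args))
```

The list can be handed to subprocess.run(args, check=True) on a machine with ffmpeg installed; values toward 18 suit final renders, values toward 28 suit quick previews.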
