ACE-Step 1.5 XL: Commercial-Grade Music Generation in ComfyUI

2026-04-18 02:02:43

A 4B-parameter open-source music model that generates full songs in seconds, running locally on consumer hardware

VRAM: High (24GB+)
Reading time: 2 min
Download Workflow JSON

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: a GPU with 24GB or more of VRAM.
  • Use the download button above to import the workflow JSON into ComfyUI.

Open-source music generation has improved considerably. ACE-Step 1.5 XL is built around a 4-billion-parameter Diffusion Transformer (DiT) decoder and produces audio quality comparable to leading commercial systems while running entirely on a local GPU.

Three variants are available: xl-base favors versatility, xl-sft favors audio quality, and xl-turbo favors speed. All three are released under the MIT license and trained on legally licensed data.

Sample tracks:

  • Instrumental Dark Synthwave

  • Melodic Dubstep with Female Vocals

  • Ambient Electronic with Female Vocals

Key Features

  • Commercial-Grade Quality – Evaluation scores place output quality between Suno v4.5 and Suno v5, and the 4B-parameter model produces richer audio than the earlier 2B version

  • Fast Generation – A full song takes under 2 seconds on an A100 and under 10 seconds on an RTX 3090. The xl-turbo variant needs only 8 steps, roughly 6× faster than the base and sft variants

  • Flexible Length – Generate anything from a 10-second motif to a 10-minute arrangement

  • Wide Sonic Palette – Fine-grained control over timbre across many genres, with more than 1,000 distinct timbres

  • Multilingual Lyrics – Control structure and style with lyric prompts in more than 50 languages

  • Cleared for Commercial Use – MIT-licensed, with training data drawn from royalty-free music, public-domain material, and audio synthesized from MIDI
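The length limits quoted above are easy to enforce before a job is queued. The following is a minimal sketch, not part of ACE-Step or ComfyUI: the `SongRequest` class and its field names (`tags`, `lyrics`, `duration_seconds`) are hypothetical, and only the 10-second and 10-minute bounds come from the feature list.

```python
from dataclasses import dataclass

# Bounds taken from the feature list: 10 seconds up to 10 minutes.
MIN_SECONDS = 10
MAX_SECONDS = 10 * 60


@dataclass(frozen=True)
class SongRequest:
    """Hypothetical container for one generation request (not an ACE-Step API)."""

    tags: str                   # style description, e.g. "dark synthwave, instrumental"
    lyrics: str = ""            # leave empty for an instrumental track
    duration_seconds: int = 60  # requested track length in seconds

    def __post_init__(self) -> None:
        # Reject lengths outside the range the article states is supported.
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            raise ValueError(
                f"duration_seconds must be between {MIN_SECONDS} and "
                f"{MAX_SECONDS}, got {self.duration_seconds}"
            )
```

Validating up front like this avoids spending a generation run on a request the model is not described as supporting.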

Model Selection

All three XL variants share the same 4B-parameter architecture:

  • XL-Base – The most versatile option, suited to creative exploration

  • XL-SFT – The highest audio quality, at the cost of somewhat less output diversity

  • XL-Turbo – 8-step generation for fast iteration
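If you script your choice of variant, the trade-offs among the three XL models reduce to a small lookup. This is an illustrative sketch only: the priority keywords (`versatility`, `quality`, `speed`) and the `pick_variant` helper are made up for this example, while the variant names come from the article.

```python
# Map a single priority to the variant the article recommends for it.
# The keys are hypothetical labels chosen for this sketch.
VARIANTS = {
    "versatility": "xl-base",  # broadest range, good for exploration
    "quality": "xl-sft",       # best audio quality, less diverse output
    "speed": "xl-turbo",       # 8-step generation for quick iteration
}


def pick_variant(priority: str) -> str:
    """Return the XL variant name matching a priority keyword."""
    try:
        return VARIANTS[priority.lower()]
    except KeyError:
        options = ", ".join(sorted(VARIANTS))
        raise ValueError(
            f"unknown priority {priority!r}; choose one of: {options}"
        ) from None
```

For example, `pick_variant("speed")` returns `"xl-turbo"`.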

Implementation Process

  1. Update ComfyUI to the latest release

  2. Open the Template Library and search for "ACE Step"

  3. Choose the workflow for the variant you want

  4. Follow the instructions embedded in the workflow to download the required models

  5. Edit the prompts and run the workflow

Enjoy creating!
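Once the workflow runs in the interface, the final step can also be scripted against ComfyUI's HTTP API, which accepts a workflow at `POST /prompt`. The sketch below makes several assumptions: the server is at the default local address `127.0.0.1:8188`, the workflow was exported in API format (the regular saved JSON will not work here), and the file name in the usage comment is hypothetical.

```python
import json
import urllib.request

# Assumption: ComfyUI is running locally on its default port.
COMFYUI_URL = "http://127.0.0.1:8188"


def build_queue_request(workflow: dict, server: str = COMFYUI_URL) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path: str, server: str = COMFYUI_URL) -> dict:
    """Load an API-format workflow file and submit it to the ComfyUI queue."""
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_queue_request(workflow, server)) as resp:
        return json.load(resp)


# Usage (hypothetical file name; requires a running ComfyUI server):
#   result = queue_workflow("ace_step_xl_turbo_api.json")
#   print(result)
```

The server's JSON reply typically includes a prompt ID that can be used to look up the finished job. To change the prompt programmatically, edit the relevant node's `inputs` in the loaded dictionary before submitting it. The node IDs and input names depend on the workflow you exported, so inspect the JSON rather than assuming them.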

FAQ