ACE-Step 1.5 is Now Available in ComfyUI

2026-02-04 02:01:42

Commercial-grade music generation on consumer hardware

Models: LoRA
VRAM: High VRAM (24GB+)
Reading time: 4 min

Workflow Overview

Required Models

  • LoRA

Setup Notes

  • Install the required models before opening the workflow template.
  • Recommended hardware: High VRAM (24GB+).
  • Use the download button above to import the workflow JSON into ComfyUI.

ACE-Step 1.5 Now Accessible in ComfyUI

We're pleased to announce that ACE-Step 1.5 is now available in ComfyUI! This major update to the open-source music generation model delivers commercial-grade quality on your own machine, generating full songs in under 10 seconds on consumer hardware.

Updates in ACE-Step 1.5

ACE-Step 1.5 introduces a new hybrid architecture for AI music generation. A Language Model acts as a general-purpose planner, turning simple user prompts into detailed song blueprints, from short loops to ten-minute compositions.

  • Professional Sound Quality
    Achieves higher quality than most commercial music systems, scoring 4.72 in musical consistency metrics

  • Rapid Generation
    Creates a complete 4-minute track in approximately 1 second on an RTX 5090 or under 10 seconds on an RTX 3090

  • Standard Hardware Compatibility

  • Multilingual Support
    Accurately follows instructions in 50+ languages, with excellent performance in English, Chinese, Japanese, Korean, Spanish, German, French, Portuguese, Italian, and Russian
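
To put the generation speeds quoted above in perspective, the short sketch below converts them into a real-time factor (seconds of audio produced per second of compute). The function name `real_time_factor` is illustrative only, and the inputs are simply the figures from the list above. Because the RTX 3090 figure is an upper bound ("under 10 seconds"), its result should be read as a lower bound.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio generated per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


TRACK_SECONDS = 4 * 60  # the 4-minute track from the list above

# Figures quoted above; the 3090 time is "under 10 seconds", so this is a floor.
rtx_5090 = real_time_factor(TRACK_SECONDS, 1)
rtx_3090 = real_time_factor(TRACK_SECONDS, 10)

print(f"RTX 5090: ~{rtx_5090:.0f}x real time")
print(f"RTX 3090: >{rtx_3090:.0f}x real time")
```

Taking the quoted numbers at face value, that is roughly 240x real time on the RTX 5090 and at least 24x on the RTX 3090.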

Chain-of-Thought Approach

Using chain-of-thought reasoning, the model combines metadata, lyrics, and captions to guide the diffusion process, producing more coherent long-form pieces.

LoRA Personalization

ACE-Step 1.5 supports style-specific fine-tuning through LoRA training. From just a handful of tracks, the model can learn your personal sound, and because it is learned and applied locally, your data stays private.

Functionality Overview

ACE-Step 1.5 combines several architectural advances:

  1. Hybrid LM + DiT Architecture: A Language Model plans the musical structure while a Diffusion Transformer handles audio synthesis

  2. Distribution Matching Distillation: Uses Z-Image's DMD2 for faster generation (2 seconds on A100) and better quality

  3. Built-in Reinforcement Learning: Alignment relies on the model's internal mechanisms, avoiding biases introduced from outside

  4. Self-Learning Tokenizer: The audio tokenizer is learned during DiT training, narrowing the gap between generation and tokenization

Future Developments

Although they are not yet supported in ComfyUI, ACE-Step 1.5 has additional capabilities that the community will likely implement.

Reinterpretation

Provide any existing track along with new lyrics and instructions to reinterpret it in a completely different style

Revision

Regenerate specific segments when a composition is nearly ideal, seamlessly inserting corrections while preserving surrounding content

Sample Vocal Compositions

Neo-Soul: A warm, organic neo-soul track dripping with live instrumentation and effortless groove. A live drummer plays a loose, hip-hop influenced pocket—soft kick drum with lazy swing, snare hits that sit just behind the beat, and brushed hi-hats that breathe and shuffle with human imperfection.
UK Garage: A skippy, energetic UK garage track built on a classic two-step drum pattern with shuffling hi-hats and a punchy, syncopated kick and snare. A warm, wobbling Reese bass line provides the low-end foundation and chopped, pitched-up female vocal samples create the melodic hooks.
K-Pop: A slick, maximalist K-pop track that genre-hops with precision and style. The production shifts seamlessly between sections—a hard-hitting trap-influenced verse with rapid-fire rapping, a softer R&B pre-chorus with breathy vocals and lush harmonies, then an explosive, synth-driven pop chorus with an ear worm hook.

Sample Instrumental Works

Synth-wave: A nostalgic, cinematic ride through neon and chrome. Punchy gated drums with big reverb snare, arpeggiated synth lines running through chorus and delay, warm analog bass, and soaring lead melodies that feel heroic and bittersweet. Driving but emotional, like the credits rolling on a film that never existed.
Meditative Roller: A deep, meditative roller locked into a hypnotic 140 BPM groove, all smooth forward motion and late-night introspection. The bass line is the soul of it—warm, undulating, endlessly cycling through subtle variations like waves lapping at a shore, never jarring, never stopping.
Progressive House: A warm, rolling journey that builds patiently. Soft four-on-the-floor kick with airy hats, a plucky melodic synth hook that repeats and evolves, pads that swell across long phrases, and subtle acid bass bubbling underneath. Emotional but restrained, always moving forward toward a sunrise.

Initial Steps

Desktop & Local Users

  1. Upgrade ComfyUI to the newest version

  2. Visit Template Library → Audio and pick the ACE-Step 1.5 template

  3. Download the model when prompted (or manually from Hugging Face)

  4. Enter your style tags and lyrics, then run the workflow

Download ACE-Step 1.5 Template
Get ACE-Step 1.5 Models

Workflow Suggestions

  • Style Tags: Use descriptive terms covering genre, instruments, mood, tempo, and vocal characteristics
    Example: rock, hard rock, alternative rock, clear male vocalist, powerful voice, energetic, electric guitar, bass, drums, anthem, 120 bpm

  • Lyric Structure: Use section markers such as [verse], [chorus], [bridge]

  • Track Length: Begin with 90–120 seconds for improved consistency; 180+ second compositions might need multiple runs

  • Multiple Generation: Set batch_size between 8 and 16 and pick the best output
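
As a rough illustration of the suggestions above, the sketch below assembles a comma-separated style string and section-marked lyrics, and flags settings that fall outside the suggested ranges. Everything here is an assumption for illustration: `SongRequest` and its fields are hypothetical names, not part of ComfyUI or ACE-Step, and the resulting strings would still need to be pasted into the corresponding fields of the workflow by hand.

```python
from dataclasses import dataclass


@dataclass
class SongRequest:
    """Hypothetical container for the inputs described above (not a ComfyUI API)."""
    tags: list[str]                    # genre, instruments, mood, tempo, vocals
    sections: list[tuple[str, str]]    # (section name, lyric text) pairs
    seconds: int = 120                 # suggested starting range: 90-120
    batch_size: int = 8                # suggested range: 8-16

    def tag_string(self) -> str:
        # Join non-empty tags into one comma-separated line.
        return ", ".join(t.strip() for t in self.tags if t.strip())

    def lyrics(self) -> str:
        # Prefix each block with a marker such as [verse] or [chorus].
        blocks = [f"[{name}]\n{text.strip()}" for name, text in self.sections]
        return "\n\n".join(blocks)

    def warnings(self) -> list[str]:
        # Compare the settings against the ranges suggested above.
        notes = []
        if self.seconds >= 180:
            notes.append("180+ second tracks might need multiple runs")
        elif not 90 <= self.seconds <= 120:
            notes.append("90-120 seconds is the suggested starting length")
        if not 8 <= self.batch_size <= 16:
            notes.append("batch_size is outside the suggested 8-16 range")
        return notes


if __name__ == "__main__":
    request = SongRequest(
        tags=["rock", "hard rock", "clear male vocalist", "energetic", "120 bpm"],
        sections=[("verse", "First verse lines"), ("chorus", "Chorus lines")],
    )
    print(request.tag_string())
    print(request.lyrics())
    print(request.warnings())
```

Keeping the tags and sections as structured data makes it easy to vary one element at a time between batches, for example swapping the tempo tag while leaving the lyrics unchanged.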

Happy creating!

FAQ