ACE-Step 1.5 is Now Available in ComfyUI
Commercial-grade music generation on consumer hardware
- Models: LoRA
- VRAM: High VRAM (24GB+)
- Reading Time: 4 min
Workflow Overview
Required Models
- LoRA
Setup Notes
- Install the required models before opening the workflow template.
- Recommended hardware: High VRAM (24GB+).
- Use the download button above to import the workflow JSON into ComfyUI.
ACE-Step 1.5 Now Accessible in ComfyUI
We're pleased to announce that ACE-Step 1.5 is now available in ComfyUI! This major upgrade to the open-source music generation model delivers professional-grade quality on your own machine, generating full songs in under 10 seconds on consumer hardware.
Updates in ACE-Step 1.5
ACE-Step 1.5 introduces a new hybrid architecture for AI music generation. A Language Model acts as a versatile planner, turning simple user prompts into detailed song blueprints that range from short loops to ten-minute compositions.
Professional Sound Quality
Achieves higher quality than most commercial music systems, scoring 4.72 in musical consistency metrics
Rapid Generation
Creates a complete 4-minute track in approximately 1 second on an RTX 5090 or under 10 seconds on an RTX 3090
Standard Hardware Compatibility
Multilingual Support
Accurately follows instructions in 50+ languages, with excellent performance in English, Chinese, Japanese, Korean, Spanish, German, French, Portuguese, Italian, and Russian
Chain-of-Thought Approach
Using step-by-step reasoning, the model combines metadata, lyrics, and captions to guide the diffusion process, producing more coherent long-form pieces.
LoRA Personalization
ACE-Step 1.5 supports style-specific fine-tuning through LoRA training. With just a handful of tracks, you can train a LoRA that captures your own sound and apply it locally, keeping your data private.
Functionality Overview
ACE-Step 1.5 combines several architectural advances:
Combined LM + DiT Framework: A Language Model plans the musical structure while a Diffusion Transformer handles audio synthesis
Distribution Matching Distillation: Uses Z-Image's DMD2 for faster generation (2 seconds on an A100) and better output quality
Built-in Reinforcement Learning: Alignment relies on the model's internal mechanisms, avoiding bias from external sources
Self-Improving Tokenizer: The audio tokenizer is trained alongside the DiT to minimize the mismatch between generation and tokenization
Future Developments
ACE-Step 1.5 has additional capabilities that are not yet supported in ComfyUI but that the community will likely implement.
Reinterpretation
Supply any existing track with fresh lyrics and instructions for complete stylistic reinvention
Revision
Regenerate specific segments when a composition is nearly ideal, seamlessly inserting corrections while preserving surrounding content
Sample Vocal Compositions
Neo-Soul: A warm, organic neo-soul track dripping with live instrumentation and effortless groove. A live drummer plays a loose, hip-hop influenced pocket—soft kick drum with lazy swing, snare hits that sit just behind the beat, and brushed hi-hats that breathe and shuffle with human imperfection.
UK Garage: A skippy, energetic UK garage track built on a classic two-step drum pattern with shuffling hi-hats and a punchy, syncopated kick and snare. A warm, wobbling Reese bass line provides the low-end foundation and chopped, pitched-up female vocal samples create the melodic hooks.
K-Pop: A slick, maximalist K-pop track that genre-hops with precision and style. The production shifts seamlessly between sections—a hard-hitting trap-influenced verse with rapid-fire rapping, a softer R&B pre-chorus with breathy vocals and lush harmonies, then an explosive, synth-driven pop chorus with an ear worm hook.
Sample Instrumental Works
Synth-wave: A nostalgic, cinematic ride through neon and chrome. Punchy gated drums with big reverb snare, arpeggiated synth lines running through chorus and delay, warm analog bass, and soaring lead melodies that feel heroic and bittersweet. Driving but emotional, like the credits rolling on a film that never existed.
Meditative Roller: A deep, meditative roller locked into a hypnotic 140 BPM groove, all smooth forward motion and late-night introspection. The bass line is the soul of it—warm, undulating, endlessly cycling through subtle variations like waves lapping at a shore, never jarring, never stopping.
Progressive House: A warm, rolling journey that builds patiently. Soft four-on-the-floor kick with airy hats, a plucky melodic synth hook that repeats and evolves, pads that swell across long phrases, and subtle acid bass bubbling underneath. Emotional but restrained, always moving forward toward a sunrise.
Initial Steps
Desktop & Local Users
Update ComfyUI to the latest version
Open Template Library → Audio and select the ACE-Step 1.5 template
Download the model when prompted (or manually from Hugging Face)
Enter your style tags and lyrics, then run the workflow
Download ACE-Step 1.5 Template
Get ACE-Step 1.5 Models
Workflow Suggestions
Style Indicators: Use descriptive terms covering genre, instruments, mood, tempo, and vocal characteristics
Example: rock, hard rock, alternative rock, clear male vocalist, powerful voice, energetic, electric guitar, bass, drums, anthem, 120 bpm
Lyrical Organization: Use section identifiers such as [verse], [chorus], and [bridge]
Track Length: Begin with 90–120 seconds for improved consistency; compositions of 180+ seconds might need multiple runs
Multiple Generation: Set batch_size between 8 and 16 and select the best output
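The suggestions above can be sketched as a small helper for preparing the text inputs before pasting them into the workflow. This is a minimal illustration only: the names SongRequest, build_tags, build_lyrics, and make_request are hypothetical and are not part of ComfyUI or ACE-Step, and the range checks mirror the guidance above rather than any hard limit of the model.

```python
from dataclasses import dataclass

# Hypothetical helper names for illustration; not part of ComfyUI or ACE-Step.
ALLOWED_SECTIONS = ("verse", "chorus", "bridge")


@dataclass
class SongRequest:
    tags: str               # comma-separated style tags
    lyrics: str             # lyrics with [section] identifiers
    duration_s: int         # target track length in seconds
    batch_size: int         # number of candidates to generate
    may_need_retries: bool  # True for tracks of 180 seconds or more


def build_tags(*terms: str) -> str:
    """Join style terms into one tag string, skipping blanks and duplicates."""
    seen = []
    for term in terms:
        cleaned = term.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


def build_lyrics(sections: list[tuple[str, str]]) -> str:
    """Render (section, text) pairs as lyrics with [section] identifiers."""
    blocks = []
    for name, text in sections:
        if name not in ALLOWED_SECTIONS:
            raise ValueError(f"unknown section: {name}")
        blocks.append(f"[{name}]\n{text.strip()}")
    return "\n\n".join(blocks)


def make_request(tags: str, lyrics: str,
                 duration_s: int = 120, batch_size: int = 8) -> SongRequest:
    """Bundle the inputs, checking them against the suggestions above."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Assumption: treat the suggested 8-16 range as a soft rule we enforce here.
    if not 8 <= batch_size <= 16:
        raise ValueError("batch_size is outside the suggested 8-16 range")
    return SongRequest(tags, lyrics, duration_s, batch_size,
                       may_need_retries=duration_s >= 180)


if __name__ == "__main__":
    tags = build_tags("rock", "hard rock", "clear male vocalist", "120 bpm")
    lyrics = build_lyrics([("verse", "City lights are calling"),
                           ("chorus", "We run, we run")])
    print(make_request(tags, lyrics))
```

Starting from the defaults (120 seconds, batch of 8) follows the advice to begin with shorter tracks and pick the best of several candidates; the may_need_retries flag simply marks longer requests that might take more than one run.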
Happy creating!