ElevenLabs is now available in ComfyUI

2026-03-08 02:01:26

World-class voice cloning, text-to-speech, and sound effects generation

Use case: Video
Best for: Video
Reading time: 2 min

Workflow Overview

Content type: Workflow

Setup Notes

  • Install the required models before opening the workflow template.

We are thrilled to announce that ElevenLabs is now integrated into ComfyUI through Partner Nodes! This brings premium voice AI directly into your node graph, with no external tools and no browser tab switching: simply drag, connect, and run.

Whether you are building podcast pipelines, adding narration to AI-generated videos, extracting dialogue from noisy audio, or cloning voices for characters, it all now happens directly on your canvas.


ElevenLabs Nodes

🗣️ Text to Speech

Enter text and get synthesized speech back. Generate natural-sounding audio for narration, commentary, and automated soundtracks. Chain it with video generation nodes for seamless content production.

🔄 Speech to Speech

Feed in a voice recording and get a transformed version out. Change the style, tone, or identity while preserving the original rhythm and emotion. Ideal for dubbing, voice acting, and creative edits.

📝 Speech to Text

Convert audio to text within your workflow. Use it for subtitles, for feeding dialogue into LLM analysis, or for building audio-to-text-to-image chains that respond to spoken input.
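For the subtitle use case, a minimal sketch of turning timed transcript segments into SubRip (SRT) text could look like the following. It assumes the transcript is already available as `(start_seconds, end_seconds, text)` tuples; that shape is an assumption for illustration, since this post does not describe the node's actual output format, and `srt_timestamp` and `to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples (assumed input shape)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue is: sequence number, time range, then the cue text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 4.0, "Welcome to the show.")]))
```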

🎧 Voice Isolation

Clean up noisy recordings by separating voices from background noise. Perfect for refining field recordings or isolating clear speech before further processing.

💬 Text to Dialogue

Create multi-speaker conversations from text. Assign different voices, manage exchanges, and produce lifelike dialogues — suitable for podcasts, audiobooks, educational content, or game scripts.

🔊 Text to Sound Effects

Describe a sound to generate it. Explosions, footsteps, rain, sci-fi ambience — whatever your project requires. Great for adding atmosphere to videos, constructing soundscapes, or prototyping game audio without sample libraries.

🎛️ Voice Selector

Choose from ElevenLabs’ library of pre-made voices. Select the right tone, accent, and style instantly, with no setup needed.


Why This Matters

Audio has often been the missing piece in ComfyUI workflows. Images, videos, 3D assets, and text could all be generated in the graph, but voice synthesis required separate steps. Now, with ElevenLabs as a Partner Node, you can build fully multimodal pipelines:

  • Prompt → Image → Video → Voiceover: entirely within one graph

  • Audio cleanup → Transcription → LLM processing: no exports or context switching

  • Generate dialogue → Overlay on generated video: end-to-end character sequences
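As a rough illustration of how such a chain is wired, the sketch below models the audio cleanup → transcription → LLM pipeline as a node graph in the style of ComfyUI's API-format workflow, where each node has a `class_type` and its inputs either hold literal values or point at another node as `[node_id, output_index]`. The `class_type` strings and input names here are hypothetical placeholders, not the real Partner Node identifiers, and `execution_order` is a helper invented for this example.

```python
from graphlib import TopologicalSorter

# All class_type and input names are placeholders for illustration only;
# check the Node Library for the actual node names.
workflow = {
    "1": {"class_type": "ExampleLoadAudio", "inputs": {"audio": "interview.wav"}},
    "2": {"class_type": "ExampleVoiceIsolation", "inputs": {"audio": ["1", 0]}},
    "3": {"class_type": "ExampleSpeechToText", "inputs": {"audio": ["2", 0]}},
    "4": {"class_type": "ExampleLLM", "inputs": {"text": ["3", 0]}},
}


def upstream_ids(node):
    """Return the ids of the nodes that this node reads its inputs from."""
    return {
        value[0]
        for value in node["inputs"].values()
        if isinstance(value, list) and len(value) == 2
    }


def execution_order(graph):
    """Order node ids so that every node comes after the nodes it depends on."""
    deps = {node_id: upstream_ids(node) for node_id, node in graph.items()}
    referenced = set().union(*deps.values()) if deps else set()
    missing = referenced - set(graph)
    if missing:
        raise ValueError(f"links point at unknown nodes: {sorted(missing)}")
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for node_id in execution_order(workflow):
        print(node_id, workflow[node_id]["class_type"])
```

Because every link is just a reference to an upstream node's output, adding another stage (for example a voiceover step) means adding one more entry whose inputs point at an existing node.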

These nodes work alongside other Partner Nodes, so you can run generations in parallel and iterate quickly.


Get Started

  1. Update ComfyUI or ComfyUI Desktop to the latest version.

  2. Locate the ElevenLabs nodes in the Node Library or Templates sidebar.

  3. Drop a node onto your canvas and start creating.

As always, happy creating!
