Unlock Anime-Style Image Generation with Multi-LoRA Fusion and Reference Image Guidance
Unlock anime-style image generation with reference image guidance and multi-LoRA fusion. Discover how to combine Meta-Llama, Stable Diffusion, and LoRA Ensemble for stunning results.
- VRAM: Medium (12–16GB)
Workflow Overview
Content type: Workflow
Primary intent: Download
Required Models
- Flux
- LoRA
- SD
Setup Notes
- Install the required models before opening the workflow template.
- Recommended hardware: Medium VRAM (12–16GB).
1. Workflow Overview

Purpose: Anime-style image generation with reference image guidance and multi-LoRA fusion.
Key Features:
- Image Captioning: The `Joy_caption_two` node extracts prompts from the input image.
- Multi-LoRA Stacking: 3 LoRAs (Dark Fantasy/Beauty CG/Ancient Style) at 0.7 strength.
- Bilingual Support: Auto-translation via `LibLibTranslate`.
2. Core Models
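| Model Name | Function |
|---|---|
| Meta-Llama-3.1-8B | Language model that `Joy_caption_two` uses for image-to-text prompt generation (…) |
| Stable Diffusion Flux | Base model (…) |
| LoRA Ensemble | Three style LoRAs (Dark Fantasy/Beauty CG/Ancient Style) applied at 0.7 strength |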
3. Key Nodes
3.1 Image Input & Captioning
`LoadImage`: Loads the reference image (e.g., `00059-4156590861.jpg`).
`Joy_caption_two`:
- Function: Generates English prompts via Meta-Llama.
- Install: Requires the `comfyui_slk_joy_caption_two` plugin (available through ComfyUI Manager).
3.2 Prompt Processing
`JoinStrings`: Merges user keywords (e.g., `miluo_cjsj, cloth`) with the auto-generated prompt.
`LibLibTranslate`: Optional English-to-Chinese translation.
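The merge done by `JoinStrings` amounts to string concatenation. The Python sketch below assumes the user keywords are placed before the generated caption and joined with `", "`; the function name `join_prompt` and the delimiter are illustrative assumptions, not the node's actual code.

```python
def join_prompt(keywords: str, caption: str, delimiter: str = ", ") -> str:
    """Hypothetical stand-in for JoinStrings: user keywords first, then the caption.

    Assumption: keyword-first ordering and a ", " delimiter; the real node may differ.
    """
    # Skip empty parts so a missing caption does not leave a dangling delimiter.
    parts = [p.strip() for p in (keywords, caption) if p and p.strip()]
    return delimiter.join(parts)


if __name__ == "__main__":
    # Example caption text written for illustration, not real Joy_caption_two output.
    caption = "an anime girl with red hair in a neon-lit city"
    print(join_prompt("miluo_cjsj, cloth", caption))
```

Putting the keywords first is one assumed way to reflect that they take priority over the auto-generated text.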
3.3 Multi-LoRA Fusion
`LoraLoaderModelOnly`:
- Chained loading: 3 LoRAs are applied sequentially (strength=0.7, set via `ReroutePrimitive`).
- Model source: Place the `.safetensors` files in `models/loras`.
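The chaining can be pictured in ComfyUI's API-format workflow JSON, where each node has a `class_type` and `inputs`, and a link is written as `[node_id, output_index]`. This is a sketch under assumptions: the node IDs, the base-model node `"1"`, the helper name `chain_loras`, and the LoRA file names are hypothetical, and a single `strength` argument stands in for the shared `ReroutePrimitive` value.

```python
import json


def chain_loras(base_model_ref, lora_files, strength=0.7, start_id=10):
    """Build API-format LoraLoaderModelOnly nodes that patch the model one after another.

    Hypothetical helper: node IDs start at `start_id`; `base_model_ref` is the
    [node_id, output_index] of whatever node loads the base model.
    """
    nodes = {}
    model_ref = base_model_ref
    for offset, lora_name in enumerate(lora_files):
        node_id = str(start_id + offset)
        nodes[node_id] = {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {
                "model": model_ref,          # MODEL output of the previous node in the chain
                "lora_name": lora_name,      # file name relative to models/loras
                "strength_model": strength,  # same value for every LoRA, like the shared reroute
            },
        }
        model_ref = [node_id, 0]  # the next LoRA patches this node's MODEL output
    return nodes, model_ref


if __name__ == "__main__":
    # Hypothetical file names; substitute the three LoRAs the workflow actually uses.
    loras = ["dark_fantasy.safetensors", "beauty_cg.safetensors", "ancient_style.safetensors"]
    graph, final_model = chain_loras(["1", 0], loras)
    print(json.dumps(graph, indent=2))
    print("Model input for KSampler:", final_model)
```

The last reference returned is what the sampler's `model` input would connect to, so the sampler sees the model with all three LoRAs applied.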
3.4 Generation & Output
`KSampler`:
- Settings: Euler sampler, 20 steps, CFG=3.5, random seed.
`VAEDecode`: Uses the `ae.sft` VAE to decode the latent into an image.
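The sampler settings above map onto an API-format `KSampler` node as sketched below. The helper names and node references are hypothetical, and two values are assumptions because the workflow notes do not state them: the `scheduler` (`"normal"` here) and `denoise` (`1.0`, i.e. full denoising from an empty latent).

```python
import random


def ksampler_node(model_ref, positive_ref, negative_ref, latent_ref, seed=None):
    """API-format KSampler node with the listed settings: Euler, 20 steps, CFG 3.5."""
    if seed is None:
        # "Random seed": draw a fresh 64-bit seed for each run.
        seed = random.randint(0, 2**64 - 1)
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": 20,
            "cfg": 3.5,
            "sampler_name": "euler",
            "scheduler": "normal",  # assumption: the workflow notes do not name the scheduler
            "denoise": 1.0,         # assumption: full denoise for generation from an empty latent
            "model": model_ref,
            "positive": positive_ref,
            "negative": negative_ref,
            "latent_image": latent_ref,
        },
    }


def vae_decode_node(samples_ref, vae_ref):
    """API-format VAEDecode node; vae_ref is assumed to point at the loader for ae.sft."""
    return {"class_type": "VAEDecode", "inputs": {"samples": samples_ref, "vae": vae_ref}}


if __name__ == "__main__":
    # Hypothetical node IDs: 12 = last LoRA loader, 20/21 = prompt encoders, 30 = EmptyLatentImage.
    sampler = ksampler_node(["12", 0], ["20", 0], ["21", 0], ["30", 0])
    print(sampler)
```

Passing an explicit `seed` makes a run reproducible; leaving it out mimics the random-seed behavior.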
4. Workflow Structure
| Group Name | Key Nodes | I/O Description |
|---|---|---|
| Global Control | … | Input: Reference image, resolution (1200x1200), keywords. |
| LoRA Models | 3x … | Output: Fused model (strength=0.7). |
| Generation | … | Output: Final image (…) |
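The "fused model" can be illustrated with the usual LoRA update, in which each LoRA adds a low-rank delta `strength * (B @ A)` to a weight matrix. The pure-Python sketch below is a simplification: it ignores the per-LoRA alpha/rank scaling, and it is not how ComfyUI patches weights internally. The function names and toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, strength):
    """Return W + strength * (B @ A), one simplified LoRA patch (alpha/rank scaling ignored)."""
    delta = matmul(lora_b, lora_a)
    return [[w + strength * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def fuse_sequential(weight, loras, strength=0.7):
    """Apply each (B, A) pair in turn, like three chained LoRA loader nodes."""
    for lora_b, lora_a in loras:
        weight = apply_lora(weight, lora_b, lora_a, strength)
    return weight


if __name__ == "__main__":
    # Toy 2x2 weight and three hypothetical rank-1 LoRAs (B is 2x1, A is 1x2).
    w = [[1.0, 0.0], [0.0, 1.0]]
    loras = [
        ([[1.0], [0.0]], [[0.0, 1.0]]),
        ([[0.0], [1.0]], [[1.0, 0.0]]),
        ([[1.0], [1.0]], [[1.0, 1.0]]),
    ]
    print(fuse_sequential(w, loras))
    print(fuse_sequential(w, list(reversed(loras))))
```

Because each patch is additive in this simplified view, the order of the three LoRAs does not change the merged weights. Each 0.7 strength scales only its own LoRA's delta.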
5. Inputs & Outputs
Inputs:
- Reference image (JPEG/PNG).
- Resolution: Set via `EmptyLatentImage` (default 1200x1200).
- Keywords: e.g., `miluo_cjsj, cloth` (these take priority over the auto-generated prompt).
Output:
- Generated image (sample result: anime cyberpunk style, red-haired character).
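A quick size calculation shows how the 1200x1200 default translates into work for the sampler. This sketch assumes the standard 8x VAE downsampling that `EmptyLatentImage` uses and the 2x2 latent patching of Flux-style models, rounding up on the assumption that partial patches are padded. The function names are hypothetical.

```python
def latent_size(width, height, vae_factor=8):
    """Spatial size of the latent allocated for an image (8x VAE downsampling)."""
    if width % vae_factor or height % vae_factor:
        raise ValueError("width and height must be multiples of 8")
    return width // vae_factor, height // vae_factor


def flux_image_tokens(width, height, patch=2):
    """Image-token count if the latent is split into 2x2 patches (Flux-style).

    Rounds up, assuming partial patches are padded.
    """
    lw, lh = latent_size(width, height)
    return ((lw + patch - 1) // patch) * ((lh + patch - 1) // patch)


if __name__ == "__main__":
    print(latent_size(1200, 1200))        # (150, 150)
    print(flux_image_tokens(1200, 1200))  # 5625
```

The token count grows with the square of the side length, so raising the resolution above the default increases memory use and generation time noticeably.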
6. Notes
VRAM: A GPU with 12GB+ is recommended (multi-LoRA stacking is resource-intensive).
Dependencies:
- Download `Meta-Llama-3.1-8B` and the 3 LoRA models manually.
- Place the `ae.sft` VAE in `models/vae`.
Troubleshooting:
- If captioning fails, check the model path in `Joy_caption_two`.
- If images are blurry, adjust CFG (currently 3.5).
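Before opening the workflow, a small script can confirm that the files named in the dependency notes are in place. This is a hedged sketch: it checks only the two locations given above (`models/vae/ae.sft` and at least three `.safetensors` files in `models/loras`) relative to a ComfyUI root you pass in. The function name is hypothetical, and it does not check the Meta-Llama path because the notes do not say where that model is stored.

```python
from pathlib import Path


def check_dependencies(comfy_root, lora_count=3):
    """Return a list of problems in the model folders this workflow relies on (hypothetical helper)."""
    root = Path(comfy_root)
    problems = []
    vae = root / "models" / "vae" / "ae.sft"
    if not vae.is_file():
        problems.append(f"missing VAE: {vae}")
    lora_dir = root / "models" / "loras"
    loras = sorted(lora_dir.glob("*.safetensors")) if lora_dir.is_dir() else []
    if len(loras) < lora_count:
        problems.append(
            f"expected {lora_count} LoRA .safetensors files in {lora_dir}, found {len(loras)}"
        )
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_models.py /path/to/ComfyUI  (defaults to the current directory)
    issues = check_dependencies(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(issues) if issues else "VAE and LoRA files found.")
```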