Dynamic VRAM in ComfyUI: Saving Local Models from RAMmageddon

CN
2026-03-26 02:10:25

The ongoing surge in RAM prices has been hard on everyone who runs models locally. In response, ComfyUI now includes Dynamic VRAM, our adaptive memory optimization system.

ComfyUI has consistently offered one of the most efficient ways to run diffusion models, and this update improves it substantially. Our goal remains making the largest open-source models accessible to more users.

Dynamic VRAM is now available in the stable version of ComfyUI for Nvidia systems on Windows and Linux (WSL is not supported). It significantly reduces system RAM consumption and speeds up workflows. It changes how model weights are managed so that models run smoothly on memory-limited machines. Key enhancements include:

  • Reduced System RAM Demand: Noticeably cuts conventional RAM requirements for complex tasks.

  • OOM Error Elimination: Fully resolves crashes from inadequate weight offloading.

  • Accelerated Initialization: Model and LoRA loading times are notably faster in specific scenarios.

  • No Paging Dependency: Operate models beyond physical RAM without slow system page files.

  • Higher VRAM Usage: GPU memory utilization will look higher than before. This is expected behavior and means the available VRAM is being put to use.

  • Streamlined Development: Code no longer needs to predict how much memory to free before inference.

Note on Windows Task Manager: RAM usage may not drop immediately when plenty of memory is free. ComfyUI keeps weights cached for speed without relying on the page file, and the memory is released immediately when other applications need it.

Performance Metrics

ComfyUI was already highly efficient on consumer hardware, and Dynamic VRAM adds measurable speed gains:

Video Generation (WAN2.2, two 14B models, fp16/fp8, 320x320, 81 frames)
Windows, RTX 5060, 32GB/64GB RAM

[Figure: benchmark chart for this configuration; image not reproduced]

Total diffusion model size: 56GB (2×28GB fp16 weights).
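The 56GB figure follows from parameter count and precision. A quick check, assuming exactly 14 billion parameters per model and decimal gigabytes (10^9 bytes); the helper name `weights_gb` is made up for this illustration:

```python
# Back-of-the-envelope check of the weight sizes quoted above.
# Assumptions: 14e9 parameters per model, 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weights_gb(num_params: int, dtype: str) -> float:
    """Size of the raw weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


params_per_model = 14_000_000_000
fp16_each = weights_gb(params_per_model, "fp16")      # one 14B model in fp16
fp16_total = 2 * fp16_each                            # both models in fp16
fp8_total = 2 * weights_gb(params_per_model, "fp8")   # both models in fp8

print(f"fp16: {fp16_each:.0f} GB each, {fp16_total:.0f} GB total")
print(f"fp8:  {fp8_total:.0f} GB total")
```

Under these assumptions the fp16 weights alone exceed the 32GB RAM configuration, which is the situation the "No Paging Dependency" item describes.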

Flux 2 Dev Default Workflow (bf16 text/diffusion)
Linux, Blackwell 6000 Pro

[Figure: benchmark chart for this configuration; image not reproduced]

Inner Workings: AI Model Dynamic Offloader (aimdo)

Dynamic VRAM is built on a custom PyTorch allocator that manages model weights under memory pressure:

  1. VBAR Creation: Each model reserves a Virtual Base Address Register (VBAR) region. The region consumes no physical VRAM, only GPU virtual address space. Accessing a tensor in the region before it has been allocated triggers a fault.

  2. fault() API: Calling fault() on a tensor allocates its physical VRAM just in time, at the moment the computation needs it.

  3. Memory Responses

  • Available VRAM: The weight is loaded and stays resident until memory pressure requires it to be released.

  • Insufficient VRAM: The operation runs from a temporary GPU tensor instead of crashing.

  4. Priority Watermarks: Newer VBARs hold the highest priority. When low-priority weights are forcibly evicted, a watermark is set so that redundant allocation attempts are skipped.
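The flow described above can be illustrated with a toy simulation in pure Python. This is a sketch of the idea only, not aimdo's code or API: `ToyVBAR`, `ToyAllocator`, the integer "VRAM" units, and the exact watermark rule (stop retrying weights at or above the lowest evicted index) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyVBAR:
    """Hypothetical stand-in for one model's reserved virtual range."""
    name: str
    priority: int                    # higher = created later = higher priority
    sizes: List[int]                 # size of each weight, arbitrary units
    resident: Set[int] = field(default_factory=set)  # weights holding "VRAM"
    watermark: Optional[int] = None  # indices >= watermark are not retried


class ToyAllocator:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity     # total "physical VRAM"
        self.used = 0
        self.vbars: List[ToyVBAR] = []

    def create_vbar(self, name: str, sizes: List[int]) -> ToyVBAR:
        # VBAR creation: reserving the range costs no physical memory.
        vbar = ToyVBAR(name, priority=len(self.vbars), sizes=sizes)
        self.vbars.append(vbar)
        return vbar

    def fault(self, vbar: ToyVBAR, idx: int) -> str:
        """Called right before weight `idx` is used."""
        if idx in vbar.resident:
            return "resident"
        if vbar.watermark is not None and idx >= vbar.watermark:
            return "temporary"       # watermark: skip a redundant attempt
        need = vbar.sizes[idx]
        # Evict from strictly lower-priority ranges, lowest priority first.
        for victim in sorted(self.vbars, key=lambda v: v.priority):
            if victim.priority >= vbar.priority:
                break
            while self.used + need > self.capacity and victim.resident:
                evicted = max(victim.resident)
                victim.resident.remove(evicted)
                self.used -= victim.sizes[evicted]
                victim.watermark = evicted  # lower than any earlier watermark
        if self.used + need <= self.capacity:
            vbar.resident.add(idx)   # VRAM available: keep the weight loaded
            self.used += need
            return "resident"
        return "temporary"           # VRAM insufficient: use a throwaway copy


alloc = ToyAllocator(capacity=4)
old = alloc.create_vbar("old_model", [1, 1, 1, 1])
old_first = [alloc.fault(old, i) for i in range(4)]  # fills all of "VRAM"
new = alloc.create_vbar("new_model", [1, 1])
new_first = [alloc.fault(new, i) for i in range(2)]  # evicts old weights 3, 2
old_again = [alloc.fault(old, i) for i in range(4)]  # 2 and 3 now temporary
print(old_first, new_first, old_again, old.watermark)
```

In this run the newer model displaces two weights of the older one. When the older model runs again, its first two weights are still resident and the last two fall back to temporary tensors without another allocation attempt.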

Updated Memory Strategy

ComfyUI no longer unloads weights to system RAM. Instead:

  • The safetensors loader maps files into uncommitted memory and hands out pointers to it rather than making deep copies.

  • Windows may report high RAM usage, but the weights are released immediately when the memory is needed.

  • Linux counts this memory as disk cache because the allocations are uncommitted.
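The general operating-system mechanism behind this strategy, a read-only file mapping accessed through zero-copy views, can be shown with Python's standard `mmap` module. This is a minimal sketch of the concept, not the ComfyUI or safetensors loader; the dummy file and the slice offsets are arbitrary.

```python
import mmap
import os
import tempfile

# Write a small stand-in "weights" file (a real checkpoint would be many GB).
payload = bytes(range(256)) * 4096            # 1 MiB of dummy data
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

with open(path, "rb") as f:
    # Map the file read-only. Pages are pulled in from disk on first access
    # and stay backed by the file, so the OS can drop them under memory
    # pressure without writing anything to a page file.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

view = memoryview(mapped)                     # zero-copy view of the mapping
tensor_bytes = view[4100:8192]                # a "tensor" is a slice, not a copy
first = tensor_bytes[0]                       # touching it faults the page in
is_zero_copy = tensor_bytes.obj is mapped     # the slice still points at the mapping

# Views must be released before the mapping can be closed.
tensor_bytes.release()
view.release()
mapped.close()
os.remove(path)
```

Because the slice refers to the mapping rather than to a private copy, the only memory involved is file-backed, which is why Linux can account for it as disk cache.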

Development Roadmap

Current efforts focus on:

  • Addressing performance regressions.

  • Adding AMD hardware compatibility.

  • Cutting RAM footprint (experimental --fp16-intermediates optimization).

  • Accelerating disk loading for NVMe configurations.

Report issues via GitHub with full logs, workflows, and hardware specs. Prioritize total workflow execution time over iterations/second for benchmarks.
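To see why total time is the more useful number, consider this small timing sketch. The `run_workflow` helper and the two placeholder callables are hypothetical and only stand in for model loading and sampling steps; they are not part of ComfyUI.

```python
import time


def run_workflow(load_model, sample_step, steps):
    """Time a hypothetical workflow; both callables are placeholders."""
    t0 = time.perf_counter()
    load_model()                      # loading/offloading cost lands here
    t1 = time.perf_counter()
    for _ in range(steps):
        sample_step()
    t2 = time.perf_counter()
    return {
        "load_s": t1 - t0,
        "sample_s": t2 - t1,
        "total_s": t2 - t0,
        "it_per_s": steps / (t2 - t1),  # ignores the load phase entirely
    }


# Dummy stand-ins: a slow load followed by fast steps.
stats = run_workflow(lambda: time.sleep(0.2), lambda: time.sleep(0.01), steps=5)
print(f"it/s: {stats['it_per_s']:.1f}, total: {stats['total_s']:.2f}s "
      f"(load {stats['load_s']:.2f}s)")
```

Iterations per second only reflects the sampling loop, so a change that affects loading time would be invisible to it but shows up in the total.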