Local Image Generation
Generate unlimited images on your own GPU — no subscriptions, no content filters you didn't choose, and full control over your workflow.
Why run image generation locally?
Cloud image generators charge per image, log your prompts, and apply content policies you can't change. Running locally means unlimited generation at effectively zero marginal cost once you have the hardware, with complete privacy and customisability.
Output quality is no longer the trade-off it once was: open-weight models like Flux.1 and SD 3.5 Large now produce results on par with cloud services, and in some cases better.
The tools
ComfyUI
A node-based workflow editor that is extremely flexible and powerful. The professional's choice. Supports nearly every open model and a large library of custom nodes. Steeper learning curve.
Automatic1111
The original web UI for Stable Diffusion. Huge extension ecosystem, mature, well-documented. Slower to adopt new architectures.
Forge
A fork of Automatic1111 optimised for VRAM efficiency. Runs larger models on less memory than the original. Good starting point for 8 GB cards.
InvokeAI
A polished, modern UI focused on creative professionals. Clean interface, good canvas tools, active development.
The models
| Model | Params | Min VRAM | Strengths |
|---|---|---|---|
| Stable Diffusion XL | 3.5B | 6 GB | Fastest, largest LoRA library, great for artistic styles |
| SD 3.5 Large | 8B | 10 GB | Excellent text rendering, photorealism, prompt adherence |
| Flux.1 [schnell] | 12B | 12 GB | High quality in 4 steps, very fast, permissive Apache 2.0 license |
| Flux.1 [dev] | 12B | 12 GB | Best quality at 20–30 steps, non-commercial license |
| HiDream | 17B | 16 GB | Cinematic detail, strong photorealism |
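As a rough sanity check on the VRAM column, the memory taken by the weights alone is approximately the parameter count times the bytes per parameter. The sketch below applies that to the parameter counts in the table. The function name and the choice of precisions are illustrative assumptions, and the estimate ignores text encoders, the VAE, activations, and any offloading a UI performs, so it says nothing definitive about real-world usage.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: weights only. Text encoders, VAE, activations and
# offloading are ignored, so real usage will differ.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "4-bit": 0.5}

# Parameter counts in billions, taken from the table above.
MODELS = {
    "Stable Diffusion XL": 3.5,
    "SD 3.5 Large": 8.0,
    "Flux.1 [schnell]": 12.0,
    "Flux.1 [dev]": 12.0,
    "HiDream": 17.0,
}

def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

for name, params in MODELS.items():
    sizes = ", ".join(f"{p}: {weight_gb(params, p):.1f} GB" for p in BYTES_PER_PARAM)
    print(f"{name:<20} {sizes}")
```

For the 12B Flux.1 models this works out to about 24 GB at 16-bit precision, which suggests the 12 GB minimum assumes reduced-precision weights or partial offloading rather than full-precision loading.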
What about add-ons?
ControlNet models let you control image composition using a reference image, pose skeleton, or edge map. LoRAs are small (50–300 MB) fine-tune files that add a specific style or subject to a base model — you can mix multiple LoRAs with different weights. These run on top of your base model and consume additional VRAM.
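To make "mixing multiple LoRAs with different weights" concrete: a LoRA stores a pair of small low-rank matrices per layer, and applying it adds their scaled product to the base weight. The sketch below shows the idea for a single toy 2×2 layer in plain Python. The matrix values, the `style` and `subject` names, and both helper functions are invented for illustration. Real tools do this on tensors across many layers and may apply additional scaling factors.

```python
# Toy illustration of stacking LoRAs on one layer:
#   W_effective = W_base + sum_i weight_i * (B_i @ A_i)
# All values and names are hypothetical, chosen only to show the arithmetic.

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_loras(base: Matrix, loras: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return base plus each LoRA's low-rank update scaled by its weight."""
    out = [row[:] for row in base]  # copy so the base weights stay untouched
    for b, a, weight in loras:
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += weight * value
    return out

base = [[1.0, 0.0], [0.0, 1.0]]
style = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)     # (B, A, weight), rank 1
subject = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)  # (B, A, weight), rank 1

mixed = apply_loras(base, [style, subject])
print(mixed)
```

Because each LoRA only contributes a small additive update, changing its weight scales its influence up or down without touching the base model file.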
Speed expectations
| GPU | Model | Time per image |
|---|---|---|
| RTX 4060 (8 GB) | SDXL | ~8 seconds |
| RTX 4070 Super (12 GB) | Flux.1 schnell (4 steps, 1024px) | ~18–25 seconds |
| RTX 4090 (24 GB) | Flux.1 schnell (4 steps, 1024px) | ~6–8 seconds |
| RTX 5090 (32 GB) | Flux.1 schnell (4 steps, 1024px) | ~3–5 seconds |
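Per-image times translate directly into batch throughput. The sketch below converts the ranges from the table into images per hour. It assumes back-to-back generation with no model loading, saving, or UI overhead, and the function name is illustrative.

```python
# Convert the per-image time ranges from the table into images per hour.
# Assumption: continuous generation with no loading or saving overhead.

TIMES_S = {
    "RTX 4060 (SDXL)": (8, 8),
    "RTX 4070 Super (Flux.1 schnell)": (18, 25),
    "RTX 4090 (Flux.1 schnell)": (6, 8),
    "RTX 5090 (Flux.1 schnell)": (3, 5),
}

def images_per_hour(fastest_s: float, slowest_s: float) -> tuple[int, int]:
    """Return (low, high) whole images per hour for a time range in seconds."""
    return int(3600 // slowest_s), int(3600 // fastest_s)

for gpu, (fast, slow) in TIMES_S.items():
    low, high = images_per_hour(fast, slow)
    print(f"{gpu}: {low}-{high} images/hour")
```

On these figures an RTX 4090 lands at roughly 450 to 600 Flux.1 schnell images per hour, against 144 to 200 for an RTX 4070 Super.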
NVIDIA Tensor Cores do the heavy lifting in every denoising step. An RTX 4060 (8 GB) runs SDXL comfortably and SD 3.5 Medium with Forge's memory optimisations. An RTX 4070 Super (12 GB) or 4060 Ti 16 GB opens up Flux.1, typically with reduced-precision (FP8 or quantised) weights. An RTX 4090 or 5090 generates Flux images in seconds and supports batch generation for serious workflows.
The DGX Spark's 128 GB of unified memory lets you keep multiple base models, ControlNets, and large LoRA stacks loaded at once, which eliminates the model-swapping delays that slow down creative workflows on consumer GPUs. TensorRT-optimised pipelines on NVIDIA hardware can deliver an additional 2–4× speed boost over standard PyTorch inference.