Run FLUX 2 Locally in 2026: VRAM by GPU + ComfyUI Setup

You can run FLUX 2 locally on a single consumer GPU in 2026. The open-weight FLUX 2 dev is a 32B model from Black Forest Labs that fits a 24GB card when quantized, while the smaller Klein builds run on 8GB. This guide picks the right variant for your card, installs it in ComfyUI, and covers what it costs to run.
Key Takeaways
- FLUX 2 dev needs a 24GB card; Klein runs on 8GB.
- ComfyUI plus Stability Matrix is the fastest way to start.
- Quantized GGUF builds cut VRAM in half with little quality loss.
- Running locally costs a fraction of a cent per image in power.
- Only dev and Klein have downloadable weights; Pro and Max are API only.

Which FLUX 2 Variant Fits Your GPU
FLUX 2 ships in several sizes, and only some have downloadable weights. Start by matching the model to your card, because that one choice drives everything else.
| Variant | Parameters | Min VRAM (quantized) | Open weights | Best for |
|---|---|---|---|---|
| FLUX 2 Klein 4B | 4B | ~8 GB | Yes | 8-12 GB GPUs, fast drafts |
| FLUX 2 Klein 9B | 9B | ~16 GB | Yes | 12-16 GB GPUs, balanced quality |
| FLUX 2 dev | 32B | ~18-24 GB (4-bit / GGUF) | Yes | 24 GB GPUs, top open-weight quality |
| FLUX 2 Flex | Not disclosed | API only | No | Typography and fine detail |
| FLUX 2 Pro | Not disclosed | API only | No | Production quality |
| FLUX 2 Max | Not disclosed | API only | No | Maximum quality, web-grounded output |
Here is the quick pick by card:
- RTX 4090 or 5090 (24-32 GB): run dev with a 4-bit or GGUF build.
- RTX 3090 (24 GB): dev works at 4-bit, but Klein 9B leaves more headroom.
- 16 GB cards (RTX 4080, 4070 Ti Super): Klein 9B, or dev at a small GGUF.
- 12 GB cards (RTX 3060, 4070): Klein 4B or a low GGUF build.
- 8 GB cards: Klein 4B only.
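The quick pick above can be sketched as a small lookup. This is only an illustration: the function name and the exact thresholds are assumptions drawn from the list, and it returns the first option for each tier rather than every alternative.

```python
def pick_flux2_variant(vram_gb: float) -> str:
    """Map a card's VRAM (in GB) to the first suggestion in the quick-pick list."""
    if vram_gb >= 24:
        return "FLUX 2 dev (4-bit or GGUF)"
    if vram_gb >= 16:
        return "FLUX 2 Klein 9B"
    if vram_gb >= 8:
        return "FLUX 2 Klein 4B"
    return "below the 8 GB minimum in this guide"


for card, vram in [("RTX 5090", 32), ("RTX 4080", 16), ("RTX 3060", 12)]:
    print(f"{card} ({vram} GB): {pick_flux2_variant(vram)}")
```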
If you are coming from FLUX.1, dev is the natural upgrade: better prompt accuracy and text rendering at the same rough VRAM footprint. Klein replaces the old Schnell as the fast, low-VRAM option. Forge loads the same weight files as ComfyUI, so your card decides the variant, not your interface.
Hardware, Power, and Heat
VRAM gets all the attention, but the rest of the box still counts. You need 32GB of system RAM (64GB if you plan to offload), about 60GB of free storage for the model, VAE, and text encoder, and a power supply with headroom for sustained load.
Image generation pins your GPU near its full power draw for the whole render, so size your PSU and cooling around the card’s rated TDP:
| GPU | VRAM | Rated TDP |
|---|---|---|
| RTX 4070 | 12 GB | 200 W |
| RTX 4080 | 16 GB | 320 W |
| RTX 3090 | 24 GB | 350 W |
| RTX 4090 | 24 GB | 450 W |
| RTX 5090 | 32 GB | 575 W |
A 4090 or 5090 will dump 450 to 575 watts of heat into the room during a batch run, spin its fans up, and pull hard on the PSU. Give a 5090 a 1000W supply and good case airflow. The same card also powers other local AI work, from private RAG setups to local LLM inference, so it earns its keep as a general AI workstation, not just an image generator.
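Before downloading anything, it may help to confirm the drive has the roughly 60 GB of free space mentioned in this section. The sketch below uses only the Python standard library; the helper name is hypothetical, and it treats 1 GB as 10**9 bytes, which is an assumption about how the figure was measured.

```python
import shutil

REQUIRED_FREE_GB = 60  # storage figure quoted in this section


def has_enough_space(free_bytes: int, required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Compare free bytes against the required size, using 1 GB = 10**9 bytes."""
    return free_bytes >= required_gb * 10**9


free = shutil.disk_usage(".").free  # free space on the drive holding the current folder
print(f"Free: {free / 10**9:.0f} GB, enough for the FLUX 2 files: {has_enough_space(free)}")
```

Run it from the drive where the models will live, since `shutil.disk_usage(".")` reports on the current working directory's drive.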
Install FLUX 2 With Stability Matrix
In 2026, the easiest way to manage local AI is Stability Matrix. It acts as a one-click installer that handles dependencies and keeps your models in a shared folder across interfaces. I run it on every machine because it saves me from repairing broken Python environments by hand.
You still want a few basics in place first:
- Python 3.10 or 3.11: stable releases beat the newest builds here.
- Git: for cloning repositories.
- CUDA 12.4+: update your NVIDIA drivers to match.
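A quick way to check the first two basics is a short script like the one below. It is a sketch with hypothetical helper names; it does not verify the CUDA or driver version, which you would still confirm separately.

```python
import shutil
import sys


def python_version_ok(version: tuple[int, int]) -> bool:
    """True for the Python 3.10 and 3.11 releases recommended above."""
    return version in {(3, 10), (3, 11)}


def git_available() -> bool:
    """True when a git executable is found on PATH."""
    return shutil.which("git") is not None


current = (sys.version_info.major, sys.version_info.minor)
print(f"Python {current[0]}.{current[1]} is a recommended release: {python_version_ok(current)}")
print(f"git on PATH: {git_available()}")
```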
Then set up the interface:
- Download it: grab the installer for your system from the Stability Matrix releases page.
- Install ComfyUI: open the app, click “Add Package,” and choose ComfyUI. It is the best interface for FLUX 2’s node graph. Prefer a classic form-style UI? Pick Forge instead.
- Find the model folder: Stability Matrix creates one shared Models directory. Open Settings and note the “Shared model folder” path, because the next step copies three files into it.
Downloading the Weights
FLUX 2 dev needs three files: the diffusion model, the text encoder, and the VAE. The ComfyUI team repackages them in the Comfy-Org/flux2-dev repository on Hugging Face, already split into the layout ComfyUI expects.
1. The diffusion model (FP8 build, about 35 GB):
- Download: flux2_dev_fp8mixed.safetensors
- Copy to: models/diffusion_models/
This FP8 build fits a 24 GB card. If you have 80 GB of VRAM, grab the full-precision flux2-dev.safetensors instead.
2. The text encoder (Mistral 3 Small):
- Download: mistral_3_small_flux2_bf16.safetensors
- Copy to: models/text_encoders/
FLUX 2 swapped the CLIP and T5 text encoders used by FLUX.1 for a Mistral language model, which is why prompt accuracy jumped. If VRAM is tight, grab mistral_3_small_flux2_fp8.safetensors from the same folder.
3. The VAE:
- Download: flux2-vae.safetensors
- Copy to: models/vae/
Once all three sit in place, you can also fine-tune the model with LoRA adapters, which follow the same weight and directory structure.
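To confirm the three files landed where the steps above put them, a small check like this can help. It is a sketch: the relative paths come from the download steps, the helper name is hypothetical, and the root folder is an assumption you should replace with the directory that actually contains your models folder (names may differ under Stability Matrix's shared folder).

```python
from pathlib import Path

# Relative paths taken from the download steps above.
EXPECTED_FILES = [
    "models/diffusion_models/flux2_dev_fp8mixed.safetensors",
    "models/text_encoders/mistral_3_small_flux2_bf16.safetensors",
    "models/vae/flux2-vae.safetensors",
]


def missing_files(root: Path) -> list[str]:
    """Return the expected files that are not present under root."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Hypothetical root; point this at the folder that contains models/.
root = Path(".")
for rel in missing_files(root):
    print(f"missing: {rel}")
```

If you downloaded the FP8 text encoder instead, swap the second entry for mistral_3_small_flux2_fp8.safetensors.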
Running the Model: Settings That Count
FLUX 2 uses its own sampling, so you need a FLUX-specific workflow. ComfyUI ships one built in.
- Open ComfyUI and wait for the canvas to load.
- Load the template: open Workflow, then Browse Templates, and pick “Flux 2 dev” under the Flux category. You can also drag any image from the ComfyUI Flux 2 examples page onto the canvas to load the same graph.
- Point the loaders at your files: set “Load Diffusion Model” to the FP8 build, “Load CLIP” to the Mistral encoder, and “Load VAE” to the VAE. If a dropdown looks empty, restart ComfyUI so it rescans the folders.
- Generate: type your prompt and click “Queue.” The first render is slow because ComfyUI loads the model into VRAM first.
The settings trip people up more than the install does. FLUX is not Stable Diffusion, so the old defaults are wrong:
- CFG (guidance): set it to 1.0. FLUX uses embedded guidance, so high CFG just oversaturates the image.
- Negative prompts: leave them empty. FLUX ignores them.
- Sampler and scheduler: euler with simple is the safe baseline.
- Steps: about 20 to 30 for dev. Klein 4B runs in 4 steps.
- Prompts: the Mistral encoder rewards plain, descriptive sentences over keyword soup. Write what you want as a short paragraph.
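The settings above can be written down as a baseline and checked against whatever you carried over from Stable Diffusion. This is a minimal sketch: the dictionary keys and the helper are illustrative names, not ComfyUI's API, and the step range applies to dev only.

```python
# Baseline values from the list above; the key names are illustrative.
FLUX2_DEV_BASELINE = {
    "cfg": 1.0,
    "sampler": "euler",
    "scheduler": "simple",
    "steps": 20,  # anywhere from about 20 to 30 for dev
    "negative_prompt": "",
}


def settings_warnings(settings: dict) -> list[str]:
    """Flag Stable Diffusion-era values that the list above advises against."""
    warnings = []
    if settings.get("cfg", 1.0) != 1.0:
        warnings.append("CFG should be 1.0 for FLUX")
    if settings.get("negative_prompt", ""):
        warnings.append("negative prompts are ignored by FLUX")
    if not 20 <= settings.get("steps", 20) <= 30:
        warnings.append("dev expects about 20 to 30 steps")
    return warnings


sd_style = {**FLUX2_DEV_BASELINE, "cfg": 7.0, "negative_prompt": "blurry", "steps": 50}
print(settings_warnings(FLUX2_DEV_BASELINE))  # no warnings for the baseline
print(settings_warnings(sd_style))
```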
GGUF Quantization: Which Build for Your Card
Quantization is how a 32B model fits a gaming GPU. It shrinks the weights so they use less VRAM, with a small, usually invisible quality cost.
- FP8 (about 35 GB on disk): the default here, fits 24 GB in mixed precision.
- GGUF Q8: near-identical to full precision, the safe choice when you have the room.
- GGUF Q5 or Q4: smaller and faster, with a slight drop in fine detail, good for 12-16 GB cards.
For GGUF builds you also need the ComfyUI-GGUF node, then swap the standard “Load Diffusion Model” node for its GGUF loader. Generation speed tracks your card. Reported times for a 1024-pixel image land around 12 to 30 seconds on a 24 GB card at FP16, 40 to 55 seconds at 16 GB, and 60 to 80 seconds on a 12 GB quantized setup.
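As a rough guide to why quantization matters, weight size scales with parameters times bits per weight. The sketch below uses nominal bit widths, which is an assumption: real GGUF files add per-block overhead, some layers may stay at higher precision, and the text encoder, VAE, and activations need memory on top of this.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10**9 bytes): parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


# Nominal bit widths; actual files will be somewhat larger.
for name, bits in [("FP16", 16), ("FP8 / Q8", 8), ("Q5", 5), ("Q4", 4)]:
    print(f"{name}: ~{weight_size_gb(32, bits):.0f} GB for the 32B dev model")
```

The 8-bit estimate lands in the same ballpark as the roughly 35 GB FP8 file mentioned earlier, and the 4-bit estimate shows why dev becomes plausible on a 24 GB card.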
What It Costs Versus the Cloud
Running locally is cheap per image, but the GPU is not. It helps to see both numbers.
On the cloud side, fal.ai bills FLUX 2 dev at $0.012 per megapixel, so a standard 1024-pixel image costs about 1.3 cents.
Locally, you pay for electricity. A 4090 pulling 450W for a 20-second render uses about 0.0025 kWh. At a US average near $0.16 per kWh, that is roughly 0.04 cents per image, about 30 times cheaper than the cloud price per image.
The catch is the upfront card. At that gap, a $1,600 GPU pays for itself against cloud pricing only after roughly 130,000 images. Local rarely wins on raw price unless you generate at high volume. Its real advantages are privacy, no rate limits, no content filter, and unlimited iteration while you dial in a prompt.
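The arithmetic behind these figures can be reproduced in a few lines. This is a sketch using the numbers quoted above; the function names are hypothetical, and it assumes a megapixel means 1,000,000 pixels and that the GPU draws its full 450 W for the whole render.

```python
def energy_kwh(watts: float, seconds: float) -> float:
    """Energy used by a render in kilowatt-hours."""
    return watts * seconds / 3600 / 1000


def break_even_images(gpu_price: float, cloud_per_image: float, local_per_image: float) -> float:
    """Images needed before per-image savings cover the GPU price."""
    return gpu_price / (cloud_per_image - local_per_image)


kwh = energy_kwh(450, 20)                    # 0.0025 kWh per render
local = kwh * 0.16                           # dollars per image at $0.16 per kWh
cloud = 0.012 * (1024 * 1024) / 1_000_000    # dollars per 1024x1024 image
print(f"local: {local * 100:.2f} cents, cloud: {cloud * 100:.2f} cents")
print(f"break-even: about {break_even_images(1600, cloud, local):,.0f} images")
```

Changing the electricity rate, render time, or GPU price shows how sensitive the break-even point is to your own setup.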
Licensing: Can You Use FLUX 2 Commercially?
Licensing decides whether you can sell what you make, and the variants do not share one license.
| Variant | License | Commercial use |
|---|---|---|
| FLUX 2 dev | FLUX Non-Commercial License | No, without a separate self-hosted commercial license |
| FLUX 2 Klein | Apache 2.0 | Yes |
| FLUX 2 Flex / Pro / Max | API terms of service | Yes, via the API |
In short: the open-weight dev model is free to run and experiment with, but commercial output needs Black Forest Labs’ self-hosted commercial license or the paid API. Klein is the genuinely open option, size-distilled from the base model and shipped under Apache 2.0, so its output is yours to sell. If commercial rights are the goal and 24GB is a stretch, Klein is often the smarter pick.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Out-of-memory error | Build too large for your VRAM | Drop to a smaller GGUF, or enable weight offloading |
| Washed-out or oversaturated image | CFG set too high | Set CFG to 1.0 |
| Black or noisy output | VAE or precision mismatch | Reload the correct VAE, try an FP16 VAE |
| Generation crawls past 60 seconds | Offloading to system RAM | Use a smaller build that fits VRAM, or add RAM |
| Empty model dropdown | ComfyUI scanned before files landed | Restart ComfyUI to rescan the folders |
Frequently Asked Questions
How much VRAM do you need to run FLUX 2 locally?
About 18 to 24 GB for dev with a 4-bit or GGUF build. Klein 9B runs on 16 GB and Klein 4B on 8 GB. Full-precision dev needs 80 GB, so nearly everyone runs a quantized version.
What is the best tool to run FLUX locally?
ComfyUI is the most capable and gets FLUX features first. Forge is friendlier if you want a simple form-style UI. Stability Matrix installs and manages either one for you, which is the fastest path for a first setup.
How do you run FLUX 2 locally step by step?
Install ComfyUI through Stability Matrix, download the three model files (diffusion model, Mistral encoder, VAE) into their folders, load the built-in “Flux 2 dev” workflow, set CFG to 1.0, and queue a prompt.
Can FLUX 2 generate uncensored or NSFW images?
The dev release was trained on filtered data and ships NSFW filters. Running it locally removes the API content filter, but the base model is not built for explicit content. Community LoRAs and fine-tunes on Civitai add that, and they load like any other LoRA.
Which is better for ComfyUI: FLUX 2 dev or Klein?
The dev model gives the best prompt adherence and detail, but it needs 18 to 24 GB of VRAM. Klein 9B is the sweet spot for 16 GB cards, and Klein 4B suits 8 to 12 GB cards. Klein is also the only one you can use commercially without a license.
Can I run FLUX 2 on an Android phone?
No. Even Klein 4B needs about 8 GB of VRAM, which no phone has. Run the model on a desktop GPU and open ComfyUI in your phone browser, or use the Black Forest Labs API.
How do I fine-tune FLUX 2 on my own images?
Use a trainer like SimpleTuner or AI-Toolkit. Both support FLUX 2 LoRA training against the same weights you downloaded for inference. Our guide to fine-tuning the model with LoRA adapters walks through the workflow.
Botmonster Tech