Run FLUX 2 Locally in 2026: VRAM by GPU + ComfyUI Setup

You can run FLUX 2 locally on a single consumer GPU in 2026. The open-weight FLUX 2 dev is a 32B model from Black Forest Labs that fits a 24GB card when quantized, while the smaller Klein builds run on 8GB. This guide picks the right variant for your card, installs it in ComfyUI, and covers what it costs to run.

Key Takeaways

  • FLUX 2 dev needs a 24GB card; Klein runs on 8GB.
  • ComfyUI plus Stability Matrix is the fastest way to start.
  • Quantized GGUF builds cut VRAM in half with little quality loss.
  • Running locally costs a fraction of a cent per image in power.
  • Only dev and Klein have downloadable weights; Pro and Max are API only.

[Figure: FLUX 2 dev sample output, a retro-futuristic cityscape with Japanese-inspired typography and a cosmic sky. Caption: FLUX 2 produces photorealistic and stylized images with strong detail and coherence.]

Which FLUX 2 Variant Fits Your GPU

FLUX 2 ships in several sizes, and only some have downloadable weights. Start by matching the model to your card, because that one choice drives everything else.

| Variant | Parameters | Min VRAM (quantized) | Open weights | Best for |
| --- | --- | --- | --- | --- |
| FLUX 2 Klein 4B | 4B | ~8 GB | Yes | 8-12 GB GPUs, fast drafts |
| FLUX 2 Klein 9B | 9B | ~16 GB | Yes | 12-16 GB GPUs, balanced quality |
| FLUX 2 dev | 32B | ~18-24 GB (4-bit / GGUF) | Yes | 24 GB GPUs, top open-weight quality |
| FLUX 2 Flex | Not disclosed | API only | No | Typography and fine detail |
| FLUX 2 Pro | Not disclosed | API only | No | Production quality |
| FLUX 2 Max | Not disclosed | API only | No | Maximum quality, web-grounded output |

Here is the quick pick by card:

  • RTX 4090 or 5090 (24-32 GB): run dev with a 4-bit or GGUF build.
  • RTX 3090 (24 GB): dev works at 4-bit, but Klein 9B leaves more headroom.
  • 16 GB cards (RTX 4080, 4070 Ti Super): Klein 9B, or dev at a small GGUF.
  • 12 GB cards (RTX 3060, 4070): Klein 4B or a low GGUF build.
  • 8 GB cards: Klein 4B only.
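
The quick pick can also be written as a small lookup. This is a minimal sketch that only restates the thresholds in the list above; the function name `pick_flux2_variant` is hypothetical, and the cutoffs are the guide's rough guidance rather than hard limits.

```python
def pick_flux2_variant(vram_gb: float) -> str:
    """Suggest a FLUX 2 variant for a card with `vram_gb` of VRAM.

    Thresholds mirror the quick-pick list in this guide.
    """
    if vram_gb >= 24:
        return "FLUX 2 dev (4-bit or GGUF build)"
    if vram_gb >= 16:
        return "FLUX 2 Klein 9B (or dev at a small GGUF)"
    if vram_gb >= 12:
        return "FLUX 2 Klein 4B (or a low GGUF build)"
    if vram_gb >= 8:
        return "FLUX 2 Klein 4B"
    return "No local variant listed in this guide"


for card, vram in [("RTX 5090", 32), ("RTX 3090", 24), ("RTX 4080", 16), ("RTX 4070", 12)]:
    print(f"{card} ({vram} GB): {pick_flux2_variant(vram)}")
```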

If you are coming from FLUX.1, dev is the natural upgrade: better prompt accuracy and text rendering at the same rough VRAM footprint. Klein replaces the old Schnell as the fast, low-VRAM option. Forge loads the same weight files as ComfyUI, so your card decides the variant, not your interface.

Hardware, Power, and Heat

VRAM gets all the attention, but the rest of the box still counts. You need 32GB of system RAM (64GB if you plan to offload), about 60GB of free storage for the model, VAE, and text encoder, and a power supply with headroom for sustained load.

Image generation pins your GPU near its full power draw for the whole render, so size your PSU and cooling around the card’s rated TDP:

| GPU | VRAM | Rated TDP |
| --- | --- | --- |
| RTX 4070 | 12 GB | 200 W |
| RTX 4080 | 16 GB | 320 W |
| RTX 3090 | 24 GB | 350 W |
| RTX 4090 | 24 GB | 450 W |
| RTX 5090 | 32 GB | 575 W |

A 4090 or 5090 will dump 450 to 575 watts of heat into the room during a batch run, spin its fans up, and pull hard on the PSU. Give a 5090 a 1000W supply and good case airflow. The same card also powers other local AI work, from private RAG setups to local LLM inference, so it earns its keep as a general AI workstation, not just an image generator.

Install FLUX 2 With Stability Matrix

In 2026, the easiest way to manage local AI is Stability Matrix. It acts as a one-click installer that handles dependencies and keeps your models in a shared folder across interfaces. I run it on every machine because it saves me from repairing broken Python environments by hand.

You still want a few basics in place first:

  • Python 3.10 or 3.11: stable releases beat the newest builds here.
  • Git: for cloning repositories.
  • CUDA 12.4+: update your NVIDIA drivers to match.
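
A quick way to see whether these basics are in place is a short standard-library check. The sketch below is an assumption-laden convenience, not part of Stability Matrix: the function name `check_prerequisites` is hypothetical, it inspects only the Python interpreter that runs the script (which may differ from the one your installer uses), and it treats the presence of `nvidia-smi` as a hint that an NVIDIA driver is installed. It does not verify the CUDA version.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the basics listed above look present on this machine."""
    major, minor = sys.version_info[:2]
    return {
        # The guide recommends Python 3.10 or 3.11.
        "python_3_10_or_3_11": (major, minor) in {(3, 10), (3, 11)},
        # shutil.which returns None when the executable is not on PATH.
        "git_on_path": shutil.which("git") is not None,
        # nvidia-smi ships with the NVIDIA driver; finding it only suggests
        # a driver is installed, it says nothing about the CUDA version.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing or different'}")
```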

Then set up the interface:

  1. Download it: grab the installer for your system from the Stability Matrix releases page.
  2. Install ComfyUI: open the app, click “Add Package,” and choose ComfyUI. It is the best interface for FLUX 2’s node graph. Prefer a classic form-style UI? Pick Forge instead.
  3. Find the model folder: Stability Matrix creates one shared Models directory. Open Settings and note the “Shared model folder” path, because the next step copies three files into it.

Downloading the Weights

FLUX 2 dev needs three files: the diffusion model, the text encoder, and the VAE. The ComfyUI team repackages them in the Comfy-Org/flux2-dev repository on Hugging Face, already split into the layout ComfyUI expects.

1. The diffusion model (FP8 build, about 35 GB):

This FP8 build fits a 24 GB card. If you have 80 GB of VRAM, grab the full-precision flux2-dev.safetensors instead.

2. The text encoder (Mistral 3 Small):

FLUX 2 swapped the old CLIP encoder for a Mistral language model, which is why prompt accuracy jumped. If VRAM is tight, grab mistral_3_small_flux2_fp8.safetensors from the same folder.

3. The VAE:

Once all three sit in place, you can also fine-tune the model with LoRA adapters, which follow the same weight and directory structure.
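
Before launching ComfyUI, it can help to confirm that all three files landed where you expect. The sketch below is illustrative only: the helper `find_missing` is hypothetical, and every folder and file name in `EXPECTED` except the Mistral encoder file named above is a placeholder. Replace them with the names your install and the Hugging Face repository actually use.

```python
from pathlib import Path


def find_missing(model_root: Path, expected: dict[str, str]) -> list[str]:
    """Return the roles whose file is absent under `model_root`.

    `expected` maps a role name to a path relative to `model_root`.
    """
    return [role for role, rel in expected.items() if not (model_root / rel).is_file()]


# Hypothetical layout: the subfolder names and the two "your-..." file names
# are placeholders, not the real names from the repository.
EXPECTED = {
    "diffusion model": "diffusion_models/your-flux2-dev-build.safetensors",
    "text encoder": "text_encoders/mistral_3_small_flux2_fp8.safetensors",
    "vae": "vae/your-flux2-vae.safetensors",
}

# "Models" stands in for the shared model folder path noted in Settings.
missing = find_missing(Path("Models"), EXPECTED)
print("Missing:", ", ".join(missing) if missing else "nothing")
```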

Running the Model: Settings That Count

FLUX 2 uses its own sampling, so you need a FLUX-specific workflow. ComfyUI ships one built in.

  1. Open ComfyUI and wait for the canvas to load.
  2. Load the template: open Workflow, then Browse Templates, and pick “Flux 2 dev” under the Flux category. You can also drag any image from the ComfyUI Flux 2 examples page onto the canvas to load the same graph.
  3. Point the loaders at your files: set “Load Diffusion Model” to the FP8 build, “Load CLIP” to the Mistral encoder, and “Load VAE” to the VAE. If a dropdown looks empty, restart ComfyUI so it rescans the folders.
  4. Generate: type your prompt and click “Queue.” The first render is slow because ComfyUI loads the model into VRAM first.

The settings trip people up more than the install does. FLUX is not Stable Diffusion, so the old defaults are wrong:

  • CFG (guidance): set it to 1.0. FLUX uses embedded guidance, so high CFG just oversaturates the image.
  • Negative prompts: leave them empty. FLUX ignores them.
  • Sampler and scheduler: euler with simple is the safe baseline.
  • Steps: about 20 to 30 for dev. Klein 4B runs in 4 steps.
  • Prompts: the Mistral encoder rewards plain, descriptive sentences over keyword soup. Write what you want as a short paragraph.
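
To make the difference from Stable Diffusion habits concrete, here is a small sketch that compares a settings dictionary against the baseline above. The dictionary keys and the function `review_settings` are illustrative names chosen for this example, not ComfyUI node fields.

```python
# Baseline values taken from the list above; the keys are illustrative.
FLUX2_DEV_BASELINE = {
    "cfg": 1.0,
    "negative_prompt": "",
    "sampler": "euler",
    "scheduler": "simple",
    "steps": 20,
}


def review_settings(settings: dict) -> list[str]:
    """Return warnings for values that drift from the FLUX 2 dev baseline."""
    warnings = []
    if settings.get("cfg", 1.0) != 1.0:
        warnings.append("CFG should be 1.0; higher values oversaturate the image.")
    if settings.get("negative_prompt", ""):
        warnings.append("Negative prompts are ignored; leave the field empty.")
    if not 20 <= settings.get("steps", 20) <= 30:
        warnings.append("dev is expected to run at about 20 to 30 steps.")
    return warnings


# Typical Stable Diffusion defaults carried over by mistake.
sd_style = {**FLUX2_DEV_BASELINE, "cfg": 7.5, "negative_prompt": "blurry", "steps": 50}
for warning in review_settings(sd_style):
    print(warning)
```

The step check applies to dev only; Klein 4B runs in 4 steps, so it would need its own range.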

GGUF Quantization: Which Build for Your Card

Quantization is how a 32B model fits a gaming GPU. It shrinks the weights so they use less VRAM, with a small, usually invisible quality cost.

  • FP8 (about 35 GB on disk): the default here, fits 24 GB in mixed precision.
  • GGUF Q8: near-identical to full precision, the safe choice when you have the room.
  • GGUF Q5 or Q4: smaller and faster, with a slight drop in fine detail, good for 12-16 GB cards.

For GGUF builds you also need the ComfyUI-GGUF node, then swap the standard “Load Diffusion Model” node for its GGUF loader. Generation speed tracks your card. Reported times for a 1024-pixel image land around 12 to 30 seconds on a 24 GB card at FP16, 40 to 55 seconds at 16 GB, and 60 to 80 seconds on a 12 GB quantized setup.
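
A rough way to see why quantization matters is to multiply the parameter count by the bits stored per weight. The sketch below is an estimate under stated assumptions: it counts weights only (no activations, text encoder, or VAE), and the GGUF figures are for the simple Q8_0, Q5_0, and Q4_0 block formats, which store a scale per block of 32 weights. Real files run somewhat larger, and K-quant variants differ slightly.

```python
# Approximate bits per weight for each format (weights only).
BITS_PER_WEIGHT = {"FP16": 16.0, "FP8": 8.0, "Q8_0": 8.5, "Q5_0": 5.5, "Q4_0": 4.5}


def weight_size_gb(params_billion: float, fmt: str) -> float:
    """Estimate weights-only size in GB (10^9 bytes) for a given format."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


for fmt in BITS_PER_WEIGHT:
    print(f"FLUX 2 dev (32B) at {fmt}: ~{weight_size_gb(32, fmt):.0f} GB")
```

By this estimate a 4-bit build of the 32B model lands near 18 GB, which lines up with the low end of the 18 to 24 GB range quoted for dev.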

What It Costs Versus the Cloud

Running locally is cheap per image, but the GPU is not. It helps to see both numbers.

On the cloud side, fal.ai bills FLUX 2 dev at $0.012 per megapixel, so a standard 1024-pixel image costs about 1.3 cents.

Locally, you pay for electricity. A 4090 pulling 450W for a 20-second render uses about 0.0025 kWh. At a US average near $0.16 per kWh, that is roughly 0.04 cents per image, about 30 times cheaper than the cloud price per image.

The catch is the upfront card. At that gap, a $1,600 GPU pays for itself against cloud pricing only after roughly 130,000 images. Local rarely wins on raw price unless you generate at high volume. Its real advantages are privacy, no rate limits, no content filter, and unlimited iteration while you dial in a prompt.
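
The arithmetic behind these figures is easy to rerun with your own power price and render time. This sketch uses the numbers quoted above and assumes a megapixel means one million pixels and that the card draws its full rated power for the whole render; the function names are hypothetical.

```python
def local_cost_per_image(watts: float, seconds: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD for one render."""
    kwh = watts * seconds / 3_600_000  # watt-seconds (joules) to kWh
    return kwh * usd_per_kwh


def cloud_cost_per_image(usd_per_megapixel: float, width: int, height: int) -> float:
    """Per-image cloud cost in USD, assuming 1 megapixel = 1,000,000 pixels."""
    return usd_per_megapixel * (width * height) / 1_000_000


local = local_cost_per_image(watts=450, seconds=20, usd_per_kwh=0.16)
cloud = cloud_cost_per_image(usd_per_megapixel=0.012, width=1024, height=1024)
break_even = 1600 / (cloud - local)  # images needed to recover a $1,600 GPU

print(f"Local:  {local * 100:.2f} cents per image")
print(f"Cloud:  {cloud * 100:.2f} cents per image")
print(f"Ratio:  about {cloud / local:.0f}x")
print(f"Break-even: about {break_even:,.0f} images")
```

The break-even count ignores the rest of the PC and any resale value of the card, so treat it as an order-of-magnitude figure.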

Licensing: Can You Use FLUX 2 Commercially?

Licensing decides whether you can sell what you make, and the variants do not share one license.

| Variant | License | Commercial use |
| --- | --- | --- |
| FLUX 2 dev | FLUX Non-Commercial License | No, without a separate self-hosted commercial license |
| FLUX 2 Klein | Apache 2.0 | Yes |
| FLUX 2 Flex / Pro / Max | API terms of service | Yes, via the API |

In short: the open-weight dev model is free to run and experiment with, but commercial output needs Black Forest Labs’ self-hosted commercial license or the paid API. Klein is the genuinely open option, size-distilled from the base model and shipped under Apache 2.0, so its output is yours to sell. If commercial rights are the goal and 24GB is a stretch, Klein is often the smarter pick.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Out-of-memory error | Build too large for your VRAM | Drop to a smaller GGUF, or enable weight offloading |
| Washed-out or oversaturated image | CFG set too high | Set CFG to 1.0 |
| Black or noisy output | VAE or precision mismatch | Reload the correct VAE, try an FP16 VAE |
| Generation crawls past 60 seconds | Offloading to system RAM | Use a smaller build that fits VRAM, or add RAM |
| Empty model dropdown | ComfyUI scanned before files landed | Restart ComfyUI to rescan the folders |

Frequently Asked Questions

How much VRAM do you need to run FLUX 2 locally?

About 18 to 24 GB for dev with a 4-bit or GGUF build. Klein 9B runs on 16 GB and Klein 4B on 8 GB. Full-precision dev needs 80 GB, so nearly everyone runs a quantized version.

What is the best tool to run FLUX locally?

ComfyUI is the most capable and gets FLUX features first. Forge is friendlier if you want a simple form-style UI. Stability Matrix installs and manages either one for you, which is the fastest path for a first setup.

How do you run FLUX 2 locally step by step?

Install ComfyUI through Stability Matrix, download the three model files (diffusion model, Mistral encoder, VAE) into their folders, load the built-in “Flux 2 dev” workflow, set CFG to 1.0, and queue a prompt.

Can FLUX 2 generate uncensored or NSFW images?

The dev release was trained on filtered data and ships NSFW filters. Running it locally removes the API content filter, but the base model is not built for explicit content. Community LoRAs and fine-tunes on Civitai add that, and they load like any other LoRA.

Which is better for ComfyUI: FLUX 2 dev or Klein?

The dev model gives the best prompt adherence and detail, but it needs 18 to 24 GB of VRAM. Klein 9B is the sweet spot for 16 GB cards, and Klein 4B suits 8 to 12 GB cards. Klein is also the only open-weight variant you can use commercially without buying a separate commercial license, because it ships under Apache 2.0.

Can I run FLUX 2 on an Android phone?

No. Even Klein 4B needs about 8 GB of VRAM, which no phone has. Run the model on a desktop GPU and open ComfyUI in your phone browser, or use the Black Forest Labs API.

How do I fine-tune FLUX 2 on my own images?

Use a trainer like SimpleTuner or AI-Toolkit. Both support FLUX 2 LoRA training against the same weights you downloaded for inference. Our guide to fine-tuning the model with LoRA adapters walks through the workflow.