
FLUX 2 Dev Local Inference: ComfyUI, 32B Parameters, 24GB VRAM

FLUX 2 dev is the largest open-weight image model you can run on your own hardware in 2026, a 32B-parameter release from Black Forest Labs. Its higher tiers (Max, Pro, and Flex) run only through the Black Forest Labs API and have no downloadable weights, so dev is the model this guide installs. Setup is more streamlined than in previous years, but at 32B parameters your hardware is still the biggest hurdle.

Here is the step-by-step guide to getting it running.

Figure: FLUX 2 dev sample output showing a retro-futuristic cityscape with Japanese-inspired typography and a cosmic sky. FLUX 2 dev produces photorealistic and stylized images with remarkable detail and coherence.

Hardware Check (Running the 32B Model)

FLUX 2 dev is designed for high-fidelity output. To run it at a reasonable speed, you need:

GPU: NVIDIA RTX 3090, 4090, or a 50-series card with at least 24 GB of VRAM (24 GB is the minimum).

Note: If you have 12 to 16 GB of VRAM, use a smaller “Klein” variant or a more heavily quantized build instead. The same quantization tricks that fit large models onto consumer GPUs apply here too.

System RAM: 32GB minimum (64GB recommended for offloading).

Storage: ~60 GB of free space for the model weights, the VAE, and the text encoder.
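To see where these VRAM figures come from, the short sketch below multiplies the parameter count by the bits stored per weight. It is back-of-the-envelope arithmetic only: the function name is made up for this example, real files run larger than the raw estimate, and the text encoder, VAE, and activations need memory on top of the weights.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # FLUX 2 dev has 32B parameters; compare three common precisions.
    for label, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_size_gb(32, bits):.0f} GB of weights")
```

At 16 bits the weights alone come to about 64 GB, which is why full precision is out of reach for consumer cards, while a 4-bit build lands near 16 GB and leaves room on a 24 GB GPU.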

The GPU you buy here also powers other local AI workloads, from private RAG knowledge bases to local LLM inference. That makes it a versatile AI workstation, not just an image generator.

Which FLUX 2 Variant Should You Run?

FLUX 2 ships in several sizes, and only some of them are downloadable. The table below shows what fits your card:

| Variant | Parameters | Min VRAM (quantized) | Open weights | Best for |
| --- | --- | --- | --- | --- |
| FLUX 2 Klein 4B | 4B | ~8 GB | Yes | 8-12 GB GPUs, fast drafts |
| FLUX 2 Klein 9B | 9B | ~16 GB | Yes | 12-16 GB GPUs, balanced quality |
| FLUX 2 dev | 32B | ~18-24 GB (4-bit / GGUF) | Yes | 24 GB GPUs, top open-weight quality |
| FLUX 2 Flex | Not disclosed | API only | No | Typography and fine detail |
| FLUX 2 Pro | Not disclosed | API only | No | Production quality |
| FLUX 2 Max | Not disclosed | API only | No | Maximum quality, web-grounded output |

Only Klein and dev have weights you can download and run locally. On an RTX 4090 (24 GB) or RTX 5090 (32 GB), run dev with a 4-bit or GGUF quantization. On an RTX 3090, dev still works at 4-bit, but Klein 9B leaves more headroom. If your card has 8-12 GB, pick Klein 4B. Forge loads the same weight files, so the variant choice does not change with your interface.
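The selection logic in this section can be written down as a small helper. This is a simplified sketch: the function name is invented for illustration, and the thresholds follow the “Min VRAM (quantized)” figures in this guide, not any official sizing tool.

```python
def pick_variant(vram_gb: float) -> str:
    """Suggest a FLUX 2 variant for a given amount of VRAM.

    Thresholds are this guide's approximate minimums, not official limits.
    """
    if vram_gb >= 24:
        return "FLUX 2 dev (4-bit or GGUF quantization)"
    if vram_gb >= 16:
        return "FLUX 2 Klein 9B"
    if vram_gb >= 8:
        return "FLUX 2 Klein 4B"
    # Below roughly 8 GB, none of the downloadable variants fit.
    return "Black Forest Labs API (no local variant fits)"


if __name__ == "__main__":
    for vram in (8, 12, 16, 24, 32):
        print(f"{vram:>2} GB -> {pick_variant(vram)}")
```

A 24 GB card can of course still run Klein 9B when speed matters more than peak quality; the helper only reports the largest variant the guide considers practical.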

Software Prerequisites

  • Python 3.10 or 3.11: Stable versions are still preferred over the very newest builds.
  • Git: To clone repositories.
  • CUDA 12.4+: Ensure your NVIDIA drivers are updated to support the latest kernels.
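Before installing anything, a quick script can confirm the first two prerequisites. The sketch below uses only the Python standard library, and the helper names are made up for this example. It treats `nvidia-smi` on the PATH as a rough sign that an NVIDIA driver is installed, and it does not verify the CUDA version.

```python
import shutil
import sys


def python_ok(version=None) -> bool:
    """True if the interpreter is Python 3.10 or 3.11, as this guide recommends."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in {(3, 10), (3, 11)}


def tool_on_path(name: str) -> bool:
    """True if an executable with this name can be found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("Python 3.10/3.11:", python_ok())
    print("git found:", tool_on_path("git"))
    # Rough proxy only: this says a driver tool exists, not which CUDA version it supports.
    print("nvidia-smi found:", tool_on_path("nvidia-smi"))
```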

In 2026, the easiest way to manage local AI is Stability Matrix. It acts as a “one-click” installer that handles dependencies and keeps your models in a shared folder across different interfaces. I run it on every machine because it saves me from repairing broken Python environments by hand.

  1. Download it: Grab the installer for your system from the Stability Matrix releases page. The latest Windows, Linux, and macOS builds sit at the top of that list.
  2. Install ComfyUI: Open the app, click “Add Package,” and choose ComfyUI. It is the best interface for FLUX 2 dev’s node-based workflow. Pick Forge instead if you prefer a classic, form-style UI.
  3. Find the model folder: Stability Matrix creates one shared Models directory for every interface. Open Settings and note the path in the “Shared model folder” field, because the next step copies three files into it.

Downloading FLUX 2 Dev Weights

FLUX 2 dev needs three files: the diffusion model, the text encoder, and the VAE. The ComfyUI team repackages them in the Comfy-Org/flux2-dev repository on Hugging Face. That is the version I recommend, because the files are already split into the layout ComfyUI expects. Download each file below, then copy it into the matching subfolder of the shared Models directory from the previous step.

1. The diffusion model (FP8 build, about 35 GB): flux2_dev_fp8mixed.safetensors

At about 35 GB, this FP8 build is larger than the VRAM on a 24 GB card, so it runs there with part of the weights offloaded to system RAM. If you have 80 GB of VRAM or more, download the full-precision flux2-dev.safetensors from the official Black Forest Labs repository instead.

2. The text encoder (Mistral 3 Small): mistral_3_small_flux2_bf16.safetensors

FLUX 2 replaced the CLIP and T5 encoder pair used by FLUX 1 with a single Mistral language model, which is why prompt accuracy improved so much. If VRAM is tight, grab the smaller mistral_3_small_flux2_fp8.safetensors from the same folder in that repo.

3. The VAE: flux2-vae.safetensors

Once all three files sit in those folders, you can further customize your results by fine-tuning the model with LoRA adapters, which follow a very similar weight and directory structure.
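A short script can confirm that the three files landed where the loaders will look. The sketch below makes some assumptions: the filenames are the ones this guide uses in the workflow section, `text_encoders` is the folder named in the FAQ, and `diffusion_models` and `vae` are the subfolder names ComfyUI conventionally uses. Your Stability Matrix shared folder may name them differently, so adjust the mapping to match what you see on disk.

```python
from pathlib import Path

# Subfolder -> filename. Folder names are assumptions; edit them to match your setup.
EXPECTED_FILES = {
    "diffusion_models": "flux2_dev_fp8mixed.safetensors",
    "text_encoders": "mistral_3_small_flux2_bf16.safetensors",
    "vae": "flux2-vae.safetensors",
}


def missing_files(models_dir) -> list:
    """Return the expected files (as 'subfolder/name') not found under models_dir."""
    root = Path(models_dir)
    return [
        f"{subfolder}/{name}"
        for subfolder, name in EXPECTED_FILES.items()
        if not (root / subfolder / name).is_file()
    ]


if __name__ == "__main__":
    import sys

    # Pass the shared Models path as the first argument.
    gaps = missing_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("All files in place." if not gaps else f"Missing: {gaps}")
```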

Running the Model (The Workflow)

FLUX 2 dev does not use the old “Standard” sampling, so you need a FLUX-specific workflow. ComfyUI ships one built in, so there is no JSON file to track down.

  1. Open ComfyUI from Stability Matrix and wait for the canvas to finish loading.
  2. Load the template: In the top menu, open Workflow, then Browse Templates, and pick “Flux 2 dev” under the Flux category. You can also drag any example image from the ComfyUI Flux 2 examples page onto the canvas to load the same graph.
  3. Point the loaders at your files: The template has three loader nodes. Set “Load Diffusion Model” to flux2_dev_fp8mixed.safetensors, “Load CLIP” to mistral_3_small_flux2_bf16.safetensors, and “Load VAE” to flux2-vae.safetensors. FLUX 2 uses a single text encoder, not the dual-CLIP setup older models needed. If a dropdown looks empty, I restart ComfyUI so it rescans the model folders.
  4. Generate: Type your prompt into the positive text box and click “Queue.” The first render is the slow one, because ComfyUI loads the full model into VRAM before it starts.
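The same steps can be scripted against ComfyUI's local HTTP server once you export the workflow in API format. The sketch below is an illustration under stated assumptions: that the server listens on the default 127.0.0.1:8188, that it accepts queued jobs as a POST to /prompt, and that the template uses ComfyUI's built-in `UNETLoader`, `CLIPLoader`, and `VAELoader` nodes. Check your exported JSON, because the FLUX 2 template may use different node classes. The helper names are invented for this example.

```python
import json
import urllib.request

# Node class -> (input field, filename). Class names are assumed; verify them in your export.
LOADER_FILES = {
    "UNETLoader": ("unet_name", "flux2_dev_fp8mixed.safetensors"),
    "CLIPLoader": ("clip_name", "mistral_3_small_flux2_bf16.safetensors"),
    "VAELoader": ("vae_name", "flux2-vae.safetensors"),
}


def point_loaders(workflow: dict) -> dict:
    """Return a copy of an API-format workflow with the loader nodes set to our files."""
    patched = json.loads(json.dumps(workflow))  # deep copy via a JSON round trip
    for node in patched.values():
        target = LOADER_FILES.get(node.get("class_type"))
        if target:
            field, filename = target
            node.setdefault("inputs", {})[field] = filename
    return patched


def build_queue_request(workflow: dict, server: str = "http://127.0.0.1:8188"):
    """Build (but do not send) the POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the job, load your exported JSON with `json.load`, pass it through `point_loaders`, and hand the result of `build_queue_request` to `urllib.request.urlopen` while ComfyUI is running.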

Pro Tip for 2026: If your generations are taking longer than 60 seconds, check if “Weight Offloading” is enabled in your settings. This moves parts of the model to your System RAM to prevent crashes, though it slows down the process. For contrast, dedicated TPU hardware can handle inference offloading without touching the main GPU at all. This is a useful reference point if you plan to run multiple AI workloads on the same machine.
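To get a feel for how much of the model ends up in system RAM, here is a rough arithmetic sketch. It is not how ComfyUI actually schedules memory: the function name and the `reserve_gb` headroom for activations and other components are hypothetical values chosen for illustration.

```python
def offload_split(model_gb: float, vram_gb: float, reserve_gb: float = 4.0):
    """Estimate (GB kept on the GPU, GB spilled to system RAM) for a set of weights.

    reserve_gb is an assumed amount of VRAM held back for activations and
    other components; tune it to what you observe on your machine.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(model_gb, usable)
    return on_gpu, model_gb - on_gpu


if __name__ == "__main__":
    on_gpu, in_ram = offload_split(model_gb=35, vram_gb=24)
    print(f"On GPU: {on_gpu:.0f} GB, offloaded to RAM: {in_ram:.0f} GB")
```

With the roughly 35 GB FP8 build on a 24 GB card, a sizable share of the weights sits in system RAM under these assumptions, which is why the 64 GB RAM recommendation matters and why offloaded runs are slower.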

Frequently Asked Questions

What is the minimum VRAM to run FLUX 2 dev?

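About 18 to 20 GB if you use a 4-bit quantized model with a remote text encoder. A GGUF Q8 build stays close to full quality, but at roughly one byte per parameter its weights come to about 32 GB, so on 12 to 14 GB cards it runs only with most of the model offloaded to system RAM. Full precision needs 80 GB or more, so nearly everyone runs a quantized version.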

Where do I download the FLUX 2 FP8 text encoder?

Download mistral_3_small_flux2_fp8.safetensors from the Comfy-Org/flux2-dev repository on Hugging Face and copy it into your models/text_encoders/ folder. The larger bf16 build sits in the same folder if you have the VRAM to spare.

Can FLUX 2 generate uncensored or NSFW images?

The official dev release was trained on filtered data and ships input and output NSFW filters. Running it locally removes the API content filter, but the base model is not built for explicit content. Community LoRAs and fine-tunes on Civitai add that capability, and they load like any other LoRA.

Which is better for ComfyUI: FLUX 2 dev or Klein?

The dev model gives the best prompt adherence and detail, but it needs 18 to 24 GB of VRAM. Klein 9B is the sweet spot for 16 GB cards, and Klein 4B suits 8 to 12 GB cards. On an RTX 4090, run dev with a quantized build.

Can I run FLUX 2 locally on an Android phone?

No. Even the smallest variant, Klein 4B, needs about 8 GB of VRAM, and the setup in this guide depends on a desktop NVIDIA GPU that phones do not have. Run the model on a desktop GPU and open ComfyUI in your phone browser instead, or use the Black Forest Labs API.

How do I fine-tune FLUX 2 on my own images?

Use a trainer like SimpleTuner or AI-Toolkit. Both support FLUX 2 LoRA training, and you point them at the same weights you downloaded for inference. Our guide to fine-tuning the model with LoRA adapters walks through the workflow.