You can run Llama 4 Scout on a 24 GB consumer GPU, but only with aggressive quantization and some patience. Scout is a 109B-parameter Mixture-of-Experts model with 17B parameters active per token, and even its smallest Unsloth dynamic GGUF build is about 32 GB. A 24 GB card can therefore only run it by offloading part of the model to the CPU; expect roughly 20 tokens per second. This guide covers which Llama 4 model fits your hardware, the real VRAM math, and the fastest way to get it running.
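The core VRAM arithmetic fits in a few lines. The Python sketch below is a back-of-the-envelope helper whose function names are hypothetical, not from any library. It counts weights only, assumes 1 GB = 10^9 bytes, and ignores the KV cache, activations, and runtime overhead, so real requirements run higher. A Mixture-of-Experts model still has to store all 109B parameters even though only a fraction is active per token.

```python
# Back-of-the-envelope VRAM math for a quantized model.
# Assumption: weights only (no KV cache, activations, or runtime overhead); 1 GB = 1e9 bytes.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB."""
    return params_billion * bits_per_weight / 8

def effective_bits(file_gb: float, params_billion: float) -> float:
    """Average bits per weight implied by a quantized file size."""
    return file_gb * 8 / params_billion

def offload_gb(model_gb: float, vram_gb: float, reserve_gb: float = 0.0) -> float:
    """GB of weights that must live in system RAM when the GPU cannot hold them all.

    reserve_gb is VRAM you set aside for the KV cache and runtime (a hypothetical knob).
    """
    return max(0.0, model_gb - (vram_gb - reserve_gb))

SCOUT_PARAMS_B = 109  # total parameters across all experts, in billions

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weights_gb(SCOUT_PARAMS_B, bits):6.1f} GB")

print(f"32 GB GGUF ≈ {effective_bits(32, SCOUT_PARAMS_B):.2f} bits/weight")
print(f"Offload on a 24 GB card: at least {offload_gb(32, 24):.0f} GB")
```

Running it prints 218.0, 109.0, and 54.5 GB for 16-, 8-, and 4-bit weights. It also shows that a 32 GB file works out to about 2.35 bits per weight, and that at least 8 GB of it must sit in system RAM on a 24 GB card. Passing a nonzero `reserve_gb` makes room for the KV cache; the right value depends on your context length and inference runtime.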
Botmonster Tech