Qwen3.6-35B-A3B: Alibaba's Open-Weight Coding MoE

Botmonster — Fri, 17 Apr 2026 00:00:00 +0000

Qwen3.6-35B-A3B is Alibaba Cloud’s Apache 2.0 sparse Mixture-of-Experts model, released on April 14, 2026. It carries 35 billion total parameters but activates only about 3 billion per token. On agentic coding suites it beats Gemma 4-31B, and it matches Claude Sonnet 4.5 on most vision tasks. A 20.9 GB Q4 quantization runs on a MacBook Pro M5, which is why this release has taken over half the AI timeline since launch.

Running Gemma 4 26B MoE on 8GB VRAM: Three Strategies That Work

Botmonster — Wed, 08 Apr 2026 00:00:00 +0000

The short answer is no: the Gemma 4 26B MoE model will not fit entirely in 8 GB of VRAM at standard Q4_K_M quantization, because the weights alone require roughly 16-18 GB. But with the right approach, you can run it on budget hardware and get usable interactive performance. The three practical strategies are aggressive quantization (IQ3_XS brings the weights under 10 GB), GPU-CPU layer offloading (place 15-20 of the 30 layers on the GPU and keep the rest in system RAM), and multi-GPU setups (two cheap 8 GB cards combined via tensor parallelism). Each involves different trade-offs between quality, speed, and hardware requirements.
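
To make the offloading arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the article: the function name `layers_on_gpu`, the assumption that all 30 layers are the same size, and the 1.5 GB reserve for KV cache and runtime overhead are illustrative guesses, so treat the output as an estimate rather than a measurement.

```python
import math


def layers_on_gpu(weights_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Estimate how many layers fit on the GPU.

    Assumptions (hypothetical, for illustration only):
    - every layer takes an equal share of the total weight size
    - reserve_gb of VRAM is held back for KV cache and runtime overhead
    """
    per_layer_gb = weights_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model has.
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


# IQ3_XS, taking 10 GB as an upper bound on the weights, on an 8 GB card
print(layers_on_gpu(10.0, 30, 8.0))  # 19

# Q4_K_M, taking 17 GB as the midpoint of the 16-18 GB range
print(layers_on_gpu(17.0, 30, 8.0))  # 11
```

Under these assumptions, the 15-20 layer split is only reachable on a single 8 GB card once the weights are quantized down to around 10 GB. At Q4_K_M, the same card holds closer to a third of the model. Real layers are not uniformly sized, so it is worth adjusting the number up or down after checking actual VRAM usage.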
