<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Moe - Tag - Botmonster Tech</title><link>https://botmonster.com/tags/moe/</link><description>Moe - Tag - Botmonster Tech</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 17 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://botmonster.com/tags/moe/" rel="self" type="application/rss+xml"/><item><title>Qwen3.6-35B-A3B: Alibaba's Open-Weight Coding MoE</title><link>https://botmonster.com/ai/qwen-3-6-35b-a3b-open-weight-coding-moe/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/ai/qwen-3-6-35b-a3b-open-weight-coding-moe/</guid><description><![CDATA[<div class="featured-image">
                <img src="/qwen-3-6-35b-a3b-open-weight-coding-moe.png" referrerpolicy="no-referrer">
            </div><p>Qwen3.6-35B-A3B is Alibaba Cloud&rsquo;s Apache 2.0 sparse Mixture-of-Experts model, released April 14, 2026. It carries 35 billion total parameters but activates only about 3 billion per token. On agentic coding suites it beats Gemma 4-31B, and it matches Claude Sonnet 4.5 on most vision tasks. A 20.9 GB Q4 quantization runs on a MacBook Pro M5, which is the reason this release has taken over half the AI timeline for the past week.</p>]]></description></item><item><title>Running Gemma 4 26B MoE on 8GB VRAM: Three Strategies That Work</title><link>https://botmonster.com/ai/gemma-4-26b-moe-8gb-vram-budget-hardware-setup-guide/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/ai/gemma-4-26b-moe-8gb-vram-budget-hardware-setup-guide/</guid><description><![CDATA[<div class="featured-image">
                <img src="/gemma-4-26b-moe-8gb-vram-budget-hardware-setup-guide.png" referrerpolicy="no-referrer">
            </div><p>The short answer is no: the Gemma 4 26B MoE model will not fit entirely in 8 GB of VRAM at standard Q4_K_M quantization, because the weights alone require roughly 16-18 GB. But with the right approach, you can run it on budget hardware and get usable interactive performance. The three practical strategies are aggressive quantization (IQ3_XS brings the weights under 10 GB), GPU-CPU layer offloading (keep 15-20 of 30 layers on the GPU and the rest in system RAM), and multi-GPU setups (two cheap 8 GB cards via tensor parallelism). Each involves different trade-offs between quality, speed, and hardware requirements.</p>]]></description></item></channel></rss>