<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Qwen - Tag - Botmonster Tech</title><link>https://botmonster.com/tags/qwen/</link><description>Qwen - Tag - Botmonster Tech</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 20 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://botmonster.com/tags/qwen/" rel="self" type="application/rss+xml"/><item><title>Phi-4 Mini vs. Gemma 3 vs. Qwen 2.5: Best SLM for Coding Tasks in 2026</title><link>https://botmonster.com/posts/phi-4-mini-vs-gemma-3-vs-qwen-25-best-slm-coding-2026/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/posts/phi-4-mini-vs-gemma-3-vs-qwen-25-best-slm-coding-2026/</guid><description><![CDATA[<div class="featured-image">
                <img src="/gladiators-circuit-shields.png" referrerpolicy="no-referrer">
</div><p>Qwen 2.5 Coder 7B is the most accurate of the three for Python and TypeScript completions. Phi-4 Mini (3.8B) uses the least VRAM and generates tokens nearly twice as fast, making it the right pick when memory headroom or latency matters more than raw accuracy. Gemma 3 4B sits in the middle - not the fastest, not the most accurate at code - but the most capable when you need one model for coding, commit messages, documentation, and error explanations. Below are the actual benchmark numbers, the full test methodology, and how to configure each model in VS Code or Neovim.</p>
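<p>For a taste of the setup, here is a minimal timing sketch using the <code>ollama</code> Python client - not the full test harness - and the model tags are assumptions, so check <code>ollama list</code> for the names your install actually uses:</p>
<pre><code class="language-python">import time

import ollama  # pip install ollama; assumes a local Ollama server is running

PROMPT = "Write a Python function that parses an ISO 8601 date string."
MODELS = ["qwen2.5-coder:7b", "phi4-mini", "gemma3:4b"]  # assumed tags

for model in MODELS:
    start = time.perf_counter()
    resp = ollama.generate(model=model, prompt=PROMPT)
    elapsed = time.perf_counter() - start
    tokens = resp["eval_count"]  # completion tokens reported by the server
    print(f"{model}: {elapsed:.1f}s, {tokens / elapsed:.1f} tok/s")
</code></pre>]]></description></item><item><title>Qwen3.6-35B-A3B: Alibaba's Open-Weight Coding MoE</title><link>https://botmonster.com/posts/qwen-3-6-35b-a3b-open-weight-coding-moe/</link><pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/posts/qwen-3-6-35b-a3b-open-weight-coding-moe/</guid><description><![CDATA[<div class="featured-image">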
                <img src="/qwen-3-6-35b-a3b-open-weight-coding-moe.png" referrerpolicy="no-referrer">
</div><p>Qwen3.6-35B-A3B is Alibaba Cloud&rsquo;s Apache 2.0 sparse Mixture-of-Experts model, released April 14, 2026. It carries 35 billion total parameters but activates only about 3 billion per token. On agentic coding suites it beats Gemma 4-31B, and on most vision tasks it matches Claude Sonnet 4.5. A 20.9GB Q4 quantization runs on a MacBook Pro M5, which is why this release has taken over half the AI timeline for the past week.</p>
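<p>A minimal local-run sketch with llama-cpp-python on Apple silicon - the GGUF file name below is a placeholder for whichever Q4 quantization you download:</p>
<pre><code class="language-python">from llama_cpp import Llama  # pip install llama-cpp-python (Metal build on macOS)

llm = Llama(
    model_path="Qwen3.6-35B-A3B-Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=8192,       # context window; raise it if you have memory to spare
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Rust."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
</code></pre>]]></description></item><item><title>Gemma 4 vs Qwen 3.5 vs Llama 4: Which Open Model Should You Actually Use? (2026)</title><link>https://botmonster.com/posts/gemma-4-vs-qwen-3-5-vs-llama-4-open-model-comparison-2026/</link><pubDate>Mon, 06 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/posts/gemma-4-vs-qwen-3-5-vs-llama-4-open-model-comparison-2026/</guid><description><![CDATA[<div class="featured-image">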
                <img src="/gemma-4-vs-qwen-3-5-vs-llama-4-open-model-comparison-2026.png" referrerpolicy="no-referrer">
</div><p>For most developers in 2026, <a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/" target="_blank" rel="noopener noreferrer">Gemma 4</a>
 31B is the best all-around open model. It ranks #3 on the <a href="https://arena.ai/leaderboard/text" target="_blank" rel="noopener noreferrer">LMArena</a>
 leaderboard, scores 85.2% on MMLU Pro, and ships under Apache 2.0 with zero usage restrictions. <a href="https://qwen.ai/blog?id=qwen3.5" target="_blank" rel="noopener noreferrer">Qwen 3.5</a>
 27B edges it on coding benchmarks - 72.4% on SWE-bench Verified, while Gemma 4&rsquo;s strength lies in math reasoning - and its Omni variant offers real-time speech output that no other open model matches. <a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/" target="_blank" rel="noopener noreferrer">Llama 4</a>
 Maverick (400B MoE) wins on raw scale but requires datacenter hardware and carries Meta&rsquo;s restrictive 700M MAU license. Pick Gemma 4 for the best quality-to-size ratio under a true open-source license, Qwen 3.5 for coding-heavy workflows, and Llama 4 only when you need the largest available open model and can absorb the legal overhead.</p>]]></description></item><item><title>Run Vision Models Locally: Florence-2 and Qwen-VL for Image Analysis</title><link>https://botmonster.com/posts/run-vision-models-locally-florence-2-qwen-vl/</link><pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate><author>Botmonster</author><guid>https://botmonster.com/posts/run-vision-models-locally-florence-2-qwen-vl/</guid><description><![CDATA[<div class="featured-image">
                <img src="/mechanical-brains-vision-pipeline.png" referrerpolicy="no-referrer">
</div><p><a href="https://huggingface.co/microsoft/Florence-2-large" target="_blank" rel="noopener noreferrer">Florence-2</a>
 and <a href="https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct" target="_blank" rel="noopener noreferrer">Qwen2-VL</a>
 both run on consumer NVIDIA GPUs starting at 8 GB VRAM and handle OCR, object detection, image captioning, and visual question answering entirely offline. Florence-2 uses a compact sequence-to-sequence architecture with task-specific prompt tokens, which makes it fast and reliable for structured extraction work. Qwen2-VL takes a conversational approach and handles open-ended reasoning, complex documents, and follow-up questions - making the two models complementary rather than interchangeable.</p>
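<p>As a hedged sketch of Florence-2&rsquo;s task-token pattern, following the model card&rsquo;s documented Transformers usage (the image path is a placeholder):</p>
<pre><code class="language-python">import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("invoice.png")  # placeholder image
task = "&lt;OCR&gt;"  # the task token selects structured OCR extraction
inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(raw, task=task, image_size=image.size))
</code></pre>]]></description></item></channel></rss>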