Marco Patzelt
February 4, 2026
Updated: February 5, 2026

Qwen3-Coder-Next: 70% SWE-Bench with 3B Active Params—Local AI Just Got Real

70% SWE-Bench on a Mac?! Qwen3-Coder-Next beats DeepSeek-V3.2 with 3B active params. Run local, private & fast—no more VC-subsidized API limits!

The Benchmark Reality

An 80B model that only uses 3B parameters per token just scored 70.6% on SWE-Bench Verified. That beats DeepSeek-V3.2 (671B total parameters) and sits roughly 10 points behind Claude Opus 4.5, on hardware you can actually afford.

Alibaba dropped Qwen3-Coder-Next yesterday (Feb 3, 2026). This isn't just another open-source model release. It's the first time a locally runnable model closes the gap with frontier proprietary systems to roughly ten points on real-world coding benchmarks.

Why This Matters (Engineer Perspective)

Let's cut through the noise with numbers.

The Benchmark Reality:

Model              | Parameters        | SWE-Bench Verified | SWE-Bench Pro
Claude Opus 4.5    | Unknown (closed)  | 80.9%              | -
GLM-4.7            | 358B              | 74.2%              | 40.6%
Qwen3-Coder-Next   | 80B (3B active)   | 70.6%              | 44.3%
DeepSeek-V3.2      | 671B              | 70.2%              | 40.9%

Qwen3-Coder-Next beats DeepSeek-V3.2 on SWE-Bench Verified while activating roughly 0.4% as many parameters per token as DeepSeek carries in total. That's not a typo: 3B vs. 671B.

Claude Opus 4.5 still leads at 80.9%, but that's a closed-source model with an unknown (likely massive) parameter count running on Anthropic's cloud. The interesting story here: Qwen3-Coder-Next gets you 87% of Opus's benchmark score while running locally on your own hardware.

On the harder SWE-Bench Pro, Qwen scores 44.3%—actually beating models with 10-20x more active params including GLM-4.7 (40.6%) and DeepSeek-V3.2 (40.9%).

Why the MoE Architecture Matters:

The model has 80B total parameters but routes each token through only about 3B of them. Think of it as having 80 billion neurons but waking up only 3 billion for any given question (a rough routing sketch follows the list below). The result:

  • 10x higher throughput than dense models of similar capacity
  • Runs on consumer hardware (48-64GB RAM)
  • Token generation speed comparable to a 3B dense model
  • Reasoning capability of a much larger system
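
To make "only wake up 3 billion" concrete, here's a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. The expert count, k, and gating details are placeholder values for illustration, not Qwen's actual router configuration:

import numpy as np

# Placeholder sizes -- NOT Qwen3-Coder-Next's real router config.
NUM_EXPERTS = 64      # expert FFNs available in one MoE layer
TOP_K = 4             # experts actually activated per token
HIDDEN = 2048         # hidden size (this one does match the article)

rng = np.random.default_rng(0)
router = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02
experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(NUM_EXPERTS)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only TOP_K of NUM_EXPERTS experts."""
    scores = token @ router                                   # score every expert
    top = np.argsort(scores)[-TOP_K:]                         # keep the k best
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over the winners
    # Only TOP_K expert matmuls actually run; the rest stay "asleep".
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.standard_normal(HIDDEN)).shape)  # (2048,)

Every expert's weights still have to sit in memory (hence the 80B footprint), but per-token compute scales with the ~3B that are activated, which is where the dense-3B generation speed comes from.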

The Agentic Training Difference

Here's what actually makes this model different from previous coding LLMs.

Most coding models learn by predicting the next token in code files. Read-only education.

Qwen3-Coder-Next was trained on 800,000 verifiable tasks mined from real GitHub PRs. The training loop:

  1. Model tries to fix a bug
  2. Runs tests in a Docker container
  3. Gets pass/fail feedback
  4. Learns from actual execution results

This isn't "read about coding"—it's "learn by doing." The model learned to plan, call tools, run tests, and recover from failures.

Technical Architecture (for the nerds)

  • Hybrid attention: Gated DeltaNet (O(n) linear complexity) + traditional attention
  • 48 layers with 2048 hidden size
  • Pattern: 12 blocks of (3 DeltaNet layers → 1 attention layer) → MoE
  • 256K native context, extendable to 1M with YaRN
  • Apache 2.0 license

The Gated DeltaNet attention solves the quadratic scaling problem that kills most models at long contexts. Your 256K token repository scan doesn't turn into a memory apocalypse.
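
As a quick sanity check on the layer math (the labels are mine; the counts come straight from the bullets above):

# 12 blocks of (3 Gated DeltaNet layers -> 1 full-attention layer) = 48 layers;
# the MoE feed-forward follows per the pattern described above.
layers = (["gated_deltanet"] * 3 + ["full_attention"]) * 12

print(len(layers))                       # 48 layers total
print(layers.count("full_attention"))    # 12 full-attention layers
print(layers.count("gated_deltanet"))    # 36 linear-complexity layers

Only a quarter of the layers pay the quadratic attention cost; the rest scale linearly with context length.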

Hardware Requirements: The Real Numbers

Q4 Quantized Version:

  • ~46GB VRAM (dual RTX 4090 or Mac Studio M3 Ultra)
  • Alternatively: 8GB VRAM + 32GB RAM with CPU offload (~12 tok/sec)

Sweet Spot Setup:

  • Mac Mini M4 Pro 64GB: ~$2,000
  • Runs the model at 10-15 tokens/second
  • Good enough for real coding assistance

Budget Setup:

  • Mac Mini M4 24GB: ~$800
  • Q4 quantization with aggressive offload
  • Slower but functional (~5-8 tok/sec)

Community reports confirm: you can run this on an RX 7900 XTX ($900 consumer GPU) and get usable performance.
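
Those numbers follow from simple arithmetic: all 80B parameters have to live in memory even though only 3B are active per token, and at roughly 4-bit precision that lands in the mid-40GB range. A rough back-of-the-envelope check (the overhead allowance is a guess, not a measurement):

TOTAL_PARAMS = 80e9        # all MoE experts stay resident in memory
BITS_PER_WEIGHT = 4.5      # Q4 quants average a bit over 4 bits per weight
OVERHEAD_GB = 1.0          # rough guess for KV/state cache and quantization metadata

weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"{weights_gb + OVERHEAD_GB:.0f} GB")   # ~46 GB, matching the Q4 figure above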

Setup: Claude Code + Ollama + Qwen3-Coder-Next

Since January 2026, Ollama supports the Anthropic Messages API. Claude Code works with any Ollama model out of the box.

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Step 2: Pull the Model

ollama pull qwen3-coder-next

Note: This downloads ~46GB. Grab coffee.
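
Once the download finishes, it's worth confirming the model is actually registered before wiring up Claude Code. Running 'ollama list' does this from the shell; the same information is available from Ollama's local API at /api/tags (the tag below assumes the name used in the pull command):

import json
import urllib.request

# Ollama lists every pulled model at its local /api/tags endpoint.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print(models)   # expect something like ['qwen3-coder-next:latest', ...]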

Step 3: Configure Claude Code

Add to ~/.bashrc or ~/.zshrc:

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""

Or use the new launcher:

ollama launch claude

Step 4: Run

claude --model qwen3-coder-next

That's it. Claude Code now runs with local inference. Your code never leaves your machine.
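
If Claude Code ever misbehaves, rule out the model server first with a direct request to Ollama's native chat endpoint (this bypasses the Anthropic-compatible layer entirely):

import json
import urllib.request

# One non-streaming round-trip against Ollama's native /api/chat endpoint.
payload = json.dumps({
    "model": "qwen3-coder-next",
    "messages": [{"role": "user", "content": "Write a one-line Python function that reverses a string."}],
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=300) as resp:
    print(json.load(resp)["message"]["content"])

If that prints a sensible answer, the model side is fine and any remaining issues live in the Claude Code configuration.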

OpenClaw Integration (For the Ambitious)

OpenClaw (formerly Clawdbot, formerly Moltbot) is the viral open-source AI agent with 145,000+ GitHub stars. It turns your local LLM into a 24/7 proactive assistant accessible via WhatsApp, Telegram, Discord.

Why combine Qwen3-Coder-Next + OpenClaw?

  • Persistent Memory: Agent remembers context across weeks
  • Proactive Actions: Scheduled automations, monitoring, alerts
  • Messaging Interface: Text your code agent from your phone
  • Full System Access: File management, browser automation, shell commands

The Security Reality Check:

OpenClaw is powerful but risky. It's had CVEs already. The community found malicious skills in the ClawHub repo. Security experts call it "a lethal trifecta" of risks.


If you deploy it:

  • Use DigitalOcean's hardened 1-Click Deploy
  • Never expose the gateway to public internet without auth
  • Whitelist only necessary tools
  • Run on isolated hardware

Basic OpenClaw + Local Model Setup:

# Install OpenClaw
npm install -g openclaw@latest

# Configure to use local Ollama
# In your OpenClaw config:
{
  "llm": {
    "provider": "anthropic",
    "baseUrl": "http://localhost:11434",
    "model": "qwen3-coder-next"
  }
}

Now you have Claude Code-level tooling running on your own hardware, accessible from your phone.

Normie Perspective: What Can I Actually Do With This?

Skip the technical details. Here's what matters for non-engineers:

Before (Cloud AI):

  • Pay $20-200/month for API access
  • Your code goes to someone else's servers
  • Rate limits when you need it most
  • Locked into one provider

After (Local + Qwen3-Coder-Next):

  • $2,000 one-time hardware cost
  • ~$5/month electricity
  • Your code never leaves your machine
  • No rate limits, no API keys, no subscriptions
  • Works offline (on planes, trains, coffee shops with bad wifi)

Practical Use Cases:

  1. Code Review on Steroids: Point it at your codebase, ask "what's wrong with this PR?"
  2. Bug Hunter: "Find security vulnerabilities in my authentication flow"
  3. Refactoring Partner: "Convert this JavaScript to TypeScript with proper types"
  4. Documentation Writer: "Generate API docs from these endpoints"
  5. Test Generator: "Write unit tests for this module with 80% coverage"

The Real Question: Is 70% SWE-Bench Good Enough?

For context: SWE-Bench tests if a model can fix real GitHub issues from popular repos. 70% means it correctly solves 7 out of 10 real-world bugs.

Claude Opus 4.5 sits around 80.9%; GPT-5.2 is in a similar range.

So yes—for most practical coding tasks, a local model at this level is genuinely useful. Not perfect. Not replacing senior engineers. But useful.

The Comparison: When to Use What

Scenario                     | Best Choice               | Why
Sensitive codebase           | Qwen3-Coder-Next (local)  | Code never leaves machine
Speed critical               | Cloud API (Opus 4.5)      | 50+ tok/sec vs 10-15
Budget constrained           | Local (after ~8 months)   | Break-even vs API costs
Offline work                 | Local (only option)       | No internet required
Complex multi-step reasoning | Opus 4.5 (80.9%)          | ~10 points ahead
Standard coding tasks        | Either works              | 70% is plenty for most tasks

The Economics

Cloud API (Claude Opus 4.5):

  • $3/million input tokens, $15/million output
  • Heavy usage: $70-150/month
  • Light usage: $20-30/month

Local (Qwen3-Coder-Next):

  • Hardware: $2,000 (Mac Mini M4 Pro 64GB)
  • Electricity: ~$5/month
  • Break-even: 8-12 months depending on usage

After break-even, you're essentially running frontier-level coding AI for free.
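
The break-even point depends entirely on what you actually spend today, so run your own numbers instead of trusting a single estimate. A quick calculator (the spend figures below are just examples):

def breakeven_months(hardware_cost: float, api_per_month: float,
                     electricity_per_month: float = 5.0) -> float:
    """Months until buying hardware beats continuing to pay for a cloud API."""
    return hardware_cost / (api_per_month - electricity_per_month)

for spend in (30, 70, 150, 250):          # example monthly API bills in USD
    print(f"${spend}/mo -> {breakeven_months(2000, spend):.0f} months")
# $30/mo -> 80 months, $70/mo -> 31, $150/mo -> 14, $250/mo -> 8

The heavier your current API bill, the faster the hardware pays for itself.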

The Verdict

Qwen3-Coder-Next is the real deal.

70.6% SWE-Bench with 3B active parameters is architecturally impressive. The agentic training approach produces a model that actually understands the code-test-fix loop rather than just predicting tokens. Yes, Opus 4.5 still wins at 80.9%—but that's a closed model on someone else's servers.

For Engineers: Mac Mini M4 Pro 64GB + Ollama + Qwen3-Coder-Next + Claude Code. That's your setup. You get 87% of Opus performance without Opus costs or privacy concerns.

For Everyone Else: If you're paying $50+/month for AI coding tools, the math now favors buying hardware. One-time cost, permanent capability, your data stays yours.

The Bigger Picture: The gap between open-source and proprietary is collapsing. A year ago, "run AI locally" meant accepting massive quality degradation. Now it means "10 points behind frontier, infinitely more private, eventually cheaper."

Start with Qwen3-Coder-Next on modest hardware. Upgrade when you hit actual limits, not imagined ones.

The Strategic Angle: AI Infrastructure Independence

The "what if the music stops" fear is real. Anthropic burns $2B/year. OpenAI needs $10B+ annually. Currently, these companies are subsidizing our workflows with VC money. One day, the pricing goes 10x, or the API disappears behind an enterprise wall.

And now? Insurance exists. It costs the price of a high-end Mac Mini ($2,000). One-time payment. Yours forever.

With Qwen3 reaching 70% on SWE-Bench locally, "Local AI" is no longer about running garbage 7B models that struggle with for-loops. It is legitimately competitive with frontier models for real engineering work.

Your new backup plan:

  1. High-End: Keep using Opus/Sonnet for the hardest architectural reasoning (it's still ~10 points ahead).
  2. Insurance: Know that if the cloud infrastructure shifts, you have the hardware and the weights to keep shipping.
  3. Privacy: For sensitive code, it already makes sense to run local.

This is the moment local AI stopped being a hobby project and became a legitimate professional tool. That’s not just tech news; that’s career security.


Frequently Asked Questions

What hardware do I need to run Qwen3-Coder-Next?
Recommended: Mac Mini M4 Pro with 64GB RAM ($2,000). Minimum: 8GB VRAM + 32GB RAM with CPU offload (~12 tok/sec).

How does Qwen3-Coder-Next compare to Claude Opus 4.5?
Qwen3 scores 70.6% on SWE-Bench Verified (Opus: 80.9%). It runs locally for free; Opus requires API fees.

Does Claude Code work with Qwen3-Coder-Next?
Yes. Set ANTHROPIC_BASE_URL=http://localhost:11434 and run 'claude --model qwen3-coder-next'.

How long until local hardware pays for itself?
With a $2,000 Mac Mini and typical usage, the break-even point is 8-12 months compared to cloud APIs.

Is running locally cheaper than a cloud API?
Yes, in the long run. An API like Opus costs ~$15/million output tokens. A Mac Mini costs $2,000 once. If you code daily, the hardware pays for itself in months, plus you avoid potential future price hikes or service shutdowns.
