Claude Code is Anthropic's agentic coding tool. It's also expensive: Opus 4.5 burns through API credits fast, and Claude Max at $90/month adds up. A team of engineers can easily hit $2,000+/month.
Since January 2026, you don't need Anthropic's API anymore. Ollama v0.14.0 added native Anthropic Messages API compatibility. Three environment variables, and Claude Code talks to local models instead. Zero API costs. Your code never leaves your machine.
Here's the complete setup, the models that actually work, and the honest performance reality.
How It Works
Claude Code doesn't care where its model lives. It speaks the Anthropic Messages API. Ollama now speaks that same protocol. Point Claude Code at localhost:11434 instead of api.anthropic.com, and it works the same way — file edits, tool calls, terminal commands, the full agentic loop.
The key difference: instead of sending your entire codebase to Anthropic's servers for inference, everything runs on your hardware. Privacy is absolute. Latency depends on your machine, not your internet connection.
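You can see the protocol swap directly. Here is a minimal sketch of an Anthropic-style request sent to the local server instead of api.anthropic.com; it assumes Ollama mirrors Anthropic's /v1/messages path (which is what lets Claude Code work unchanged) and that Ollama and a model are already installed per the setup below. The auth header mirrors what a client would send and may not be enforced locally.
# Anthropic Messages API request, pointed at the local Ollama server
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{
    "model": "glm-4.7-flash",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Reply with one short sentence."}]
  }'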
What You Need
Software:
- Ollama v0.14.0+ (v0.14.3-rc1 or later recommended for streaming tool calls)
- Claude Code CLI (latest version)
- Node.js 18+
Hardware (realistic minimums):
| Setup | RAM | Models You Can Run | Tokens/Sec | Cost |
|---|---|---|---|---|
| Mac Mini M4 24GB | 24GB | GLM-4.7-Flash (Q4), small models | 20-30 | $599 |
| Mac Mini M4 Pro 48GB | 48GB | Most 30B models comfortably | 35-55 | $1,599 |
| Mac Mini M4 Pro 64GB | 64GB | 32B models, some 70B quantized | 10-60 | $1,999 |
| RTX 4090 24GB | 24GB VRAM | GLM-4.7-Flash, fast | 120-220 | ~$1,800 GPU |
The Mac Mini M4 Pro 64GB at $1,999 is the sweet spot. Unified memory means no VRAM bottleneck. Runs 30B MoE models at usable speeds. Pays for itself in about 8 months vs API costs if you're a heavy user. I wrote a full breakdown of running a Mac Mini as an AI server if you need the hardware deep dive.
Setup: 5 Minutes, 4 Steps
Step 1: Install Ollama
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# For full tool-call support, use pre-release:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh
On macOS, you can also download from ollama.com/download.
Verify it's running:
ollama --version
# Should show 0.14.0 or higher
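Claude Code talks to the Ollama HTTP server, not the CLI, so it's worth confirming the server is answering too:
curl http://localhost:11434/api/version
# Expect JSON like {"version":"0.14.3"}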
Step 2: Pull a Model
# Recommended starter — best tool-calling support
ollama pull glm-4.7-flash
# Alternative coding models
ollama pull qwen3-coder
ollama pull gpt-oss:20b
GLM-4.7-Flash is a 30B parameter MoE model with only 3B active parameters per token. That's why it's fast despite the large total size. 128K context window. Native tool-calling support — critical for Claude Code's agentic loop.
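Before wiring it into Claude Code, you can confirm the model advertises tool support and check its context window with ollama show (the exact output layout varies by Ollama version):
ollama show glm-4.7-flash
# Look for "tools" under capabilities and the context length in the model details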
Step 3: Configure Environment
Quick way (new):
ollama launch claude
This handles everything automatically.
Manual way (more control):
Add to your ~/.bashrc, ~/.zshrc, or ~/.config/fish/config.fish:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
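Reload your shell and confirm the variables are set before launching:
source ~/.zshrc   # or whichever config file you edited
echo $ANTHROPIC_BASE_URL
# Should print http://localhost:11434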
Or set them in Claude Code's settings file at ~/.claude/settings.json:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
The CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC flag is optional but recommended — it prevents Claude Code from phoning home and ensures everything stays local.
Step 4: Launch
claude --model glm-4.7-flash
Or inline without persisting environment variables:
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude --model glm-4.7-flash
That's it. Claude Code is now running on your local model.
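For a quick end-to-end check of the agentic loop, you can run a one-shot prompt in headless mode; this assumes your Claude Code version supports the -p (print) flag:
# One-shot prompt that exercises the local model and at least one tool call
claude --model glm-4.7-flash -p "List the files in this directory and summarize what this project does"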
Best Models for Local Agentic Coding
Not all models work well with Claude Code. The agentic loop requires tool-calling support, decent context windows, and coding ability. Here's what actually performs:
| Model | Params (Active) | Context | Tool Calling | Best For |
|---|---|---|---|---|
| GLM-4.7-Flash | 30B (3B) | 128K | Native | Best overall starter |
| Qwen3-Coder-30B | 30B (3B) | 256K | Yes | Coding specialist |
| GPT-OSS-20B | 20B (3.6B) | 128K | Yes | General tasks |
| Devstral-2-Small | 24B | 128K | Yes | Lightweight option |
My recommendation: Start with GLM-4.7-Flash. It has the best balance of speed, tool-calling reliability, and coding quality for Claude Code workflows. Ollama's own documentation recommends it for Claude Code integration.
Qwen3-Coder-Next is the better pure coding model, but GLM-4.7-Flash has more reliable tool-calling — and tool-calling is what makes Claude Code agentic rather than just a chatbot.
The Reality Check
I'll be direct about what you're giving up.
What works well locally:
- Routine refactoring and file edits
- Test generation
- Code review and analysis
- Simple feature implementations
- Documentation writing
- Sensitive/proprietary code work
What still needs cloud models:
- Complex multi-file architectural changes
- Deep reasoning across large codebases
- Novel algorithm design
- Tasks requiring frontier-level intelligence
GLM-4.7-Flash scores 59.2% on SWE-bench Verified. That's impressive for a local model — it beats Qwen3-30B (22%) and GPT-OSS-20B (34%). But Opus 4.5 is still in a different league for complex reasoning.
The practical approach: use local for 80% of your daily coding tasks. Switch to cloud API when you hit something that genuinely needs frontier intelligence.
Context Length: The Hidden Gotcha
Claude Code eats context. Every file it reads, every command it runs, every tool call — all tokens. Ollama defaults to relatively short context windows.
Set a minimum of 20K for basic use, 32K+ for real projects:
# Set context length when starting Ollama
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
Or in your Modelfile:
FROM glm-4.7-flash
PARAMETER num_ctx 32768
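If you go the Modelfile route, build a named variant from it and point Claude Code there; the model name below is just an example:
ollama create glm-4.7-flash-32k -f Modelfile
claude --model glm-4.7-flash-32k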
Higher context = more RAM. On a 64GB Mac Mini, 32K context is comfortable. 64K is possible but cuts into model performance. 128K needs 48GB+ just for the KV cache on top of model weights.
DataCamp's testing found 20K context provides the best balance between functionality and speed for Claude Code workflows. Start there and increase only if you're hitting limits.
Common Issues and Fixes
"Connection refused" Ollama isn't running. Start it:
ollama serve
"Model not found" Check installed models and use the exact name:
ollama list
Tool calls failing / streaming errors: You need Ollama 0.14.3-rc1 or later. Stable releases before this had issues with streaming tool calls that break Claude Code's agentic loop:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.14.3-rc1 sh
Slow responses: Expected on CPU. On Apple Silicon, make sure the model fits in unified memory; any page-out to SSD kills performance. Check with:
ollama ps
# Look for "100% GPU" in the PROCESSOR column
Verify it's truly local: Disconnect from the internet and run a prompt. If you still get a response, inference is happening entirely on your machine.
Switching Between Local and Cloud
You don't have to choose one. Use local for daily work, cloud for complex tasks.
Switch to local:
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
claude --model glm-4.7-flash
Switch back to cloud:
unset ANTHROPIC_BASE_URL
unset ANTHROPIC_AUTH_TOKEN
unset ANTHROPIC_API_KEY  # only needed if you exported it empty for the local setup
claude # Uses Anthropic API with your API key
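If you switch often, a pair of shell functions keeps it to one command each way. A minimal sketch for ~/.zshrc or ~/.bashrc; the function names and default model are just suggestions:
# Local: override the endpoint only for this invocation
claude-local() {
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_API_KEY="" \
  claude --model glm-4.7-flash "$@"
}

# Cloud: strip the overrides so Claude Code falls back to the Anthropic API
claude-cloud() {
  env -u ANTHROPIC_BASE_URL -u ANTHROPIC_AUTH_TOKEN -u ANTHROPIC_API_KEY claude "$@"
}
With these in place, claude-local covers daily work and claude-cloud is there when you need frontier reasoning.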
Cost Comparison
| | Cloud API (Opus 4.5) | Claude Max ($90/mo) | Local (Mac Mini) |
|---|---|---|---|
| Hardware | $0 | $0 | $1,999 one-time |
| Monthly Cost | $200-2,000+ | $90 | ~$3-5 electricity |
| Break-Even | — | — | 4-8 months |
| Privacy | Code sent to Anthropic | Code sent to Anthropic | 100% local |
| Speed | 50+ tok/s | 50+ tok/s | 20-60 tok/s |
| Intelligence | Frontier | Frontier | Good enough for 80% |
I was paying $90/month for Claude Max and another $30-40 for Gemini API. Call it $120-130/month. The Mac Mini pays for itself in under a year, and after that it's basically free AI forever.
The Verdict
Mac Mini M4 Pro 64GB + Ollama + GLM-4.7-Flash. That's the setup.
It won't replace Opus 4.5 for complex architectural decisions. It will handle your daily refactoring, test writing, code review, and documentation — at zero marginal cost, with zero data leaving your machine.
If you also want a personal AI agent on your phone, OpenClaw on the same Mac Mini turns one box into both your coding assistant and your Telegram bot.
Start with ollama launch claude. Upgrade your model or hardware when you hit real limits, not imaginary ones.