Andrej Karpathy just open-sourced autoresearch, a repo that lets an AI agent run ML experiments autonomously overnight, with no human in the loop. It already has 15k+ stars. The concept sounds simple: give an agent a training script, a single GPU, and a clear metric, then let it iterate indefinitely.
But the interesting part isn't what it does. It's why it doesn't stall.
Anyone who's built agentic systems knows the failure mode. The agent gets confused, loses context, asks for clarification, or spirals into broken states. I've seen this in my own projects: agents that work well for three iterations, then lose the thread on the fourth. Karpathy's design eliminates these failure modes not through a better model, but through structural constraints.
That's the part worth studying.
The Setup: Three Files, Nothing Else
The repo strips LLM training down to three files:
- prepare.py — data prep, tokenizer, evaluation harness. Read-only. The agent can't touch it.
- train.py — model architecture, optimizer, training loop. The only file the agent edits.
- program.md — the prompt. Instructions for the agent. This is what the human iterates on.
The agent's job: modify train.py, run a 5-minute training experiment, check if validation loss improved, keep or discard, repeat. As Karpathy described it on X: "every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits."
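The whole loop fits in a few lines. Here's a hedged sketch, not the repo's actual code: `propose_edit` and `run_training` are hypothetical stand-ins for the agent's edit to train.py and the 5-minute training run, and "keep" stands in for the git commit on the feature branch.

```python
# Sketch of the autoresearch loop (hypothetical helper names, not the repo's API).
# The model proposes; a single deterministic number decides.

def experiment_loop(run_training, propose_edit, best_bpb, n_experiments):
    """Iterate: apply an edit, run, keep if val_bpb improved, else discard."""
    history = []
    for _ in range(n_experiments):
        edit = propose_edit()         # probabilistic: the model's idea
        val_bpb = run_training(edit)  # deterministic: one bounded training run
        kept = val_bpb < best_bpb     # lower is better
        if kept:
            best_bpb = val_bpb        # stands in for "git commit"
        history.append((edit, val_bpb, kept))
    return best_bpb, history
```

Everything that follows in this post is about why a loop this simple doesn't stall in practice.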
Seven Patterns That Prevent Stalling
What makes this work isn't any single clever trick. It's seven design decisions that each remove a specific failure mode.
1. Fixed 5-Minute Time Budget
Each experiment finishes fast. At roughly 12 experiments per hour, the agent gets constant feedback. No long waits, no ambiguous "is it still training?" states. The agent always knows what happened and can decide what to try next.
This is something I keep seeing in my own agent work: short feedback loops beat long ones every time. An agent that waits 45 minutes for a result has 45 minutes to drift. An agent that gets a result every 5 minutes stays grounded.
2. Single File Scope
The agent only modifies train.py. No multi-file orchestration, no dependency management, no config files scattered across directories. This eliminates an entire class of failure modes where agents lose track of what they've changed.
I've written about this before — environment design matters more than model intelligence. Karpathy is applying the same principle. He didn't make the agent smarter. He made the environment simpler.
3. One Unambiguous Metric
val_bpb (validation bits per byte) — lower is better. The agent never has to interpret ambiguous results or weigh competing metrics. Every experiment produces a single number that determines keep or discard.
This is the "probabilistic intent, deterministic execution" pattern applied to ML research. The model decides what to try (probabilistic). The metric decides if it worked (deterministic). No room for self-deception.
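For reference, bits per byte is just cross-entropy re-expressed in bits and normalized by raw bytes instead of tokens, which makes it tokenizer-independent. A sketch (the helper name is mine, not the repo's), assuming the eval harness tracks total tokens and total raw bytes:

```python
import math

def val_bpb(nats_per_token, total_tokens, total_bytes):
    """Convert mean cross-entropy (nats per token) to bits per byte.
    Dividing by ln(2) converts nats to bits; dividing by bytes
    normalizes away the tokenizer's compression ratio."""
    return (nats_per_token * total_tokens) / (math.log(2) * total_bytes)
```

One scalar out, one comparison in the loop. That's the entire verification step.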
4. Git-Based Rollback
Failed experiments get git reset back to the last known good state. The agent literally cannot get stuck in a broken state. This is elegant because it uses existing infrastructure rather than building custom rollback logic.
It's the same principle behind bounded autonomy at different levels — you give the agent freedom to experiment, but you make it structurally impossible for experiments to permanently break things.
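The agent drives git from its shell, but the same idea is a few lines in any language. A sketch in Python, assuming the last commit on the feature branch is always a known-good state:

```python
import subprocess

def discard_experiment(repo_dir):
    """Throw away a failed experiment: hard-reset tracked files
    to the last known good commit. The agent cannot stay broken."""
    subprocess.run(["git", "-C", repo_dir, "reset", "--hard", "HEAD"], check=True)

def keep_experiment(repo_dir, message):
    """Lock in an improvement as the new known-good state."""
    subprocess.run(["git", "-C", repo_dir, "commit", "-am", message], check=True)
```

Because every kept experiment becomes a commit, the rollback target advances with the agent's progress; there's no custom checkpoint logic to maintain.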
5. The "NEVER STOP" Directive
The prompt explicitly instructs the agent to never pause for human confirmation. If it runs out of ideas, it should think harder — re-read papers, try combinations of previous near-misses, try radical changes. This overrides the default tendency of most agents to check in and wait.
Most agent builders underestimate how much this matters. The default behavior of an AI model is to be helpful and cautious — which usually means asking for permission. For autonomous operation, you need to explicitly override that instinct.
6. Structured Loop With Clear Steps
The experiment loop is labeled "LOOP FOREVER" with numbered steps: modify code, commit, run, check results, log, keep or discard. The agent always knows exactly what step it's on and what comes next.
This is context engineering at its most bare-metal. No framework, no abstraction layer. Just a numbered list telling the agent exactly what to do next. It works because clarity beats sophistication.
7. No External Dependencies During Runtime
No API calls, no downloads, no network requests during the loop. Nothing that can block, timeout, or fail in unpredictable ways. The entire execution is local.
Every external dependency is a potential stall point. I've lost count of how many agent failures I've traced back to a rate-limited API, a flaky network call, or a timeout that broke the flow. Karpathy removed all of them.
The Simplicity Criterion: Taste as Code
Beyond preventing failure, Karpathy baked something subtle into the prompt. The agent is told: all else being equal, simpler is better. A 0.001 val_bpb improvement that adds 20 lines of hacky code? Probably not worth it. A 0.001 improvement from deleting code? Definitely keep.
This is a value judgment embedded in the prompt that shapes the direction of research, not just the metric. It prevents the agent from converging on Frankenstein architectures with marginal gains — a common failure mode in automated search. The agent doesn't just optimize. It optimizes with taste.
I find this particularly interesting because it's exactly the kind of constraint that separates a useful agent from a technically correct but practically useless one. The metric tells the agent what's better. The simplicity criterion tells it what's worth pursuing.
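In the repo this taste lives in the prompt, not in code, but you can make the tradeoff explicit. A hypothetical sketch with an assumed threshold (the gain floor is my invention, purely illustrative):

```python
def worth_keeping(delta_bpb, lines_added, gain_floor=0.002):
    """Hypothetical tie-breaker encoding 'simpler is better'.
    delta_bpb > 0 means val_bpb went down (improved)."""
    if delta_bpb <= 0:
        return False       # no improvement: discard, regardless of size
    if lines_added <= 0:
        return True        # better AND simpler: always keep
    return delta_bpb >= gain_floor  # marginal gain must justify added complexity
```

Under this rule, a 0.001 gain that deletes code is kept, while the same 0.001 gain costing 20 hacky lines is not; exactly the judgment the prompt asks the agent to make.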
Results
In Karpathy's overnight run (126 experiments on an H100), the agent pushed val_bpb from 0.9979 to 0.9697 — a meaningful improvement found through systematic exploration of weight decay, batch size halving, depth changes, and embedding learning rates.
The agent also correctly identified dead ends: weight tying was "completely broken" (+2.24 BPB), parallel attention+MLP was worse, and aggressive MQA didn't help. Knowing what doesn't work is just as valuable as knowing what does.
His vision for what comes next is even more interesting. He described the next step as "asynchronously massively collaborative for agents — think SETI@home style." Not emulating a single researcher, but an entire research community of them.
What This Means for Agentic Systems
This isn't just a cool ML project. It's a reference implementation of a design pattern I keep coming back to: bounded autonomy. Give the agent maximum freedom within tight, well-defined constraints. One file, one metric, fixed time budget, automatic rollback. The constraints aren't limitations — they're what make autonomy possible.
Every pattern Karpathy uses maps directly to a general principle for building agentic systems:
- Fixed time budget — Short feedback loops prevent drift
- Single file scope — Narrow the environment, don't expand the model
- One metric — Deterministic verification, not ambiguous judgment
- Git rollback — Safe failure as a structural guarantee
- "NEVER STOP" — Override default caution for autonomous operation
- Structured loop — Explicit steps beat implicit reasoning
- No external dependencies — Remove every possible stall point
The lesson isn't specific to ML research. It applies to any agentic system. Don't make your agent smarter. Make its environment so well-constrained that even a mediocre agent can't get stuck.