Marco Patzelt
March 4, 2026

Agent Hallucinations: It's Not the Model, It's You

Agents don't hallucinate because models are bad. They hallucinate because you gave them a job title instead of a desk. Fix the environment, not the model.

The Wrong Problem Statement

Everyone blames the model when an agent hallucinates. "GPT-4 made something up." "Claude invented a function that doesn't exist." "The model is unreliable."

No. The model did exactly what it was designed to do.

You gave it a system prompt that says "You are a database expert." It has no database. No schema. No tables. No data. What did you expect it to do? It guessed. It guessed confidently. And you called that a hallucination.

I wrote about why the environment matters more than the prompt — the idea that we gave LLMs a job title and expected expertise. This article takes that further. The hallucination problem isn't a model problem. It's an environment problem. And the solution isn't better prompts or guardrails. It's better architecture.

What Are Hallucinations, Actually?

A hallucination is not a bug. It's the model working exactly as designed.

Language models are probabilistic text generators. They produce the most likely next token given the context. If the context contains no grounding data, the most likely token is still whatever sounds right. The model doesn't know it's wrong. It doesn't have a concept of "wrong." It has a concept of "probable."

Confidence is orthogonal to correctness. The model can be 99% confident and 100% wrong. That's not a malfunction. That's the architecture working as intended.

The real question isn't "why do models hallucinate?" The answer is obvious — they're probabilistic. The real question is: why did you build a system that lets probabilistic output reach your users without verification?

Hallucination vs Mistake: The Distinction Nobody Makes

Here's where the industry gets it fundamentally wrong.

Humans don't hallucinate data. They make mistakes on data. There's a critical difference.

A human accountant with access to the books might make an error — wrong formula, off-by-one, transposed numbers. But they won't invent a client that doesn't exist. They won't fabricate revenue from a product the company doesn't sell. They have access to reality. Their mistakes are grounded in that reality.

An agent without data access doesn't make mistakes. It hallucinates. It invents things that don't exist because it has nothing to ground itself against. No database. No schema. No live data. Just a prompt that says "you are an expert" and a context window full of patterns.

Now here's the shift that changes everything: give that same agent real data access and code execution, and the problem transforms.

The agent with database access, schema injection, and the ability to execute code against real data stops hallucinating. It can still make mistakes — wrong joins, bad aggregations, off-by-one errors in its SQL. But those are normal engineering errors. The same kind of errors a junior developer makes. Not confabulation. Not invention. Mistakes.

And mistakes I know how to fix. I can write tests for mistakes. I can add assertions. I can build verification loops. I can't do any of that for hallucinations because hallucinations exist in a world without ground truth.
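That difference is what makes the problem tractable. A minimal sketch of such a verification check, assuming hypothetical `client_id` and `revenue` fields standing in for a real schema:

```python
def verify_agent_result(rows, known_client_ids):
    """Reject agent output that references entities outside ground truth."""
    reported = {row["client_id"] for row in rows}
    unknown = reported - known_client_ids
    if unknown:
        # A fabricated entity is not a mistake; it must never reach the user.
        raise ValueError(f"result references unknown clients: {sorted(unknown)}")
    if any(row["revenue"] < 0 for row in rows):
        raise ValueError("negative revenue in result")
    return rows

known = {"acme", "globex"}

# A grounded result passes through unchanged.
ok = verify_agent_result([{"client_id": "acme", "revenue": 1200.0}], known)

# An invented client is rejected instead of being served as an answer.
try:
    verify_agent_result([{"client_id": "initech", "revenue": 50.0}], known)
except ValueError as err:
    print(err)
```

The check is only possible because there is a ground truth (`known_client_ids`) to check against. Without data access there is nothing to assert.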

This is the paradigm shift: once you give the agent the ability to prove its work against reality, it's no longer "the AI is lying." It's "the AI made a bug." Bugs are engineering problems. Engineering problems have engineering solutions.

Why Environment Matters More Than Training

Better models still hallucinate. GPT-4 hallucinates. Claude Opus 4.6 hallucinates. Every model that will ever be built will hallucinate — because hallucination isn't a flaw you can train away. It's a property of probabilistic generation.

The only thing I've ever seen actually eliminate hallucinations is environment design.

In my own work, hallucinations only showed up in systems with prompt chains and no data access. The moment I gave agents access to real databases, injected schemas at runtime, and enabled code execution — hallucinations stopped. Not reduced. Stopped.

The agent sees environmental reality. It writes a SQL query against an actual database. It gets actual rows back. It computes actual numbers. There's nothing to hallucinate about because the data is right there.
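Schema injection at runtime can be as simple as reading the live DDL and prepending it to the agent's context. A sketch using SQLite; the helper name and prompt wording are illustrative assumptions, not a fixed API:

```python
import sqlite3

def schema_context(conn: sqlite3.Connection) -> str:
    """Read the live schema and return it as grounding context for the agent."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    ddl = "\n".join(sql for (sql,) in rows)
    return (
        "You may query the following database. Reference only tables and "
        "columns that appear in this schema:\n\n" + ddl
    )

# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
context = schema_context(conn)
print(context)
```

Because the schema is read at call time, the context always reflects the current state of the database rather than whatever the weights memorized during training.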


This is what I mean when I say build environments, not personas. A persona is a job title. An environment is a desk with tools, data, and constraints. One produces confident guessing. The other produces verifiable work.

The Triangulation Protocol

I needed something stronger than "just give it data access." Data access prevents hallucination, but it doesn't guarantee correctness. The agent can still write a bad query. It can still misinterpret the schema. It can still make mistakes.

So I built a verification system inspired by Ilya Sutskever's "Value Function" concept — the idea of using distinct cognitive paths to converge on a verified answer.

I call it the Triangulation Protocol. It's double-entry bookkeeping for AI.

For any quantitative question, the system doesn't trust a single execution path. Instead, it runs two independent computations:

Vector A — Database Engine. The agent writes a SQL query against the structured database. Direct path. Schema-aware. Returns a number.

Vector B — Runtime Computation. The agent takes the raw data, writes a Python script in an isolated sandbox, and computes the answer through a completely independent method. Different logic. Different execution. Same question.

Then: compare. If both vectors converge — delta under 1% — the answer is verified. If they diverge, the system doesn't guess which one is right. It refuses to answer. It throws an exception instead of returning potentially wrong data.
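The comparison step fits in a few lines. A sketch under my own naming (the two inputs stand in for the SQL result and the sandboxed Python result):

```python
def triangulate(vector_a: float, vector_b: float, tolerance: float = 0.01) -> float:
    """Accept a result only if two independent computations converge."""
    baseline = max(abs(vector_a), abs(vector_b), 1e-12)
    delta = abs(vector_a - vector_b) / baseline
    if delta >= tolerance:
        # Diverging vectors: refuse to answer rather than guess which is right.
        raise RuntimeError(
            f"triangulation failed: {vector_a} vs {vector_b} (delta {delta:.2%})"
        )
    return vector_a

verified = triangulate(10_000.0, 10_050.0)  # delta ~0.5%, under 1%: verified
```

The refusal path is the point: a thrown exception is an engineering signal you can log, retry, and debug. A silently wrong number is not.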

The core idea isn't complicated: the agent has no other option than to prove its work. When the architecture requires provable outputs, the quality of those outputs changes. Not because the model "tries harder" — because structurally, unverifiable output gets rejected. Only proven results survive.

Why This Beats Everything Else

The industry has three standard answers to hallucinations. All three miss the point.

"Prompt harder." Add more instructions. "Always be truthful." "Never make things up." "If you don't know, say so." This doesn't work because the model doesn't know what it doesn't know. Confidence is orthogonal to correctness. Telling a probabilistic system to be deterministic via text instructions is like telling water to flow uphill via a strongly worded memo.

"Fine-tune the model." Train it on your data so it "knows" the right answers. This is slow, expensive, and fundamentally misguided. You're trying to bake knowledge into weights when you should be injecting it at runtime. Knowledge changes. Weights don't — not until you retrain. And you still have zero verification.

"Add guardrails." Build filters, classifiers, content moderation on top. Check the output before it reaches the user. This is trying to constrain the output when you should be fixing the input. Guardrails are symptom management. They don't address the root cause. The model hallucinated because it had no data — adding a hallucination detector after the fact is backwards engineering.

My approach is different: make hallucination structurally impossible.

Give the agent real data access. Inject the schema at runtime. Enable code execution. Build dual verification paths. Require assertions. Reject anything unverified.

You're not asking the model to be honest. You're building an environment where dishonesty is architecturally impossible.

The Shift

The model will always be probabilistic. Stop trying to fix that. It's not broken.

What's broken is the environment you put it in. An agent without data access, without verification, without constraints — that agent will hallucinate. Not because it's a bad model. Because you gave it a bad desk.

Fix the desk. Inject the data. Require verification. Build the Triangulation Protocol. Turn hallucinations into mistakes. Turn mistakes into bugs. Fix the bugs.

Your job was never to make the model truthful. Your job was to build the world around it where truth is the only possible output.


Frequently Asked Questions

Why do agents hallucinate?

Agents hallucinate because they lack data access and verification, not because models are bad. A probabilistic model without grounding data will always generate confident-sounding but wrong output.

What's the difference between a hallucination and a mistake?

Humans don't hallucinate data — they make mistakes on data. Agents with real data access stop hallucinating and start making normal engineering errors like wrong joins or off-by-one bugs.

What is the Triangulation Protocol?

Double-entry bookkeeping for AI. Two independent computation paths (SQL + Python) answer the same question. If results diverge beyond 1%, the system rejects the answer instead of guessing.

Why aren't guardrails enough?

Guardrails miss the point. They constrain output when you should fix the input. The model hallucinated because it had no data — adding a detector after the fact is backwards engineering.

How do you stop an agent from hallucinating?

Inject real database schemas, business rules, and live data at runtime. Give the agent code execution. When grounded in reality, the agent computes instead of guesses. Hallucination becomes impossible.

Should I fine-tune the model instead?

No. Fine-tuning bakes knowledge into weights, but knowledge changes. You should inject data at runtime instead. Fine-tuning is slow, expensive, and still provides zero output verification.
