Why LLMs can't check their own work

The structural impossibility of self-verification in autoregressive models — and what it means for agent governance in regulated industries.

Jan Szymanski
Founder, theup

There's a common pattern in enterprise AI deployments. A team builds an agent for a high-stakes domain — clinical decision support, credit underwriting, regulatory monitoring. They add a system prompt that says "always follow HIPAA" or "never approve loans without verifying income." They test it. It works. They ship it.

Six weeks later, the agent approves a loan using stale credit data. Or writes to a pharmacy record without checking the allergy database. Or synthesizes conflicting regulatory requirements from two jurisdictions into a confident, wrong answer.

The team's response is predictable: "We need better guardrails." They add output validation. They add a second LLM to check the first one's work. They add a guardrails SDK with structured output constraints.

None of it fixes the structural problem. Here's why.

The self-verification impossibility

An LLM is an autoregressive model. It generates the next token based on the preceding sequence. When you ask it to verify its own output, you're asking the same model — with the same training biases, the same reasoning patterns, the same knowledge gaps — to identify errors in a sequence it just generated.

This is not a tooling problem. It's a structural impossibility analogous to asking a calculator to verify its own arithmetic without an external reference. The calculator doesn't "know" mathematics — it executes deterministic operations. An LLM doesn't "know" HIPAA — it has a probabilistic distribution over tokens that are likely to follow "HIPAA requires..."

An LLM follows HIPAA the way it follows a writing style — probabilistically. It has no structural mechanism to verify whether its output actually satisfies the constraint.

This is why "asking the LLM to check itself" fails at scale. Studies document a ~34% policy violation rate over 10-step agent tasks, even with system prompt constraints. Constraint recall drops by ~40% as task length increases. The agent forgets its own rules.

Why a second LLM doesn't fix it

The next logical step is often: "What if we use a separate LLM as a verifier?" This is the architecture behind most guardrails SDKs. A generation model produces output, then a verification model checks it.

This helps, but it has three fundamental limitations:

  1. The verifier has the same knowledge gaps. If the generation model doesn't know that Drug A interacts with Drug B in patients with kidney disease, the verification model likely doesn't either — they share training data distributions.
  2. Verification is also probabilistic. The verifier can be wrong. It can miss violations. It can hallucinate violations that don't exist. You've added a second point of probabilistic failure, not a deterministic check.
  3. No access to ground truth. The verifier can assess whether output is plausible, not whether it's factually correct against your specific knowledge base. It doesn't know that Source A says the budget is €500K while Source B says €750K.

What deterministic verification looks like

The alternative is structural enforcement — verification that doesn't use LLM judgment at all.

Consider what a governance system actually needs to do:

  • Detect when an agent is about to act on conflicting information
  • Verify that a proposed action doesn't violate a constraint that was established earlier in the task
  • Block actions when confidence in the underlying knowledge drops below a threshold
  • Produce a sealed, hash-chained record of every decision for compliance

None of these require language understanding. They require graph queries and hash chains.
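The fourth requirement above, a sealed, hash-chained record, is plain cryptographic bookkeeping rather than language understanding. Here is a minimal sketch in Python; the function names and record shapes are illustrative, not Brain's API:

```python
import hashlib
import json

def seal(record: dict, prev_hash: str) -> dict:
    """Append-only audit entry: each entry commits to the previous
    entry's hash, so tampering anywhere breaks the chain."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return {"record": record, "prev": prev_hash,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}

def verify_chain(chain: list) -> bool:
    """Deterministic check: recompute every hash and every link."""
    prev = "genesis"
    for entry in chain:
        payload = json.dumps({"record": entry["record"], "prev": entry["prev"]},
                             sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

# Seal two example decisions into a chain.
chain = []
prev = "genesis"
for decision in [{"action": "approve_loan", "blocked": False},
                 {"action": "write_record", "blocked": True}]:
    entry = seal(decision, prev)
    chain.append(entry)
    prev = entry["hash"]

assert verify_chain(chain)           # intact chain verifies
chain[0]["record"]["blocked"] = True
assert not verify_chain(chain)       # any edit to history breaks the chain
```

Note that verification here never consults a model: it is pure hash recomputation, so an auditor can re-run it years later and get the same answer.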

If you store knowledge as a consensus-scored graph — where every relationship has a confidence score based on source authority, temporal decay, and evidence weight — then conflict detection becomes a deterministic operation:

// Pseudocode: deterministic conflict detection
MATCH (entity)-[r1:HAS_PROPERTY]->(value1),
      (entity)-[r2:HAS_PROPERTY]->(value2)
WHERE r1.property = r2.property
  AND value1 <> value2
  AND r1.consensus > 0.30
  AND r2.consensus > 0.30
RETURN entity, value1, value2, r1.consensus, r2.consensus

This query finds the same conflicts every time it runs. It doesn't "think" about conflicts — it traverses the graph and reports structural contradictions. Same inputs, same results. Deterministic.

Now compare this to asking an LLM: "Are there any conflicts in the following information?" The LLM might find some. It might miss others. It might hallucinate conflicts that don't exist. And it will give different answers on different runs.
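The same check can be sketched over a plain in-memory edge list. This is illustrative Python, not Brain's query engine; the edge tuples, property names, and 0.30 threshold mirror the pseudocode above:

```python
from itertools import combinations

# Each edge: (entity, property, value, consensus score)
edges = [
    ("project_x", "budget", "€500K", 0.62),   # Source A
    ("project_x", "budget", "€750K", 0.55),   # Source B
    ("project_x", "owner",  "alice", 0.80),
]

def find_conflicts(edges, threshold=0.30):
    """Deterministic: two sufficiently credible edges asserting
    different values for the same (entity, property) slot conflict."""
    conflicts = []
    for e1, e2 in combinations(edges, 2):
        same_slot = e1[:2] == e2[:2]
        differ = e1[2] != e2[2]
        credible = e1[3] > threshold and e2[3] > threshold
        if same_slot and differ and credible:
            conflicts.append((e1, e2))
    return conflicts

# The €500K / €750K budget contradiction is reported on every run.
print(find_conflicts(edges))
```

No matter how many times this runs, the budget conflict is the only hit: the logic is set comparison, not judgment.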

The consensus scoring advantage

Deterministic detection is necessary but not sufficient. You also need to know which side of a conflict to trust.

In Brain, every relationship in the knowledge graph carries a consensus score — a continuously recalculated confidence value based on:

  • Source authority — weighted by historical accuracy (sources that are proven right gain authority; wrong sources lose it)
  • Evidence weight — how many independent sources support this claim
  • Temporal decay — domain-configurable half-lives (pharma: 730 days, fintech: 30 days)
  • Expert validation — human resolutions feed back into the model

The formula uses CATD confidence intervals (Li et al., VLDB 2015) with a sigmoid mapping:

consensus = 0.50 + 0.40 × (1 − 1/w)
where w = source_count × source_authority_avg

Range: 0.50 (single unverified source) → 0.90 (many authoritative sources)

This means a single unverified source gets a consensus of 0.50 — the system is literally uncertain. A claim backed by multiple high-authority sources approaches 0.90. And critically: every human resolution updates these scores. The system learns from every conflict your team resolves.
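With those published constants, the mapping is a few lines of Python. How temporal decay composes with authority is not specified above, so the half-life multiplier and the clamp at w = 1 (which anchors a single unverified source at exactly 0.50) are assumptions of this sketch:

```python
def consensus(source_count: int, authorities: list, age_days: float = 0.0,
              half_life_days: float = 730.0) -> float:
    """Map evidence weight to confidence: 0.50 for a single unverified
    source, approaching the 0.90 ceiling as independent authoritative
    sources accumulate."""
    # Assumed composition: halve effective authority every half-life
    # (pharma: 730 days, fintech: 30 days).
    decay = 0.5 ** (age_days / half_life_days)
    avg_authority = sum(authorities) / len(authorities)
    w = source_count * avg_authority * decay
    w = max(w, 1.0)  # assumed floor so consensus never drops below 0.50
    return 0.50 + 0.40 * (1.0 - 1.0 / w)

print(round(consensus(1, [1.0]), 2))   # single source → 0.5
print(round(consensus(8, [0.9]), 2))   # eight authoritative sources → 0.84
```

The shape matters more than the constants: the score saturates below 0.90, so no volume of agreeing sources can ever masquerade as certainty.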

This is the learning flywheel. Every resolution strengthens the consensus model. Sources that are repeatedly proven right gain authority. Sources that are repeatedly wrong lose it. Brain gets better the more your team uses it.

What this means for your agent architecture

If you're deploying agents in regulated industries, the implication is clear: the verification layer cannot be another LLM.

It needs to be structural. Deterministic. External to the model. Grounded in a source-of-truth that maintains consensus scores, detects contradictions via graph traversal, and produces audit trails that satisfy regulatory requirements.

This isn't a matter of better prompting or smarter guardrails. It's a fundamentally different architecture — one where the LLM does what it's good at (reasoning, generation, synthesis) and a separate system handles what LLMs structurally cannot do (verification, enforcement, provenance).

That's what Brain is. A deterministic truth layer that sits between your agents and production. Not another AI checking AI. A structural system that enforces truth using graph queries, consensus math, and formal methods.

Because you shouldn't let AI check its own work.


See it in action

Watch Brain detect conflicts, enforce consensus, and seal audit trails — with your agent stack.

Get a Demo