Consensus scoring: the math behind Brain's truth layer

How dynamic consensus graphs use formal methods to give every fact a continuously updated confidence score — and why that changes everything about agent governance.

Jan Szymanski
Founder, theup

Most knowledge systems treat facts as binary — a thing is either in the database or it isn't. If two sources disagree, the most recent one wins. This is the "newest-wins" pattern, and it's how virtually every knowledge graph, vector store, and memory system works today.

The problem is obvious: newest doesn't mean correct. A single erroneous source uploaded last Tuesday can override ten authoritative sources from last month. In regulated industries — pharma, finance, legal — this isn't a UX annoyance. It's a compliance failure.

Brain takes a different approach. Every relationship in the knowledge graph carries a consensus score — a continuously recalculated confidence value that reflects the weight of all available evidence, not just the most recent write.

Consensus scores — same entity, different evidence profiles:

  • 3 sources, verified — highest confidence
  • 1 source, expert-approved — moderate confidence
  • 2 sources, conflicting — reduced confidence
  • 1 source, unverified — lowest confidence

What goes into a consensus score

For a given claim — a specific entity-property-value triple in the knowledge graph — Brain computes a consensus score based on multiple dimensions:

  • Evidence weight — how many independent sources assert this claim. More sources = higher confidence, but with diminishing returns. Going from 1 to 3 sources matters more than going from 10 to 12.
  • Source authority — not all sources are equal. Authority scores are learned over time based on how often a source's claims are confirmed vs. overturned. A source that's repeatedly proven right gains authority; a source that's repeatedly wrong loses it.
  • Expert validation weight — when a domain expert resolves a conflict, their resolution carries weight proportional to their expertise. Brain tracks expert accuracy, domain specialization, and response calibration. A senior cardiologist's resolution on a drug interaction carries more weight than a general analyst's.
  • Temporal recency — information ages at different rates depending on the domain. Brain uses configurable decay rates: a pharmaceutical trial stays relevant for years, while a fintech revenue figure from three months ago is already stale.
Scoring dimensions — relative contribution to consensus:

  • Evidence weight — number of independent corroborating sources
  • Source authority — learned trust score per source, updated with every resolution
  • Expert validation — domain expert accuracy, specialization, and calibration
  • Temporal recency — domain-configurable decay (pharma: years; fintech: weeks)
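As a sketch, the four dimensions above might combine like this. The specific formulas and weights here are illustrative assumptions, not Brain's actual scoring engine:

```python
import math

def consensus_score(n_sources: int, avg_authority: float,
                    expert_validated: bool, age_days: float,
                    half_life_days: float = 365.0) -> float:
    """Combine the four dimensions into one confidence value in (0, 1)."""
    # Evidence weight: diminishing returns as sources accumulate
    # (going from 1 to 3 sources moves the score more than 10 to 12).
    evidence = 1.0 - math.exp(-0.5 * n_sources)

    # Source authority: learned trust in [0, 1], averaged across sources.
    authority = max(0.0, min(1.0, avg_authority))

    # Expert validation: a flat boost here; the real engine weights
    # experts by accuracy, specialization, and calibration.
    expert = 1.0 if expert_validated else 0.6

    # Temporal recency: exponential decay with a configurable half-life.
    recency = 0.5 ** (age_days / half_life_days)

    # Capped below 1.0: the score never reaches absolute certainty.
    return min(evidence * authority * expert * recency, 0.99)

low = consensus_score(1, 0.5, False, 30)   # single unverified source
high = consensus_score(3, 0.9, True, 30)   # corroborated + expert-validated
```

Note how a single middling source lands in genuine-uncertainty territory while corroborated, validated claims approach (but never reach) certainty.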

The result: a single unverified source produces genuine uncertainty — the system doesn't pretend to know. Multiple high-authority sources with expert validation approach high confidence. But the score never reaches absolute certainty — there's always room for new contradictory evidence.

Critically, the scoring uses formal methods from truth discovery research — not ad hoc heuristics. The mathematical foundation ensures consistent, reproducible scores that hold up under audit.

Temporal awareness

Not all information ages the same way. A pharmaceutical efficacy study from 2024 is still relevant in 2026. A fintech company's ARR figure from 2024 is ancient history.

Temporal decay — evidence weight over time by domain

Brain uses domain-configurable decay rates. Pharmaceutical deployments use slow decay — clinical trial data remains weighted for years. Fintech deployments use fast decay — financial metrics lose relevance in weeks. But old evidence is never completely discarded. It just contributes less weight to the consensus score over time.
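A minimal sketch of domain-configurable decay, assuming simple exponential half-lives (the half-life values are illustrative, not Brain's defaults):

```python
def decay_weight(age_days: float, half_life_days: float) -> float:
    """Exponential decay: evidence keeps half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)

# Pharma: slow decay (half-life in years) — year-old trial data still counts.
pharma = decay_weight(365, half_life_days=5 * 365)

# Fintech: fast decay (half-life in weeks) — a quarter-old ARR figure fades.
fintech = decay_weight(90, half_life_days=45)

# The weight approaches zero but never reaches it:
# old evidence is discounted, not discarded.
ancient = decay_weight(10_000, half_life_days=45)
```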

This means Brain can detect when a "fact" that was true last year has been contradicted by newer evidence — even if no one explicitly flagged it. The temporal dimension surfaces conflicts that static systems miss entirely.

The learning flywheel

Here's where consensus scoring becomes a compounding asset rather than a static system.

When a domain expert resolves a conflict — choosing Option A over Option B — Brain doesn't just update the single relationship. It recalibrates the entire model:

  • The winning source gains authority across all its other claims
  • The losing source loses authority across all its other claims
  • The expert's resolution accuracy is tracked, strengthening their weight in future decisions
  • Every other claim from these sources is recalculated with updated authority scores
The compounding loop: conflict detected → expert resolves → sources recalibrate → graph improves → fewer false positives.
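One way to sketch the recalibration step is a simple learning-rate update that nudges trust toward the source the expert upheld. The source names and the rate are hypothetical:

```python
def recalibrate(authority: dict[str, float], winner: str, loser: str,
                lr: float = 0.1) -> dict[str, float]:
    """Shift trust toward the source whose claim the expert upheld.

    Scores stay in (0, 1). In the full model, every other claim from
    both sources is then rescored with the updated authorities.
    """
    updated = dict(authority)
    updated[winner] += lr * (1.0 - updated[winner])  # move toward 1
    updated[loser] -= lr * updated[loser]            # move toward 0
    return updated

authority = {"crm_export": 0.70, "analyst_notes": 0.55}
authority = recalibrate(authority, winner="crm_export", loser="analyst_notes")
```

Because the update is proportional, a source near 1.0 gains little from another win but an unreliable source is penalized quickly, which is the behavior the flywheel needs.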

A team that resolves 50 conflicts per week isn't just cleaning up 50 facts — they're training the consensus model to increasingly favor reliable sources, trust experienced experts, and discount unreliable inputs. Over time, auto-resolution accuracy improves, noise decreases, and the system surfaces fewer but higher-signal conflicts.

This is the compounding dynamic that separates infrastructure from tooling. The more your team uses Brain, the smarter the truth layer gets. Every resolution is an investment that pays dividends across the entire knowledge graph.

Emergent conflicts

One of the more powerful consequences of continuous consensus scoring is the ability to detect emergent conflicts — contradictions that only become visible when two high-confidence claims collide.

Example: Source A says a company's ARR is $10M with strong evidence backing. Source B, uploaded three months later, says ARR is $8M — and it has even stronger source authority. Neither claim was individually suspicious when ingested. But the consensus engine detects that the same entity now has two well-supported contradictory values.

This is the kind of conflict a purely LLM-based system is unlikely to catch. An LLM asked "what is the company's ARR?" would simply pick one value and present it confidently. Brain surfaces the conflict, scores both claims, and either auto-resolves (if one side clearly dominates) or escalates to the most qualified expert for that domain.
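A toy version of emergent-conflict detection might look like the following. The claim tuples and the 0.6 threshold are illustrative:

```python
def detect_conflicts(claims, min_score=0.6):
    """Flag entity-property pairs where well-supported claims disagree.

    Each claim is a tuple: (entity, prop, value, consensus_score).
    """
    by_key = {}
    for entity, prop, value, score in claims:
        by_key.setdefault((entity, prop), []).append((value, score))

    conflicts = []
    for key, vals in by_key.items():
        strong = {v for v, s in vals if s >= min_score}
        if len(strong) > 1:  # two high-confidence values collide
            conflicts.append((key, sorted(strong)))
    return conflicts

claims = [
    ("acme", "arr", "$10M", 0.78),  # Source A: strong evidence
    ("acme", "arr", "$8M", 0.84),   # Source B: stronger authority
    ("acme", "hq", "Berlin", 0.91), # no conflict here
]
conflicts = detect_conflicts(claims)
```

Neither ARR claim is suspicious on its own; the conflict only exists at the level of the pair, which is exactly what per-claim ingestion checks miss.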

Domain-configurable autonomy

Not all conflicts require human attention. Brain uses a tiered autonomy model that's configurable per domain:

Autonomy thresholds — configurable per domain:

  • Low severity — auto-resolve, log silently
  • Medium severity — auto-resolve, notify expert async
  • High severity — block operation, escalate with context

A pharmaceutical deployment might require human review for nearly everything — patient safety demands it. A general knowledge management deployment might auto-resolve most conflicts and only escalate genuinely contested claims. The thresholds are yours to set.
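A sketch of what per-domain threshold configuration could look like. The severity cutoffs and domain names are made up for illustration:

```python
# Hypothetical per-domain autonomy configuration mirroring the tiers above.
AUTONOMY = {
    "pharma": {            # patient safety: almost everything goes to a human
        "auto_resolve_below": 0.1,
        "notify_below": 0.3,
        # anything at or above 0.3 severity blocks the operation
    },
    "general_km": {        # general knowledge management: mostly autonomous
        "auto_resolve_below": 0.6,
        "notify_below": 0.85,
    },
}

def route(severity: float, domain: str) -> str:
    """Map a conflict's severity to an action using the domain's thresholds."""
    cfg = AUTONOMY[domain]
    if severity < cfg["auto_resolve_below"]:
        return "auto_resolve"          # low: resolve and log silently
    if severity < cfg["notify_below"]:
        return "auto_resolve_notify"   # medium: resolve, notify expert async
    return "block_escalate"            # high: block and escalate with context
```

The same conflict can route differently per deployment: a 0.5-severity conflict blocks in pharma but auto-resolves in general knowledge management.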

Try it: interactive consensus

Adjust the inputs below to see how consensus changes. This is a conceptual model — the actual scoring engine uses additional dimensions — but it illustrates the key dynamics.

Interactive — adjust inputs, watch consensus change. Example state: 1 source, average authority 0.50, no expert validation, 30 days old → consensus 0.28 (low confidence).

Why this matters for agent governance

When you combine consensus scoring with deterministic conflict detection, you get something that doesn't exist anywhere else in the AI infrastructure stack: a continuously self-improving source of truth that agents can query before taking action.

The Query Gate API (POST /gate/query) returns one of three verdicts:

  • ALLOW — the knowledge base has high-confidence, conflict-free coverage for this query. The agent can proceed.
  • WARN — there are active conflicts or low-confidence areas. The agent should include caveats or reduce its confidence.
  • BLOCK — critical conflicts or insufficient coverage. The agent should not proceed without human intervention.
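The endpoint path comes from the text above, but the request and response shapes below are assumptions. A client might call the gate and route on the verdict like this:

```python
import json
from urllib import request

def gate_check(base_url: str, query: str) -> str:
    """POST the query to the gate; payload/response fields are assumed."""
    payload = json.dumps({"query": query}).encode()
    req = request.Request(f"{base_url}/gate/query", data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["verdict"]  # "ALLOW" | "WARN" | "BLOCK"

def act(verdict: str) -> str:
    """Deterministic routing: the same verdict always yields the same action."""
    if verdict == "ALLOW":
        return "proceed"
    if verdict == "WARN":
        return "proceed_with_caveats"
    return "halt_for_human"                # BLOCK
```

The point is that the agent-side logic is a plain conditional on a verdict, not a prompt the model can talk itself out of.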

This is deterministic governance. Not a system prompt that can be overridden. Not an LLM checking another LLM's work. A formal query against a consensus-scored graph that returns the same verdict every time, given the same state.

And because the consensus model improves with every resolution, the system gets better at making these verdicts over time. More resolutions → better source authority scores → more accurate auto-resolution → fewer false positives → less noise → more trust.

That's the math behind the truth layer.


See consensus scoring in action

Watch Brain detect conflicts, score claims, and resolve contradictions in a live knowledge graph.

Get a Demo