Accord Book Conflict Detection: 100% Precision, 92.86% Recall on May 2026 Benchmark

On the May 6, 2026 conflict-eval run, Accord Book recorded 52 true positives, 0 observed false positives, 30 true negatives, and 4 false negatives across 86 scenarios.

One of Accord Book's core jobs is to surface project contradictions early enough that a human can still do something useful about them.

A new request conflicts with an earlier decision. A scope change violates a frozen constraint. A later update quietly drifts away from what the team had already agreed.

Those are not generic task-management problems.

They are project-memory and change-control problems.

The benchmark snapshot

Figure: Conflict detector benchmark summary showing true positives, zero observed false positives, true negatives, and false negatives on the May 6, 2026 evaluation run.

On the May 6, 2026 conflict-eval run, Accord Book produced this result across 86 scenarios:

Outcome	Count
True positives	52
False positives	0
True negatives	30
False negatives	4

That yields:

100% observed precision
92.86% recall
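For readers who want to check the arithmetic, both figures follow directly from the four counts in the table. Here is a minimal Python sketch; the function and variable names are illustrative and are not part of Accord Book.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of raised flags that were real conflicts
    recall = tp / (tp + fn)     # share of real conflicts that were flagged
    return precision, recall


# Counts from the May 6, 2026 conflict-eval run.
tp, fp, tn, fn = 52, 0, 30, 4

precision, recall = precision_recall(tp, fp, fn)
print(f"scenarios: {tp + fp + tn + fn}")  # scenarios: 86
print(f"precision: {precision:.2%}")      # precision: 100.00%
print(f"recall:    {recall:.2%}")         # recall:    92.86%
```

Recall is 52 / 56, which rounds to 92.86%. True negatives do not enter either formula, but they do count toward the 86-scenario total.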

Those numbers are strong enough to matter, but the framing matters too.

The right framing is risk-signal infrastructure for owner review, not automatic contradiction prevention.

Why the zero false positives matter

A conflict system that cries wolf becomes noise.

If owners stop trusting it, it stops helping.

That is why the most important part of this benchmark snapshot is not recall alone. It is that this run recorded 0 observed false positives across the 86-scenario set, including adversarial negatives.

That result supports conservative deployment: when the detector raises a flag, owners have good reason to look at it.
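The word "observed" carries weight here. Zero false positives among 52 flagged scenarios does not by itself show that the underlying false-alarm share is zero. As a rough sketch, assume the flagged scenarios behave like independent draws from a larger population (an assumption this benchmark does not test). Under that assumption, the exact one-sided upper bound for zero events in n trials is 1 - alpha^(1/n), which the well-known "rule of three" approximates as 3/n at 95% confidence. The function name below is illustrative.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on an event rate when 0 events occur in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


flagged = 52  # true positives + false positives in the benchmark run
bound = zero_event_upper_bound(flagged)

print(f"upper bound on false-alarm share: {bound:.1%}")        # 5.6%
print(f"rule-of-three approximation:      {3 / flagged:.1%}")  # 5.8%
```

Under that independence assumption, the run is consistent with a true precision of roughly 94% or higher, which fits the "observed precision" wording used throughout this post.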

Where the misses still are

The same run recorded 4 false negatives.

That means there is still work to do, especially in cases where the right pair of facts was not connected early enough in the detection pipeline.

All four misses fall into three categories: scope creep, resource conflict, and constraint violation. That concentration is useful because it points to the next improvement target without weakening the main conclusion.

By conflict type

Conflict type	Recall	Observed precision
Decision drift	100.00%	100.00%
Factual contradiction	100.00%	100.00%
Stakeholder conflict	100.00%	100.00%
Temporal staleness	100.00%	100.00%
Constraint violation	87.50%	100.00%
Resource conflict	87.50%	100.00%
Scope creep	75.00%	100.00%

This is another reason the owner-review framing is the right one. The system is already strong enough to surface meaningful risks without spamming the user, while still leaving the final judgment to a human.

Why this result fits the Accord Book product story

Conflict detection is a natural place for Accord Book to lead publicly because it combines the parts of the system that already matter most:

project-scoped memory,
provenance-linked evidence,
structured claim comparison,
durable findings,
and owner arbitration instead of silent automation.

The practical public claim

The most accurate public takeaway is:

Accord Book can already serve as a conservative conflict-detection layer that surfaces review-worthy contradictions with provenance, while keeping owners in control of the decision.

That is strong, specific, and supported by the current benchmark snapshot.

If you want the deeper architecture story behind the benchmark, pair this with Designing Conflict Detection That Earns Trust.

What shipped after the benchmark

The current repo state matters because the detector is no longer just a benchmark artifact. The pilot checklist now marks owner arbitration as shipped, including an owner-facing conflicts surface with resolve, defer, and reopen actions. Resolution writes a Decision memory item so the review outcome becomes part of the project record instead of living only in a transient alert.

Figure: Owner arbitration workflow, from finding to review decision.
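To make the arbitration flow concrete, here is a minimal Python sketch of a finding that an owner can resolve, defer, or reopen, where resolving appends a Decision item to the project record. Every class, field, and method name here is hypothetical and chosen for illustration. The post does not describe Accord Book's actual data model or API, and the transition rules (for example, requiring a reopen before a second resolution) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    DEFERRED = "deferred"
    RESOLVED = "resolved"


@dataclass
class MemoryItem:
    kind: str        # hypothetical item type, e.g. "Decision"
    text: str        # the owner's recorded outcome
    finding_id: str  # link back to the finding that prompted it


@dataclass
class Finding:
    finding_id: str
    summary: str
    status: Status = Status.OPEN


@dataclass
class Project:
    memory: list[MemoryItem] = field(default_factory=list)

    def resolve(self, finding: Finding, decision_text: str) -> MemoryItem:
        """Close the finding and persist the outcome as a Decision item."""
        if finding.status is Status.RESOLVED:
            raise ValueError("already resolved; reopen it first")
        finding.status = Status.RESOLVED
        item = MemoryItem("Decision", decision_text, finding.finding_id)
        self.memory.append(item)
        return item

    def defer(self, finding: Finding) -> None:
        """Park an open finding for later review without deciding it."""
        if finding.status is not Status.OPEN:
            raise ValueError("only open findings can be deferred")
        finding.status = Status.DEFERRED

    def reopen(self, finding: Finding) -> None:
        """Return a deferred or resolved finding to the open queue."""
        if finding.status is Status.OPEN:
            raise ValueError("finding is already open")
        finding.status = Status.OPEN


project = Project()
finding = Finding("f-1", "Scope change conflicts with a frozen constraint")
project.defer(finding)
project.reopen(finding)
decision = project.resolve(finding, "Keep the frozen constraint; decline the scope change")
```

The design point this sketch illustrates is the one described above: the review outcome lands in the same durable record as other project memory rather than disappearing with the alert.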