One of Accord Book's core jobs is to surface project contradictions early enough that a human can still do something useful about them.
A new request conflicts with an earlier decision. A scope change violates a frozen constraint. A later update quietly drifts away from what the team had already agreed.
Those are not generic task-management problems.
They are project-memory and change-control problems.
The benchmark snapshot

On the May 6, 2026 conflict-eval run, Accord Book produced this result across 86 scenarios:
| Outcome | Count |
|---|---|
| True positives | 52 |
| False positives | 0 |
| True negatives | 30 |
| False negatives | 4 |

That yields:
- 100% observed precision
- 92.86% recall
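The two percentages follow directly from the counts in the table. As a quick check, the arithmetic can be reproduced in a few lines of Python; the variable names below are illustrative and are not taken from the Accord Book eval harness.

```python
# Confusion-matrix counts copied from the benchmark table.
tp, fp, tn, fn = 52, 0, 30, 4

total = tp + fp + tn + fn
precision = tp / (tp + fp)  # share of raised flags that were real conflicts
recall = tp / (tp + fn)     # share of real conflicts that were flagged

print(f"scenarios: {total}")            # scenarios: 86
print(f"precision: {precision:.2%}")    # precision: 100.00%
print(f"recall:    {recall:.2%}")       # recall:    92.86%
```

Precision here is computed over the 52 flags the detector raised, and recall over the 56 scenarios that contained a real conflict (52 caught plus 4 missed).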
Those numbers are strong, but they come from a single 86-scenario run, so the framing matters too.
The right framing is risk-signal infrastructure for owner review, not automatic contradiction prevention.
Why the zero false positives matter
A conflict system that cries wolf becomes noise.
If owners stop trusting it, it stops helping.
That is why the most important number in this benchmark snapshot is not recall. It is the 0 observed false positives: none of the 30 negative scenarios in the 86-scenario set, adversarial negatives included, raised a flag.
That is exactly the kind of result that supports conservative deployment.
Where the misses still are
The same run recorded 4 false negatives.
That means there is still work to do, especially in cases where the right pair of facts was not connected early enough in the detection pipeline.
The benchmark summary shows misses concentrated in scope, resource, and constraint-style cases. That is useful because it points to the next improvement target without weakening the main conclusion.
By conflict type
| Conflict type | Recall | Observed precision |
|---|---|---|
| Decision drift | 100.00% | 100.00% |
| Factual contradiction | 100.00% | 100.00% |
| Stakeholder conflict | 100.00% | 100.00% |
| Temporal staleness | 100.00% | 100.00% |
| Constraint violation | 87.50% | 100.00% |
| Resource conflict | 87.50% | 100.00% |
| Scope creep | 75.00% | 100.00% |

Recall is uneven across conflict types, while observed precision holds at 100% in every row. This is another reason the owner-review framing is the right one: the system already surfaces meaningful risks without spamming the user, and it leaves the final judgment to a human.
Why this result fits the Accord Book product story
Conflict detection is a natural place for Accord Book to lead publicly because it combines the parts of the system that already matter most:
- project-scoped memory,
- provenance-linked evidence,
- structured claim comparison,
- durable findings,
- and owner arbitration instead of silent automation.
The practical public claim
The most accurate public takeaway is:
Accord Book can already serve as a conservative conflict-detection layer that surfaces review-worthy contradictions with provenance, while keeping owners in control of the decision.
That is strong, specific, and supported by the current benchmark snapshot.
If you want the deeper architecture story behind the benchmark, pair this with "Designing Conflict Detection That Earns Trust."
What shipped after the benchmark
The current repo state matters because the detector is no longer just a benchmark artifact. The pilot checklist now marks owner arbitration as shipped, including an owner-facing conflicts surface with resolve, defer, and reopen actions. Resolution writes a Decision memory item so the review outcome becomes part of the project record instead of living only in a transient alert.
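
To make the arbitration flow concrete, here is a minimal sketch of how resolve, defer, and reopen could interact with a Decision memory item. Every class, field, and transition rule below is an assumption made for illustration; none of it is taken from the Accord Book codebase, which may model this differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    DEFERRED = "deferred"
    RESOLVED = "resolved"


@dataclass
class DecisionMemoryItem:
    # Hypothetical shape of the record written when an owner resolves a conflict.
    finding_id: str
    outcome: str
    decided_by: str


@dataclass
class ConflictFinding:
    # Hypothetical finding surfaced to the owner for review.
    finding_id: str
    summary: str
    status: Status = Status.OPEN


@dataclass
class ProjectRecord:
    decisions: list[DecisionMemoryItem] = field(default_factory=list)

    def resolve(self, finding, outcome, owner):
        # Assumed rule: open or deferred findings can be resolved.
        if finding.status is Status.RESOLVED:
            raise ValueError("finding is already resolved; reopen it first")
        finding.status = Status.RESOLVED
        # Resolution is persisted so the outcome outlives the alert.
        item = DecisionMemoryItem(finding.finding_id, outcome, owner)
        self.decisions.append(item)
        return item

    def defer(self, finding):
        # Assumed rule: only open findings can be deferred.
        if finding.status is not Status.OPEN:
            raise ValueError("only open findings can be deferred")
        finding.status = Status.DEFERRED

    def reopen(self, finding):
        # Assumed rule: deferred or resolved findings can be reopened.
        if finding.status is Status.OPEN:
            raise ValueError("finding is already open")
        finding.status = Status.OPEN


record = ProjectRecord()
finding = ConflictFinding("F-1", "New request conflicts with a frozen constraint")
record.defer(finding)
record.reopen(finding)
record.resolve(finding, "Keep the frozen constraint", owner="project-owner")
print(finding.status.name, len(record.decisions))  # RESOLVED 1
```

The point of the sketch is the last step: resolving a finding appends a durable decision to the project record, so the review outcome is not lost when the alert is dismissed.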
