1. Beyond the Obvious Five
The series Introduction catalogs sixteen failure modes that affect agentic AI workflows, organized into four priority tiers. The five-principle architecture and seven sub-agent archetypes from Parts 1–6 effectively address five of those modes: context decay, completeness gaps, behavioral drift, role confusion, and audit trail gaps. These are real and important—but when they occur, the agent produces obviously incomplete or inconsistent results.
The failures that should worry you more are the ones that produce output that looks correct on casual inspection but isn't. A goal-substituted review is thorough and well-organized; it just answers the wrong question. An error-compounded analysis is internally consistent; it just builds on a false premise. A phantom-grounded finding has the right format, cites a real-looking file path, and describes a plausible vulnerability—except the code doesn't actually do what the finding claims.
This matters even more because of a dynamic that no agent architecture can fully solve: automation complacency. The first time a developer reviews an agent's PR, they read every line. By the fiftieth, they're rubber-stamping. The consistent professional quality of agent output creates a false sense of reliability—and the silent corruption modes in Tiers 1 and 2 are precisely the failures that exploit this gap. Structural defenses—checkpoint verification, provenance chains, premise analysis—exist because human vigilance is an unreliable backstop, especially at scale.
This article provides concrete structural remediations across all four priority tiers. Every remediation is designed as a minimal addition to the existing architecture: new sub-agents, new files, orchestration patterns, and targeted prompt enhancements. No existing components need to be removed or restructured.
The existing architecture has strong completeness controls (was everything done?) but weak correctness controls (was what was done actually right?) and no degradation controls (is the workflow getting worse as it runs?). The remediations across all four tiers close these gaps systematically—transforming every category of invisible failure into visible, reviewable artifacts.
2. What the Existing Architecture Already Handles
The five-principle architecture from Parts 1–6 was designed to address the original five failure modes. It does so effectively—and its structural properties provide partial coverage against several of the additional eleven. The table below maps each failure mode to its current status.
| Failure Mode | Tier | Existing Mitigation | Status |
|---|---|---|---|
| Context decay | 1 | Principle 1: externalize state to files; sub-agent isolation limits per-invocation context | ✓ Addressed |
| No audit trail | 1 | Principle 4 + file pipeline (checklist → plan → findings → remediation-log) | ✓ Addressed |
| Error compounding | 1 | Phase boundaries exist but no verification between phases | ✗ Gap |
| Completeness gaps | 2 | Principles 2, 3, 5: checklist → validator → auditor triple layer | ✓ Addressed |
| Goal substitution | 2 | Validator checks completeness but not fidelity to objective | ✗ Gap |
| Invisible assumptions | 2 | No mechanism for surfacing interpretive decisions | ✗ Gap |
| Phantom grounding | 2 | No evidence verification mechanism | ✗ Gap |
| Behavioral drift | 3 | Sub-agent architecture isolates critical functions in fresh contexts | ✓ Addressed |
| Role confusion | 3 | Seven archetypes with enforced tool restrictions | ✓ Addressed |
| Premature convergence | 3 | Auditor sub-agent checks for incomplete items (partial) | ◐ Partial |
| Recovery failure | 3 | Not addressed by existing architecture | ✗ Gap |
| Scope creep | 3 | State snapshots catch effects; no preventive mechanism | ◐ Partial |
| Coordination divergence | 3 | Orchestrator mediates all sub-agent outputs (serial execution) | ◐ Partial |
| Tool model mismatch | 3 | Implementer-then-verifier catches effects; no preventive mechanism | ◐ Partial |
| State confusion | 4 | Not addressed by existing architecture | ✗ Gap |
| Overconfidence uniformity | 4 | Not addressed by existing architecture | ✗ Gap |
The gaps cluster in two places: the modes that produce plausible-looking output (Tier 2) and the modes that degrade silently over time (Tier 3). The architecture has strong completeness controls but weak correctness and degradation controls. The three newer Tier 3 modes—scope creep, coordination divergence, and tool model mismatch—are partially covered by existing structural patterns (state diffs, serial orchestration, verifier checks) but lack explicit preventive mechanisms. The remediations below address that asymmetry across all four tiers.
3. Tier 1 Remediations: Structural Risks
Context decay and audit-trail gaps are already well addressed. Error compounding is the remaining Tier 1 gap. The current phased approach (PLAN → EXECUTE → AUDIT → IMPLEMENT) creates natural phase boundaries, but there is no mechanism to detect when an error in an early phase has silently corrupted everything downstream.
The specific problem: if the scanner misidentifies a code pattern, misreads a file, or fabricates a finding, the writer will document it as real, the auditor will see a checklist item with a corresponding finding and mark it complete, and the implementer may make a real code change based on a false premise. Every sub-agent downstream treats upstream output as ground truth. The existing auditor checks for completeness (was everything done?) but not for correctness (was what was done actually right?).
Three remediations address this. They also strengthen the existing audit trail and context decay mitigations.
Remediation 1: The Checkpoint-Verifier Sub-Agent
New Sub-Agent
This is the most impactful single addition to the architecture. The current workflow has a clean handoff from scanner → writer → auditor, but nothing verifies that the scanner's raw output is actually grounded in reality before the writer consumes it.
The checkpoint-verifier sits between phases and spot-checks upstream outputs against the actual codebase. After the scanner produces results for a category, the checkpoint-verifier takes a sample of findings (3–5), goes back to the cited file and line number, and confirms that what the scanner described actually exists there. It flags findings where the evidence doesn't match—wrong file, wrong line, described behavior that isn't present, or code that doesn't exist.
Spot-checks upstream sub-agent outputs against the source of truth. Sits between workflow phases to catch phantom grounding, misreading, and error compounding before they propagate downstream.
---
name: checkpoint-verifier
description: "Spot-checks upstream sub-agent outputs against source
of truth. Read-only evidence verification."
tools: Read, Grep, Glob
model: sonnet
---
You are a fact-checker for AI-generated findings. Your job is to
verify that findings are grounded in reality.
For each finding you are given:
1. Go to the cited file and line number
2. Read the actual code (10 lines of context minimum)
3. Compare what the finding claims against what actually exists
4. Classify as:
- VERIFIED: Evidence matches the claim
- MISMATCH: Code exists but doesn't match the description
- NOT_FOUND: Cited file or line doesn't exist or contains
unrelated code
- OVERSTATED: Issue exists but severity or impact is exaggerated
CRITICAL RULES:
1. You MUST check at least [N] findings per category
2. You are FORBIDDEN from assuming a finding is correct without
reading the cited file
3. For each finding checked, quote the ACTUAL code you found at
the cited location
4. If the actual code differs from the finding's description,
explain the discrepancy specifically
Report:
- Verification rate: [VERIFIED count] / [total checked]
- Each finding checked with classification and evidence
- List of MISMATCH and NOT_FOUND findings for removal
- Recommendation: PROCEED if verification rate >= 80%,
RE-SCAN if below 80%
The orchestration prompt inserts this sub-agent between the scanner and writer phases, and again between the auditor and implementer phases. If fewer than 80% of the checked findings in a category are VERIFIED, the orchestrator re-runs the scanner for that category rather than proceeding with corrupted data.
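The gate decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the series' implementation: the `gate_decision` function and the `(finding_id, classification)` input shape are hypothetical, while the four classifications, the removal list, and the 80% threshold come from the sub-agent definition above.

```python
from collections import Counter

# Classifications defined in the checkpoint-verifier prompt.
CLASSIFICATIONS = {"VERIFIED", "MISMATCH", "NOT_FOUND", "OVERSTATED"}


def gate_decision(results, threshold=0.80):
    """Return (decision, rate, to_remove) for one category's spot-check.

    `results` is a list of (finding_id, classification) pairs
    (a hypothetical shape chosen for this sketch).
    """
    if not results:
        raise ValueError("no findings were checked; the gate cannot pass")
    unknown = {c for _, c in results} - CLASSIFICATIONS
    if unknown:
        raise ValueError(f"unknown classification(s): {sorted(unknown)}")

    counts = Counter(c for _, c in results)
    rate = counts["VERIFIED"] / len(results)
    # MISMATCH and NOT_FOUND findings are listed for removal either way.
    to_remove = [fid for fid, c in results if c in ("MISMATCH", "NOT_FOUND")]
    decision = "PROCEED" if rate >= threshold else "RE-SCAN"
    return decision, rate, to_remove


sample = [
    ("SEC-01-a", "VERIFIED"),
    ("SEC-01-b", "VERIFIED"),
    ("SEC-01-c", "NOT_FOUND"),
    ("SEC-01-d", "VERIFIED"),
    ("SEC-01-e", "VERIFIED"),
]
decision, rate, to_remove = gate_decision(sample)
print(decision, f"{rate:.0%}", to_remove)
```

Note that an OVERSTATED finding lowers the verification rate but is not removed, which mirrors the report format in the definition.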
Remediation 2: Provenance Chains in the Findings Template
Template Change
The current findings templates require evidence (file path, line number, code snippet), which is good. But they don't track which sub-agent produced the finding, what checklist item it traces to, and whether the evidence was independently verified. Adding a provenance section transforms the findings document from a flat list into a traceable chain. If an implementer makes a code change based on a finding that was never verified, that's visible. If a finding doesn't trace to any checklist item, the auditor can flag it as an orphan.
Add these fields to the bottom of every finding in the findings template:
**Provenance:**
- Source: [scanner | validator | triage | manual]
- Checklist item: [ID from checklist.md]
- Verified by: [checkpoint-verifier | not yet verified]
- Verification result: [VERIFIED | MISMATCH | NOT_FOUND | OVERSTATED | PENDING]
The cost is four extra lines per finding. The benefit is that error compounding becomes auditable rather than invisible—you can trace any downstream action back through the chain to its evidential foundation and identify where corruption entered.
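As a rough sketch of what the provenance fields make checkable, the Python below flags orphans and findings that drove a code change without being verified. The `Provenance` dataclass, the `provenance_issues` helper, and the issue labels `ORPHAN` and `ACTED_ON_UNVERIFIED` are illustrative names, not part of the original templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Provenance:
    """The four provenance fields from the findings template."""
    source: str                       # scanner | validator | triage | manual
    checklist_item: Optional[str]     # ID from checklist.md, or None
    verified_by: Optional[str]        # "checkpoint-verifier", or None
    verification_result: str = "PENDING"


def provenance_issues(findings, checklist_ids, acted_on):
    """Return (finding_id, issue) pairs.

    findings: dict of finding_id -> Provenance
    checklist_ids: set of IDs present in checklist.md
    acted_on: set of finding IDs the implementer changed code for
    """
    issues = []
    for fid, prov in findings.items():
        if prov.checklist_item not in checklist_ids:
            issues.append((fid, "ORPHAN"))
        if fid in acted_on and prov.verification_result != "VERIFIED":
            issues.append((fid, "ACTED_ON_UNVERIFIED"))
    return issues


findings = {
    "F-1": Provenance("scanner", "SEC-03", "checkpoint-verifier", "VERIFIED"),
    "F-2": Provenance("scanner", "SEC-03", None),
    "F-3": Provenance("scanner", None, None),
}
issues = provenance_issues(findings, {"SEC-03"}, acted_on={"F-1", "F-2"})
print(issues)
```

Here F-2 is flagged because the implementer acted on it while it was still PENDING, and F-3 is flagged because it traces to no checklist item.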
Remediation 3: Re-Grounding Gates at Phase Boundaries
Orchestration Change
This addresses both error compounding and a subtle form of context decay: the orchestrator's progressive drift from the original task description. Over a long workflow with many sub-agent invocations, the orchestrator may lose sight of why the review was initiated: what the actual concern was, what the scope boundaries were, and what matters most.
At each phase boundary (PLAN → EXECUTE, EXECUTE → AUDIT, AUDIT → IMPLEMENT), the orchestration prompt should explicitly re-inject the original task context—not a summary of what's been done so far, but the original request, scope definition, and priority criteria.
## Phase Transition Gate
Before proceeding to [NEXT PHASE]:
1. Re-read the original task description and scope constraints
2. Re-read checklist.md to confirm alignment with original objectives
3. Verify that findings so far are within the defined scope
4. If scope has drifted (findings about things not in the original
request), flag for review before proceeding
5. Summarize: "Original objective was X. Current state is Y.
Proceeding to [NEXT PHASE] because Z."
This is low-cost and serves double duty: it catches scope drift (a form of goal substitution) and it forces the orchestrator to articulate the connection between what it's done and what it was asked to do, making both error compounding and behavioral drift more visible.
4. Tier 2 Remediations: Silent Corruption
Tier 2 failures are structurally different from Tier 1. Error compounding (Tier 1) is about propagation—a correct process operating on corrupted inputs. Tier 2 failures are about corruption at the source—the sub-agent itself produces output that looks right but isn't. The fixes live mostly inside individual sub-agent definitions and in the file contracts between them.
Remediation 4: Goal-Fidelity Checks
Prompt Addition (Validator), Prompt Addition (Auditor), Orchestration Change
Addresses: Goal Substitution
Goal substitution is the most insidious Tier 2 failure because the architecture's strongest existing defenses—checklists and validators—are orthogonal to it. The validator checks whether the checklist is complete (are there gaps?) but not whether it is faithful (does it actually address what was asked?). You can score perfectly on completeness and still have answered the wrong question.
A concrete example: an orchestration prompt says "review this API for backward compatibility risks." The scanner produces a thorough catalog of every endpoint, with schemas, naming conventions, and authentication patterns. The findings document is meticulous. The auditor confirms every checklist item has a corresponding finding. But the scanner cataloged rather than assessed risk—it described what exists rather than evaluating what would break for existing consumers. The output reads beautifully and is completely useless for the stated goal.
Addition to the Validator Sub-Agent Prompt
0. GOAL FIDELITY CHECK — run this BEFORE reviewing for gaps.
Re-read the stated objective (first line of plan.md).
For every checklist item, verify it is an ACTIONABLE CHECK
that directly serves the stated objective.
A checklist item like "Catalog all endpoints" is a TASK,
not a CHECK.
A checklist item like "For each endpoint, identify request
and response fields that would break existing consumers if
changed" is a CHECK.
Flag any item that describes work to do without specifying
what question the work answers relative to the objective.
This is your FIRST priority. A complete checklist that
doesn't serve the objective is worse than an incomplete
one that does.
Addition to the Auditor Sub-Agent Prompt
## Goal Fidelity Assessment
Re-read the original task description (first line of plan.md).
For each finding, answer: does this finding help a human make
a decision about the stated objective?
- RELEVANT: Finding directly addresses the objective
- TANGENTIAL: Finding is true but doesn't help with the
stated goal
- SUBSTITUTE: Finding answers a different, easier question
than the one asked
If more than 20% of findings are TANGENTIAL or SUBSTITUTE,
flag the entire review as potentially goal-substituted and
recommend re-running the scanner with tighter scoping.
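The 20% rule in the auditor section reduces to a small calculation. The sketch below assumes the auditor's per-finding labels are available as a list of strings; the function name `goal_substitution_check` is hypothetical, and "more than 20%" is read as strictly greater than the threshold.

```python
OFF_GOAL = {"TANGENTIAL", "SUBSTITUTE"}


def goal_substitution_check(labels, threshold=0.20):
    """Return (flagged, off_goal_share) for a list of per-finding labels."""
    if not labels:
        raise ValueError("no findings to assess")
    share = sum(label in OFF_GOAL for label in labels) / len(labels)
    # "More than 20%" is read as strictly greater than the threshold.
    return share > threshold, share


labels = ["RELEVANT"] * 7 + ["TANGENTIAL"] * 2 + ["SUBSTITUTE"]
flagged, share = goal_substitution_check(labels)
print(flagged, f"{share:.0%}")
```

With three of ten findings off-goal, the review is flagged; with exactly two of ten it would not be.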
Orchestration Change: Capture the Objective
The orchestration prompt should capture the objective as the literal first line of plan.md in a standardized format:
OBJECTIVE: [exact text from the original request]
Currently the plan template tracks execution status but doesn't preserve the original intent in a way that's easy for sub-agents to reference. This one-line change closes that gap.
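A standardized first line also makes the objective machine-readable. As a minimal sketch, a helper like the hypothetical `read_objective` below could let the orchestrator fail fast when plan.md is missing its objective; only the `OBJECTIVE:` prefix comes from the format above, and the sample plan body is invented for the demo.

```python
OBJECTIVE_PREFIX = "OBJECTIVE:"


def read_objective(plan_text):
    """Return the objective from the first line of plan.md, or raise."""
    lines = plan_text.splitlines()
    if not lines or not lines[0].startswith(OBJECTIVE_PREFIX):
        raise ValueError("plan.md must start with an 'OBJECTIVE:' line")
    objective = lines[0][len(OBJECTIVE_PREFIX):].strip()
    if not objective:
        raise ValueError("OBJECTIVE line is empty")
    return objective


# Illustrative plan.md content; the status section is made up for the demo.
plan = (
    "OBJECTIVE: review this API for backward compatibility risks\n"
    "\n"
    "## Status\n"
    "- [ ] PLAN\n"
)
objective = read_objective(plan)
print(objective)
```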
Remediation 5: Premise Analysis
New Sub-Agent (Premise Analyst), Prompt Addition (Scanners), New File (premise-report.md)
Addresses: Invisible Assumptions
This is the Tier 2 failure where the existing architecture has the biggest structural gap. Checklists track what to check. Plans track execution status. Findings track results. But nowhere in the pipeline does the agent record the reasoning that connects evidence to conclusion—and the premises embedded in that reasoning are where the most dangerous assumptions hide.
An early version of this remediation used a self-reported assumptions log: scanners were instructed to log ambiguities as they noticed them. This approach is structurally flawed. The most dangerous assumptions are ones the scanner doesn't recognize as assumptions—training-data priors that feel like "just knowing" rather than "deciding." A scanner that sees execute_query(user_input) and flags it as a SQL injection risk is applying a training-data prior, not a deliberate standard. It won't log this as an assumption because from its perspective, it's not one.
The solution has two parts: reasoning chains on every finding and a premise analyst sub-agent that extracts premises across all findings.
Part A: Reasoning Chain Requirement for Scanners
Instead of asking the scanner to self-report assumptions as a separate activity, require it to show its reasoning as part of every finding. This is a more natural task—models are trained to explain reasoning, and chain-of-thought prompting is one of the most reliable techniques for improving output quality. The assumptions emerge as a byproduct of producing better-reasoned findings.
REASONING CHAIN — required for EVERY finding.
After recording the finding with its evidence and
provenance, provide your reasoning in four steps:
STEP 1 — OBSERVATION: What I Saw
What specific code, configuration, pattern, or behavior
did I observe? Quote or reference the exact evidence.
This should match the provenance chain.
STEP 2 — STANDARD: What I Compared Against
What standard, convention, best practice, or expectation
am I using to evaluate this observation? Be specific.
- If it's from the checklist, cite the checklist item.
- If it's a published standard, name it (OWASP, CWE,
RFC, framework docs).
- If it's a general practice, say "common practice"
and describe what the practice is.
- If you're not sure where you learned it, say that.
STEP 3 — INFERENCE: Why This Is a Problem
How does the gap between observation (Step 1) and
standard (Step 2) constitute a problem? What could
go wrong? Be specific about the failure mode, not
just "this is a security risk."
STEP 4 — CONTEXT ASSUMED: What Must Be True
What am I assuming about this project that makes
Steps 1–3 valid?
- Framework or language version
- Deployment environment (public-facing, internal)
- User trust level (authenticated, anonymous, admin)
- Architecture (monolith, microservices, serverless)
- Anything else that, if wrong, would invalidate
this finding
CRITICAL: Do not skip Step 2. "It's a well-known
vulnerability" is not a standard — name the specific
standard or practice you're applying.
Part B: The Premise Analyst Sub-Agent
Reasoning chains give you premises per finding. The premise analyst gives you premises across findings—shared assumptions, contradictions, and dependencies that no individual scanner invocation can see because each runs in an isolated context.
Reads all reasoning chains from all scanner invocations. Produces a consolidated premise report identifying shared premises, contradictions, ungrounded inferences, and inter-finding dependencies. Replaces the self-reported assumptions log.
ROLE: Premise Analyst
PURPOSE: Extract and cross-reference the premises
underlying all findings to identify patterns that
individual scanner invocations cannot see.
INPUT: All scanner outputs with reasoning chains,
checklist.md, plan.md (for OBJECTIVE line)
OUTPUT: premise-report.md with four sections:
## 1. Shared Premises
Premises that appear in reasoning chains of multiple
findings. For each:
- The premise (stated once)
- Finding IDs that depend on this premise
- Blast radius: how many findings become invalid
if this premise is wrong
- Confidence: grounded in evidence, or inferred?
## 2. Contradictory Premises
Cases where different scanner invocations used
contradictory premises. For each:
- Both premises stated
- Which findings depend on each side
- Which premise is better supported by evidence
## 3. Ungrounded Premises
Findings where Step 2 cites vague authority
("common practice," "security best practice")
without a specific standard. For each:
- The premise as stated
- Finding IDs affected
- Suggested specific standard (if you know one)
## 4. Inter-Finding Dependencies
Cases where one finding's conclusion is used as
another finding's premise. For each:
- Upstream finding ID and its conclusion
- Downstream finding ID and how it uses that
conclusion
- Whether the upstream finding was verified
CRITICAL: You analyze reasoning. You do NOT modify
findings. You do NOT validate evidence (that's the
checkpoint-verifier's job). You read reasoning
chains and identify patterns across them.
The premise analyst runs as Phase 2.5—after all scanners and the checkpoint-verifier have completed, but before the writer formats findings. The blast radius counts in the premise report are the most actionable piece: a shared premise with a blast radius of 8 is a single point of failure for a large fraction of the review's findings. If the human reviewer has fifteen minutes, validating one high-blast-radius premise is a better use of time than reading eight individual findings.
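The blast-radius count itself is simple bookkeeping once premises have been extracted. The sketch below assumes each finding's premises are already normalized to identical strings, which in practice is the hard part and is the premise analyst's job; the `blast_radius` function and the sample premises are illustrative only.

```python
from collections import defaultdict


def blast_radius(finding_premises):
    """Rank shared premises by how many findings depend on them.

    finding_premises: dict of finding_id -> list of premise strings,
    assumed to be already normalized so equal premises compare equal.
    """
    dependents = defaultdict(set)
    for finding_id, premises in finding_premises.items():
        for premise in premises:
            dependents[premise].add(finding_id)
    shared = [(p, sorted(ids)) for p, ids in dependents.items() if len(ids) > 1]
    # Largest blast radius first; ties broken alphabetically for stable output.
    return sorted(shared, key=lambda pair: (-len(pair[1]), pair[0]))


ranked = blast_radius({
    "F-1": ["API is public-facing", "Callers may be anonymous"],
    "F-2": ["API is public-facing"],
    "F-3": ["API is public-facing", "Callers may be anonymous"],
    "F-4": ["Service runs as a monolith"],
})
for premise, ids in ranked:
    print(len(ids), premise, ids)
```

A reviewer short on time would start at the top of this ranking: one wrong premise there invalidates the most findings.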
The old instruction said: "When you encounter ambiguity, log it." This requires the scanner to recognize ambiguity, decide it's worth logging, and formulate the entry as a separate activity. The reasoning chain says: "Explain why you think this is a problem." The assumptions emerge as a byproduct of explaining the reasoning, not as a separate introspective task. And the premise analyst's cross-finding analysis—shared premises, contradictions, dependency chains—is structurally impossible for any individual scanner invocation to perform.
Remediation 6: Evidence Verification for Phantom Grounding
Prompt Addition (Scanners), Orchestration Change
Addresses: Phantom Grounding
The checkpoint-verifier (Remediation 1) catches phantom grounding between phases, but there's a cheaper first line of defense: make the scanner's own output format harder to fabricate. Currently the scanner reports findings with a file path and line number, which is easy for a model to produce even when it hasn't actually read the file. Adding a requirement to include a verbatim code quote changes the economics—the model has to either actually read the file (correct behavior) or fabricate plausible-looking code (harder and more likely to be caught).
EVIDENCE REQUIREMENTS:
For every finding, you MUST include:
- Exact file path (verify by re-reading the file before
reporting)
- Line number range (not a single line — the range
containing the relevant code)
- VERBATIM code quote: Copy the exact lines from the file.
Do not paraphrase or summarize code. If you cannot quote
the exact code, you do not have sufficient evidence for
the finding.
Before finalizing your results for a category, re-read at
least 3 of the files you've cited to verify your line
numbers are still accurate.
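A verbatim quote is also cheap to check mechanically. The source does not describe such a check; the following is one possible sketch, assuming findings carry a 1-indexed inclusive line range and a quoted snippet. The `quote_matches` helper and its whitespace normalization are assumptions of this example.

```python
def quote_matches(file_text, start_line, end_line, quote):
    """True if `quote` appears, line for line, inside the cited range.

    Lines are 1-indexed and the range is inclusive. Leading and trailing
    whitespace is ignored on each line (an assumption of this sketch).
    """
    lines = file_text.splitlines()
    if start_line < 1 or end_line > len(lines) or start_line > end_line:
        return False
    cited = [line.strip() for line in lines[start_line - 1:end_line]]
    quoted = [line.strip() for line in quote.strip().splitlines()]
    if not quoted:
        return False
    width = len(quoted)
    # Look for the quoted lines as a contiguous run inside the cited range.
    return any(
        cited[i:i + width] == quoted
        for i in range(len(cited) - width + 1)
    )


source = (
    "def run(user_input):\n"
    "    sql = 'SELECT * FROM t WHERE id=' + user_input\n"
    "    return execute_query(sql)\n"
)
real = quote_matches(source, 2, 3, "return execute_query(sql)")
fabricated = quote_matches(source, 2, 3, "return execute_query(user_input)")
print(real, fabricated)
```

A check like this only confirms that the quoted text exists at the cited location; whether the finding's interpretation of that code is right remains a job for the checkpoint-verifier.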
Orchestration Change: File Existence Verification
The orchestrator should verify that cited files exist before passing scanner results to the writer:
After receiving scanner results for each category:
1. Extract all file paths cited in findings
2. Verify each file exists (ls or glob)
3. If any cited file does not exist, remove the finding
and log it as PHANTOM in premise-report.md
4. Only pass verified findings to the writer sub-agent
This won't catch findings where the file exists but the described behavior is wrong—that's the checkpoint-verifier's job—but it eliminates the cheapest fabrications at near-zero cost.
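The file existence check needs no model at all. A minimal sketch, assuming each finding is a dict with a repo-relative `file` path (a hypothetical shape) and using a throwaway directory so the example is self-contained:

```python
import tempfile
from pathlib import Path


def split_by_file_existence(findings, repo_root):
    """Partition findings into (verified, phantom) by cited-file existence.

    Each finding is a dict with a "file" key holding a repo-relative path
    (a hypothetical shape for this sketch).
    """
    root = Path(repo_root)
    verified, phantom = [], []
    for finding in findings:
        target = verified if (root / finding["file"]).is_file() else phantom
        target.append(finding)
    return verified, phantom


# Demo against a temporary directory standing in for the repository.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "src").mkdir()
    (Path(repo) / "src" / "db.py").write_text("def query(): ...\n")
    findings = [
        {"id": "F-1", "file": "src/db.py"},
        {"id": "F-2", "file": "src/missing.py"},
    ]
    verified, phantom = split_by_file_existence(findings, repo)

print([f["id"] for f in verified], [f["id"] for f in phantom])
```

Only the `verified` list would be passed to the writer; the `phantom` list is what gets logged as PHANTOM.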
Remediation 7: First-Principles Completeness Check
Prompt Addition (Validator)
Addresses: Completeness Gaps (enhancement)
Completeness gaps are already the best-addressed failure mode in the architecture. The residual risk is that all three layers share the same blind spot: they can only check for categories that someone thought to include. Add a "from first principles" section to the validator prompt that runs after the standards-based review:
FIRST-PRINCIPLES CHECK — run AFTER the standards-based review.
Ignore the checklist entirely. Look at the actual codebase
or system under review.
Ask: "If I were building this from scratch, what could go
wrong that would be embarrassing or dangerous?"
List anything that comes to mind that is NOT already on the
checklist. These are your most valuable findings — the things
no standard thought to include because they're specific to
this particular codebase.
This creates a second cognitive path that isn't anchored on
the canonical checklist.
This is a one-paragraph addition. It creates a second cognitive path—one that isn't anchored on the canonical checklist—which is often enough to catch the "obvious in hindsight" categories that standards-based reviews miss.
5. Tier 3 Remediations: Degradation Over Time
Tier 3 failures are structurally different from Tiers 1 and 2. They're not about corrupted evidence or wrong objectives—they're about the workflow gradually losing effectiveness as it runs. Four of the seven Tier 3 modes—behavioral drift, role confusion, coordination divergence, and tool model mismatch—are already substantially or partially addressed by the existing architecture. The remediations here target residual risk and provide full solutions for the modes that aren't yet addressed.
Remediation 8: Orchestrator Re-Anchoring Protocol
Orchestration Change
Addresses: Behavioral Drift (residual risk in the orchestrator)
Sub-agent isolation already prevents behavioral drift within individual sub-agents. The remaining risk is in the orchestrator itself, which runs continuously across the entire workflow. Over a long session with many sub-agent invocations, the orchestrator may progressively soften quality requirements, accept marginal output it would have rejected earlier, or skip steps it considers "probably fine."
DRIFT PREVENTION — PERIODIC SELF-CHECK
Every 3 sub-agent invocations, pause and run this check:
1. Re-read the OBJECTIVE line from plan.md
2. Re-read your orchestration instructions (this prompt)
3. Review the last 3 sub-agent outputs you accepted:
- Did you accept any output that was incomplete or
below the quality bar?
- Did you skip any verification step because you
assumed correctness?
- Did you soften any requirement from the original
instructions?
4. If the answer to any of the above is YES:
- Log what you softened and why in premise-report.md
- Re-read the full orchestration prompt before
proceeding
- Consider re-running the sub-agent whose output
you accepted too readily
5. If the answer to all is NO, proceed normally.
This check takes ~30 seconds. The cost of drift is
re-running the entire workflow.
The key design choice is making this periodic (every N invocations) rather than continuous. Continuous self-monitoring would consume context budget and actually contribute to context decay.
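If the orchestrator is driven by a harness rather than by the prompt alone, the periodic trigger can live outside the model's context entirely. The sketch below assumes such a harness exists; `ReanchorSchedule` and the invocation names are hypothetical.

```python
class ReanchorSchedule:
    """Counts accepted sub-agent outputs and signals when a self-check is due."""

    def __init__(self, every=3):
        if every < 1:
            raise ValueError("every must be at least 1")
        self.every = every
        self.recent = []  # outputs accepted since the last self-check

    def record(self, invocation_id):
        """Record one invocation; return the batch to review if a check is due."""
        self.recent.append(invocation_id)
        if len(self.recent) < self.every:
            return None
        batch, self.recent = self.recent, []
        return batch


schedule = ReanchorSchedule(every=3)
due = [schedule.record(name) for name in ["scan-1", "scan-2", "verify-1", "scan-3"]]
print(due)
```

When `record` returns a batch, the harness would inject the self-check prompt along with exactly those outputs, so the check costs context only once per batch.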
Remediation 9: Definition-of-Done Completion Criteria
Template Change (Checklist), Prompt Addition (Auditor)
Addresses: Premature Convergence
The current auditor check is binary: is there a finding for this checklist item, yes or no? A single shallow finding satisfies the check even if the item required thorough investigation across multiple files. Add explicit, testable completion criteria to each checklist item:
## [SEC-03] Error Handling Review
- **Check:** Verify all API endpoints handle errors without
leaking internal state
- **Completion criteria:**
1. Every controller/handler file was opened and read
2. Each endpoint's error path was examined (not just happy path)
3. Finding includes count: "Reviewed N of M endpoints"
4. Any endpoint NOT reviewed is listed with reason
- **Done when:** All criteria met, or gaps explicitly
logged as PARTIAL with justification
Auditor Completion-Criteria Verification
COMPLETION QUALITY CHECK — run for EVERY checklist item.
Do not just check whether a finding exists for this item.
Check whether the finding SATISFIES THE COMPLETION CRITERIA.
For each checklist item:
1. Read the completion criteria
2. Read the corresponding finding(s)
3. Verify each criterion is met
4. Classify as:
- COMPLETE: All criteria satisfied
- PARTIAL: Some criteria met, gaps acknowledged
- SHALLOW: Finding exists but doesn't meet criteria
- MISSING: No finding at all
SHALLOW is worse than MISSING — it creates false
confidence. Flag all SHALLOW items for re-scanning.
GAP CLASSIFICATION — for MISSING and SHALLOW items:
- SKIPPED: No scanner invocation was attempted.
→ Safe for auto-completion (re-invoke scanner)
- BLOCKED: Scanner was invoked but failed.
→ NOT safe for auto-completion (will fail again)
- EMPTY: Scanner ran and genuinely found nothing.
→ NOT safe for auto-completion (risks fabrication)
This classification feeds the auto-completion loop:
only SKIPPED items are re-scanned automatically.
A MISSING item is visible—the audit report flags it, and the human reviewer knows to investigate. A SHALLOW item looks like it was completed. The finding exists, it references the right category, and a casual reviewer will skip past it. The damage is false confidence: the team believes error handling was reviewed when really one of forty endpoints was checked.
The Auto-Completion Loop
The gap classification enables Phase 3.5: for each SKIPPED item, the orchestrator automatically re-invokes the scanner, runs the checkpoint-verifier on new results, and merges verified findings. The loop runs exactly once. Repeated auto-remediation risks the scanner producing lower-quality findings under implicit "you must find something" pressure—a subtle form of phantom grounding.
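The selection rule and the single-pass constraint can be sketched as follows. The row shape and the `rescan` and `verify` callables are stand-ins for the scanner and checkpoint-verifier invocations, not real interfaces, and per-finding verification is a simplification of the verifier's sampling.

```python
def auto_completion_queue(audit_rows):
    """Return checklist IDs that are safe to re-scan automatically.

    audit_rows: list of dicts with "item", "status" and, for MISSING or
    SHALLOW items, a "gap" of SKIPPED, BLOCKED, or EMPTY.
    """
    return [
        row["item"]
        for row in audit_rows
        if row["status"] in ("MISSING", "SHALLOW") and row.get("gap") == "SKIPPED"
    ]


def run_auto_completion(audit_rows, rescan, verify):
    """Single pass: re-scan each SKIPPED item, keep only verified findings."""
    merged = {}
    for item in auto_completion_queue(audit_rows):
        merged[item] = [finding for finding in rescan(item) if verify(finding)]
    return merged  # deliberately no second pass


audit = [
    {"item": "SEC-01", "status": "COMPLETE"},
    {"item": "SEC-02", "status": "MISSING", "gap": "SKIPPED"},
    {"item": "SEC-03", "status": "MISSING", "gap": "BLOCKED"},
    {"item": "SEC-04", "status": "SHALLOW", "gap": "EMPTY"},
]
queue = auto_completion_queue(audit)
merged = run_auto_completion(
    audit,
    rescan=lambda item: [f"{item}-a", f"{item}-b"],  # stand-in for the scanner
    verify=lambda finding: finding.endswith("-a"),   # stand-in for the verifier
)
print(queue, merged)
```

BLOCKED and EMPTY items never enter the queue, so they stay visible in the audit report for a human to handle.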
Remediation 10: Error Recovery Scaffolding
Orchestration Change
Addresses: Recovery Failure
This is the Tier 3 failure mode with no existing mitigation at all. When the orchestrator hits an error, it either loops on the same failing approach or silently skips the problematic step. Both are bad.
ERROR RECOVERY PROTOCOL
When a sub-agent fails or returns unusable output:
ATTEMPT 1 — Retry with the same approach
- Re-run the sub-agent with the same inputs
- Some failures are transient
ATTEMPT 2 — Retry with variation
- Reduce scope (scan fewer files per invocation)
- Change granularity (scan by directory instead of
by category)
- Simplify the task (split complex checklist items)
- Log what you changed and why
ATTEMPT 3 — Abandon and log
- Log the failure in the audit report:
Status: BLOCKED
Checklist item: [ID]
Approach 1: [what was tried, what failed]
Approach 2: [what was tried, what failed]
Recommendation: Manual review required
- Proceed to the next checklist item
CRITICAL RULES:
1. NEVER try the same failing approach more than twice
2. NEVER silently skip a failed item — it MUST appear
as BLOCKED in the audit report
3. NEVER let a single failure block the entire workflow
4. After 3 BLOCKED items in the same category, pause
and flag for human review
The key principle: visible failure is better than invisible skipping. A BLOCKED item in the audit report gets human attention. A silently skipped item gets nothing.
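The protocol maps onto a bounded retry loop. This is a sketch under assumptions: `invoke` and `vary` are hypothetical callables standing in for a sub-agent invocation and a scope-reduction step, and `SubAgentError` is an illustrative exception type. The same inputs are used at most twice, one variation follows, and anything still failing is returned as BLOCKED rather than dropped.

```python
class SubAgentError(Exception):
    """Raised by the (hypothetical) invoke callable on failure or unusable output."""


def run_with_recovery(item_id, invoke, inputs, vary):
    """Try the same approach at most twice, then one variation, then give up.

    invoke(inputs) returns output or raises SubAgentError.
    vary(inputs) returns modified inputs, e.g. a reduced scope.
    """
    schedule = [
        ("initial run", lambda: inputs),
        ("attempt 1: same approach", lambda: inputs),
        ("attempt 2: variation", lambda: vary(inputs)),
    ]
    log = []
    for label, make_inputs in schedule:
        try:
            output = invoke(make_inputs())
            return {"item": item_id, "status": "OK", "output": output, "log": log}
        except SubAgentError as exc:
            log.append(f"{label}: {exc}")
    # Attempt 3: abandon, but keep the failure visible in the audit report.
    return {
        "item": item_id,
        "status": "BLOCKED",
        "output": None,
        "log": log,
        "recommendation": "Manual review required",
    }


def should_pause(outcomes):
    """True once three items in the same category are BLOCKED."""
    return sum(o["status"] == "BLOCKED" for o in outcomes) >= 3


def scanner(inputs):
    # Stand-in scanner that only copes with small batches.
    if len(inputs["files"]) > 2:
        raise SubAgentError("context overflow")
    return f"scanned {len(inputs['files'])} files"


def halve(inputs):
    # Variation: reduce scope to the first half of the files.
    return {"files": inputs["files"][: len(inputs["files"]) // 2]}


ok = run_with_recovery("SEC-03", scanner, {"files": ["a", "b", "c", "d"]}, halve)
blocked = run_with_recovery("SEC-04", scanner, {"files": list("abcdefgh")}, halve)
print(ok["status"], ok["output"], blocked["status"], len(blocked["log"]))
```

The returned log doubles as the "Approach 1 / Approach 2" record that the BLOCKED entry in the audit report requires.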
Remediation 11: Orchestrator Role Boundaries
Prompt Addition (Orchestrator)
Addresses: Role Confusion (residual risk in orchestrator)
The seven archetypes with enforced tool restrictions make it structurally impossible for a scanner to write files or for a validator to approve work. The residual risk is in the orchestrator, which coordinates across all phases and has access to all tools. Under pressure, the orchestrator may start doing sub-agent work itself.
ROLE BOUNDARIES — WHAT THE ORCHESTRATOR DOES AND DOES NOT DO
The orchestrator COORDINATES. It does not EXECUTE.
YOU DO:
- Invoke sub-agents with appropriate inputs
- Pass outputs between sub-agents
- Track progress against the plan
- Make sequencing decisions (what runs next)
- Run lightweight verification (file existence checks)
- Apply the error recovery protocol
- Manage re-grounding gates at phase transitions
YOU DO NOT:
- Read source code files to assess them yourself
(that's the scanner's job)
- Decide whether a finding is valid or invalid
(that's the checkpoint-verifier's job)
- Identify gaps in the checklist yourself
(that's the validator's job)
- Modify findings content
(that's the writer's job)
- Make implementation decisions
(that's the human's job)
If you find yourself reading source code for any reason
other than passing it to a sub-agent, STOP. You are
doing a sub-agent's job.
Remediation 12: Scope Boundary Enforcement
Prompt Addition (Implementer), Prompt Addition (Scanners), Orchestration Change
Addresses: Scope Creep
Scope creep is the structural mirror of premature convergence: where premature convergence is the agent being too lazy, scope creep is the agent being too diligent. An implementer asked to fix a null pointer check also refactors the surrounding function "while it's in there." In read-only workflows, scope creep wastes tokens. In write workflows, it modifies files outside the agent's mandate—potentially breaking things that were working fine.
Implementer Scope Boundary Rules
SCOPE BOUNDARIES — STAY IN YOUR LANE
You are authorized to modify ONLY the files and code
regions specified in the finding you were given.
BEFORE making any change, verify:
1. Is this file listed in the finding? If NO → STOP.
2. Is this code region within the line range cited?
If NO → STOP.
3. Is this change directly required to resolve the
finding? If NO → STOP.
If you notice issues OUTSIDE the current finding's scope:
- Do NOT fix them
- Do NOT refactor adjacent code
- DO log them as a brief note:
"NOTICED: [file:line] [brief description] — out of
scope for current finding, recommend separate review"
Your job is the scoped fix and ONLY the scoped fix.
Scanner Scope Boundary Rules:
SCOPE BOUNDARIES — STAY ON TASK
Report ONLY findings that address the checklist item
you were invoked for.
If you notice issues outside the current scope:
- Do NOT investigate them
- Do NOT include them in your main findings
- DO log them in a DEFERRED section at the end:
## DEFERRED — Out of Scope Observations
- [file:line] [brief description] [which checklist
item it might relate to]
The orchestrator will route deferred items to the
appropriate scanner invocation.
Orchestration Change: Scope Verification
After each implementer invocation:
1. Compare files modified against files listed in the finding
2. If any file was modified that is NOT in the finding:
- Flag as SCOPE_CREEP in the remediation log
- Revert the out-of-scope change if possible
- Log the change for human review
3. Only pass in-scope changes to the verifier
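The scope verification steps above reduce to a set comparison. The following is a minimal sketch, assuming the orchestrator can obtain the list of files the implementer touched; `ScopeCheck`, `check_scope`, and the example paths are hypothetical names introduced for illustration, not part of the architecture described here.

```python
from dataclasses import dataclass, field


@dataclass
class ScopeCheck:
    """Result of comparing modified files against the finding's file list."""
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        # SCOPE_CREEP mirrors the flag written to the remediation log.
        return "SCOPE_CREEP" if self.out_of_scope else "OK"


def check_scope(finding_files: set[str], modified_files: list[str]) -> ScopeCheck:
    """Split modified files into those listed in the finding and those not."""
    result = ScopeCheck()
    for path in modified_files:
        if path in finding_files:
            result.in_scope.append(path)
        else:
            result.out_of_scope.append(path)
    return result


# Hypothetical example: the finding cites one file, the implementer touched two.
check = check_scope({"src/auth.py"}, ["src/auth.py", "src/utils.py"])
print(check.status, check.out_of_scope)
```

Only `check.in_scope` would be forwarded to the verifier. Reverting the out-of-scope files and logging them for human review stay separate steps, because a revert may not always be possible.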
Remediation 13: Cross-Scanner Consistency Reconciliation
Orchestration Change · Addresses: Coordination Divergence
The premise analyst (Remediation 5) detects contradictions across scanner reasoning chains—but it doesn't resolve them. Detection and resolution are deliberately separated: the premise analyst is a read-only analytical agent; the orchestrator is where operational decisions happen. Without an explicit protocol for acting on the premise analyst's findings, the orchestrator will do what LLMs do by default—quietly pick whichever premise appeared most recently and move on, merging contradictory scanner outputs without noticing the conflict.
This reconciliation step is the orchestrator's action protocol for premise-level contradictions. It depends on premise-report.md as its primary input and cannot function without it. But it also adds an independent check that the premise analyst cannot perform: entity-level consistency across scanner outputs. When three scanners each analyze a different category of the same codebase, they may diverge not just on articulated premises but on operational facts that never surface in reasoning chains—one scanner treating a component as stateless while another assumes it maintains session state, or one scanner identifying an endpoint as internal while another's findings assume it's public-facing. These sub-reasoning-chain divergences won't appear in the Contradictory Premises section because they were never stated as premises in the first place.
Cross-Scanner Consistency Check:
CONSISTENCY RECONCILIATION — RUN AFTER ALL SCANNERS COMPLETE
Before passing scanner results to the writer:
1. Read premise-report.md and check the Contradictory
Premises section
2. For each contradiction identified by the premise analyst:
- Determine which premise is better supported
- Flag affected findings as DEPENDS_ON_RECONCILIATION
- Include the conflict in the auditor's review scope
3. If no contradictions: proceed normally
The premise analyst identifies the contradictions;
the orchestrator decides how to handle them.
The protocol above handles contradictions the premise analyst surfaces. The following lightweight check addresses the residual gap—divergences in how scanners treat shared entities that never rise to the level of articulated premises.
Entity-Level Consistency Check:
ENTITY CONSISTENCY CHECK — RUN ALONGSIDE PREMISE RECONCILIATION
Before passing scanner results to the writer, compare
how different scanner invocations characterize shared
components:
1. Extract key entities referenced by multiple scanners
(endpoints, services, data stores, modules, external
dependencies)
2. For each shared entity, check whether scanners agree on:
- What it is (library vs. service, stateless vs. stateful)
- How it's accessed (internal vs. public-facing)
- What trust level it operates at
- Whether it handles sensitive data
3. If characterizations conflict:
- Flag affected findings as DEPENDS_ON_RECONCILIATION
- Note: these conflicts will NOT appear in
premise-report.md because they were never stated
as explicit premises
- Include the conflict in the auditor's review scope
This check gives the reconciliation step independent
value beyond acting on the premise analyst's output.
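One way to picture the entity-level check is as a grouping problem: collect every characterization of a shared entity and flag any attribute with more than one distinct value. The sketch below assumes scanner output has already been reduced to `(scanner, entity, attribute, value)` tuples. That input shape, the function name, and the sample scanner and entity names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input shape: (scanner, entity, attribute, value)
Claim = tuple[str, str, str, str]


def find_entity_conflicts(claims: list[Claim]) -> dict[tuple[str, str], dict[str, list[str]]]:
    """Return {(entity, attribute): {value: [scanners]}} where scanners disagree."""
    seen = defaultdict(lambda: defaultdict(list))
    for scanner, entity, attribute, value in claims:
        seen[(entity, attribute)][value].append(scanner)
    # A conflict exists when one attribute of one entity has several values.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


claims = [
    ("auth-scanner", "/admin/export", "exposure", "internal"),
    ("input-scanner", "/admin/export", "exposure", "public-facing"),
    ("auth-scanner", "session-store", "state", "stateful"),
    ("data-scanner", "session-store", "state", "stateful"),
]
conflicts = find_entity_conflicts(claims)
for (entity, attribute), values in conflicts.items():
    print(f"DEPENDS_ON_RECONCILIATION: {entity} [{attribute}] -> {values}")
```

Extracting the tuples from free-form scanner output is the hard part and is left out here. In practice that extraction would likely be done by a sub-agent, and the comparison itself can remain this simple.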
Remediation 14: Irreversibility Gates
Prompt Addition: Implementer · Orchestration Change · Addresses: Tool Model Mismatch
Tool model mismatch is the hardest Tier 3 failure to prevent proactively because you can't prompt-engineer away a competence gap. The most effective structural defense: before executing any destructive or hard-to-reverse operation, the agent must articulate what the operation does, whether it's reversible, and what the rollback procedure is.
Irreversibility Gate:
IRREVERSIBILITY GATE — DESTRUCTIVE OPERATIONS
Before executing ANY operation that modifies persistent
state, answer these questions:
1. WHAT does this operation do?
(Describe in plain language)
2. IS this operation reversible?
- YES → proceed, but note the rollback command
- PARTIALLY → proceed with caution, document what
can't be rolled back
- NO → STOP. Do not execute without human approval.
3. WHAT is the rollback procedure?
(Specific commands, not "undo the change")
4. WHAT happens if this operation fails midway?
Operations that ALWAYS require this gate:
- Database migrations (CREATE, ALTER, DROP)
- File deletions (rm, unlink)
- Git operations that rewrite history
- Package or dependency changes affecting lockfiles
- Infrastructure changes (container configs, CI/CD)
- Permission or access control changes
If you cannot articulate the rollback procedure for
a destructive operation, log it as NEEDS_HUMAN_REVIEW
and move to the next finding. Do NOT guess.
Tool model mismatch is fundamentally different from the other failure modes because no amount of architectural design can give the model knowledge it doesn't have. What the irreversibility gate does is change the failure mode: instead of confidently executing a wrong command, the agent either (a) explains its understanding correctly, giving the verifier something to check, (b) reveals its misunderstanding, giving the human a signal to intervene, or (c) correctly identifies uncertainty and escalates. All three outcomes are better than silent misuse.
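If the implementer returns its gate answers in a structured form, the orchestrator can enforce the decision mechanically rather than trusting the agent to stop itself. The following is a sketch under that assumption. The `GateAnswer` fields mirror the four gate questions, while the class names and the `PROCEED_WITH_CAUTION` label are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    YES = "yes"
    PARTIALLY = "partially"
    NO = "no"


@dataclass
class GateAnswer:
    description: str              # 1. what the operation does, in plain language
    reversibility: Reversibility  # 2. is the operation reversible?
    rollback: str                 # 3. specific rollback commands ("" if none given)
    failure_midway: str           # 4. what happens if the operation fails midway


def gate_decision(answer: GateAnswer) -> str:
    """Map gate answers to PROCEED, PROCEED_WITH_CAUTION, or NEEDS_HUMAN_REVIEW."""
    if answer.reversibility is Reversibility.NO:
        return "NEEDS_HUMAN_REVIEW"   # never execute without human approval
    if not answer.rollback.strip():
        return "NEEDS_HUMAN_REVIEW"   # no articulated rollback: do not guess
    if answer.reversibility is Reversibility.PARTIALLY:
        return "PROCEED_WITH_CAUTION"
    return "PROCEED"


drop = GateAnswer("Drops the legacy_sessions table", Reversibility.NO, "", "Table may be half-dropped")
print(gate_decision(drop))
```

This checks only that a rollback was stated, not that it is correct. Judging correctness remains the verifier's or the human's job.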
6. Tier 4 Remediations: Tractable but Important
Tier 4 contains two failure modes that are real and worth fixing but fundamentally more tractable than the higher tiers. State confusion is a bookkeeping problem with a bookkeeping solution. Overconfidence uniformity is a communication problem with a formatting solution.
Remediation 15: State Snapshots and Diff Verification
Orchestration Change · Prompt Addition: Verifier · Addresses: State Confusion
The current implementer-then-verifier pattern partially addresses state confusion: the verifier re-reads the modified file. But the verifier doesn't know what the file looked like before the change, so it can't detect unintended side effects.
Before invoking the implementer for each finding:
1. Read the target file(s) and save content as BEFORE_STATE
2. Invoke the implementer with scoped instructions
3. After implementer completes, read the same file(s) as AFTER_STATE
4. Pass BEFORE_STATE and AFTER_STATE to the verifier
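The four steps above can be sketched as a thin wrapper around the implementer call. This is an illustrative sketch only: `snapshot` and `run_with_snapshots` are hypothetical helpers, the implementer is stood in for by a plain callable, and recording a missing file as `None` is an assumption about how newly created files should be represented.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

State = dict[str, Optional[str]]


def snapshot(paths: list[Path]) -> State:
    """Read each target file; a file that does not exist is recorded as None."""
    return {str(p): (p.read_text(encoding="utf-8") if p.exists() else None) for p in paths}


def run_with_snapshots(paths: list[Path], implementer: Callable[[], object]) -> tuple[State, State]:
    """Capture BEFORE_STATE, run the implementer, then capture AFTER_STATE."""
    before_state = snapshot(paths)
    implementer()
    after_state = snapshot(paths)
    return before_state, after_state


# Hypothetical demo: the "implementer" rewrites one line in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "handler.py"
    target.write_text("x = None\n", encoding="utf-8")
    before, after = run_with_snapshots(
        [target], lambda: target.write_text("x = 0\n", encoding="utf-8")
    )
print(before != after)
```

Both snapshots are then handed to the verifier together, so it never has to reconstruct the prior state from memory.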
Verifier State-Diff Check:
STATE-DIFF VERIFICATION
You will receive BEFORE_STATE and AFTER_STATE for each
implementation. Verify that:
1. INTENDED CHANGE APPLIED:
- The specific fix was implemented correctly
- The fix addresses the root cause, not just the symptom
2. NO UNINTENDED CHANGES:
- Compare BEFORE_STATE and AFTER_STATE line by line
- Flag ANY change not explained by the implementation:
  - COSMETIC: Whitespace, formatting (low risk)
  - REFACTOR: Code restructuring beyond fix (medium risk)
  - FUNCTIONAL: Logic changes beyond fix (high risk)
  - SCOPE CREEP: Changes to unmentioned code (high risk)
3. STATE CONSISTENCY:
- Does the file still parse/compile after changes?
- Are imports, dependencies, and references intact?
Report:
- Intended change: APPLIED | PARTIALLY APPLIED | NOT APPLIED
- Unintended changes: NONE | COSMETIC ONLY | FLAGGED [list]
- State consistency: CONSISTENT | ISSUES [list]
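The mechanical half of the "no unintended changes" check can be pre-computed before the verifier reasons about it. Here is a minimal sketch using Python's standard `difflib`, assuming the finding supplies the line range it cites. The function name and the `NEEDS_REVIEW` label are hypothetical. The script can only separate whitespace-only edits from everything else, so distinguishing REFACTOR from FUNCTIONAL changes remains a judgment call for the verifier.

```python
import difflib


def _normalize(lines: list[str]) -> list[str]:
    # Collapse whitespace runs. Caveat: in indentation-sensitive languages
    # a whitespace-only change can still alter behavior.
    return [" ".join(line.split()) for line in lines]


def unexplained_changes(before: str, after: str, allowed: range) -> list[str]:
    """List changed regions that fall outside the finding's cited BEFORE lines."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        end = max(i2, i1 + 1)  # a pure insertion is anchored at one line
        if all(n in allowed for n in range(i1 + 1, end + 1)):
            continue  # change sits inside the cited line range
        kind = "COSMETIC" if _normalize(a[i1:i2]) == _normalize(b[j1:j2]) else "NEEDS_REVIEW"
        flagged.append(f"{kind}: before lines {i1 + 1}-{end}")
    return flagged


before_state = "def f(x):\n    return x.name\n\ndef g():\n    return 1\n"
after_state = "def f(x):\n    return x.name if x else None\n\ndef g():\n    return 2\n"
report = unexplained_changes(before_state, after_state, allowed=range(1, 3))
print(report)
```

In this example the finding covers lines 1 to 2, so the edit to `f` is explained and the edit to `g` is surfaced for the verifier.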
Remediation 16: Confidence Calibration
Template Change: Findings · Prompt Addition: Scanners · Addresses: Overconfidence Uniformity
When every finding is presented with identical confidence, the human reviewer has no signal for where to apply scrutiny. Add a confidence field to each finding:
**Confidence:** [HIGH | MEDIUM | LOW | UNCERTAIN]
**Confidence basis:** [one-sentence justification]
Scanner Confidence Calibration Rules:
CONFIDENCE CALIBRATION
For every finding, assign a confidence level and explain why.
HIGH — You read the exact code and it clearly exhibits the
described behavior. The pattern is unambiguous.
MEDIUM — The code likely exhibits the behavior but context
could change the interpretation. You checked the file but
the relevant code spans multiple files you may not have
fully traced.
LOW — You're inferring from indirect signals (naming, file
structure, comments). The finding depends on runtime
behavior you can't observe from static analysis.
UNCERTAIN — You're flagging something unusual but genuinely
don't know if it's a problem. You want human attention but
aren't making a claim.
CRITICAL RULES:
1. Do NOT default to HIGH. Most findings should be MEDIUM —
you are reading code statically and inferring behavior.
2. An honest LOW finding is more valuable than an inflated
HIGH finding.
3. The confidence basis must reference specific evidence or
specific uncertainty.
Confidence calibration lets the reviewer cross-reference severity against confidence. A CRITICAL-severity, HIGH-confidence finding needs a disposition decision. A CRITICAL-severity, LOW-confidence finding needs investigation first. This is the difference between "fix this" and "check whether this is actually a problem before deciding whether to fix it."
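The severity-against-confidence cross-reference can be turned into a simple review queue. The sketch below is an illustration under stated assumptions. The text names only CRITICAL as a severity, so the rest of the severity scale is assumed, and treating MEDIUM confidence the same as LOW ("investigate first") is a conservative choice made here, not a rule from the article.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]       # assumed scale
CONFIDENCE_LEVELS = {"HIGH", "MEDIUM", "LOW", "UNCERTAIN"}   # from the findings template


def next_step(confidence: str) -> str:
    """HIGH confidence is ready for a disposition decision; anything else is not."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {confidence}")
    return "DECIDE_DISPOSITION" if confidence == "HIGH" else "INVESTIGATE_FIRST"


def review_queue(findings: list[dict]) -> list[tuple[str, str]]:
    """Order findings by severity and attach the reviewer's next step."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    return [(f["id"], next_step(f["confidence"])) for f in ordered]


# Hypothetical findings for illustration.
queue = review_queue([
    {"id": "F-3", "severity": "MEDIUM", "confidence": "HIGH"},
    {"id": "F-1", "severity": "CRITICAL", "confidence": "LOW"},
    {"id": "F-2", "severity": "CRITICAL", "confidence": "HIGH"},
])
print(queue)
```

Because `sorted` is stable, findings of equal severity keep their original order, so the queue does not silently reshuffle the scanner's output.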
7. The Hardened Workflow Sequence
With all remediations in place, the workflow gains verification, premise analysis, and re-grounding steps at each phase boundary. New steps are marked with ← NEW.
Phase 1: PLAN
Main Agent → Generate checklist from template + domain context
Main Agent → Capture objective as first line of plan.md ← NEW
Main Agent → Validator sub-agent → Return gap analysis
Main Agent → Merge gaps → Create execution plan
Phase 1→2: RE-GROUNDING GATE ← NEW
Main Agent → Re-read objective → Verify scope alignment
Phase 2: EXECUTE
Main Agent → Scanner sub-agent(s) → Return raw results
with REASONING CHAINS (per category) ← NEW
Main Agent → File existence check → Remove phantom findings ← NEW
Main Agent → Checkpoint-verifier → Spot-check evidence ← NEW
Phase 2.5: PREMISE ANALYSIS ← NEW
Main Agent → Premise analyst sub-agent
→ Read all reasoning chains from all scanner outputs
→ Produce premise-report.md (shared, contradictions,
ungrounded, dependencies)
Phase 2→3: RE-GROUNDING GATE ← NEW
Main Agent → Re-read objective → Verify findings serve goal
Main Agent → Writer sub-agent → Format findings
(with provenance + reasoning chains)
Phase 3: AUDIT
Main Agent → Auditor sub-agent → Return completeness report
→ Includes premise review (from premise-report.md) ← NEW
→ Includes goal-fidelity assessment ← NEW
→ Classifies gaps as SKIPPED | BLOCKED | EMPTY ← NEW
Phase 3.5: AUTO-COMPLETION (single pass) ← NEW
For each SKIPPED item:
→ Re-invoke scanner (with reasoning chain requirement)
→ Run checkpoint-verifier on new results
→ Run premise analyst incrementally on new chains
→ Merge into findings
Re-run auditor on updated findings
Phase 3→4: RE-GROUNDING GATE ← NEW
Main Agent → Re-read objective → Verify scope before implementation
Phase 4: IMPLEMENT (if applicable)
Human checkpoint → Approve/defer/reject findings
Main Agent → Implementer sub-agent → Make changes
Main Agent → State snapshot (before/after) ← NEW
Main Agent → Checkpoint-verifier → Verify implementation ← NEW
Main Agent → Verifier sub-agent → State-diff confirmation ← NEW
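The single-pass constraint on Phase 3.5 is easiest to see in code: there is one loop over the gaps and no retry. The following is a rough sketch in which the scanner and checkpoint-verifier are stood in for by plain callables. `auto_complete`, `rescan`, and `verify` are hypothetical names, and the incremental premise-analyst run and the final auditor re-run are omitted for brevity.

```python
from typing import Callable


def auto_complete(
    gaps: dict[str, str],
    rescan: Callable[[str], list[str]],
    verify: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Run exactly one pass over SKIPPED items.

    Returns (verified findings to merge, SKIPPED items still unresolved).
    BLOCKED and EMPTY items are never re-scanned.
    """
    merged: list[str] = []
    unresolved: list[str] = []
    for item, classification in gaps.items():
        if classification != "SKIPPED":
            continue
        verified = [finding for finding in rescan(item) if verify(finding)]
        if verified:
            merged.extend(verified)
        else:
            unresolved.append(item)  # persistent gap: hand to a human, no retry
    return merged, unresolved


# Hypothetical stand-ins for the scanner and checkpoint-verifier.
gaps = {"input-validation": "SKIPPED", "secrets": "BLOCKED", "logging": "SKIPPED"}
merged, unresolved = auto_complete(
    gaps,
    rescan=lambda item: [f"{item}: finding"] if item == "input-validation" else [],
    verify=lambda finding: True,
)
print(merged, unresolved)
```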
8. Summary: All Architecture Changes
The total footprint across all four tiers: two new sub-agents, one new file, one new workflow phase, orchestration enhancements, and prompt additions to existing sub-agent types. No existing components are removed or restructured.
| Change | Type | Component | Addresses |
|---|---|---|---|
| TIER 1 — STRUCTURAL RISKS | |||
| checkpoint-verifier | New Sub-Agent | Between workflow phases | Error compounding, phantom grounding |
| Provenance fields | Template Change | findings.md template | Error compounding, no audit trail |
| Re-grounding gates | Orchestration | Phase transitions | Context decay, goal substitution |
| TIER 2 — SILENT CORRUPTION | |||
| Objective capture | Orchestration | plan.md first line | Goal substitution |
| Validator rule zero | Prompt Addition | All validator sub-agents | Goal substitution |
| Goal-fidelity assessment | Prompt Addition | Auditor sub-agent | Goal substitution |
| Reasoning chains | Prompt Addition | All scanner sub-agents | Invisible assumptions |
| premise-analyst | New Sub-Agent | Phase 2.5 (after verify, before write) | Invisible assumptions, coordination divergence |
| premise-report.md | New File | Per-review directory | Invisible assumptions |
| Verbatim evidence | Prompt Addition | All scanner sub-agents | Phantom grounding |
| File existence check | Orchestration | Between scanner and writer | Phantom grounding |
| First-principles check | Prompt Addition | All validator sub-agents | Completeness gaps |
| TIER 3 — DEGRADATION OVER TIME | |||
| Re-anchoring protocol | Orchestration | Orchestrator (periodic) | Behavioral drift |
| Completion criteria | Template Change | Checklist items | Premature convergence |
| Completion quality check | Prompt Addition | Auditor sub-agent | Premature convergence |
| Gap classification | Prompt Addition | Auditor sub-agent | Completeness gaps, premature convergence |
| Auto-completion loop | Orchestration | Phase 3.5 (after auditor) | Completeness gaps |
| Error recovery scaffolding | Orchestration | Orchestrator (on failure) | Recovery failure |
| Role boundary rules | Prompt Addition | Orchestrator | Role confusion |
| Scope boundary enforcement | Prompt Addition | Implementer + scanner sub-agents | Scope creep |
| Scope verification check | Orchestration | After each implementer invocation | Scope creep |
| Consistency reconciliation | Orchestration | After all scanners complete (requires: premise-report.md) | Coordination divergence |
| Irreversibility gate | Prompt Addition | Implementer sub-agent | Tool model mismatch |
| Destructive operation logging | Orchestration | Orchestrator (on destructive ops) | Tool model mismatch |
| TIER 4 — TRACTABLE BUT IMPORTANT | |||
| State snapshots | Orchestration | Before/after implementer | State confusion |
| State-diff verification | Prompt Addition | Verifier sub-agent | State confusion |
| Confidence field | Template Change | Findings template | Overconfidence uniformity |
| Confidence calibration rules | Prompt Addition | All scanner sub-agents | Overconfidence uniformity |
Updated Sub-Agent Archetypes
With the checkpoint-verifier and premise analyst added, the architecture has nine archetypes:
| Archetype | Purpose | Key Constraint | Changes in This Guide |
|---|---|---|---|
| Scanner | Fast, thorough exploration | Read-only; cannot modify | + reasoning chains, + verbatim evidence, + confidence calibration, + scope boundary rules |
| Validator | Adversarial gap-finding | Must find problems; forbidden from approving | + goal fidelity rule zero, + first-principles check |
| Writer | Consistent documentation | Enforces template structure | + provenance chain fields, + confidence field, + reasoning chains in output |
| Auditor | Completeness verification | Compares plan vs. actual work | + goal fidelity assessment, + premise review, + completion quality check, + gap classification |
| Triage Specialist | Classifies external tool output | Read-only; TRUE/FALSE/INVESTIGATE | Unchanged |
| Implementer | Makes targeted changes | Write access, scoped to one finding | + scope boundary rules, + irreversibility gate |
| Verifier | Confirms changes resolve issue | Read-only re-verification | + state-diff verification |
| Checkpoint-Verifier | Spot-checks evidence between phases | Read-only; must verify against source | NEW (Tier 1) |
| Premise Analyst | Cross-finding inference analysis | Read-only; analyzes reasoning, not findings | NEW (Tier 2) |
The existing architecture verifies evidence (checkpoint-verifier) and completeness (auditor) independently. Premise analysis adds independent verification of reasoning—the inferential step between evidence and conclusion. This closes the last major gap in the correctness control chain: you can now verify that the evidence is real, that the reasoning from evidence to conclusion is grounded, and that the conclusions serve the stated objective. Each verification is performed by a different sub-agent with a different analytical perspective.
Updated File Structure
project/
├── templates/
│   ├── CHECKLIST_TEMPLATE.md    # + completion criteria per item
│   ├── PLAN_TEMPLATE.md         # + OBJECTIVE as first line
│   └── FINDINGS_TEMPLATE.md     # + provenance, confidence, reasoning chain
└── review-[date]/
    ├── checklist.md
    ├── plan.md
    ├── findings.md
    ├── premise-report.md        ← NEW (replaces assumptions.md)
    └── remediation-log.md       # + state tracking per implementation
9. Frequently Asked Questions
The checkpoint-verifier sits between workflow phases and spot-checks a sample of findings (3–5 per category) against the actual codebase. It goes to the cited file and line number, reads the actual code, and classifies each finding as VERIFIED, MISMATCH, NOT_FOUND, or OVERSTATED. If the verification rate drops below 80%, the orchestrator re-runs the scanner rather than proceeding with corrupted data. It runs between scanner and writer phases and again between auditor and implementer phases.
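The 80% rule can be expressed as a small calculation over the spot-check classifications. This is a sketch, assuming that only VERIFIED results count toward the verification rate. Whether OVERSTATED should earn partial credit is not specified here, and the function names are hypothetical.

```python
STATUSES = {"VERIFIED", "MISMATCH", "NOT_FOUND", "OVERSTATED"}
RERUN_THRESHOLD = 0.80


def verification_rate(results: list[str]) -> float:
    """Fraction of spot-checked findings classified as VERIFIED."""
    if not results:
        raise ValueError("no spot-check results to evaluate")
    unknown = set(results) - STATUSES
    if unknown:
        raise ValueError(f"unknown classification(s): {sorted(unknown)}")
    return results.count("VERIFIED") / len(results)


def should_rerun_scanner(results: list[str]) -> bool:
    """True when the rate drops below the threshold, per the rule above."""
    return verification_rate(results) < RERUN_THRESHOLD


sample = ["VERIFIED", "VERIFIED", "MISMATCH", "VERIFIED", "NOT_FOUND"]
print(verification_rate(sample), should_rerun_scanner(sample))
```

With a sample of only 3 to 5 findings per category, a single failed check moves the rate by 20 points or more, so the threshold behaves more like "at most one miss in five" than a fine-grained percentage.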
The self-reported assumptions log asks scanner sub-agents to report their own interpretive decisions. Three problems make this structurally weak. First, the most dangerous assumptions are ones the scanner doesn't recognize as assumptions—training-data priors that feel like "just knowing." Second, self-report is reflexive (what did I take for granted?) while models are better at forward reasoning (premise → conclusion), so reasoning chains produce better data. Third, individual scanner invocations can't see each other's premises, so cross-finding patterns like shared assumptions and contradictions are invisible to self-report. The premise analyst fills all three gaps.
Goal substitution occurs when the agent quietly replaces the assigned goal with a nearby easier one. The output has the right structure, uses the right vocabulary, and reads well—it just doesn't answer the question that was asked. The validator's "rule zero" and the auditor's goal-fidelity assessment catch this by explicitly checking whether checklist items and findings serve the stated objective.
Yes. The total additions are two new sub-agents (checkpoint-verifier and premise analyst), one new file (premise-report.md), one new workflow phase (Phase 3.5 auto-completion loop), prompt enhancements to existing sub-agent types, and orchestration prompt additions. No existing components need to be removed or restructured.
The checkpoint-verifier checks evidence—it goes to the cited file and confirms the code matches the finding. The premise analyst checks reasoning—it reads the inferential step between evidence and conclusion across all findings and identifies shared premises, contradictions, and ungrounded inferences. One verifies that what was found is real; the other verifies that the logic connecting observation to conclusion is sound.
The auto-completion loop (Phase 3.5) runs after the auditor classifies gaps as SKIPPED, BLOCKED, or EMPTY. For SKIPPED items only, it re-invokes the scanner, runs the checkpoint-verifier on new results, and merges verified findings. It runs exactly once because repeated auto-remediation risks the scanner producing lower-quality findings under implicit "you must find something" pressure—a subtle form of phantom grounding. One pass catches honest gaps; persistent gaps need human attention.
Sub-agent isolation already prevents drift within individual sub-agents. The remaining risk is in the orchestrator, which runs continuously. The re-anchoring protocol adds a periodic self-check every three sub-agent invocations: the orchestrator re-reads its instructions, reviews whether it accepted sub-par output, and checks whether it softened requirements. The check is lightweight enough to sustain across a full workflow without contributing to context decay.
Tool model mismatch occurs when the agent's understanding of how a tool works diverges from reality—using sed when an AST parser is needed, misinterpreting exit codes, or not understanding that a database migration is irreversible. It can't be fully prevented because no prompt engineering can give the model knowledge it doesn't have. The irreversibility gate changes the failure mode: instead of silent misuse, the agent either explains correctly (verifiable), reveals its misunderstanding (catchable), or escalates (safe).
Goal substitution is the agent answering a different, easier question than the one asked. Scope creep is the agent answering the right question plus a bunch of questions nobody asked—refactoring files it wasn't asked to touch, adding features beyond the specification. Goal substitution is caught by the validator's rule-zero fidelity check. Scope creep is prevented by scope boundary enforcement and caught after the fact by state-diff verification. Scope creep is particularly dangerous in implementation workflows where every unsolicited modification is an unreviewed change in production code.
Automation complacency is the human tendency to rubber-stamp agent output after repeated positive experiences. The architecture addresses it structurally: the checkpoint-verifier provides systematic review independent of human attention, provenance chains make evidential basis traceable, confidence calibration directs limited attention to where it matters, and the premise analyst's blast radius counts help the reviewer prioritize which assumptions to validate. These structural defenses exist precisely because human vigilance degrades over time.
The scanner may produce a finding from pattern matching and then construct a justification after the fact. The premise analyst partially catches this by flagging premises that cite broad standards without specific clauses. Even with rationalization, the aggregate patterns (shared premises, contradictions) emerge regardless of whether individual premises are genuine or rationalized, because they reflect what the scanner actually assumed even if the explanation is reconstructed.
LLM-general failure modes (phantom grounding, goal substitution, invisible assumptions, overconfidence uniformity, premature convergence, behavioral drift, scope creep) exist in any LLM interaction—the agentic context amplifies their consequences. Agentic-specific modes (error compounding, context decay, state confusion, recovery failure, role confusion, no audit trail, completeness gaps, coordination divergence, tool model mismatch) emerge from multi-step execution architecture. The distinction matters for mitigation: LLM-general modes need structural checks that catch inherent model limitations, while agentic-specific modes require architectural solutions in the orchestration layer. Both categories are addressed in this article.