What is premise analysis and why does it replace the assumptions log?

Premise analysis replaces the self-reported assumptions log with two mechanisms: reasoning chains on every scanner finding (a four-step structure: what was observed, what standard was compared against, why the comparison indicates a problem, and what project context is assumed) and a dedicated premise analyst sub-agent that reads all reasoning chains and produces a consolidated report identifying shared premises, contradictions, ungrounded inferences, and inter-finding dependencies. The self-reported log fails because the most dangerous assumptions are ones the scanner doesn't recognize as assumptions—training-data priors that feel like 'just knowing.' Reasoning chains surface these as a byproduct of explaining findings.

What is goal substitution and how does the validator's rule zero catch it?

Goal substitution occurs when the agent quietly replaces the assigned goal with a nearby easier one. The validator's rule zero runs before the completeness check: it re-reads the stated objective and verifies that every checklist item is an actionable check that directly serves that objective, not just a task to perform. A checklist item like 'Catalog all endpoints' is a task; 'For each endpoint, identify fields that would break existing consumers if changed' is a check. The auditor's goal-fidelity assessment provides a second layer by classifying each finding as RELEVANT, TANGENTIAL, or SUBSTITUTE.

What makes error compounding the most dangerous agentic AI failure mode?

Error compounding is multiplicative rather than additive. A small mistake in an early step becomes the foundation for every subsequent step. Downstream sub-agents build confidently on the error, and the agent's consistent tone masks the corruption. A key mechanism is handoff information loss: every time context passes between agents, compression strips uncertainty signals and caveats. The receiving agent gets conclusions without qualifications and treats them as ground truth. By the time the problem surfaces, the entire output is structurally unsound.

How does the irreversibility gate address tool model mismatch?

Tool model mismatch can't be fully prevented because no prompt engineering can give the model knowledge it doesn't have. The irreversibility gate changes the failure mode: before executing any destructive operation, the agent must articulate what the operation does, whether it's reversible, and the specific rollback procedure. The agent either explains its understanding correctly (giving the verifier something to check), reveals its misunderstanding (giving the human a signal to intervene), or correctly identifies uncertainty and escalates. All three outcomes are better than silent misuse.

What failure modes does the existing five-principle architecture already address?

The existing architecture effectively addresses five of the sixteen failure modes. Context decay is handled by externalizing state to files. Completeness gaps are covered by the checklist-validator-auditor triple layer. Behavioral drift is mitigated by isolating critical functions in dedicated sub-agents. Role confusion is solved by the seven archetypes with enforced tool restrictions. The audit trail is provided by the file pipeline. Four additional modes (premature convergence, scope creep, coordination divergence, and tool model mismatch) receive partial coverage. The remaining seven modes require the structural additions covered in this article.

Hardening Agentic Workflows: Structural Remediations for Silent AI Failure Modes

Q: What is the difference between the checkpoint-verifier and the premise analyst?

The checkpoint-verifier checks evidence—it goes to the cited file and line number and confirms the code matches what the finding describes. The premise analyst checks reasoning—it reads the inferential step between evidence and conclusion across all findings and identifies shared premises, contradictions, and ungrounded inferences. One verifies that what was found is real; the other verifies that the reasoning connecting what was found to what was concluded is sound. Each addresses a different category of silent corruption.

Q: How does confidence calibration improve human review?

Without confidence calibration, the agent presents well-grounded facts and speculative inferences with identical tone. Adding a confidence field (HIGH, MEDIUM, LOW, UNCERTAIN) with mandatory justification lets the reviewer cross-reference severity against confidence. A CRITICAL-severity, HIGH-confidence finding needs a disposition decision. A CRITICAL-severity, LOW-confidence finding needs investigation first. The calibration rules set a useful default: most findings should be MEDIUM, since static analysis is inherently inferential.

1. Beyond the Obvious Five

The series Introduction catalogs sixteen failure modes that affect agentic AI workflows, organized into four priority tiers. The five-principle architecture and seven sub-agent archetypes from Parts 1–6 effectively address five of those modes: context decay, completeness gaps, behavioral drift, role confusion, and audit trail gaps. These are real and important, but they are also the visible failures: when they occur, the agent produces obviously incomplete or inconsistent results.

The failures that should worry you more are the ones that produce output that looks correct on casual inspection but isn't. A goal-substituted review is thorough and well-organized; it just answers the wrong question. An error-compounded analysis is internally consistent; it just builds on a false premise. A phantom-grounded finding has the right format, cites a real-looking file path, and describes a plausible vulnerability—except the code doesn't actually do what the finding claims.

This matters even more because of a dynamic that no agent architecture can fully solve: automation complacency. The first time a developer reviews an agent's PR, they read every line. By the fiftieth, they're rubber-stamping. The consistent professional quality of agent output creates a false sense of reliability—and the silent corruption modes in Tiers 1 and 2 are precisely the failures that exploit this gap. Structural defenses—checkpoint verification, provenance chains, premise analysis—exist because human vigilance is an unreliable backstop, especially at scale.

This article provides concrete structural remediations across all four priority tiers. Every remediation is designed as a minimal addition to the existing architecture: new sub-agents, new files, orchestration patterns, and targeted prompt enhancements. No existing components need to be removed or restructured.

The Organizing Principle

The existing architecture has strong completeness controls (was everything done?) but weak correctness controls (was what was done actually right?) and no degradation controls (is the workflow getting worse as it runs?). The remediations across all four tiers close these gaps systematically—transforming every category of invisible failure into visible, reviewable artifacts.

2. What the Existing Architecture Already Handles

The five-principle architecture from Parts 1–6 was designed to address the original five failure modes. It does so effectively—and its structural properties provide partial coverage against several of the additional eleven. The table below maps each failure mode to its current status.

Failure Mode	Tier	Existing Mitigation	Status
Context decay	1	Principle 1: externalize state to files; sub-agent isolation limits per-invocation context	✓ Addressed
No audit trail	1	Principle 4 + file pipeline (checklist → plan → findings → remediation-log)	✓ Addressed
Error compounding	1	Phase boundaries exist but no verification between phases	✗ Gap
Completeness gaps	2	Principles 2, 3, 5: checklist → validator → auditor triple layer	✓ Addressed
Goal substitution	2	Validator checks completeness but not fidelity to objective	✗ Gap
Invisible assumptions	2	No mechanism for surfacing interpretive decisions	✗ Gap
Phantom grounding	2	No evidence verification mechanism	✗ Gap
Behavioral drift	3	Sub-agent architecture isolates critical functions in fresh contexts	✓ Addressed
Role confusion	3	Seven archetypes with enforced tool restrictions	✓ Addressed
Premature convergence	3	Auditor sub-agent checks for incomplete items (partial)	◐ Partial
Recovery failure	3	Not addressed by existing architecture	✗ Gap
State confusion	4	Not addressed by existing architecture	✗ Gap
Overconfidence uniformity	4	Not addressed by existing architecture	✗ Gap
Scope creep	3	State snapshots catch effects; no preventive mechanism	◐ Partial
Coordination divergence	3	Orchestrator mediates all sub-agent outputs (serial execution)	◐ Partial
Tool model mismatch	3	Implementer-then-verifier catches effects; no preventive mechanism	◐ Partial

The Pattern

The gaps cluster in two places: the modes that produce plausible-looking output (Tier 2) and the modes that degrade silently over time (Tier 3). The architecture has strong completeness controls but weak correctness and degradation controls. The three newer Tier 3 modes—scope creep, coordination divergence, and tool model mismatch—are partially covered by existing structural patterns (state diffs, serial orchestration, verifier checks) but lack explicit preventive mechanisms. The remediations below address that asymmetry across all four tiers.

3. Tier 1 Remediations: Structural Risks

Context decay and audit trail are already well-addressed. Error compounding is the Tier 1 gap. The current phased approach (PLAN → EXECUTE → AUDIT → IMPLEMENT) creates natural phase boundaries, but there is no mechanism to detect when an error in an early phase has silently corrupted everything downstream.

The specific problem: if the scanner misidentifies a code pattern, misreads a file, or fabricates a finding, the writer will document it as real, the auditor will see a checklist item with a corresponding finding and mark it complete, and the implementer may make a real code change based on a false premise. Every sub-agent downstream treats upstream output as ground truth. The existing auditor checks for completeness (was everything done?) but not for correctness (was what was done actually right?).

Three remediations address this. They also strengthen the existing audit trail and context decay mitigations.

Remediation 1: The Checkpoint-Verifier Sub-Agent

New Sub-Agent

This is the most impactful single addition to the architecture. The current workflow has a clean handoff from scanner → writer → auditor, but nothing verifies that the scanner's raw output is actually grounded in reality before the writer consumes it.

The checkpoint-verifier sits between phases and spot-checks upstream outputs against the actual codebase. After the scanner produces results for a category, the checkpoint-verifier takes a sample of findings (3–5), goes back to the cited file and line number, and confirms that what the scanner described actually exists there. It flags findings where the evidence doesn't match—wrong file, wrong line, described behavior that isn't present, or code that doesn't exist.

checkpoint-verifier

Model: claude-sonnet • Read-only • Evidence verification

Spot-checks upstream sub-agent outputs against the source of truth. Sits between workflow phases to catch phantom grounding, misreading, and error compounding before they propagate downstream.

Tools: Read, Grep, Glob

---
name: checkpoint-verifier
description: "Spot-checks upstream sub-agent outputs against source of truth. Read-only evidence verification."
tools: Read, Grep, Glob
model: sonnet
---
You are a fact-checker for AI-generated findings. Your job is to 
verify that findings are grounded in reality.

For each finding you are given:
1. Go to the cited file and line number
2. Read the actual code (10 lines of context minimum)
3. Compare what the finding claims against what actually exists
4. Classify as:
   - VERIFIED: Evidence matches the claim
   - MISMATCH: Code exists but doesn't match the description
   - NOT_FOUND: Cited file or line doesn't exist or contains 
     unrelated code
   - OVERSTATED: Issue exists but severity or impact is exaggerated

CRITICAL RULES:
1. You MUST check at least [N] findings per category
2. You are FORBIDDEN from assuming a finding is correct without 
   reading the cited file
3. For each finding checked, quote the ACTUAL code you found at 
   the cited location
4. If the actual code differs from the finding's description, 
   explain the discrepancy specifically

Report:
- Verification rate: [VERIFIED count] / [total checked]
- Each finding checked with classification and evidence
- List of MISMATCH and NOT_FOUND findings for removal
- Recommendation: PROCEED if verification rate >= 80%, 
  RE-SCAN if below 80%

The orchestration prompt inserts this between the scanner and writer phases, and again between the auditor and implementer phases. If the verification rate drops below 80% VERIFIED, the orchestrator re-runs the scanner for that category rather than proceeding with corrupted data.
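
The PROCEED / RE-SCAN decision reduces to simple arithmetic, so the orchestrator side can be sketched in a few lines. This is a minimal sketch rather than the article's implementation: the `Check` record and the `gate` function are hypothetical names, and treating an empty sample as RE-SCAN is an assumption, since the prompt does not say what to do when nothing was checked.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """Hypothetical record for one spot-checked finding."""
    finding_id: str
    result: str  # VERIFIED | MISMATCH | NOT_FOUND | OVERSTATED


def gate(checks: list[Check], threshold: float = 0.80) -> tuple[str, float, list[str]]:
    """Return (recommendation, verification rate, finding IDs to remove)."""
    if not checks:
        # Assumption: with no checked findings there is no basis to proceed.
        return "RE-SCAN", 0.0, []
    verified = sum(1 for c in checks if c.result == "VERIFIED")
    rate = verified / len(checks)
    # Only MISMATCH and NOT_FOUND are listed for removal, as in the report spec.
    to_remove = [c.finding_id for c in checks if c.result in ("MISMATCH", "NOT_FOUND")]
    recommendation = "PROCEED" if rate >= threshold else "RE-SCAN"
    return recommendation, rate, to_remove


checks = [
    Check("SEC-01-a", "VERIFIED"),
    Check("SEC-01-b", "VERIFIED"),
    Check("SEC-01-c", "NOT_FOUND"),
    Check("SEC-01-d", "VERIFIED"),
    Check("SEC-01-e", "VERIFIED"),
]
recommendation, rate, to_remove = gate(checks)
print(recommendation, rate, to_remove)
```

With four of five sampled findings verified, the rate sits exactly at the threshold, so the category proceeds and only the NOT_FOUND finding is dropped.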

Remediation 2: Provenance Chains in the Findings Template

Template Change

The current findings templates require evidence (file path, line number, code snippet), which is good. But they don't track which sub-agent produced the finding, what checklist item it traces to, or whether the evidence was independently verified. Adding a provenance section transforms the findings document from a flat list into a traceable chain. If an implementer makes a code change based on a finding that was never verified, that's visible. If a finding doesn't trace to any checklist item, the auditor can flag it as an orphan.

Add these fields to the bottom of every finding in the findings template:

**Provenance:**
- Source: [scanner | validator | triage | manual]
- Checklist item: [ID from checklist.md]
- Verified by: [checkpoint-verifier | not yet verified]
- Verification result: [VERIFIED | MISMATCH | NOT_FOUND | OVERSTATED | PENDING]

The cost is four extra lines per finding. The benefit is that error compounding becomes auditable rather than invisible—you can trace any downstream action back through the chain to its evidential foundation and identify where corruption entered.
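
Once the provenance fields exist, the two checks described above (orphans and unverified findings) become mechanical. The sketch below assumes the findings have already been parsed into records; the `Finding` class and `audit_provenance` function are illustrative names, not part of the article's templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical parsed form of one finding's provenance section."""
    finding_id: str
    checklist_item: Optional[str]  # ID from checklist.md, or None if absent
    verification_result: str       # VERIFIED | MISMATCH | NOT_FOUND | PENDING


def audit_provenance(findings: list[Finding], checklist_ids: set[str]) -> tuple[list[str], list[str]]:
    """Return (orphan finding IDs, finding IDs that are not VERIFIED)."""
    orphans = [f.finding_id for f in findings if f.checklist_item not in checklist_ids]
    unverified = [f.finding_id for f in findings if f.verification_result != "VERIFIED"]
    return orphans, unverified


findings = [
    Finding("F-01", "SEC-03", "VERIFIED"),
    Finding("F-02", "SEC-99", "VERIFIED"),  # traces to no known checklist item
    Finding("F-03", "SEC-03", "PENDING"),   # never independently verified
]
orphans, unverified = audit_provenance(findings, {"SEC-01", "SEC-03"})
print(orphans, unverified)
```

An implementer could refuse to act on any ID in the second list, which is one way to make "change based on an unverified finding" impossible rather than merely visible.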

Remediation 3: Re-Grounding Gates at Phase Boundaries

Orchestration Change

This addresses both error compounding and a subtle form of context decay: the orchestrator's progressive drift from the original task description. Over a long workflow with many sub-agent invocations, the orchestrator may lose sight of why the review was initiated—what the actual concern was, what the scope boundaries were, what matters most.

At each phase boundary (PLAN → EXECUTE, EXECUTE → AUDIT, AUDIT → IMPLEMENT), the orchestration prompt should explicitly re-inject the original task context—not a summary of what's been done so far, but the original request, scope definition, and priority criteria.

## Phase Transition Gate

Before proceeding to [NEXT PHASE]:
1. Re-read the original task description and scope constraints
2. Re-read checklist.md to confirm alignment with original objectives
3. Verify that findings so far are within the defined scope
4. If scope has drifted (findings about things not in the original 
   request), flag for review before proceeding
5. Summarize: "Original objective was X. Current state is Y. 
   Proceeding to [NEXT PHASE] because Z."

This is low-cost and serves double duty: it catches scope drift (a form of goal substitution) and it forces the orchestrator to articulate the connection between what it's done and what it was asked to do, making both error compounding and behavioral drift more visible.

4. Tier 2 Remediations: Silent Corruption

Tier 2 failures are structurally different from Tier 1. Error compounding (Tier 1) is about propagation—a correct process operating on corrupted inputs. Tier 2 failures are about corruption at the source—the sub-agent itself produces output that looks right but isn't. The fixes live mostly inside individual sub-agent definitions and in the file contracts between them.

Remediation 4: Goal-Fidelity Checks

Prompt Addition (Validator) • Prompt Addition (Auditor) • Orchestration Change

Addresses: Goal Substitution

Goal substitution is the most insidious Tier 2 failure because the architecture's strongest existing defenses—checklists and validators—are orthogonal to it. The validator checks whether the checklist is complete (are there gaps?) but not whether it is faithful (does it actually address what was asked?). You can score perfectly on completeness and still have answered the wrong question.

A concrete example: an orchestration prompt says "review this API for backward compatibility risks." The scanner produces a thorough catalog of every endpoint, with schemas, naming conventions, and authentication patterns. The findings document is meticulous. The auditor confirms every checklist item has a corresponding finding. But the scanner cataloged rather than assessed risk—it described what exists rather than evaluating what would break for existing consumers. The output reads beautifully and is completely useless for the stated goal.

Addition to the Validator Sub-Agent Prompt

View Validator Rule Zero

0. GOAL FIDELITY CHECK — run this BEFORE reviewing for gaps.
   Re-read the stated objective (first line of plan.md).
   For every checklist item, verify it is an ACTIONABLE CHECK 
   that directly serves the stated objective.

   A checklist item like "Catalog all endpoints" is a TASK, 
   not a CHECK.
   A checklist item like "For each endpoint, identify request
   and response fields that would break existing consumers if 
   changed" is a CHECK.

   Flag any item that describes work to do without specifying 
   what question the work answers relative to the objective.

   This is your FIRST priority. A complete checklist that 
   doesn't serve the objective is worse than an incomplete 
   one that does.

Addition to the Auditor Sub-Agent Prompt

View Auditor Goal-Fidelity Section

## Goal Fidelity Assessment

Re-read the original task description (first line of plan.md).
For each finding, answer: does this finding help a human make 
a decision about the stated objective?

- RELEVANT: Finding directly addresses the objective
- TANGENTIAL: Finding is true but doesn't help with the 
  stated goal
- SUBSTITUTE: Finding answers a different, easier question 
  than the one asked

If more than 20% of findings are TANGENTIAL or SUBSTITUTE, 
flag the entire review as potentially goal-substituted and 
recommend re-running the scanner with tighter scoping.
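
The 20% rule in the auditor section can be applied mechanically once every finding carries a label. A minimal sketch, assuming the labels have been collected into a dictionary keyed by finding ID; the `goal_fidelity` function name and the input shape are assumptions for illustration.

```python
from collections import Counter


def goal_fidelity(labels: dict[str, str], threshold: float = 0.20) -> tuple[float, bool]:
    """Return (share of off-goal findings, whether the review should be flagged)."""
    if not labels:
        return 0.0, False
    counts = Counter(labels.values())
    off_goal = counts["TANGENTIAL"] + counts["SUBSTITUTE"]
    share = off_goal / len(labels)
    # "More than 20%" is a strict comparison: exactly 20% does not trigger the flag.
    return share, share > threshold


labels = {f"F-{i:02d}": "RELEVANT" for i in range(1, 8)}
labels.update({"F-08": "TANGENTIAL", "F-09": "SUBSTITUTE", "F-10": "SUBSTITUTE"})
share, flagged = goal_fidelity(labels)
print(share, flagged)
```

Here three of ten findings are off-goal, so the review is flagged as potentially goal-substituted.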

Orchestration Change: Capture the Objective

The orchestration prompt should capture the objective as the literal first line of plan.md in a standardized format:

OBJECTIVE: [exact text from the original request]

Currently the plan template tracks execution status but doesn't preserve the original intent in a way that's easy for sub-agents to reference. This one-line change closes that gap.
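
Because the objective sits on a fixed line in a fixed format, any tooling around the workflow can read it without interpretation. The helper below is a hypothetical sketch (the article does not define one); it fails loudly when the line is missing so that a plan without a captured objective cannot pass unnoticed.

```python
def read_objective(plan_text: str) -> str:
    """Extract the objective from the first line of plan.md content."""
    prefix = "OBJECTIVE:"
    lines = plan_text.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith(prefix):
        raise ValueError("plan.md must start with an 'OBJECTIVE:' line")
    return first_line[len(prefix):].strip()


plan = "OBJECTIVE: review this API for backward compatibility risks\n\n## Status\n"
objective = read_objective(plan)
print(objective)
```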

Remediation 5: Premise Analysis

New Sub-Agent (Premise Analyst) • Prompt Addition (Scanners) • New File (premise-report.md)

Addresses: Invisible Assumptions

This is the Tier 2 failure where the existing architecture has the biggest structural gap. Checklists track what to check. Plans track execution status. Findings track results. But nowhere in the pipeline does the agent record the reasoning that connects evidence to conclusion—and the premises embedded in that reasoning are where the most dangerous assumptions hide.

An early version of this remediation used a self-reported assumptions log: scanners were instructed to log ambiguities as they noticed them. This approach is structurally flawed. The most dangerous assumptions are ones the scanner doesn't recognize as assumptions—training-data priors that feel like "just knowing" rather than "deciding." A scanner that sees execute_query(user_input) and flags it as a SQL injection risk is applying a training-data prior, not a deliberate standard. It won't log this as an assumption because from its perspective, it's not one.

The solution has two parts: reasoning chains on every finding and a premise analyst sub-agent that extracts premises across all findings.

Part A: Reasoning Chain Requirement for Scanners

Instead of asking the scanner to self-report assumptions as a separate activity, require it to show its reasoning as part of every finding. This is a more natural task—models are trained to explain reasoning, and chain-of-thought prompting is one of the most reliable techniques for improving output quality. The assumptions emerge as a byproduct of producing better-reasoned findings.

View Reasoning Chain Requirement

REASONING CHAIN — required for EVERY finding.

After recording the finding with its evidence and 
provenance, provide your reasoning in four steps:

STEP 1 — OBSERVATION: What I Saw
What specific code, configuration, pattern, or behavior 
did I observe? Quote or reference the exact evidence.
This should match the provenance chain.

STEP 2 — STANDARD: What I Compared Against
What standard, convention, best practice, or expectation 
am I using to evaluate this observation? Be specific.
- If it's from the checklist, cite the checklist item.
- If it's a published standard, name it (OWASP, CWE, 
  RFC, framework docs).
- If it's a general practice, say "common practice" 
  and describe what the practice is.
- If you're not sure where you learned it, say that.

STEP 3 — INFERENCE: Why This Is a Problem
How does the gap between observation (Step 1) and 
standard (Step 2) constitute a problem? What could 
go wrong? Be specific about the failure mode, not 
just "this is a security risk."

STEP 4 — CONTEXT ASSUMED: What Must Be True
What am I assuming about this project that makes 
Steps 1–3 valid?
- Framework or language version
- Deployment environment (public-facing, internal)
- User trust level (authenticated, anonymous, admin)
- Architecture (monolith, microservices, serverless)
- Anything else that, if wrong, would invalidate 
  this finding

CRITICAL: Do not skip Step 2. "It's a well-known 
vulnerability" is not a standard — name the specific 
standard or practice you're applying.

Part B: The Premise Analyst Sub-Agent

Reasoning chains give you premises per finding. The premise analyst gives you premises across findings—shared assumptions, contradictions, and dependencies that no individual scanner invocation can see because each runs in an isolated context.

premise-analyst

Model: claude-sonnet • Read-only on findings • Cross-finding inference analysis

Reads all reasoning chains from all scanner invocations. Produces a consolidated premise report identifying shared premises, contradictions, ungrounded inferences, and inter-finding dependencies. Replaces the self-reported assumptions log.

Read: all scanner outputs • Read: checklist.md, plan.md • Write: premise-report.md

ROLE: Premise Analyst
PURPOSE: Extract and cross-reference the premises 
underlying all findings to identify patterns that 
individual scanner invocations cannot see.

INPUT: All scanner outputs with reasoning chains, 
checklist.md, plan.md (for OBJECTIVE line)

OUTPUT: premise-report.md with four sections:

## 1. Shared Premises
Premises that appear in reasoning chains of multiple 
findings. For each:
- The premise (stated once)
- Finding IDs that depend on this premise
- Blast radius: how many findings become invalid 
  if this premise is wrong
- Confidence: grounded in evidence, or inferred?

## 2. Contradictory Premises
Cases where different scanner invocations used 
contradictory premises. For each:
- Both premises stated
- Which findings depend on each side
- Which premise is better supported by evidence

## 3. Ungrounded Premises
Findings where Step 2 cites vague authority 
("common practice," "security best practice") 
without a specific standard. For each:
- The premise as stated
- Finding IDs affected
- Suggested specific standard (if you know one)

## 4. Inter-Finding Dependencies
Cases where one finding's conclusion is used as 
another finding's premise. For each:
- Upstream finding ID and its conclusion
- Downstream finding ID and how it uses that 
  conclusion
- Whether the upstream finding was verified

CRITICAL: You analyze reasoning. You do NOT modify 
findings. You do NOT validate evidence (that's the 
checkpoint-verifier's job). You read reasoning 
chains and identify patterns across them.

The premise analyst runs as Phase 2.5—after all scanners and the checkpoint-verifier have completed, but before the writer formats findings. The blast radius counts in the premise report are the most actionable piece: a shared premise with a blast radius of 8 is a single point of failure for a large fraction of the review's findings. If the human reviewer has fifteen minutes, validating one high-blast-radius premise is a better use of time than reading eight individual findings.
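
To make the blast radius idea concrete, here is a minimal sketch of the counting step. It assumes the premises from Step 4 of each reasoning chain have already been normalized to identical strings, which in practice is the hard part and is the work the premise analyst itself performs. The `shared_premises` function and the sample data are hypothetical.

```python
from collections import defaultdict


def shared_premises(premises_by_finding: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Return premises used by more than one finding, largest blast radius first."""
    by_premise: dict[str, list[str]] = defaultdict(list)
    for finding_id, premises in premises_by_finding.items():
        for premise in set(premises):  # count each premise once per finding
            by_premise[premise].append(finding_id)
    shared = [(p, sorted(ids)) for p, ids in by_premise.items() if len(ids) > 1]
    # Sort by blast radius (descending), then alphabetically for stable output.
    return sorted(shared, key=lambda item: (-len(item[1]), item[0]))


chains = {
    "F-01": ["app is public-facing", "queries use raw SQL"],
    "F-02": ["app is public-facing"],
    "F-03": ["app is public-facing", "users are anonymous"],
    "F-04": ["users are anonymous"],
}
report = shared_premises(chains)
for premise, ids in report:
    print(f"{premise}: blast radius {len(ids)} ({', '.join(ids)})")
```

In this sample, checking whether the application really is public-facing settles three findings at once, which is the prioritization argument made above.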

Why This Is Better Than Self-Report

The old instruction said: "When you encounter ambiguity, log it." This requires the scanner to recognize ambiguity, decide it's worth logging, and formulate the entry as a separate activity. The reasoning chain says: "Explain why you think this is a problem." The assumptions emerge as a byproduct of explaining the reasoning, not as a separate introspective task. And the premise analyst's cross-finding analysis—shared premises, contradictions, dependency chains—is structurally impossible for any individual scanner invocation to perform.

Remediation 6: Evidence Verification for Phantom Grounding

Prompt Addition (Scanners) • Orchestration Change

Addresses: Phantom Grounding

The checkpoint-verifier (Remediation 1) catches phantom grounding between phases, but there's a cheaper first line of defense: make the scanner's own output format harder to fabricate. Currently the scanner reports findings with a file path and line number, which is easy for a model to produce even when it hasn't actually read the file. Adding a requirement to include a verbatim code quote changes the economics—the model has to either actually read the file (correct behavior) or fabricate plausible-looking code (harder and more likely to be caught).

View Scanner Evidence Requirements

EVIDENCE REQUIREMENTS:
For every finding, you MUST include:
- Exact file path (verify by re-reading the file before 
  reporting)
- Line number range (not a single line — the range 
  containing the relevant code)
- VERBATIM code quote: Copy the exact lines from the file. 
  Do not paraphrase or summarize code. If you cannot quote 
  the exact code, you do not have sufficient evidence for 
  the finding.

Before finalizing your results for a category, re-read at 
least 3 of the files you've cited to verify your line 
numbers are still accurate.
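
A verbatim quote is also cheap to check mechanically. The sketch below tests whether a quoted snippet really appears inside the cited line range. It is an illustration under assumptions: the `quote_matches` function is a hypothetical helper, and ignoring indentation and trailing whitespace is a leniency chosen here, not something the evidence requirements specify.

```python
def quote_matches(file_lines: list[str], start: int, end: int, quote: str) -> bool:
    """True if `quote` appears as consecutive lines within lines start..end (1-indexed, inclusive)."""
    if start < 1 or end > len(file_lines) or start > end:
        return False  # the cited range does not exist in this file
    cited = [line.strip() for line in file_lines[start - 1:end]]
    wanted = [line.strip() for line in quote.strip().splitlines()]
    if not wanted:
        return False
    # Slide a window over the cited range looking for the quoted lines in order.
    for i in range(len(cited) - len(wanted) + 1):
        if cited[i:i + len(wanted)] == wanted:
            return True
    return False


source = [
    "def handler(request):",
    "    user_input = request.args['q']",
    "    rows = execute_query(user_input)",
    "    return render(rows)",
]
real = quote_matches(source, 2, 3, "user_input = request.args['q']\nrows = execute_query(user_input)")
fabricated = quote_matches(source, 2, 3, "rows = db.raw(user_input)")
print(real, fabricated)
```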

Orchestration Change: File Existence Verification

The orchestrator should verify that cited files exist before passing scanner results to the writer:

After receiving scanner results for each category:
1. Extract all file paths cited in findings
2. Verify each file exists (ls or glob)
3. If any cited file does not exist, remove the finding 
   and log it as PHANTOM in premise-report.md
4. Only pass verified findings to the writer sub-agent

This won't catch findings where the file exists but the described behavior is wrong—that's the checkpoint-verifier's job—but it eliminates the cheapest fabrications at near-zero cost.
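
The existence check can run outside the model entirely. A minimal sketch, assuming findings arrive as dictionaries with a `file` key holding a path relative to the repository root; the `split_phantoms` function and that input shape are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def split_phantoms(findings: list[dict], repo_root: Path) -> tuple[list[dict], list[dict]]:
    """Separate findings whose cited file exists from those citing a missing file."""
    kept, phantoms = [], []
    for finding in findings:
        target = repo_root / finding["file"]
        (kept if target.is_file() else phantoms).append(finding)
    return kept, phantoms


# Demonstration against a throwaway directory standing in for the repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('hi')\n")
    findings = [
        {"id": "F-01", "file": "app.py"},
        {"id": "F-02", "file": "missing.py"},
    ]
    kept, phantoms = split_phantoms(findings, root)

print([f["id"] for f in kept], [f["id"] for f in phantoms])
```

Only the first list would be passed to the writer; the second would be logged as PHANTOM.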

Remediation 7: First-Principles Completeness Check

Prompt Addition (Validator)

Addresses: Completeness Gaps (enhancement)

Completeness gaps are already the best-addressed failure mode in the architecture. The residual risk is that all three layers share the same blind spot: they can only check for categories that someone thought to include. Add a "from first principles" section to the validator prompt that runs after the standards-based review:

View First-Principles Check

FIRST-PRINCIPLES CHECK — run AFTER the standards-based review.

Ignore the checklist entirely. Look at the actual codebase 
or system under review.
Ask: "If I were building this from scratch, what could go 
wrong that would be embarrassing or dangerous?"

List anything that comes to mind that is NOT already on the 
checklist. These are your most valuable findings — the things 
no standard thought to include because they're specific to 
this particular codebase.

This creates a second cognitive path that isn't anchored on 
the canonical checklist.

This is a small addition, and it is often enough to catch the "obvious in hindsight" categories that standards-based reviews miss.

5. Tier 3 Remediations: Degradation Over Time

Tier 3 failures are structurally different from Tiers 1 and 2. They're not about corrupted evidence or wrong objectives; they're about the workflow gradually losing effectiveness as it runs. Six of the seven Tier 3 modes (every mode except recovery failure) are already fully or partially addressed by the existing architecture. The remediations here target that residual risk and provide a full solution for recovery failure, which has no existing mitigation.

Remediation 8: Orchestrator Re-Anchoring Protocol

Orchestration Change

Addresses: Behavioral Drift (residual risk in the orchestrator)

Sub-agent isolation already prevents behavioral drift within individual sub-agents. The remaining risk is in the orchestrator itself, which runs continuously across the entire workflow. Over a long session with many sub-agent invocations, the orchestrator may progressively soften quality requirements, accept marginal output it would have rejected earlier, or skip steps it considers "probably fine."

View Orchestrator Re-Anchoring Protocol

DRIFT PREVENTION — PERIODIC SELF-CHECK

Every 3 sub-agent invocations, pause and run this check:

1. Re-read the OBJECTIVE line from plan.md
2. Re-read your orchestration instructions (this prompt)
3. Review the last 3 sub-agent outputs you accepted:
   - Did you accept any output that was incomplete or 
     below the quality bar?
   - Did you skip any verification step because you 
     assumed correctness?
   - Did you soften any requirement from the original 
     instructions?

4. If the answer to any of the above is YES:
   - Log what you softened and why in premise-report.md
   - Re-read the full orchestration prompt before 
     proceeding
   - Consider re-running the sub-agent whose output 
     you accepted too readily

5. If the answer to all is NO, proceed normally.

This check takes ~30 seconds. The cost of drift is 
re-running the entire workflow.

The key design choice is making this periodic (every N invocations) rather than continuous. Continuous self-monitoring would consume context budget and actually contribute to context decay.
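
If the orchestrator is driven by a script rather than by the prompt alone, the periodic trigger can live outside the model's context. The sketch below is one hypothetical way to do that; the `ReanchorCounter` class is not part of the article's architecture.

```python
class ReanchorCounter:
    """Signals when the periodic self-check is due, every `every` invocations."""

    def __init__(self, every: int = 3) -> None:
        self.every = every
        self.count = 0

    def record_invocation(self) -> bool:
        """Record one sub-agent invocation; return True if the self-check is due now."""
        self.count += 1
        return self.count % self.every == 0


counter = ReanchorCounter(every=3)
due = [counter.record_invocation() for _ in range(7)]
print(due)
```

Keeping the counter in code means the check fires on schedule even if the orchestrator has already drifted far enough to forget it.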

Remediation 9: Definition-of-Done Completion Criteria

Template Change (Checklist) • Prompt Addition (Auditor)

Addresses: Premature Convergence

The current auditor check is binary: is there a finding for this checklist item, yes or no? A single shallow finding satisfies the check even if the item required thorough investigation across multiple files. Add explicit, testable completion criteria to each checklist item:

## [SEC-03] Error Handling Review
- **Check:** Verify all API endpoints handle errors without 
  leaking internal state
- **Completion criteria:**
  1. Every controller/handler file was opened and read
  2. Each endpoint's error path was examined (not just happy path)
  3. Finding includes count: "Reviewed N of M endpoints"
  4. Any endpoint NOT reviewed is listed with reason
- **Done when:** All criteria met, or gaps explicitly 
  logged as PARTIAL with justification

View Auditor Completion-Criteria Verification

COMPLETION QUALITY CHECK — run for EVERY checklist item.

Do not just check whether a finding exists for this item.
Check whether the finding SATISFIES THE COMPLETION CRITERIA.

For each checklist item:
1. Read the completion criteria
2. Read the corresponding finding(s)
3. Verify each criterion is met
4. Classify as:
   - COMPLETE: All criteria satisfied
   - PARTIAL: Some criteria met, gaps acknowledged
   - SHALLOW: Finding exists but doesn't meet criteria
   - MISSING: No finding at all

SHALLOW is worse than MISSING — it creates false 
confidence. Flag all SHALLOW items for re-scanning.

GAP CLASSIFICATION — for MISSING and SHALLOW items:

   - SKIPPED: No scanner invocation was attempted. 
     → Safe for auto-completion (re-invoke scanner)
   
   - BLOCKED: Scanner was invoked but failed.
     → NOT safe for auto-completion (will fail again)
   
   - EMPTY: Scanner ran and genuinely found nothing. 
     → NOT safe for auto-completion (risks fabrication)

This classification feeds the auto-completion loop: 
only SKIPPED items are re-scanned automatically.
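The four-way completion classification above reduces to a small decision function. This is a sketch under assumptions: in the workflow the auditor is an LLM following the prompt, and the inputs here (`finding_exists`, `criteria_met`, `gaps_logged`) are hypothetical stand-ins for its judgments.

```python
from enum import Enum


class Completion(Enum):
    COMPLETE = "COMPLETE"
    PARTIAL = "PARTIAL"
    SHALLOW = "SHALLOW"
    MISSING = "MISSING"


def classify_completion(
    finding_exists: bool, criteria_met: list[bool], gaps_logged: bool
) -> Completion:
    """Map per-criterion results for one checklist item to a status."""
    if not finding_exists:
        return Completion.MISSING
    if criteria_met and all(criteria_met):
        return Completion.COMPLETE
    # Some criteria met AND the gaps were explicitly acknowledged.
    if any(criteria_met) and gaps_logged:
        return Completion.PARTIAL
    # A finding exists, but it neither meets the criteria nor admits the gap.
    return Completion.SHALLOW


# One of four criteria met, no gaps logged: looks done, but is not.
status = classify_completion(True, [True, False, False, False], gaps_logged=False)
```

The ordering of the branches matters: PARTIAL is only reachable when the gap is acknowledged, which is what separates an honest partial result from a SHALLOW one.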

Why SHALLOW Is Worse Than MISSING

A MISSING item is visible: the audit report flags it, and the human reviewer knows to investigate. A SHALLOW item looks like it was completed. The finding exists, it references the right category, and a casual reviewer will skip past it. The damage is false confidence: the team believes error handling was reviewed when in fact only one of forty endpoints was checked.

The Auto-Completion Loop

The gap classification enables Phase 3.5: for each SKIPPED item, the orchestrator automatically re-invokes the scanner, runs the checkpoint-verifier on new results, and merges verified findings. The loop runs exactly once. Repeated auto-remediation risks the scanner producing lower-quality findings under implicit "you must find something" pressure—a subtle form of phantom grounding.
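A minimal sketch of the single-pass loop, assuming a Python harness around the orchestrator. The `rescan` and `verify` callables are hypothetical stand-ins for the scanner and checkpoint-verifier invocations, and the checklist IDs other than SEC-03 are made up for the example.

```python
from typing import Callable


def auto_complete_once(
    gaps: dict[str, str],
    rescan: Callable[[str], list[str]],
    verify: Callable[[str], bool],
) -> dict[str, list[str]]:
    """Re-scan SKIPPED items exactly once and keep only verified findings."""
    merged: dict[str, list[str]] = {}
    for item_id, gap_type in gaps.items():
        if gap_type != "SKIPPED":
            # BLOCKED would fail again; EMPTY risks fabricated findings.
            continue
        merged[item_id] = [f for f in rescan(item_id) if verify(f)]
    return merged  # No second pass: remaining gaps go to a human.


# Stub scanner and verifier, for illustration only.
gaps = {"SEC-03": "SKIPPED", "SEC-04": "BLOCKED", "SEC-05": "EMPTY"}
result = auto_complete_once(
    gaps,
    rescan=lambda item: [f"{item}: finding A", f"{item}: unverifiable"],
    verify=lambda finding: "unverifiable" not in finding,
)
```

There is deliberately no retry inside the function. Anything still missing after this pass is left for human attention rather than pushed back through the scanner.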

Remediation 10: Error Recovery Scaffolding

Orchestration Change

Addresses: Recovery Failure

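This is the Tier 3 failure mode with no existing mitigation at all. When the orchestrator hits an error, it either loops on the same failing approach or silently skips the problematic step. Looping burns budget without making progress; silent skipping leaves a gap in the audit that no one knows to look for.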

View Error Recovery Scaffolding

ERROR RECOVERY PROTOCOL

When a sub-agent fails or returns unusable output:

ATTEMPT 1 — Retry with the same approach
- Re-run the sub-agent with the same inputs
- Some failures are transient

ATTEMPT 2 — Retry with variation
- Reduce scope (scan fewer files per invocation)
- Change granularity (scan by directory instead of 
  by category)
- Simplify the task (split complex checklist items)
- Log what you changed and why

ATTEMPT 3 — Abandon and log
- Log the failure in the audit report:
  Status: BLOCKED
  Checklist item: [ID]
  Approach 1: [what was tried, what failed]
  Approach 2: [what was tried, what failed]
  Recommendation: Manual review required
- Proceed to the next checklist item

CRITICAL RULES:
1. NEVER try the same failing approach more than twice
2. NEVER silently skip a failed item — it MUST appear 
   as BLOCKED in the audit report
3. NEVER let a single failure block the entire workflow
4. After 3 BLOCKED items in the same category, pause 
   and flag for human review

The key principle: visible failure is better than invisible skipping. A BLOCKED item in the audit report gets human attention. A silently skipped item gets nothing.
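The escalation ladder can be sketched as follows, assuming a Python harness in which each attempt is a callable that returns usable output or `None`. The names `recover`, `retry_same`, and `retry_varied` are hypothetical; the BLOCKED record mirrors the fields of the protocol's log entry.

```python
from typing import Callable, Optional


def recover(
    item_id: str,
    retry_same: Callable[[], Optional[str]],
    retry_varied: Callable[[], Optional[str]],
) -> dict:
    """Run the recovery ladder after a sub-agent's first failure."""
    attempts = []

    # ATTEMPT 1: same inputs, in case the failure was transient.
    output = retry_same()
    if output is not None:
        return {"item": item_id, "status": "OK", "output": output}
    attempts.append("Approach 1: re-ran with same inputs, still unusable")

    # ATTEMPT 2: reduced scope, different granularity, or a simpler task.
    output = retry_varied()
    if output is not None:
        return {"item": item_id, "status": "OK", "output": output}
    attempts.append("Approach 2: retried with variation, still unusable")

    # ATTEMPT 3: abandon visibly instead of looping or skipping silently.
    return {
        "item": item_id,
        "status": "BLOCKED",
        "attempts": attempts,
        "recommendation": "Manual review required",
    }


blocked = recover("SEC-03", lambda: None, lambda: None)
rescued = recover("SEC-03", lambda: None, lambda: "3 findings")
```

Because every exit path returns a record, a failed item cannot disappear from the report: it is either resolved or BLOCKED.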

Remediation 11: Orchestrator Role Boundaries

Prompt Addition — Orchestrator

Addresses: Role Confusion (residual risk in orchestrator)

The archetypes' enforced tool restrictions make it structurally impossible for a scanner to write files or for a validator to approve work. The residual risk is in the orchestrator, which coordinates across all phases and has access to all tools. Under pressure, the orchestrator may start doing sub-agent work itself.

View Orchestrator Role Boundary Rules

ROLE BOUNDARIES — WHAT THE ORCHESTRATOR DOES AND DOES NOT DO

The orchestrator COORDINATES. It does not EXECUTE.

YOU DO:
- Invoke sub-agents with appropriate inputs
- Pass outputs between sub-agents
- Track progress against the plan
- Make sequencing decisions (what runs next)
- Run lightweight verification (file existence checks)
- Apply the error recovery protocol
- Manage re-grounding gates at phase transitions

YOU DO NOT:
- Read source code files to assess them yourself
  (that's the scanner's job)
- Decide whether a finding is valid or invalid
  (that's the checkpoint-verifier's job)
- Identify gaps in the checklist yourself
  (that's the validator's job)
- Modify findings content
  (that's the writer's job)
- Make implementation decisions
  (that's the human's job)

If you find yourself reading source code for any reason 
other than passing it to a sub-agent, STOP. You are 
doing a sub-agent's job.

Remediation 12: Scope Boundary Enforcement

Prompt Addition — Implementer | Prompt Addition — Scanners | Orchestration Change

Addresses: Scope Creep

Scope creep is the structural mirror of premature convergence: where premature convergence is the agent being too lazy, scope creep is the agent being too diligent. An implementer asked to fix a null pointer check also refactors the surrounding function "while it's in there." In read-only workflows, scope creep wastes tokens. In write workflows, it modifies files outside the agent's mandate—potentially breaking things that were working fine.

View Implementer Scope Boundary Rules

SCOPE BOUNDARIES — STAY IN YOUR LANE

You are authorized to modify ONLY the files and code 
regions specified in the finding you were given.

BEFORE making any change, verify:
1. Is this file listed in the finding? If NO → STOP.
2. Is this code region within the line range cited? 
   If NO → STOP.
3. Is this change directly required to resolve the 
   finding? If NO → STOP.

If you notice issues OUTSIDE the current finding's scope:
- Do NOT fix them
- Do NOT refactor adjacent code
- DO log them as a brief note:
  "NOTICED: [file:line] [brief description] — out of 
  scope for current finding, recommend separate review"

Your job is the scoped fix and ONLY the scoped fix.

View Scanner Scope Boundary Rules

SCOPE BOUNDARIES — STAY ON TASK

Report ONLY findings that address the checklist item 
you were invoked for. 

If you notice issues outside the current scope:
- Do NOT investigate them
- Do NOT include them in your main findings
- DO log them in a DEFERRED section at the end:
  
  ## DEFERRED — Out of Scope Observations
  - [file:line] [brief description] [which checklist 
    item it might relate to]

The orchestrator will route deferred items to the 
appropriate scanner invocation.

Orchestration Change: Scope Verification

After each implementer invocation:
1. Compare files modified against files listed in the finding
2. If any file was modified that is NOT in the finding:
   - Flag as SCOPE_CREEP in the remediation log
   - Revert the out-of-scope change if possible
   - Log the change for human review
3. Only pass in-scope changes to the verifier
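The comparison in the steps above is a set difference. A minimal sketch, assuming the harness can list the files an implementer touched (for example from version control); `check_scope` and the file paths are hypothetical.

```python
def check_scope(
    files_modified: set[str], files_in_finding: set[str]
) -> dict[str, list[str]]:
    """Split an implementer's modified files into in-scope and out-of-scope."""
    return {
        "in_scope": sorted(files_modified & files_in_finding),
        # Anything here gets flagged, reverted if possible, and logged.
        "SCOPE_CREEP": sorted(files_modified - files_in_finding),
    }


report = check_scope(
    files_modified={"api/users.py", "api/utils.py"},
    files_in_finding={"api/users.py"},
)
```

Only `report["in_scope"]` would be passed on to the verifier; the `SCOPE_CREEP` list goes to the remediation log for human review.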

Remediation 13: Cross-Scanner Consistency Reconciliation

Orchestration Change

Addresses: Coordination Divergence

The premise analyst (Remediation 5) detects contradictions across scanner reasoning chains—but it doesn't resolve them. Detection and resolution are deliberately separated: the premise analyst is a read-only analytical agent; the orchestrator is where operational decisions happen. Without an explicit protocol for acting on the premise analyst's findings, the orchestrator will do what LLMs do by default—quietly pick whichever premise appeared most recently and move on, merging contradictory scanner outputs without noticing the conflict.

This reconciliation step is the orchestrator's action protocol for premise-level contradictions. It depends on premise-report.md as its primary input and cannot function without it. But it also adds an independent check that the premise analyst cannot perform: entity-level consistency across scanner outputs. When three scanners each analyze a different category of the same codebase, they may diverge not just on articulated premises but on operational facts that never surface in reasoning chains—one scanner treating a component as stateless while another assumes it maintains session state, or one scanner identifying an endpoint as internal while another's findings assume it's public-facing. These sub-reasoning-chain divergences won't appear in the Contradictory Premises section because they were never stated as premises in the first place.

View Cross-Scanner Consistency Check

CONSISTENCY RECONCILIATION — RUN AFTER ALL SCANNERS COMPLETE

Before passing scanner results to the writer:

1. Read premise-report.md and check the Contradictory 
   Premises section
2. For each contradiction identified by the premise analyst:
   - Determine which premise is better supported
   - Flag affected findings as DEPENDS_ON_RECONCILIATION
   - Include the conflict in the auditor's review scope
3. If no contradictions: proceed normally

The premise analyst identifies the contradictions; 
the orchestrator decides how to handle them.

The protocol above handles contradictions the premise analyst surfaces. The following lightweight check addresses the residual gap—divergences in how scanners treat shared entities that never rise to the level of articulated premises.

View Entity-Level Consistency Check

ENTITY CONSISTENCY CHECK — RUN ALONGSIDE PREMISE RECONCILIATION

Before passing scanner results to the writer, compare 
how different scanner invocations characterize shared 
components:

1. Extract key entities referenced by multiple scanners
   (endpoints, services, data stores, modules, external 
   dependencies)
2. For each shared entity, check whether scanners agree on:
   - What it is (library vs. service, stateless vs. stateful)
   - How it's accessed (internal vs. public-facing)
   - What trust level it operates at
   - Whether it handles sensitive data
3. If characterizations conflict:
   - Flag affected findings as DEPENDS_ON_RECONCILIATION
   - Note: these conflicts will NOT appear in 
     premise-report.md because they were never stated 
     as explicit premises
   - Include the conflict in the auditor's review scope

This check gives the reconciliation step independent 
value beyond acting on the premise analyst's output.
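One way to mechanize the entity check is to normalize each scanner's characterizations into tuples and group them. This is a sketch under assumptions: the tuple layout, the function name, and the scanner and entity names are all hypothetical, and extracting the claims from free-text scanner output is the hard part that this example does not cover.

```python
from collections import defaultdict


def find_entity_conflicts(
    claims: list[tuple[str, str, str, str]],
) -> dict[tuple[str, str], dict[str, list[str]]]:
    """Group (scanner, entity, attribute, value) claims and keep disagreements."""
    seen: dict = defaultdict(lambda: defaultdict(list))
    for scanner, entity, attribute, value in claims:
        seen[(entity, attribute)][value].append(scanner)
    # A conflict is any entity attribute with more than one distinct value.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


claims = [
    ("auth-scanner", "/admin/export", "exposure", "internal"),
    ("data-scanner", "/admin/export", "exposure", "public-facing"),
    ("auth-scanner", "session-store", "state", "stateful"),
    ("perf-scanner", "session-store", "state", "stateful"),
]
conflicts = find_entity_conflicts(claims)
```

Each key in `conflicts` identifies findings to flag as DEPENDS_ON_RECONCILIATION; entities the scanners agree on drop out.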

Remediation 14: Irreversibility Gates

Prompt Addition — Implementer | Orchestration Change

Addresses: Tool Model Mismatch

Tool model mismatch is the hardest Tier 3 failure to prevent proactively because you can't prompt-engineer away a competence gap. The most effective structural defense: before executing any destructive or hard-to-reverse operation, the agent must articulate what the operation does, whether it's reversible, and what the rollback procedure is.

View Irreversibility Gate

IRREVERSIBILITY GATE — DESTRUCTIVE OPERATIONS

Before executing ANY operation that modifies persistent 
state, answer these questions:

1. WHAT does this operation do? 
   (Describe in plain language)
2. IS this operation reversible?
   - YES → proceed, but note the rollback command
   - PARTIALLY → proceed with caution, document what 
     can't be rolled back
   - NO → STOP. Do not execute without human approval.
3. WHAT is the rollback procedure?
   (Specific commands, not "undo the change")
4. WHAT happens if this operation fails midway?

Operations that ALWAYS require this gate:
- Database migrations (CREATE, ALTER, DROP)
- File deletions (rm, unlink)
- Git operations that rewrite history
- Package or dependency changes affecting lockfiles
- Infrastructure changes (container configs, CI/CD)
- Permission or access control changes

If you cannot articulate the rollback procedure for 
a destructive operation, log it as NEEDS_HUMAN_REVIEW 
and move to the next finding. Do NOT guess.

The Competence Gap Problem

Tool model mismatch is fundamentally different from the other failure modes because no amount of architectural design can give the model knowledge it doesn't have. What the irreversibility gate does is change the failure mode: instead of confidently executing a wrong command, the agent either (a) explains its understanding correctly, giving the verifier something to check, (b) reveals its misunderstanding, giving the human a signal to intervene, or (c) correctly identifies uncertainty and escalates. All three outcomes are better than silent misuse.
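The gate's branching can be sketched as a function over the agent's four answers. This assumes a Python harness that parses those answers into a record. `GateAnswers`, `gate_decision`, the table name in the example, and every status string except NEEDS_HUMAN_REVIEW are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateAnswers:
    what: str             # plain-language description of the operation
    reversible: str       # "YES", "PARTIALLY", or "NO"
    rollback: str         # specific rollback commands, empty if unknown
    if_fails_midway: str  # expected state after a partial failure


def gate_decision(answers: GateAnswers) -> str:
    """Apply the gate's branching to one proposed operation."""
    if answers.reversible == "NO":
        return "STOP_NEEDS_HUMAN_APPROVAL"
    if answers.reversible not in ("YES", "PARTIALLY") or not answers.rollback.strip():
        # Unknown reversibility or no concrete rollback: do not guess.
        return "NEEDS_HUMAN_REVIEW"
    if answers.reversible == "PARTIALLY":
        return "PROCEED_WITH_CAUTION"
    return "PROCEED"


drop_table = GateAnswers(
    what="Drop the legacy_sessions table",
    reversible="NO",
    rollback="",
    if_fails_midway="Table may be partially removed",
)
decision = gate_decision(drop_table)
```

The function cannot tell whether the answers are correct. Its value is that a wrong answer is now written down where the verifier or a human can catch it.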

6Tier 4 Remediations: Tractable but Important

Tier 4 contains two failure modes that are real and worth fixing but fundamentally more tractable than the higher tiers. State confusion is a bookkeeping problem with a bookkeeping solution. Overconfidence uniformity is a communication problem with a formatting solution.

Remediation 15: State Snapshots and Diff Verification

Orchestration Change | Prompt Addition — Verifier

Addresses: State Confusion

The current implementer-then-verifier pattern partially addresses state confusion: the verifier re-reads the modified file. But the verifier doesn't know what the file looked like before the change, so it can't detect unintended side effects.

Before invoking the implementer for each finding:
1. Read the target file(s) and save content as BEFORE_STATE
2. Invoke the implementer with scoped instructions
3. After implementer completes, read the same file(s) as AFTER_STATE
4. Pass BEFORE_STATE and AFTER_STATE to the verifier

View Verifier State-Diff Check

STATE-DIFF VERIFICATION

You will receive BEFORE_STATE and AFTER_STATE for each 
implementation. Verify that:

1. INTENDED CHANGE APPLIED:
   - The specific fix was implemented correctly
   - The fix addresses the root cause, not just the symptom

2. NO UNINTENDED CHANGES:
   - Compare BEFORE_STATE and AFTER_STATE line by line
   - Flag ANY change not explained by the implementation:
     - COSMETIC: Whitespace, formatting (low risk)
     - REFACTOR: Code restructuring beyond fix (medium risk)
     - FUNCTIONAL: Logic changes beyond fix (high risk)
     - SCOPE CREEP: Changes to unmentioned code (high risk)

3. STATE CONSISTENCY:
   - Does the file still parse/compile after changes?
   - Are imports, dependencies, and references intact?

Report:
- Intended change: APPLIED | PARTIALLY APPLIED | NOT APPLIED
- Unintended changes: NONE | COSMETIC ONLY | FLAGGED [list]
- State consistency: CONSISTENT | ISSUES [list]
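The line-by-line comparison can be partly mechanized before the verifier sees it. The sketch below uses Python's standard `difflib` to flag changed regions outside the line range cited in the finding. The function name, the sample states, and the idea of pre-computing this in the harness are assumptions, not part of the article's workflow.

```python
import difflib


def unintended_changes(
    before: str, after: str, allowed_start: int, allowed_end: int
) -> list[tuple[str, int, int]]:
    """Return changed regions that fall outside the allowed BEFORE_STATE lines.

    Regions are (opcode, first_line, last_line), 1-based and inclusive,
    measured against BEFORE_STATE. A pure insertion is reported at the
    line it was inserted before.
    """
    matcher = difflib.SequenceMatcher(
        a=before.splitlines(), b=after.splitlines(), autojunk=False
    )
    flagged = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        first, last = i1 + 1, max(i2, i1 + 1)
        if first < allowed_start or last > allowed_end:
            flagged.append((tag, first, last))
    return flagged


before_state = "a = 1\nb = None\nc = 3\nd = 4"
after_state = "a = 1\nb = 0\nc = 3\nd = 40"
# The finding cited line 2 only, so the edit to line 4 is flagged.
flagged = unintended_changes(before_state, after_state, 2, 2)
```

A diff like this only locates unexplained changes. Deciding whether one is COSMETIC, REFACTOR, or FUNCTIONAL still requires the verifier to read it.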

Remediation 16: Confidence Calibration

Template Change — Findings | Prompt Addition — Scanners

Addresses: Overconfidence Uniformity

When every finding is presented with identical confidence, the human reviewer has no signal for where to apply scrutiny. Add a confidence field to each finding:

**Confidence:** [HIGH | MEDIUM | LOW | UNCERTAIN]
**Confidence basis:** [one-sentence justification]

View Scanner Confidence Calibration Rules

CONFIDENCE CALIBRATION

For every finding, assign a confidence level and explain why.

HIGH — You read the exact code and it clearly exhibits the 
described behavior. The pattern is unambiguous.

MEDIUM — The code likely exhibits the behavior but context 
could change the interpretation. You checked the file but 
the relevant code spans multiple files you may not have 
fully traced.

LOW — You're inferring from indirect signals (naming, file 
structure, comments). The finding depends on runtime 
behavior you can't observe from static analysis.

UNCERTAIN — You're flagging something unusual but genuinely 
don't know if it's a problem. You want human attention but 
aren't making a claim.

CRITICAL RULES:
1. Do NOT default to HIGH. Most findings should be MEDIUM — 
   you are reading code statically and inferring behavior.
2. An honest LOW finding is more valuable than an inflated 
   HIGH finding.
3. The confidence basis must reference specific evidence or 
   specific uncertainty.

Interaction with Human Review

Confidence calibration lets the reviewer cross-reference severity against confidence. A CRITICAL-severity, HIGH-confidence finding needs a disposition decision. A CRITICAL-severity, LOW-confidence finding needs investigation first. This is the difference between "fix this" and "check whether this is actually a problem before deciding whether to fix it."

7. The Hardened Workflow Sequence

With all remediations in place, the workflow gains verification, premise analysis, and re-grounding steps at each phase boundary. New steps are marked ← NEW.

1: PLAN → 2: SCAN → 2a: VERIFY → 2.5: PREMISES → 2b: WRITE → 3: AUDIT → 3.5: AUTO-COMP → 4: IMPLEMENT

Phase 1: PLAN
  Main Agent → Generate checklist from template + domain context
  Main Agent → Capture objective as first line of plan.md           ← NEW
  Main Agent → Validator sub-agent → Return gap analysis
  Main Agent → Merge gaps → Create execution plan

Phase 1→2: RE-GROUNDING GATE                                       ← NEW
  Main Agent → Re-read objective → Verify scope alignment

Phase 2: EXECUTE
  Main Agent → Scanner sub-agent(s) → Return raw results 
               with REASONING CHAINS (per category)                 ← NEW
  Main Agent → File existence check → Remove phantom findings       ← NEW
  Main Agent → Checkpoint-verifier → Spot-check evidence            ← NEW

Phase 2.5: PREMISE ANALYSIS                                         ← NEW
  Main Agent → Premise analyst sub-agent
    → Read all reasoning chains from all scanner outputs
    → Produce premise-report.md (shared, contradictions, 
      ungrounded, dependencies)

Phase 2→3: RE-GROUNDING GATE                                       ← NEW
  Main Agent → Re-read objective → Verify findings serve goal

  Main Agent → Writer sub-agent → Format findings 
               (with provenance + reasoning chains)

Phase 3: AUDIT
  Main Agent → Auditor sub-agent → Return completeness report
    → Includes premise review (from premise-report.md)              ← NEW
    → Includes goal-fidelity assessment                             ← NEW
    → Classifies gaps as SKIPPED | BLOCKED | EMPTY                  ← NEW

Phase 3.5: AUTO-COMPLETION (single pass)                            ← NEW
  For each SKIPPED item:
    → Re-invoke scanner (with reasoning chain requirement)
    → Run checkpoint-verifier on new results
    → Run premise analyst incrementally on new chains
    → Merge into findings
  Re-run auditor on updated findings

Phase 3→4: RE-GROUNDING GATE                                       ← NEW
  Main Agent → Re-read objective → Verify scope before implementation

Phase 4: IMPLEMENT (if applicable)
  Human checkpoint → Approve/defer/reject findings
  Main Agent → Implementer sub-agent → Make changes
  Main Agent → State snapshot (before/after)                        ← NEW
  Main Agent → Checkpoint-verifier → Verify implementation          ← NEW
  Main Agent → Verifier sub-agent → State-diff confirmation         ← NEW

8. Summary: All Architecture Changes

The total footprint across all four tiers: two new sub-agents, one new file, one new workflow phase, orchestration enhancements, and prompt additions to existing sub-agent types. No existing components are removed or restructured.

Change	Type	Component	Addresses
TIER 1 — STRUCTURAL RISKS
checkpoint-verifier	New Sub-Agent	Between workflow phases	Error compounding, phantom grounding
Provenance fields	Template Change	findings.md template	Error compounding, no audit trail
Re-grounding gates	Orchestration	Phase transitions	Context decay, goal substitution
TIER 2 — SILENT CORRUPTION
Objective capture	Orchestration	plan.md first line	Goal substitution
Validator rule zero	Prompt Addition	All validator sub-agents	Goal substitution
Goal-fidelity assessment	Prompt Addition	Auditor sub-agent	Goal substitution
Reasoning chains	Prompt Addition	All scanner sub-agents	Invisible assumptions
premise-analyst	New Sub-Agent	Phase 2.5 (after verify, before write)	Invisible assumptions, coordination divergence
premise-report.md	New File	Per-review directory	Invisible assumptions
Verbatim evidence	Prompt Addition	All scanner sub-agents	Phantom grounding
File existence check	Orchestration	Between scanner and writer	Phantom grounding
First-principles check	Prompt Addition	All validator sub-agents	Completeness gaps
TIER 3 — DEGRADATION OVER TIME
Re-anchoring protocol	Orchestration	Orchestrator (periodic)	Behavioral drift
Completion criteria	Template Change	Checklist items	Premature convergence
Completion quality check	Prompt Addition	Auditor sub-agent	Premature convergence
Gap classification	Prompt Addition	Auditor sub-agent	Completeness gaps, premature convergence
Auto-completion loop	Orchestration	Phase 3.5 (after auditor)	Completeness gaps
Error recovery scaffolding	Orchestration	Orchestrator (on failure)	Recovery failure
Role boundary rules	Prompt Addition	Orchestrator	Role confusion
Scope boundary enforcement	Prompt Addition	Implementer + scanner sub-agents	Scope creep
Scope verification check	Orchestration	After each implementer invocation	Scope creep
Consistency reconciliation	Orchestration	After all scanners complete (requires: premise-report.md)	Coordination divergence
Irreversibility gate	Prompt Addition	Implementer sub-agent	Tool model mismatch
Destructive operation logging	Orchestration	Orchestrator (on destructive ops)	Tool model mismatch
TIER 4 — TRACTABLE BUT IMPORTANT
State snapshots	Orchestration	Before/after implementer	State confusion
State-diff verification	Prompt Addition	Verifier sub-agent	State confusion
Confidence field	Template Change	Findings template	Overconfidence uniformity
Confidence calibration rules	Prompt Addition	All scanner sub-agents	Overconfidence uniformity

Updated Sub-Agent Archetypes

With the checkpoint-verifier and premise analyst added, the architecture has nine archetypes:

Archetype	Purpose	Key Constraint	Changes in This Guide
Scanner	Fast, thorough exploration	Read-only; cannot modify	+ reasoning chains, + verbatim evidence, + confidence calibration, + scope boundary rules
Validator	Adversarial gap-finding	Must find problems; forbidden from approving	+ goal fidelity rule zero, + first-principles check
Writer	Consistent documentation	Enforces template structure	+ provenance chain fields, + confidence field, + reasoning chains in output
Auditor	Completeness verification	Compares plan vs. actual work	+ goal fidelity assessment, + premise review, + completion quality check, + gap classification
Triage Specialist	Classifies external tool output	Read-only; TRUE/FALSE/INVESTIGATE	Unchanged
Implementer	Makes targeted changes	Write access, scoped to one finding	+ scope boundary rules, + irreversibility gate
Verifier	Confirms changes resolve issue	Read-only re-verification	+ state-diff verification
Checkpoint-Verifier	Spot-checks evidence between phases	Read-only; must verify against source	NEW (Tier 1)
Premise Analyst	Cross-finding inference analysis	Read-only; analyzes reasoning, not findings	NEW (Tier 2)

Design Principle

The existing architecture verifies evidence (checkpoint-verifier) and completeness (auditor) independently. Premise analysis adds independent verification of reasoning—the inferential step between evidence and conclusion. This closes the last major gap in the correctness control chain: you can now verify that the evidence is real, that the reasoning from evidence to conclusion is grounded, and that the conclusions serve the stated objective. Each verification is performed by a different sub-agent with a different analytical perspective.

Updated File Structure

project/
├── templates/
│   ├── CHECKLIST_TEMPLATE.md        # + completion criteria per item
│   ├── PLAN_TEMPLATE.md             # + OBJECTIVE as first line
│   └── FINDINGS_TEMPLATE.md         # + provenance, confidence, reasoning chain
└── review-[date]/
    ├── checklist.md
    ├── plan.md
    ├── findings.md
    ├── premise-report.md            ← NEW (replaces assumptions.md)
    └── remediation-log.md           # + state tracking per implementation

9. Frequently Asked Questions

How does the checkpoint-verifier sub-agent work?

The checkpoint-verifier sits between workflow phases and spot-checks a sample of findings (3–5 per category) against the actual codebase. It goes to the cited file and line number, reads the actual code, and classifies each finding as VERIFIED, MISMATCH, NOT_FOUND, or OVERSTATED. If the verification rate drops below 80%, the orchestrator re-runs the scanner rather than proceeding with corrupted data. It runs between scanner and writer phases and again between auditor and implementer phases.
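The 80% threshold can be expressed as a small check. This is a sketch under two assumptions that the article does not state: only VERIFIED counts toward the verification rate (OVERSTATED, MISMATCH, and NOT_FOUND all count against it), and an empty sample is treated as an error rather than a pass. The function name is hypothetical.

```python
def should_rerun_scanner(
    classifications: list[str], threshold_percent: int = 80
) -> bool:
    """Return True when the verified share of the sample is below the threshold."""
    if not classifications:
        raise ValueError("no findings were sampled")
    verified = sum(1 for c in classifications if c == "VERIFIED")
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    return verified * 100 < threshold_percent * len(classifications)


sample = ["VERIFIED", "VERIFIED", "VERIFIED", "NOT_FOUND", "OVERSTATED"]
rerun = should_rerun_scanner(sample)  # 3 of 5 verified
```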

Why does premise analysis replace the self-reported assumptions log?

The self-reported assumptions log asks scanner sub-agents to report their own interpretive decisions. Three problems make this structurally weak. First, the most dangerous assumptions are ones the scanner doesn't recognize as assumptions—training-data priors that feel like "just knowing." Second, self-report is reflexive (what did I take for granted?) while models are better at forward reasoning (premise → conclusion), so reasoning chains produce better data. Third, individual scanner invocations can't see each other's premises, so cross-finding patterns like shared assumptions and contradictions are invisible to self-report. The premise analyst fills all three gaps.

What is goal substitution and why is it hard to detect?

Goal substitution occurs when the agent quietly replaces the assigned goal with a nearby easier one. The output has the right structure, uses the right vocabulary, and reads well—it just doesn't answer the question that was asked. The validator's "rule zero" and the auditor's goal-fidelity assessment catch this by explicitly checking whether checklist items and findings serve the stated objective.

Can these remediations be added without redesigning the existing workflow?

Yes. The total additions are two new sub-agents (checkpoint-verifier and premise analyst), one new file (premise-report.md), one new workflow phase (Phase 3.5 auto-completion loop), prompt enhancements to existing sub-agent types, and orchestration prompt additions. No existing components need to be removed or restructured.

What is the difference between the checkpoint-verifier and the premise analyst?

The checkpoint-verifier checks evidence—it goes to the cited file and confirms the code matches the finding. The premise analyst checks reasoning—it reads the inferential step between evidence and conclusion across all findings and identifies shared premises, contradictions, and ungrounded inferences. One verifies that what was found is real; the other verifies that the logic connecting observation to conclusion is sound.

What is the auto-completion loop and why does it only run once?

The auto-completion loop (Phase 3.5) runs after the auditor classifies gaps as SKIPPED, BLOCKED, or EMPTY. For SKIPPED items only, it re-invokes the scanner, runs the checkpoint-verifier on new results, and merges verified findings. It runs exactly once because repeated auto-remediation risks the scanner producing lower-quality findings under implicit "you must find something" pressure—a subtle form of phantom grounding. One pass catches honest gaps; persistent gaps need human attention.

How does the orchestrator re-anchoring protocol prevent behavioral drift?

Sub-agent isolation already prevents drift within individual sub-agents. The remaining risk is in the orchestrator, which runs continuously. The re-anchoring protocol adds a periodic self-check every three sub-agent invocations: the orchestrator re-reads its instructions, reviews whether it accepted sub-par output, and checks whether it softened requirements. The check is lightweight enough to sustain across a full workflow without contributing to context decay.

What is tool model mismatch and why can't it be fully prevented?

Tool model mismatch occurs when the agent's understanding of how a tool works diverges from reality—using sed when an AST parser is needed, misinterpreting exit codes, or not understanding that a database migration is irreversible. It can't be fully prevented because no prompt engineering can give the model knowledge it doesn't have. The irreversibility gate changes the failure mode: instead of silent misuse, the agent either explains correctly (verifiable), reveals its misunderstanding (catchable), or escalates (safe).

How does scope creep differ from goal substitution?

Goal substitution is the agent answering a different, easier question than the one asked. Scope creep is the agent answering the right question plus a bunch of questions nobody asked: refactoring files it wasn't asked to touch, adding features beyond the specification. Goal substitution is caught by the validator's rule-zero fidelity check. Scope creep is prevented by scope boundary enforcement and caught after the fact by the orchestrator's scope verification check and by state-diff verification. Scope creep is particularly dangerous in implementation workflows, where every unsolicited modification is an unreviewed change to production code.

What is automation complacency and how does the architecture address it?

Automation complacency is the human tendency to rubber-stamp agent output after repeated positive experiences. The architecture addresses it structurally: the checkpoint-verifier provides systematic review independent of human attention, provenance chains make evidential basis traceable, confidence calibration directs limited attention to where it matters, and the premise analyst's blast radius counts help the reviewer prioritize which assumptions to validate. These structural defenses exist precisely because human vigilance degrades over time.

What is the risk of post-hoc rationalization in reasoning chains?

The scanner may produce a finding from pattern matching and then construct a justification after the fact. The premise analyst partially catches this by flagging premises that cite broad standards without specific clauses. The aggregate patterns (shared premises, contradictions) still emerge whether individual premises are genuine or rationalized, because they reflect what the scanner actually assumed even if the explanation is reconstructed.

What's the difference between LLM-general and agentic-specific failure modes?

LLM-general failure modes (phantom grounding, goal substitution, invisible assumptions, overconfidence uniformity, premature convergence, behavioral drift, scope creep) exist in any LLM interaction—the agentic context amplifies their consequences. Agentic-specific modes (error compounding, context decay, state confusion, recovery failure, role confusion, no audit trail, completeness gaps, coordination divergence, tool model mismatch) emerge from multi-step execution architecture. The distinction matters for mitigation: LLM-general modes need structural checks that catch inherent model limitations, while agentic-specific modes require architectural solutions in the orchestration layer. Both categories are addressed in this article.