1. Why a Failure Mode Taxonomy Matters
AI coding agents fail in predictable, recurring patterns. A small mistake in an early step compounds through every downstream sub-agent. A review confidently answers the wrong question. A finding cites a file that doesn't exist. These failures are dangerous precisely because the output reads well—it passes the casual review that catches everything else. This series catalogs sixteen such failure modes, organizes them into four priority tiers based on frequency, severity, and detectability, and then builds the structural defenses that address each one: workflow architecture, sub-agent design, checkpoint verification, provenance chains, goal-fidelity checks, and a human review runbook calibrated to the risk tiers. The taxonomy tells you what goes wrong. The rest of the series tells you how to make sure it doesn't.
These sixteen modes fall into two fundamentally different categories. Some are inherent LLM limitations amplified by agentic context: hallucination becomes phantom grounding when the agent takes real actions based on fabricated evidence; sycophancy becomes behavioral drift when the orchestrator progressively softens quality requirements across dozens of sub-agent invocations; satisficing becomes premature convergence when the auditor marks one endpoint checked and calls the whole category done; silent ambiguity resolution becomes invisible assumptions when each sub-agent inherits upstream assumptions as facts. These failure modes exist in any LLM interaction—the agentic context makes their consequences categorically worse because the agent has tools, persistence, and real-world side effects.
Others are emergent from agentic architecture itself, with no meaningful analog in single-turn usage: error compounding, where a small upstream mistake becomes the foundation for everything downstream; coordination divergence, where parallel sub-agents independently resolve the same ambiguity in incompatible directions; state confusion, where the agent's mental model of the environment diverges from reality after a partial rollback or concurrent modification. These failure modes don't exist outside multi-step execution. You can't prompt-engineer them away because they live in the orchestration, not in the model.
The distinction matters for practitioners because it determines where mitigation effort should go. LLM-general modes need structural checks that catch inherent model limitations—checkpoint verification, provenance chains, goal-fidelity assessments. Agentic-specific modes need architectural solutions in the orchestration layer—re-grounding gates, error recovery scaffolding, consistency reconciliation.
There is one more factor that makes this taxonomy urgent rather than merely interesting: automation complacency. This is not a failure mode of the agent—it's a failure mode of the human. The first time a developer reviews an AI agent's pull request, they read every line. By the twentieth, they're spot-checking. By the fiftieth, they're rubber-stamping. The consistent professional quality of agent output creates a false sense of reliability, and the silent corruption modes in Tiers 1 and 2 are precisely the failures that exploit this gap. They produce output that passes the casual review that catches everything else. Structural defenses—checkpoint verification, provenance chains, goal-fidelity checks, confidence calibration—exist because human vigilance is an unreliable backstop, especially at scale. The taxonomy below is, at its core, a guide to building architecture that doesn't depend on the human catching every mistake.
The most dangerous agentic failures aren't the ones that produce obviously wrong results. They're the ones that produce plausibly right results that happen to be wrong in ways you won't notice until much later. A goal-substituted review reads well. An error-compounded analysis is internally consistent. A phantom-grounded finding has the right format. That's what makes Tiers 1 and 2 the priority—they pass the casual review that catches everything else. And as automation complacency sets in, that review becomes ever more casual.
What This Series Covers
This article is the entry point for the Capstone IT Engineering Series on agentic AI for software development. The series builds a complete methodology—from structural principles through working implementations to failure mode remediations—that addresses the full landscape of how agents succeed and fail at real software engineering tasks.
Failure Mode Taxonomy
The complete sixteen-mode taxonomy, four priority tiers, LLM-general vs. agentic-specific classification, and a roadmap for the rest of the series. Start here to understand what goes wrong and why the structural defenses in later articles exist.
Security Review Workflow
Builds the core methodology through a concrete security review use case: self-scaffolding checklists, orchestrated sub-agents with the seven archetypes, automated remediation with human checkpoints, a live demonstration, and generalization to testing, API review, migration review, and performance auditing. Establishes the five structural principles and seven sub-agent archetypes that the rest of the series extends.
Beyond Security
Extracts the domain-independent pattern from the security review workflow and applies it to automated test generation, API code review, database migration review, and performance auditing. Includes complete sub-agent definitions and orchestration prompts for each domain.
Advanced Mitigations
Detailed implementation of structural remediations across all four priority tiers: the checkpoint-verifier sub-agent, provenance chains, re-grounding gates, goal-fidelity checks, assumptions logging, evidence verification, error recovery scaffolding, scope boundary enforcement, consistency reconciliation, irreversibility gates, and confidence calibration.
Human Review Runbook
The human side of the equation: a structured review process for evaluating agent output, calibrated to the four priority tiers. Where to apply scrutiny, what to spot-check, and when to reject versus remediate.
The Five Structural Principles
Every workflow in this series—and every remediation in this taxonomy—rests on five domain-independent principles established in Parts 1–5. They are the invariant core of the methodology, and they directly address five of the sixteen failure modes. The remaining eleven modes require additional structural defenses built on top of this foundation.
Principle 1: Externalize state to files. The agent's context window is volatile. Files are persistent. Every piece of workflow state—checklists, plans, findings, progress—lives in a file that survives context window limits and can be audited after the fact. Directly addresses: context decay, no audit trail.
Principle 2: Make every item explicit. Don't rely on the agent to "know what to check." Every item specifies what to do, what evidence to collect, and how to know when you're done. Directly addresses: completeness gaps.
Principle 3: Validate with a second pass. A single-pass checklist has blind spots. A second pass—explicitly prompted to find gaps, forbidden from approving—catches what the first missed. This is a structural control, not a suggestion. Directly addresses: completeness gaps.
Principle 4: Plan-driven execution with a full log. Convert the checklist into an ordered execution plan. Update status as you work. Document every finding with evidence. Mark items N/A with justification rather than silently skipping. Directly addresses: no audit trail.
Principle 5: Audit the execution. After execution, a separate pass compares what was planned against what was done. It flags incomplete items, unjustified skips, findings without evidence, and orphan results that don't trace to a checklist item. Directly addresses: completeness gaps.
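The file-based state the principles describe can be made concrete with a small sketch. Everything here is illustrative: the file name, the status values, and the record fields are assumptions for this example, not the series' actual format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical plan format: one record per checklist item, persisted to disk
# so workflow state survives context-window limits and can be audited later.
PLAN_PATH = Path(tempfile.mkdtemp()) / "plan.json"

def init_plan(items):
    plan = [{"item": i, "status": "PENDING", "evidence": None, "justification": None}
            for i in items]
    PLAN_PATH.write_text(json.dumps(plan, indent=2))

def update_item(item, status, evidence=None, justification=None):
    plan = json.loads(PLAN_PATH.read_text())
    for rec in plan:
        if rec["item"] == item:
            if status == "N/A" and not justification:
                # Items are never silently skipped: N/A requires a reason.
                raise ValueError("N/A requires a justification")
            rec.update(status=status, evidence=evidence,
                       justification=justification)
    PLAN_PATH.write_text(json.dumps(plan, indent=2))

init_plan(["auth checks", "rate limiting"])
update_item("auth checks", "DONE", evidence="src/auth.py uses bcrypt for hashing")
update_item("rate limiting", "N/A", justification="service is internal-only")
```

Because the plan lives on disk rather than in the context window, a later audit pass (Principle 5) can re-read it and compare planned work against recorded evidence.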
2. The Complete Taxonomy: Sixteen Failure Modes
Five of these failure modes are obvious—they tend to produce visibly incomplete or inconsistent output that a competent reviewer catches quickly. The remaining eleven are subtler and more consequential: they produce output that looks correct on casual inspection but isn't. Together, these sixteen modes cover the full landscape of how agents fail when given structured, open-ended tasks.
The Obvious Five
Context decay. The agent forgets earlier instructions as the conversation grows. In long workflows, the orchestrator progressively loses access to its own instructions and earlier reasoning, degrading every other capability.
Completeness gaps. Without an explicit checklist, entire categories get missed. The agent writes tests for happy paths and misses edge cases, or reviews authentication but forgets rate limiting. Very common, but among the most tractable failures.
No audit trail. You can't verify what was checked versus what was skipped. Without structured logging, every other failure mode becomes opaque—you can't diagnose what went wrong after the fact.
Behavioral drift. An agent told to be critical becomes agreeable over time. Also known as the "sycophancy gradient"—the agent's tone and judgment shift toward accommodation as the conversation progresses.
Role confusion. A single agent asked to both explore and document does neither well. Competing objectives within a single context degrade performance on all of them.
Eleven Subtler Failure Modes
These are the modes that warrant the most attention. Unlike the obvious five, they produce output that reads well, follows the expected format, and uses the right vocabulary—which is exactly what makes them dangerous. A goal-substituted review is thorough and well-organized; it just answers the wrong question. An error-compounded analysis is internally consistent; it just builds on a false premise. These modes pass casual review, and they pass increasingly casual review as automation complacency sets in.
Error compounding. A small mistake in step 3 becomes the foundation for steps 4 through 20. Unlike a human who might get a nagging feeling that something is off, an agent builds confidently on its own errors. By the time the problem surfaces, the entire output is structurally unsound and partial remediation isn't possible—you have to restart. The damage is multiplicative rather than additive. A key mechanism is handoff information loss: every time context passes between agents, compression strips uncertainty signals and caveats. The receiving agent gets conclusions without qualifications—and treats them as ground truth.
Goal substitution. The agent quietly replaces the goal it was given with a nearby easier one. Asked to "identify the optimal architecture," it describes a familiar one. Asked to "analyze risks," it describes the system. The output reads well, has the right structure, and uses the right vocabulary, making this failure particularly insidious—it passes casual review.
Invisible assumptions. When the agent encounters ambiguity, it resolves it silently rather than surfacing the choice point. Each assumption might be individually reasonable, but the user never gets to validate the bundle of them, and the cumulative effect can send the output in a direction that's internally coherent but wrong for the actual context.
Phantom grounding. The agent references, cites, or builds on information that doesn't exist—a file it didn't actually read, a constraint it invented, a prior result it's misremembering. In agentic contexts this is worse than ordinary hallucination because the agent may take real, irreversible actions based on fabricated premises.
Premature convergence. The agent declares "done" before the task is actually complete. It rounds partial results up to complete ones, treats the first workable answer as the best answer, and skips verification. Structurally different from behavioral drift—it's about the agent's relationship to its own output, not to the user's preferences.
Recovery failure. When the agent hits an error or dead end, it either loops on the same failing approach, silently skips the problematic step as if it never existed, or restarts from scratch and loses all prior progress. Agents are genuinely bad at graceful degradation and intelligent backtracking.
State confusion. In multi-step workflows involving external systems, the agent loses track of the actual state of the world—what's been modified, what succeeded, what was rolled back. It operates on a mental model of the environment that has diverged from reality.
Overconfidence uniformity. The agent presents everything—well-grounded facts, reasonable inferences, and outright speculation—with identical confidence and tone. The user has no signal for where to apply scrutiny. More of a trust-calibration and communication problem than a correctness problem, but it degrades the human's ability to catch the other failure modes.
Scope creep. The inverse of premature convergence. The agent doesn't stop when it should—it refactors files it wasn't asked to touch, adds features beyond the specification, "improves" code in ways that introduce risk, or explores tangentially related issues when it should stay focused. Structurally different from goal substitution: the agent answers the right question plus a bunch of questions nobody asked. In agentic coding, this is dangerous because an implementer that scope-creeps modifies files outside its mandate, potentially breaking things that were working. Especially common with more capable models.
Coordination divergence. When parallel sub-agents independently resolve the same ambiguity in different directions, the results can be individually correct and collectively incoherent. Two scanners make incompatible assumptions about shared resources, naming conventions, or architectural patterns—each reasonable in isolation, mutually exclusive in combination. Distinct from invisible assumptions: the problem isn't that a choice point was hidden from the human, but that two agents each made reasonable choices without any mechanism to reconcile them. Escalates to Tier 2 severity in pipeline architectures without a central reconciliation layer.
Tool model mismatch. The agent's internal model of how a tool behaves diverges from how it actually behaves, especially around edge cases, error conditions, and irreversibility. It uses sed for a transformation that requires an AST parser. It runs a destructive command in the wrong directory. It misinterprets an exit code. It executes a database migration without understanding the operation is irreversible. Not hallucination (the tool is real), not wrong assumptions about the task (the task is understood), not state confusion (the environment model is correct)—it's a competence gap in understanding the tool itself. Matters most in agentic coding where tools have real-world side effects.
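One structural defense against this last mode, the irreversibility gate described later in the series, can be sketched in a few lines. The field names and the human-approval flag here are hypothetical, not the series' actual protocol:

```python
# Hypothetical irreversibility gate: a destructive operation is refused unless
# the implementer agent supplies a complete statement of what the operation
# does, whether it is reversible, and how to roll it back.
def irreversibility_gate(declaration: dict) -> bool:
    # 'reversible' may legitimately be False, so check presence, not truthiness.
    if "reversible" not in declaration:
        return False
    for field in ("operation", "effect", "rollback_procedure"):
        if not declaration.get(field):
            return False
    # Irreversible operations additionally require explicit human approval.
    if declaration["reversible"] is False and not declaration.get("human_approved"):
        return False
    return True
```

The point of the gate is not that the check is sophisticated; it is that the agent must articulate its tool model before acting, which surfaces mismatches while they are still cheap.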
3. Four Priority Tiers
Not all failure modes are equally urgent to address. The ranking below uses a composite of three factors: how frequently the failure occurs, how severe the consequences are, and how hard it is for a human to detect after the fact. The highest-priority failures are common, consequential, and invisible.
Tier 1 (Structural Risks): Error Compounding · Context Decay · No Audit Trail
These failures undermine the workflow itself. Error compounding is multiplicative—one upstream mistake corrupts everything downstream. Context decay degrades every other capability as the orchestrator progressively loses its own instructions. No audit trail makes every other failure mode opaque and undiagnosable. If these aren't addressed, sub-agent design quality doesn't matter.
Priority rationale: High frequency × high severity × low detectability. Error compounding is the single most dangerous failure mode because the agent's confident tone masks the rot. Context decay is the hardest to fully solve because it's a constraint of current architectures. No audit trail has the highest tractability—structured logging is straightforward—which is why it belongs at the top despite being less dramatic.
Tier 2 (Silent Corruption): Goal Substitution · Invisible Assumptions · Phantom Grounding · Completeness Gaps
These failures corrupt individual outputs in ways that are difficult to detect. Unlike Tier 1, where the problem is propagation through the pipeline, Tier 2 problems originate at the source—a sub-agent that answered the wrong question, resolved ambiguity silently, or cited evidence that doesn't exist. The output reads well, which is exactly the problem.
Priority rationale: High frequency × medium-to-high severity × very low detectability. Goal substitution and invisible assumptions are the hardest to catch in review because the output looks thoughtful and well-reasoned. Phantom grounding is dangerous specifically in agentic contexts where actions have real consequences. Completeness gaps are the most tractable item in this tier.
Tier 3 (Degradation Over Time): Behavioral Drift · Premature Convergence · Recovery Failure · Role Confusion · Scope Creep · Coordination Divergence · Tool Model Mismatch
Real and costly, but more detectable and more addressable than Tiers 1 and 2. Behavioral drift is partially mitigated by re-anchoring to system instructions. Premature convergence is catchable by a human reviewer who knows what "done" should look like. Recovery failure is addressable with retry logic and error-handling scaffolding. Role confusion is the most architecturally solvable problem in the entire taxonomy—multi-agent designs with separated concerns largely eliminate it. Scope creep—the inverse of premature convergence—is detectable through state diffs and scope boundary enforcement. Coordination divergence emerges when parallel sub-agents make incompatible assumptions, and is addressable through explicit reconciliation steps. Tool model mismatch is hardest to prevent proactively but catchable through irreversibility gates and the existing implementer-then-verifier pattern.
Priority rationale: Medium frequency × medium severity × medium detectability. These failures tend to produce output that a careful human reviewer can catch, unlike Tiers 1 and 2. Scope creep and tool model mismatch are notable because they escalate in severity specifically in implementation workflows where the agent has write access to the codebase.
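The retry-then-alternative-then-BLOCKED protocol mentioned above for recovery failure can be sketched as follows. The function signature and status strings are illustrative, not a prescribed API:

```python
# Hypothetical recovery scaffolding: retry a step, then try an alternative
# approach, then log the step as BLOCKED and continue -- never loop forever
# and never silently skip.
def run_with_recovery(step, approaches, retries=2, log=None):
    log = log if log is not None else []
    for approach in approaches:
        for attempt in range(retries):
            try:
                result = approach()
                log.append((step, "OK"))
                return result
            except Exception as exc:
                log.append((step, f"FAILED attempt {attempt + 1}: {exc}"))
    # All approaches exhausted: record the failure instead of dropping the step.
    log.append((step, "BLOCKED"))
    return None
```

The log entry for a BLOCKED step is the crucial part: it converts a recovery failure into an audit-trail entry that a human or downstream auditor can act on.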
Tier 4 (Tractable but Important): State Confusion · Overconfidence Uniformity
State confusion is serious in tool-use and code-generation workflows but addressable with explicit state tracking and environment snapshots. Overconfidence uniformity matters for trust calibration but is more of a UX and communication problem than a correctness problem—addressable by requiring agents to flag confidence levels on individual findings.
Priority rationale: Medium frequency × lower severity × higher tractability. Important, but these won't silently corrupt your workflow the way Tiers 1 and 2 will.
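Explicit state tracking for state confusion can be as simple as hashing watched files before and after each action and diffing against the authorized set. This is a sketch under assumed names, not the series' actual implementation:

```python
import hashlib
from pathlib import Path

# Hypothetical state-snapshot check: hash watched files before and after an
# implementer action, then confirm that only authorized files changed.
def snapshot(paths):
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest()
            for p in paths if Path(p).exists()}

def unintended_changes(before, after, authorized):
    changed = {p for p in after if before.get(p) != after.get(p)}
    changed |= set(before) - set(after)  # deleted files count as changes
    return sorted(changed - set(authorized))
```

A non-empty result means the agent's mental model and the real environment have diverged, which is exactly the condition the verifier needs to flag.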
4. LLM-General vs. Agentic-Specific Failures
Not all sixteen failure modes are created equal in origin. Some are fundamental LLM limitations that exist in any interaction with a language model—the agentic context merely amplifies their consequences. Others emerge specifically from multi-step execution, tool use, or orchestration architecture and have no meaningful analog in single-turn LLM usage. This distinction matters for practitioners because it determines where mitigation effort should go: structural checks that catch the model's inherent limitations for LLM-general modes, versus architectural and orchestration design for agentic-specific modes.
LLM-General — Amplified by Agentic Context
These failure modes exist whenever you interact with a language model. In a chatbot, they produce wrong answers. In an agentic coding workflow, they produce wrong answers that get implemented as code changes. The mechanism is the same; the consequences are categorically different.
| Failure Mode | LLM Manifestation | Agentic Amplification |
|---|---|---|
| Phantom Grounding | Hallucination — the model invents facts, citations, or code in a single response | The agent takes real, irreversible actions based on fabricated premises. A phantom finding leads to a code change based on a file that doesn't exist or a behavior that isn't present. |
| Goal Substitution | The model answers an easier question than the one asked — well-documented in single-turn interactions | A multi-step workflow compounds the substitution. Each downstream step builds on the wrong objective, making the deviation harder to detect and more expensive to fix. |
| Invisible Assumptions | The model silently resolves ambiguity in every response without surfacing the choice point | Assumptions compound across sub-agents. Each agent inherits upstream assumptions as facts, and the cumulative bundle can send the entire workflow in a wrong direction. |
| Overconfidence Uniformity | Models present all outputs with identical confidence regardless of actual certainty | Degrades the human reviewer's ability to triage — a speculative finding and a verified finding look the same, making review less efficient at scale. |
| Premature Convergence | Satisficing over optimizing — the model produces the first adequate answer rather than the best one | The auditor marks items complete based on the existence of a finding, not its quality. One endpoint checked out of forty satisfies the binary check. |
| Behavioral Drift | Sycophancy within a single conversation — the model becomes more agreeable over turns | The orchestrator progressively softens quality requirements across many sub-agent invocations, accepting marginal output it would have rejected earlier. |
| Scope Creep | The model writes a 2000-word answer to a question that needed 200 words, or adds unsolicited suggestions | An implementer modifies files outside its mandate. A scanner explores tangential findings that burn tokens and time. In shared codebases, this can break things that were working. |
Agentic-Specific — Emergent from Architecture
These failure modes have no meaningful analog in single-turn LLM usage. They emerge from the structural properties of multi-step workflows: pipelines where outputs feed inputs, tool use with real-world side effects, parallel execution without coordination, and long-running contexts that exceed effective capacity.
| Failure Mode | Why It's Agentic-Specific | Architectural Driver |
|---|---|---|
| Error Compounding | Requires multi-step pipelines where upstream outputs become downstream inputs. A single LLM response can be wrong, but it can't compound. | Phase-based workflows; handoff compression between agents strips uncertainty signals |
| Context Decay | Requires conversations long enough to exceed effective context window. Single-turn interactions don't decay. | Orchestrator running across many sub-agent invocations; progressive loss of instructions |
| State Confusion | Requires tool use and environment interaction. A bare LLM has no environment to be confused about. | Multi-step implementation workflows; rollbacks, partial application, concurrent modifications |
| Recovery Failure | Requires error conditions in tool execution. A bare LLM doesn't encounter tool failures. | Orchestration with external tools; no structured fallback protocol |
| Role Confusion | Requires multi-agent architectures with role boundaries. A single LLM has no roles to confuse. | Orchestrator doing sub-agent work; blurred boundaries under pressure |
| No Audit Trail | Meaningless outside a multi-step workflow. A single response doesn't need an audit trail. | Multi-phase execution without structured logging |
| Completeness Gaps | Technically possible in single-turn but primarily a workflow coverage problem across multiple invocations. | Checklist-driven scanning where categories can be silently skipped |
| Coordination Divergence | Requires multiple agents operating on the same domain in parallel. Single agents can't diverge from themselves. | Parallel sub-agent invocation; pipeline architectures without reconciliation |
| Tool Model Mismatch | The underlying competence gap (misunderstanding how a tool works) is an LLM limitation, but the consequences materialize only when the agent can execute the wrong model. | Agent tool access; irreversible operations without rollback understanding |
The LLM-general failure modes become categorically more dangerous in agentic contexts not because they're more frequent but because the agent has tools, persistence, and real-world side effects. Phantom grounding in a chatbot produces a wrong answer you can ignore. Phantom grounding in a coding agent produces a code change based on a file that doesn't exist. Same mechanism, vastly different consequences. For the LLM-general modes, the most effective mitigations are structural (checkpoint verification, provenance chains, goal-fidelity checks) rather than prompt-based—because you can't prompt-engineer away a fundamental model limitation, but you can build architecture that catches its effects. For the agentic-specific modes, the mitigations are architectural by necessity—they address failure patterns that exist in the orchestration, not in the model.
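A structural check of the kind described here, verifying that cited evidence actually exists before it propagates, can be sketched as follows. The finding fields and status strings are hypothetical:

```python
from pathlib import Path

# Hypothetical evidence check against phantom grounding: every finding must
# cite a file that exists and a verbatim quote that actually appears in it.
def verify_finding(finding, repo_root="."):
    path = Path(repo_root) / finding["file"]
    if not path.is_file():
        return "PHANTOM_FILE"       # cheapest fabrication: the file isn't there
    if finding["quote"] not in path.read_text(errors="ignore"):
        return "PHANTOM_QUOTE"      # file exists but the evidence doesn't
    return "VERIFIED"
```

This is deliberately mechanical: it can't judge whether a finding is correct, but it eliminates the class of fabrications where the cited evidence was never in the codebase at all.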
5. Series Roadmap: Where Each Failure Mode Gets Addressed
Each of the sixteen failure modes is addressed somewhere in the series—some by the five-principle architecture established in Parts 1–5, others by new structural additions in Parts 7+, and a few by the human review process in the Final article. The table below maps each failure mode to where its primary mitigation lives.
| Failure Mode | Tier | Addressed In | How |
|---|---|---|---|
| TIER 1 — STRUCTURAL RISKS | |||
| Error Compounding | 1 | Parts 7+ | Checkpoint-verifier sub-agent, provenance chains, re-grounding gates |
| Context Decay | 1 | Parts 1–5 | Principle 1: externalize state to files; sub-agent isolation limits per-invocation context |
| No Audit Trail | 1 | Parts 1–5 | Principle 4 + file pipeline (checklist → plan → findings → remediation-log) |
| TIER 2 — SILENT CORRUPTION | |||
| Goal Substitution | 2 | Parts 7+ | Goal-fidelity checks in validator and auditor; objective capture in plan.md |
| Invisible Assumptions | 2 | Parts 7+ | assumptions.md file; ambiguity-logging rules in scanner prompts |
| Phantom Grounding | 2 | Parts 7+ | Verbatim evidence requirements; file-existence checks; checkpoint-verifier |
| Completeness Gaps | 2 | Parts 1–5 | Principles 2, 3, 5: checklist → validator → auditor triple layer |
| TIER 3 — DEGRADATION OVER TIME | |||
| Behavioral Drift | 3 | Parts 1–5 + Parts 7+ | Sub-agent isolation (existing) + orchestrator re-anchoring protocol (new) |
| Premature Convergence | 3 | Parts 7+ | Definition-of-done patterns; completion quality checks; auto-completion loop |
| Recovery Failure | 3 | Parts 7+ | Error recovery scaffolding: structured retry → alternative → log-and-continue |
| Role Confusion | 3 | Parts 1–5 | Seven archetypes with enforced tool restrictions |
| Scope Creep | 3 | Parts 1–5 + Parts 7+ | State diffs (existing, partial) + scope boundary enforcement + scope verification (new) |
| Coordination Divergence | 3 | Parts 1–5 + Parts 7+ | Serial orchestration (existing, partial) + consistency reconciliation (new) |
| Tool Model Mismatch | 3 | Parts 1–5 + Parts 7+ | Implementer-then-verifier (existing, partial) + irreversibility gates (new) |
| TIER 4 — TRACTABLE BUT IMPORTANT | |||
| State Confusion | 4 | Parts 7+ | State snapshots before/after implementer actions; state-diff verification |
| Overconfidence Uniformity | 4 | Parts 7+ | Confidence field in findings template; calibration rules in scanner prompts |
The five-principle architecture from Parts 1–5 fully addresses five failure modes and partially covers three more. The gaps cluster in the modes that produce plausible-looking output (Tier 2) and the modes that degrade silently over time (Tier 3). The architecture has strong completeness controls but weak correctness and degradation controls. Parts 7+ address that asymmetry with targeted structural additions—no existing components are removed or restructured.
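One of the smallest Parts 7+ additions, the re-grounding gate, can be sketched in a few lines. The objective string and prompt wording here are purely illustrative:

```python
# Hypothetical re-grounding gate: the verbatim original objective is
# re-injected into the orchestrator's working prompt at every phase boundary,
# so progressive drift from the goal gets corrected instead of compounding.
ORIGINAL_OBJECTIVE = "Review the payments service for authentication flaws."  # illustrative

def phase_prompt(phase_name, phase_instructions):
    return (
        f"ORIGINAL OBJECTIVE (verbatim, do not reinterpret): {ORIGINAL_OBJECTIVE}\n"
        f"CURRENT PHASE: {phase_name}\n"
        f"{phase_instructions}\n"
        "Before proceeding, confirm this phase still serves the original objective."
    )
```

The design choice is that the objective is quoted verbatim rather than summarized: summarization at each handoff is exactly the compression that enables goal substitution.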
6. Remediation Preview
The detailed implementation of each remediation lives in the Parts 7+ articles. The table below previews every structural addition—what it is, what type of change it requires, and what failure mode it addresses. The total footprint is deliberately small: one new sub-agent, one new file, one new workflow phase, orchestration enhancements, and prompt additions to existing sub-agent types.
| Remediation | Type | Addresses | One-Sentence Description |
|---|---|---|---|
| TIER 1 — STRUCTURAL RISKS | |||
| Checkpoint-verifier | New Sub-Agent | Error compounding, phantom grounding | Spot-checks upstream findings against the actual codebase between workflow phases; flags mismatches before they propagate. |
| Provenance fields | Template Change | Error compounding, no audit trail | Adds source, checklist-item trace, and verification-status fields to every finding so corruption is auditable. |
| Re-grounding gates | Orchestration | Context decay, goal substitution | Re-injects the original task context at each phase boundary to prevent progressive drift from the objective. |
| TIER 2 — SILENT CORRUPTION | |||
| Objective capture | Orchestration | Goal substitution | Records the exact original objective as the first line of plan.md so every sub-agent can reference it. |
| Validator rule zero | Prompt Addition | Goal substitution | Requires the validator to verify that checklist items serve the stated objective before checking for gaps. |
| Goal-fidelity assessment | Prompt Addition | Goal substitution | Adds an auditor section that classifies each finding as RELEVANT, TANGENTIAL, or SUBSTITUTE relative to the objective. |
| assumptions.md | New File | Invisible assumptions | A per-review file where every interpretive decision gets logged with the ambiguity, resolution, alternative, and impact. |
| Ambiguity logging | Prompt Addition | Invisible assumptions | Requires scanner sub-agents to log every silent ambiguity resolution in assumptions.md rather than resolving silently. |
| Assumptions review | Prompt Addition | Invisible assumptions | Auditor reviews high-impact assumptions and flags those that affect implementation-queue findings for human review. |
| Verbatim evidence | Prompt Addition | Phantom grounding | Requires scanners to include exact code quotes rather than paraphrased descriptions, raising the bar for fabrication. |
| File existence check | Orchestration | Phantom grounding | Verifies that all cited files actually exist before passing scanner results to the writer, eliminating the cheapest fabrications. |
| First-principles check | Prompt Addition | Completeness gaps | Validator ignores the checklist and asks "what could go wrong?" from first principles, creating a second cognitive path. |
| TIER 3 — DEGRADATION OVER TIME | |||
| Re-anchoring protocol | Orchestration | Behavioral drift | Periodic self-check where the orchestrator re-reads instructions and checks whether it has softened any requirements. |
| Completion criteria | Template Change | Premature convergence | Adds explicit, testable done-criteria to each checklist item so the auditor checks quality, not just presence. |
| Completion quality check | Prompt Addition | Premature convergence | Auditor classifies items as COMPLETE, PARTIAL, SHALLOW, or MISSING rather than binary complete/incomplete. |
| Gap classification | Prompt Addition | Completeness gaps, premature convergence | Auditor classifies each gap as SKIPPED, BLOCKED, or EMPTY to enable targeted auto-remediation. |
| Auto-completion loop | Orchestration | Completeness gaps | Single-pass re-invocation of scanners for SKIPPED items only, with verification, after the auditor identifies gaps. |
| Error recovery scaffolding | Orchestration | Recovery failure | Structured protocol: retry → alternative approach → log as BLOCKED and continue. Prevents both loops and silent skips. |
| Role boundary rules | Prompt Addition | Role confusion | Explicit rules preventing the orchestrator from doing sub-agent work directly under time or context pressure. |
| Scope boundary enforcement | Prompt Addition | Scope creep | Implementers must verify each change is within authorized file and line range; scanners must defer out-of-scope observations. |
| Scope verification check | Orchestration | Scope creep | Orchestrator verifies after each implementer invocation that modified files match the authorized set. |
| Consistency reconciliation | Orchestration | Coordination divergence | After all scanners complete, compares assumptions across outputs and flags conflicts for review. |
| Irreversibility gate | Prompt Addition | Tool model mismatch | Requires the implementer to articulate what a destructive operation does, whether it's reversible, and the rollback procedure. |
| Destructive operation logging | Orchestration | Tool model mismatch | Orchestrator logs all destructive operations with pre-state and the implementer's stated understanding for audit. |
| **TIER 4 — TRACTABLE BUT IMPORTANT** | | | |
| State snapshots | Orchestration | State confusion | Captures file content before and after each implementer action so the verifier can diff intended vs. actual changes. |
| State-diff verification | Prompt Addition | State confusion | Verifier compares before- and after-snapshots to confirm intended changes and flag unintended ones. |
| State section in remediation log | Template Change | State confusion | Adds a structured state-tracking section to remediation-log.md for post-hoc audit of environment changes. |
| Confidence field | Template Change | Overconfidence uniformity | Adds HIGH/MEDIUM/LOW/UNCERTAIN with mandatory justification to the findings template. |
| Confidence calibration rules | Prompt Addition | Overconfidence uniformity | Sets calibration defaults (most findings should be MEDIUM) and penalizes blanket HIGH ratings. |
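The error recovery scaffolding in the table above (retry → alternative approach → log as BLOCKED and continue) can be sketched as a small orchestration helper. This is an illustrative sketch, not the series' implementation; the `primary`/`alternative` zero-argument callables and the returned status strings are assumptions for this example.

```python
import logging

def run_with_recovery(step_name, primary, alternative=None, retries=1):
    """Structured recovery protocol: retry the primary approach, fall back
    to an alternative, then record the step as BLOCKED and continue.
    Prevents both infinite retry loops and silent skips."""
    for attempt in range(retries + 1):
        try:
            return {"status": "OK", "result": primary()}
        except Exception as exc:
            logging.warning("%s attempt %d failed: %s", step_name, attempt + 1, exc)
    if alternative is not None:
        try:
            return {"status": "OK_ALTERNATIVE", "result": alternative()}
        except Exception as exc:
            logging.warning("%s alternative approach failed: %s", step_name, exc)
    # BLOCKED is logged as an explicit outcome, never dropped from the record
    return {"status": "BLOCKED", "result": None}
```

The key design point is that every terminal state is an explicit, auditable value: a step that cannot complete surfaces as `BLOCKED` in the remediation log rather than vanishing from the output.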
7. Reading This Series
The series is designed to be read in order for the complete picture, but each article group is self-contained enough to serve as a reference for its specific topic. The table below maps each article group to what it covers and which failure modes it addresses.
| Article Group | What It Covers | Failure Modes Addressed |
|---|---|---|
| Introduction (this article) | Complete taxonomy, priority tiers, LLM-general vs. agentic-specific classification, series roadmap, remediation preview | All 16 (identification and classification) |
| Parts 1–5: Security Review | Five structural principles, seven sub-agent archetypes, orchestration patterns, domain generalization | Context decay, completeness gaps, no audit trail, behavioral drift, role confusion (fully); scope creep, coordination divergence, tool model mismatch (partially) |
| Part 6: Beyond Security | Domain-independent pattern extraction, adaptations for testing, API review, migration review, performance auditing | Same coverage as Parts 1–5, applied across domains |
| Parts 7+: Advanced Mitigations | Checkpoint-verifier, provenance chains, goal-fidelity checks, assumptions logging, evidence verification, error recovery, scope enforcement, consistency reconciliation, irreversibility gates, confidence calibration | Error compounding, goal substitution, invisible assumptions, phantom grounding, premature convergence, recovery failure, scope creep, coordination divergence, tool model mismatch, state confusion, overconfidence uniformity |
| Final: Human Review | Structured human review process calibrated to the four priority tiers | All 16 (detection and disposition by the human reviewer) |
If you're new to agentic workflows, start with Parts 1–5 to understand the five structural principles and seven sub-agent archetypes, then return here to see where the remaining gaps are. If you're already running agentic workflows and experiencing failures, use the taxonomy above to identify which failure modes you're hitting, then jump directly to the relevant section in Parts 7+ for the remediation. If you're evaluating whether to adopt agentic tooling, this introduction gives you the full landscape of what can go wrong and what the structural defenses look like—read this article end to end.
8. Frequently Asked Questions
**What's the difference between LLM-general and agentic-specific failure modes?** LLM-general failure modes (phantom grounding, goal substitution, invisible assumptions, overconfidence uniformity, premature convergence, behavioral drift, scope creep) exist in any LLM interaction—the agentic context amplifies their consequences because the agent has tools and can take real-world actions. Agentic-specific modes (error compounding, context decay, state confusion, recovery failure, role confusion, no audit trail, completeness gaps, coordination divergence, tool model mismatch) emerge from multi-step execution architecture. The distinction matters for mitigation: LLM-general modes need structural checks that catch inherent model limitations, while agentic-specific modes require architectural solutions in the orchestration layer.
**What is automation complacency, and why does it matter here?** Automation complacency is the human tendency to rubber-stamp agent output after repeated positive experiences. It's not an agent failure mode per se, but it directly determines whether the silent corruption modes (Tiers 1 and 2) get caught. The architecture addresses it structurally: the checkpoint-verifier provides systematic review independent of human attention, provenance chains make evidential basis traceable, confidence calibration directs limited attention to where it matters, and the auditor's goal-fidelity assessment catches systematic drift that a fatigued reviewer might miss. These structural defenses exist precisely because human vigilance degrades over time.
**What is goal substitution?** Goal substitution occurs when the agent quietly replaces the goal it was given with a nearby easier one. Asked to "identify the optimal architecture," it describes a familiar one. Asked to "analyze risks," it describes the system. The output reads well, has the right structure, and uses the right vocabulary, which makes this failure particularly insidious—it passes casual review. The remediation is adding a "rule zero" goal-fidelity check to the validator sub-agent and a goal-fidelity assessment section to the auditor sub-agent.
**Why is error compounding so dangerous?** Error compounding is multiplicative rather than additive. A small mistake in an early step—a misidentified code pattern, a fabricated finding, or an incorrect assumption—becomes the foundation for every subsequent step. The downstream sub-agents build confidently on the error, and the agent's consistent tone masks the corruption. By the time the problem surfaces, the entire output is structurally unsound and partial remediation isn't possible. A key mechanism is handoff information loss: every time context passes between agents, compression strips uncertainty signals and caveats. The receiving agent gets conclusions without qualifications—and treats them as ground truth.
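One structural defense against handoff information loss is a handoff record that carries caveats and confidence alongside the conclusion, so serialization cannot silently strip them. A minimal sketch, assuming a simple dataclass-based record format (the field names here are illustrative, not the series' schema):

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """A handoff record that keeps uncertainty signals attached to the
    conclusion, so a downstream agent cannot mistake a guess for ground truth."""
    conclusion: str
    confidence: str                                # e.g. "MEDIUM"
    caveats: list = field(default_factory=list)    # qualifications that must survive
    evidence: list = field(default_factory=list)   # file:line citations

def render_for_downstream(h: Handoff) -> str:
    """Serialize the handoff without dropping caveats or confidence."""
    lines = [f"CONCLUSION ({h.confidence}): {h.conclusion}"]
    lines += [f"CAVEAT: {c}" for c in h.caveats]
    lines += [f"EVIDENCE: {e}" for e in h.evidence]
    return "\n".join(lines)
```

The design choice is that caveats are first-class fields of the record, not free-text prose that a summarization step can compress away.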
**How is goal substitution different from scope creep?** Goal substitution is the agent answering a different, easier question than the one asked. Scope creep is the agent answering the right question plus a bunch of questions nobody asked—refactoring files it wasn't asked to touch, adding features beyond the specification, or exploring tangential findings. The mitigations are different: goal substitution is caught by the validator's rule-zero fidelity check, while scope creep is prevented by scope boundary enforcement in the implementer and scanner prompts and caught after the fact by state-diff verification. Scope creep is particularly dangerous in implementation workflows where every unsolicited modification is an unreviewed change in production code.
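The after-the-fact scope check is simple set arithmetic: the orchestrator compares what was actually modified against what was authorized. A minimal sketch (the return-dict keys are assumptions for this example):

```python
def verify_scope(modified_files, authorized_files):
    """Orchestrator-side check after an implementer invocation: any file
    modified outside the authorized set is scope creep and must be flagged."""
    out_of_scope = sorted(set(modified_files) - set(authorized_files))
    untouched = sorted(set(authorized_files) - set(modified_files))
    return {
        "ok": not out_of_scope,   # fails on any unauthorized modification
        "out_of_scope": out_of_scope,
        "untouched": untouched,   # authorized files left unmodified (informational)
    }
```

In practice the modified-file list would come from a VCS diff; the check itself stays this simple.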
**How do invisible assumptions differ from coordination divergence?** Invisible assumptions is about a single agent resolving ambiguity without surfacing the choice point. Coordination divergence is about two agents independently resolving the same ambiguity in incompatible ways—each may correctly log its assumption, but no mechanism detects that the assumptions conflict. The consistency reconciliation step compares assumptions across scanners after they all complete. In the current serial architecture with a central orchestrator, this is a moderate risk. In pipeline architectures without reconciliation, it escalates to Tier 2 severity.
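The reconciliation step can be sketched as a cross-scanner comparison of assumption logs. This assumes each scanner emits its assumptions as a key/value mapping (an illustrative format, not the series' actual log schema):

```python
from collections import defaultdict

def reconcile_assumptions(scanner_logs):
    """scanner_logs: {scanner_name: {assumption_key: resolved_value}}.
    Returns the keys that two or more scanners resolved differently,
    mapped to each scanner's resolution, for human review."""
    by_key = defaultdict(dict)
    for scanner, assumptions in scanner_logs.items():
        for key, value in assumptions.items():
            by_key[key][scanner] = value
    # A conflict exists when the same ambiguity was resolved to different values
    return {key: resolutions
            for key, resolutions in by_key.items()
            if len(set(resolutions.values())) > 1}
```

Note that each scanner's log can be individually correct; the failure only becomes visible when the logs are compared side by side, which is why this runs after all scanners complete.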
**Which failure modes does the existing architecture already address?** The existing architecture effectively addresses five of the sixteen failure modes. Context decay is handled by externalizing state to files (Principle 1). Completeness gaps are covered by the checklist-validator-auditor triple layer (Principles 2, 3, and 5). Behavioral drift is structurally mitigated by isolating critical functions in dedicated sub-agents with hard constraints. Role confusion is solved by the seven archetypes with enforced tool restrictions. The audit trail is provided by the file pipeline from checklist through findings to remediation log (Principle 4). Three additional modes—scope creep, coordination divergence, and tool model mismatch—receive partial coverage from existing structural patterns.
**How do completeness gaps differ from goal substitution?** Completeness gaps mean the right question was asked but some categories were missed—the review covered authentication but forgot rate limiting. Goal substitution means the wrong question was answered entirely—the review was asked to assess backward compatibility risks but instead cataloged endpoint patterns. A complete review can still be goal-substituted (every category checked, but the categories don't serve the objective), and an incomplete review can still be goal-faithful. The validator's "rule zero" catches goal substitution; the validator's gap-finding catches completeness issues. Both are needed.
**How does the architecture defend against phantom grounding?** Phantom grounding—where the agent references information that doesn't exist—is addressed at three levels in the architecture. First, scanner sub-agents are required to include verbatim code quotes rather than just file paths and descriptions, which raises the bar for fabrication. Second, the orchestrator runs a file-existence check between scanner and writer phases, removing any findings that cite nonexistent files. Third, the checkpoint-verifier spot-checks a sample of findings against the actual codebase, catching cases where the file exists but the described behavior is wrong. In a chatbot, phantom grounding produces a wrong answer. In an agentic workflow, it produces a code change based on a file that doesn't exist—same mechanism, vastly different consequences.
**Can tool model mismatch be fully prevented?** Tool model mismatch occurs when the agent's understanding of how a tool works diverges from reality—using sed when an AST parser is needed, misinterpreting exit codes, or not understanding that a database migration is irreversible. It can't be fully prevented because no amount of prompt engineering can give the model knowledge it doesn't have. The irreversibility gate changes the failure mode: instead of silent misuse, the agent either explains its understanding correctly (giving the verifier something to check), reveals its misunderstanding (giving the human a signal to intervene), or correctly identifies uncertainty and escalates.
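The irreversibility gate can be sketched as a pre-flight check the orchestrator runs before any destructive operation. The declaration fields and status strings here are assumptions for this example, not the series' exact protocol:

```python
def irreversibility_gate(declaration):
    """Refuse a destructive operation until the implementer has articulated
    what it does, whether it is reversible, and the rollback procedure.
    `declaration` is a dict; `reversible` is "yes" or "no" in this sketch."""
    required = ("operation", "effect", "reversible", "rollback_procedure")
    missing = [key for key in required if not declaration.get(key)]
    if missing:
        return False, f"BLOCKED: declaration incomplete, missing {missing}"
    if declaration["reversible"] == "no" and declaration["rollback_procedure"] == "none":
        # Irreversible with no rollback path: escalate rather than proceed
        return False, "BLOCKED: irreversible operation with no rollback; escalate to human"
    return True, "PROCEED"
```

The point is not that the gate can verify the agent's understanding—it can't—but that it forces the understanding into the open, where a verifier or human can check it before the side effect happens.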