Capstone IT Engineering Series — Introduction

Agentic AI for Software Development: Failure Modes and Engineering Safeguards

Sixteen ways AI coding agents fail — and the architectural mitigations that catch each one before it reaches production.

1. Why a Failure Mode Taxonomy Matters

AI coding agents fail in predictable, recurring patterns. A small mistake in an early step compounds through every downstream sub-agent. A review confidently answers the wrong question. A finding cites a file that doesn't exist. These failures are dangerous precisely because the output reads well—it passes the casual review that catches everything else. This series catalogs sixteen such failure modes, organizes them into four priority tiers based on frequency, severity, and detectability, and then builds the structural defenses that address each one: workflow architecture, sub-agent design, checkpoint verification, provenance chains, goal-fidelity checks, and a human review runbook calibrated to the risk tiers. The taxonomy tells you what goes wrong. The rest of the series tells you how to make sure it doesn't.

These sixteen modes fall into two fundamentally different categories. Some are inherent LLM limitations amplified by agentic context: hallucination becomes phantom grounding when the agent takes real actions based on fabricated evidence; sycophancy becomes behavioral drift when the orchestrator progressively softens quality requirements across dozens of sub-agent invocations; satisficing becomes premature convergence when the auditor marks one endpoint checked and calls the whole category done; silent ambiguity resolution becomes invisible assumptions when each sub-agent inherits upstream assumptions as facts. These failure modes exist in any LLM interaction—the agentic context makes their consequences categorically worse because the agent has tools, persistence, and real-world side effects.

Others are emergent from agentic architecture itself, with no meaningful analog in single-turn usage: error compounding, where a small upstream mistake becomes the foundation for everything downstream; coordination divergence, where parallel sub-agents independently resolve the same ambiguity in incompatible directions; state confusion, where the agent's mental model of the environment diverges from reality after a partial rollback or concurrent modification. These failure modes don't exist outside multi-step execution. You can't prompt-engineer them away because they live in the orchestration, not in the model.

The distinction matters for practitioners because it determines where mitigation effort should go. LLM-general modes need structural checks that catch inherent model limitations—checkpoint verification, provenance chains, goal-fidelity assessments. Agentic-specific modes need architectural solutions in the orchestration layer—re-grounding gates, error recovery scaffolding, consistency reconciliation.

There is one more factor that makes this taxonomy urgent rather than merely interesting: automation complacency. This is not a failure mode of the agent—it's a failure mode of the human. The first time a developer reviews an AI agent's pull request, they read every line. By the twentieth, they're spot-checking. By the fiftieth, they're rubber-stamping. The consistent professional quality of agent output creates a false sense of reliability, and the silent corruption modes in Tiers 1 and 2 are precisely the failures that exploit this gap. They produce output that passes the casual review that catches everything else. Structural defenses—checkpoint verification, provenance chains, goal-fidelity checks, confidence calibration—exist because human vigilance is an unreliable backstop, especially at scale. The taxonomy below is, at its core, a guide to building architecture that doesn't depend on the human catching every mistake.

The Organizing Principle

The most dangerous agentic failures aren't the ones that produce obviously wrong results. They're the ones that produce plausibly right results that happen to be wrong in ways you won't notice until much later. A goal-substituted review reads well. An error-compounded analysis is internally consistent. A phantom-grounded finding has the right format. That's what makes Tiers 1 and 2 the priority—they pass the casual review that catches everything else. And as automation complacency sets in, that review becomes ever more casual.

What This Series Covers

This article is the entry point for the Capstone IT Engineering Series on agentic AI for software development. The series builds a complete methodology—from structural principles through working implementations to failure mode remediations—that addresses the full landscape of how agents succeed and fail at real software engineering tasks.

Introduction (This Article)

Failure Mode Taxonomy

The complete sixteen-mode taxonomy, four priority tiers, LLM-general vs. agentic-specific classification, and a roadmap for the rest of the series. Start here to understand what goes wrong and why the structural defenses in later articles exist.

Parts 1–5

Security Review Workflow

Builds the core methodology through a concrete security review use case: self-scaffolding checklists, orchestrated sub-agents with the seven archetypes, automated remediation with human checkpoints, a live demonstration, and generalization to testing, API review, migration review, and performance auditing. Establishes the five structural principles and seven sub-agent archetypes that the rest of the series extends.

Part 6

Beyond Security

Extracts the domain-independent pattern from the security review workflow and applies it to automated test generation, API code review, database migration review, and performance auditing. Includes complete sub-agent definitions and orchestration prompts for each domain.

Parts 7+

Advanced Mitigations

Detailed implementation of structural remediations across all four priority tiers: the checkpoint-verifier sub-agent, provenance chains, re-grounding gates, goal-fidelity checks, assumptions logging, evidence verification, error recovery scaffolding, scope boundary enforcement, consistency reconciliation, irreversibility gates, and confidence calibration.

Final

Human Review Runbook

The human side of the equation: a structured review process for evaluating agent output, calibrated to the four priority tiers. Where to apply scrutiny, what to spot-check, and when to reject versus remediate.

The Five Structural Principles

Every workflow in this series—and every remediation in this taxonomy—rests on five domain-independent principles established in Parts 1–5. They are the invariant core of the methodology, and they directly address five of the sixteen failure modes. The remaining eleven modes require additional structural defenses built on top of this foundation.

Principle 1
Externalize knowledge into files, not conversation

The agent's context window is volatile. Files are persistent. Every piece of workflow state—checklists, plans, findings, progress—lives in a file that survives context window limits and can be audited after the fact. Directly addresses: context decay, no audit trail.
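Principle 1 can be sketched in a few lines. This is a minimal illustration, not the series' implementation; the `WorkflowState` class and the `findings.json` file name are assumptions made for the example. The point is the write-through: state hits disk on every mutation, so a fresh context can resume from the file.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical write-through state store: every finding is flushed to disk
# the moment it is recorded, so workflow state survives a lost context
# window and can be audited afterward.
class WorkflowState:
    def __init__(self, workdir: str):
        self.path = Path(workdir) / "findings.json"
        if self.path.exists():
            self.findings = json.loads(self.path.read_text())
        else:
            self.findings = []

    def record(self, finding: dict) -> None:
        self.findings.append(finding)
        self.path.write_text(json.dumps(self.findings, indent=2))  # write-through

workdir = tempfile.mkdtemp()
state = WorkflowState(workdir)
state.record({"id": "F-001", "file": "auth.py", "issue": "missing rate limit"})

# A fresh instance, simulating a brand-new context window, recovers everything.
resumed = WorkflowState(workdir)
```

Any structured store works; what matters is that no workflow state lives only in the conversation.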

Principle 2
Generate domain-specific checklists with explicit completion criteria

Don't rely on the agent to "know what to check." Every item specifies what to do, what evidence to collect, and how to know when you're done. Directly addresses: completeness gaps.
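The item shape that Principle 2 implies can be sketched as a small record. The field names here are assumptions for illustration, not a prescribed schema; the structural point is that "done" and "N/A" are explicit, and a silent skip is impossible by construction.

```python
from dataclasses import dataclass

# Illustrative checklist-item shape: each item says what to do, what
# evidence to collect, and how to know when it is done.
@dataclass
class ChecklistItem:
    id: str
    action: str              # what to check
    evidence_required: str   # what evidence to collect
    done_when: str           # explicit completion criterion
    status: str = "PENDING"  # PENDING | COMPLETE | N/A
    justification: str = ""

    def mark_na(self, why: str) -> None:
        # N/A must carry a justification; silent skips are disallowed.
        if not why:
            raise ValueError("N/A requires a justification")
        self.status, self.justification = "N/A", why

item = ChecklistItem(
    id="AUTH-03",
    action="Verify every mutating endpoint enforces rate limiting",
    evidence_required="Middleware registration plus one verbatim handler quote",
    done_when="All POST/PUT/DELETE routes traced to a rate-limit decorator",
)
```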

Principle 3
Adversarially validate before executing

A single-pass checklist has blind spots. A second pass—explicitly prompted to find gaps, forbidden from approving—catches what the first missed. This is a structural control, not a suggestion. Directly addresses: completeness gaps.

Principle 4
Execute systematically with progress tracking

Convert the checklist into an ordered execution plan. Update status as you work. Document every finding with evidence. Mark items N/A with justification rather than silently skipping. Directly addresses: no audit trail.

Principle 5
Self-audit for completeness

After execution, a separate pass compares what was planned against what was done. Flags incomplete items, unjustified skips, findings without evidence, and orphan results that don't trace to a checklist item. Directly addresses: completeness gaps.
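The self-audit of Principle 5 is mechanical enough to sketch directly. A minimal version, assuming illustrative data shapes (plan items carry `id`, findings carry `item_id` and `evidence`), compares what was planned against what was done:

```python
# Minimal self-audit: flags planned items with no result, orphan results
# that trace to no checklist item, and findings without evidence.
def self_audit(plan: list, findings: list) -> dict:
    planned_ids = {item["id"] for item in plan}
    covered_ids = {f["item_id"] for f in findings}
    return {
        "incomplete": sorted(planned_ids - covered_ids),
        "orphans": sorted(covered_ids - planned_ids),
        "missing_evidence": [f["id"] for f in findings if not f.get("evidence")],
    }

plan = [{"id": "A-1"}, {"id": "A-2"}, {"id": "B-1"}]
findings = [
    {"id": "F-1", "item_id": "A-1", "evidence": "auth.py:42 verbatim quote"},
    {"id": "F-2", "item_id": "C-9", "evidence": ""},  # orphan, and no evidence
]
report = self_audit(plan, findings)
```

The audit is a separate pass over files, not a question posed to the agent that did the work.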

2. The Complete Taxonomy: Sixteen Failure Modes

Five of these failure modes are obvious—they tend to produce visibly incomplete or inconsistent output that a competent reviewer catches quickly. The remaining eleven are subtler and more consequential: they produce output that looks correct on casual inspection but isn't. Together, these sixteen modes cover the full landscape of how agents fail when given structured, open-ended tasks.

The Obvious Five

Obvious · Tractable
Context Decay

The agent forgets earlier instructions as the conversation grows. In long workflows, the orchestrator progressively loses access to its own instructions and earlier reasoning, degrading every other capability.

Obvious · Tractable
Completeness Gaps

Without an explicit checklist, entire categories get missed. The agent writes tests for happy paths and misses edge cases, or reviews authentication but forgets rate limiting. Very common, but among the most tractable failures.

Obvious · Tractable
No Audit Trail

You can't verify what was checked versus what was skipped. Without structured logging, every other failure mode becomes opaque—you can't diagnose what went wrong after the fact.

Obvious · Tractable
Behavioral Drift

An agent told to be critical becomes agreeable over time. Also known as the "sycophancy gradient"—the agent's tone and judgment shift toward accommodation as the conversation progresses.

Obvious · Tractable
Role Confusion

A single agent asked to both explore and document does neither well. Competing objectives within a single context degrade performance on all of them.

Eleven Subtler Failure Modes

These are the modes that warrant the most attention. Unlike the obvious five, they produce output that reads well, follows the expected format, and uses the right vocabulary—which is exactly what makes them dangerous. A goal-substituted review is thorough and well-organized; it just answers the wrong question. An error-compounded analysis is internally consistent; it just builds on a false premise. These modes pass casual review, and they pass increasingly casual review as automation complacency sets in.

Tier 1 — Structural
Error Compounding

A small mistake in step 3 becomes the foundation for steps 4 through 20. Unlike a human who might get a nagging feeling that something is off, an agent builds confidently on its own errors. By the time the problem surfaces, the entire output is structurally unsound and partial remediation isn't possible—you have to restart. The damage is multiplicative rather than additive. A key mechanism is handoff information loss: every time context passes between agents, compression strips uncertainty signals and caveats. The receiving agent gets conclusions without qualifications—and treats them as ground truth.
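One structural countermeasure to handoff information loss is a payload shape that makes uncertainty fields mandatory, so a summarizing step cannot drop them. A sketch, with assumed field names and an assumed confidence scale:

```python
# Handoff payload that forbids stripping uncertainty signals: confidence
# and caveats are required arguments, and nothing is "verified" by default,
# so the receiving agent cannot treat a conclusion as ground truth.
def make_handoff(conclusion: str, confidence: str, caveats: list) -> dict:
    if confidence not in {"HIGH", "MEDIUM", "LOW", "UNCERTAIN"}:
        raise ValueError("confidence must be explicitly calibrated")
    return {
        "conclusion": conclusion,
        "confidence": confidence,
        "caveats": caveats,      # an empty list must be deliberate, not a default
        "verified": False,       # flipped only by a downstream verification pass
    }

handoff = make_handoff(
    "Endpoint /export lacks authorization checks",
    "MEDIUM",
    ["Only the sync code path was inspected; the async path was not read"],
)
```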

Tier 2 — Silent Corruption
Goal Substitution

The agent quietly replaces the goal it was given with a nearby easier one. Asked to "identify the optimal architecture," it describes a familiar one. Asked to "analyze risks," it describes the system. The output reads well, has the right structure, and uses the right vocabulary, making this failure particularly insidious—it passes casual review.

Tier 2 — Silent Corruption
Invisible Assumptions

When the agent encounters ambiguity, it resolves it silently rather than surfacing the choice point. Each assumption might be individually reasonable, but the user never gets to validate the bundle of them, and the cumulative effect can send the output in a direction that's internally coherent but wrong for the actual context.
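The remedy previewed later in this article is an assumptions log. A minimal sketch of such an entry, with a schema that is an illustrative assumption: every resolved ambiguity is recorded with its resolution, the rejected alternative, and the impact, so the human can validate the bundle afterward.

```python
# Assumption-log sketch: a sub-agent records each choice point instead of
# resolving it silently. Every field is mandatory by construction.
def log_assumption(log: list, ambiguity: str, resolution: str,
                   alternative: str, impact: str) -> None:
    entry = {"ambiguity": ambiguity, "resolution": resolution,
             "alternative": alternative, "impact": impact}
    if not all(entry.values()):
        raise ValueError("every field of an assumption entry is mandatory")
    log.append(entry)

assumptions: list = []
log_assumption(
    assumptions,
    ambiguity="Spec does not say whether 'users' includes service accounts",
    resolution="Treated service accounts as in scope",
    alternative="Human-only interpretation would skip 14 endpoints",
    impact="HIGH: changes which auth checks get flagged",
)
```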

Tier 2 — Silent Corruption
Phantom Grounding

The agent references, cites, or builds on information that doesn't exist—a file it didn't actually read, a constraint it invented, a prior result it's misremembering. In agentic contexts this is worse than ordinary hallucination because the agent may take real, irreversible actions based on fabricated premises.
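The cheapest class of phantom grounding is a citation to a file that does not exist, and it is also the cheapest to catch. A sketch of a pre-flight check, with an assumed findings shape:

```python
import tempfile
from pathlib import Path

# Before acting on findings, verify that every cited file exists. This
# catches the cheapest fabrications; it cannot catch an invented behavior
# attributed to a real file.
def verify_citations(findings: list, repo_root: str) -> list:
    root = Path(repo_root)
    return [f for f in findings if not (root / f["file"]).exists()]

# Illustrative usage: one real path, one fabricated one.
root = Path(tempfile.mkdtemp())
(root / "real_module.py").write_text("# exists\n")
phantoms = verify_citations(
    [{"id": "F-1", "file": "real_module.py"},
     {"id": "F-2", "file": "imaginary/helpers.py"}],
    str(root),
)
```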

Tier 3 — Degradation
Premature Convergence

The agent declares "done" before the task is actually complete. It rounds partial results up to complete ones, treats the first workable answer as the best answer, and skips verification. Structurally different from behavioral drift—it's about the agent's relationship to its own output, not to the user's preferences.

Tier 3 — Degradation
Recovery Failure

When the agent hits an error or dead end, it either loops on the same failing approach, silently skips the problematic step as if it never existed, or restarts from scratch and loses all prior progress. Graceful degradation and intelligent backtracking are things agents are genuinely bad at.
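The structured alternative is a fixed protocol: retry, then try a different approach, then log the step as blocked and move on. A sketch, with an assumed step/approach interface; the point is that a step which defeats every approach leaves a BLOCKED record rather than vanishing.

```python
# Retry -> alternative -> log-and-continue. Prevents both infinite loops
# (bounded retries per approach) and silent skips (BLOCKED is recorded).
def run_step(step_id: str, approaches: list, log: list, retries: int = 2) -> str:
    for approach in approaches:
        for _ in range(retries):
            try:
                approach()
                log.append((step_id, "COMPLETE"))
                return "COMPLETE"
            except Exception:
                continue  # retry the same approach
        # all retries exhausted: fall through to the next alternative
    log.append((step_id, "BLOCKED"))  # visible in the audit trail, not skipped
    return "BLOCKED"

def always_fails():
    raise RuntimeError("tool error")

log: list = []
status = run_step("migrate-schema", [always_fails, always_fails], log)
```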

Tier 4 — Tractable
State Confusion

In multi-step workflows involving external systems, the agent loses track of what the actual state of the world is—what's been modified, what succeeded, what was rolled back. It operates on a mental model of the environment that has diverged from reality.

Tier 4 — Tractable
Overconfidence Uniformity

The agent presents everything—well-grounded facts, reasonable inferences, and outright speculation—with identical confidence and tone. The user has no signal for where to apply scrutiny. More of a trust-calibration and communication problem than a correctness problem, but it degrades the human's ability to catch the other failure modes.

Tier 3 — Degradation
Scope Creep

The inverse of premature convergence. The agent doesn't stop when it should—it refactors files it wasn't asked to touch, adds features beyond the specification, "improves" code in ways that introduce risk, or explores tangentially related issues when it should stay focused. Structurally different from goal substitution: the agent answers the right question plus a bunch of questions nobody asked. In agentic coding, this is dangerous because an implementer that scope-creeps modifies files outside its mandate, potentially breaking things that were working. Especially common with more capable models.

Tier 3 — Degradation
Coordination Divergence

When parallel sub-agents independently resolve the same ambiguity in different directions, the results can be individually correct and collectively incoherent. Two scanners make incompatible assumptions about shared resources, naming conventions, or architectural patterns—each reasonable in isolation, mutually exclusive in combination. Distinct from invisible assumptions: the problem isn't that a choice point was hidden from the human, but that two agents each made reasonable choices without any mechanism to reconcile them. Escalates to Tier 2 severity in pipeline architectures without a central reconciliation layer.
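A reconciliation step can be sketched mechanically: after parallel scanners finish, compare the assumptions each one logged and flag any ambiguity that two agents resolved differently. The data shapes here are illustrative assumptions.

```python
from itertools import combinations

# Pairwise reconciliation of assumption logs: a conflict is the same
# ambiguity key resolved to different values by different scanners.
def reconcile(scanner_assumptions: dict) -> list:
    conflicts = []
    for (name_a, log_a), (name_b, log_b) in combinations(
            scanner_assumptions.items(), 2):
        for key in sorted(set(log_a) & set(log_b)):
            if log_a[key] != log_b[key]:
                conflicts.append((key, name_a, log_a[key], name_b, log_b[key]))
    return conflicts

conflicts = reconcile({
    "auth-scanner": {"tenant model": "single-tenant", "naming": "snake_case"},
    "api-scanner":  {"tenant model": "multi-tenant",  "naming": "snake_case"},
})
```

Agreement on "naming" produces nothing; the incompatible tenant-model assumptions surface as a conflict for review.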

Tier 3 — Degradation
Tool Model Mismatch

The agent's internal model of how a tool behaves diverges from how it actually behaves, especially around edge cases, error conditions, and irreversibility. It uses sed for a transformation that requires an AST parser. It runs a destructive command in the wrong directory. It misinterprets an exit code. It executes a database migration without understanding the operation is irreversible. Not hallucination (the tool is real), not wrong assumptions about the task (the task is understood), not state confusion (the environment model is correct)—it's a competence gap in understanding the tool itself. Matters most in agentic coding where tools have real-world side effects.
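The irreversibility gate previewed later in the series can be sketched as a pre-execution check: the implementer must articulate the operation's effect, its reversibility, and a rollback procedure before anything runs. The rule set and return strings here are illustrative assumptions.

```python
# Pre-execution gate for destructive operations: irreversible work with no
# stated rollback is held for human approval instead of executed.
def gate(command: str, stated_effect: str, reversible: bool, rollback: str) -> str:
    # `command` would be carried into the audit log in a fuller version.
    if not stated_effect:
        return "HOLD: effect not articulated"
    if not reversible and not rollback:
        return "HOLD: irreversible with no rollback procedure"
    return "ALLOW"

decision = gate(
    command="ALTER TABLE users DROP COLUMN legacy_token",
    stated_effect="Removes the legacy_token column and all its data",
    reversible=False,
    rollback="",
)
```

Forcing the articulation is the point: an agent that cannot state the rollback procedure has just revealed the competence gap before it becomes damage.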

3. Four Priority Tiers

Not all failure modes are equally urgent to address. The ranking below uses a composite of three factors: how frequently the failure occurs, how severe the consequences are, and how hard it is for a human to detect after the fact. The highest-priority failures are common, consequential, and invisible.

Tier 1 — Structural Risks That Undermine Everything Else

Error Compounding · Context Decay · No Audit Trail

These failures undermine the workflow itself. Error compounding is multiplicative—one upstream mistake corrupts everything downstream. Context decay degrades every other capability as the orchestrator progressively loses its own instructions. No audit trail makes every other failure mode opaque and undiagnosable. If these aren't addressed, sub-agent design quality doesn't matter.

Priority rationale: High frequency × high severity × low detectability. Error compounding is the single most dangerous failure mode because the agent's confident tone masks the rot. Context decay is the hardest to fully solve because it's a constraint of current architectures. No audit trail has the highest tractability—structured logging is straightforward—which is why it belongs at the top despite being less dramatic.

Tier 2 — Silent Corruption of Outputs

Goal Substitution · Invisible Assumptions · Phantom Grounding · Completeness Gaps

These failures corrupt individual outputs in ways that are difficult to detect. Unlike Tier 1, where the problem is propagation through the pipeline, Tier 2 problems originate at the source—a sub-agent that answered the wrong question, resolved ambiguity silently, or cited evidence that doesn't exist. The output reads well, which is exactly the problem.

Priority rationale: High frequency × medium-to-high severity × very low detectability. Goal substitution and invisible assumptions are the hardest to catch in review because the output looks thoughtful and well-reasoned. Phantom grounding is dangerous specifically in agentic contexts where actions have real consequences. Completeness gaps are the most tractable item in this tier.

Tier 3 — Degradation Over Time and Under Complexity

Behavioral Drift · Premature Convergence · Recovery Failure · Role Confusion · Scope Creep · Coordination Divergence · Tool Model Mismatch

Real and costly, but more detectable and more addressable than Tiers 1 and 2. Behavioral drift is partially mitigated by re-anchoring to system instructions. Premature convergence is catchable by a human reviewer who knows what "done" should look like. Recovery failure is addressable with retry logic and error-handling scaffolding. Role confusion is the most architecturally solvable problem in the entire taxonomy—multi-agent designs with separated concerns largely eliminate it. Scope creep—the inverse of premature convergence—is detectable through state diffs and scope boundary enforcement. Coordination divergence emerges when parallel sub-agents make incompatible assumptions, and is addressable through explicit reconciliation steps. Tool model mismatch is hardest to prevent proactively but catchable through irreversibility gates and the existing implementer-then-verifier pattern.

Priority rationale: Medium frequency × medium severity × medium detectability. These failures tend to produce output that a careful human reviewer can catch, unlike Tiers 1 and 2. Scope creep and tool model mismatch are notable because they escalate in severity specifically in implementation workflows where the agent has write access to the codebase.

Tier 4 — Important but More Tractable

State Confusion · Overconfidence Uniformity

State confusion is serious in tool-use and code-generation workflows but addressable with explicit state tracking and environment snapshots. Overconfidence uniformity matters for trust calibration but is more of a UX and communication problem than a correctness problem—addressable by requiring agents to flag confidence levels on individual findings.

Priority rationale: Medium frequency × lower severity × higher tractability. Important, but these won't silently corrupt your workflow the way Tiers 1 and 2 will.
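Explicit state tracking is the more mechanical of the two mitigations and can be sketched directly: snapshot file contents before and after each implementer action, then diff. A minimal illustration (a real version would walk the whole workspace):

```python
import tempfile
from pathlib import Path

# Snapshot-and-diff: capture content before and after an implementer action
# so a verifier can compare intended changes against actual ones.
def snapshot(paths: list) -> dict:
    return {str(p): Path(p).read_text() if Path(p).exists() else None
            for p in paths}

def diff(before: dict, after: dict) -> list:
    return [p for p in after if after[p] != before.get(p)]

root = Path(tempfile.mkdtemp())
target = root / "config.py"
bystander = root / "auth.py"
target.write_text("DEBUG = True\n")
bystander.write_text("# untouched\n")

before = snapshot([target, bystander])
target.write_text("DEBUG = False\n")   # the authorized change
bystander.write_text("# modified!\n")  # an out-of-scope modification
after = snapshot([target, bystander])

changed = diff(before, after)
unauthorized = [p for p in changed if p != str(target)]
```

The same diff doubles as a scope-creep detector: any changed path outside the authorized set is flagged.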

4. LLM-General vs. Agentic-Specific Failures

Not all sixteen failure modes are created equal in origin. Some are fundamental LLM limitations that exist in any interaction with a language model—the agentic context merely amplifies their consequences. Others emerge specifically from multi-step execution, tool use, or orchestration architecture and have no meaningful analog in single-turn LLM usage. This distinction matters for practitioners because it determines where mitigation effort should go: prompt engineering and model selection for LLM-general modes, versus architectural and orchestration design for agentic-specific modes.

LLM-General — Amplified by Agentic Context

These failure modes exist whenever you interact with a language model. In a chatbot, they produce wrong answers. In an agentic coding workflow, they produce wrong answers that get implemented as code changes. The mechanism is the same; the consequences are categorically different.

Phantom Grounding
LLM manifestation: Hallucination — the model invents facts, citations, or code in a single response.
Agentic amplification: The agent takes real, irreversible actions based on fabricated premises. A phantom finding leads to a code change based on a file that doesn't exist or a behavior that isn't present.

Goal Substitution
LLM manifestation: The model answers an easier question than the one asked — well-documented in single-turn interactions.
Agentic amplification: A multi-step workflow compounds the substitution. Each downstream step builds on the wrong objective, making the deviation harder to detect and more expensive to fix.

Invisible Assumptions
LLM manifestation: The model silently resolves ambiguity in every response without surfacing the choice point.
Agentic amplification: Assumptions compound across sub-agents. Each agent inherits upstream assumptions as facts, and the cumulative bundle can send the entire workflow in a wrong direction.

Overconfidence Uniformity
LLM manifestation: Models present all outputs with identical confidence regardless of actual certainty.
Agentic amplification: Degrades the human reviewer's ability to triage — a speculative finding and a verified finding look the same, making review less efficient at scale.

Premature Convergence
LLM manifestation: Satisficing over optimizing — the model produces the first adequate answer rather than the best one.
Agentic amplification: The auditor marks items complete based on the existence of a finding, not its quality. One endpoint checked out of forty satisfies the binary check.

Behavioral Drift
LLM manifestation: Sycophancy within a single conversation — the model becomes more agreeable over turns.
Agentic amplification: The orchestrator progressively softens quality requirements across many sub-agent invocations, accepting marginal output it would have rejected earlier.

Scope Creep
LLM manifestation: The model writes a 2000-word answer to a question that needed 200 words, or adds unsolicited suggestions.
Agentic amplification: An implementer modifies files outside its mandate. A scanner explores tangential findings that burn tokens and time. In shared codebases, this can break things that were working.

Agentic-Specific — Emergent from Architecture

These failure modes have no meaningful analog in single-turn LLM usage. They emerge from the structural properties of multi-step workflows: pipelines where outputs feed inputs, tool use with real-world side effects, parallel execution without coordination, and long-running contexts that exceed effective capacity.

Error Compounding
Why it's agentic-specific: Requires multi-step pipelines where upstream outputs become downstream inputs. A single LLM response can be wrong, but it can't compound.
Architectural driver: Phase-based workflows; handoff compression between agents strips uncertainty signals.

Context Decay
Why it's agentic-specific: Requires conversations long enough to exceed the effective context window. Single-turn interactions don't decay.
Architectural driver: Orchestrator running across many sub-agent invocations; progressive loss of instructions.

State Confusion
Why it's agentic-specific: Requires tool use and environment interaction. A bare LLM has no environment to be confused about.
Architectural driver: Multi-step implementation workflows; rollbacks, partial application, concurrent modifications.

Recovery Failure
Why it's agentic-specific: Requires error conditions in tool execution. A bare LLM doesn't encounter tool failures.
Architectural driver: Orchestration with external tools; no structured fallback protocol.

Role Confusion
Why it's agentic-specific: Requires multi-agent architectures with role boundaries. A single LLM has no roles to confuse.
Architectural driver: Orchestrator doing sub-agent work; blurred boundaries under pressure.

No Audit Trail
Why it's agentic-specific: Meaningless outside a multi-step workflow. A single response doesn't need an audit trail.
Architectural driver: Multi-phase execution without structured logging.

Completeness Gaps
Why it's agentic-specific: Technically possible in single-turn usage, but primarily a workflow coverage problem across multiple invocations.
Architectural driver: Checklist-driven scanning where categories can be silently skipped.

Coordination Divergence
Why it's agentic-specific: Requires multiple agents operating on the same domain in parallel. Single agents can't diverge from themselves.
Architectural driver: Parallel sub-agent invocation; pipeline architectures without reconciliation.

Tool Model Mismatch
Why it's agentic-specific: The underlying competence gap (misunderstanding how a tool works) is an LLM limitation, but the consequences materialize only when the agent can execute the wrong model.
Architectural driver: Agent tool access; irreversible operations without rollback understanding.
The Practical Takeaway

The LLM-general failure modes become categorically more dangerous in agentic contexts not because they're more frequent but because the agent has tools, persistence, and real-world side effects. Phantom grounding in a chatbot produces a wrong answer you can ignore. Phantom grounding in a coding agent produces a code change based on a file that doesn't exist. Same mechanism, vastly different consequences. For the LLM-general modes, the most effective mitigations are structural (checkpoint verification, provenance chains, goal-fidelity checks) rather than prompt-based—because you can't prompt-engineer away a fundamental model limitation, but you can build architecture that catches its effects. For the agentic-specific modes, the mitigations are architectural by necessity—they address failure patterns that exist in the orchestration, not in the model.

5. Series Roadmap: Where Each Failure Mode Gets Addressed

Each of the sixteen failure modes is addressed somewhere in the series—some by the five-principle architecture established in Parts 1–5, others by new structural additions in Parts 7+, and a few by the human review process in the Final article. The table below maps each failure mode to where its primary mitigation lives.

Failure Mode · Addressed In · How

TIER 1 — STRUCTURAL RISKS
Error Compounding · Parts 7+ · Checkpoint-verifier sub-agent, provenance chains, re-grounding gates
Context Decay · Parts 1–5 · Principle 1: externalize state to files; sub-agent isolation limits per-invocation context
No Audit Trail · Parts 1–5 · Principle 4 + file pipeline (checklist → plan → findings → remediation-log)

TIER 2 — SILENT CORRUPTION
Goal Substitution · Parts 7+ · Goal-fidelity checks in validator and auditor; objective capture in plan.md
Invisible Assumptions · Parts 7+ · assumptions.md file; ambiguity-logging rules in scanner prompts
Phantom Grounding · Parts 7+ · Verbatim evidence requirements; file-existence checks; checkpoint-verifier
Completeness Gaps · Parts 1–5 · Principles 2, 3, 5: checklist → validator → auditor triple layer

TIER 3 — DEGRADATION OVER TIME
Behavioral Drift · Parts 1–5 + Parts 7+ · Sub-agent isolation (existing) + orchestrator re-anchoring protocol (new)
Premature Convergence · Parts 7+ · Definition-of-done patterns; completion quality checks; auto-completion loop
Recovery Failure · Parts 7+ · Error recovery scaffolding: structured retry → alternative → log-and-continue
Role Confusion · Parts 1–5 · Seven archetypes with enforced tool restrictions
Scope Creep · Parts 1–5 + Parts 7+ · State diffs (existing, partial) + scope boundary enforcement + scope verification (new)
Coordination Divergence · Parts 1–5 + Parts 7+ · Serial orchestration (existing, partial) + consistency reconciliation (new)
Tool Model Mismatch · Parts 1–5 + Parts 7+ · Implementer-then-verifier (existing, partial) + irreversibility gates (new)

TIER 4 — TRACTABLE BUT IMPORTANT
State Confusion · Parts 7+ · State snapshots before/after implementer actions; state-diff verification
Overconfidence Uniformity · Parts 7+ · Confidence field in findings template; calibration rules in scanner prompts

The Pattern

The five-principle architecture from Parts 1–5 fully addresses five failure modes and partially covers three more. The gaps cluster in the modes that produce plausible-looking output (Tier 2) and the modes that degrade silently over time (Tier 3). The architecture has strong completeness controls but weak correctness and degradation controls. Parts 7+ address that asymmetry with targeted structural additions—no existing components are removed or restructured.

6. Remediation Preview

The detailed implementation of each remediation lives in the Parts 7+ articles. The table below previews every structural addition—what it is, what type of change it requires, and what failure mode it addresses. The total footprint is deliberately small: one new sub-agent, one new file, one new workflow phase, orchestration enhancements, and prompt additions to existing sub-agent types.

RemediationTypeAddressesOne-Sentence Description
TIER 1 — STRUCTURAL RISKS
Checkpoint-verifierNew Sub-AgentError compounding, phantom groundingSpot-checks upstream findings against the actual codebase between workflow phases; flags mismatches before they propagate.
Provenance fieldsTemplate ChangeError compounding, no audit trailAdds source, checklist-item trace, and verification-status fields to every finding so corruption is auditable.
Re-grounding gatesOrchestrationContext decay, goal substitutionRe-injects the original task context at each phase boundary to prevent progressive drift from the objective.
TIER 2 — SILENT CORRUPTION
Objective captureOrchestrationGoal substitutionRecords the exact original objective as the first line of plan.md so every sub-agent can reference it.
Validator rule zeroPrompt AdditionGoal substitutionRequires the validator to verify that checklist items serve the stated objective before checking for gaps.
Goal-fidelity assessmentPrompt AdditionGoal substitutionAdds an auditor section that classifies each finding as RELEVANT, TANGENTIAL, or SUBSTITUTE relative to the objective.
assumptions.mdNew FileInvisible assumptionsA per-review file where every interpretive decision gets logged with the ambiguity, resolution, alternative, and impact.
Ambiguity loggingPrompt AdditionInvisible assumptionsRequires scanner sub-agents to log every silent ambiguity resolution in assumptions.md rather than resolving silently.
Assumptions reviewPrompt AdditionInvisible assumptionsAuditor reviews high-impact assumptions and flags those that affect implementation-queue findings for human review.
Verbatim evidencePrompt AdditionPhantom groundingRequires scanners to include exact code quotes rather than paraphrased descriptions, raising the bar for fabrication.
File existence checkOrchestrationPhantom groundingVerifies that all cited files actually exist before passing scanner results to the writer, eliminating the cheapest fabrications.
First-principles checkPrompt AdditionCompleteness gapsValidator ignores the checklist and asks "what could go wrong?" from first principles, creating a second cognitive path.
TIER 3 — DEGRADATION OVER TIME
Re-anchoring protocol (Orchestration, addresses behavioral drift): Periodic self-check in which the orchestrator re-reads its instructions and checks whether it has softened any requirements.
Completion criteria (Template Change, addresses premature convergence): Adds explicit, testable done-criteria to each checklist item so the auditor checks quality, not just presence.
Completion quality check (Prompt Addition, addresses premature convergence): Auditor classifies items as COMPLETE, PARTIAL, SHALLOW, or MISSING rather than binary complete/incomplete.
Gap classification (Prompt Addition, addresses completeness gaps and premature convergence): Auditor classifies each gap as SKIPPED, BLOCKED, or EMPTY to enable targeted auto-remediation.
Auto-completion loop (Orchestration, addresses completeness gaps): Single-pass re-invocation of scanners for SKIPPED items only, with verification, after the auditor identifies gaps.
Error recovery scaffolding (Orchestration, addresses recovery failure): Structured protocol: retry → alternative approach → log as BLOCKED and continue. Prevents both infinite loops and silent skips.
Role boundary rules (Prompt Addition, addresses role confusion): Explicit rules preventing the orchestrator from doing sub-agent work directly under time or context pressure.
Scope boundary enforcement (Prompt Addition, addresses scope creep): Implementers must verify each change is within the authorized file and line range; scanners must defer out-of-scope observations.
Scope verification check (Orchestration, addresses scope creep): Orchestrator verifies after each implementer invocation that the modified files match the authorized set.
Consistency reconciliation (Orchestration, addresses coordination divergence): After all scanners complete, compares assumptions across their outputs and flags conflicts for review.
Irreversibility gate (Prompt Addition, addresses tool model mismatch): Requires the implementer to articulate what a destructive operation does, whether it is reversible, and the rollback procedure.
Destructive operation logging (Orchestration, addresses tool model mismatch): Orchestrator logs every destructive operation with its pre-state and the implementer's stated understanding for audit.
TIER 4 — TRACTABLE BUT IMPORTANT
State snapshots (Orchestration, addresses state confusion): Captures file content before and after each implementer action so the verifier can diff intended vs. actual changes.
State-diff verification (Prompt Addition, addresses state confusion): Verifier compares before- and after-snapshots to confirm intended changes and flag unintended ones.
State section in remediation log (Template Change, addresses state confusion): Adds a structured state-tracking section to remediation-log.md for post-hoc audit of environment changes.
Confidence field (Template Change, addresses overconfidence uniformity): Adds HIGH/MEDIUM/LOW/UNCERTAIN ratings with mandatory justification to the findings template.
Confidence calibration rules (Prompt Addition, addresses overconfidence uniformity): Sets calibration defaults (most findings should be MEDIUM) and penalizes blanket HIGH ratings.
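Several of the orchestration-level mitigations above are deliberately mechanical. As a minimal sketch of the file-existence check (names such as `Finding` and `verify_citations` are illustrative, not from the series), the orchestrator-side filter between the scanner and writer phases might look like:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Finding:
    title: str
    cited_file: str   # path the scanner claims to quote from
    quote: str        # verbatim evidence, per the "Verbatim evidence" rule

def verify_citations(findings, repo_root):
    """Drop findings that cite files which do not exist.

    Run by the orchestrator between the scanner and writer phases;
    catches the cheapest fabrications before they propagate downstream.
    """
    kept, rejected = [], []
    for f in findings:
        if (Path(repo_root) / f.cited_file).is_file():
            kept.append(f)
        else:
            rejected.append(f)  # logged for the audit trail, not silently dropped
    return kept, rejected
```

Note that rejected findings are returned rather than discarded: silently dropping them would itself be a gap in the audit trail.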

7 Reading This Series

The series is designed to be read in order for the complete picture, but each article group is self-contained enough to serve as a reference for its specific topic. The table below maps each article group to what it covers and which failure modes it addresses.

Introduction (this article): Covers the complete taxonomy, priority tiers, LLM-general vs. agentic-specific classification, series roadmap, and remediation preview. Addresses all 16 failure modes (identification and classification).
Parts 1–5 (Security Review): Covers the five structural principles, seven sub-agent archetypes, orchestration patterns, and domain generalization. Addresses context decay, completeness gaps, no audit trail, behavioral drift, and role confusion fully; scope creep, coordination divergence, and tool model mismatch partially.
Part 6 (Beyond Security): Covers domain-independent pattern extraction and adaptations for testing, API review, migration review, and performance auditing. Same failure-mode coverage as Parts 1–5, applied across domains.
Parts 7+ (Advanced Mitigations): Covers the checkpoint-verifier, provenance chains, goal-fidelity checks, assumptions logging, evidence verification, error recovery, scope enforcement, consistency reconciliation, irreversibility gates, and confidence calibration. Addresses error compounding, goal substitution, invisible assumptions, phantom grounding, premature convergence, recovery failure, scope creep, coordination divergence, tool model mismatch, state confusion, and overconfidence uniformity.
Final (Human Review): Covers a structured human review process calibrated to the four priority tiers. Addresses all 16 failure modes (detection and disposition by the human reviewer).
Where to Start

If you're new to agentic workflows, start with Parts 1–5 to understand the five structural principles and seven sub-agent archetypes, then return here to see where the remaining gaps are. If you're already running agentic workflows and experiencing failures, use the taxonomy above to identify which failure modes you're hitting, then jump directly to the relevant section in Parts 7+ for the remediation. If you're evaluating whether to adopt agentic tooling, this introduction gives you the full landscape of what can go wrong and what the structural defenses look like—read this article end to end.

8 Frequently Asked Questions

What's the difference between LLM-general and agentic-specific failure modes?

LLM-general failure modes (phantom grounding, goal substitution, invisible assumptions, overconfidence uniformity, premature convergence, behavioral drift, scope creep) exist in any LLM interaction—the agentic context amplifies their consequences because the agent has tools and can take real-world actions. Agentic-specific modes (error compounding, context decay, state confusion, recovery failure, role confusion, no audit trail, completeness gaps, coordination divergence, tool model mismatch) emerge from multi-step execution architecture. The distinction matters for mitigation: LLM-general modes need structural checks that catch inherent model limitations, while agentic-specific modes require architectural solutions in the orchestration layer.

What is automation complacency and how does the architecture address it?

Automation complacency is the human tendency to rubber-stamp agent output after repeated positive experiences. It's not an agent failure mode per se, but it directly determines whether the silent corruption modes (Tiers 1 and 2) get caught. The architecture addresses it structurally: the checkpoint-verifier provides systematic review independent of human attention, provenance chains make evidential basis traceable, confidence calibration directs limited attention to where it matters, and the auditor's goal-fidelity assessment catches systematic drift that a fatigued reviewer might miss. These structural defenses exist precisely because human vigilance degrades over time.

What is goal substitution in agentic AI workflows?

Goal substitution occurs when the agent quietly replaces the goal it was given with a nearby easier one. Asked to "identify the optimal architecture," it describes a familiar one. Asked to "analyze risks," it describes the system. The output reads well, has the right structure, and uses the right vocabulary, which makes this failure particularly insidious—it passes casual review. The remediation is adding a "rule zero" goal-fidelity check to the validator sub-agent and a goal-fidelity assessment section to the auditor sub-agent.
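One way to picture the "rule zero" check is as a preamble the orchestrator prepends to the validator's instructions, forcing a goal-fidelity verdict before any checklist work begins. This is a hedged sketch under my own assumptions; `rule_zero_prompt` and its wording are illustrative, not the series' actual prompt:

```python
def rule_zero_prompt(original_goal: str, draft: str) -> str:
    """Build the validator's "rule zero" goal-fidelity preamble.

    The validator must answer this *before* any checklist work, so a
    well-structured draft that answers the wrong question is caught
    first rather than polished further.
    """
    return (
        "RULE ZERO (answer before anything else):\n"
        f"The review was asked to: {original_goal}\n"
        "1. State, in one sentence, what question the draft actually answers.\n"
        "2. Verdict: FAITHFUL if the two match; SUBSTITUTED if the draft "
        "answers a nearby, easier question. If SUBSTITUTED, stop and report; "
        "do not proceed to the checklist.\n\n"
        f"DRAFT UNDER REVIEW:\n{draft}"
    )
```

The key design choice is ordering: the fidelity question comes first and gates everything else, so the validator cannot skip it under context pressure.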

What makes error compounding the most dangerous agentic AI failure mode?

Error compounding is multiplicative rather than additive. A small mistake in an early step—a misidentified code pattern, a fabricated finding, or an incorrect assumption—becomes the foundation for every subsequent step. The downstream sub-agents build confidently on the error, and the agent's consistent tone masks the corruption. By the time the problem surfaces, the entire output is structurally unsound and partial remediation isn't possible. A key mechanism is handoff information loss: every time context passes between agents, compression strips uncertainty signals and caveats. The receiving agent gets conclusions without qualifications—and treats them as ground truth.
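One structural countermeasure to handoff information loss is to make uncertainty a first-class field in the handoff format rather than free text that summarization can strip. A minimal sketch, assuming a `Handoff` record of my own invention (not the series' actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Context passed between sub-agents.

    Keeping caveats and confidence as structured fields means the
    receiving agent sees conclusions *with* their qualifications,
    instead of compressed prose that drops the uncertainty signals.
    """
    conclusion: str
    confidence: str                                     # "HIGH" | "MEDIUM" | "LOW" | "UNCERTAIN"
    caveats: list = field(default_factory=list)
    evidence_refs: list = field(default_factory=list)   # file:line citations

def render_for_next_agent(h: Handoff) -> str:
    """Serialize a handoff without dropping uncertainty signals."""
    lines = [f"CONCLUSION ({h.confidence}): {h.conclusion}"]
    lines += [f"CAVEAT: {c}" for c in h.caveats]
    lines += [f"EVIDENCE: {e}" for e in h.evidence_refs]
    return "\n".join(lines)
```

Because every rendered conclusion carries its confidence label inline, a downstream agent cannot receive a qualified finding as unqualified ground truth.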

How does scope creep differ from goal substitution?

Goal substitution is the agent answering a different, easier question than the one asked. Scope creep is the agent answering the right question plus a bunch of questions nobody asked—refactoring files it wasn't asked to touch, adding features beyond the specification, or exploring tangential findings. The mitigations are different: goal substitution is caught by the validator's rule-zero fidelity check, while scope creep is prevented by scope boundary enforcement in the implementer and scanner prompts and caught after the fact by state-diff verification. Scope creep is particularly dangerous in implementation workflows where every unsolicited modification is an unreviewed change in production code.
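The after-the-fact half of the scope-creep defense is simple set arithmetic: compare what the implementer actually touched against what it was authorized to touch. A hedged sketch (the function name and return shape are mine, not the series'):

```python
def check_scope(modified: set, authorized: set) -> dict:
    """Orchestrator-side scope check after each implementer invocation.

    Any modified file outside the authorized set is an unreviewed change
    and gets flagged. Untouched authorized files are fine: authorization
    is an upper bound, not a requirement to modify everything listed.
    """
    out_of_scope = modified - authorized
    return {"ok": not out_of_scope, "out_of_scope": sorted(out_of_scope)}
```

In practice the orchestrator would derive `modified` from a VCS diff or the state snapshots described above, then halt and escalate when `ok` is false.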

How does coordination divergence differ from invisible assumptions?

Invisible assumptions is about a single agent resolving ambiguity without surfacing the choice point. Coordination divergence is about two agents independently resolving the same ambiguity in incompatible ways—each may correctly log its assumption, but no mechanism detects that the assumptions conflict. The consistency reconciliation step compares assumptions across scanners after they all complete. In the current serial architecture with a central orchestrator, this is a moderate risk. In pipeline architectures without reconciliation, it escalates to Tier 2 severity.
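The reconciliation step can be sketched as a cross-index of each scanner's logged assumptions, keyed by the ambiguity they resolve. This is a minimal illustration under my own assumed data shape (scanner name mapped to ambiguity-to-resolution entries), not the series' implementation:

```python
from collections import defaultdict

def reconcile_assumptions(logs: dict) -> dict:
    """Cross-check assumptions.md entries from parallel scanners.

    `logs` maps scanner name -> {ambiguity_id: resolution}. Each scanner
    may have correctly logged its own choice; this step detects when two
    scanners resolved the *same* ambiguity in different directions.
    """
    by_ambiguity = defaultdict(dict)
    for scanner, entries in logs.items():
        for ambiguity, resolution in entries.items():
            by_ambiguity[ambiguity][scanner] = resolution
    # An ambiguity with more than one distinct resolution is a conflict.
    return {
        amb: choices
        for amb, choices in by_ambiguity.items()
        if len(set(choices.values())) > 1
    }
```

A non-empty result is what gets flagged for human review; ambiguities resolved identically by every scanner pass through silently.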

What failure modes does the existing five-principle architecture already address?

The existing architecture effectively addresses five of the sixteen failure modes. Context decay is handled by externalizing state to files (Principle 1). Completeness gaps are covered by the checklist-validator-auditor triple layer (Principles 2, 3, and 5). Behavioral drift is structurally mitigated by isolating critical functions in dedicated sub-agents with hard constraints. Role confusion is solved by the seven archetypes with enforced tool restrictions. The audit trail is provided by the file pipeline from checklist through findings to remediation log (Principle 4). Three additional modes—scope creep, coordination divergence, and tool model mismatch—receive partial coverage from existing structural patterns.

What is the difference between completeness gaps and goal substitution?

Completeness gaps mean the right question was asked but some categories were missed—the review covered authentication but forgot rate limiting. Goal substitution means the wrong question was answered entirely—the review was asked to assess backward compatibility risks but instead cataloged endpoint patterns. A complete review can still be goal-substituted (every category checked, but the categories don't serve the objective), and an incomplete review can still be goal-faithful. The validator's "rule zero" catches goal substitution; the validator's gap-finding catches completeness issues. Both are needed.

What is phantom grounding and why is it worse than regular hallucination?

Phantom grounding is hallucination with real-world consequences: the agent references files, code, or evidence that doesn't exist. In a chatbot, that produces a wrong answer. In an agentic workflow, it produces a code change based on a file that doesn't exist—same mechanism, vastly different consequences. The architecture addresses it at three levels. First, scanner sub-agents are required to include verbatim code quotes rather than just file paths and descriptions, which raises the bar for fabrication. Second, the orchestrator runs a file-existence check between the scanner and writer phases, removing any findings that cite nonexistent files. Third, the checkpoint-verifier spot-checks a sample of findings against the actual codebase, catching cases where the file exists but the described behavior is wrong.
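The first two levels compose into one mechanical check: verify that the cited file exists and that the verbatim quote actually appears in it. A hedged sketch (the function name and verdict strings are illustrative, not from the series):

```python
from pathlib import Path

def verify_quote(repo_root: str, cited_file: str, quote: str) -> str:
    """Check a finding's verbatim evidence against the real codebase.

    Returns one of three verdicts the checkpoint-verifier can act on:
    MISSING_FILE is the cheapest fabrication; MISSING_QUOTE means the
    file exists but the quoted code does not appear in it.
    """
    path = Path(repo_root) / cited_file
    if not path.is_file():
        return "MISSING_FILE"
    if quote not in path.read_text():
        return "MISSING_QUOTE"
    return "VERIFIED"
```

The third level (spot-checking whether the described *behavior* matches the quoted code) still needs a model or a human; this check only guarantees the evidence is real.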

What is tool model mismatch and why can't it be fully prevented?

Tool model mismatch occurs when the agent's understanding of how a tool works diverges from reality—using sed when an AST parser is needed, misinterpreting exit codes, or not understanding that a database migration is irreversible. It can't be fully prevented because no amount of prompt engineering can give the model knowledge it doesn't have. The irreversibility gate changes the failure mode: instead of silent misuse, the agent either explains its understanding correctly (giving the verifier something to check), reveals its misunderstanding (giving the human a signal to intervene), or correctly identifies uncertainty and escalates.
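The gate itself can be sketched as a validation step over the implementer's structured declaration. Field names and verdict shape here are my own illustration of the idea, not the series' implementation:

```python
REQUIRED_FIELDS = ("operation", "effect", "reversible", "rollback")

def irreversibility_gate(declaration: dict) -> dict:
    """Gate a destructive operation on the implementer's stated model.

    Before executing, the implementer must articulate what the operation
    does, whether it is reversible, and the rollback procedure. A missing
    field or an admission of uncertainty blocks execution and escalates
    instead of letting the agent proceed on a wrong tool model.
    """
    missing = [f for f in REQUIRED_FIELDS if not declaration.get(f)]
    if missing:
        return {"allow": False, "reason": f"incomplete declaration: {missing}"}
    if declaration["reversible"] == "uncertain":
        return {"allow": False, "reason": "agent uncertain about reversibility; escalate to human"}
    return {"allow": True, "reason": "declaration complete; log pre-state and proceed"}
```

Note the gate never judges whether the stated model is *correct*: it forces the model into the open, where the verifier can check it or a human can catch the misunderstanding.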