## 1. The Security Review Challenge
Modern applications face a paradox: codebases are larger and more complex than ever, security threats are more sophisticated, and development cycles are faster. Manual security review doesn't scale. But automated tools miss critical vulnerability classes.
The solution isn't choosing between automation and expertise—it's layering them strategically. This guide examines three complementary approaches to security review and shows how they work together.
The series Introduction catalogs sixteen failure modes that affect agentic AI workflows for software development. The three-layer security model presented here is a direct response to those failure modes. SAST catches pattern-based issues that an AI agent might hallucinate past (phantom grounding). AI-orchestrated review addresses the silent corruption modes (goal substitution, invisible assumptions, completeness gaps) that pattern matching fundamentally cannot detect. Human oversight provides the accountability layer that counters automation complacency: the tendency to rubber-stamp agent output after repeated positive experiences.
No single approach catches everything. SAST tools miss business logic flaws. AI review may miss edge cases in complex control flow. Human reviewers can't examine every line of code. The goal isn't finding a silver bullet—it's building a stack where each layer catches what the others miss.
## 2. The Three Layers
SAST tools analyze source code without executing it, searching for patterns that match known vulnerability signatures. They're fast, deterministic, and integrate seamlessly into CI/CD pipelines.
Large language models analyze code with contextual understanding, tracing data flows across architectural boundaries and reasoning about authorization logic. They produce human-readable explanations of vulnerabilities.
Security experts provide judgment on risk acceptance, validate automated findings, approve remediation approaches, and handle novel vulnerability classes that tools haven't encountered.
## 3. SAST: What It Does Well
Static Application Security Testing tools—Semgrep, SonarQube, CodeQL, Checkmarx, Fortify, Snyk Code—have been the backbone of automated security for over a decade. They excel in specific scenarios:
### Pattern Detection at Scale
SAST tools maintain extensive rule libraries for known vulnerability patterns:
```yaml
# Semgrep rule example: SQL injection detection
rules:
  - id: sql-injection
    languages: [python]
    patterns:
      - pattern: execute($QUERY)
      - pattern-not: execute($QUERY, $PARAMS)
    message: "Potential SQL injection: use parameterized queries"
    severity: ERROR
```
These rules catch obvious mistakes instantly. A developer writes `execute(f"SELECT * FROM users WHERE id = {user_id}")`, and the tool flags it before the code leaves their machine.
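To make the flagged pattern concrete, the difference between string interpolation and parameter binding can be demonstrated with the standard-library `sqlite3` module. This is an illustrative sketch; the table and data are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_id = "1 OR 1=1"  # attacker-controlled input

# Vulnerable: interpolation makes the payload part of the SQL itself,
# so the OR clause matches every row in the table.
rows = conn.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchall()

# Safe: the ? placeholder binds user_id as a value, never as SQL,
# so the payload matches nothing.
safe = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()
```

Running this, `rows` contains both users while `safe` is empty: the bound parameter is compared as data, not executed as SQL.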
### CI/CD Integration
SAST tools produce deterministic results—same code, same findings, every time. This makes them ideal for automated gates:
```yaml
# GitHub Actions example
security-scan:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - name: Run Semgrep
      id: semgrep
      uses: returntocorp/semgrep-action@v1
      with:
        config: p/security-audit
    - name: Block on high severity
      if: steps.semgrep.outputs.findings != ''
      run: exit 1
```
### Where SAST Falls Short
SAST tools fundamentally operate through pattern matching. They recognize "code that looks dangerous" but cannot assess "code that behaves dangerously in context."
#### Context Blindness
```python
# SAST flags this as SQL injection
table_name = get_table_from_allowlist(request.table)  # Returns "users" or "orders"
query = f"SELECT * FROM {table_name} WHERE id = ?"
cursor.execute(query, [user_id])

# But it's actually SAFE because:
# 1. table_name comes from a hardcoded allowlist
# 2. The WHERE clause uses a parameterized query
# SAST can't trace the allowlist constraint
```
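For context, here is a sketch of what such an allowlist helper might look like. The function name matches the snippet above, but the body is a hypothetical reconstruction, not code from a real codebase:

```python
# Hypothetical reconstruction of the allowlist helper.
# Only values from this hardcoded set can ever reach the query string,
# which is why the f-string interpolation above is safe in practice.
ALLOWED_TABLES = {"users", "orders"}

def get_table_from_allowlist(requested: str) -> str:
    if requested not in ALLOWED_TABLES:
        raise ValueError(f"table not allowed: {requested!r}")
    return requested
```

Because every code path through this function either returns a hardcoded value or raises, the table name is constrained no matter what the request contains.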
#### Authorization Logic
```python
# SAST sees nothing wrong here
@app.get("/api/documents/{doc_id}")
def get_document(doc_id: int, user: User = Depends(get_current_user)):
    return db.query(Document).filter(Document.id == doc_id).first()

# But there's a critical flaw:
# No check that user is authorized to access this specific document
# This is an IDOR vulnerability—SAST cannot detect it
```
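The missing ownership check can be made explicit. A minimal sketch, using a dataclass and an in-memory dict in place of the ORM calls; the `Document` fields and `DOCS` store are illustrative, not the article's actual framework code:

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: int
    owner_id: int
    body: str

# Hypothetical in-memory store standing in for db.query(Document).
DOCS = {
    1: Document(id=1, owner_id=10, body="alice's doc"),
    2: Document(id=2, owner_id=20, body="bob's doc"),
}

def get_document(doc_id: int, user_id: int) -> Document:
    doc = DOCS.get(doc_id)
    # The IDOR fix: reject documents the requesting user does not own.
    # Returning the same error for "missing" and "forbidden" also avoids
    # leaking which document IDs exist.
    if doc is None or doc.owner_id != user_id:
        raise PermissionError("not found or not authorized")
    return doc
```

An authenticated user who changes `doc_id` now gets an error instead of another user's document.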
#### Cross-Boundary Data Flow
When user input travels through message queues, microservice calls, or database round-trips, SAST tools lose track. A malicious payload stored in a database and retrieved by another service won't be flagged as tainted.
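A minimal sketch of this cross-boundary flow, with a dict standing in for the database and two functions standing in for separate services (all names are illustrative). SAST taint tracking typically stops at the database write; escaping at the point of output is what neutralizes the stored payload:

```python
import html

# Stand-in for a shared database between two services.
DB = {}

def comments_service_store(comment_id: int, text: str) -> None:
    # Service A writes raw user input; from SAST's perspective,
    # the taint is lost once the value crosses this boundary.
    DB[comment_id] = text

def render_service_page(comment_id: int) -> str:
    # Service B retrieves the value much later, in a different codebase.
    stored = DB[comment_id]
    # Escaping at output time neutralizes any stored payload.
    return f"<p>{html.escape(stored)}</p>"

comments_service_store(1, "<script>alert(1)</script>")
page = render_service_page(1)
```

Without the `html.escape` call, the stored `<script>` tag would be rendered verbatim, a stored XSS that no single-service scan would flag.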
## 4. AI Review: Filling the Gaps
AI-orchestrated security review uses large language models to analyze code with contextual understanding. Rather than matching patterns, the AI reasons about what code does and whether that behavior is secure.
### Contextual Understanding
Where SAST sees a dangerous pattern, AI review sees the full context:
```text
# AI Review Analysis:
#
# Examining: query = f"SELECT * FROM {table_name} WHERE id = ?"
#
# Tracing table_name backward:
# - Line 12: table_name = get_table_from_allowlist(request.table)
# - get_table_from_allowlist() returns only "users" or "orders" (lines 45-48)
# - User-controlled input (request.table) is validated against allowlist
# - SQL injection not possible for table name
#
# Checking parameterization:
# - WHERE clause uses ? placeholder with separate params
# - user_id passed as parameter, not interpolated
#
# Finding: FALSE POSITIVE - table name constrained by allowlist,
# value parameterized. No SQL injection vulnerability.
```
### Business Logic Analysis
AI review can detect vulnerabilities that require understanding intent:
```text
# AI Review Analysis:
#
# Examining: GET /api/documents/{doc_id}
#
# Authorization check present: ✓ (get_current_user dependency)
# User authenticated: ✓
#
# Access control check: ✗ MISSING
# - Function retrieves document by ID only
# - No verification that requesting user owns/can access document
# - Any authenticated user can access any document by ID
#
# Finding: HIGH - Insecure Direct Object Reference (IDOR)
# Attack scenario: Authenticated user changes doc_id parameter
# to access other users' documents.
#
# Recommended fix: Add ownership check
#   document = db.query(Document).filter(
#       Document.id == doc_id,
#       Document.owner_id == user.id  # ADD THIS
#   ).first()
```
### Adaptive Methodology
AI review generates checklists tailored to the specific technology stack and threat model. A review of a multi-tenant SaaS application handling healthcare data will include different checks than a single-tenant internal tool:
```text
# Generated checklist items for multi-tenant healthcare SaaS:
TENANT-01: Verify tenant ID in JWT claims validated on every request
TENANT-02: Check all database queries include tenant filter
TENANT-03: Review admin cross-tenant access controls
PHI-01: Verify PHI fields excluded from logs
PHI-02: Check encryption at rest for PHI columns
PHI-03: Review PHI access audit trail implementation
...
```
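A check like TENANT-02 can be enforced structurally rather than audited query by query. A minimal sketch, assuming a query helper that takes the tenant ID explicitly so a missing filter is impossible to write by accident (records and names are illustrative):

```python
# Illustrative in-memory records standing in for a shared table.
RECORDS = [
    {"id": 1, "tenant_id": "acme", "name": "report-a"},
    {"id": 2, "tenant_id": "globex", "name": "report-b"},
]

def query_for_tenant(tenant_id: str, records=RECORDS) -> list[dict]:
    # The tenant filter is applied unconditionally inside the helper,
    # rather than being left for each caller to remember.
    return [r for r in records if r["tenant_id"] == tenant_id]

acme_rows = query_for_tenant("acme")
```

Routing all data access through such a helper turns the checklist item into a code-review question ("does anything bypass `query_for_tenant`?") instead of a per-query audit.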
### Where AI Review Has Limitations
- Speed — AI analysis is slower than pattern matching; not suitable for every commit in high-velocity repos
- Cost — LLM inference costs more than running static rules
- Consistency — Results may vary slightly between runs (though structured workflows mitigate this)
- Complex Control Flow — Deeply nested conditionals or unusual patterns may confuse analysis
- Dependencies — Cannot scan vulnerability databases for known CVEs (that's SCA tooling)
## 5. Comparison Matrix
This matrix shows which approach is strongest for each vulnerability class:
| Vulnerability Class | SAST | AI Review | Recommended |
|---|---|---|---|
| SQL Injection (obvious) | ✓ Strong | ✓ Strong | SAST first (faster) |
| SQL Injection (subtle/contextual) | ~ Limited | ✓ Strong | AI primary |
| XSS (reflected) | ✓ Strong | ✓ Strong | SAST first |
| XSS (stored, cross-service) | ✗ Loses track | ✓ Traces flow | AI primary |
| Broken Authorization / IDOR | ✗ Cannot detect | ✓ Core strength | AI only |
| Business Logic Bypass | ✗ Cannot detect | ✓ Reasons about intent | AI only |
| Tenant Isolation Flaws | ✗ Cannot detect | ✓ Checks patterns | AI only |
| Hardcoded Secrets | ✓ Strong | ✓ Strong | SAST first (faster) |
| Dependency CVEs | ✓ SCA tools | ✗ Not designed for this | SAST/SCA only |
| Cryptographic Weakness | ✓ Pattern matching | ✓ Context evaluation | Both |
| Race Conditions | ~ Limited | ~ Can reason but may miss | Both + dynamic testing |
| Input Validation Gaps | ~ Pattern-based | ✓ Traces to usage | AI primary |
SAST excels when the vulnerable code looks dangerous (dangerous function calls, missing sanitization at the call site). AI excels when the vulnerability is about missing code (no authorization check, no tenant filter) or requires understanding what the code should do versus what it does.
## 6. The Integrated Workflow
The optimal approach layers these tools strategically, using each where it's strongest.
### Cost-Benefit Optimization
| Layer | When to Run | Cost | Value |
|---|---|---|---|
| SAST | Every commit, every PR | Low (automated) | Catches 60-70% of pattern-based vulns |
| AI Triage | When SAST has findings | Medium | Reduces false positive noise by 50-80% |
| AI Deep Review | Major releases, sensitive changes | Higher | Finds logic flaws SAST misses entirely |
| Human Review | High-risk changes, final approval | Highest | Accountability, novel vulns, judgment |
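The routing implied by this table can be sketched as a simple dispatcher. The severity thresholds and layer names here are illustrative assumptions, not a prescribed tool API:

```python
# Illustrative sketch: decide which review layers a SAST finding
# should pass through, per the cost-benefit table above.
def route_finding(finding: dict, is_major_release: bool = False) -> list[str]:
    layers = ["sast"]  # every finding originates from the SAST gate
    severity = finding["severity"]
    if severity in ("high", "critical"):
        layers.append("ai_triage")     # filter false positives before humans see them
        layers.append("human_review")  # accountability for high-risk changes
    elif severity == "medium":
        layers.append("ai_triage")     # triage only; no mandatory human gate
    if is_major_release:
        layers.append("ai_deep_review")  # logic-flaw pass before shipping
    return layers

plan = route_finding({"severity": "high"}, is_major_release=True)
```

The point of the sketch is the shape, not the thresholds: cheap deterministic layers run on everything, and the expensive layers are reserved for the findings and releases where their value is highest.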
## 7. This Series: AI-Orchestrated Security Review
The series Introduction established a taxonomy of sixteen failure modes that affect agentic AI workflows and five structural principles that address them. The remaining parts of this series build the AI layer: structured, repeatable workflows for AI-assisted security review using Claude Code. Each part builds on the previous.
If you're new to this series, start with the Introduction for the full failure mode taxonomy and structural principles. If you already use SAST tools, continue to Part 2 to add AI review for the vulnerability classes SAST misses. If you're building security review from scratch, this article provides context for why the AI workflow is designed the way it is.
## 8. Frequently Asked Questions
**What is the difference between SAST and AI security review?**

SAST uses pattern matching to find known vulnerability signatures—it's fast, deterministic, and excellent for CI/CD. AI review uses large language models to understand code context, trace data flows across boundaries, and reason about business logic. SAST finds "code that looks dangerous"; AI review finds "code that behaves dangerously in context."

**Does AI review replace SAST?**

No, they're complementary. SAST excels at fast, consistent detection of known patterns in CI/CD pipelines. AI excels at contextual analysis and logic flaws SAST cannot detect. The optimal approach uses SAST as an automated first-pass, then AI for depth and context, with human oversight for final judgment.

**What vulnerability classes can AI review detect that SAST cannot?**

AI review detects: broken authorization and access control (IDOR), business logic bypass, tenant isolation failures, subtle injection where SAST loses context, cross-service data flow issues, and missing security controls. These require understanding intent, not just pattern matching.

**How do false positive rates compare?**

SAST typically has 30-70% false positive rates because it lacks context. AI review has lower false positive rates because it understands when "dangerous" patterns are mitigated. However, AI may occasionally miss edge cases that SAST's exhaustive pattern matching would catch—hence using both.

**What role do humans play in this stack?**

Humans provide: accountability for security decisions, judgment on risk acceptance and business tradeoffs, verification that findings are actionable, approval gates before code modification, and expertise for novel vulnerability classes. Both SAST and AI can be wrong—humans catch those errors.

**How should the three layers be combined in practice?**

Layer them: (1) SAST runs in CI/CD as a fast gate, (2) AI triages SAST findings to filter false positives, (3) AI conducts deep analysis for logic flaws, (4) humans approve findings and approach, (5) AI implements fixes, (6) both verify fixes work. Each layer catches what others miss.