The AI Agent Security Checklist
12 controls every team running production AI agents must own.
Every control below maps to a concrete OPA policy you can copy. If you can't answer "yes" to all 12, your agents have an ungoverned blast radius.
1. Action governance — explicit allow/deny per tool
Every tool an agent can call is enumerated. Unknown tools default to deny. Wildcard tool grants (e.g., tool: "*") only exist for explicitly approved internal services.
OPA pattern: default allow := false, plus allowed_tools := {"jira:create_issue", "stripe:list_customers"}.
2. Operation-scoped permissions
Within each tool, dangerous operations (delete_*, transfer_*, update_billing) require approval; read operations (list_*, get_*) can auto-allow.
OPA pattern: require_approval if input.operation in {"delete_customer", "refund"}
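Controls 1 and 2 combine into a single decision: deny unknown tools, then gate dangerous operations behind approval. The following is a minimal Python sketch of that logic, not the OPA policy itself. The `service:operation` tool format is taken from the pattern above; the `decide` function, the third allowlist entry, and the choice to auto-allow any allowlisted operation that is not flagged as dangerous are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Enumerated tools; anything not listed here is denied.
# "stripe:delete_customer" is added only to illustrate the approval path.
ALLOWED_TOOLS = {
    "jira:create_issue",
    "stripe:list_customers",
    "stripe:delete_customer",
}

# Operation patterns from the checklist that require human approval.
APPROVAL_PATTERNS = ("delete_*", "transfer_*", "update_billing", "refund")


def decide(tool: str) -> str:
    """Return "deny", "require_approval", or "allow" for a tool call."""
    if tool not in ALLOWED_TOOLS:
        return "deny"  # default deny for unknown tools
    operation = tool.split(":", 1)[-1]
    if any(fnmatchcase(operation, pattern) for pattern in APPROVAL_PATTERNS):
        return "require_approval"
    # Assumption: allowlisted operations that are not dangerous auto-allow.
    return "allow"
```

For example, `decide("stripe:list_customers")` returns `"allow"`, `decide("stripe:delete_customer")` returns `"require_approval"`, and any tool outside the allowlist returns `"deny"`.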
3. Per-agent service grants
Different agents have different reachability. The customer-support agent should not reach the production database. Track grants in agent_service_grants with open (default-allow) or strict (default-deny) modes.
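One way to model the open and strict modes is sketched below in Python. Only the `agent_service_grants` name and the two modes come from the checklist; the `AgentGrants` shape, the agent and service names, and the rule that unknown agents get no access are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGrants:
    mode: str  # "open" (default-allow) or "strict" (default-deny)
    allowed: set[str] = field(default_factory=set)
    denied: set[str] = field(default_factory=set)


# Hypothetical grants table keyed by agent ID.
agent_service_grants = {
    "customer-support": AgentGrants(mode="strict", allowed={"zendesk", "jira"}),
    "data-pipeline": AgentGrants(mode="open", denied={"stripe"}),
}


def can_reach(agent_id: str, service: str) -> bool:
    """Check whether an agent may call a service under its grant mode."""
    grants = agent_service_grants.get(agent_id)
    if grants is None:
        return False  # assumption: unregistered agents reach nothing
    if service in grants.denied:
        return False  # explicit denies win in both modes
    if service in grants.allowed:
        return True
    return grants.mode == "open"  # fall back to the mode's default
```

Under this sketch the customer-support agent cannot reach a `prod-db` service because strict mode denies anything not explicitly granted.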
4. Human-in-the-loop approvals for high-risk actions
Approvals are required for: any write to production data stores, any financial transaction over $X, any action affecting more than N records, any cross-system data movement.
Channels: Slack, Teams, email — with full action context, not just an approve/deny button.
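The four approval triggers can be expressed as a single check that also produces the context shown to the approver. This Python sketch is illustrative only: the `ProposedAction` fields and reason strings are hypothetical, and the dollar and record thresholds are left as caller-supplied parameters because the checklist does not fix them.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    writes_production: bool = False
    amount_usd: float = 0.0
    records_affected: int = 0
    source_system: str = ""
    target_system: str = ""


def approval_reasons(
    action: ProposedAction, max_amount_usd: float, max_records: int
) -> list[str]:
    """Return every reason this action needs a human approval (empty if none)."""
    reasons = []
    if action.writes_production:
        reasons.append("write to production data store")
    if action.amount_usd > max_amount_usd:
        reasons.append("financial transaction over limit")
    if action.records_affected > max_records:
        reasons.append("affects more records than allowed")
    crosses_systems = (
        action.source_system
        and action.target_system
        and action.source_system != action.target_system
    )
    if crosses_systems:
        reasons.append("cross-system data movement")
    return reasons
```

Returning the full list of reasons, rather than a bare boolean, gives the Slack, Teams, or email message the action context an approver needs.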
5. Tamper-evident audit trail
Every action produces an audit event with prev_hash → hash chaining. Auditors can verify the chain is unbroken from genesis to today. No event is mutable.
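The prev_hash → hash chaining can be sketched in a few lines of Python. This is one possible construction, assuming SHA-256 over the previous hash concatenated with a canonical JSON payload and an all-zero genesis hash; the event layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first event


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an audit event linked to the hash of the previous event."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link from genesis; any edit or reorder breaks it."""
    prev_hash = GENESIS_HASH
    for event in chain:
        if event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != _digest(prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

Changing any stored payload after the fact makes `verify_chain` return `False`, which is what makes the trail tamper-evident rather than merely append-only.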
6. DLP scanning of inputs and outputs
Every action input and output is scanned for SSNs, credit card numbers, API keys, email addresses, and internal IDs. Patterns are configurable per org. Block, mask, or alert based on classification.
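A pattern-based scanner with masking might look like the Python sketch below. The regular expressions are simplified illustrations, not production-grade detectors: the API key pattern assumes an `sk_live_`/`sk_test_` prefix, credit card candidates are filtered with a Luhn checksum to cut false positives, and internal ID patterns are omitted because they are org-specific.

```python
import re

# Illustrative per-org patterns; real deployments would tune these.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "api_key": re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{16,}\b"),
}


def luhn_ok(candidate: str) -> bool:
    """Luhn checksum over the digits of a candidate card number."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every finding."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if label == "credit_card" and not luhn_ok(match.group()):
                continue  # digit run that is not a valid card number
            findings.append((label, match.group()))
    return findings


def mask(text: str) -> str:
    """Replace each finding with a bracketed label."""
    for label, value in scan(text):
        text = text.replace(value, f"[{label.upper()}]")
    return text
```

The same `scan` result can drive any of the three responses: block when a finding exists, return `mask(text)` instead of the original, or alert and pass the text through.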
7. SSRF protection on outbound HTTP
The HTTP proxy blocks private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local and cloud metadata addresses (169.254.x.x, 100.100.x.x), localhost and loopback, and DNS rebinding attempts. External webhooks are allowlist-only.
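The block list above maps directly onto Python's standard `ipaddress` module. The sketch below is an assumption about how such a check could be structured, not the proxy's actual code: the function names are hypothetical, the host allowlist is passed in by the caller, and the resolver is injectable so the check can be exercised without real DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Ranges taken from the checklist, written as full CIDR blocks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # private
        "169.254.0.0/16", "100.100.0.0/16",               # metadata
        "127.0.0.0/8", "::1/128",                         # loopback
    )
]


def is_blocked_ip(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the list.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in BLOCKED_NETWORKS)


def resolve(host: str) -> list[str]:
    """Resolve a hostname to every address it currently maps to."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def is_url_allowed(url: str, allowed_hosts: set[str], resolver=resolve) -> bool:
    host = urlsplit(url).hostname
    if host is None or host not in allowed_hosts:
        return False  # allowlist-only for external webhooks
    try:
        addresses = resolver(host)
        # Every resolved address must be safe, not just the first one.
        return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
    except (OSError, ValueError):
        return False  # fail closed on DNS errors or unparsable addresses
```

To resist DNS rebinding, the caller should connect to one of the addresses vetted here rather than resolving the hostname a second time at connect time.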
8. Rate limiting and blast-radius caps
Per-org caps on actions per minute. Hard maximums regardless of plan tier (e.g., 200 actions/min). Exponential backoff on auth failures (5 failed attempts → progressive lockout).
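Both mechanisms can be sketched compactly in Python. The 200 actions/min ceiling and the 5-attempt threshold come from the examples above; the sliding-window approach, the class and function names, and the 30-second base and one-hour cap on lockouts are assumptions. The clock is passed in explicitly to keep the logic deterministic.

```python
from collections import defaultdict, deque

HARD_MAX_PER_MINUTE = 200  # ceiling applied regardless of plan tier


class OrgRateLimiter:
    """Sliding-window limiter keyed by org ID."""

    def __init__(self, plan_limit: int, window_seconds: float = 60.0):
        # A plan can lower the cap but never raise it past the hard maximum.
        self.limit = min(plan_limit, HARD_MAX_PER_MINUTE)
        self.window = window_seconds
        self._events = defaultdict(deque)

    def allow(self, org_id: str, now: float) -> bool:
        timestamps = self._events[org_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


def lockout_seconds(
    failed_attempts: int, threshold: int = 5, base: float = 30.0, cap: float = 3600.0
) -> float:
    """Progressive lockout: doubles with each failure past the threshold."""
    if failed_attempts < threshold:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - threshold))
```

With these assumed defaults, the fifth failed attempt locks the caller out for 30 seconds, the seventh for 120 seconds, and the duration stops growing at one hour.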
9. Anomaly detection on agent behavior
Each agent has a behavioral baseline (call patterns, target tools, parameter shapes). Deviations trigger an alert: unusual time of day, unusual tool sequence, parameter values outside baseline ranges.
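A deliberately simple version of baseline comparison is sketched below in Python. It assumes the baseline has already been learned and is stored as a set of active hours, a set of observed tool-to-tool transitions, and min/max ranges for numeric parameters; real systems would likely use richer statistics. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Baseline:
    active_hours: set[int]                       # hours of day (0-23) seen before
    tool_transitions: set[tuple[str, str]]       # (previous_tool, tool) pairs seen
    param_ranges: dict[str, tuple[float, float]] # numeric parameter bounds


def deviations(
    baseline: Baseline,
    hour: int,
    prev_tool: Optional[str],
    tool: str,
    params: dict[str, float],
) -> list[str]:
    """Return one alert string per way the call departs from the baseline."""
    alerts = []
    if hour not in baseline.active_hours:
        alerts.append("unusual time of day")
    if prev_tool is not None and (prev_tool, tool) not in baseline.tool_transitions:
        alerts.append("unusual tool sequence")
    for name, value in params.items():
        bounds = baseline.param_ranges.get(name)
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            alerts.append(f"parameter {name} outside baseline range")
    return alerts
```

An empty list means the call matches the baseline; a non-empty list carries the specific reasons to include in the alert.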
10. Prompt-injection and content guardrails
Inputs are scanned by signature patterns (known jailbreaks), heuristic analysis (suspicious instructions to ignore prior context), and statistical classifiers. Confidence ≥ 0.90 triggers a block.
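The three layers can be combined by taking the highest confidence any layer reports and blocking at the 0.90 threshold. In this Python sketch only the threshold comes from the checklist: the example patterns, the fixed scores assigned to signature and heuristic hits, the max-combination rule, and the pluggable `classifier` callable are all assumptions.

```python
import re
from typing import Callable, Optional

BLOCK_THRESHOLD = 0.90  # from the checklist

# Illustrative patterns only; a real signature set would be far larger.
SIGNATURE_PATTERNS = [re.compile(r"\bdo anything now\b", re.IGNORECASE)]
HEURISTIC_PATTERNS = [
    re.compile(
        r"\b(?:ignore|disregard|forget)\b.{0,40}"
        r"\b(?:previous|prior|above)\b.{0,40}"
        r"\b(?:instructions|context|rules)\b",
        re.IGNORECASE | re.DOTALL,
    )
]

SIGNATURE_SCORE = 0.95  # assumed confidence for a known-jailbreak match
HEURISTIC_SCORE = 0.60  # assumed confidence for a suspicious instruction


def injection_confidence(
    text: str, classifier: Optional[Callable[[str], float]] = None
) -> float:
    """Highest confidence reported by any of the three layers."""
    scores = [0.0]
    if any(p.search(text) for p in SIGNATURE_PATTERNS):
        scores.append(SIGNATURE_SCORE)
    if any(p.search(text) for p in HEURISTIC_PATTERNS):
        scores.append(HEURISTIC_SCORE)
    if classifier is not None:
        scores.append(classifier(text))  # statistical layer, supplied by caller
    return max(scores)


def should_block(
    text: str, classifier: Optional[Callable[[str], float]] = None
) -> bool:
    return injection_confidence(text, classifier) >= BLOCK_THRESHOLD
```

Under these assumed scores a heuristic hit alone does not block; it needs a signature match or a classifier score at or above the threshold to do so.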
11. Policy-as-code with a review trail
Policies are stored in version control. Changes go through PR review. Production policies have a deployment log answering: who changed what, when, with what justification.
12. Compliance-ready proof packs
On demand, you can export a ZIP containing: relevant action records, the policy in effect, every approval, every audit event, and a hash-chain integrity proof. Auditors can verify it offline.
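An offline-verifiable export can be sketched with Python's standard `zipfile` module. This example substitutes a simpler integrity mechanism than the hash-chain proof described above: a manifest of SHA-256 digests, one per file. The file names, function names, and manifest layout are hypothetical.

```python
import hashlib
import io
import json
import zipfile


def build_proof_pack(
    actions: list, policy_text: str, approvals: list, audit_events: list
) -> bytes:
    """Bundle evidence files plus a manifest of their SHA-256 digests."""
    files = {
        "actions.json": json.dumps(actions, sort_keys=True, indent=2),
        "policy.rego": policy_text,
        "approvals.json": json.dumps(approvals, sort_keys=True, indent=2),
        "audit_events.json": json.dumps(audit_events, sort_keys=True, indent=2),
    }
    manifest = {
        name: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for name, body in files.items()
    }
    files["manifest.json"] = json.dumps(manifest, sort_keys=True, indent=2)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in files.items():
            archive.writestr(name, body)
    return buffer.getvalue()


def verify_proof_pack(blob: bytes) -> bool:
    """Offline check: every file must match its digest in the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return all(
            hashlib.sha256(archive.read(name)).hexdigest() == digest
            for name, digest in manifest.items()
        )
```

A manifest alone only detects changes to individual files. Proving that the audit events form an unbroken chain would additionally require recomputing each event's hash from its predecessor.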
Where AAF fits
Agent Action Firewall is a drop-in implementation of all 12 controls. The free tier covers controls 1–5 with up to 100 actions/month. Paid tiers add DLP scanning, anomaly detection, and exportable proof packs.
Next step: Run AAF in Shadow Mode for 30 days. We'll send you a "would-have-blocked" digest showing every action that violated policy without enforcing it. Most teams find 4–10 high-risk actions per week they didn't know were happening.
→ Start at https://agentactionfirewall.com