The AI Agent Security Checklist

12 controls every team running production AI agents must own.

Every control below maps to a concrete OPA policy you can copy. If you can't answer "yes" to all 12, your agents' blast radius is ungoverned.

1. Action governance — explicit allow/deny per tool

Every tool an agent can call is enumerated. Unknown tools default to deny. Wildcard tool grants (e.g., tool: "*") only exist for explicitly approved internal services.

OPA pattern: allowed_tools := {"jira:create_issue", "stripe:list_customers"}, paired with a default-deny rule such as default allow := false.
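For teams that want to prototype the same default-deny check outside OPA, here is a minimal Python sketch. The two allowed tool names come from the pattern above; the WILDCARD_SERVICES set and the "internal-metrics" service name are hypothetical, standing in for whatever list of approved internal services you maintain.

```python
# Tools taken from the OPA pattern above.
ALLOWED_TOOLS = frozenset({"jira:create_issue", "stripe:list_customers"})

# Hypothetical: internal services explicitly approved for a wildcard grant.
WILDCARD_SERVICES = frozenset({"internal-metrics"})


def is_tool_allowed(tool: str) -> bool:
    """Return True only for enumerated tools or approved wildcard services."""
    service, _, operation = tool.partition(":")
    if not operation:
        # Malformed names (including a bare "*") are denied.
        return False
    if service in WILDCARD_SERVICES:
        return True
    # Default deny: anything not listed is rejected.
    return tool in ALLOWED_TOOLS
```

The important property is the last line: an unknown tool is rejected because it is absent from the set, not because someone remembered to block it.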

2. Operation-scoped permissions

Within each tool, dangerous operations (delete_*, transfer_*, update_billing) require approval; read operations (list_*, get_*) can auto-allow.

OPA pattern (Rego v1 syntax): require_approval if { input.operation in {"delete_customer", "refund"} }
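The operation patterns named above can be sketched as a three-way decision in Python. The glob patterns are the ones listed in this control plus "refund" from the OPA pattern; denying operations that match neither list is an assumption of this sketch, not something the checklist prescribes.

```python
from fnmatch import fnmatchcase

# Dangerous operations that need a human approval.
APPROVAL_PATTERNS = ("delete_*", "transfer_*", "update_billing", "refund")
# Read operations that can be allowed automatically.
AUTO_ALLOW_PATTERNS = ("list_*", "get_*")


def decide(operation: str) -> str:
    """Return "require_approval", "allow", or "deny" for an operation name."""
    # Check the dangerous patterns first so they always win.
    if any(fnmatchcase(operation, p) for p in APPROVAL_PATTERNS):
        return "require_approval"
    if any(fnmatchcase(operation, p) for p in AUTO_ALLOW_PATTERNS):
        return "allow"
    # Assumption: anything unclassified is denied rather than allowed.
    return "deny"
```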

3. Per-agent service grants

Different agents have different reachability. The customer-support agent should not reach the production database. Track grants in agent_service_grants with open (default-allow) or strict (default-deny) modes.
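A minimal sketch of how open and strict modes could be evaluated, assuming a simple in-memory structure. Only the agent_service_grants name and the two modes come from the text above; the AgentGrants fields, the agent names, and the service names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrants:
    mode: str                          # "open" (default-allow) or "strict" (default-deny)
    allowed: frozenset = frozenset()   # consulted in strict mode
    denied: frozenset = frozenset()    # always wins, in either mode


# Hypothetical grants table keyed by agent name.
agent_service_grants = {
    "customer-support": AgentGrants(mode="strict", allowed=frozenset({"zendesk", "jira"})),
    "data-pipeline": AgentGrants(mode="open", denied=frozenset({"stripe"})),
}


def can_reach(agent: str, service: str) -> bool:
    grants = agent_service_grants.get(agent)
    if grants is None:
        return False  # unknown agents reach nothing
    if service in grants.denied:
        return False
    if grants.mode == "open":
        return True
    # Strict mode, and any unrecognized mode, falls back to default deny.
    return service in grants.allowed
```

With this table, can_reach("customer-support", "production-db") is False because the support agent is in strict mode and the database is not on its list.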

4. Human-in-the-loop approvals for high-risk actions

Approvals are required for: any write to production data stores, any financial transaction over $X, any action affecting more than N records, and any cross-system data movement. X and N are thresholds your organization sets.

Channels: Slack, Teams, email — with full action context, not just an approve/deny button.
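The four approval triggers can be sketched as a single function that returns the reasons an action needs sign-off, which is also the context an approver should see. The Action fields and both threshold values are hypothetical stand-ins for $X and N.

```python
from dataclasses import dataclass

MAX_AUTO_AMOUNT_USD = 500.00  # hypothetical stand-in for $X
MAX_AUTO_RECORDS = 100        # hypothetical stand-in for N


@dataclass(frozen=True)
class Action:
    writes_production_data: bool = False
    amount_usd: float = 0.0
    records_affected: int = 0
    source_system: str = ""
    target_system: str = ""


def approval_reasons(action: Action) -> list[str]:
    """Return every reason this action needs approval; empty means none."""
    reasons = []
    if action.writes_production_data:
        reasons.append("production_write")
    if action.amount_usd > MAX_AUTO_AMOUNT_USD:
        reasons.append("amount_over_limit")
    if action.records_affected > MAX_AUTO_RECORDS:
        reasons.append("too_many_records")
    moves_data = action.source_system and action.target_system
    if moves_data and action.source_system != action.target_system:
        reasons.append("cross_system_data_movement")
    return reasons
```

Returning the full list, rather than a single boolean, lets the Slack, Teams, or email message explain why the approval was requested.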

5. Tamper-evident audit trail

Every action produces an audit event with prev_hash → hash chaining. Auditors can verify the chain is unbroken from genesis to today. No event is mutable.
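To make the prev_hash → hash chaining concrete, here is a minimal sketch using SHA-256 over a canonical JSON encoding. The all-zero genesis value, the event layout, and the function names are assumptions of this sketch; a production system would also need append-only storage.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed value for the first event's prev_hash


def event_hash(prev_hash: str, payload: dict) -> str:
    # Canonical encoding so the same event always hashes to the same value.
    body = json.dumps({"prev_hash": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> dict:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {"prev_hash": prev_hash, "payload": payload,
             "hash": event_hash(prev_hash, payload)}
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Walk from genesis and recompute every link."""
    expected_prev = GENESIS_HASH
    for event in chain:
        if event["prev_hash"] != expected_prev:
            return False
        if event["hash"] != event_hash(event["prev_hash"], event["payload"]):
            return False
        expected_prev = event["hash"]
    return True
```

Editing any earlier payload changes its hash, which no longer matches the prev_hash stored in the next event, so verify_chain returns False.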

6. DLP scanning of inputs and outputs

Every action input and output is scanned for: SSN, credit card, API keys, email addresses, internal IDs. Configurable per-org patterns. Block, mask, or alert based on classification.
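A minimal regex-based sketch of the scan-and-mask step. The patterns are illustrative simplifications rather than production-grade detectors; API-key and internal-ID formats vary by organization, so they are left to the per-org configuration. The Luhn check is added here to cut false positives on card numbers.

```python
import re

# Simplified illustrative patterns; real deployments tune these per org.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_hit(label: str, matched: str) -> bool:
    if label == "credit_card":
        return luhn_ok(re.sub(r"\D", "", matched))
    return True


def scan(text: str) -> set[str]:
    """Return the labels of every sensitive pattern found in text."""
    return {label for label, pattern in PATTERNS.items()
            for m in pattern.finditer(text) if is_hit(label, m.group())}


def mask(text: str) -> str:
    """Replace each confirmed hit with a bracketed label."""
    for label, pattern in PATTERNS.items():
        def replace(m, label=label):
            return f"[{label.upper()}]" if is_hit(label, m.group()) else m.group()
        text = pattern.sub(replace, text)
    return text
```

Whether a hit leads to block, mask, or alert is then a policy decision keyed on the returned labels.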

7. SSRF protection on outbound HTTP

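The HTTP proxy blocks: private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local and cloud metadata addresses (169.254.0.0/16, which includes 169.254.169.254, and 100.100.0.0/16), loopback addresses (127.0.0.0/8, ::1), and DNS rebinding attempts. External webhooks are allowlist-only.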
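A sketch of the address checks using Python's standard ipaddress module. The blocked ranges mirror the list above; the ALLOWED_WEBHOOK_HOSTS set, the hostname in it, and the function names are hypothetical. One common defense against DNS rebinding, assumed here, is to resolve once, validate every returned address, and then connect only to those validated addresses.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # private ranges
    "169.254.0.0/16", "100.100.0.0/16",               # link-local / metadata
    "127.0.0.0/8", "::1/128",                         # loopback
)]

# Hypothetical allowlist of external webhook hosts.
ALLOWED_WEBHOOK_HOSTS = frozenset({"hooks.example.com"})


def is_blocked_ip(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:10.0.0.1) so it cannot bypass the list.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in net for net in BLOCKED_NETWORKS)


def resolve(host: str) -> list[str]:
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def pinned_ips_for(url: str, resolver=resolve) -> list[str]:
    """Validate a webhook URL and return the IPs the proxy may connect to."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        raise ValueError("unsupported URL")
    if parts.hostname not in ALLOWED_WEBHOOK_HOSTS:
        raise PermissionError("host is not on the webhook allowlist")
    ips = resolver(parts.hostname)
    if not ips or any(is_blocked_ip(ip) for ip in ips):
        raise PermissionError("host resolves to a blocked address")
    return ips
```

The caller should open the connection to one of the returned IPs instead of resolving the hostname again, since a second lookup is where a rebinding attack swaps in an internal address.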

8. Rate limiting and blast-radius caps

Per-org caps on actions per minute. Hard maximums regardless of plan tier (e.g., 200 actions/min). Exponential backoff on auth failures (5 failed attempts → progressive lockout).
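A minimal sketch of both mechanisms: a sliding-window limiter clamped to the hard maximum, and an exponential lockout schedule. The 200 actions/min ceiling and the 5-attempt threshold come from the text above; the 30-second base delay, the one-hour cap, and all names are assumptions.

```python
from collections import defaultdict, deque

HARD_MAX_PER_MINUTE = 200  # ceiling from the text, applies to every plan tier


class ActionRateLimiter:
    def __init__(self, per_org_limit: int, window_seconds: float = 60.0):
        # A plan can lower the limit but never raise it above the hard maximum.
        self.limit = min(per_org_limit, HARD_MAX_PER_MINUTE)
        self.window = window_seconds
        self._events = defaultdict(deque)  # org_id -> timestamps in window

    def allow(self, org_id: str, now: float) -> bool:
        events = self._events[org_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


def lockout_seconds(failed_attempts: int, free_attempts: int = 5,
                    base: float = 30.0, cap: float = 3600.0) -> float:
    """Lockout doubles with each failure from the fifth onward, up to cap."""
    if failed_attempts < free_attempts:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - free_attempts))
```

Passing the clock in as `now` keeps the limiter deterministic under test; in production the caller would supply time.monotonic().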

9. Anomaly detection on agent behavior

Each agent has a behavioral baseline (call patterns, target tools, parameter shapes). Deviations trigger an alert: unusual time of day, unusual tool sequence, parameter values outside baseline ranges.
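One deliberately simple way to sketch a baseline is to record which hours, tool-to-tool transitions, and parameter ranges appear in an agent's history, then flag anything outside them. The call layout (hour, tool, params) and all names are hypothetical, and a real detector would likely use statistical models rather than hard set membership.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    active_hours: set    # hours of day (0-23) seen in history
    tool_pairs: set      # consecutive (previous_tool, tool) pairs seen
    param_ranges: dict   # numeric parameter name -> (min, max) seen


def build_baseline(calls: list) -> Baseline:
    hours = {c["hour"] for c in calls}
    pairs = {(a["tool"], b["tool"]) for a, b in zip(calls, calls[1:])}
    ranges = {}
    for c in calls:
        for name, value in c["params"].items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return Baseline(hours, pairs, ranges)


def deviations(baseline: Baseline, previous_tool: str, call: dict) -> list:
    """Return the list of ways this call departs from the baseline."""
    found = []
    if call["hour"] not in baseline.active_hours:
        found.append("unusual_time_of_day")
    if (previous_tool, call["tool"]) not in baseline.tool_pairs:
        found.append("unusual_tool_sequence")
    for name, value in call["params"].items():
        lo, hi = baseline.param_ranges.get(name, (None, None))
        # Unseen parameters and out-of-range values both count as deviations.
        if lo is None or not (lo <= value <= hi):
            found.append(f"param_out_of_range:{name}")
    return found
```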

10. Prompt-injection and content guardrails

Inputs are scanned by signature patterns (known jailbreaks), heuristic analysis (suspicious instructions to ignore prior context), and statistical classifiers. Confidence ≥ 0.90 triggers a block.
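The thresholding step can be sketched as taking the highest confidence reported by any detector and blocking at 0.90 or above. Only the threshold comes from the text; the single signature regex, the cue phrases, and the 0.35 weight are made-up placeholders, and the statistical classifier is omitted because it would be a trained model.

```python
import re

BLOCK_THRESHOLD = 0.90  # from the text above

# Placeholder signature; a real list would hold many known jailbreak patterns.
SIGNATURES = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]


def signature_score(text: str) -> float:
    return 1.0 if any(p.search(text) for p in SIGNATURES) else 0.0


def heuristic_score(text: str) -> float:
    # Placeholder heuristic: each suspicious cue adds a fixed weight.
    cues = ("disregard", "system prompt", "you are now")
    hits = sum(cue in text.lower() for cue in cues)
    return min(1.0, 0.35 * hits)


def evaluate(text: str, detectors=(signature_score, heuristic_score)):
    """Return (decision, confidence), using the strongest detector signal."""
    confidence = max(detector(text) for detector in detectors)
    decision = "block" if confidence >= BLOCK_THRESHOLD else "allow"
    return decision, confidence
```

A classifier would plug in as one more callable in the detectors tuple, returning its own confidence between 0 and 1.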

11. Policy-as-code with a review trail

Policies are stored in version control. Changes go through PR review. Production policies have a deployment log answering: who changed what, when, with what justification.

12. Compliance-ready proof packs

On demand, you can export a ZIP containing: relevant action records, the policy in effect, every approval, every audit event, and a hash-chain integrity proof. Auditors can verify it offline.
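A minimal sketch of building and verifying such a pack with the standard zipfile module. The file names inside the ZIP, the integrity.json layout, the all-zero genesis value, and the hashing scheme are assumptions of this sketch, not AAF's actual export format.

```python
import hashlib
import io
import json
import zipfile

GENESIS_HASH = "0" * 64  # assumed genesis value


def chain_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps({"prev_hash": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_proof_pack(actions, policy_text, approvals, audit_events) -> bytes:
    head = audit_events[-1]["hash"] if audit_events else GENESIS_HASH
    files = {
        "actions.json": json.dumps(actions),
        "policy.rego": policy_text,
        "approvals.json": json.dumps(approvals),
        "audit_events.json": json.dumps(audit_events),
        "integrity.json": json.dumps(
            {"head_hash": head, "event_count": len(audit_events)}),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as pack:
        for name, text in files.items():
            pack.writestr(name, text)
    return buffer.getvalue()


def verify_proof_pack(pack_bytes: bytes) -> bool:
    """Offline check: recompute the chain and compare it to integrity.json."""
    with zipfile.ZipFile(io.BytesIO(pack_bytes)) as pack:
        events = json.loads(pack.read("audit_events.json"))
        integrity = json.loads(pack.read("integrity.json"))
    prev = GENESIS_HASH
    for event in events:
        if event["prev_hash"] != prev:
            return False
        if event["hash"] != chain_hash(prev, event["payload"]):
            return False
        prev = event["hash"]
    return (prev == integrity["head_hash"]
            and len(events) == integrity["event_count"])
```

This check proves the pack is internally consistent. To rule out a wholesale rewrite, an auditor would also compare head_hash against a value recorded independently, for example in an earlier export.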

Where AAF fits

Agent Action Firewall is a drop-in implementation of all 12 controls. The free tier covers controls 1–5 with up to 100 actions/month. Paid tiers add DLP scanning, anomaly detection, and exportable proof packs.

Next step: Run AAF in Shadow Mode for 30 days. Nothing is enforced; instead, we'll send you a "would-have-blocked" digest listing every action that would have violated policy. Most teams find 4–10 high-risk actions per week they didn't know were happening.

→ Start at https://agentactionfirewall.com
