Agent Action Firewall

NLP Policies (AI-Powered Safety)

Use LLM-based semantic analysis to detect harmful content, PII, and malicious intent beyond what pattern matching can achieve.

Pro Feature: NLP policies are available on Pro and Enterprise plans. They complement traditional OPA/Rego policies with semantic understanding.

How NLP Policies Work

NLP policies use large language models (LLMs) to analyze action content semantically. Unlike pattern-based detection, NLP policies understand context, intent, and meaning.

Agent Action Request → NLP Policy Evaluation → LLM Provider (OpenAI, etc.) → Decision + Score → Cached Results

Policy Types

Content Safety

Detects harmful, toxic, or inappropriate content in agent actions. Uses semantic understanding to catch threats that simple keyword filtering would miss.

JSON
{
  "type": "content_safety",
  "name": "Block Harmful Content",
  "description": "Detect and block harmful or toxic content",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "threshold": 0.7
}

PII Detection

Identifies personally identifiable information (PII) such as names, addresses, phone numbers, SSNs, and credit card numbers using semantic analysis.

JSON
{
  "type": "pii_detection",
  "name": "Detect PII in Actions",
  "description": "Block actions containing personal information",
  "provider": "anthropic",
  "model": "claude-3-haiku-20240307",
  "threshold": 0.8
}

Intent Classification

Classifies agent intent to detect suspicious, malicious, or data exfiltration attempts before they execute.

JSON
{
  "type": "intent_classification",
  "name": "Classify Agent Intent",
  "description": "Detect malicious or suspicious intent",
  "provider": "openai",
  "model": "gpt-4o",
  "threshold": 0.6
}

Custom Prompts

Create domain-specific policies with custom prompts for specialized use cases.

JSON
{
  "type": "custom",
  "name": "Financial Compliance Check",
  "description": "Ensure actions comply with financial regulations",
  "prompt": "Analyze this agent action for potential financial regulation violations. Consider SEC rules, FINRA guidelines, and anti-money laundering requirements. Return a risk assessment.",
  "provider": "openai",
  "model": "gpt-4o",
  "threshold": 0.5
}

Supported Providers

| Provider  | Models                                         | Best For                 |
|-----------|------------------------------------------------|--------------------------|
| OpenAI    | gpt-4o, gpt-4o-mini, gpt-3.5-turbo             | General purpose, fast    |
| Anthropic | claude-3-opus, claude-3-sonnet, claude-3-haiku | Safety-focused analysis  |
| Local     | Custom Ollama models                           | On-premise, data privacy |

Creating an NLP Policy

Via API

Bash
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies \
  -H "Authorization: Bearer $SUPABASE_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Content Safety Policy",
    "description": "Detect harmful content in agent actions",
    "type": "content_safety",
    "provider": "openai",
    "model": "gpt-4o-mini",
    "threshold": 0.7,
    "enabled": true
  }'

Via Dashboard

  1. Navigate to Policies in the sidebar
  2. Click the NLP Policies tab
  3. Click Create NLP Policy
  4. Select a policy type and configure settings
  5. Test with sample content before enabling

Policy Evaluation

NLP policies return a structured response with decision, risk score, and reasoning:

JSON
{
  "policy_id": "nlp_abc123",
  "decision": "deny",
  "risk_score": 0.85,
  "reasoning": "The action contains sensitive personal information including what appears to be a social security number and credit card details.",
  "details": {
    "detected_issues": ["ssn_detected", "credit_card_detected"],
    "confidence": 0.92
  },
  "cached": false,
  "latency_ms": 245
}

Decision Logic

  • Allow: Risk score below threshold
  • Deny: Risk score at or above threshold
  • Require Approval: Configurable for medium-risk actions
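The decision rules above can be sketched as a small helper. This is a minimal illustration, not the firewall's actual implementation; the `approval_band` parameter is an assumption standing in for the configurable medium-risk approval option:

```python
from typing import Optional

def decide(risk_score: float, threshold: float,
           approval_band: Optional[float] = None) -> str:
    """Map an NLP risk score to a firewall decision.

    approval_band (hypothetical): width of the band just below the
    threshold in which medium-risk actions require human approval
    instead of being allowed outright.
    """
    if risk_score >= threshold:
        return "deny"
    if approval_band is not None and risk_score >= threshold - approval_band:
        return "require_approval"
    return "allow"
```

With a threshold of 0.7 and an approval band of 0.2, a score of 0.85 is denied, 0.6 requires approval, and 0.3 is allowed.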

Caching

LLM responses are cached to reduce latency and cost. Cache behavior is configurable:

JSON
{
  "cache_ttl_seconds": 3600,
  "cache_key_fields": ["tool", "operation", "params.url"]
}

Identical actions will return cached results within the TTL. This is especially useful for repeated similar actions.
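One way to picture how `cache_key_fields` could produce a deterministic key is a hash over the selected fields, with dotted paths like `params.url` walking into nested objects. This is a hypothetical sketch of the idea, not the service's actual key derivation:

```python
import hashlib
import json

def cache_key(action: dict, fields: list) -> str:
    """Build a deterministic cache key from selected action fields.

    Dotted paths (e.g. "params.url") traverse nested dicts; missing
    fields contribute None so the key stays well-defined.
    """
    def lookup(path):
        node = action
        for part in path.split("."):
            if not isinstance(node, dict):
                return None
            node = node.get(part)
        return node

    # sort_keys makes the serialization, and hence the key, stable
    payload = json.dumps({f: lookup(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two actions that agree on every listed field map to the same key, so the second one can be served from cache within the TTL.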

Testing Policies

Test your NLP policy against sample content before enabling:

Bash
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies/:id/test \
  -H "Authorization: Bearer $SUPABASE_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Please send $5000 to account 123456789 for John Smith at 123 Main St"
  }'

Best Practices

Threshold Tuning

  • High threshold (0.8+): Only block clear violations, fewer false positives
  • Medium threshold (0.5-0.7): Balance between security and usability
  • Low threshold (0.3-0.5): Aggressive blocking, may have false positives

Performance Optimization

  • Use faster models (gpt-4o-mini, claude-3-haiku) for high-volume policies
  • Enable caching for repeated similar actions
  • Consider local models for latency-sensitive applications

Combining with OPA Policies

NLP policies work alongside traditional OPA/Rego policies. A typical setup:

  1. OPA Policy: Fast pattern-based checks (allowlists, rate limits)
  2. NLP Policy: Semantic analysis for content that passes OPA
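The two-stage setup above can be sketched as a short-circuiting pipeline, with `opa_check` and `nlp_check` as hypothetical callables that each return `"allow"` or `"deny"`:

```python
def evaluate(action, opa_check, nlp_check):
    """Run the fast OPA check first; only actions it allows reach the
    slower, costlier NLP evaluation."""
    opa_decision = opa_check(action)
    if opa_decision != "allow":
        # Short-circuit: no LLM call (or token spend) for actions
        # already rejected by pattern-based rules.
        return {"decision": opa_decision, "stage": "opa"}
    return {"decision": nlp_check(action), "stage": "nlp"}
```

Ordering the stages this way keeps LLM latency and token cost off the hot path for actions that plain allowlists or rate limits already reject.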

Monitoring

Track NLP policy performance in the dashboard:

  • Evaluation count: Total evaluations per policy
  • Average latency: LLM response time
  • Cache hit rate: Percentage of cached responses
  • Decision distribution: Allow/deny/approval breakdown
  • Token usage: LLM token consumption for cost tracking
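If you export raw evaluation records (shaped like the policy evaluation response shown earlier), the dashboard-style metrics above can be aggregated along these lines; the record field names are taken from that response example, and the function itself is an illustrative sketch:

```python
from collections import Counter

def summarize(evals: list) -> dict:
    """Aggregate NLP evaluation records into monitoring metrics."""
    n = len(evals)
    return {
        "evaluation_count": n,
        "avg_latency_ms": sum(e["latency_ms"] for e in evals) / n,
        # "cached" is a boolean, so summing counts the cache hits
        "cache_hit_rate": sum(e["cached"] for e in evals) / n,
        "decision_distribution": dict(Counter(e["decision"] for e in evals)),
    }
```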

Next Steps