Agent Action Firewall

NLP Policies (AI-Powered Safety)

Use LLM-based semantic analysis to detect harmful content, PII, and malicious intent beyond what pattern matching can achieve.

Pro Feature: NLP policies are available on Pro and Enterprise plans. They complement traditional OPA/Rego policies with semantic understanding.

How NLP Policies Work

NLP policies use large language models (LLMs) to analyze action content semantically. Unlike pattern-based detection, NLP policies understand context, intent, and meaning.

Agent Action Request → NLP Policy Evaluation → LLM Provider (OpenAI, etc.) → Decision + Score → Cached Results

Policy Types

Content Safety

Detects harmful, toxic, or inappropriate content in agent actions. Uses semantic understanding to catch threats that simple keyword filtering would miss.

JSON
{
  "type": "content_safety",
  "name": "Block Harmful Content",
  "description": "Detect and block harmful or toxic content",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "threshold": 0.7
}

PII Detection

Identifies personally identifiable information (PII) such as names, addresses, phone numbers, SSNs, and credit card numbers using semantic analysis.

JSON
{
  "type": "pii_detection",
  "name": "Detect PII in Actions",
  "description": "Block actions containing personal information",
  "provider": "anthropic",
  "model": "claude-3-haiku-20240307",
  "threshold": 0.8
}

Intent Classification

Classifies agent intent to detect suspicious, malicious, or data exfiltration attempts before they execute.

JSON
{
  "type": "intent_classification",
  "name": "Classify Agent Intent",
  "description": "Detect malicious or suspicious intent",
  "provider": "openai",
  "model": "gpt-4o",
  "threshold": 0.6
}

Custom Prompts

Create domain-specific policies with custom prompts for specialized use cases.

JSON
{
  "type": "custom",
  "name": "Financial Compliance Check",
  "description": "Ensure actions comply with financial regulations",
  "prompt": "Analyze this agent action for potential financial regulation violations. Consider SEC rules, FINRA guidelines, and anti-money laundering requirements. Return a risk assessment.",
  "provider": "openai",
  "model": "gpt-4o",
  "threshold": 0.5
}

Supported Providers

| Provider  | Models                                         | Best For                 |
|-----------|------------------------------------------------|--------------------------|
| OpenAI    | gpt-4o, gpt-4o-mini, gpt-3.5-turbo             | General purpose, fast    |
| Anthropic | claude-3-opus, claude-3-sonnet, claude-3-haiku | Safety-focused analysis  |
| Local     | Custom Ollama models                           | On-premise, data privacy |

Creating an NLP Policy

Via API

Bash
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies \
  -H "Authorization: Bearer $SUPABASE_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Content Safety Policy",
    "description": "Detect harmful content in agent actions",
    "type": "content_safety",
    "provider": "openai",
    "model": "gpt-4o-mini",
    "threshold": 0.7,
    "enabled": true
  }'

Via Dashboard

  1. Navigate to Policies in the sidebar
  2. Click the NLP Policies tab
  3. Click Create NLP Policy
  4. Select a policy type and configure settings
  5. Test with sample content before enabling

Policy Evaluation

NLP policies return a structured response with decision, risk score, and reasoning:

JSON
{
  "policy_id": "nlp_abc123",
  "decision": "deny",
  "risk_score": 0.85,
  "reasoning": "The action contains sensitive personal information including what appears to be a social security number and credit card details.",
  "details": {
    "detected_issues": ["ssn_detected", "credit_card_detected"],
    "confidence": 0.92
  },
  "cached": false,
  "latency_ms": 245
}

Decision Logic

  • Allow: Risk score below threshold
  • Deny: Risk score at or above threshold
  • Require Approval: Configurable for medium-risk actions
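The decision rules above can be sketched as a small helper. This is a minimal illustration, not the firewall's actual implementation; the `approval_band` parameter is an assumption standing in for the configurable medium-risk approval option:

```python
from typing import Optional

def decide(risk_score: float, threshold: float,
           approval_band: Optional[float] = None) -> str:
    """Map an NLP risk score to a firewall decision.

    approval_band (hypothetical): width of the band just below the
    threshold in which medium-risk actions require human approval
    instead of being allowed outright.
    """
    if risk_score >= threshold:
        return "deny"
    if approval_band is not None and risk_score >= threshold - approval_band:
        return "require_approval"
    return "allow"
```

With a threshold of 0.7 and an approval band of 0.2, a score of 0.85 is denied, 0.6 requires approval, and 0.3 is allowed.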

Caching

LLM responses are cached to reduce latency and cost. Cache behavior is configurable:

JSON
{
  "cache_ttl_seconds": 3600,
  "cache_key_fields": ["tool", "operation", "params.url"]
}

Identical actions will return cached results within the TTL. This is especially useful for repeated similar actions.
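One way to picture how `cache_key_fields` could produce a deterministic key is a hash over the selected fields, with dotted paths like `params.url` walking into nested objects. This is a hypothetical sketch of the idea, not the service's actual key derivation:

```python
import hashlib
import json

def cache_key(action: dict, fields: list) -> str:
    """Build a deterministic cache key from selected action fields.

    Dotted paths (e.g. "params.url") traverse nested dicts; missing
    fields contribute None so the key stays well-defined.
    """
    def lookup(path):
        node = action
        for part in path.split("."):
            if not isinstance(node, dict):
                return None
            node = node.get(part)
        return node

    # sort_keys makes the serialization, and hence the key, stable
    payload = json.dumps({f: lookup(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two actions that agree on every listed field map to the same key, so the second one can be served from cache within the TTL.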

Testing Policies

Test your NLP policy against sample content before enabling:

Bash
curl -X POST https://api.agentactionfirewall.com/admin/nlp-policies/:id/test \
  -H "Authorization: Bearer $SUPABASE_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Please send $5000 to account 123456789 for John Smith at 123 Main St"
  }'

Best Practices

Threshold Tuning

  • High threshold (0.8+): Only block clear violations, fewer false positives
  • Medium threshold (0.5-0.7): Balance between security and usability
  • Low threshold (0.3-0.5): Aggressive blocking, may have false positives

Performance Optimization

  • Use faster models (gpt-4o-mini, claude-3-haiku) for high-volume policies
  • Enable caching for repeated similar actions
  • Consider local models for latency-sensitive applications

Combining with OPA Policies

NLP policies work alongside traditional OPA/Rego policies. A typical setup:

  1. OPA Policy: Fast pattern-based checks (allowlists, rate limits)
  2. NLP Policy: Semantic analysis for content that passes OPA
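The two-stage setup above can be sketched as a short-circuiting pipeline, with `opa_check` and `nlp_check` as hypothetical callables that each return `"allow"` or `"deny"`:

```python
def evaluate(action, opa_check, nlp_check):
    """Run the fast OPA check first; only actions it allows reach the
    slower, costlier NLP evaluation."""
    opa_decision = opa_check(action)
    if opa_decision != "allow":
        # Short-circuit: no LLM call (or token spend) for actions
        # already rejected by pattern-based rules.
        return {"decision": opa_decision, "stage": "opa"}
    return {"decision": nlp_check(action), "stage": "nlp"}
```

Ordering the stages this way keeps LLM latency and token cost off the hot path for actions that plain allowlists or rate limits already reject.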

Monitoring

Track NLP policy performance in the dashboard:

  • Evaluation count: Total evaluations per policy
  • Average latency: LLM response time
  • Cache hit rate: Percentage of cached responses
  • Decision distribution: Allow/deny/approval breakdown
  • Token usage: LLM token consumption for cost tracking
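If you export raw evaluation records (shaped like the policy evaluation response shown earlier), the dashboard-style metrics above can be aggregated along these lines; the record field names are taken from that response example, and the function itself is an illustrative sketch:

```python
from collections import Counter

def summarize(evals: list) -> dict:
    """Aggregate NLP evaluation records into monitoring metrics."""
    n = len(evals)
    return {
        "evaluation_count": n,
        "avg_latency_ms": sum(e["latency_ms"] for e in evals) / n,
        # "cached" is a boolean, so summing counts the cache hits
        "cache_hit_rate": sum(e["cached"] for e in evals) / n,
        "decision_distribution": dict(Counter(e["decision"] for e in evals)),
    }
```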

Next Steps