AI agents face a new class of security threats. Prompt injection, data exfiltration, privilege escalation — each exploits the unique way agents process language and interact with tools. This database covers the most common threats, how they work, and how to detect them.
Traditional web applications face SQL injection and XSS. AI agents face an entirely different class of attack. When an agent processes natural language, there is no clean boundary between instructions and data — every input is potentially an instruction. Combine that with tool access (databases, APIs, file systems) and you get a system that can be manipulated into taking real-world actions that bypass every authorization check.
The threats below are organized by category: injection attacks that hijack agent behavior, exfiltration techniques that steal data through the agent, escalation attacks that expand what the agent can do, and evasion techniques that hide malicious activity from monitoring. Each threat page includes real-world examples, actual attack payloads, and the specific detection layers that catch them.
These aren't theoretical risks. In a scan of 1,000 production agent sessions, we found prompt injection attempts in 14% and data exfiltration attempts in 9%. Most went undetected because the agents had no runtime security.
Prompt injection is the most common attack against AI agents. An attacker crafts input that overrides the agent's system instructions, causing it to ignore safety guidelines, leak confidential data, or perform unauthorized actions. Unlike traditional injection attacks (SQL, XSS), prompt injection exploits the fundamental way LLMs process natural language: the model treats every token in its context window as a potential instruction.
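The cheapest layer of defense against this is phrase-level pattern matching on inputs. A minimal sketch, assuming a small hand-picked pattern list (real detectors maintain far larger, continuously updated sets and pair this with semantic checks):

```python
import re

# Illustrative patterns only; known instruction-override phrasings.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (your|the) system prompt",
    r"you are now (in )?developer mode",
]

def looks_like_injection(text: str) -> bool:
    """Flag input containing a known instruction-override phrase."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

Pattern matching like this catches only known phrasings, which is why it is a first layer rather than a complete defense.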
Data exfiltration occurs when an AI agent is manipulated into sending sensitive data to an attacker-controlled destination. Agents with tool access — file systems, APIs, databases — can be tricked into reading sensitive data and encoding it in outbound requests, tool parameters, or even seemingly innocent responses.
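One way to catch this at runtime is to vet outbound tool-call parameters before execution. A minimal sketch, assuming a hypothetical host allowlist and a crude heuristic for base64-smuggled payloads:

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment derives this from tool config.
ALLOWED_HOSTS = {"api.internal.example.com"}

def flag_exfiltration(tool_name: str, params: dict) -> list[str]:
    """Flag string parameters that reference unapproved hosts or carry
    large base64-like blobs (a common encoding for smuggled data)."""
    findings = []
    for key, value in params.items():
        if not isinstance(value, str):
            continue
        for url in re.findall(r"https?://\S+", value):
            host = urlparse(url).hostname or ""
            if host not in ALLOWED_HOSTS:
                findings.append(f"{tool_name}.{key}: unapproved host {host!r}")
        if re.search(r"[A-Za-z0-9+/]{80,}={0,2}", value):
            findings.append(f"{tool_name}.{key}: large base64-like blob")
    return findings
```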
System prompt extraction is a targeted form of prompt injection where the attacker's goal is to reveal the agent's hidden instructions. System prompts often contain business logic, guardrail configurations, API endpoint details, and persona instructions that give attackers a roadmap for further attacks.
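Extraction attempts can also be caught on the output side, by checking whether a response reproduces the hidden instructions verbatim. A minimal sketch with an illustrative (untuned) window size:

```python
def leaks_system_prompt(output: str, system_prompt: str, window: int = 40) -> bool:
    """Flag a response that reproduces any long verbatim chunk of the
    system prompt. The 40-character window is illustrative, not tuned."""
    if len(system_prompt) <= window:
        return system_prompt in output
    return any(
        system_prompt[i:i + window] in output
        for i in range(len(system_prompt) - window + 1)
    )
```

Verbatim matching misses paraphrased leaks, so output-side checks like this are typically paired with semantic similarity scoring.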
Secret exposure happens when API keys, passwords, tokens, or private keys appear in agent inputs or outputs. This can occur accidentally — a user pastes code containing credentials — or through deliberate extraction attacks. Once exposed in an LLM conversation, secrets may be logged, cached, or sent to third-party services.
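Credential formats are distinctive enough that regex scanning catches most accidental exposure. A minimal sketch with a few illustrative patterns (production scanners ship hundreds of provider-specific rules; the AWS key below is the format's documented example value):

```python
import re

SECRET_PATTERNS = {
    "aws_access_key": r"\bAKIA[0-9A-Z]{16}\b",
    "github_token": r"\bghp_[A-Za-z0-9]{36}\b",
    "private_key": r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",
}

def find_secrets(text: str) -> list[str]:
    """Return the names of secret types detected in the text."""
    return [name for name, pat in SECRET_PATTERNS.items() if re.search(pat, text)]
```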
Privilege escalation occurs when an AI agent performs actions beyond its intended scope — accessing restricted tools, modifying data it should only read, or executing admin-level operations. This usually results from overly permissive tool configurations, missing authorization checks, or successful prompt injection that overrides the agent's behavioral constraints.
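The structural fix is to vet every tool call against an explicit per-agent permission map before execution, rather than trusting the model to stay in scope. A minimal sketch with hypothetical role and tool names:

```python
# Hypothetical per-role tool allowlists; names are illustrative.
PERMISSIONS = {
    "support_agent": {"read_ticket", "search_docs"},
    "admin_agent": {"read_ticket", "search_docs", "delete_user"},
}

def authorize(agent_role: str, tool: str) -> bool:
    """Deny-by-default check run before any tool call executes."""
    return tool in PERMISSIONS.get(agent_role, set())
```

Because the check runs outside the model, a successful prompt injection can request `delete_user` but cannot grant itself the permission.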
Command injection against AI agents occurs when an attacker manipulates the agent into executing arbitrary shell commands, code, or database queries. Unlike traditional command injection (which exploits string concatenation), agent-based command injection exploits the agent's tool-calling ability — convincing it to use code execution, shell, or database tools with malicious parameters.
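When an agent does have a shell tool, its parameters can be vetted before execution. A minimal sketch, assuming a hypothetical allowlist of approved binaries and rejecting shell metacharacters that enable chaining, substitution, and redirection:

```python
import shlex

# Hypothetical allowlist of binaries the agent's shell tool may invoke.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}
SHELL_METACHARS = set(";|&$`><")

def vet_shell_call(command: str) -> bool:
    """Allow only simple invocations of approved binaries."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```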
PII exposure occurs when personally identifiable information — Social Security numbers, credit card numbers, phone numbers, home addresses — appears in AI agent conversations. This creates compliance risks (GDPR, CCPA, HIPAA), liability exposure, and potential for identity theft if the data is logged, cached, or exfiltrated.
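Structured PII like SSNs and card numbers can be caught with format-aware scanning; a Luhn checksum cuts false positives on card-like digit runs. A minimal sketch (patterns are illustrative; production scanners cover many more formats and locales):

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: doubles every second digit from the right."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_pii(text: str) -> list[str]:
    """Return a label for each SSN or Luhn-valid card number found."""
    findings = ["ssn"] * len(SSN_RE.findall(text))
    for match in CARD_RE.findall(text):
        if luhn_valid(re.sub(r"\D", "", match)):
            findings.append("credit_card")
    return findings
```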
Prompt injection is the most prevalent threat, found in approximately 14% of production agent sessions. It works by embedding instructions in user input or retrieved documents that override the agent's system prompt, causing it to ignore safety guidelines or perform unauthorized actions.
Rune uses a three-layer detection pipeline: L1 pattern scanning catches known injection phrases in under 5ms, L2 semantic scanning uses vector similarity to catch rephrased attacks, and L3 LLM-based judgment evaluates whether input attempts to manipulate agent behavior — catching novel zero-day techniques.
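The control flow of a layered pipeline like this can be sketched as cheap checks that short-circuit before the expensive judge layer runs. The function bodies below are stand-ins, not Rune's actual detectors or API:

```python
def l1_patterns(text: str) -> bool:
    """Stand-in for fast pattern scanning (known injection phrases)."""
    return "ignore previous instructions" in text.lower()

def l2_semantic(text: str) -> bool:
    """Stand-in for vector-similarity matching against known attacks."""
    return False

def l3_judge(text: str) -> bool:
    """Stand-in for an LLM-as-judge evaluation of intent."""
    return False

def scan(text: str, run_l3: bool = False) -> str:
    """Cheap layers block; the optional expensive layer only alerts."""
    if l1_patterns(text) or l2_semantic(text):
        return "block"
    if run_l3 and l3_judge(text):
        return "alert"
    return "allow"
```

This ordering is why the fast layers can sit on the blocking path while the slow layer runs asynchronously.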
Yes. LLM security focuses on the model itself — jailbreaks, harmful outputs, bias. Agent security focuses on what happens when an LLM has tools and autonomy. An agent with database access, API keys, and file system permissions can be manipulated into taking real-world actions that a standalone LLM cannot.
No. Prompt engineering helps but is fundamentally insufficient because LLMs cannot reliably distinguish between legitimate instructions and injected ones. Runtime security scanning — analyzing inputs before they reach the LLM and outputs before they reach the user — is required to catch attacks that bypass prompt-level defenses.
Rune's L1 pattern scanning adds under 5ms. L2 semantic scanning adds 10-20ms. L3 LLM-based judgment adds 100-500ms but is optional and can run asynchronously. Most production deployments use L1+L2 for blocking and L3 for alerting.
Rune scans every agent input and output for these threats in real-time. Add runtime security in under 5 minutes.
Start Free — 10K Events/Month