Threat Database

AI Agent Security Threats

AI agents face a new class of security threats. Prompt injection, data exfiltration, privilege escalation — each exploits the unique way agents process language and interact with tools. This database covers the most common threats, how they work, and how to detect them.

What Are AI Agent Security Threats?

Traditional web applications face SQL injection and XSS. AI agents face an entirely different class of attack. When an agent processes natural language, there is no clean boundary between instructions and data — every input is potentially an instruction. Combine that with tool access (databases, APIs, file systems) and you get a system that can be manipulated into taking real-world actions that bypass every authorization check.

The threats below are organized by category: injection attacks that hijack agent behavior, exfiltration techniques that steal data through the agent, escalation attacks that expand what the agent can do, and evasion techniques that hide malicious activity from monitoring. Each threat page includes real-world examples, actual attack payloads, and the specific detection layers that catch them.

These aren't theoretical risks. In a scan of 1,000 production agent sessions, we found prompt injection attempts in 14% of sessions and data exfiltration attempts in 9%. Most went undetected because the agents had no runtime security.

Critical · Injection

Prompt Injection

Prompt injection is the most common attack against AI agents. An attacker crafts input that overrides the agent's system instructions, causing it to ignore safety guidelines, leak confidential data, or perform unauthorized actions. Unlike traditional injection attacks (SQL, XSS), prompt injection exploits the fundamental way LLMs process natural language — there's no clean boundary between instructions and data.
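A first line of defense is a simple pattern scan over incoming text. The sketch below is illustrative only — the phrase list and function name are assumptions, not Rune's actual detection rules:

```python
import re

# Minimal pattern scanner for well-known injection phrases.
# Real rule sets are far larger and paired with semantic checks.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard your system prompt",
    r"you are now (in )?developer mode",
    r"reveal your (system prompt|instructions)",
]

def scan_for_injection(text: str) -> list[str]:
    """Return the patterns that matched; empty list means no known phrase found."""
    lowered = text.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, lowered)]
```

Pattern matching alone misses paraphrased attacks, which is why it is typically backed by semantic and model-based layers.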

Found in 14% of AI agent sessions
Critical · Data Exfiltration

Data Exfiltration

Data exfiltration occurs when an AI agent is manipulated into sending sensitive data to an attacker-controlled destination. Agents with tool access — file systems, APIs, databases — can be tricked into reading sensitive data and encoding it in outbound requests, tool parameters, or even seemingly innocent responses.
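One common mitigation is inspecting outbound tool calls before they leave the agent. This sketch checks a request URL against an egress allowlist and flags oversized query strings that may carry encoded data — the host list and threshold are illustrative assumptions:

```python
from urllib.parse import urlparse

# Hypothetical egress policy: only these hosts may receive agent traffic.
ALLOWED_HOSTS = {"api.internal.example.com", "hooks.example.com"}
MAX_QUERY_LEN = 256  # long query strings can smuggle base64-encoded payloads

def check_outbound_request(url: str) -> list[str]:
    """Return a list of findings; empty list means the request looks clean."""
    findings = []
    parsed = urlparse(url)
    if parsed.hostname not in ALLOWED_HOSTS:
        findings.append(f"egress to unapproved host: {parsed.hostname}")
    if len(parsed.query) > MAX_QUERY_LEN:
        findings.append("oversized query string (possible encoded payload)")
    return findings
```

Enforcing the check at the network or tool-dispatch layer matters: the agent itself can be talked out of self-imposed rules.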

Attempted in 9% of monitored sessions
High · Injection

System Prompt Extraction

System prompt extraction is a targeted form of prompt injection where the attacker's goal is to reveal the agent's hidden instructions. System prompts often contain business logic, guardrail configurations, API endpoint details, and persona instructions that give attackers a roadmap for further attacks.
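Extraction attempts can also be caught on the output side, by checking whether a response echoes the hidden prompt. The canary token and thresholds below are hypothetical, a sketch of one possible approach:

```python
# Hypothetical canary token appended to the system prompt at deploy time.
# If it ever appears in a response, the prompt has leaked.
CANARY = "rn-canary-7f3a"

def leaks_system_prompt(response: str, system_prompt: str) -> bool:
    """Flag responses containing the canary or verbatim prompt fragments."""
    if CANARY in response:
        return True
    # Slide a 40-character window over the prompt and look for verbatim echoes.
    return any(system_prompt[i:i + 40] in response
               for i in range(0, max(1, len(system_prompt) - 40), 20))
```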

Attempted in 11% of monitored sessions
Critical · Data Exfiltration

Secret Exposure

Secret exposure happens when API keys, passwords, tokens, or private keys appear in agent inputs or outputs. This can occur accidentally — a user pastes code containing credentials — or through deliberate extraction attacks. Once exposed in an LLM conversation, secrets may be logged, cached, or sent to third-party services.
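Secret detection typically starts with format-specific patterns. The rules below cover a few well-known credential formats; real scanners use much larger rule sets plus entropy analysis, and these names are illustrative, not Rune's configuration:

```python
import re

# A handful of well-known credential formats (illustrative subset).
SECRET_PATTERNS = {
    "aws_access_key": r"\bAKIA[0-9A-Z]{16}\b",
    "github_token": r"\bghp_[A-Za-z0-9]{36}\b",
    "private_key": r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",
}

def find_secrets(text: str) -> list[str]:
    """Return the names of secret types detected in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if re.search(pattern, text)]
```

Running this on both inputs and outputs catches the accidental-paste case as well as deliberate extraction.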

Detected in 6% of agent outputs
High · Privilege Escalation

Privilege Escalation

Privilege escalation occurs when an AI agent performs actions beyond its intended scope — accessing restricted tools, modifying data it should only read, or executing admin-level operations. This usually results from overly permissive tool configurations, missing authorization checks, or successful prompt injection that overrides the agent's behavioral constraints.
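The core countermeasure is a deny-by-default tool allowlist enforced at dispatch time, outside the LLM's control. The agent IDs and tool names below are illustrative:

```python
# Hypothetical per-agent tool scopes, checked before any tool executes.
AGENT_TOOL_SCOPES = {
    "support-bot": {"search_docs", "read_ticket"},
    "admin-agent": {"search_docs", "read_ticket", "update_ticket", "delete_user"},
}

def authorize_tool_call(agent_id: str, tool_name: str) -> bool:
    """Deny by default: unknown agents and unlisted tools are refused."""
    return tool_name in AGENT_TOOL_SCOPES.get(agent_id, set())
```

Because the check runs in ordinary application code, a successful prompt injection cannot talk its way past it.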

Attempted in 5% of monitored sessions
Critical · Injection

Command Injection

Command injection against AI agents occurs when an attacker manipulates the agent into executing arbitrary shell commands, code, or database queries. Unlike traditional command injection (which exploits string concatenation), agent-based command injection exploits the agent's tool-calling ability — convincing it to use code execution, shell, or database tools with malicious parameters.
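One defense is validating a shell tool's parameters before execution: allowlist the binaries it may invoke and reject shell metacharacters so injected arguments cannot chain commands. The allowed set is an illustrative assumption:

```python
import shlex

# Hypothetical policy: only these binaries, no shell metacharacters.
ALLOWED_BINARIES = {"ls", "cat", "grep"}
FORBIDDEN_CHARS = set(";|&`$><")

def validate_shell_command(command: str) -> bool:
    """Accept only plain invocations of allowlisted binaries."""
    if any(c in FORBIDDEN_CHARS for c in command):
        return False
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in ALLOWED_BINARIES
```

Passing the validated tokens to the OS without a shell (e.g. `subprocess.run(tokens)`) removes the interpretation layer injection depends on.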

Found in 3% of tool-enabled agent sessions
High · Data Exfiltration

PII Exposure

PII exposure occurs when personally identifiable information — Social Security numbers, credit card numbers, phone numbers, home addresses — appears in AI agent conversations. This creates compliance risks (GDPR, CCPA, HIPAA), liability exposure, and potential for identity theft if the data is logged, cached, or exfiltrated.
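PII detection usually begins with format-based patterns like the ones sketched below. These are illustrative; production scanners pair such patterns with validation (e.g. Luhn checks on card numbers) to reduce false positives:

```python
import re

# Illustrative PII formats; real systems add validation and context checks.
PII_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "credit_card": r"\b(?:\d[ -]?){13,16}\b",
    "us_phone": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
}

def find_pii(text: str) -> list[str]:
    """Return the PII categories detected in the text."""
    return [name for name, pattern in PII_PATTERNS.items()
            if re.search(pattern, text)]
```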

Detected in 8% of scanned conversations

Frequently Asked Questions

What is the most common attack on AI agents?

Prompt injection is the most prevalent threat, found in approximately 14% of production agent sessions. It works by embedding instructions in user input or retrieved documents that override the agent's system prompt, causing it to ignore safety guidelines or perform unauthorized actions.

How do you detect prompt injection in real-time?

Rune uses a three-layer detection pipeline: L1 pattern scanning catches known injection phrases in under 5ms, L2 semantic scanning uses vector similarity to catch rephrased attacks, and L3 LLM-based judgment evaluates whether input attempts to manipulate agent behavior — catching novel zero-day techniques.
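The layered approach described above can be sketched as a short-circuiting dispatcher: cheap checks run first, and the expensive LLM judge runs only when earlier layers pass. The layer internals here are stubs and assumptions — Rune's actual pipeline is not shown:

```python
# Stub layers: each returns True when it considers the input an attack.
def l1_pattern_scan(text: str) -> bool:
    return "ignore previous instructions" in text.lower()

def l2_semantic_scan(text: str) -> bool:
    return False  # stub: vector-similarity lookup against known attack embeddings

def l3_llm_judge(text: str) -> bool:
    return False  # stub: ask a judge model whether the input is manipulative

def scan_input(text: str) -> str:
    """Cheapest layer first; later layers only run if earlier ones pass."""
    if l1_pattern_scan(text):
        return "block"   # known pattern, millisecond-scale latency
    if l2_semantic_scan(text):
        return "block"   # rephrased attack caught by similarity
    if l3_llm_judge(text):
        return "alert"   # slower model-based check, can run asynchronously
    return "allow"
```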

Are AI agent threats different from LLM security threats?

Yes. LLM security focuses on the model itself — jailbreaks, harmful outputs, bias. Agent security focuses on what happens when an LLM has tools and autonomy. An agent with database access, API keys, and file system permissions can be manipulated into taking real-world actions that a standalone LLM cannot.

Can these threats be prevented with prompt engineering alone?

No. Prompt engineering helps but is fundamentally insufficient because LLMs cannot reliably distinguish between legitimate instructions and injected ones. Runtime security scanning — analyzing inputs before they reach the LLM and outputs before they reach the user — is required to catch attacks that bypass prompt-level defenses.

How much latency does runtime threat detection add?

Rune's L1 pattern scanning adds under 5ms. L2 semantic scanning adds 10-20ms. L3 LLM-based judgment adds 100-500ms but is optional and can run asynchronously. Most production deployments use L1+L2 for blocking and L3 for alerting.

Detect these threats in your agents

Rune scans every agent input and output for these threats in real-time. Add runtime security in under 5 minutes.

Start Free — 10K Events/Month