What is prompt injection in simple terms?

Prompt injection is when attacker-controlled text reaches a language model and the model follows it as if it were a legitimate instruction. Because the model reads instructions and data through the same channel, a sentence buried in an email, document, or web page can override what you told the AI to do.

Is prompt injection the same as jailbreaking?

No. Jailbreaking is when a user tries to make a model bypass its own safety rules. Prompt injection is when a third party hides instructions in content the model processes, hijacking an application on behalf of someone else. Jailbreaking targets the model's policy; prompt injection targets the application built on top of it.

Can prompt injection be fully prevented?

Not with a single fix. As of 2026 there is no model-level technique that reliably separates trusted instructions from untrusted data. Defense is architectural: limit what the AI can do, treat all retrieved content as untrusted, keep a human in the loop for consequential actions, and monitor outputs. OWASP's LLM01:2025 guidance treats it as a risk to be contained, not eliminated.

Why don't firewalls or antivirus stop prompt injection?

Traditional controls inspect code and network traffic. Prompt injection rides inside ordinary natural-language content — an email body, a PDF, a calendar invite — that is supposed to reach the model. There is no malformed packet or binary signature to block. The attack is in the meaning of the text, not its format.

Prompt Injection: The #1 LLM Security Risk, Explained Through Real Incidents

The short answer

Prompt injection is when text an attacker controls reaches an AI model and the model obeys it as if it were a trusted command. It is ranked the number-one risk for large language model applications by the OWASP GenAI project (entry LLM01:2025), and for one structural reason: a language model reads its instructions and its input data through the same channel. It has no built-in way to tell “this is the task my owner gave me” apart from “this is a sentence I found in a document I was asked to summarize.”

That single fact is why prompt injection has produced real, documented compromises of Microsoft, GitHub, and OpenAI products — not lab curiosities, but assigned CVEs and patched vulnerabilities. This article walks through what the attack is, why it works, and the incidents that prove it matters. The companion defense playbook covers the controls in detail.

What prompt injection actually is

There are two flavours, and the distinction matters for defense.

Direct prompt injection is when the person typing to the AI tries to override its instructions — “ignore your previous rules and do X instead.” This overlaps with jailbreaking and is the version most people picture.

Indirect prompt injection is the dangerous one for businesses. Here the malicious instructions are not typed by the user at all — they are hidden inside content the AI is asked to process: an email it summarizes, a web page it browses, a document it ingests, a code comment it reads, a calendar invite it parses. The user does nothing wrong. They simply ask their assistant to “summarize my inbox,” and an attacker who sent them an email three weeks ago gets their instructions executed.

Indirect prompt injection turns every piece of untrusted content your AI touches into a potential command line. The attacker doesn’t need access to your systems — they need their text to reach your model.

Under the hood: why the model can’t just ignore it

A language model processes a single stream of tokens. System instructions, your prompt, the retrieved document, and the conversation history are concatenated and fed in together. The model was trained to be helpful and to follow instructions found in its context — that is the entire product. Asking it to follow your instructions but ignore instructions that appear inside data is asking it to make a trust distinction the architecture never encoded.

Mitigations like delimiters (“treat everything between these markers as data, not commands”) and instruction hierarchies help at the margin, but they are heuristics, not guarantees. As of 2026 there is no robust, model-level fix. This is the consensus position in the OWASP guidance and across security research — which is exactly why defense has to happen at the application layer, not be outsourced to the model.

The incidents that prove it

These are public, documented cases — not hypotheticals.

EchoLeak — zero-click data theft from Microsoft 365 Copilot (2025, CVE-2025-32711). Security researchers at Aim Labs demonstrated that a single crafted email, never opened or clicked by the victim, could cause Microsoft 365 Copilot to leak internal data. When the user later asked Copilot a normal question, the assistant pulled in the malicious email as context, followed its hidden instructions, and exfiltrated sensitive content. Microsoft assigned it a CVE and patched it. The significance: zero user interaction — the victim never had to fall for anything.

Remote code execution in developer tools. Prompt injection escaped the chat box and reached the operating system. CVE-2025-53773 affected GitHub Copilot in Visual Studio Code: injected instructions could manipulate the agent into writing configuration that led to code execution on the developer’s machine. The Cursor AI editor saw two related issues — CVE-2025-54135 and CVE-2025-54136 — where malicious content delivered through connected tools (including MCP integrations) could trigger code execution. When an AI assistant can edit files and run commands, a prompt injection becomes an RCE.

The Vanna AI library (CVE-2024-5565). Vanna let users ask questions in plain English and turned them into SQL plus visualizations. Researchers showed that a prompt injection in the question could break out of the intended flow and achieve remote code execution on the host, because generated code was passed to an execution path. A natural-language question became a system compromise.

LLM email assistants (CVE-2024-5184). An assistant that read and acted on emails could be steered by instructions embedded in the email body — manipulating its behaviour and leaking the contents of other messages.

Persistent memory poisoning in ChatGPT (2024). Security researcher Johann Rehberger demonstrated that an injection could write false “memories” into ChatGPT’s long-term memory feature. Once planted, the malicious instruction persisted across sessions, quietly exfiltrating data from future conversations — an attack he dubbed “SpAIware.” OpenAI mitigated the exfiltration path after disclosure. The lesson: when an AI has persistent state, an injection isn’t a one-shot event; it can become resident.

Who is exposed

If your organization uses any of the following, indirect prompt injection is part of your threat model:

AI assistants over your email, documents, or chat (Copilot-style tools, retrieval-augmented chatbots).
AI coding agents that can edit files, run commands, or call tools.
Customer-facing chatbots connected to internal data or actions (refunds, account changes).
Any pipeline where the model reads content from outside your trust boundary — the open web, user uploads, third-party emails, supplier documents.

The common thread is capability plus untrusted input. A model that can only chat is low-risk. A model that can act — send mail, move money, run code, change records — and also reads untrusted content is where injection turns into damage.

How you actually defend against it

The full method is in the defense playbook, but the principles are short:

Treat all retrieved content as untrusted input, the same way you treat user input in web security. Never let the model’s reading of a document silently authorize an action.
Least privilege for tools. An assistant should hold the minimum permissions for its job. If it doesn’t need to send email or delete records, it shouldn’t be able to.
Human-in-the-loop for consequential actions. Money movement, data deletion, external communication, and code execution should require explicit human approval — not a model’s say-so.
Separate the dangerous combination. Don’t connect powerful, destructive tools to an agent that is simultaneously ingesting untrusted content. That pairing is where EchoLeak and the Cursor CVEs lived.
Monitor and log what the AI does, so a hijack shows up as an anomaly. Our companion piece on spotting a compromised assistant covers the signals.

Action plan

Inventory your AI’s capabilities. List every tool, integration, and action each assistant can take. Anything destructive or outbound is a priority.
Map untrusted inputs. Identify every place the model reads content you don’t control — inboxes, the web, uploads, supplier files.
Cut the overlap. Wherever a powerful capability meets untrusted input, insert a human approval step or remove the capability.
Adopt least privilege. Scope every API key, token, and tool to the narrowest role that still works.
Turn on logging. Record prompts, tool calls, and outputs so injection attempts are visible after the fact.
Read OWASP LLM01:2025. Align your controls with the current community standard and re-check after each new integration.

Prompt injection is not a bug that will be patched away in the next model release. It is a structural property of how language models work, and it will be with us for as long as models follow instructions written in plain language. The organizations that stay safe are the ones that assume the model will be tricked and design so that it doesn’t matter.