What's the single most important prompt injection defense?

Least privilege. Most prompt injection damage comes from an assistant being able to take a powerful action — send money, delete data, run code, email externally. If the model can't perform the dangerous action without human approval, a successful injection becomes noise instead of a breach.

Should we just block prompt injection at the input?

Input filtering helps but is not sufficient on its own. Attackers continually find new phrasings, encodings, and indirect channels, so a filter is a speed bump, not a wall. Treat detection as one layer inside a defense-in-depth design — never the only one.

Does keeping a human in the loop defeat the point of automation?

No — you reserve human approval for consequential, irreversible actions: payments, deletions, external communication, code execution. Routine, low-risk steps stay automated. You're not approving every token; you're gating the handful of actions where a mistake is expensive.

How does this relate to the EU AI Act?

The AI Act's emphasis on human oversight for higher-risk uses aligns directly with injection defense: a person must be able to understand, intervene in, and stop the system. Designing meaningful human-in-the-loop controls therefore serves your security needs and supports your human-oversight obligations under the Act.

A Practical Prompt Injection Defense Playbook for Companies

The short answer

You cannot eliminate prompt injection with a model setting or a filter — there is no reliable model-level fix as of 2026. What you can do is make a successful injection harmless. The whole strategy is to assume the model will be tricked, and design so that being tricked doesn’t let anything bad happen. That means layering controls so no single failure leads to a breach.

This playbook is the practical companion to our explainer on what prompt injection is and the incidents it caused. Work through the layers below in order — they are roughly ranked by impact.

Layer 1: Least privilege (the highest-impact control)

Almost every documented prompt injection breach — EchoLeak’s data exfiltration, the Copilot and Cursor code-execution CVEs — turned damaging only because the assistant could take a powerful action. Remove the capability and you remove the harm.

Give each assistant the minimum permissions for its job. A support bot that answers questions does not need write access to your CRM. A summarizer does not need to send email.
Scope every credential narrowly. API keys and tokens should grant the smallest role that works, never blanket access. A token with full account access is a token that can do full account damage.
Prefer read-only by default. Grant write, delete, or send permissions only where genuinely required, and treat each as a risk decision.

If your assistant cannot perform a destructive action without a human, an injected instruction to perform it becomes a failed attempt in your logs, not an incident.
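To make "minimum permissions" concrete, here is a minimal sketch of a per-assistant tool registry that defaults to read-only and refuses to wire up any tool whose scope was not explicitly granted. The `Tool` and `Assistant` classes and the scope names (`"read"`, `"send"`, and so on) are hypothetical illustrations, not any particular framework's API; in practice the same rule should also be enforced by the credentials the tools use.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical permission scopes; map these onto your real credential roles.
READ_ONLY = frozenset({"read"})


@dataclass(frozen=True)
class Tool:
    name: str
    scope: str                      # e.g. "read", "write", "send", "delete", "execute"
    run: Callable[..., str]


@dataclass
class Assistant:
    name: str
    granted: frozenset = READ_ONLY  # read-only unless someone deliberately widens it
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        # Refuse to attach any tool whose scope this assistant was not granted.
        if tool.scope not in self.granted:
            raise PermissionError(
                f"{self.name} lacks scope {tool.scope!r} needed by {tool.name!r}")
        self.tools[tool.name] = tool

    def call(self, tool_name: str, **kwargs) -> str:
        # An injected request for a tool the assistant doesn't have fails here.
        tool = self.tools.get(tool_name)
        if tool is None:
            raise PermissionError(f"{self.name} has no tool {tool_name!r}")
        return tool.run(**kwargs)


# Usage: a support bot that can look up orders and nothing else.
support_bot = Assistant("support-bot")
support_bot.register(Tool("lookup_order", "read",
                          lambda order_id: f"order {order_id}: shipped"))
```

With this setup, trying to register a hypothetical `send_email` tool with scope `"send"` on `support_bot` raises `PermissionError`, and so does any injected attempt to call it. The widening decision stays with a person editing `granted`, not with the model.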

Layer 2: Trust boundaries — treat all retrieved content as untrusted

Indirect prompt injection rides inside content the model reads: emails, web pages, documents, uploads, tool results. The fix mirrors decades of web security practice: never trust input from outside your boundary.

Classify your data sources. Mark which content is trusted (your own vetted system instructions) and which is untrusted (anything from users, the web, third parties, suppliers).
Never let reading authorize acting. The model summarizing a document must not be able to, on the strength of that document’s text, trigger a payment or a deletion.
Be especially careful with agents that browse or ingest. The moment an assistant reads arbitrary external content and holds powerful tools, you have recreated the exact conditions of the headline CVEs. Separate those two functions.
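One simple way to enforce "reading never authorizes acting" is a taint rule: label every piece of content the model sees with its trust level, and refuse side-effecting actions once any untrusted content has entered the context for that turn. The sketch below assumes that rule; the `Trust`, `Segment`, and `Context` names and the action list are hypothetical. A refused action would then be routed to human approval or to a separate path that never saw the untrusted text.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"      # your own vetted system instructions
    UNTRUSTED = "untrusted"  # users, web pages, email, uploads, tool results


@dataclass(frozen=True)
class Segment:
    source: str
    trust: Trust
    text: str


# Hypothetical set of actions with side effects; everything else is read-only.
SIDE_EFFECT_ACTIONS = {"send_payment", "delete_record", "send_email", "run_code"}


class Context:
    """Tracks what the model has read this turn and whether any of it was untrusted."""

    def __init__(self) -> None:
        self.segments: list[Segment] = []

    def add(self, segment: Segment) -> None:
        self.segments.append(segment)

    @property
    def tainted(self) -> bool:
        return any(s.trust is Trust.UNTRUSTED for s in self.segments)


def may_execute(action: str, ctx: Context) -> bool:
    # Read-only actions are always allowed. Side-effecting actions are refused
    # once untrusted text is in the context, so what the model read cannot
    # be the thing that authorizes what it does.
    if action not in SIDE_EFFECT_ACTIONS:
        return True
    return not ctx.tainted
```

The rule is deliberately blunt: it does not try to judge whether the untrusted text "looks" malicious, which is exactly the judgment an attacker would target.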

Layer 3: Human-in-the-loop for consequential actions

Automation is the point of AI, but not every action deserves equal automation. Gate the small set of operations where a mistake is expensive or irreversible.

Require explicit human approval for moving money, deleting or exporting data, external communication, and code execution.
Make the approval meaningful. Show the human what will happen in plain terms, so they can actually catch an anomaly — not a rubber-stamp dialog they click through.
It also supports AI Act alignment. Meaningful human oversight is both a security control and a compliance posture for higher-risk uses.
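Here is a minimal sketch of such a gate, assuming a hypothetical `approve` hook (in a real deployment, a UI dialog or chat prompt shown to a person). Routine actions run straight through; actions in the consequential set run only after an explicit yes, and the approver sees a plain-language summary next to the user's original request so a mismatch stands out. The action names and the `ProposedAction` type are illustrative, not a specific product's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical list of consequential actions; extend it to match your own tools.
CONSEQUENTIAL = {"send_payment", "delete_data", "export_data",
                 "send_external_email", "run_code"}


@dataclass
class ProposedAction:
    name: str
    args: dict
    requested_by: str  # the user request that led here, shown for comparison


def describe(action: ProposedAction) -> str:
    """Plain-language summary so the approver can spot an anomaly."""
    details = ", ".join(f"{k}={v!r}" for k, v in sorted(action.args.items()))
    return (f"The assistant wants to {action.name.replace('_', ' ')} ({details}).\n"
            f"The user originally asked: {action.requested_by!r}")


def execute(action: ProposedAction,
            run: Callable[[ProposedAction], str],
            approve: Callable[[str], bool]) -> str:
    # Routine actions run automatically; consequential ones need an explicit yes.
    if action.name in CONSEQUENTIAL and not approve(describe(action)):
        return f"blocked: {action.name} not approved"
    return run(action)
```

Showing the original request alongside the proposed action is the design choice that makes approval meaningful: "send payment (amount=500)" next to "summarize this invoice" is an anomaly a person can catch at a glance.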

Layer 4: Input and output handling

Filtering is a layer, not a wall — but layers add up.

Screen inputs for known injection patterns, while accepting that determined attackers evolve past filters. Treat it as a speed bump.
Constrain and validate outputs. If the model’s job is to return a category, a number, or a structured object, enforce that shape — don’t pass free-form output straight into a sensitive system.
Isolate generated code and commands. Never send model-generated code to an execution path without sandboxing and review. That single gap is what made the Vanna AI RCE possible.
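For the output-validation point, here is a minimal sketch assuming a hypothetical ticket-routing assistant whose only job is to return a JSON object with a `category` and a `priority`. Anything that is not exactly that shape, including chatty prose or an injected extra field, is rejected before it reaches a downstream system. The label set and field names are illustrative; a schema library could do the same job, but plain standard-library checks keep the rule visible.

```python
import json

# Hypothetical label set for a ticket-routing assistant.
ALLOWED_CATEGORIES = {"billing", "shipping", "returns", "other"}


class InvalidOutput(ValueError):
    pass


def parse_ticket_label(raw: str) -> dict:
    """Accept only {"category": <allowed>, "priority": 1-3}; reject everything else."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not JSON: {exc}") from exc
    # Exact key set: an injected extra field such as "action" is refused outright.
    if not isinstance(data, dict) or set(data) != {"category", "priority"}:
        raise InvalidOutput("unexpected shape or extra keys")
    category, priority = data["category"], data["priority"]
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise InvalidOutput(f"unknown category {category!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(priority, int) or isinstance(priority, bool) or not 1 <= priority <= 3:
        raise InvalidOutput(f"priority out of range: {priority!r}")
    return {"category": category, "priority": priority}
```

Only the returned, validated dictionary is passed on; the model's raw text never is.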

Layer 5: Monitoring and detection

Assume some attempts will land, and make sure you see them.

Log prompts, tool calls, and outputs so a hijack is reconstructable after the fact.
Alert on anomalies — unexpected tool calls, outbound data, or actions that don’t match the user’s request. Our piece on spotting a compromised assistant details the signals.
Review logs regularly. Detection only works if someone looks.
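As an illustration of logging plus anomaly alerts, here is a minimal sketch that records every tool call as a JSON line and flags two of the signals above: a tool that doesn't fit the kind of request the user made, and outbound traffic to a host outside an allowlist. The request types, tool names, and `ALLOWED_DOMAINS` policy are hypothetical, and a real deployment would ship these records to its existing log pipeline rather than standard output.

```python
import json
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("assistant.audit")

# Hypothetical policy: which tools each request type should need, and where
# outbound traffic may go. Anything outside these is worth a human look.
EXPECTED_TOOLS = {
    "summarize_document": {"read_document"},
    "answer_question": {"search_kb"},
}
ALLOWED_DOMAINS = {"api.example.com"}


def record_tool_call(request_type: str, tool: str, args: dict) -> list[str]:
    """Log the call as one JSON line and return any alert reasons it triggers."""
    alerts = []
    if tool not in EXPECTED_TOOLS.get(request_type, set()):
        alerts.append(f"tool {tool!r} unexpected for {request_type!r}")
    url = args.get("url")
    if url:
        host = urlparse(url).hostname
        if host not in ALLOWED_DOMAINS:
            alerts.append(f"outbound request to {host!r}")
    log.info(json.dumps({"request_type": request_type, "tool": tool,
                         "args": args, "alerts": alerts}, default=str))
    if alerts:
        log.warning("ALERT: %s", "; ".join(alerts))
    return alerts
```

A summarization request that suddenly triggers an HTTP call to an unknown host would raise both alerts, which is the shape of the exfiltration incidents described earlier.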

Action plan

Run a capability audit. List every action and integration each assistant has. This is your attack surface.
Apply least privilege. Revoke any capability not strictly needed; scope every credential to the narrowest role.
Draw your trust boundaries. Label trusted vs. untrusted content sources, and ensure reading untrusted content can never authorize an action.
Gate the dangerous few. Put human approval in front of payments, deletions, external sends, and code execution.
Sandbox generated code and validate structured outputs before they touch a real system.
Turn on logging and alerting, then actually review the results on a schedule.

A model that can only suggest is low-risk no matter how badly it’s injected. A model that can act needs every layer above. Start with least privilege — it’s the control that turns the most attacks into nothing.

Want a checklist version to bring to your team? Grab our free Deepfake & AI-attack red-flags checklist and adapt these layers into a one-page review for your own deployments.