Design Patterns for Securing LLM Agents against Prompt Injections
This work proposes a set of principled design patterns for building AI agents with provable resistance to prompt injection attacks. It systematically analyzes these patterns, discussing their trade-offs in terms of utility and security. The paper illustrates their real-world applicability through a series of case studies, aiming to guide LLM agent designers towards building secure systems. ✨
Article Points:
1
LLM agents face critical prompt injection threats, especially with tool access.
2
General-purpose LLM agents are unlikely to provide reliable safety guarantees currently.
3
The paper proposes six design patterns for application-specific LLM agents.
4
Patterns constrain agents to prevent consequential actions from untrusted input.
5
These designs offer a valuable trade-off between agent utility and security.
6
Combining multiple design patterns is recommended for robust security.
Design Patterns for Securing LLM Agents against Prompt Injections
Action-Selector Pattern

LLM selects from predefined actions; no feedback loop.

Plan-Then-Execute Pattern

LLM defines fixed plan before processing untrusted data.

LLM Map-Reduce Pattern

Dispatches isolated sub-agents for untrusted data processing.

Dual LLM Pattern

Privileged LLM uses tools; quarantined LLM processes untrusted data.

Code-Then-Execute Pattern

LLM writes formal program to solve tasks.

Context-Minimization Pattern

Removes unnecessary content from context to prevent injection.