This work proposes a set of principled design patterns for building AI agents with provable resistance to prompt injection attacks. It systematically analyzes these patterns and discusses their trade-offs between utility and security. A series of case studies illustrates their real-world applicability, aiming to guide LLM agent designers toward building secure systems.
Article Points:
1. LLM agents face critical prompt injection threats, especially when given tool access.
2. General-purpose LLM agents are unlikely to provide reliable safety guarantees today.
3. The paper proposes six design patterns for application-specific LLM agents.
4. The patterns constrain agents so that untrusted input cannot trigger consequential actions.
5. These designs offer a practical trade-off between agent utility and security.
6. Combining multiple design patterns is recommended for robust security.
Action-Selector Pattern
The LLM selects from a fixed set of predefined actions; tool output is never fed back to the model (no feedback loop).
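A minimal sketch of how this pattern might look in code; the names `call_llm`, `ACTIONS`, and `handle_request` are hypothetical stand-ins, not from the paper:

```python
# Action-Selector sketch: the LLM may only pick an action from an allow-list,
# and the action's result is never fed back into the model.

ACTIONS = {
    "refund_order": lambda order_id: f"Refund issued for order {order_id}",
    "check_status": lambda order_id: f"Status of order {order_id}: shipped",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns the name of the chosen action."""
    return "check_status"  # stubbed for illustration

def handle_request(user_message: str, order_id: str) -> str:
    choice = call_llm(
        f"Pick exactly one action from {sorted(ACTIONS)} for: {user_message}"
    ).strip()
    if choice not in ACTIONS:  # anything outside the allow-list is rejected
        return "Sorry, I can't help with that."
    # The tool result goes straight to the user and is never fed back into
    # the LLM, so injected text in that result cannot steer the agent.
    return ACTIONS[choice](order_id)

print(handle_request("Where is my package?", "A-1001"))
```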
Plan-Then-Execute Pattern
The LLM commits to a fixed plan of tool calls before processing any untrusted data.
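A rough sketch, assuming the plan is emitted as JSON before any untrusted data enters the context; `call_llm`, `TOOLS`, and the plan format are illustrative only:

```python
# Plan-Then-Execute sketch: the tool-call sequence is locked in first, so
# untrusted data read later cannot add, remove, or reorder steps.
import json

TOOLS = {
    "fetch_email": lambda args: "latest email body (untrusted text) ...",
    "summarize":   lambda args: f"summary of: {args['text'][:40]}",
}

def call_llm(prompt: str) -> str:
    """Placeholder LLM call that returns the plan as JSON."""
    return json.dumps([
        {"tool": "fetch_email", "args": {}},
        {"tool": "summarize",   "args": {}},
    ])

def run(task: str) -> str:
    # 1. The plan is fixed *before* any untrusted data is seen.
    plan = json.loads(call_llm(f"Plan the tool calls needed for: {task}"))
    result = ""
    for step in plan:
        if step["tool"] not in TOOLS:
            raise ValueError(f"unplanned tool: {step['tool']}")
        # 2. Untrusted data may flow into arguments, but it can no longer
        #    change which tools get called.
        step["args"]["text"] = result
        result = TOOLS[step["tool"]](step["args"])
    return result

print(run("Summarize my latest email"))
```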
LLM Map-Reduce Pattern
Dispatches isolated sub-agents for untrusted data processing.
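One way this could look in code, assuming each sub-agent is constrained to a boolean verdict; `sub_agent` and `map_reduce` are hypothetical names:

```python
# LLM Map-Reduce sketch: every untrusted document is handled by its own
# isolated worker, and only constrained outputs reach the aggregation step.

def sub_agent(document: str, question: str) -> bool:
    """Isolated worker: sees one untrusted document and may only answer True/False.
    A real worker would be a separate LLM call whose output is strictly validated."""
    return question.split()[0].lower() in document.lower()  # stubbed check

def map_reduce(documents: list[str], question: str) -> list[int]:
    # Map: an injected instruction can at worst flip that one document's verdict.
    verdicts = [sub_agent(doc, question) for doc in documents]
    # Reduce: aggregation only ever touches the validated booleans,
    # never the raw untrusted text.
    return [i for i, hit in enumerate(verdicts) if hit]

docs = ["invoice for project X", "meeting notes", "invoice reminder"]
print(map_reduce(docs, "invoice mentioned anywhere?"))
```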
Dual LLM Pattern
Privileged LLM uses tools; quarantined LLM processes untrusted data.
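A simplified sketch, assuming untrusted text is stored under symbolic references (e.g. $VAR1) that the privileged LLM manipulates without ever reading; both stubbed LLM functions are illustrative:

```python
# Dual LLM sketch: the privileged LLM plans tool use over symbolic variables,
# while the quarantined LLM (with no tool access) handles the untrusted text.

memory: dict[str, str] = {}  # holds untrusted text the privileged LLM never reads

def quarantined_llm(untrusted_text: str, instruction: str) -> str:
    """Processes untrusted data; has no tool access."""
    return untrusted_text.upper()  # stand-in for e.g. "summarize this document"

def privileged_llm(task: str) -> list[tuple[str, str | None]]:
    """Plans tool calls; only ever sees symbolic references such as $VAR1."""
    return [("fetch_doc", None), ("quarantine", "$VAR1"), ("send", "$VAR2")]

def run(task: str) -> None:
    for i, (op, ref) in enumerate(privileged_llm(task), start=1):
        if op == "fetch_doc":
            memory[f"$VAR{i}"] = "untrusted document contents"
        elif op == "quarantine":
            memory[f"$VAR{i}"] = quarantined_llm(memory[ref], task)
        elif op == "send":
            print("sending:", memory[ref])  # value substituted outside any LLM

run("Summarize the document and send it to me")
```

The point of the split is that injected instructions in the document can only influence the quarantined LLM's text output, never the tool-using LLM's decisions.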
Code-Then-Execute Pattern
The LLM writes a formal program to solve the task; the program is then executed.
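A toy sketch in which `write_program` stands in for the code-writing LLM and a restricted `exec` namespace stands in for a real sandbox; all names here are assumptions, not the paper's API:

```python
# Code-Then-Execute sketch: the LLM emits a program up front, and execution of
# that program (not the LLM) drives the tool calls.

def write_program(task: str) -> str:
    """Placeholder for an LLM that emits a program in a small, restricted language."""
    return (
        "doc = fetch_doc()\n"
        "summary = quarantined_summarize(doc)\n"
        "send_email(summary)\n"
    )

def run(task: str) -> None:
    program = write_program(task)
    # The generated code fixes the control flow up front; untrusted data seen
    # at runtime can only flow through it as ordinary values.
    sandbox = {
        "__builtins__": {},
        "fetch_doc": lambda: "untrusted document text ...",
        "quarantined_summarize": lambda d: d[:20] + "...",
        "send_email": lambda s: print("emailing:", s),
    }
    exec(program, sandbox)  # a real system would use a proper sandbox or interpreter

run("Summarize the document and email it to me")
```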
Context-Minimization Pattern