This paper shares insights from Microsoft's experience red teaming over 100 generative AI products. It introduces their internal threat model ontology and outlines eight key lessons learned, offering practical recommendations for aligning red teaming efforts with real-world risks. The authors also discuss common misunderstandings and open questions in the field. ✨
Article Points:
1
Understand system capabilities & application context.
2
Simple techniques often suffice to break AI systems.
3
Human judgment is crucial in AI red teaming.
4
Responsible AI harms are pervasive but hard to measure.
5
LLMs amplify existing & introduce new security risks.
6
Securing AI systems is an ongoing, never-complete process.
AI Threat Model Ontology
System
Actor
TTPs
Weakness
Impact
Foundational Principles
Understand system capabilities & application context
Simple techniques often suffice to break AI systems
AI red teaming differs from safety benchmarking
Operational Aspects
Automation expands risk landscape coverage
Human judgment is crucial in AI red teaming
Risk Nature
Responsible AI harms are pervasive but hard to measure
LLMs amplify existing & introduce new security risks
Long-term View
Securing AI systems is an ongoing process
Open Questions