ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants


Tags: security, vibe coding, agent, knowledge graph, evaluation

ASTRA is an automated red-teaming system designed to systematically uncover safety flaws in AI coding assistants and security guidance systems. It operates in three stages: building structured, domain-specific knowledge graphs offline; exploring vulnerabilities online along spatial (input space) and temporal (reasoning process) dimensions; and generating high-quality violation-inducing cases for model alignment. ASTRA outperforms existing red-teaming techniques, finding 11–66% more issues and producing test cases that lead to 17% more effective alignment training.

Article Points:

ASTRA: Automated red-teaming for AI coding assistants, focusing on realistic vulnerabilities.

Three-stage process: offline domain modeling, online exploration (spatial/temporal), model alignment.

Spatial exploration uses knowledge graphs to adaptively probe input space for boundary cases.

Temporal exploration analyzes the target model's chain-of-thought to exploit reasoning weaknesses.

Outperforms existing methods, finding 11–66% more issues and yielding 17% more effective alignment training.

Addresses limitations of current blue-teaming defenses, namely Circuit Breaker (CB) and Deliberative Alignment (DA), by identifying policy holes and reasoning flaws.
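The spatial-exploration point above can be illustrated with a minimal sketch. It is not ASTRA's implementation: the `Node` class, the smoothing rule, and the "closest to 0.5" boundary heuristic are all assumptions chosen for illustration. The idea shown is only that an abstraction hierarchy lets probe results recorded at specific leaves roll up to broader categories, so the next probe can be steered toward regions where the target model's behavior is least settled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One concept in a hypothetical abstraction hierarchy (not ASTRA's schema)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    trials: int = 0       # probes issued directly at this node
    violations: int = 0   # probes that produced an unsafe response

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def totals(self) -> Tuple[int, int]:
        """Aggregate (trials, violations) over this node and its whole subtree."""
        t, v = self.trials, self.violations
        for c in self.children:
            ct, cv = c.totals()
            t, v = t + ct, v + cv
        return t, v

    def rate(self) -> float:
        """Laplace-smoothed violation rate; an unexplored subtree scores 0.5."""
        t, v = self.totals()
        return (v + 1) / (t + 2)


def pick_boundary_leaf(root: Node) -> Node:
    """Descend toward the child whose rate is closest to 0.5 (assumed heuristic)."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: abs(c.rate() - 0.5))
    return node


# Toy hierarchy with made-up probe counts.
root = Node("secure coding")
injection = root.add(Node("injection"))
crypto = root.add(Node("cryptography"))
sql = injection.add(Node("sql injection", trials=10, violations=5))
cmd = injection.add(Node("command injection", trials=10, violations=0))
weak_hash = crypto.add(Node("weak hash", trials=10, violations=9))

next_probe = pick_boundary_leaf(root)
```

With these toy counts the descent settles on "sql injection", the leaf where the model fails about half the time. A node that has never been probed also scores 0.5, so under this heuristic unexplored regions are visited early.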

Source:

ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants


Purpose

Uncover AI safety flaws

Focus on realistic vulnerabilities

Target code generation & security guidance

Methodology

Cognitive Framework for analysis

Knowledge Graphs for domain modeling

Gibbs Sampling for guided exploration

LLM Interrogation for enumeration

Key Stages

Offline Domain Modeling

- Knowledge Graph construction

- Abstraction hierarchies

Online Vulnerability Exploration

- Spatial exploration (input space)

- Temporal exploration (reasoning)

Model Alignment

- Fine-tuning with violation cases

- Balance safety & utility

Evaluation

Outperforms existing red-teaming (RT) techniques

Spatial exploration is efficient

Temporal exploration effective on CoT models

Reasoning-based online judge

Impact

Finds 11-66% more issues

17% more effective alignment training

Improves AI system safety

Blue-Teaming Insights

Circuit Breaker harms utility

Deliberative Alignment has policy gaps
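The outline lists Gibbs sampling as the mechanism for guided exploration without giving details. As a rough sketch of the general technique only, the code below resamples one prompt attribute at a time, conditioned on the others, with probability proportional to an estimated violation likelihood. The attribute names in `DIMENSIONS`, the `weight` function, and the toy scores are hypothetical and do not come from ASTRA.

```python
import random
from typing import Callable, Dict, List

# Hypothetical prompt attributes; a real system would draw these from its knowledge graph.
DIMENSIONS: Dict[str, List[str]] = {
    "task": ["file upload handler", "login form", "shell wrapper"],
    "weakness": ["injection", "hardcoded secret", "path traversal"],
    "framing": ["direct request", "refactoring request", "debugging request"],
}

State = Dict[str, str]


def gibbs_step(state: State, dim: str,
               weight: Callable[[State], float],
               rng: random.Random) -> State:
    """Resample one dimension from its conditional, holding the others fixed."""
    candidates = DIMENSIONS[dim]
    # Unnormalized conditional: score each candidate with the other attributes unchanged.
    weights = [weight({**state, dim: value}) for value in candidates]
    new_value = rng.choices(candidates, weights=weights, k=1)[0]
    return {**state, dim: new_value}


def gibbs_explore(weight: Callable[[State], float],
                  sweeps: int, seed: int = 0) -> List[State]:
    """Run `sweeps` full passes over all dimensions and return every visited state."""
    rng = random.Random(seed)
    state = {dim: rng.choice(values) for dim, values in DIMENSIONS.items()}
    visited = []
    for _ in range(sweeps):
        for dim in DIMENSIONS:
            state = gibbs_step(state, dim, weight, rng)
            visited.append(dict(state))
    return visited


def toy_weight(state: State) -> float:
    """Made-up score: pretend injection prompts are 10x likelier to cause a violation."""
    return 10.0 if state["weakness"] == "injection" else 1.0


samples = gibbs_explore(toy_weight, sweeps=200)
injection_share = sum(s["weakness"] == "injection" for s in samples) / len(samples)
```

In practice the weight would be updated from judge verdicts on earlier probes, so the sampler drifts toward attribute combinations that have already produced violations while still occasionally visiting others. Weights must stay strictly positive, since `random.choices` rejects an all-zero weight list.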


